Conversations summary extraction
Overview
This package allows you to extract descriptive statistics on identified conversations in recordings. The set used for the extraction must contain conversation annotations, which is to say it must have the columns segment_onset, segment_offset, speaker_type and conv_count.
The Derive annotations pipeline can be used to derive conversation annotations from diarized annotations; we recommend using it on vtc automated annotations to obtain automated conversation annotations.
A csv file containing the statistics is produced, along with a YML parameter file storing all the options used for the extraction.
$ child-project conversations-summary --help
usage: child-project conversations-summary [-h] --set SETNAME
[--recordings RECORDINGS]
[-f FROM_TIME] [-t TO_TIME]
[--rec-cols REC_COLS]
[--child-cols CHILD_COLS]
[--threads THREADS]
path destination {custom,standard}
...
positional arguments:
path path to the dataset
destination segments destination
{custom,standard} pipeline
custom custom conversation extraction
standard standard conversation extraction
optional arguments:
-h, --help show this help message and exit
--set SETNAME Set to use to get the conversation annotations
--recordings RECORDINGS
path to a CSV dataframe containing the list of
recordings to sample from (by default, all recordings
will be sampled). The CSV should have one column named
recording_filename.
-f FROM_TIME, --from-time FROM_TIME
time range start in HH:MM:SS format (optional)
-t TO_TIME, --to-time TO_TIME
time range end in HH:MM:SS format (optional)
--rec-cols REC_COLS comma separated columns from recordings.csv to include
in the outputted conversations (optional), NA if
ambiguous
--child-cols CHILD_COLS
comma separated columns from children.csv to include
in the outputted conversations (optional), NA if
ambiguous
--threads THREADS amount of threads to run on
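For example, a standard extraction restricted to a given time range could look like the following sketch (the set name and paths are placeholders to adapt to your own dataset):

$ child-project conversations-summary --set vtc/conversations \
      -f 09:00:00 -t 18:00:00 --threads 4 \
      /path/to/dataset extract/conversations.csv standard

This would write the conversation statistics to extract/conversations.csv, along with the YML parameter file recording the options used.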
The conversation extraction will always have the following columns:
| column | info |
|---|---|
| conversation_onset | start of the conversation (ms) inside the recording |
| conversation_offset | end of the conversation (ms) inside the recording |
| voc_count | number of vocalizations inside the conversation |
| conv_count | identifier of the conversation (unique across the recording) |
| interval_last_conv | interval (ms) between the end of the previous conversation and the start of the current conversation (NA for the first) |
| recording_filename | recording of the conversation |
The list of supported functions is shown below:
| Callable | Description | Required arguments |
|---|---|---|
| assign_conv_type | Compute the conversation type (overheard, dyadic_XXX, peer, parent, triadic_XXX, multiparty) depending on the participants | |
| is_speaker | is a specific speaker type present in the conversation | - speaker : speaker_type label |
| participants | list of speakers participating in the conversation, ‘/’ separated | |
| voc_dur_contribution | contribution of a given speaker in the conversation compared to others, in terms of total speech duration | - speaker : speaker_type label |
| voc_speaker_count | number of vocalizations produced by a given speaker | - speaker : speaker_type label |
| voc_speaker_dur | summed duration of speech for a given speaker in the conversation | - speaker : speaker_type label |
| voc_total_dur | summed duration of all speech in the conversation (ms); N.B. can be higher than the conversation duration as speakers may speak at the same time, resulting in multiple spoken segments happening simultaneously | |
| who_finished | speaker type who spoke last in the conversation | |
| who_initiated | speaker type who spoke first in the conversation | |
Standard Conversations
The Standard pipeline extracts a list of usual metrics that can be obtained from conversations. Using this pipeline with a set containing conversation annotations will output:
| metric | name | speaker |
|---|---|---|
| who_initiated | initiator | |
| who_finished | finisher | |
| voc_total_dur | total_duration_of_vocalisations | |
| voc_speaker_count | CHI_voc_count | ‘CHI’ |
| voc_speaker_count | FEM_voc_count | ‘FEM’ |
| voc_speaker_count | MAL_voc_count | ‘MAL’ |
| voc_speaker_count | OCH_voc_count | ‘OCH’ |
| voc_speaker_dur | CHI_voc_dur | ‘CHI’ |
| voc_speaker_dur | FEM_voc_dur | ‘FEM’ |
| voc_speaker_dur | MAL_voc_dur | ‘MAL’ |
| voc_speaker_dur | OCH_voc_dur | ‘OCH’ |
$ child-project conversations-summary /path/to/dataset output.csv standard --help
usage: child-project conversations-summary path destination standard [-h]
optional arguments:
-h, --help show this help message and exit
Custom Conversations
The Custom conversations pipeline allows you to provide your own list of desired metrics to be extracted by the pipeline. The list must be in a csv file containing the following columns:
- callable (required) : name of the metric to extract, see the list above
- name (required) : name to use in the resulting output. If none is given, a default name will be used. Use this to extract the same metric for different sets and avoid name clashes.
- <argument> (depending on the requirements of the metric you chose) : for each required argument of a metric, add a column with that argument’s name.
This is an example of a csv file we could use to extract conversation metrics. We want to extract who initiated the conversation, who finished it, the list of speakers involved, the percentage of speech produced by the target child (CHI) in each conversation, and the same for female adult speakers (FEM). So we write 5 lines, one per metric: each gives the reference to the metric (as listed in the table above), the name we want in the final output, and, where needed, the required argument(s).
| metric | name | speaker |
|---|---|---|
| who_initiated | initiator | |
| who_finished | finisher | |
| participants | participants | |
| voc_dur_contribution | chi_dur_contrib | CHI |
| voc_dur_contribution | fem_dur_contrib | FEM |
$ child-project conversations-summary /path/to/dataset output.csv custom --help
usage: child-project conversations-summary path destination custom
[-h] features
positional arguments:
features name of the csv file containing the list of features to extract
optional arguments:
-h, --help show this help message and exit
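As a sketch, the example above could be written as a csv file and passed to the custom pipeline as follows; the file names, set name and paths are placeholders, and the column headers follow the list given earlier (callable, name, plus one column per required argument such as speaker):

# write the list of metrics to extract, one row per metric
$ cat > conversation_features.csv << 'EOF'
callable,name,speaker
who_initiated,initiator,
who_finished,finisher,
participants,participants,
voc_dur_contribution,chi_dur_contrib,CHI
voc_dur_contribution,fem_dur_contrib,FEM
EOF
# run the custom extraction using that file
$ child-project conversations-summary --set vtc/conversations \
      /path/to/dataset extract/custom_conversations.csv custom conversation_features.csv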
Conversations extraction from parameter file
To facilitate the extraction of conversations, one can simply use an exhaustive yml parameter file to launch a new extraction. This file has the exact same structure as the one produced by the pipeline, so you can use the output parameter file of a previous extraction to rerun the same analysis.
$ child-project conversations-specification --help
usage: child-project conversations-specification [-h] parameters_input
positional arguments:
parameters_input path to the yml file with all parameters
optional arguments:
-h, --help show this help message and exit
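For example, assuming a previous extraction produced a parameter file (the path below is a placeholder for the file written by your earlier run), the same analysis can be rerun with:

$ child-project conversations-specification extract/conversations_parameters.yml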