Conversations summary extraction

Overview

This package lets you extract descriptive statistics on the conversations identified in recordings. The set used for the extraction must contain conversation annotations, that is to say, it must have the columns segment_onset, segment_offset, speaker_type and conv_count. The Derive annotations pipeline can be used to derive conversation annotations from diarized annotations; we recommend running it on automated VTC annotations to obtain automated conversation annotations. A CSV file containing the statistics is produced, along with a YML parameter file storing all the options used for the extraction.
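Before launching an extraction, it can help to confirm that an annotation file carries the four required columns. The snippet below is a minimal sketch using only the Python standard library; the function name and the inlined sample content are hypothetical and are not part of the package.

```python
import csv
import io

# Columns the overview lists as required for conversation annotations.
REQUIRED_COLUMNS = {"segment_onset", "segment_offset", "speaker_type", "conv_count"}


def missing_conversation_columns(csv_file):
    """Return the required columns absent from the header of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    return sorted(REQUIRED_COLUMNS - set(header))


# Hypothetical annotation content, inlined so the sketch runs on its own.
sample = io.StringIO(
    "segment_onset,segment_offset,speaker_type\n"
    "0,1200,CHI\n"
)
print(missing_conversation_columns(sample))  # prints ['conv_count']
```

An empty list means the file has every column the extraction expects.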

$ child-project conversations-summary --help
usage: child-project conversations-summary [-h] --set SETNAME
                                           [--recordings RECORDINGS]
                                           [-f FROM_TIME] [-t TO_TIME]
                                           [--rec-cols REC_COLS]
                                           [--child-cols CHILD_COLS]
                                           [--threads THREADS]
                                           path destination {custom,standard}
                                           ...

positional arguments:
  path                  path to the dataset
  destination           segments destination
  {custom,standard}     pipeline
    custom              custom conversation extraction
    standard            standard conversation extraction

optional arguments:
  -h, --help            show this help message and exit
  --set SETNAME         Set to use to get the conversation annotations
  --recordings RECORDINGS
                        path to a CSV dataframe containing the list of
                        recordings to sample from (by default, all recordings
                        will be sampled). The CSV should have one column named
                        recording_filename.
  -f FROM_TIME, --from-time FROM_TIME
                        time range start in HH:MM:SS format (optional)
  -t TO_TIME, --to-time TO_TIME
                        time range end in HH:MM:SS format (optional)
  --rec-cols REC_COLS   comma separated columns from recordings.csv to include
                        in the outputted conversations (optional), NA if
                        ambiguous
  --child-cols CHILD_COLS
                        comma separated columns from children.csv to include
                        in the outputted conversations (optional), NA if
                        ambiguous
  --threads THREADS     amount of threads to run on
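The help output above can be turned into a small helper that assembles the command line from Python. This is only a sketch: the helper name is hypothetical, the set name vtc/conversations is a made-up example, and the general options are placed before the positional arguments to mirror the usage line.

```python
import shlex


def build_summary_command(path, destination, set_name, pipeline="standard",
                          from_time=None, to_time=None, threads=None,
                          features=None):
    """Assemble the argument list for `child-project conversations-summary`."""
    cmd = ["child-project", "conversations-summary", "--set", set_name]
    if from_time is not None:
        cmd += ["--from-time", from_time]  # HH:MM:SS, as stated in the help
    if to_time is not None:
        cmd += ["--to-time", to_time]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd += [path, destination, pipeline]
    if pipeline == "custom":
        # The custom pipeline takes the features CSV as an extra positional.
        if features is None:
            raise ValueError("the custom pipeline needs a features CSV file")
        cmd.append(features)
    return cmd


# "vtc/conversations" is a hypothetical set name used for illustration.
command = build_summary_command(
    "/path/to/dataset", "output.csv", "vtc/conversations",
    from_time="09:00:00", to_time="17:00:00",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run to launch the extraction.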

The conversation extraction will always have the following columns:

column               info
conversation_onset   start of the conversation (ms) inside the recording
conversation_offset  end of the conversation (ms) inside the recording
voc_count            number of vocalizations inside the conversation
conv_count           identifier of the conversation (unique across the recording)
interval_last_conv   interval (ms) between the end of the previous conversation and the start of the current one (NA for the first conversation)
recording_filename   recording of the conversation
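To make interval_last_conv concrete, the sketch below recomputes it for a toy list of conversations from a single recording, assuming they are already sorted by onset. It is an illustration of the definition, not the package's implementation, and None stands in for NA.

```python
def add_interval_last_conv(conversations):
    """Add interval_last_conv (ms) to conversations sorted by onset."""
    rows = []
    previous_offset = None
    for conv in conversations:
        if previous_offset is None:
            interval = None  # first conversation of the recording: NA
        else:
            interval = conv["conversation_onset"] - previous_offset
        rows.append({**conv, "interval_last_conv": interval})
        previous_offset = conv["conversation_offset"]
    return rows


# Toy values in milliseconds, invented for the example.
toy = [
    {"conversation_onset": 0, "conversation_offset": 5000},
    {"conversation_onset": 8000, "conversation_offset": 12000},
    {"conversation_onset": 12500, "conversation_offset": 20000},
]
print([row["interval_last_conv"] for row in add_interval_last_conv(toy)])
# prints [None, 3000, 500]
```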

The list of supported functions is shown below:

Callable              Description                                    Required arguments

assign_conv_type      Compute the conversation type (overheard,
                      dyadic_XXX, peer, parent, triadic_XXX,
                      multiparty) depending on the participants

is_speaker            is a specific speaker type present in the      - speaker : speaker_type label
                      conversation

participants          list of speakers participating in the
                      conversation, '/' separated

voc_dur_contribution  contribution of a given speaker in the         - speaker : speaker_type label
                      conversation compared to others, in terms of
                      total speech duration

voc_speaker_count     number of vocalizations produced by a given    - speaker : speaker_type label
                      speaker

voc_speaker_dur       summed duration of speech for a given speaker  - speaker : speaker_type label
                      in the conversation

voc_total_dur         summed duration of all speech in the
                      conversation (ms). N.B. can be higher than
                      conversation duration as speakers may speak
                      at the same time, resulting in multiple
                      spoken segments happening simultaneously

who_finished          speaker type who spoke last in the
                      conversation

who_initiated         speaker type who spoke first in the
                      conversation
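The duration-based metrics can be illustrated on a toy conversation. The functions below share their names with the callables in the table but are simplified re-implementations written for this example, not the package's code. Two points are assumptions: voc_dur_contribution is taken to be the speaker's share of voc_total_dur, and participants lists speakers in order of first appearance.

```python
# Toy conversation: (segment_onset, segment_offset, speaker_type), times in ms.
SEGMENTS = [
    (0, 1000, "FEM"),
    (500, 1500, "CHI"),
    (1000, 2000, "FEM"),
]


def voc_speaker_count(segments, speaker):
    return sum(1 for _, _, spk in segments if spk == speaker)


def voc_speaker_dur(segments, speaker):
    return sum(off - on for on, off, spk in segments if spk == speaker)


def voc_total_dur(segments):
    return sum(off - on for on, off, _ in segments)


def voc_dur_contribution(segments, speaker):
    # Assumption: contribution = speaker duration / total speech duration.
    return voc_speaker_dur(segments, speaker) / voc_total_dur(segments)


def participants(segments):
    # Assumption: speakers are listed in order of first appearance.
    seen = []
    for _, _, spk in sorted(segments):
        if spk not in seen:
            seen.append(spk)
    return "/".join(seen)


conversation_duration = (
    max(off for _, off, _ in SEGMENTS) - min(on for on, _, _ in SEGMENTS)
)
print(voc_total_dur(SEGMENTS), conversation_duration)  # prints 3000 2000
print(participants(SEGMENTS))  # prints FEM/CHI
```

Because the CHI segment overlaps both FEM segments, the summed speech duration (3000 ms) exceeds the conversation duration (2000 ms), which is the situation the note on voc_total_dur describes.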

Standard Conversations

The Standard pipeline will extract a list of usual metrics that can be obtained from conversations. Using this pipeline with a set containing conversation annotations will output:

metric             name                             speaker
who_initiated      initiator
who_finished       finisher
voc_total_dur      total_duration_of_vocalisations
voc_speaker_count  CHI_voc_count                    'CHI'
voc_speaker_count  FEM_voc_count                    'FEM'
voc_speaker_count  MAL_voc_count                    'MAL'
voc_speaker_count  OCH_voc_count                    'OCH'
voc_speaker_dur    CHI_voc_dur                      'CHI'
voc_speaker_dur    FEM_voc_dur                      'FEM'
voc_speaker_dur    MAL_voc_dur                      'MAL'
voc_speaker_dur    OCH_voc_dur                      'OCH'

$ child-project conversations-summary /path/to/dataset output.csv standard --help
usage: child-project conversations-summary path destination standard [-h]

optional arguments:
  -h, --help  show this help message and exit

Custom Conversations

The Custom conversations pipeline lets you provide your own list of metrics to extract. The list must be in a CSV file containing the following columns:

  • callable (required) : name of the metric to extract, see the list of supported functions above

  • name (required) : name to use in the resulting metrics. If none is given, a default name will be used. Use this to extract the same metric for different sets and avoid name clashes.

  • <argument> (depending on the requirements of the metric you chose) : For each required argument of a metric, add a column of that argument’s name.

This is an example of a CSV file we use to extract conversation metrics. We want to extract who initiated the conversation, who finished it, the list of speakers involved, and the share of speech produced in each conversation by the target child (CHI) and by female adult speakers (FEM). So we write 5 lines, one for each metric. On each line we give the reference to the metric (as it appears in the table above), the name that we want in the final output and, for some of them, the required argument(s).

metric                name             speaker
who_initiated         initiator
who_finished          finisher
participants          participants
voc_dur_contribution  chi_dur_contrib  CHI
voc_dur_contribution  fem_dur_contrib  FEM
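The example above can be written to disk with Python's csv module. This is a sketch under one assumption: the first column is headed callable, following the column list given for the custom pipeline, whereas the table above labels it metric. The function name is hypothetical, and an in-memory buffer replaces a real file so the snippet runs on its own.

```python
import csv
import io

FIELDNAMES = ["callable", "name", "speaker"]

# One row per metric; rows without a required argument leave "speaker" empty.
ROWS = [
    {"callable": "who_initiated", "name": "initiator"},
    {"callable": "who_finished", "name": "finisher"},
    {"callable": "participants", "name": "participants"},
    {"callable": "voc_dur_contribution", "name": "chi_dur_contrib", "speaker": "CHI"},
    {"callable": "voc_dur_contribution", "name": "fem_dur_contrib", "speaker": "FEM"},
]


def write_features(rows, out):
    """Write the metric list to an open text file as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_features(ROWS, buffer)
print(buffer.getvalue())
```

To produce an actual file, the buffer can be replaced with open("features.csv", "w", newline=""), where features.csv is a hypothetical file name, and that path is then given as the features argument of the custom pipeline.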

$ child-project conversations-summary /path/to/dataset output.csv custom --help
usage: child-project conversations-summary path destination custom
       [-h] features

positional arguments:
  features    name of the csv file containing the list of features to extract

optional arguments:
  -h, --help  show this help message and exit

Conversations extraction from parameter file

To make extractions easier to launch and reproduce, you can pass a complete YML parameter file to start a new extraction. This file has exactly the same structure as the one produced by the pipeline, so you can use the output parameter file of a previous extraction to rerun the same analysis.

$ child-project conversations-specification --help
usage: child-project conversations-specification [-h] parameters_input

positional arguments:
  parameters_input  path to the yml file with all parameters

optional arguments:
  -h, --help        show this help message and exit