Metrics extraction

Overview

This package extracts commonly used metrics from annotations produced by LENA or other pipelines.

$ child-project metrics --help
usage: child-project metrics [-h] [--recordings RECORDINGS]
                             [--by {recording_filename,session_id,child_id}]
                             [-f FROM_TIME] [-t TO_TIME]
                             path destination {lena,aclew,period} ...

positional arguments:
  path                  path to the dataset
  destination           segments destination
  {lena,aclew,period}   pipeline
    lena                LENA metrics
    aclew               LENA metrics
    period              LENA metrics

optional arguments:
  -h, --help            show this help message and exit
  --recordings RECORDINGS
                        path to a CSV dataframe containing the list of
                        recordings to sample from (by default, all recordings
                        will be sampled). The CSV should have one column named
                        recording_filename.
  --by {recording_filename,session_id,child_id}
                        units to sample from (default behavior is to sample by
                        recording)
  -f FROM_TIME, --from-time FROM_TIME
                        time range start in HH:MM format (optional)
  -t TO_TIME, --to-time TO_TIME
                        time range end in HH:MM format (optional)
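The help text above says the file passed to --recordings should be a CSV with one column named recording_filename. As a minimal sketch of that layout, the following writes such a file to an in-memory buffer using only the standard library. The helper name and the recording names are hypothetical and chosen for illustration only.

```python
import csv
import io


def write_recordings_csv(filenames, handle):
    """Write a one-column CSV whose header is `recording_filename`."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["recording_filename"])
    for name in filenames:
        writer.writerow([name])


# Hypothetical recording names, for illustration only.
buffer = io.StringIO()
write_recordings_csv(["rec_001.wav", "rec_002.wav"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the handle would be a file opened for writing, and the resulting path would be given to --recordings.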

The list of supported metrics is shown below:

Variable	Description	Pipelines
voc_fem/mal/och_ph	number of vocalizations by different talker types per hour	ACLEW,LENA,Period
voc_dur_fem/mal/och_ph	total duration of vocalizations by different talker types in seconds per hour	ACLEW,LENA,Period
avg_voc_dur_fem/mal/och	average vocalization length (conceptually akin to MLU) by different talker types	ACLEW,LENA,Period
wc_adu_ph	adult word count (collapsing across males and females)	ACLEW,LENA
wc_fem/mal_ph	adult word count by different talker types	ACLEW,LENA
sc_adu_ph	adult syllable count (collapsing across males and females)	ACLEW
sc_fem/mal_ph	adult syllable count by different talker types	ACLEW
pc_adu_ph	adult phoneme count (collapsing across males and females)	ACLEW
pc_fem/mal_ph	adult phoneme count by different talker types	ACLEW
freq_n	frequency of child voc out of all vocs based on number of vocalizations	ACLEW,LENA
freq_dur	frequency of child voc out of all vocs based on duration of vocalizations	ACLEW,LENA
cry_voc_chi_ph	number of child vocalizations that are crying	ACLEW,LENA
can_voc_chi_ph	number of child vocs that are canonical	ACLEW
non_can_voc_chi_ph	number of child vocs that are non-canonical	ACLEW
sp_voc_chi_ph	number of child vocs that are speech-like (can+noncan for ACLEW)	ACLEW,LENA
cry_voc_dur_chi_ph	total duration of child vocalizations that are crying	ACLEW,LENA
can_voc_dur_chi_ph	total duration of child vocs that are canonical	ACLEW
non_can_voc_dur_chi_ph	total duration of child vocs that are non-canonical	ACLEW
sp_voc_dur_chi_ph	total duration of child vocs that are speech-like (can+noncan for ACLEW)	ACLEW,LENA
avg_cry_voc_dur_chi	average duration of child vocalizations that are crying	ACLEW,LENA
avg_can_voc_dur_chi	average duration of child vocs that are canonical	ACLEW
avg_non_can_voc_dur_chi	average duration of child vocs that are non-canonical	ACLEW
avg_sp_voc_dur_chi	average duration of child vocs that are speech-like (can+noncan for ACLEW)	ACLEW,LENA
lp_n	linguistic proportion = (speech)/(cry+speech) based on number of vocalizations	ACLEW,LENA
cp_n	canonical proportion = canonical /(can+noncan) based on number of vocalizations	ACLEW
lp_dur	linguistic proportion = (speech)/(cry+speech) based on duration of vocalizations	ACLEW,LENA
cp_dur	canonical proportion = canonical /(can+noncan) based on duration of vocalizations	ACLEW
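The two proportions in the table, lp_n = speech / (cry + speech) and cp_n = canonical / (can + noncan), can be illustrated with a short sketch. This is not the package's implementation: the counts are made up, the function name is hypothetical, and returning NaN when the denominator is zero is an assumption of this example.

```python
def proportion(numerator, *others):
    """Return numerator / (numerator + sum(others)).

    Returning NaN for an empty denominator is an assumption of this sketch.
    """
    total = numerator + sum(others)
    return numerator / total if total > 0 else float("nan")


# Made-up counts of child vocalizations, for illustration only.
canonical, non_canonical, cry = 12, 18, 10
speech = canonical + non_canonical  # speech-like = can + noncan (ACLEW)

lp_n = proportion(speech, cry)               # 30 / (30 + 10)
cp_n = proportion(canonical, non_canonical)  # 12 / (12 + 18)
print(lp_n, cp_n)
```

The same function applies to lp_dur and cp_dur if durations are passed in place of counts.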

Note

Average rates are expressed in counts/hour (for events) or in seconds/hour (for durations).
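As a rough illustration of the note above, the sketch below converts a raw total (a count of events, or a summed duration in seconds) into an hourly rate by dividing by the annotated time in hours. The function name is hypothetical, and taking the annotated time in milliseconds is an assumption of this example.

```python
MS_PER_HOUR = 3_600_000


def per_hour(total, annotated_ms):
    """Scale a total (count or seconds) to a rate per hour of annotated audio."""
    if annotated_ms <= 0:
        raise ValueError("annotated duration must be positive")
    return total / (annotated_ms / MS_PER_HOUR)


# Made-up example: 2 hours of annotated audio containing
# 120 vocalizations that add up to 300 seconds.
annotated_ms = 2 * MS_PER_HOUR
voc_ph = per_hour(120, annotated_ms)      # counts/hour
voc_dur_ph = per_hour(300, annotated_ms)  # seconds/hour
print(voc_ph, voc_dur_ph)
```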

LENA Metrics

$ child-project metrics /path/to/dataset output.csv lena --help
usage: child-project metrics path destination lena [-h] [--threads THREADS]
                                                   set

positional arguments:
  set                name of the LENA its annotations set

optional arguments:
  -h, --help         show this help message and exit
  --threads THREADS  amount of threads to run on

ACLEW Metrics

$ child-project metrics /path/to/dataset output.csv aclew --help
usage: child-project metrics path destination aclew [-h] [--vtc VTC]
                                                    [--alice ALICE]
                                                    [--vcm VCM]
                                                    [--threads THREADS]

optional arguments:
  -h, --help         show this help message and exit
  --vtc VTC          vtc set
  --alice ALICE      alice set
  --vcm VCM          vcm set
  --threads THREADS  amount of threads to run on

Period-aggregated metrics

The Period Metrics pipeline aggregates vocalizations for each time-of-day unit, based on a period specified by the user. For instance, if the period is set to 15Min (i.e. 15 minutes), vocalization rates are reported for each recording and time unit (e.g. 09:00 to 09:15, 09:15 to 09:30, etc.).

The output dataframe has \(r \times p\) rows, where \(r\) is the number of recordings (or children if the --by option is set to child_id), and \(p\) is the number of time bins per day (i.e. \(24 \times 4=96\) for a 15-minute period).
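The row count can be checked with a small standard-library sketch that computes the number of bins per day for a given period and lists the bin onsets in HH:MM:SS format. The function names are hypothetical, and the sketch assumes the period divides 24 hours evenly and that bins start at midnight.

```python
from datetime import timedelta

DAY = timedelta(hours=24)


def bins_per_day(period):
    """Number of time bins in a day; the period must divide 24 hours evenly."""
    if period <= timedelta(0) or DAY % period != timedelta(0):
        raise ValueError("period must evenly divide 24 hours")
    return DAY // period


def bin_onsets(period):
    """Onset of each bin as HH:MM:SS, starting at midnight (an assumption)."""
    onsets = []
    for i in range(bins_per_day(period)):
        seconds = int((i * period).total_seconds())
        hours, remainder = divmod(seconds, 3600)
        minutes, secs = divmod(remainder, 60)
        onsets.append(f"{hours:02d}:{minutes:02d}:{secs:02d}")
    return onsets


p = bins_per_day(timedelta(minutes=15))  # 24 * 4 = 96
r = 3                                    # made-up number of recordings
expected_rows = r * p
onsets = bin_onsets(timedelta(minutes=15))
print(p, expected_rows, onsets[:2], onsets[-1])
```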

The output dataframe includes a period column that contains the onset of each time unit in HH:MM:SS format. The duration column contains the total amount of annotations covering each time bin, in milliseconds.

If --by is set to, e.g., child_id, the values for each time bin are the average rates across all recordings of each child.
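To make the grouping concrete, here is a minimal sketch of averaging per-recording rates into one value per child and time bin. The data are made up and the function name is hypothetical. The sketch uses a plain unweighted mean across recordings, which is an assumption; the text does not say whether the pipeline weights recordings by annotated duration.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-recording rows: (child_id, period onset, rate per hour).
rows = [
    ("child_a", "09:00:00", 40.0),
    ("child_a", "09:00:00", 60.0),
    ("child_a", "09:15:00", 20.0),
    ("child_b", "09:00:00", 30.0),
]


def average_by_child(rows):
    """Unweighted mean rate per (child_id, period); weighting is an assumption."""
    grouped = defaultdict(list)
    for child_id, period, rate in rows:
        grouped[(child_id, period)].append(rate)
    return {key: mean(values) for key, values in grouped.items()}


averaged = average_by_child(rows)
for key, value in sorted(averaged.items()):
    print(key, value)
```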

$ child-project metrics /path/to/dataset output.csv period --help
usage: child-project metrics path destination period [-h] --set SET --period
                                                     PERIOD
                                                     [--period-origin PERIOD_ORIGIN]
                                                     [--threads THREADS]

optional arguments:
  -h, --help            show this help message and exit
  --set SET             annotations set
  --period PERIOD       time units to aggregate (optional); equivalent to
                        ``pandas.Grouper``'s freq argument.
  --period-origin PERIOD_ORIGIN
                        time origin of each time period; equivalent to
                        ``pandas.Grouper``'s origin argument.
  --threads THREADS     amount of threads to run on

Note

Average rates are expressed in seconds/hour regardless of the period.