Metrics extraction

Overview

This package extracts commonly used metrics from annotations produced by LENA or other pipelines.

$ child-project metrics --help
usage: child-project metrics [-h] [--recordings RECORDINGS]
                             [--by {recording_filename,session_id,child_id}]
                             [-f FROM_TIME] [-t TO_TIME]
                             path destination {lena,aclew,period} ...

positional arguments:
  path                  path to the dataset
  destination           segments destination
  {lena,aclew,period}   pipeline
    lena                LENA metrics
    aclew               LENA metrics
    period              LENA metrics

optional arguments:
  -h, --help            show this help message and exit
  --recordings RECORDINGS
                        path to a CSV dataframe containing the list of
                        recordings to sample from (by default, all recordings
                        will be sampled). The CSV should have one column named
                        recording_filename.
  --by {recording_filename,session_id,child_id}
                        units to sample from (default behavior is to sample by
                        recording)
  -f FROM_TIME, --from-time FROM_TIME
                        time range start in HH:MM format (optional)
  -t TO_TIME, --to-time TO_TIME
                        time range end in HH:MM format (optional)
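As the help text above states, --recordings expects a CSV file with a single column named recording_filename. A minimal sketch of how such a file could be produced is given below; the helper name write_recordings_csv and the file names rec1.wav and rec2.wav are hypothetical and only serve the illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_recordings_csv(path, filenames):
    """Write a one-column CSV listing the recordings to process."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # The column name is the one required by the --recordings option.
        writer.writerow(["recording_filename"])
        for name in filenames:
            writer.writerow([name])


# Hypothetical recording names, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "recordings.csv"
    write_recordings_csv(csv_path, ["rec1.wav", "rec2.wav"])
    lines = csv_path.read_text().splitlines()

print(lines)
```

The resulting file can then be passed to the command through --recordings.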

The list of supported metrics is shown below:

=======================  =============================================================================================  ===================
Variable                 Description                                                                                    Pipelines
=======================  =============================================================================================  ===================
voc_fem/mal/och_ph       number of vocalizations by different talker types                                              ACLEW, LENA, Period
voc_dur_fem/mal/och_ph   total duration of vocalizations by different talker types                                      ACLEW, LENA, Period
avg_voc_dur_fem/mal/och  average vocalization length (conceptually akin to MLU) by different talker types               ACLEW, LENA, Period
wc_adu_ph                adult word count (collapsing across males and females)                                         ACLEW, LENA
wc_fem/mal_ph            adult word count by different talker types                                                     ACLEW, LENA
sc_adu_ph                adult syllable count (collapsing across males and females)                                     ACLEW
sc_fem/mal_ph            adult syllable count by different talker types                                                 ACLEW
pc_adu_ph                adult phoneme count (collapsing across males and females)                                      ACLEW
pc_fem/mal_ph            adult phoneme count by different talker types                                                  ACLEW
freq_n                   frequency of child vocalizations out of all vocalizations, based on number of vocalizations    ACLEW, LENA
freq_dur                 frequency of child vocalizations out of all vocalizations, based on duration of vocalizations  ACLEW, LENA
cry_voc_chi_ph           number of child vocalizations that are crying                                                  ACLEW, LENA
can_voc_chi_ph           number of child vocalizations that are canonical                                               ACLEW
non_can_voc_chi_ph       number of child vocalizations that are non-canonical                                           ACLEW
sp_voc_chi_ph            number of child vocalizations that are speech-like (can+noncan for ACLEW)                      ACLEW, LENA
cry_voc_dur_chi_ph       total duration of child vocalizations that are crying                                          ACLEW, LENA
can_voc_dur_chi_ph       total duration of child vocalizations that are canonical                                       ACLEW
non_can_voc_dur_chi_ph   total duration of child vocalizations that are non-canonical                                   ACLEW
sp_voc_dur_chi_ph        total duration of child vocalizations that are speech-like (can+noncan for ACLEW)              ACLEW, LENA
avg_cry_voc_dur_chi      average duration of child vocalizations that are crying                                        ACLEW, LENA
avg_can_voc_dur_chi      average duration of child vocalizations that are canonical                                     ACLEW
avg_non_can_voc_dur_chi  average duration of child vocalizations that are non-canonical                                 ACLEW
avg_sp_voc_dur_chi       average duration of child vocalizations that are speech-like (can+noncan for ACLEW)            ACLEW, LENA
lp_n                     linguistic proportion = speech / (cry + speech), based on number of vocalizations              ACLEW, LENA
cp_n                     canonical proportion = canonical / (can + noncan), based on number of vocalizations            ACLEW
lp_dur                   linguistic proportion = speech / (cry + speech), based on duration of vocalizations            ACLEW, LENA
cp_dur                   canonical proportion = canonical / (can + noncan), based on duration of vocalizations          ACLEW
=======================  =============================================================================================  ===================
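The linguistic and canonical proportions listed above are simple ratios, and the same formulas apply whether the inputs are vocalization counts (lp_n, cp_n) or durations (lp_dur, cp_dur). The sketch below works through them in Python. The function names are hypothetical, and returning NaN when a denominator is zero is an assumption: the source does not say how the pipelines handle that case.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero.

    Returning NaN is an assumption made for this sketch only.
    """
    return numerator / denominator if denominator else math.nan


def child_proportions(cry, can, non_can):
    """Compute the linguistic (lp) and canonical (cp) proportions.

    The inputs are either all counts or all durations.
    """
    speech = can + non_can  # speech-like = canonical + non-canonical (ACLEW)
    return {
        "lp": safe_ratio(speech, cry + speech),  # speech / (cry + speech)
        "cp": safe_ratio(can, speech),           # canonical / (can + noncan)
    }


# Hypothetical counts: 10 cries, 30 canonical and 60 non-canonical vocalizations.
example = child_proportions(cry=10, can=30, non_can=60)
print(example)
```

With these made-up counts, lp is 90/100 and cp is 30/90.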

LENA Metrics

$ child-project metrics /path/to/dataset output.csv lena --help
usage: child-project metrics path destination lena [-h] [--threads THREADS]
                                                   set

positional arguments:
  set                name of the LENA its annotations set

optional arguments:
  -h, --help         show this help message and exit
  --threads THREADS  amount of threads to run on

ACLEW Metrics

$ child-project metrics /path/to/dataset output.csv aclew --help
usage: child-project metrics path destination aclew [-h] [--vtc VTC]
                                                    [--alice ALICE]
                                                    [--vcm VCM]
                                                    [--threads THREADS]

optional arguments:
  -h, --help         show this help message and exit
  --vtc VTC          vtc set
  --alice ALICE      alice set
  --vcm VCM          vcm set
  --threads THREADS  amount of threads to run on

Period-aggregated metrics

The Period Metrics pipeline aggregates vocalizations into time-of-day bins whose length is set by a user-specified period. For instance, if the period is set to 15Min (i.e. 15 minutes), vocalization rates are reported for each recording and each time-bin (e.g. 09:00 to 09:15, 09:15 to 09:30, etc.).

The output dataframe has \(r \times p\) rows, where \(r\) is the number of recordings (or of children if the --by option is set to child_id), and \(p\) is the number of time-bins per day (i.e. \(24 \times 4=96\) for a 15-minute period).

The output dataframe includes a period column that contains the onset of each time-bin in HH:MM:SS format. The duration column contains the total duration of the annotations covering each time-bin, in milliseconds.
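To make the row count concrete, the sketch below enumerates the time-bin onsets of one day in HH:MM:SS format and derives the expected number of rows. The helper period_onsets is hypothetical, and it assumes a period expressed in whole minutes that divides the day evenly.

```python
def period_onsets(period_minutes):
    """List the onset of every time-bin of one day as HH:MM:SS strings."""
    minutes_per_day = 24 * 60
    if period_minutes <= 0 or minutes_per_day % period_minutes != 0:
        raise ValueError("the period must divide 24 hours evenly")
    onsets = []
    for start in range(0, minutes_per_day, period_minutes):
        onsets.append(f"{start // 60:02d}:{start % 60:02d}:00")
    return onsets


onsets = period_onsets(15)

# r x p rows: e.g. 10 recordings (hypothetical) and 96 bins per day.
n_recordings = 10
n_rows = n_recordings * len(onsets)
print(len(onsets), onsets[:3], n_rows)
```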

If --by is set to, e.g., child_id, the value for each time-bin is the average rate across all the recordings of each child.
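The aggregation by child can be pictured as grouping the per-recording rows by child and time-bin, then averaging. The sketch below uses made-up values and assumes a plain unweighted mean; the source does not specify whether the pipeline weights recordings in any way, and the helper average_by_child is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-recording rates for a single time-bin.
rows = [
    {"recording_filename": "rec1.wav", "child_id": "A", "period": "09:00:00", "voc_fem_ph": 100.0},
    {"recording_filename": "rec2.wav", "child_id": "A", "period": "09:00:00", "voc_fem_ph": 140.0},
    {"recording_filename": "rec3.wav", "child_id": "B", "period": "09:00:00", "voc_fem_ph": 80.0},
]


def average_by_child(rows, metric):
    """Average a metric over all recordings sharing a child and a time-bin."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["child_id"], row["period"])].append(row[metric])
    # Unweighted mean per (child, time-bin): an assumption of this sketch.
    return {key: mean(values) for key, values in grouped.items()}


averages = average_by_child(rows, "voc_fem_ph")
print(averages)
```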

$ child-project metrics /path/to/dataset output.csv period --help
usage: child-project metrics path destination period [-h] [--set SET]
                                                     [--threads THREADS]
                                                     [--period PERIOD]
                                                     [--period-origin PERIOD_ORIGIN]

optional arguments:
  -h, --help            show this help message and exit
  --set SET             annotations set
  --threads THREADS     amount of threads to run on
  --period PERIOD       time units to aggregate (optional); equivalent to
                        ``pandas.Grouper``'s freq argument.
  --period-origin PERIOD_ORIGIN
                        time origin of each time period; equivalent to
                        ``pandas.Grouper``'s origin argument.
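Since --period and --period-origin are described as equivalent to the freq and origin arguments of pandas.Grouper, the sketch below shows what these two arguments do on a few made-up vocalization onsets. It illustrates pandas only; how the pipeline calls pandas internally is not described in the source, and the timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical vocalization onsets within one morning.
segments = pd.DataFrame({
    "onset": pd.to_datetime([
        "2021-01-01 09:02:00",
        "2021-01-01 09:10:00",
        "2021-01-01 09:20:00",
        "2021-01-01 09:47:00",
    ]),
})

# 15-minute bins aligned on midnight (origin="start_day" is the pandas default).
# pandas spells the frequency "15min" here.
counts = segments.groupby(
    pd.Grouper(key="onset", freq="15min", origin="start_day")
).size()

# Same bin length, but with bin edges shifted by 5 minutes.
shifted = segments.groupby(
    pd.Grouper(key="onset", freq="15min", origin=pd.Timestamp("2021-01-01 00:05:00"))
).size()

print(counts)
print(shifted)
```

Changing the origin moves the bin boundaries without changing their length, so the same vocalizations can fall into different time-bins.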