Metrics extraction
Overview
This package extracts commonly used metrics from annotations produced by LENA or by other pipelines.
$ child-project metrics --help
usage: child-project metrics [-h] [--recordings RECORDINGS]
                             [--by {recording_filename,session_id,child_id}]
                             [-f FROM_TIME] [-t TO_TIME]
                             path destination {lena,aclew,period} ...

positional arguments:
  path                  path to the dataset
  destination           segments destination
  {lena,aclew,period}   pipeline
    lena                LENA metrics
    aclew               LENA metrics
    period              LENA metrics

optional arguments:
  -h, --help            show this help message and exit
  --recordings RECORDINGS
                        path to a CSV dataframe containing the list of
                        recordings to sample from (by default, all recordings
                        will be sampled). The CSV should have one column named
                        recording_filename.
  --by {recording_filename,session_id,child_id}
                        units to sample from (default behavior is to sample by
                        recording)
  -f FROM_TIME, --from-time FROM_TIME
                        time range start in HH:MM format (optional)
  -t TO_TIME, --to-time TO_TIME
                        time range end in HH:MM format (optional)
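The `--recordings` option expects a CSV file with a single `recording_filename` column. As a minimal sketch of how such a file could be prepared, the snippet below writes one with Python's standard `csv` module. The helper name `write_recordings_csv`, the file names and the output location are hypothetical and only serve as an illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical recording names; replace them with names from your own dataset.
recordings = ["rec_001.wav", "rec_002.wav"]


def write_recordings_csv(filenames, destination):
    """Write a one-column CSV whose header is 'recording_filename'."""
    with open(destination, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["recording_filename"])
        for name in filenames:
            writer.writerow([name])
    return destination


# Written to the system temporary directory for the sake of the example.
csv_path = write_recordings_csv(
    recordings, Path(tempfile.gettempdir()) / "recordings.csv"
)
```

The path of the resulting file could then be given to `--recordings` to restrict the extraction to the listed recordings.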
The list of supported metrics is shown below:
| Variable | Description | Pipelines |
|---|---|---|
| voc_fem/mal/och_ph | number of vocalizations by different talker types per hour | ACLEW, LENA, Period |
| voc_dur_fem/mal/och_ph | total duration of vocalizations by different talker types in seconds per hour | ACLEW, LENA, Period |
| avg_voc_dur_fem/mal/och | average vocalization length (conceptually akin to MLU) by different talker types | ACLEW, LENA, Period |
| wc_adu_ph | adult word count (collapsing across males and females) | ACLEW, LENA |
| wc_fem/mal_ph | adult word count by different talker types | ACLEW, LENA |
| sc_adu_ph | adult syllable count (collapsing across males and females) | ACLEW |
| sc_fem/mal_ph | adult syllable count by different talker types | ACLEW |
| pc_adu_ph | adult phoneme count (collapsing across males and females) | ACLEW |
| pc_fem/mal_ph | adult phoneme count by different talker types | ACLEW |
| freq_n | frequency of child vocalizations out of all vocalizations, based on number of vocalizations | ACLEW, LENA |
| freq_dur | frequency of child vocalizations out of all vocalizations, based on duration of vocalizations | ACLEW, LENA |
| cry_voc_chi_ph | number of child vocalizations that are crying | ACLEW, LENA |
| can_voc_chi_ph | number of child vocalizations that are canonical | ACLEW |
| non_can_voc_chi_ph | number of child vocalizations that are non-canonical | ACLEW |
| sp_voc_chi_ph | number of child vocalizations that are speech-like (can+noncan for ACLEW) | ACLEW, LENA |
| cry_voc_dur_chi_ph | total duration of child vocalizations that are crying | ACLEW, LENA |
| can_voc_dur_chi_ph | total duration of child vocalizations that are canonical | ACLEW |
| non_can_voc_dur_chi_ph | total duration of child vocalizations that are non-canonical | ACLEW |
| sp_voc_dur_chi_ph | total duration of child vocalizations that are speech-like (can+noncan for ACLEW) | ACLEW, LENA |
| avg_cry_voc_dur_chi | average duration of child vocalizations that are crying | ACLEW, LENA |
| avg_can_voc_dur_chi | average duration of child vocalizations that are canonical | ACLEW |
| avg_non_can_voc_dur_chi | average duration of child vocalizations that are non-canonical | ACLEW |
| avg_sp_voc_dur_chi | average duration of child vocalizations that are speech-like (can+noncan for ACLEW) | ACLEW, LENA |
| lp_n | linguistic proportion = speech/(cry+speech), based on number of vocalizations | ACLEW, LENA |
| cp_n | canonical proportion = canonical/(can+noncan), based on number of vocalizations | ACLEW |
| lp_dur | linguistic proportion = speech/(cry+speech), based on duration of vocalizations | ACLEW, LENA |
| cp_dur | canonical proportion = canonical/(can+noncan), based on duration of vocalizations | ACLEW |
Note
Average rates are expressed in counts/hour (for events) or in seconds/hour (for durations).
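To make the units and the proportion formulas concrete, here is a small worked sketch in plain Python. It is not the package's implementation: the helper names `rate_per_hour` and `proportion`, the assumption that rates are normalized by the annotated duration, and all the numbers are hypothetical.

```python
def rate_per_hour(total, annotated_ms):
    """Scale a count (or a duration in seconds) to a per-hour rate,
    given the amount of annotated audio in milliseconds."""
    hours = annotated_ms / 3_600_000
    return total / hours


def proportion(part, rest):
    """Return part / (part + rest), or NaN when both are zero."""
    total = part + rest
    return part / total if total else float("nan")


# Hypothetical data: 30 minutes of annotated audio and
# three female-adult vocalizations (durations in milliseconds).
annotated_ms = 30 * 60 * 1000
fem_durations_ms = [1200, 800, 2500]

# Counts/hour for events, seconds/hour for durations.
voc_fem_ph = rate_per_hour(len(fem_durations_ms), annotated_ms)
voc_dur_fem_ph = rate_per_hour(sum(fem_durations_ms) / 1000, annotated_ms)

# Hypothetical child vocalization counts.
cry, can, non_can = 5, 6, 9
speech = can + non_can  # speech-like = can + noncan (ACLEW)

lp_n = proportion(speech, cry)   # speech / (cry + speech)
cp_n = proportion(can, non_can)  # canonical / (can + noncan)

print(voc_fem_ph, voc_dur_fem_ph, lp_n, cp_n)
```

With these made-up numbers, 3 vocalizations in half an hour give 6 vocalizations per hour, and 4.5 seconds of speech give 9 seconds per hour.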
LENA Metrics
$ child-project metrics /path/to/dataset output.csv lena --help
usage: child-project metrics path destination lena [-h] [--threads THREADS]
                                                   set

positional arguments:
  set                name of the LENA its annotations set

optional arguments:
  -h, --help         show this help message and exit
  --threads THREADS  amount of threads to run on
ACLEW Metrics
$ child-project metrics /path/to/dataset output.csv aclew --help
usage: child-project metrics path destination aclew [-h] [--vtc VTC]
                                                    [--alice ALICE]
                                                    [--vcm VCM]
                                                    [--threads THREADS]

optional arguments:
  -h, --help         show this help message and exit
  --vtc VTC          vtc set
  --alice ALICE      alice set
  --vcm VCM          vcm set
  --threads THREADS  amount of threads to run on
Period-aggregated metrics
The Period Metrics pipeline aggregates vocalizations for each time-of-day unit, based on a period specified by the user.
For instance, if the period is set to 15Min (i.e. 15 minutes), vocalization rates will be reported for each recording and time-unit (e.g. 09:00 to 09:15, 09:15 to 09:30, etc.).
The output dataframe has \(r \times p\) rows, where \(r\) is the number of recordings (or children if the --by option is set to child_id), and \(p\) is the number of time-bins per day (i.e. \(24 \times 4=96\) for a 15-minute period).
The output dataframe includes a period column that contains the onset of each time-unit in HH:MM:SS format.
The duration column contains the total amount of annotations covering each time-bin, in milliseconds.
If --by is set to e.g. child_id, then the values for each time-bin will be the average rates across all the recordings of every child.
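The layout described above can be sketched with the standard library alone. The snippet lists the onset of every time-bin of a day for a given period, counts the expected rows for a hypothetical number of recordings, and floors a clock time to the onset of its bin. The function names and the value of `n_recordings` are assumptions made for the example, and the sketch only handles periods that divide a day evenly.

```python
from datetime import datetime, timedelta


def period_onsets(period_minutes):
    """Return the onset (HH:MM:SS) of every time-bin in a 24-hour day."""
    if (24 * 60) % period_minutes != 0:
        raise ValueError("this sketch only supports periods that divide a day evenly")
    midnight = datetime(2000, 1, 1)  # arbitrary date, only the time of day matters
    n_bins = 24 * 60 // period_minutes
    return [
        (midnight + timedelta(minutes=i * period_minutes)).strftime("%H:%M:%S")
        for i in range(n_bins)
    ]


def bin_onset(clock_time, period_minutes):
    """Floor a clock time given as 'HH:MM' to the onset of its time-bin."""
    t = datetime.strptime(clock_time, "%H:%M")
    minutes = (t.hour * 60 + t.minute) // period_minutes * period_minutes
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


onsets = period_onsets(15)           # p = 96 bins for a 15-minute period
n_recordings = 3                     # hypothetical r
n_rows = n_recordings * len(onsets)  # r x p rows in the output dataframe

print(len(onsets), n_rows, bin_onset("09:20", 15))
```

A vocalization starting at 09:20 would therefore be counted in the bin whose onset is 09:15:00.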
$ child-project metrics /path/to/dataset output.csv period --help
usage: child-project metrics path destination period [-h] --set SET --period
                                                     PERIOD
                                                     [--period-origin PERIOD_ORIGIN]
                                                     [--threads THREADS]

optional arguments:
  -h, --help            show this help message and exit
  --set SET             annotations set
  --period PERIOD       time units to aggregate (optional); equivalent to
                        ``pandas.Grouper``'s freq argument.
  --period-origin PERIOD_ORIGIN
                        time origin of each time period; equivalent to
                        ``pandas.Grouper``'s origin argument.
  --threads THREADS     amount of threads to run on
Note
Average rates are expressed in seconds/hour regardless of the period.
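Since `--period` and `--period-origin` are described as equivalent to the `freq` and `origin` arguments of `pandas.Grouper`, the following sketch shows what these two arguments do in pandas itself. It does not reproduce the pipeline's internals: the dataframe, its column names and the timestamps are made up for illustration, and how the command forwards its options to pandas is an assumption.

```python
import pandas as pd

# Hypothetical vocalization onsets within a single recording.
df = pd.DataFrame(
    {
        "onset": pd.to_datetime(
            ["2021-01-01 09:02:00", "2021-01-01 09:10:00", "2021-01-01 09:20:00"]
        ),
        "duration_ms": [1200, 800, 1500],
    }
)

# freq="15min" cuts time into 15-minute bins; with the default origin
# ("start_day"), bins are aligned on midnight: 09:00, 09:15, ...
default_bins = df.groupby(pd.Grouper(key="onset", freq="15min")).size()

# A custom origin shifts the bin edges: here they fall at :05, :20, :35 and :50.
shifted_bins = df.groupby(
    pd.Grouper(key="onset", freq="15min", origin=pd.Timestamp("2021-01-01 00:05:00"))
).size()

print(default_bins)
print(shifted_bins)
```

With the default origin the first two vocalizations share the 09:00 bin, whereas with the shifted origin each vocalization lands in its own bin.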