Managing annotations
Warning
Never run two of the following commands in parallel. They must all be run sequentially, otherwise the index may get corrupted.
If you need to parallelize the processing to speed it up, use the --threads option, which is built into every tool that might require it.
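The sequential-execution rule above can be enforced from a small driver script. The following is a minimal sketch, not part of the package: the helper name `run_sequentially` and the `COMMANDS` list are hypothetical, the `--threads 4` value is arbitrary, and the command-line options are taken from the usage messages on this page.

```python
import subprocess

# Hypothetical list of steps; options are copied from the usage messages on this page.
COMMANDS = [
    ["child-project", "import-annotations", "/path/to/dataset",
     "--annotations", "/path/to/dataframe.csv", "--threads", "4"],
    ["child-project", "merge-annotations", "/path/to/dataset",
     "--left-set", "vtc", "--right-set", "alice/output",
     "--left-columns", "speaker_type",
     "--right-columns", "phonemes,syllables,words",
     "--output-set", "alice"],
]


def run_sequentially(commands, runner=subprocess.run):
    """Run each command only after the previous one has finished."""
    completed = []
    for command in commands:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later steps never start after a failure.
        runner(command, check=True)
        completed.append(command)
    return completed


# run_sequentially(COMMANDS)  # uncomment to actually run the commands
```

Parallelism, when needed, is left to each command's own --threads option rather than to the driver.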
Importation
Single annotation importation
Annotations can be imported one by one or in bulk. Annotation importation does the following:
Converts all input annotations from their original format (e.g. rttm, eaf, textgrid) into the CSV format defined at Annotation importation input format and stores them in annotations/.
Registers them in the annotation index at metadata/annotations.csv.
Use child-project import-annotations to import a single annotation.
$ child-project import-annotations /path/to/dataset --help
usage: child-project import-annotations [-h] [--annotations ANNOTATIONS]
[--set SET]
[--recording_filename RECORDING_FILENAME]
[--time_seek TIME_SEEK]
[--range_onset RANGE_ONSET]
[--range_offset RANGE_OFFSET]
[--raw_filename RAW_FILENAME]
[--format {csv,vtc_rttm,vcm_rttm,alice,its,TextGrid,eaf,cha,NA}]
[--filter FILTER] [--threads THREADS]
[--overwrite-existing]
source
convert and import a set of annotations
positional arguments:
source project path
optional arguments:
-h, --help show this help message and exit
--annotations ANNOTATIONS
path to input annotations dataframe (csv) [only for
bulk importation]
--set SET name of the annotation set (e.g. VTC, annotator1,
etc.)
--recording_filename RECORDING_FILENAME
recording filename as specified in the recordings
index
--time_seek TIME_SEEK
shift between the timestamps in the raw input
annotations and the actual corresponding timestamps in
the recordings (in milliseconds)
--range_onset RANGE_ONSET
covered range onset timestamp in milliseconds (since
the start of the recording)
--range_offset RANGE_OFFSET
covered range offset timestamp in milliseconds (since
the start of the recording)
--raw_filename RAW_FILENAME
annotation input filename location, relative to
`annotations/<set>/raw`
--format {csv,vtc_rttm,vcm_rttm,alice,its,TextGrid,eaf,cha,NA}
input annotation format
--filter FILTER source file to target. this field is dedicated to rttm
and ALICE annotations that may combine annotations
from several recordings into one same text file.
--threads THREADS amount of threads to run on
--overwrite-existing, --ow
overwrites existing annotation file if should generate
the same output file (useful when reimporting
Example:
child-project import-annotations /path/to/dataset \
--set eaf \
--recording_filename sound.wav \
--time_seek 0 \
--raw_filename example.eaf \
--range_onset 0 \
--range_offset 300 \
--format eaf
For more information about the allowed values for each parameter, see Annotation importation input format.
Bulk importation
Use the --annotations option to import many annotation files at once.
child-project import-annotations /path/to/dataset --annotations /path/to/dataframe.csv
The input dataframe /path/to/dataframe.csv must have one entry per annotation to import, according to the format specified at Annotation importation input format.
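As an illustration, such a dataframe could be generated with Python's standard csv module. This is a sketch under assumptions: the column names are assumed to mirror the options of the single-import command shown earlier, the second row (sound2.wav, example2.eaf) is hypothetical, and the authoritative column list remains the one given in Annotation importation input format.

```python
import csv
import io

# Assumed columns: they mirror the single-import options shown earlier.
FIELDS = ["set", "recording_filename", "time_seek", "range_onset",
          "range_offset", "raw_filename", "format"]


def build_import_table(rows):
    """Serialize one dict per annotation to import into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    # Same values as the single-import example above.
    {"set": "eaf", "recording_filename": "sound.wav", "time_seek": 0,
     "range_onset": 0, "range_offset": 300,
     "raw_filename": "example.eaf", "format": "eaf"},
    # Hypothetical second annotation, for illustration only.
    {"set": "eaf", "recording_filename": "sound2.wav", "time_seek": 0,
     "range_onset": 0, "range_offset": 300,
     "raw_filename": "example2.eaf", "format": "eaf"},
]

table = build_import_table(rows)
print(table)
```

Writing `table` to a file gives a dataframe that can then be passed to --annotations.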
Rename a set of annotations
Rename a set of annotations. This moves the annotations themselves and updates the index (metadata/annotations.csv) accordingly.
$ child-project rename-annotations /path/to/dataset --help
usage: child-project rename-annotations [-h] --set SET --new-set NEW_SET
[--recursive] [--ignore-errors]
source
rename a set of annotations by moving the files and updating the index
accordingly
positional arguments:
source project path
optional arguments:
-h, --help show this help message and exit
--set SET set to rename
--new-set NEW_SET new name for the set
--recursive enable recursive mode
--ignore-errors proceed despite errors
Example:
child-project rename-annotations /path/to/dataset --set vtc --new-set vtc_1
Remove a set of annotations
This deletes the converted annotations associated with a given set and removes their entries from the index.
$ child-project remove-annotations /path/to/dataset --help
usage: child-project remove-annotations [-h] --set SET [--recursive] source
remove converted annotations of a given set and their entries in the index
positional arguments:
source project path
optional arguments:
-h, --help show this help message and exit
--set SET set to remove
--recursive enable recursive mode
child-project remove-annotations /path/to/dataset --set vtc
ITS annotations anonymization
LENA .its files might contain information that can help recover the identity of the participants, which may be undesired. This command anonymizes .its files, based on a routine by HomeBank.
$ child-project anonymize /path/to/dataset --help
usage: child-project anonymize [-h] --input-set INPUT_SET --output-set
OUTPUT_SET
[--replacements-json-dict REPLACEMENTS_JSON_DICT]
path
Anonymize a set of its annotations (`input_set`) and saves it as `output_set`.
positional arguments:
path project path
optional arguments:
-h, --help show this help message and exit
--input-set INPUT_SET
input annotation set
--output-set OUTPUT_SET
output annotation set
--replacements-json-dict REPLACEMENTS_JSON_DICT
path to the replacements configuration (json dict)
child-project anonymize /path/to/dataset --input-set lena --output-set lena/anonymous
Merge annotation sets
Some processing tools use pre-existing annotations as input and label the original segments with more information. This is typically the case with ALICE, which labels segments generated by the VTC. In this case, one might want to merge the ALICE and VTC annotations together. This can be done with child-project merge-annotations.
$ child-project merge-annotations /path/to/dataset --help
usage: child-project merge-annotations [-h] --left-set LEFT_SET --right-set
RIGHT_SET --left-columns LEFT_COLUMNS
--right-columns RIGHT_COLUMNS
--output-set OUTPUT_SET
[--threads THREADS]
source
merge segments sharing identical onset and offset from two sets of annotations
positional arguments:
source project path
optional arguments:
-h, --help show this help message and exit
--left-set LEFT_SET left set
--right-set RIGHT_SET
right set
--left-columns LEFT_COLUMNS
comma-separated columns to merge from the left set
--right-columns RIGHT_COLUMNS
comma-separated columns to merge from the right set
--output-set OUTPUT_SET
name of the output set
--threads THREADS amount of threads to run on (default: 1)
child-project merge-annotations /path/to/dataset \
--left-set vtc \
--right-set alice/output \
--left-columns speaker_type \
--right-columns phonemes,syllables,words \
--output-set alice
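To make the merging rule concrete ("segments sharing identical onset and offset"), here is a minimal pure-Python sketch of the idea. It is not the tool's implementation: the function name `merge_segments` and the `segment_onset` / `segment_offset` column names are assumptions, the sample values are made up, and dropping segments that have no exact match is a choice of this sketch, not documented behavior.

```python
def merge_segments(left, right, left_columns, right_columns):
    """Join segments that share the exact same (onset, offset) pair."""
    # Index the right-hand segments by their time span for direct lookup.
    right_by_span = {
        (seg["segment_onset"], seg["segment_offset"]): seg for seg in right
    }
    merged = []
    for seg in left:
        span = (seg["segment_onset"], seg["segment_offset"])
        match = right_by_span.get(span)
        if match is None:
            continue  # assumption of this sketch: unmatched segments are dropped
        row = {"segment_onset": span[0], "segment_offset": span[1]}
        row.update({column: seg[column] for column in left_columns})
        row.update({column: match[column] for column in right_columns})
        merged.append(row)
    return merged


# Made-up sample data (timestamps in milliseconds).
vtc = [
    {"segment_onset": 0, "segment_offset": 1200, "speaker_type": "FEM"},
    {"segment_onset": 2000, "segment_offset": 2500, "speaker_type": "CHI"},
]
alice = [
    {"segment_onset": 0, "segment_offset": 1200,
     "phonemes": 12, "syllables": 5, "words": 3},
]

merged = merge_segments(vtc, alice,
                        ["speaker_type"],
                        ["phonemes", "syllables", "words"])
print(merged)
```

Only the first segment appears in the result, because it is the only one whose onset and offset are identical in both sets.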
Intersect annotations
In order to combine annotations from different annotators, or to compare them, it is necessary to calculate which portions of the audio have been annotated by all of them. This can be done from the command-line interface:
$ child-project intersect-annotations /path/to/dataset --help
usage: child-project intersect-annotations [-h] --destination DESTINATION
--sets SETS [SETS ...]
[--annotations ANNOTATIONS]
source
calculate the intersection of the annotations belonging to the given sets
positional arguments:
source project path
optional arguments:
-h, --help show this help message and exit
--destination DESTINATION
output CSV dataframe destination
--sets SETS [SETS ...]
annotation sets to intersect
--annotations ANNOTATIONS
path a custom input CSV dataframe of annotations to
intersect. By default, the whole index of the project
will be used.
Example:
child-project intersect-annotations /path/to/dataset \
--sets its textgrid/annotator1 textgrid/annotator2 textgrid/annotator3 \
--destination intersection.csv
The output dataframe has the same format as the annotations index (see Annotations index).
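The underlying idea, finding the portions of a recording covered by every set, can be sketched in a few lines of Python. This illustrates the concept only and is not the tool's implementation: the function name `intersect_ranges` and the sample ranges are hypothetical, ranges are (onset, offset) pairs in milliseconds for a single recording, and the ranges within one set are assumed not to overlap each other.

```python
def intersect_ranges(ranges_by_set):
    """Return the (onset, offset) portions covered by every set."""
    all_sets = list(ranges_by_set.values())
    if not all_sets:
        return []
    common = sorted(all_sets[0])
    for ranges in all_sets[1:]:
        narrowed = []
        for a_onset, a_offset in common:
            for b_onset, b_offset in ranges:
                # The overlap of two ranges starts at the later onset
                # and ends at the earlier offset.
                onset = max(a_onset, b_onset)
                offset = min(a_offset, b_offset)
                if onset < offset:
                    narrowed.append((onset, offset))
        common = sorted(narrowed)
    return common


# Hypothetical covered ranges for one recording.
covered = {
    "its": [(0, 600000)],
    "textgrid/annotator1": [(0, 300000), (400000, 500000)],
    "textgrid/annotator2": [(100000, 450000)],
}

print(intersect_ranges(covered))
# [(100000, 300000), (400000, 450000)]
```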