Audio processors

Overview

The package provides several tools for processing the recordings.

$ child-project process --help
usage: child-project process [-h] [--threads THREADS]
                             [--input-profile INPUT_PROFILE]
                             path {basic,vetting,channel-mapping,standard} ...

positional arguments:
  path                  path to the dataset
  {basic,vetting,channel-mapping,standard}
                        processor
    basic               basic audio conversion
    vetting             vetting
    channel-mapping     channel mapping
    standard            standard audio conversion

optional arguments:
  -h, --help            show this help message and exit
  --threads THREADS     amount of threads running conversions in parallel (0 =
                        uses all available cores)
  --input-profile INPUT_PROFILE
                        profile of input recordings (process raw recordings by
                        default)

Basic audio conversion

Converts all recordings in a dataset to a given encoding. Converted audio files are stored in recordings/converted/<profile-name>.

$ child-project process /path/to/dataset basic test --help
usage: child-project process path basic [-h] --format FORMAT --codec CODEC
                                        --sampling SAMPLING [--skip-existing]
                                        [--recordings RECORDINGS [RECORDINGS ...]]
                                        name

positional arguments:
  name                  name of the export profile

optional arguments:
  -h, --help            show this help message and exit
  --format FORMAT       audio format (e.g. wav)
  --codec CODEC         audio codec (e.g. pcm_s16le)
  --sampling SAMPLING   sampling frequency (e.g. 16000)
  --skip-existing
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by
                        whitespaces; only values of 'recording_filename'
                        present in the metadata are supported.

Example:

child-project process /path/to/dataset basic standard --format=wav --sampling=16000 --codec=pcm_s16le

Processing can be restricted to a whitelist of recordings with the --recordings option:

child-project process /path/to/dataset basic standard --format=wav --sampling=16000 --codec=pcm_s16le --recordings audio1.wav audio2.wav

Values provided to this option should be existing recording_filename values in metadata/recordings.csv.
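Before launching a long conversion, it can help to check the whitelist against the metadata. The sketch below is only an illustration and is not part of the package: the helper name missing_recordings is hypothetical, and it assumes that metadata/recordings.csv is a plain comma-separated file with a recording_filename column.

```python
import csv
from pathlib import Path


def missing_recordings(dataset_path, requested):
    """Return the requested filenames that are absent from the metadata.

    Hypothetical helper (not a ChildProject function). Assumes that
    <dataset_path>/metadata/recordings.csv is a comma-separated file
    with a 'recording_filename' column.
    """
    metadata = Path(dataset_path) / "metadata" / "recordings.csv"
    with open(metadata, newline="") as f:
        known = {row["recording_filename"] for row in csv.DictReader(f)}
    # Preserve the order in which the names were requested.
    return [name for name in requested if name not in known]
```

Calling missing_recordings("/path/to/dataset", ["audio1.wav", "audio2.wav"]) returns the names that would not be accepted by --recordings; an empty list means every requested file is listed in the metadata.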

The --skip-existing switch can be used to skip previously processed files.

Standard audio conversion

Same as the basic processor, but using standard parameters for the conversion:

- single-channel (first channel is kept)
- 16 kHz sampling rate
- codec pcm_s16le
- wav format

Audio files are exported to recordings/converted/standard.

$ child-project process /path/to/dataset standard --help
usage: child-project process path standard [-h] [--skip-existing]
                                           [--recordings RECORDINGS [RECORDINGS ...]]

optional arguments:
  -h, --help            show this help message and exit
  --skip-existing
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by
                        whitespaces; only values of 'recording_filename'
                        present in the metadata are supported.

Example:

child-project process . standard

Values provided to the option --recordings should be existing recording_filename values in metadata/recordings.csv.

The --skip-existing switch can be used to skip previously processed files.

Multi-core audio conversion with slurm on a cluster

If you have access to a cluster with slurm, you can use a command like the one below to batch-convert your recordings. Note that you may need to adjust some details for your cluster (e.g. the number of CPUs per task). If needed, refer to the slurm user guide.

sbatch --mem=64G --time=5:00:00 --cpus-per-task=4 --ntasks=1 -o namibia.txt child-project process --threads 4 /path/to/dataset basic standard --format=wav --sampling=16000 --codec=pcm_s16le

Vetting

The vetting pipeline mutes segments of the recordings provided by the user while preserving the duration of the audio files. This technique can be used to remove speech that might contain confidential information before releasing the audio.
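To illustrate what muting while preserving the duration means, here is a minimal sketch. It is not the package's implementation: it assumes a single channel stored as a plain list of samples, and the function name mute_segments is hypothetical.

```python
def mute_segments(samples, sample_rate, segments):
    """Return a copy of `samples` with the given segments set to silence.

    `segments` is an iterable of (onset_ms, offset_ms) pairs, in
    milliseconds. The output has the same length as the input, so the
    duration of the audio is unchanged.
    """
    muted = list(samples)
    for onset_ms, offset_ms in segments:
        # Convert millisecond timestamps to sample indices, clamped
        # to the bounds of the recording.
        start = max(0, onset_ms * sample_rate // 1000)
        stop = min(len(muted), offset_ms * sample_rate // 1000)
        for i in range(start, stop):
            muted[i] = 0
    return muted
```

Because samples are overwritten rather than removed, timestamps in existing annotations remain valid for the vetted audio.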

The input needs to be a CSV dataframe with the following columns: recording_filename, segment_onset, segment_offset. The timestamps need to be expressed in milliseconds.
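A segments file of this shape could be produced as in the sketch below. The helper name write_segments is hypothetical, and the sketch assumes that the three columns are recording_filename, segment_onset and segment_offset, with onset and offset given as integer milliseconds.

```python
import csv


def write_segments(path, segments):
    """Write (recording_filename, onset_ms, offset_ms) tuples to a CSV file.

    Hypothetical helper: the column names follow the assumption stated
    above, and timestamps are written as integer milliseconds.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["recording_filename", "segment_onset", "segment_offset"])
        for filename, onset_ms, offset_ms in segments:
            if offset_ms <= onset_ms:
                raise ValueError(f"empty or reversed segment in {filename}")
            writer.writerow([filename, int(onset_ms), int(offset_ms)])
```

For example, write_segments("segments.csv", [("audio1.wav", 1000, 2500)]) creates a file that can then be passed to the vetting processor through --segments-path.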

$ child-project process /path/to/dataset vetting test --help
usage: child-project process path vetting [-h] --segments-path SEGMENTS_PATH
                                          [--recordings RECORDINGS [RECORDINGS ...]]
                                          name

positional arguments:
  name                  name of the export profile

optional arguments:
  -h, --help            show this help message and exit
  --segments-path SEGMENTS_PATH
                        path to the CSV dataframe containing the segments to
                        be vetted
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by commas;
                        only values of 'recording_filename' present in the
                        metadata are supported.

Channel mapping

The channel mapping pipeline is meant to be used with multi-channel audio recordings, such as those produced by the BabyLogger. It lets you filter or combine channels from the original recordings as needed.

$ child-project process /path/to/dataset channel-mapping test --help
usage: child-project process path channel-mapping [-h] --channels CHANNELS
                                                  [CHANNELS ...]
                                                  [--recordings RECORDINGS [RECORDINGS ...]]
                                                  name

positional arguments:
  name                  name of the export profile

optional arguments:
  -h, --help            show this help message and exit
  --channels CHANNELS [CHANNELS ...]
                        lists of weigths for each channel
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by commas;
                        only values of 'recording_filename' present in the
                        metadata are supported.

In mathematical terms, assume the input recordings have \(n\) channels with signals \(s_{j}(t)\). If the output recordings should have \(m\) channels, the user defines a matrix of weights \(w_{ij}\) with \(m\) rows and \(n\) columns, such that the signal of each output channel \(s'_{i}(t)\) is:

\[s'_{i}(t) = \sum_{j=1}^n w_{ij} s_{j}(t)\]
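The weighted sum can be written out in a few lines of plain Python. This is a sketch of the formula only, not the package's code: the function name mix_channels is hypothetical, and signals are assumed to be lists of samples of equal length.

```python
def mix_channels(weights, signals):
    """Apply s'_i(t) = sum_j w_ij * s_j(t).

    weights: m rows, each holding n weights (one per input channel).
    signals: n input channels, each a list of samples of equal length.
    Returns the m output channels.
    """
    n = len(signals)
    if any(len(row) != n for row in weights):
        raise ValueError("each row of weights needs one weight per input channel")
    length = len(signals[0]) if signals else 0
    return [
        [sum(w * s[t] for w, s in zip(row, signals)) for t in range(length)]
        for row in weights
    ]
```

Each row of weights produces one output channel, so a matrix with a single row reduces any multi-channel input to mono.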

The weights matrix is defined through the --channels parameter.

The weights for each output channel are separated by blanks. For a given output channel, the weights of the input channels should be separated by commas.

For instance, if one would like to use the following weight matrix (which transforms 4-channel recordings into 2-channel audio):

\[\begin{split}\begin{pmatrix} 0 & 0 & 1 & 1 \\ 0.5 & 0.5 & 0 & 0 \end{pmatrix}\end{split}\]

Then the correct value for the --channels parameter is:

--channels 0,0,1,1 0.5,0.5,0,0
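The correspondence between these values and the matrix can be checked with a short sketch. It is not the parser used by the package; the function name parse_channels is hypothetical, and it takes the blank-separated values as a list of strings.

```python
def parse_channels(values):
    """Turn ['0,0,1,1', '0.5,0.5,0,0'] into a matrix (list of rows of floats).

    Each string holds the comma-separated weights of one output channel.
    """
    matrix = [[float(w) for w in value.split(",")] for value in values]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("all output channels need the same number of weights")
    return matrix
```

With the values above, parse_channels(["0,0,1,1", "0.5,0.5,0,0"]) yields two rows of four weights, that is, one row per output channel and one column per input channel.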

To make things clear, we provide a couple of examples below.

Muting all channels except for the first

Let’s assume that the original recordings have 4 channels. The following command will extract the first channel from the recordings:

child-project process /path/to/dataset channel-mapping channel1 --channels 1,0,0,0

Swapping the channels of a stereo signal

Let’s assume that the original recordings are stereo signals, i.e. they have two channels. The command below will flip the two channels:

child-project process /path/to/dataset channel-mapping channel1 --channels 0,1 1,0