Audio processors

Overview

The package provides several tools for processing the recordings.

$ child-project process --help
usage: child-project process [-h] [--threads THREADS]
                             [--input-profile INPUT_PROFILE]
                             path name {basic,vetting,channel-mapping} ...

positional arguments:
  path                  path to the dataset
  name                  name of the export profile
  {basic,vetting,channel-mapping}
                        processor
    basic               basic audio conversion
    vetting             vetting
    channel-mapping     channel mapping

optional arguments:
  -h, --help            show this help message and exit
  --threads THREADS     amount of threads running conversions in parallel (0 =
                        uses all available cores)
  --input-profile INPUT_PROFILE
                        profile of input recordings (process raw recordings by
                        default)

Basic audio conversion

Converts all recordings in a dataset to a given encoding. Converted audio files are stored in recordings/converted/<profile-name>.

$ child-project process /path/to/dataset test basic --help
usage: child-project process path name basic [-h] --format FORMAT --codec
                                             CODEC --sampling SAMPLING
                                             [--split SPLIT] [--skip-existing]
                                             [--recordings RECORDINGS [RECORDINGS ...]]

optional arguments:
  -h, --help            show this help message and exit
  --format FORMAT       audio format (e.g. wav)
  --codec CODEC         audio codec (e.g. pcm_s16le)
  --sampling SAMPLING   sampling frequency (e.g. 16000)
  --split SPLIT         split duration (e.g. 15:00:00)
  --skip-existing
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by commas;
                        only values of 'recording_filename' present in the
                        metadata are supported.

Example:

child-project process /path/to/dataset 16kHz basic --format=wav --sampling=16000 --codec=pcm_s16le

We typically run the following command, which splits long sound files into chunks of at most 15 hours, because the software we use for human annotation (ELAN, Praat) works better with audio files that are no longer than 15 hours:

child-project process /path/to/dataset 16kHz basic --split=15:00:00 --format=wav --sampling=16000 --codec=pcm_s16le
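
To illustrate what a split duration of 15:00:00 implies, the sketch below computes chunk boundaries for a recording of a given length. It only illustrates the arithmetic: the function names parse_duration and chunk_boundaries are hypothetical and not part of the package, and the package's actual splitting may differ in its details (for instance, in how the resulting files are named).

```python
def parse_duration(text):
    """Convert an HH:MM:SS string such as '15:00:00' into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def chunk_boundaries(total_seconds, split):
    """Return (start, end) pairs, in seconds, covering the whole recording."""
    step = parse_duration(split)
    return [
        (start, min(start + step, total_seconds))
        for start in range(0, total_seconds, step)
    ]


# A hypothetical 40-hour recording is cut into chunks of 15 h, 15 h and 10 h.
for start, end in chunk_boundaries(40 * 3600, "15:00:00"):
    print(start / 3600, end / 3600)
```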

Processing can be restricted to a whitelist of recordings using the --recordings option:

child-project process /path/to/dataset 16kHz basic --format=wav --sampling=16000 --codec=pcm_s16le --recordings audio1.wav audio2.wav

Values provided to this option should be existing recording_filename values in metadata/recordings.csv.

The --skip-existing switch can be used to skip previously processed files.

Multi-core audio conversion with slurm on a cluster

If you have access to a cluster with slurm, you can use a command like the one below to batch-convert your recordings. Note that you may need to adjust some details for your cluster (e.g. the number of CPUs per task). If needed, refer to the slurm user guide.

sbatch --mem=64G --time=5:00:00 --cpus-per-task=4 --ntasks=1 -o namibia.txt child-project process --threads 4 /path/to/dataset 16kHz basic --split=15:00:00 --format=wav --sampling=16000 --codec=pcm_s16le

Vetting

The vetting pipeline mutes segments of the recordings provided by the user while preserving the duration of the audio files. This technique can be used to remove speech that might contain confidential information before releasing the audio.
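
The idea of muting while preserving duration can be sketched on a plain list of samples. This is a toy illustration, not the package's implementation: the function mute_segments, the 8 Hz sampling rate and the sample values are all assumptions made for the example.

```python
def mute_segments(samples, sample_rate, segments):
    """Zero out the given (onset_ms, offset_ms) spans; the length is unchanged."""
    muted = list(samples)
    for onset_ms, offset_ms in segments:
        # Convert millisecond timestamps into sample indices.
        start = max(onset_ms * sample_rate // 1000, 0)
        end = min(offset_ms * sample_rate // 1000, len(muted))
        for i in range(start, end):
            muted[i] = 0
    return muted


# Toy signal: 2 seconds of constant samples at a (hypothetical) rate of 8 Hz.
signal = [1] * 16
vetted = mute_segments(signal, sample_rate=8, segments=[(500, 1000)])
print(vetted)
print(len(vetted) == len(signal))
```

Only the samples between 500 ms and 1000 ms are set to zero, so the vetted signal stays aligned with any existing annotations.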

The input needs to be a CSV dataframe with the following columns: recording_filename, segment_onset, segment_offset. The timestamps need to be expressed in milliseconds.

$ child-project process /path/to/dataset test vetting --help
usage: child-project process path name vetting [-h] --segments-path
                                               SEGMENTS_PATH
                                               [--recordings RECORDINGS [RECORDINGS ...]]

optional arguments:
  -h, --help            show this help message and exit
  --segments-path SEGMENTS_PATH
                        path to the CSV dataframe containing the segments to
                        be vetted
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by commas;
                        only values of 'recording_filename' present in the
                        metadata are supported.
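
The file expected by --segments-path can be written with any CSV tool. Below is a minimal sketch using Python's standard library; the recording names, the timestamps and the output file name segments.csv are hypothetical, and the columns are assumed to be recording_filename, segment_onset and segment_offset, with timestamps in milliseconds.

```python
import csv

# Hypothetical segments to mute: (recording_filename, onset in ms, offset in ms).
segments = [
    ("audio1.wav", 60_000, 75_500),
    ("audio1.wav", 3_600_000, 3_605_000),
    ("audio2.wav", 0, 12_000),
]

with open("segments.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["recording_filename", "segment_onset", "segment_offset"])
    writer.writerows(segments)
```

The resulting file could then be passed to the vetting pipeline (the profile name vetted is only an example):

child-project process /path/to/dataset vetted vetting --segments-path segments.csv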

Channel mapping

The channel mapping pipeline is meant to be used with multi-channel audio recordings, such as those produced by the BabyLogger. It lets you filter or combine channels from the original recordings as needed.

$ child-project process /path/to/dataset test channel-mapping --help
usage: child-project process path name channel-mapping [-h] --channels
                                                       CHANNELS [CHANNELS ...]
                                                       [--recordings RECORDINGS [RECORDINGS ...]]

optional arguments:
  -h, --help            show this help message and exit
  --channels CHANNELS [CHANNELS ...]
                        lists of weigths for each channel
  --recordings RECORDINGS [RECORDINGS ...]
                        list of recordings to process, separated by commas;
                        only values of 'recording_filename' present in the
                        metadata are supported.

In mathematical terms, assume the input recordings have \(n\) channels with signals \(s_{j}(t)\). If the output recordings should have \(m\) channels, the user defines a matrix of weights \(w_{ij}\) with \(m\) rows and \(n\) columns, such that the signal of each output channel \(s'_{i}(t)\) is:

\[s'_{i}(t) = \sum_{j=1}^n w_{ij} s_{j}(t)\]
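
The weighted sum above can be sketched in a few lines of plain Python, applied to one frame (one sample per input channel). The function map_channels and the sample values are hypothetical and only illustrate the formula; they do not reflect how the package performs the conversion internally.

```python
def map_channels(weights, frame):
    """Apply s'_i = sum_j w_ij * s_j to one frame of n input samples."""
    for row in weights:
        if len(row) != len(frame):
            raise ValueError("each row needs one weight per input channel")
    return [sum(w * s for w, s in zip(row, frame)) for row in weights]


# Hypothetical example: average a stereo frame (n = 2) into mono (m = 1).
weights = [[0.5, 0.5]]
frame = [2, 4]
print(map_channels(weights, frame))
```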

The weights matrix is defined through the --channels parameter.

The weights for each output channel are separated by blanks. For a given output channel, the weights of each input channel should be separated by commas.

For instance, if one would like to use the following weight matrix (which transforms 4-channel recordings into 2-channel audio):

\[\begin{split}\begin{pmatrix} 0 & 0 & 1 & 1 \\ 0.5 & 0.5 & 0 & 0 \end{pmatrix}\end{split}\]

Then the correct value for the --channels parameter is:

--channels 0,0,1,1 0.5,0.5,0,0
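
For larger matrices, the argument string can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the package: it joins the weights of each row with commas and separates rows with blanks, following the convention described above.

```python
def channels_argument(weights):
    """Format a weight matrix (list of rows) as values for --channels."""
    return " ".join(",".join(str(w) for w in row) for row in weights)


# The 2 x 4 matrix from the example above: 4 input channels, 2 output channels.
weights = [[0, 0, 1, 1], [0.5, 0.5, 0, 0]]
print("--channels", channels_argument(weights))
```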

To make things clear, we provide a couple of examples below.

Muting all channels except for the first

Let’s assume that the original recordings have 4 channels. The following command will extract the first channel from the recordings:

child-project process /path/to/dataset channel1 channel-mapping --channels 1,0,0,0

Invert a stereo signal

Let’s assume that the original recordings are stereo signals, i.e. they have two channels. The command below will flip the two channels:

child-project process /path/to/dataset channel1 channel-mapping --channels 0,1 1,0