Zooniverse
Introduction
We provide here a pipeline to extract audio clips from long-form recordings, upload them to the Zooniverse citizen science platform, and analyse the resulting classifications.
We have an open Zooniverse project aimed at adding vocal maturity labels to segments that LENA labeled as key child (https://www.zooniverse.org/projects/chiarasemenzin/maturity-of-baby-sounds).
If you would like your data labeled by this project, here is what you need to do:
1. Get in touch with us so we know you are interested.
2. Have someone trustworthy with some coding skills (henceforth, the RA) create a dataset following the formatting instructions (see Datasets structure).
3. Have the RA create Zooniverse accounts (top right of zooniverse.org) for themselves and for you, and send us both usernames. The RA should first update the team section to add you (have a picture and a short bio ready). The RA can also add your institution's logo if you like. Both of these are done in the lab section.
4. The RA then follows the instructions in this README to create subjects and upload your data (see below).
5. We also ask the RA to help answer questions in the project forum, with at least one comment a day.
6. You can visit the stats section to see how many annotations have been made.
You can also use this code to set up a new Zooniverse project of your own.
Note
This is the documentation for the Zooniverse pipeline of the package. We also provide a separate, more detailed step-by-step tutorial for creating a classification campaign with Zooniverse and ChildProject.
Overview
$ child-project zooniverse --help
usage: child-project zooniverse [-h]
                                {extract-chunks,upload-chunks,link-orphan-subjects,reset-orphan-subjects,retrieve-classifications}
                                ...

positional arguments:
  {extract-chunks,upload-chunks,link-orphan-subjects,reset-orphan-subjects,retrieve-classifications}
                        action
    extract-chunks      extract chunks to <destination>, and exports the
                        metadata inside of this directory
    upload-chunks       upload chunks and updates chunk state
    link-orphan-subjects
                        upload chunks and updates chunk state
    reset-orphan-subjects
                        upload chunks and updates chunk state
    retrieve-classifications
                        retrieve classifications and save them as
                        <destination>

optional arguments:
  -h, --help            show this help message and exit
Chunk extraction
The extract-chunks pipeline creates wav and mp3 files for each chunk of audio to be classified on Zooniverse. It also saves a record of all these chunks into a CSV dataframe. This record can then be provided to the upload-chunks command in order to upload the chunks to Zooniverse.
Note
extract-chunks requires the list of segments to classify, provided as a CSV dataframe with three columns: recording_filename, segment_onset, and segment_offset (timestamps in milliseconds). The path to this dataframe must be specified with the --segments parameter.
The list of segments can be generated with any of the samplers we provide (see Samplers), but custom lists may also be provided.
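If you build a custom list rather than using a sampler, a minimal sketch of writing the segments dataframe with only the standard library could look like the following. The recording names and timestamps are hypothetical, and the timestamps are assumed to be in milliseconds from the start of each recording.

```python
import csv
import io

# Hypothetical example segments: (recording_filename, segment_onset, segment_offset).
# Timestamps are assumed to be in milliseconds from the start of each recording.
SEGMENTS = [
    ("rec_001.wav", 60_000, 62_500),
    ("rec_001.wav", 120_000, 121_200),
    ("rec_002.wav", 5_000, 9_000),
]

COLUMNS = ["recording_filename", "segment_onset", "segment_offset"]


def segments_csv(rows):
    """Return CSV text with the three columns that --segments expects."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for recording, onset, offset in rows:
        # Reject empty or inverted segments before they reach the pipeline.
        if offset <= onset:
            raise ValueError(f"{recording}: offset {offset} is not after onset {onset}")
        writer.writerow([recording, int(onset), int(offset)])
    return buffer.getvalue()
```

Write the returned text to a file (for instance with `open("segments.csv", "w", newline="")`) and pass that path to --segments.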
Optionally, the segments provided to the pipeline can be split into chunks of the desired duration (--chunks-length). By setting this duration to sufficiently low values (e.g. 500 milliseconds), one can ensure that no meaningful information can be recovered while listening to the audio on Zooniverse. This is useful when the segments of audio provided to the pipeline may contain confidential information.
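To make the effect of --chunks-length and --chunks-min-amount concrete, here is a hypothetical helper that splits one segment into consecutive chunks. It is a sketch, not the package's own implementation: in particular, widening short segments symmetrically (clamped at 0) and truncating the last chunk are assumptions, and the package may handle both cases differently.

```python
def split_segment(onset, offset, chunk_length, min_amount=1):
    """Split [onset, offset) (milliseconds) into consecutive chunks.

    Hypothetical helper illustrating --chunks-length / --chunks-min-amount;
    not the package's own implementation.
    """
    if chunk_length <= 0:
        # As documented: a non-positive length means "do not split".
        return [(onset, offset)]
    required = chunk_length * min_amount
    if offset - onset < required:
        # Assumption: widen short segments symmetrically (clamped at 0)
        # so that at least min_amount full chunks fit.
        onset = max(0, onset - (required - (offset - onset)) // 2)
        offset = onset + required
    # The last chunk is truncated at offset when the duration is not a
    # multiple of chunk_length.
    return [
        (start, min(start + chunk_length, offset))
        for start in range(onset, offset, chunk_length)
    ]
```

For example, a 1200 ms segment starting at 0 with 500 ms chunks gives `[(0, 500), (500, 1000), (1000, 1200)]`.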
$ child-project zooniverse extract-chunks /path/to/dataset --help
usage: child-project zooniverse extract-chunks [-h] --keyword KEYWORD
                                               [--chunks-length CHUNKS_LENGTH]
                                               [--chunks-min-amount CHUNKS_MIN_AMOUNT]
                                               [--spectrogram] --segments
                                               SEGMENTS --destination
                                               DESTINATION [--profile PROFILE]
                                               [--threads THREADS]
                                               path

positional arguments:
  path                  path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  --keyword KEYWORD     export keyword
  --chunks-length CHUNKS_LENGTH
                        chunk length (in milliseconds). if <= 0, the segments
                        will not be split into chunks (default value: 0)
  --chunks-min-amount CHUNKS_MIN_AMOUNT
                        minimum amount of chunks to extract from a segment
                        (default value: 1)
  --spectrogram         the extraction generates a png spectrogram (default
                        False)
  --segments SEGMENTS   path to the input segments dataframe
  --destination DESTINATION
                        destination
  --profile PROFILE     Recording profile to extract the audio clips from. If
                        not specified, raw recordings will be used
  --threads THREADS     how many threads to run on
If it does not exist, DESTINATION is created. Audio chunks are saved in wav and mp3 format in DESTINATION/chunks. Metadata is stored in a CSV file in DESTINATION/.
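When extract-chunks is called from a script, it can help to build the argument list programmatically. The sketch below uses only the flags listed in the help output; the function name, dataset path, keyword and destination are hypothetical, and optional flags are added only when a value is given so that the CLI defaults apply otherwise.

```python
import shlex


def extract_chunks_command(dataset, keyword, segments, destination,
                           chunks_length=None, chunks_min_amount=None,
                           spectrogram=False, profile=None, threads=None):
    """Build the argument list for `child-project zooniverse extract-chunks`."""
    cmd = [
        "child-project", "zooniverse", "extract-chunks", dataset,
        "--keyword", keyword,
        "--segments", segments,
        "--destination", destination,
    ]
    # Optional flags are only passed when set, leaving CLI defaults untouched.
    if chunks_length is not None:
        cmd += ["--chunks-length", str(chunks_length)]
    if chunks_min_amount is not None:
        cmd += ["--chunks-min-amount", str(chunks_min_amount)]
    if spectrogram:
        cmd.append("--spectrogram")
    if profile is not None:
        cmd += ["--profile", profile]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


if __name__ == "__main__":
    cmd = extract_chunks_command("/path/to/dataset", "maturity", "segments.csv",
                                 "zooniverse_chunks", chunks_length=500)
    # Print the equivalent shell command; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing a list (rather than a single string) to `subprocess.run` avoids shell quoting issues with paths that contain spaces.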
The output dataframe contains the following columns:
column | description
index | self-generated integer index
recording_filename | recording from which the chunk was extracted
onset | onset timestamp of the chunk within the recording (in milliseconds)
offset | offset timestamp of the chunk within the recording (in milliseconds)
segment_onset | onset timestamp of the segment from which the chunk was extracted (in milliseconds)
segment_offset | offset timestamp of the segment from which the chunk was extracted (in milliseconds)
wav | name of the wav file
mp3 | name of the mp3 file
png | name of the png spectrogram file (only if --spectrogram is set)
date_extracted | date at which the chunk was extracted
uploaded | boolean flag set to True if the chunk was uploaded to Zooniverse, False otherwise
project_id | Zooniverse project ID
subject_set | name of the Zooniverse subject set
zooniverse_id | subject's Zooniverse ID
keyword | custom keyword provided by the user to label the chunks
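A minimal sketch of reading this metadata back to check progress is shown below. It relies only on the onset, offset and uploaded columns listed above; it assumes timestamps are in milliseconds and that booleans are serialized as the strings "True"/"False" (the pandas default), and the function names are hypothetical.

```python
import csv
import io


def load_chunks(csv_text):
    """Parse the chunk metadata CSV, converting a few columns to Python types."""
    chunks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Timestamps may be written as "500" or "500.0" depending on the writer.
        row["onset"] = int(float(row["onset"]))
        row["offset"] = int(float(row["offset"]))
        # Assumption: booleans are stored as the strings "True"/"False".
        row["uploaded"] = row["uploaded"].strip().lower() == "true"
        chunks.append(row)
    return chunks


def summarize(chunks):
    """Count uploaded and pending chunks and their total duration in seconds."""
    uploaded = sum(1 for c in chunks if c["uploaded"])
    total_seconds = sum(c["offset"] - c["onset"] for c in chunks) / 1000
    return {
        "uploaded": uploaded,
        "pending": len(chunks) - uploaded,
        "total_seconds": total_seconds,
    }
```

In practice you would pass the contents of the metadata file, e.g. `load_chunks(open(path).read())`, where `path` points to the CSV written by extract-chunks.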
Chunk upload
Once the chunks have been extracted, the next step is to upload them to Zooniverse. Because of Zooniverse quotas, we recommend uploading only a limited number at a time (e.g. 1000 per day).
You will need to provide the numerical ID of your Zooniverse project, as well as your Zooniverse credentials.
child-project zooniverse upload-chunks uploads the number of audio chunks specified with --amount to Zooniverse, and updates the chunk metadata accordingly by setting the zooniverse_id field and setting uploaded to True.
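Given the suggested quota of about 1000 chunks per day, a small planning helper can tell you how many daily runs are needed and which --amount to pass each day. This is a hypothetical sketch; it assumes each run only picks up chunks not yet marked as uploaded.

```python
def plan_uploads(pending, per_day=1000):
    """Split `pending` chunks into daily batches of at most `per_day` chunks.

    Each value can be passed as --amount to one day's upload-chunks run,
    assuming each run only picks up chunks not yet marked as uploaded.
    """
    if pending < 0 or per_day <= 0:
        raise ValueError("pending must be >= 0 and per_day must be > 0")
    full_days, remainder = divmod(pending, per_day)
    return [per_day] * full_days + ([remainder] if remainder else [])
```

For instance, 2500 pending chunks at 1000 per day give three runs: `[1000, 1000, 500]`.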
$ child-project zooniverse upload-chunks /path/to/dataset --help
usage: child-project zooniverse upload-chunks [-h] --chunks CHUNKS
                                              --project-id PROJECT_ID
                                              --set-name SET_NAME
                                              [--amount AMOUNT]
                                              [--zooniverse-login ZOONIVERSE_LOGIN]
                                              [--zooniverse-pwd ZOONIVERSE_PWD]
                                              [--ignore-errors]
                                              [--record-orphan]

optional arguments:
  -h, --help            show this help message and exit
  --chunks CHUNKS       path to the chunk CSV dataframe
  --project-id PROJECT_ID
                        zooniverse project id
  --set-name SET_NAME   subject set display name
  --amount AMOUNT       amount of chunks to upload
  --zooniverse-login ZOONIVERSE_LOGIN
                        zooniverse login. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_LOGIN instead
  --zooniverse-pwd ZOONIVERSE_PWD
                        zooniverse password. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_PWD instead
  --ignore-errors       keep uploading even when a subject fails to upload for
                        some reason
  --record-orphan       list correctly create subjects as uploaded even if
                        linking to a subject set failed
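Since the help output says the credentials can be read from ZOONIVERSE_LOGIN and ZOONIVERSE_PWD, a script can pass them through the environment instead of the command line, which keeps the password out of shell history and process listings. The sketch below follows the usage line above, which lists no positional dataset path; adjust it if your version of the CLI differs. The function names, project ID and subject set name are hypothetical.

```python
import os


def upload_chunks_command(chunks_csv, project_id, set_name, amount=None,
                          ignore_errors=False, record_orphan=False):
    """Build the argument list for `child-project zooniverse upload-chunks`."""
    cmd = [
        "child-project", "zooniverse", "upload-chunks",
        "--chunks", chunks_csv,
        "--project-id", str(project_id),
        "--set-name", set_name,
    ]
    if amount is not None:
        cmd += ["--amount", str(amount)]
    if ignore_errors:
        cmd.append("--ignore-errors")
    if record_orphan:
        cmd.append("--record-orphan")
    return cmd


def credentials_env(login, password, base=None):
    """Return a copy of the environment with the Zooniverse credentials set.

    The original environment (or `base`) is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["ZOONIVERSE_LOGIN"] = login
    env["ZOONIVERSE_PWD"] = password
    return env
```

A typical call would be `subprocess.run(upload_chunks_command("chunks.csv", 12345, "batch-1", amount=1000), env=credentials_env(login, getpass.getpass()), check=True)`, with the password prompted for rather than stored in the script.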
Classifications retrieval
$ child-project zooniverse retrieve-classifications /path/to/dataset --help
usage: child-project zooniverse retrieve-classifications [-h] --destination
                                                         DESTINATION
                                                         --project-id
                                                         PROJECT_ID
                                                         [--zooniverse-login ZOONIVERSE_LOGIN]
                                                         [--zooniverse-pwd ZOONIVERSE_PWD]
                                                         --chunks CHUNKS
                                                         [CHUNKS ...]

optional arguments:
  -h, --help            show this help message and exit
  --destination DESTINATION
                        output CSV dataframe destination
  --project-id PROJECT_ID
                        zooniverse project id
  --zooniverse-login ZOONIVERSE_LOGIN
                        zooniverse login. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_LOGIN instead
  --zooniverse-pwd ZOONIVERSE_PWD
                        zooniverse password. If not specified, the program
                        attempts to get it from the environment variable
                        ZOONIVERSE_PWD instead
  --chunks CHUNKS [CHUNKS ...]
                        list of chunks
Retrieve classifications and save them as DESTINATION.
The --chunks parameter, which takes one or more chunk metadata files, is used to match the classifications with the chunk metadata. Only the classifications that match the metadata are saved.
Warning
Retrieving classifications may take a long time for large projects.