Getting some data

You can either use the package on data of your own, or reuse existing datasets that are already in this format.

It may be easier to start with an existing dataset. The datasets we know of are listed below. Please note that the large majority of them are NOT public; if you cannot retrieve one, you will need to get in touch with its data managers.

Public datasets

We have prepared a public dataset for testing purposes, based on the VanDam Public Daylong HomeBank Corpus (VanDam, Mark (2018). VanDam Public Daylong HomeBank Corpus. doi:10.21415/T5388S).

From the LAAC team

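List of LAAC datasets

| Name         | Authors | Location                                      | Recordings | Duration (h) |
|--------------|---------|-----------------------------------------------|------------|--------------|
| Namibia      | Gandhi  | https://github.com/LAAC-LSCP/namibia-data     | 113        | 1449         |
| Solomon      | Sarah   | https://github.com/LAAC-LSCP/solomon-data     | 388        | 5954         |
| Tsimane 2017 |         | https://github.com/LAAC-LSCP/tsimane2017-data | 41         | 556          |
| png 2019     |         | https://github.com/LAAC-LSCP/png2019-data     | 51         | 760          |
| Vanuatu      |         | unavailable                                   | 53         | 289          |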
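As a rough illustration of how one of the listed repositories might be retrieved, here is a minimal sketch. It assumes that the repositories are DataLad datasets, that DataLad is installed, and that you have already been granted access by the data managers. The names `LAAC_DATASETS`, `install_command` and `install` are hypothetical helpers written for this example; they are not part of the package. Vanuatu is left out because its location is listed as unavailable.

```python
import subprocess

# Repository URLs copied from the list of LAAC datasets.
# Vanuatu is omitted: its location is listed as "unavailable".
LAAC_DATASETS = {
    "namibia": "https://github.com/LAAC-LSCP/namibia-data",
    "solomon": "https://github.com/LAAC-LSCP/solomon-data",
    "tsimane2017": "https://github.com/LAAC-LSCP/tsimane2017-data",
    "png2019": "https://github.com/LAAC-LSCP/png2019-data",
}


def install_command(name):
    """Build the `datalad install` command line for a listed dataset."""
    return ["datalad", "install", LAAC_DATASETS[name]]


def install(name, dry_run=True):
    """Print the command (dry run) or actually run it.

    Running it for real assumes DataLad is on the PATH and that your
    account has access to the (mostly private) repository.
    """
    cmd = install_command(name)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run only: shows the command without touching the network.
    install("namibia")
```

With `dry_run=False` the command is actually executed. If the repository is indeed a DataLad dataset, installing it usually fetches only its structure, and the file contents have to be requested separately with `datalad get`.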

EL1000

The EL1000 dataset contains several corpora accessible upon request.

Other private datasets

We know of no other private datasets at present, but we hope one day to be able to use DataLad’s search feature.