The 'Spotify Subset' lists file names from the Spotify Dataset (Tanaka et al., 2022) for classifying language variation in Brazilian Portuguese. The file names were selected by filtering the original dataset's metadata for idiomatic expressions and for names or acronyms of locations.
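As a rough illustration of this kind of metadata filter, the sketch below keeps the file names of rows whose description fields mention a given term. It is not the procedure actually used to build the subset: the term lists, the field names (`show_description`, `episode_description`, `file_name`), and the helper names are all hypothetical.

```python
import re

# Hypothetical term lists; the real filter terms are not documented here.
LOCATION_TERMS = ["Recife", "São Paulo", "SP", "RJ"]
IDIOM_TERMS = ["oxente", "uai"]


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first so multi-word names win over shorter alternatives.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def select_file_names(rows, terms,
                      text_fields=("show_description", "episode_description")):
    """Return file names of metadata rows whose text fields mention any term.

    `rows` is an iterable of dicts; the field names are assumptions.
    """
    pattern = build_pattern(terms)
    selected = []
    for row in rows:
        # Missing or None fields are treated as empty text.
        text = " ".join(row.get(field) or "" for field in text_fields)
        if pattern.search(text):
            selected.append(row["file_name"])
    return selected
```

Calling `select_file_names(rows, LOCATION_TERMS + IDIOM_TERMS)` on a list of metadata dicts would return the matching file names in their original order. Because matching is case-insensitive, short acronyms can over-match, so a real filter would likely need manual review of the hits.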

Speakers | Duration | Episodes | Female | Male |
---|---|---|---|---|
92 | ~15 hr 24 min | 52 | 43 | 38 |

Accent | Speakers | Duration | Female | Male |
---|---|---|---|---|
Rio de Janeiro | 5 | 49 min | 2 | 3 |
Bahia | 4 | 1 hr 27 min | 4 | |
Mato Grosso do Sul | 4 | 18 min | 3 | 1 |
Maranhão | 7 | 1 hr 18 min | 2 | 3 |
Minas Gerais | ~35 | 5 hr 23 min | ~13 | ~22 |
Recife | 10 | 3 hr 45 min | | |
São Paulo | ~25 | 1 hr 18 min | ~19 | ~7 |
Rio Grande do Sul | 2 | ~53 min | 2 | |

Accent | Train_speakers | Dev_speakers | Test_speakers | Podcasts | Episodes | Duration | Segments |
---|---|---|---|---|---|---|---|
RE | 69 | 23 | 11 | 15 | 57 | ~48.23 | 14,008 |
SP | 52 | 18 | 15 | 11 | 78 | ~30.88 | 11,906 |