Readme

This recipe contains data preparation for the VoxPopuli dataset (pdf). Model training is not included at the moment.

Audio per language

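| Language | Size | Hrs. untranscribed | Hrs. transcribed |
|----------|-----:|-------------------:|-----------------:|
| bg | 295G | 17.6K | - |
| cs | 308G | 18.7K | 62 |
| da | 233G | 13.6K | - |
| de | 379G | 23.2K | 282 |
| el | 305G | 17.7K | - |
| en | 382G | 24.1K | 543 |
| es | 362G | 21.4K | 166 |
| et | 179G | 10.6K | 3 |
| fi | 236G | 14.2K | 27 |
| fr | 376G | 22.8K | 211 |
| hr | 132G | 8.1K | 43 |
| hu | 297G | 17.7K | 63 |
| it | 361G | 21.9K | 91 |
| lt | 243G | 14.4K | 2 |
| lv | 217G | 13.1K | - |
| mt | 147G | 9.1K | - |
| nl | 322G | 19.0K | 53 |
| pl | 348G | 21.2K | 111 |
| pt | 300G | 17.5K | - |
| ro | 296G | 17.9K | 89 |
| sk | 201G | 12.1K | 35 |
| sl | 190G | 11.3K | 10 |
| sv | 272G | 16.3K | - |
| **total** | 6.3T | 384K | 1791 |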
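Only the "Hrs. transcribed" column is directly usable for supervised ASR, and several languages (marked `-`) have no transcribed audio at all. The snippet below is a minimal sketch for checking that column against the reported total and for shortlisting languages by transcribed hours. The dictionary, the `languages_with_at_least` helper and the 100-hour threshold are hypothetical illustrations, not part of this recipe; the numbers are copied from the table.

```python
# Transcribed hours per language, copied from the table above.
# Languages marked "-" (no transcribed audio) are left out.
TRANSCRIBED_HOURS = {
    "cs": 62, "de": 282, "en": 543, "es": 166, "et": 3, "fi": 27,
    "fr": 211, "hr": 43, "hu": 63, "it": 91, "lt": 2, "nl": 53,
    "pl": 111, "ro": 89, "sk": 35, "sl": 10,
}


def languages_with_at_least(min_hours: float) -> list[str]:
    """Hypothetical helper: language codes with >= min_hours transcribed, largest first."""
    selected = [lang for lang, hrs in TRANSCRIBED_HOURS.items() if hrs >= min_hours]
    return sorted(selected, key=lambda lang: TRANSCRIBED_HOURS[lang], reverse=True)


if __name__ == "__main__":
    total = sum(TRANSCRIBED_HOURS.values())
    # Should agree with the "total" row of the table (1791 hours).
    print(f"{len(TRANSCRIBED_HOURS)} languages, {total} transcribed hours")
    # Example threshold (illustrative only): languages with 100+ transcribed hours.
    print(languages_with_at_least(100))
```

Running it prints `16 languages, 1791 transcribed hours`, which matches the table's total row. With the example threshold it prints `['en', 'de', 'fr', 'es', 'pl']`.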