Adding New Languages
Alexander Veysov edited this page Sep 24, 2020 · 2 revisions
As a service to the community, we can add a CE model pro bono for any language that has a Unicode alphabet.
A few general rules of thumb:
- It generally does not make sense to use Common Voice alone: the resulting model will have problems with generalization;
- It usually takes some effort to collect enough data to build a decent model;
- Ideally, source as much training data as possible, while the test/val datasets should cover many more domains so that they actually test generalization;
- The more diverse your data, the better the model will be.
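As a rough illustration of the test/val point above, here is a minimal sketch of a domain-aware split. It is not our training code; the function name `split_by_domain`, the `(sample_id, domain)` record layout, and the domain labels are all assumptions made for this example. Validation receives every sample from the held-out domains plus a small random slice of each training domain.

```python
import random
from collections import defaultdict


def split_by_domain(samples, held_out_domains, val_fraction=0.1, seed=0):
    """Split (sample_id, domain) pairs into train and validation ID lists.

    Hypothetical helper: validation gets all samples from `held_out_domains`
    plus a random `val_fraction` (at least one sample) of every other domain.
    """
    rng = random.Random(seed)

    # Group sample IDs by their domain label.
    by_domain = defaultdict(list)
    for sample_id, domain in samples:
        by_domain[domain].append(sample_id)

    train, val = [], []
    for domain, ids in sorted(by_domain.items()):
        if domain in held_out_domains:
            # Domain never seen in training: used only to test generalization.
            val.extend(ids)
            continue
        ids = ids[:]  # copy so the caller's data is not shuffled in place
        rng.shuffle(ids)
        n_val = max(1, int(len(ids) * val_fraction))
        val.extend(ids[:n_val])
        train.extend(ids[n_val:])
    return train, val
```

With this layout, a domain listed in `held_out_domains` contributes nothing to training, so the validation score on it reflects how well the model transfers to unseen data.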
Please do not hesitate to contact us directly for advice. CE models will always stay public for all languages. From time to time, we will re-train all of our models when we achieve fundamental breakthroughs in our research.
At this time, for a number of reasons, we have decided not to share the code for training models.
Please contact us directly for a quote.
We currently plan, without any hard deadlines, to support the following major languages in both CE and EE versions, with the same attention to quality:
- French
- Italian
- Polish
- Czech