Adding New Languages


Adding a CE-only Model

As a service to the community, we can add a CE model pro bono for any language that has a Unicode alphabet.

A few general rules of thumb:

Generally, it does not make sense to use Common Voice alone: the resulting model will have problems with generalization;
Usually it takes some effort to collect enough data to build a decent model;
Ideally, source as much training data as possible, while the test/val datasets may cover many more domains in order to test generalization;
The more diverse your data, the better the model will be.

Please do not hesitate to contact us directly for advice. CE models will always remain public for all languages. From time to time, we will retrain all of our models when we achieve fundamental breakthroughs in our research.

Model Training Code

For a number of reasons, we have decided not to share the code for training models at this time.

Adding an EE Model

Please contact us directly for a quote.

Current Backlog

We are currently planning, without any hard deadlines, to support the following major languages in both CE and EE versions, with the same attention to quality:

French
Italian
Polish
Czech
