Common Text

Sample Texts for Mozilla's Common Voice

The main GitHub repository includes all the source files for the text corpus, the iOS and Android apps, and the server that runs the service. In that repository, all sample texts are located in the server/data directory.

This repository, which started as a GitHub gist for counting word occurrences in the Common Voice corpus, lists all Common Voice texts which are

Available languages

A full list of languages is available on the Common Voice website. Note that not all languages shown in this repository are officially launched, either because of localization problems or because they lack a text corpus.

Building

To run the scripts, make sure you already have a copy of the Common Voice repository in the same directory where you will clone the common-text directory. For simplicity, I recommend placing both under your home directory.

./
|-common-text/
| |-scripts/
| | |-cv-count-latin.sh  // Script
| |-stats/
| | |-(Locale)/
| | | |-...              // Copy host
| |-...
|-voice-web/
| |-android/
| |-common/
| |-docker/
| |-docs/
| |-ios/
| |-locales/
| |-nubis/
| |-scripts/
| |-server/
| | |-data/
| | | |-(Locale)/
| | | | |-...            // Copy target
| | |-src/
| | |-...
| |-web/
| |-...
|-...
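
Before running anything, it can help to confirm that both checkouts sit side by side as in the tree above. The following is a minimal sketch, not one of this repository's scripts: the function name check_layout is hypothetical, and it only assumes the two directory names shown in the tree (common-text/scripts and voice-web/server/data).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): check that the
# common-text and voice-web checkouts share the same parent directory.
check_layout() {
  # Parent directory to inspect; defaults to the home directory.
  local base="${1:-$HOME}"
  local missing=0
  local dir
  for dir in "common-text/scripts" "voice-web/server/data"; do
    if [ ! -d "$base/$dir" ]; then
      echo "missing: $base/$dir" >&2
      missing=1
    fi
  done
  # Returns 0 when both directories exist, 1 otherwise.
  return "$missing"
}
```

Calling `check_layout "$HOME"` prints any missing directory to standard error and returns a non-zero status if the layout is incomplete.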

Contributing to this project

I welcome any pull requests that improve the extraction scripts. As of now, they are implemented in bash (Linux) and do not work for non-Latin scripts (e.g. Arabic, Chinese).
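
For contributors who want a starting point, word counting over Latin-script text can be sketched roughly as below. This is an illustration only, not the contents of cv-count-latin.sh: the function name count_words is hypothetical, and the sketch forces the C locale so that only ASCII letters and apostrophes count as word characters. Every other byte is treated as a separator, which is one way to see why this style of pipeline breaks down for scripts such as Arabic or Chinese.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not this repository's script): count word
# occurrences in plain-text files, or in standard input when no file is given.
count_words() (
  # Force the C locale so tr and sort behave the same on every system.
  export LC_ALL=C
  cat -- "$@" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs "[:alpha:]'" '\n' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2
)
```

Each output line holds a count followed by a word, with the most frequent words first. The function accepts one or more file paths, for example `count_words sentences.txt`, where sentences.txt is a placeholder file name.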

If you would like to contribute more sample texts to this repository, please visit the Common Voice Sentence Collector. Any direct contributions to the sample texts here will be overwritten by the texts hosted in Common Voice.

To learn more about this project, or start contributing, visit voice.mozilla.org.

License

This project is licensed under the Mozilla Public License, 2.0. See the LICENSE file or https://mozilla.org/MPL/2.0/ for license details.

In accordance with the Common Voice database license requirements, sample texts (located under the stats//raw/ directory) must be released into the public domain (or under similar licenses such as CC0, Unlicense, and WTFPL).

reinhart1010/common-text

Folders and files

Latest commit

History

Repository files navigation

Common Text

Sample Texts for Mozilla's Common Voice

Available languages

Building

Contributing to this project

License

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages