The Romanian Language Repository

Romanian is an Eastern Romance language that developed in Southeastern Europe during the 5th to 8th centuries. It is currently spoken by approximately 24 to 26 million people as a native language and by about 4 million people as a second language. Romanian is an official language in Romania, the Republic of Moldova, and parts of Serbia and Greece. It is also spoken in Romanian and Moldovan immigrant communities in European Union countries, the United States, Canada, and Australia.

What is a Corpus?

A corpus is a collection of written texts, transcribed speech, or a combination of both. It may also consist of transcribed video recordings of signed language.

Characteristics of the Romanian Corpus:

Corpus Type

Monitor corpus

Corpus Size

written data
- ~5,500,000 words
spoken data
- ~100,000 words of transcribed spoken data, with the audio for all transcribed data included

Representativeness

written data
- 16 registers
spoken data
- 2 registers

Authors

The corpus data currently come from 381 authors.

Metadata

The metadata associated with each file include the author's name, gender, and language, as well as the BRC information.
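This README does not specify the on-disk file format. The following is therefore only a minimal sketch under a hypothetical layout: each `.txt` file opens with `key: value` header lines (author, gender, language, BRC), followed by a blank line and then the text. The sketch shows how per-file metadata like this could be parsed and aggregated into author and word counts similar to those reported above. The field names, the header layout, and the helpers `parse_document`, `summarize`, and `load_corpus` are illustrative assumptions, not part of the repository. Word counts use simple whitespace splitting, so they will only approximate any published figures.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path

# Hypothetical header keys, mirroring the metadata fields listed in the README.
FIELDS = ("author", "gender", "language", "brc")


@dataclass
class Document:
    metadata: dict[str, str]
    text: str

    @property
    def word_count(self) -> int:
        # Whitespace tokenization: a rough approximation of "words".
        return len(self.text.split())


def parse_document(raw: str) -> Document:
    """Split an ASSUMED 'key: value' header from the body at the first blank line.

    If there is no blank line, the whole input is treated as header and the body is empty.
    """
    header, _, body = raw.partition("\n\n")
    metadata: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        # Keep only recognized keys; header keys are matched case-insensitively.
        if sep and key.strip().lower() in FIELDS:
            metadata[key.strip().lower()] = value.strip()
    return Document(metadata, body)


def summarize(docs: list[Document]) -> dict:
    """Count files, distinct authors, total words, and files per gender value."""
    authors = {d.metadata["author"] for d in docs if d.metadata.get("author")}
    files_per_gender = Counter(d.metadata.get("gender", "unknown") for d in docs)
    return {
        "files": len(docs),
        "authors": len(authors),
        "words": sum(d.word_count for d in docs),
        "gender": dict(files_per_gender),
    }


def load_corpus(root: str) -> list[Document]:
    """Parse every .txt file under a (hypothetical) corpus directory."""
    return [
        parse_document(path.read_text(encoding="utf-8"))
        for path in sorted(Path(root).rglob("*.txt"))
    ]
```

For example, `summarize(load_corpus("Corpus_Data"))` would return a dictionary of counts, provided the files actually follow the assumed header layout. The actual format should be checked against the repository before relying on any of these numbers.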

Acknowledgements

We would like to thank Kenji Sagae, Ph.D., for his suggestions and comments during data collection. We also thank Aishwarya Jaggannath, Luci Sanchez Ortega, Victoria Boyd, Doina Midrigan, and Diana Malancea for helping with data collection and data computerization.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

lmidriganciochina/romaniancorpus

Folders and files

Latest commit

History

Repository files navigation

The Romanian Language Repository

What is a Corpus?

Characteristics of the Romanian Corpus:

Corpus Type

Corpus Size

Representativenes

Authors

Metadata

Acknowledgements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages