Romanian Corpus and Character set #9456

Closed

the-ge commented Mar 15, 2023

The Romanian corpus file is a cleaned version of the official Romanian Scrabble word list (https://dexonline.ro/scrabble), licensed under GPL (https://dexonline.ro/licenta). In addition to the base form of each word, it contains the inflected forms and the diacritic-less forms (diacritics are mostly not used online). Please let me know if the corpus should be simplified.
I'm not sure if there's anything else I should add. Here's the Wikipedia page about the Romanian language: https://en.wikipedia.org/wiki/Romanian_language.

This is the same as #5881, which I closed because that pull request was made with a different email address, which prevented me from signing the CLA.
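
For illustration, the diacritic-less forms mentioned in the description could be derived roughly as follows. This is a minimal sketch, not the script actually used to clean the dexonline list: the function names `strip_diacritics`, `expand_with_plain_forms`, and `charset` are hypothetical, and it assumes every Romanian diacritic (ă, â, î, ș, ț, plus the legacy cedilla forms ş, ţ) decomposes into a base letter and a combining mark under Unicode NFD.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Return the word with combining marks removed (e.g. 'școală' -> 'scoala')."""
    # NFD splits each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", word)
    # unicodedata.combining() is non-zero for combining marks; drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_with_plain_forms(words):
    """Return the words plus their diacritic-less forms, in order, without duplicates."""
    seen = set()
    result = []
    for word in words:
        for form in (word, strip_diacritics(word)):
            if form not in seen:
                seen.add(form)
                result.append(form)
    return result


def charset(words):
    """Return the sorted list of distinct characters used by the words."""
    return sorted({ch for word in words for ch in word})
```

Calling `expand_with_plain_forms` on the word list and then `charset` on the result would give one corpus entry per line and the matching character set, which is the pair of files this pull request adds.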

the-ge added 2 commits March 15, 2023 21:38

paddle-bot bot commented Mar 15, 2023

Thanks for your contribution!

the-ge commented Apr 13, 2023

Please provide some feedback on what else needs to be done to merge the Romanian corpus and character set. I see that the Vietnamese PR (#7933) is in limbo as well. Clearly, the steps outlined in the Multilingual OCR Development Plan (#1048) are not enough.

the-ge closed this by deleting the head repository on Mar 8, 2024