This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Multilingual OCR Development Plan #1048
Comments
Traditional Mongolian |
I would love to work on "Bangla" |
I would be very happy if you did that for Vietnamese. |
How about Arabic? That would be great. |
I've found that the PaddleOCR algorithm cannot recognize some special characters (such as the comma, semicolon, or period) when the language is English. Is there any way I can fix this problem? |
I would like to contribute by adding the Burmese language. Do I only need to submit two text files, dict and corpus? What further steps do we need to take? |
Adding "Bangla" will be great for the people in South Asia. |
Adding "Traditional Chinese (zh-TW)" would be a great addition. |
Do you have a pretrained Russian recognition model? |
Hi, adding the "Tamil" language would be much appreciated. Tamil_dict.txt is attached. For more details, please refer to this issue: |
I can help with Turkish language. |
I can help with the Polish language. |
@GmGniap Hello, can you provide the corpus file for the Burmese language? |
@shahidul56 Hello, can you provide the corpus file for the Bangla language? |
All models updated on 2021.1.21 fail to download with the following error: |
Sorry for the invalid links. All of them have been fixed now, so you can try again. |
#1847, seems to be ongoing. |
@redcinelli Thank you very much. The Vietnamese model is in training and will be available soon~ |
@grasswolfs The model name for Turkish should be "tr" instead of "tk"; "tr" is the widely used abbreviation for Turkish. |
I have also opened a PR for the Turkish dict and corpus: #1856 |
Thanks @habout632 for adding Southeast Asian languages via #1896 |
Here is a dictionary for Greek. |
Hi, do we have a model that detects all English characters along with special characters like . , " ( )? |
hi, thank you for the great work! I just wonder whether you will add traditional Chinese to the general model? Right now, the general model can support Chinese(sim), English and numbers. |
Please add Tajik Language |
@fcakyon @D-DanielYang @xmy0916 I would like to contribute to Bangla Dictionary and Corpus. Can I do that? Also, I have a few queries to ask -
Thanks in advance |
Please add Indonesian (id) and English (en) together |
Do you have any plan for Vietnamese release? |
Is it sufficient to change the file german_dict.txt if one wants to recognize Fraktur, a historic German script, instead of the current script form? Should the dictionary learned for the German language be the same? For Tesseract there is a separate trained file for Fraktur, used to OCR historic documents. |
Indonesian language support needed, please. |
Hi, please add Bangla and English support. I have attached both files for Bangla. |
Hi team. Great work on Paddle, it's an amazing OCR engine! Can we please have Hebrew support in multilanguage models ? Thanks ! |
Dear Team, thanks for your reply. I am from Bangladesh. I have already submitted both files, dict and corpus, for Bangla. I would appreciate it if you could add Bangla support.
Thank you.
Zahir
|
Can you provide support for any ancient scripts? |
I'm trying with my private data, but the results are very poor. |
Sorry for the basic question, I am a novice at DL: what is the difference between an inference model and a trained model? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
I created a PR for Bangla |
Does this list contain the latest models? If I want to fine-tune, for example, the German model, do I use the link from this page to download the pretrained model? If so, which yml file should I use? How do I know the architecture of these models? |
Please add Tajik Language |
Guideline for new language requests
To request support for a new language, open a PR containing the following two files:
In the folder ppocr/utils/dict,
submit the dictionary file, named
{language}_dict.txt
which contains a list of all characters. See the other files in that folder for a format example.
In the folder ppocr/utils/corpus,
submit the corpus file, named
{language}_corpus.txt
which contains a list of words in your language. Roughly 50,000 words per language are probably needed at a minimum.
Of course, the more, the better.
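The two files described above could be prepared with a small script such as the following. This is only a minimal sketch, not part of PaddleOCR: the helper names `build_dict` and `write_language_files` are hypothetical, and it assumes one character per line in the dict file and one word per line in the corpus file, which should be checked against the existing files in ppocr/utils/dict. The 50,000 default simply mirrors the rough minimum mentioned in the guideline.

```python
from pathlib import Path


def build_dict(words):
    """Return the sorted unique non-whitespace characters found in words."""
    chars = set()
    for word in words:
        chars.update(ch for ch in word if not ch.isspace())
    return sorted(chars)


def write_language_files(language, words, root="ppocr/utils", min_words=50000):
    """Write {language}_dict.txt and {language}_corpus.txt under root.

    Assumed layout (hypothetical, verify against the repository):
    one character per line in the dict, one word per line in the corpus.
    Returns (dict_path, corpus_path, enough_words).
    """
    root = Path(root)
    # Drop empty entries and surrounding whitespace before writing.
    words = [w.strip() for w in words if w.strip()]
    dict_path = root / "dict" / f"{language}_dict.txt"
    corpus_path = root / "corpus" / f"{language}_corpus.txt"
    for path in (dict_path, corpus_path):
        path.parent.mkdir(parents=True, exist_ok=True)
    dict_path.write_text("\n".join(build_dict(words)) + "\n", encoding="utf-8")
    corpus_path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return dict_path, corpus_path, len(words) >= min_words
```

Calling `write_language_files("bn", words)` with a list of Bangla words would produce both files and report whether the word count reaches the suggested minimum. Deriving the dict from the corpus is a convenience choice; a hand-curated character list may be more complete for scripts with rare characters.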
Call for contributions to add new language support for PaddleOCR.
For anyone who might be interested in training a new language model, guidance on training the model is provided.
If your language has unique elements, please let me know in advance by any means, such as useful links, Wikipedia pages, and so on.