Add support for other languages #306
Comments
I wish this add-on could support Chinese. You could use https://github.com/fxsjy/jieba or https://github.com/hankcs/HanLP as the backend.
It does support Chinese (UDPipe lemmatization and sentiment analysis). Word embedding support is upcoming.
Ah, do you mean for tokenization? We have absolutely no clue how to work with Chinese, so someone else would have to set this up (especially for tests). HanLP doesn't have English documentation, but Jieba seems nice!
The English version of HanLP is here: https://github.com/hankcs/HanLP/tree/master. I have forked this repo, translated some widgets into Chinese, and added some Chinese tokenization support, but I have no idea what NLP is, so I don't know what to do next. (I teach machine learning in China, and my students know little English, so I translated it.)
OK, I've checked both. From what I gathered, the only real issue in Chinese text processing is word segmentation, so we would need a specialized Chinese segmenter. This could be added to the new Preprocess Text as a separate option. I would close this issue, as it is too broad, and open a separate, more specific one. Continued in #536.
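To illustrate why segmentation is the sticking point: Chinese text has no spaces between words, so a tokenizer has to decide where the word boundaries fall. Below is a minimal sketch of one classic approach, greedy forward maximum matching against a word list. It is not how Jieba or HanLP work internally, and the function name `segment`, the `max_word_len` parameter, and the toy vocabulary are all hypothetical, chosen only for this example.

```python
def segment(text, vocabulary, max_word_len=4):
    """Split text into words by greedy forward maximum matching.

    Hypothetical helper for illustration only; real segmenters use
    large dictionaries and statistical models.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so the loop is guaranteed to make progress.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy vocabulary (an assumption for this example, not a real dictionary).
TOY_VOCABULARY = {"我", "喜欢", "机器", "学习", "机器学习"}

print(segment("我喜欢机器学习", TOY_VOCABULARY))
# → ['我', '喜欢', '机器学习']
```

Because the longest match wins, "机器学习" is kept as one token rather than split into "机器" and "学习". A greedy rule like this gets ambiguous sentences wrong, which is one reason a dedicated segmenter would be preferable as the Preprocess Text option.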
Text version
3.5.dev-
Orange version
0.2.5
Expected behavior
Orange supports many key languages.
Actual behavior
No support for Latin, (old) Greek, Polish... Poor support for French, German, Spanish, Portuguese. Think about Chinese, Hindi and Arabic as well.
Steps to reproduce the behavior
Additional info (worksheets, data, screenshots, ...)