Replies: 1 comment
Hi, we are preparing a Colab tutorial on how to modify the language dictionary to solve the unknown-character problem.
Hi,
While developing a language-specific OCR system, I ran into a problem: the recognizer is sometimes fed an image containing an unknown token.
Let's say I develop a French OCR model with a French charset that includes the characters À Á Â, but the image to recognize is in another language, for example German with the character Ä.
At prediction time, it could be predicted as any of À Á Â, or even as a plain capital A in the good case. However, it could also be predicted as something ridiculous, such as the character E.
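To make the example above concrete, here is a minimal Python sketch that lists which characters of a text fall outside a model's charset, and what the "good case" base letter would be. The charset fragment and the helper names `out_of_charset` and `base_letter` are hypothetical and only for illustration; they are not taken from any OCR library.

```python
import unicodedata

# Hypothetical fragment of a French charset: A-Z plus À Á Â.
# A real model's charset would be much larger.
FRENCH_CHARSET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") | set("\u00c0\u00c1\u00c2")


def out_of_charset(text, charset):
    """Return the characters of `text` that the charset does not cover
    (whitespace ignored, first-seen order, no duplicates)."""
    missing = []
    for ch in text:
        if ch.isspace() or ch in charset or ch in missing:
            continue
        missing.append(ch)
    return missing


def base_letter(ch):
    """Drop combining marks after NFD decomposition, e.g. Ä -> A."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# "\u00c4" is the German Ä, which the French charset above does not contain.
missing = out_of_charset("\u00c4PFEL", FRENCH_CHARSET)
fallbacks = {ch: base_letter(ch) for ch in missing}
print(missing, fallbacks)
```

This only diagnoses the gap on ground-truth text. It does not change what the model predicts, since the classifier can still only output classes that are in its charset.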
I was wondering if there is a good way to handle this other than developing a full Latin-script model, which needs a large amount of data in different languages.
I understand there are a couple of bad options:
or
<unk>
. (PS: It sounds great to common sense, but it does not make much sense for a classification problem in deep learning.) or
or
Any great ideas?