Possibility of removing Arabic diacritics from the headwords #366
Comments
Should we keep the original headword after the trimmed headword, to prevent duplicate headwords? أخذ (أَخَذ) |
You mean write-option? I mean, what if two entries have headwords that become identical once the diacritics are removed? For example, کَتَب (to write) and کُتُب (books). |
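The collision described in this comment can be detected before anything is written. The following is only a minimal Python sketch: it assumes that "diacritics" means the code points U+064B to U+0652, and `trim`, `find_collisions` and `disambiguate` are hypothetical helper names, not functions of the converter.

```python
import re
from collections import defaultdict

# Assumption: "diacritics" means the marks U+064B..U+0652
# (tanwin, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def trim(headword):
    """Return the headword with the diacritics removed."""
    return DIACRITICS.sub("", headword)


def find_collisions(headwords):
    """Map each trimmed form to its originals, keeping only real clashes."""
    groups = defaultdict(list)
    for original in headwords:
        groups[trim(original)].append(original)
    return {key: value for key, value in groups.items() if len(value) > 1}


def disambiguate(headword):
    """Build a 'trimmed (original)' label, as suggested earlier in the thread."""
    return f"{trim(headword)} ({headword})"
```

Entries reported by `find_collisions` could then be written with the `disambiguate` label, or stored as alternates in formats that support them.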
Can they be alternatives for the same search word? I mean, when I type كتب in the search box. |
Yes, but some dictionaries / formats don't support alternates. Also, why two steps? Do you convert twice? |
Most dictionaries do support prefix searching I guess, no? Because this change doesn't have to be format-specific. |
Because when I use multiple dictionaries and type أخذ, I know I will find it if I scroll with the arrow keys before pressing Enter! |
You want to convert to slob? |
Slob or txt, but I will need to convert it to mdx after that 🤷‍♂️ |
Okay. |
What about Tanwin / Nunation? |
Yes, remove them all, but keep "ؤ", "ئ", "ء", "ه" and "ة". As for آ and أ, replace them with "ا" (this is very important for improving Arabic word search across multiple Arabic dictionaries). |
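The rules requested in this comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that was pushed to the branch: `normalize_headword` is a hypothetical name, the diacritics are assumed to be U+064B to U+0652 (which includes the tanwin marks), and only آ (U+0622) and أ (U+0623) are folded into ا (U+0627).

```python
import re

# Tanwin, short vowels, shadda and sukun: U+064B..U+0652 (assumed range).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Only the two alef forms named in the request are folded into bare alef.
ALEF_FOLD = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627"})


def normalize_headword(headword):
    """Remove the diacritics, then replace alef-madda / alef-hamza with alef."""
    return DIACRITICS.sub("", headword).translate(ALEF_FOLD)
```

The letters "ؤ", "ئ", "ء", "ه" and "ة" are separate code points outside both the range and the translation table, so they pass through unchanged.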
It would be great if anyone could search the Arabic Mo'ajams without needing to type all these diacritics. I hope, and I'm sure, you can do this, "In Sha'a Allah". |
I pushed to this branch: Add flag |
Great 🙏 I will try it right now. Is "replacement of آ and أ with ا" included too? |
Yes |
Perfectly done. Thanks, Saeed, I appreciate that. You are a lifesaver. Oh my god, you have just saved the lives of about 10 previously unusable Ar-En dictionaries 😍 |
Sorry, Saeed. Yes, this worked with the Ar-En Morphology dictionary, but with other dictionaries it fails with this error: python main.py Almawrid_Plus_ar-en.mdx Almawrid_Plus_ar-en.txt --trim-arabic-diacritics [CRITICAL] Writing file 'Almawrid_Plus_ar-en.txt' failed. Please download this dictionary and try: https://mega.nz/file/rUtCCB4T#ShmvtlDth0h_ANcNm1-Xq-laolz7g3lehRktCGmMF3I Maybe the problem is that there are conflicting headwords. Could you please find a solution? 🥺 |
I updated the branch. |
Perfectly done. Many thanks. |
I pushed to master branch. |
Great |
Hello Saeed
I have an Ar-En glossary; its headwords contain not just Arabic letters but also diacritics (a.k.a. TASHKEEL / HARAKAT) above or below each letter. This makes searching the dictionary very hard.
I need an option to delete the Arabic diacritics from the headwords during conversion while keeping the letters (and without affecting the definitions).
I think there are Python ways to do that, but I'm a beginner 😰.
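One possible "Python way" uses only the standard library. The sketch below assumes entries are plain `(headword, definition)` pairs, which is a simplification and not the converter's real data model; `strip_combining` and `trim_entries` are hypothetical names. It drops every Unicode combining mark from the headword, which covers the Arabic harakat but would also remove combining marks of other scripts.

```python
import unicodedata


def strip_combining(text):
    """Drop every character with a non-zero canonical combining class."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


def trim_entries(entries):
    """Yield (headword, definition) pairs with only the headword trimmed."""
    for headword, definition in entries:
        # The definition is passed through untouched.
        yield strip_combining(headword), definition
```

Precomposed letters such as أ are single code points with combining class 0, so this approach keeps them as they are.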
This is the Ar-En Morphology dictionary:
https://mega.nz/file/bdFyAZQD#QgHPND-rqbsICW3rMKhn6UaA4qt10VQvwllN08VoGK4
See this: