Wrong matching for Arabic #36
Comments
Hi Clémentine, I've been trying to work on this issue. After some testing, I came to the following conclusion:
What do you suggest doing to fix this?
Hello @ahmedkrmn, thanks for your interest! 😁 @ManyTheFish can help you with this when he has the time :)
Hello @ahmedkrmn are you sure that deunicoding Arabic script is a good thing to do?
would be deunicoded as
🤔 I can't write Arabic script, so I don't know what the correct behavior should be.
Hello @ManyTheFish,
Hello @Reex11, I will investigate your case. 🤔 Is there a name for this Thanks for your help 😁
Thank you for investigating this. You need to know this first: Now, this library is calling this process I'll dig around to see if there's anything else to consider.
Hi again,
I'll look for a solution for the Waw stopword, and I already have some workarounds in mind.
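For readers following along, here is a rough sketch of one possible workaround, not necessarily the one meant above. It assumes the "Waw stopword" problem refers to the conjunction و ("and") being written attached to the following word, so that a query for the bare word does not match. The function name `waw_candidates` and the `min_stem_len` threshold are hypothetical. The original token is always kept, because many Arabic words legitimately start with و as a root letter.

```python
from __future__ import annotations

WAW = "\u0648"  # ARABIC LETTER WAW


def waw_candidates(token: str, min_stem_len: int = 2) -> list[str]:
    """Return the token and, when plausible, the token without a leading Waw.

    This is a naive heuristic: it cannot tell a conjunction Waw from a Waw
    that belongs to the word itself, so it only adds a candidate and never
    replaces the original token.
    """
    candidates = [token]
    # Only strip when something word-like remains after the Waw.
    if token.startswith(WAW) and len(token) - 1 >= min_stem_len:
        candidates.append(token[1:])
    return candidates
```

Indexing both candidates trades some precision for recall: a word such as وقت ("time") also yields a spurious second candidate, which is why a real morphological analyzer would do better than this prefix check.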
Hello @Reex11! Thanks for your help, we have to design or find a specialized normalizer for this.
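As an illustration of what a script-specific normalizer could look like (a minimal sketch under my own assumptions, not what the maintainers designed), the snippet below keeps the text in Arabic script instead of transliterating it. It only drops the optional vowel marks (nonspacing marks, Unicode category `Mn`) and the tatweel elongation character. The name `normalize_arabic` is hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, a purely typographic elongation


def normalize_arabic(text: str) -> str:
    """Drop nonspacing marks (e.g. harakat) and tatweel; keep base letters.

    No Unicode decomposition is applied, so precomposed letters such as
    ALEF WITH MADDA ABOVE (U+0622) are left unchanged.
    """
    kept = []
    for ch in text:
        if ch == TATWEEL:
            continue
        if unicodedata.category(ch) == "Mn":
            continue
        kept.append(ch)
    return "".join(kept)
```

Note that filtering on category `Mn` removes combining marks from any script, not just Arabic, so a real normalizer would likely restrict this step to text detected as Arabic.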
Hi @ManyTheFish, I think that there are a lot of cases that are not space-separated. Example: I found a great Arabic NLP library, I think it's the best so far. It's called CAMeL Tools.
Closed in favor of meilisearch/product#139
Related to meilisearch/meilisearch#1331