Implement Pinyin normalizer #135

Closed
ManyTheFish opened this issue Sep 14, 2022 · 0 comments · Fixed by #143
Labels
good first issue Good for newcomers

Comments

@ManyTheFish
Member

ManyTheFish commented Sep 14, 2022

Today Meilisearch normalizes Chinese characters by converting traditional characters into simplified ones.

drawback

This normalization process doesn't seem to enhance the recall of Meilisearch.

enhancement

Following the official discussion about Chinese support in Meilisearch, it is more relevant to normalize Chinese characters by transliterating them into a phonological form.
To get accurate Mandarin phonology, we should normalize Chinese characters into Pinyin using the pinyin crate.
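
To make the idea concrete, here is a minimal sketch of what such a normalization could look like. It is only an illustration: `lookup_pinyin` and `normalize_to_pinyin` are hypothetical names, and the tiny hard-coded table stands in for the dictionary that a real implementation would get from the pinyin crate. Tones are dropped here for simplicity, which is an assumption and not a decision taken in this issue.

```rust
/// Hypothetical stand-in for a real Pinyin dictionary: maps a handful of
/// Chinese characters to toneless Pinyin syllables. A real normalizer would
/// rely on the pinyin crate instead of this hand-written table.
fn lookup_pinyin(c: char) -> Option<&'static str> {
    match c {
        '中' => Some("zhong"),
        // Simplified and traditional forms share the same reading.
        '国' | '國' => Some("guo"),
        '人' => Some("ren"),
        '汉' | '漢' => Some("han"),
        '语' | '語' => Some("yu"),
        _ => None,
    }
}

/// Transliterates every character found in the table and copies all other
/// characters (Latin letters, spaces, unknown characters) unchanged.
fn normalize_to_pinyin(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match lookup_pinyin(c) {
            Some(syllable) => out.push_str(syllable),
            None => out.push(c),
        }
    }
    out
}
```

With this sketch, `normalize_to_pinyin("中國")` and `normalize_to_pinyin("中国")` both give `zhongguo`, so a phonological normalization also covers the traditional/simplified case handled today.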

Files expected to be modified

Misc

related to product#503

Hey! 👋
Before starting any implementation, make sure that you read the CONTRIBUTING.md file.
In addition to the general rules, you will find guides that make it easy to implement a Segmenter or a Normalizer.
Thanks a lot for your contribution! 🤝

@ManyTheFish changed the title from "Implement an Hanyu Pinyin normalizer" to "Implement Hanyu Pinyin normalizer" on Sep 14, 2022
@ManyTheFish changed the title from "Implement Hanyu Pinyin normalizer" to "Implement Pinyin normalizer" on Sep 14, 2022
@curquiza transferred this issue from meilisearch/engine-team on Sep 29, 2022
@bors bot closed this as completed in 1244b9d on Oct 5, 2022