trimmer #12
Comments
I've made a sample page for this. Using Spanish, if you search for

We've discussed this a little in manastech/middleman-search#23 (that's where the example comes from), and I think this should be solved by lunr-languages rather than by the user having to load lunr.unicodeNormalizer themselves. Whether lunr-languages should load lunr.unicodeNormalizer or do something different, I'm not sure. But if I'm enabling Spanish full-text search, I definitely want accented words to yield exactly the same results as the non-accented version of the word.

I can totally try to fix lunr-languages if you give me some pointers about how to do it. It's just that I'm not sure where or how I should do it.

I'm pretty sure @eemi wants to know about this issue.
About handling accents, see fortnightlabs/snowball-js#2
and back to snowballstem/snowball#55
Hi, any news on this issue? I'm currently working on an offline, multi-language search client with pouchdb-quick-search and I face the same limitations.
I completely agree with @matiasgarciaisaia. Right now, the only workaround I can think of is to strip all the diacritical marks before indexing the data.
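For anyone wanting to try that workaround, here is a minimal sketch of stripping diacritical marks before indexing. It assumes an ES6+ runtime with `String.prototype.normalize`; `stripDiacritics` and the sample `docs` array are hypothetical names for illustration and are not part of lunr or lunr-languages. The same function would have to be applied to the query string before searching, otherwise accented queries will not match the folded index.

```javascript
// Hypothetical helper, not part of lunr or lunr-languages.
// Decompose each character (NFD), then drop the combining marks
// in the range U+0300 to U+036F.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Illustrative data: fold every indexed field before handing it to the index.
const docs = [{ id: 1, title: 'Canción de otoño' }];
const folded = docs.map((doc) => ({ ...doc, title: stripDiacritics(doc.title) }));

console.log(folded[0].title); // Cancion de otono
```

Note that letters such as æ and ø have no canonical decomposition, so this approach leaves them unchanged; they would need an explicit mapping table.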
When a language from lunr-languages is enabled, `lunr.trimmer` is removed from the pipeline, so words that include punctuation and the like enter the index. E.g., both `"word."` and `"word"` will enter the index.

Adding `lunr.trimmer` to the pipeline manually is not really a good solution, as `lunr.trimmer` uses `\W` to match non-word characters (Unicode-aware regexps are only supported as of ES6).

A solution could be to normalize characters like `æøå` -> `aoa`, like done here: https://github.com/cvan/lunr-unicode-normalizer/blob/master/lunr.unicodeNormalizer.js

Thoughts?
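To illustrate why a `\W`-based trimmer is a problem for non-English text, here is a small sketch. `asciiTrim` only assumes the behaviour described above (stripping leading and trailing `\W` runs) and is not copied from lunr; `unicodeTrim` is a hypothetical alternative that relies on Unicode property escapes, which need the `u` flag and an ES2018 runtime.

```javascript
// Assumed behaviour of a \W-based trimmer: strip leading and trailing
// runs of characters outside [A-Za-z0-9_].
function asciiTrim(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Hypothetical Unicode-aware variant: keep letters, combining marks,
// digits and underscore from any script at the token edges.
function unicodeTrim(token) {
  return token
    .replace(/^[^\p{L}\p{M}\p{N}_]+/u, '')
    .replace(/[^\p{L}\p{M}\p{N}_]+$/u, '');
}

for (const token of ['word.', 'ñandú.', '«señor»']) {
  console.log(token, '->', asciiTrim(token), '|', unicodeTrim(token));
}
```

With the `\W` version, `ñandú.` loses its first and last letters along with the full stop, while the Unicode-aware version only removes the punctuation.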