-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing results within edit distance in fuzzy search #390
Comments
That's annoying! Thanks for reporting, I'll put together some tests and then figure out how to get a fix out. |
Thanks a lot! Here is one simple test reproducing it: lucaong@24f4223 I can send a PR if you wish, but for now it would only include the test, as I did not get the time to delve into the code and devise a fix yet. |
Thanks, yeah I had something similar as a test too. I've got something that fixes that exact bug, but I'm convinced there are other bugs in that code with a similar fix, I'm just trying to find the right test to expose it. If you look at the changes from the previous diff the fix is to unconditionally add a final frame to the stack. That same pattern is in every other fuzzy operation (insertion, deletion, substitution etc). Indeed, it is the same fix in a different location that fixes what you've reported. Of course, I could be completely wrong, I'm having a hard time trying to wrap my head around that code at the moment! I'll spend a bit more time trying to understand the bug in a wider context, otherwise I'll put out a patch in the next couple of days with the more specific fix to unblock you. |
Hi @olivernn There is one way I could maybe help. When working on this kind of algorithmic code, where it’s easy to overlook corner cases, I often find it useful to have one or two property based tests, where I assert some invariant on many examples of randomly generated data. I could help writing one such test for the fuzzy search if that is useful. I have something like that on my own full text search library, and it helped me discover a number of cases: https://github.com/lucaong/minisearch/blob/master/src/SearchableMap/SearchableMap.test.js#L256 I probably won’t make it this weekend, but I might have some time soon if it’s useful. |
I've just pushed a fix for this in 2.3.6. Thanks again for taking the time to investigate and report the bug! |
Hello,
the issue #375 seems to be still occurring in some cases. Fuzzy search, even with stemming disabled, misses some words within the given edit distance. In particular, it seems to miss terms that are equal to the search term plus a suffix as long as the maximum edit distance.
In other words, searching for
abc~2
correctly matchesabcx
, but missesabcxx
. Searching forabc~3
correctly matchesabcx
andabcxx
, but missesabcxxx
.Here is a fiddle reproducing the issue on Lunr
v3.2.5
(latest release at the moment of posting): https://jsfiddle.net/3ajf72Ly/1/I know that #382 addressed this already, and it did fix some cases, but not all.
This is also visible from the expansion of
lunr.TokenSet.fromFuzzyString("abc", 2).toArray()
, whereabc**
is missing:Thanks again for the great library!
The text was updated successfully, but these errors were encountered: