fix: improve fuzzy-matching heuristics #1710
Thanks for your contribution! Please make sure to follow our Commit Convention.
src/Lean/Data/FuzzyMatching.lean
-- TODO: the following code is assuming all characters are ASCII
for patternIdx in [:pattern.length] do
  let patternComplete := patternIdx == pattern.length - 1

  -- for the IH, it's only necessary to populate a range of length `word.length - pattern.length` at each index
What's the IH?
Inductive hypothesis for showing the correctness of the dynamic program -- I've rephrased it.
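The windowing idea in the quoted comment can be sketched as follows. This is only an illustration under stated assumptions: `candidateWindow` and `bandedCells` are hypothetical helpers, not names from `FuzzyMatching.lean`. The reasoning is that pattern index `i` needs `i` word characters before it and `pattern.length - 1 - i` after it, so it can only match word indices between `i` and `i + (word.length - pattern.length)`. Counted inclusively, that window holds one more position than the figure in the comment, which may be counting differently.

```lean
/-- Half-open window `[lo, hi)` of word indices that pattern index `patternIdx`
could be matched against. Hypothetical helper, not taken from the PR. -/
def candidateWindow (patternLen wordLen patternIdx : Nat) : Nat × Nat :=
  -- `Nat` subtraction truncates at zero, so a word shorter than the pattern
  -- yields a one-cell window; no match is possible in that case anyway.
  let slack := wordLen - patternLen
  (patternIdx, patternIdx + slack + 1)

/-- Counts the table cells a banded fill would visit (illustrative only). -/
def bandedCells (patternLen wordLen : Nat) : Nat := Id.run do
  let mut cells := 0
  for patternIdx in [:patternLen] do
    let (lo, hi) := candidateWindow patternLen wordLen patternIdx
    for _ in [lo:hi] do
      cells := cells + 1
  return cells

#eval candidateWindow 3 10 0  -- (0, 8)
#eval bandedCells 3 10
```

Compared with filling all `pattern.length * word.length` cells, the band shrinks as the pattern length approaches the word length.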
@rish987 Could you rebase against master for benchmarking? Locally it looks like there is some overhead of up to 25%, unfortunately.
!bench
Here are the benchmark results for commit 2bbdd62.
  Benchmark          Metric         Change
  ===================================================
- workspaceSymbols   branches       16% (2831.8 σ)
- workspaceSymbols   instructions   18% (4354.7 σ)
- workspaceSymbols   task-clock     21% (72.0 σ)
- workspaceSymbols   wall-clock     21% (71.7 σ)
On the other hand, this is still fast enough that we can test-drive it and then think about further behavior/performance refinements. The improvements should be worth it.
At first glance I'm a bit puzzled that it's slower, since I actually made the
Closes #1546. There were no bugs with the current implementation as far as I could tell, but I noticed three areas that could be improved here:
Lean.AVeryLongNamespace.AnotherVeryLongNamespace.SMap
every miss would incur an additional penalty that would overwhelmingly negate the merit of a perfect match with the main identifier and give it no chance to show up in the top 100 results. So, rather than accumulating a penalty for every miss, we apply a penalty once at the start of every consecutive run of matches to indicate how "awkward" a place this is to start a match. This awkwardness factor increases within a namespace but is reset at the beginning of every new namespace. However, we still add a constant penalty for every namespace so that less deeply nested names are preferred.

LMK what you think of the above heuristics. After some (very brief) testing it looks like it's working great, but if someone could test-drive this and give me feedback it would be much appreciated!
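For readers who want a concrete picture of the run-start penalty described above, here is a minimal sketch. It is one possible reading, not the code in this PR. The names `namespacePenalty` and `runStartPenalty`, the constant `1`, and the choice to measure awkwardness as the distance from the last `.` are all assumptions made for illustration.

```lean
/-- Hypothetical constant charged once per enclosing namespace. -/
def namespacePenalty : Nat := 1

/-- Penalty for starting a consecutive run of matches at character index
`start` of `word`. Awkwardness grows with each character seen since the last
`.` and resets at every `.`; each `.` passed also adds a constant.
An assumed scoring scheme, not the PR's actual one. -/
def runStartPenalty (word : String) (start : Nat) : Nat := Id.run do
  let mut awkwardness := 0
  let mut namespaces := 0
  for c in word.toList.take start do
    if c == '.' then
      awkwardness := 0              -- a new namespace component begins
      namespaces := namespaces + 1
    else
      awkwardness := awkwardness + 1
  return awkwardness + namespaces * namespacePenalty

-- Starting right at `SMap` (index 49) versus in the middle of a namespace (index 30).
#eval runStartPenalty "Lean.AVeryLongNamespace.AnotherVeryLongNamespace.SMap" 49
#eval runStartPenalty "Lean.AVeryLongNamespace.AnotherVeryLongNamespace.SMap" 30
```

Under this scheme a run that starts immediately after a `.` carries no awkwardness and pays only the per-namespace constant, so a perfect match on the final identifier is no longer swamped by the length of the namespaces before it.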