Improve approximate/fuzzy string matching in quick open dialog search #82200
Would it be possible to expose the fuzzy search in the API so it could be used from GDScript?
Yep, totally doable.
I wanted to do exactly this, but tree view items don't allow individual character background or font colors. I explored making the tree items all rich text views, but abandoned the idea: we lose too many of the nice behaviors the default tree view items give us. I also explored adding ASCII control characters to the text views, the way console terminals print colors or italics by emitting special characters that are hidden from the user but tell the terminal to, say, print all text in red from that point on. But this kind of breaks the contribution guidelines, as I'd be altering core components to achieve something far away (in terms of distance on the architecture diagram).
Update on the last sentence: @djrain figured out a way to properly highlight the matches, and I'm going with this solution.
e6a8aae to f664725
I wanted to add this functionality as well and found this PR. I've rebased it onto the latest master and made some changes here https://github.com/a-johnston/godot/tree/fuzzy-search if you'd care to incorporate or consider them. Or, if it's easier, I could commandeer this one or make a new PR. High-level changes from this branch are
I tested the changes in a project containing 1400+ files and, at least for my project and queries, I seemed to get high quality results overall, although I'm sure the various magic numbers could be tweaked further. A missing A typo where the Multiple terms can still be included (although unlike the example in this PR, they match in order):
I tested the same 1400+ file project on a fairly underpowered Linux laptop (i5-7200U) and still encountered no issues with allowing more results into the sorting step. I also ran into a case where a suboptimal match was being selected and scored (it still showed up in the results but not as high as I'd like), so I may try to extend the matcher to return the optimal match rather than the greedy one and then re-verify performance.
@a-johnston There's also another ongoing PR to reinvent the quick search dialog: #56772. I've been meaning to reach out to its creator and ask whether we could merge our changes. I just forked and tested your algorithm. In my opinion it's noticeably slower, but it does seem to give better matches some of the time. Let me set up a suite of test cases to benchmark accuracy, speed and misspelling correction, so we can be scientific about picking the optimum and also help anyone who wants to modify the algorithm or try a new one in the future. Also, there seems to be a weird bug where the quick search shows the result, drops it and then shows it again. Did you notice that? It's even in my original branch after rebasing on master.
Ah, I hadn't seen that PR. I especially like the idea of adding new behavior controls and new editor settings; it seems worthwhile to also add an option for fuzzy vs exact matching. I also haven't noticed the quick search bug you mention; does it happen every time you change the query or just when the dialog initially opens? As far as performance goes, I'm not surprised if it feels slower, considering it is sorting all results above the cutoff rather than the first N. I wanted to output the time to filter and graph the score distribution for some of the queries, in order to be more guided about setting scoring and filtering criteria, but for some (probably silly) reason I couldn't get any new debug output to show up. I wouldn't be surprised if changes to the scoring and threshold could substantially speed it up without much degradation in quality. There's also the option to heapify and pop at most max-results times, to avoid sorting the presumably long tail of worse results (especially for short queries). I started to do that earlier, but I wasn't sure of the best way to do so, and it already seemed fast enough on my systems/projects. Definitely room for improvement.
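The "heapify and pop" idea above amounts to a partial sort: only the best N results need to end up in order. A minimal sketch of that idea using the C++ standard library's std::partial_sort follows. The ScoredResult struct and top_results function are hypothetical names made up for illustration; this is not the Godot SortArray code from either branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result record for illustration; not a Godot type.
struct ScoredResult {
    std::string path;
    double score;
};

// Returns the `max_results` highest-scoring entries, best first.
// Only the first `max_results` positions get fully ordered; the long
// tail of worse results is never sorted and is simply dropped.
std::vector<ScoredResult> top_results(std::vector<ScoredResult> results, std::size_t max_results) {
    const std::size_t n = std::min(max_results, results.size());
    std::partial_sort(
            results.begin(),
            results.begin() + static_cast<std::ptrdiff_t>(n),
            results.end(),
            [](const ScoredResult &a, const ScoredResult &b) { return a.score > b.score; });
    results.resize(n);
    return results;
}
```

With many candidates and a small result cap, this does noticeably less work than sorting everything above the cutoff, which is the case described for short queries.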
I'll post back here when I have some data.
Turns out SortArray already had what I was hoping to do so I've updated my branch to use
Cool, will make sure to grab that change in my tests.
b8e5ea4 to 7edf8de
Here are the results on a 10k file project I have. Its dir tree is part of the unit test data now. Godot was compiled with Overall the new fzf algorithm is actually a bit faster, especially with the short query optimization.
But, in my opinion, it's not giving the results a user would expect when searching with multiple query tokens. For example, 4 - 9 in the comparison table above.
Thanks for putting that benchmark together! I'm not surprised those queries do poorly since it expects the tokens to be in order, so I would expect it to improve for edit: I just noticed that a few string unit tests I added and the partial_sort use didn't make it in when you cherry-picked. If it's easier, I can open a PR against your branch to collaborate.
auto dataset_path = line[1];
auto expected_result = line[2];

bench(query, dataset_path, expected_result, "fzf");
I know the test runner accepts optional arguments to only run certain files; would it be possible to keep the benchmark as a standalone test that is disabled by default, and then a separate test which verifies that an obvious query match is the top result?
@a-johnston Yeah, I should have mentioned that, sorry. I didn't merge those because I noticed the issue with out-of-order queries and wanted to address that first. Yes, please open a PR against my branch so we can keep the tests and stuff. I'm not saying this is the best way, but I had the same issue with the algorithm I used and addressed it by scoring the parts of the path individually and then summing the scores, heavily weighting the last part of the path. This was inspired by Visual Studio Code's approach.
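The per-part scoring described above can be sketched roughly as follows. Everything here is an assumption made for illustration: split_path, score_segment, score_path and the weight of 4 are hypothetical, and the segment scorer is a deliberately crude, case-sensitive placeholder, not the scoring used in this PR or in Visual Studio Code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a path on '/'; empty segments are skipped.
std::vector<std::string> split_path(const std::string &path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) {
                parts.push_back(current);
            }
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        parts.push_back(current);
    }
    return parts;
}

// Placeholder segment score (an assumption): the length of the longest
// prefix of `query` that occurs in order, as a subsequence, in `segment`.
int score_segment(const std::string &query, const std::string &segment) {
    std::size_t matched = 0;
    for (char c : segment) {
        if (matched < query.size() && c == query[matched]) {
            ++matched;
        }
    }
    return static_cast<int>(matched);
}

// Scores each path part on its own and sums the results, multiplying the
// last part (the file name) by `last_segment_weight` (arbitrary default).
int score_path(const std::string &query, const std::string &path, int last_segment_weight = 4) {
    const std::vector<std::string> parts = split_path(path);
    int total = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const int weight = (i + 1 == parts.size()) ? last_segment_weight : 1;
        total += weight * score_segment(query, parts[i]);
    }
    return total;
}
```

Under these assumptions, the query "player" ranks scenes/enemy/player.tscn above scenes/player/enemy.tscn, because the match in the file name counts four times as much as the same match in a directory name.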
Yeah that was my exact idea too. We can leave the benchmark off by default but add a bunch of smaller tests to verify the search is working.
@a-johnston I have an idea to try to tweak your algorithm to better score out-of-sequence tokens. Are you working on anything, or should I try?
Feel free to give it a shot. I do have some stuff in progress and thoughts about what approaches might be best, but I was sort of distracted from this the last few days. I'm leaving for a trip Friday morning, so I'm hoping to have a shareable commit tonight or tomorrow.
@samsface I finally updated my branch, and if I have time tonight I'll rebase it onto yours and open a PR there. I ended up changing almost all of how it works, for better or worse: it now considers multiple subsequences and only matches the best one that does not conflict with prior token matches, so I removed the fzf-inspired back-then-forward search. I also removed the special case for short queries, and that didn't seem to affect much.
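To illustrate the "does not conflict with prior token matches" idea, here is a heavily simplified sketch. It is not the code from the branch: it matches contiguous substrings rather than subsequences, and it treats the right-most free occurrence as "best", which is an arbitrary stand-in for a real score. Span, overlaps and match_tokens are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Half-open character range [begin, end) inside the target string.
struct Span {
    std::size_t begin;
    std::size_t end;
};

bool overlaps(const Span &a, const Span &b) {
    return a.begin < b.end && b.begin < a.end;
}

// For each token, in query order, considers every occurrence in `target`
// and keeps the right-most one that does not overlap a span already
// claimed by an earlier token. Returns false if some token has no free
// occurrence. Tokens may therefore match out of order in the target.
bool match_tokens(const std::vector<std::string> &tokens, const std::string &target, std::vector<Span> &out_spans) {
    out_spans.clear();
    for (const std::string &token : tokens) {
        if (token.empty()) {
            continue;
        }
        bool found = false;
        Span best{0, 0};
        for (std::size_t pos = target.find(token); pos != std::string::npos; pos = target.find(token, pos + 1)) {
            const Span candidate{pos, pos + token.size()};
            bool is_free = true;
            for (const Span &used : out_spans) {
                if (overlaps(candidate, used)) {
                    is_free = false;
                    break;
                }
            }
            if (is_free) {
                best = candidate; // Later (right-most) candidates win.
                found = true;
            }
        }
        if (!found) {
            return false;
        }
        out_spans.push_back(best);
    }
    return true;
}
```

Because the choice is greedy in token order, an earlier token can claim characters that a later token needed; a fuller implementation would have to weigh that trade-off when scoring candidates.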
I really liked the short query optimization. For me, it made the first few keystrokes feel really responsive.
It would be interesting to see what difference it makes in the benchmark. Originally it wasn't added for performance but because short queries were more likely to have low relevance subsequence matches later in the string. Since the current implementation always does one linear scan of the target string, it no longer helps with relevance. It might help allow a full scan to be skipped in favor of a partial scan, but it does have the downside that a target it doesn't match ends up being searched twice. In any case, that part of the most recent commit can be reverted.
PR to achieve what this issue is asking for:
godotengine/godot-proposals#7771
This PR modifies the editor quick open dialog to use a fuzzy search algorithm inspired by the fuzzy file search in Visual Studio Code.
It has a few improvements for the user: