Add fuzzy string matching to quick open search #98278

a-johnston · 2024-10-17T20:20:07Z

This is a continuation of #82200 to implement godotengine/godot-proposals#7771. Rebasing the first PR against #56772 was pretty awkward so I ended up flattening the earlier changes with @samsface as a co-author. A bit more info and pre-flattened commits are available in this pr against sam's fork samsface#1.

Functionally this adds optional fuzzy searching and search highlighting in the results:

The highlighting on the grid items feels a bit hacky due to the centered text; I'd appreciate any tips to do it more cleanly. This is also my first time working on a nontrivial c++ codebase so I'd appreciate any general tips to be more idiomatic or if I'm using anything incorrectly. Also a few semi-related changes:

Fixes a bug where, if adaptive isn't used, on first launch neither result container will be visible. Now it uses project metadata to track the last mode, defaulting to list.
Grid items are slightly wider, showing 6 per row with the default window size, in order to show more of the filename.

.pre-commit-config.yaml

samsface · 2024-10-18T09:43:46Z

@a-johnston How does this new algorithm work? There seems to be a lot more to it than the first two approaches. But it works pretty well. Opened a small PR to fix some C++ gotchas but whole thing looks good to me (though didn't look at any of the Dialog code).

Also, is there a way to not match "res://" ?

Is it correct that we rank HAT.wav first here?

a-johnston · 2024-10-18T17:20:34Z

@samsface thanks for testing it out! The matching process is similar to the earlier ones, although there are definitely more helpers etc cluttering it up haha. It essentially is doing:

Match each token starting from 0
If a match is found, test to see if that match overlaps any previously matched sections and reject it if so
Otherwise, score the new match and potentially update the top scoring match
In either case of a match, move the offset from 0 to start + 1 and try again. If no match, quit

Scoring then prioritizes (in rough order of importance):

Exact token matches without breaks
Longer matched sections
Fewer missed query characters
Filename match
Word boundary match

Is it correct that we rank HAT.wav first here?

Results with equal scores are tie-broken first on length and then on their alphanumeric order, which is why HAT.wav is ordered before hat.wav -- I'll add a slight score deduction if the match relies on being case insensitive to break that in favor of the exact match.

Also, is there a way to not match "res://" ?

I hadn't even noticed that! Yeah I can add a way to the fuzzy search code to skip a given prefix, or we can change that part of the updated quick open popup to not include it to begin with.

Also thanks for mentioning the PR; I didn't get an email or notification about it so I would've missed it otherwise. TIL!

a-johnston · 2024-10-18T18:31:10Z

I just updated it to 1) add a slight tie-breaker penalty if it relies on case insensitive matching and 2) ignore res:// by setting a new starting offset value. Ironically, similar to last time, I'll be out of town this weekend so slow updates from me for a bit.

core/templates/sort_array.h

doc/classes/EditorSettings.xml

scene/gui/dialogs.cpp

editor/editor_settings.cpp

core/string/fuzzy_search.h

KoBeWi · 2024-10-18T20:52:01Z

I wonder if FuzzySearch can be used for autocompletion 🤔 We used to have something similar, but it was reverted.
Although for that, it can't be coupled to QuickOpenDialog.

core/string/fuzzy_search.h

core/string/fuzzy_search.cpp

core/string/ustring.h

core/string/fuzzy_search.cpp

btarg · 2024-10-24T17:19:07Z

Will this apply to the Command Palette too? I often find myself mis-spelling things or using alternative words in tools like VS Code and Rider, and still getting to the same result - but I have to type everything exactly in Godot. For instance, typing "options" for opening settings, and being allowed to spell it wrong

a-johnston · 2024-10-24T17:27:53Z

@btarg not as part of this PR (it's already too big) but the fuzzy searching logic was intentionally kept separate so that it could be used elsewhere, just as a part of a different PR.

core/string/fuzzy_search.cpp

editor/gui/editor_quick_open_dialog.cpp

editor/gui/editor_quick_open_dialog.h

core/string/fuzzy_search.cpp

core/string/fuzzy_search.h

tests/core/string/test_fuzzy_search.h

bruvzg · 2024-10-25T13:39:58Z

editor/gui/editor_quick_open_dialog.cpp

+	Ref<Font> font = get_theme_font(SceneStringName(font));
+	int font_size = get_theme_font_size(SceneStringName(font_size));
+	Vector2i prefix = font->get_string_size(get_text().substr(0, p_substr.x), HORIZONTAL_ALIGNMENT_LEFT, -1, font_size);
+	prefix.y = 0;
+	Vector2i size = font->get_string_size(get_text().substr(p_substr.x, p_substr.y), HORIZONTAL_ALIGNMENT_LEFT, -1, font_size);
+	return { prefix, size };


This won't work with right-to-left or mixed direction text. Also, a substring is not necessarily continuous and can consist of multiple rects.

TextServer.shaped_text_get_selection should be used instead.

Since it's a class derived from Label, it probably should use existing text_rid (currently private, but getter can be added).

// Assuming `RID get_text_rid() const { return text_rid; }` was added to `Label`. // You can use something like this to draw rects (can be done directly from NOTIFICATION_DRAW or cached). void draw_substr_rect(const RID &p_ci, const Vector2 &p_osf, const Vector2i &p_substr, const Color &p_color) { Vector<Vector2> ranges = TS->shaped_text_get_selection(get_text_rid(), p_substr.x, p_substr.y - p_substr.x); // Get horizontal ranges (start, end offsets for each). Size2 line_size = TS->shaped_text_get_size(get_text_rid()); // Get string size (for rect height). for (int i = 0; i < ranges.size(); i++) { Rect2 rect = Rect2(ranges[i].x + p_ofs.x, p_ofs.y, ranges[i].y - ranges[i].x, line_size.y); RenderingServer::get_singleton()->canvas_item_add_rect(p_ci, rect, p_color); // Draw rect. } }

Note: this code will only work for a single line Label (which should be fine for this case), for multiple lines the same can be done using lines_rid vector from the Label (just do the same for each).

Thanks! I realized late last night I had entirely neglected RTL..

bruvzg · 2024-10-25T13:57:36Z

editor/gui/editor_quick_open_dialog.cpp

+			return;
+		}
+
+		Vector2i offset = get_character_bounds(0).position;


This also won't work, character 0 is not necessarily first on the left. You can copy code for Label draw to get initial offset instead (or extract it as a subfunction in the Label, I think it's already used at least twice).

Just to make sure I'm not totally missing something, you're talking about this code https://github.com/godotengine/godot/blob/master/scene/gui/label.cpp#L475-L534? I'm not super happy with my current approach for extracting the necessary part of it which you can imagine the implementations for:

struct DrawParameters { Vector2 offset; int first_line = 0; int last_line = 0; int line_spacing = 0; }; RID get_line_rid(int p_line) const; Rect2 get_line_rect(int p_line) const; Label::DrawParameters get_draw_parameters() const

would it be preferable to just recompute the total height, line range, vsep? or did you have something else entirely in mind?

I never liked this approach of drawing boxes on labels. I did it this way originally because I was trying to touch the quick open dialog code as little as possible.

But seeing as big changes are being made to the dialog. Maybe an idea is drop this approach and instead only use richtext labels in the dialog. That way you just have to build a string instead of calculating finicky offsets and such. The performance of rich text is pretty good now too.

Previously I thought that bbcode wouldn't be suitable but yeah turns out I missed a few options on RichTextLabel. I'm just going to change this to use bbcode and avoid this.

ed: ehh I'm waffling more about it. I personally prefer the appearance of the rects compared to RichTextLabel's bgcolor tag and RichTextLabel doesn't seem to have overrun handling.

OK well it's in the latest commit. I don't know if the label refactor is totally appropriate for this pr, although it does seem like an overall useful refactor now that I have a better understanding of the code there. I haven't tested it in any multi-line or rtl applications yet though.

The layout code in label is imo on the subtle side; I might go through and add a few comments but I'm also not entirely sure I didn't introduce any issues in this refactor. At least the editor looked the same.

I was more thinking to use just the font color like visual studio code:

I get that it mightn't be as cute but I think (though could be wrong) bbcode might be a lot simpler to maintain in the long run. Considering issues like theming, BiDi fonts, sight impaired users that need high contrast.

Yeah, I agree that's the trade-off. I have a partially complete bbcode version of this branch which I can finish to compare but I'm also curious what @bruvzg thinks of the label changes. The bbcode approach would definitely be fewer loc changed.

samsface

Awesome stuff @a-johnston. The algorithm works super well.

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

a-johnston · 2024-10-25T16:45:22Z

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

Co-authored-by: A Thousand Ships <96648715+AThousandShips@users.noreply.github.com> Co-authored-by: Tomek <kobewi4e@gmail.com>

samsface · 2024-10-25T18:58:27Z

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

a-johnston · 2024-10-25T19:35:55Z

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

Haha well you piqued my curiosity so I wanted to try it before nuking the benchmark. I settled on a pretty tame early cull criteria of being 1) negative (so, bad and missing query characters) and 2) at least 50 away from the current max score because I noticed that although the top results were of course preserved, the combination of match length being squared and a 100 bonus being given to exact token matches resulted in a query of ie spawner having a huge delta between spawn.tscn and spawner.tscn although to me the former still seems very relevant as a secondary result. From logging the early/late cull counts, this seems to have the most pronounced effect early when typing the query, when many more results are "valid" but negative, but later in the query most targets filter out with invalid matches. In the benchmark with this early cull criteria I saw a ~5% speedup for a synthetic 100k case and ~3% speedup for a 1k case, although in terms of the delta and not ratio, the 100k case had an average delta of ~3ms and the 1k case never had a delta greater than a few percent of a ms. That said, the benchmark is pretty biased due to the way I pruned down your tree to 1k lines.

Although the changes are pretty small I won't include them here but yeah for a future pr that addresses this

Keep in mind the nonlinear score bonuses and secondary results, not just the top result matching the benchmark's expected result
I've been changing the list item's name to name->set_text(p_candidate.file_path.get_file() + vformat(" (%d)", p_candidate.result->score)); to get a better intuition for the relative scores of displayed results

…se layout data to subclass

Resolved

a-johnston requested review from a team as code owners October 17, 2024 20:20

KoBeWi added enhancement topic:editor usability labels Oct 17, 2024

KoBeWi removed request for a team October 17, 2024 20:27

KoBeWi added this to the 4.4 milestone Oct 17, 2024

KoBeWi requested a review from a team October 17, 2024 20:28

KoBeWi reviewed Oct 17, 2024

View reviewed changes

.pre-commit-config.yaml Outdated Show resolved Hide resolved