Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fuzzy string matching to quick open search #98278

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

a-johnston
Copy link

@a-johnston a-johnston commented Oct 17, 2024

This is a continuation of #82200 to implement godotengine/godot-proposals#7771. Rebasing the first PR against #56772 was pretty awkward so I ended up flattening the earlier changes with @samsface as a co-author. A bit more info and pre-flattened commits are available in this pr against sam's fork samsface#1.

Functionally this adds optional fuzzy searching and search highlighting in the results:

Screenshot 2024-10-24 at 11 01 48 AM Screenshot 2024-10-24 at 11 01 38 AM Screenshot 2024-10-17 at 1 57 04 AM

The highlighting on the grid items feels a bit hacky due to the centered text; I'd appreciate any tips to do it more cleanly. This is also my first time working on a nontrivial c++ codebase so I'd appreciate any general tips to be more idiomatic or if I'm using anything incorrectly. Also a few semi-related changes:

  • Fixes a bug where, if adaptive isn't used, on first launch neither result container will be visible. Now it uses project metadata to track the last mode, defaulting to list.
  • Grid items are slightly wider, showing 6 per row with the default window size, in order to show more of the filename.

@a-johnston a-johnston requested review from a team as code owners October 17, 2024 20:20
@KoBeWi KoBeWi removed request for a team October 17, 2024 20:27
@KoBeWi KoBeWi added this to the 4.4 milestone Oct 17, 2024
@KoBeWi KoBeWi requested a review from a team October 17, 2024 20:28
.pre-commit-config.yaml Outdated Show resolved Hide resolved
@samsface
Copy link
Contributor

samsface commented Oct 18, 2024

@a-johnston How does this new algorithm work? There seems to be a lot more to it than the first two approaches. But it works pretty well. Opened a small PR to fix some C++ gotchas but whole thing looks good to me (though didn't look at any of the Dialog code).

Also, is there a way to not match "res://" ?
image

Is it correct that we rank HAT.wav first here?
image

@a-johnston
Copy link
Author

@samsface thanks for testing it out! The matching process is similar to the earlier ones, although there are definitely more helpers etc cluttering it up haha. It essentially is doing:

  • Match each token starting from 0
  • If a match is found, test to see if that match overlaps any previously matched sections and reject it if so
  • Otherwise, score the new match and potentially update the top scoring match
  • In either case of a match, move the offset from 0 to start + 1 and try again. If no match, quit

Scoring then prioritizes (in rough order of importance):

  • Exact token matches without breaks
  • Longer matched sections
  • Fewer missed query characters
  • Filename match
  • Word boundary match

Is it correct that we rank HAT.wav first here?

Results with equal scores are tie-broken first on length and then on their alphanumeric order, which is why HAT.wav is ordered before hat.wav -- I'll add a slight score deduction if the match relies on being case insensitive to break that in favor of the exact match.

Also, is there a way to not match "res://" ?

I hadn't even noticed that! Yeah I can add a way to the fuzzy search code to skip a given prefix, or we can change that part of the updated quick open popup to not include it to begin with.

Also thanks for mentioning the PR; I didn't get an email or notification about it so I would've missed it otherwise. TIL!

@a-johnston
Copy link
Author

I just updated it to 1) add a slight tie-breaker penalty if it relies on case insensitive matching and 2) ignore res:// by setting a new starting offset value. Ironically, similar to last time, I'll be out of town this weekend so slow updates from me for a bit.

core/templates/sort_array.h Outdated Show resolved Hide resolved
scene/gui/dialogs.cpp Outdated Show resolved Hide resolved
editor/editor_settings.cpp Outdated Show resolved Hide resolved
core/string/fuzzy_search.h Outdated Show resolved Hide resolved
core/string/fuzzy_search.h Outdated Show resolved Hide resolved
core/string/fuzzy_search.h Outdated Show resolved Hide resolved
@KoBeWi
Copy link
Member

KoBeWi commented Oct 18, 2024

I wonder if FuzzySearch can be used for autocompletion 🤔 We used to have something similar, but it was reverted.
Although for that, it can't be coupled to QuickOpenDialog.

core/string/fuzzy_search.h Outdated Show resolved Hide resolved
core/string/fuzzy_search.h Outdated Show resolved Hide resolved
core/string/ustring.h Outdated Show resolved Hide resolved
@btarg
Copy link

btarg commented Oct 24, 2024

Will this apply to the Command Palette too? I often find myself mis-spelling things or using alternative words in tools like VS Code and Rider, and still getting to the same result - but I have to type everything exactly in Godot. For instance, typing "options" for opening settings, and being allowed to spell it wrong

@a-johnston
Copy link
Author

@btarg not as part of this PR (it's already too big) but the fuzzy searching logic was intentionally kept separate so that it could be used elsewhere, just as a part of a different PR.

core/string/fuzzy_search.cpp Outdated Show resolved Hide resolved
core/string/fuzzy_search.cpp Outdated Show resolved Hide resolved
core/string/fuzzy_search.cpp Outdated Show resolved Hide resolved
core/string/fuzzy_search.cpp Outdated Show resolved Hide resolved
core/string/fuzzy_search.h Outdated Show resolved Hide resolved
tests/core/string/test_fuzzy_search.h Outdated Show resolved Hide resolved
tests/core/string/test_fuzzy_search.h Outdated Show resolved Hide resolved
tests/core/string/test_fuzzy_search.h Outdated Show resolved Hide resolved
tests/core/string/test_fuzzy_search.h Outdated Show resolved Hide resolved
tests/core/string/test_fuzzy_search.h Outdated Show resolved Hide resolved
Comment on lines 50 to 55
Ref<Font> font = get_theme_font(SceneStringName(font));
int font_size = get_theme_font_size(SceneStringName(font_size));
Vector2i prefix = font->get_string_size(get_text().substr(0, p_substr.x), HORIZONTAL_ALIGNMENT_LEFT, -1, font_size);
prefix.y = 0;
Vector2i size = font->get_string_size(get_text().substr(p_substr.x, p_substr.y), HORIZONTAL_ALIGNMENT_LEFT, -1, font_size);
return { prefix, size };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with right-to-left or mixed direction text. Also, a substring is not necessarily continuous and can consist of multiple rects.

TextServer.shaped_text_get_selection should be used instead.

Copy link
Member

@bruvzg bruvzg Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a class derived from Label, it probably should use existing text_rid (currently private, but getter can be added).

// Assuming `RID get_text_rid() const { return text_rid; }` was added to `Label`.
// You can use something like this to draw rects (can be done directly from NOTIFICATION_DRAW or cached).

void draw_substr_rect(const RID &p_ci, const Vector2 &p_osf, const Vector2i &p_substr, const Color &p_color) {
	Vector<Vector2> ranges = TS->shaped_text_get_selection(get_text_rid(), p_substr.x, p_substr.y - p_substr.x); // Get horizontal ranges (start, end offsets for each).
	Size2 line_size = TS->shaped_text_get_size(get_text_rid()); // Get string size (for rect height).
	for (int i = 0; i < ranges.size(); i++) {
		Rect2 rect = Rect2(ranges[i].x + p_ofs.x, p_ofs.y, ranges[i].y - ranges[i].x, line_size.y);
		RenderingServer::get_singleton()->canvas_item_add_rect(p_ci, rect, p_color); // Draw rect.
	}
}

Copy link
Member

@bruvzg bruvzg Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this code will only work for a single line Label (which should be fine for this case), for multiple lines the same can be done using lines_rid vector from the Label (just do the same for each).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I realized late last night I had entirely neglected RTL..

return;
}

Vector2i offset = get_character_bounds(0).position;
Copy link
Member

@bruvzg bruvzg Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also won't work, character 0 is not necessarily first on the left. You can copy code for Label draw to get initial offset instead (or extract it as a subfunction in the Label, I think it's already used at least twice).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I'm not totally missing something, you're talking about this code https://github.com/godotengine/godot/blob/master/scene/gui/label.cpp#L475-L534? I'm not super happy with my current approach for extracting the necessary part of it which you can imagine the implementations for:

struct DrawParameters {
	Vector2 offset;
	int first_line = 0;
	int last_line = 0;
	int line_spacing = 0;
};

RID get_line_rid(int p_line) const;
Rect2 get_line_rect(int p_line) const;
Label::DrawParameters get_draw_parameters() const

would it be preferable to just recompute the total height, line range, vsep? or did you have something else entirely in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never liked this approach of drawing boxes on labels. I did it this way originally because I was trying to touch the quick open dialog code as little as possible.

But seeing as big changes are being made to the dialog. Maybe an idea is drop this approach and instead only use richtext labels in the dialog. That way you just have to build a string instead of calculating finicky offsets and such. The performance of rich text is pretty good now too.

Copy link
Author

@a-johnston a-johnston Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously I thought that bbcode wouldn't be suitable but yeah turns out I missed a few options on RichTextLabel. I'm just going to change this to use bbcode and avoid this.

ed: ehh I'm waffling more about it. I personally prefer the appearance of the rects compared to RichTextLabel's bgcolor tag and RichTextLabel doesn't seem to have overrun handling.

Copy link
Author

@a-johnston a-johnston Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK well it's in the latest commit. I don't know if the label refactor is totally appropriate for this pr, although it does seem like an overall useful refactor now that I have a better understanding of the code there. I haven't tested it in any multi-line or rtl applications yet though.

The layout code in label is imo on the subtle side; I might go through and add a few comments but I'm also not entirely sure I didn't introduce any issues in this refactor. At least the editor looked the same.

Copy link
Contributor

@samsface samsface Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking to use just the font color like visual studio code:
image

I get that it mightn't be as cute but I think (though could be wrong) bbcode might be a lot simpler to maintain in the long run. Considering issues like theming, BiDi fonts, sight impaired users that need high contrast.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that's the trade-off. I have a partially complete bbcode version of this branch which I can finish to compare but I'm also curious what @bruvzg thinks of the label changes. The bbcode approach would definitely be fewer loc changed.

Copy link
Contributor

@samsface samsface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff @a-johnston. The algorithm works super well.

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

@a-johnston
Copy link
Author

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

Co-authored-by: A Thousand Ships <96648715+AThousandShips@users.noreply.github.com>
Co-authored-by: Tomek <kobewi4e@gmail.com>
@samsface
Copy link
Contributor

samsface commented Oct 25, 2024

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max sore is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

@a-johnston
Copy link
Author

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

Haha well you piqued my curiosity so I wanted to try it before nuking the benchmark. I settled on a pretty tame early cull criteria of being 1) negative (so, bad and missing query characters) and 2) at least 50 away from the current max score because I noticed that although the top results were of course preserved, the combination of match length being squared and a 100 bonus being given to exact token matches resulted in a query of ie spawner having a huge delta between spawn.tscn and spawner.tscn although to me the former still seems very relevant as a secondary result. From logging the early/late cull counts, this seems to have the most pronounced effect early when typing the query, when many more results are "valid" but negative, but later in the query most targets filter out with invalid matches. In the benchmark with this early cull criteria I saw a ~5% speedup for a synthetic 100k case and ~3% speedup for a 1k case, although in terms of the delta and not ratio, the 100k case had an average delta of ~3ms and the 1k case never had a delta greater than a few percent of a ms. That said, the benchmark is pretty biased due to the way I pruned down your tree to 1k lines.

Although the changes are pretty small I won't include them here but yeah for a future pr that addresses this

  • Keep in mind the nonlinear score bonuses and secondary results, not just the top result matching the benchmark's expected result
  • I've been changing the list item's name to name->set_text(p_candidate.file_path.get_file() + vformat(" (%d)", p_candidate.result->score)); to get a better intuition for the relative scores of displayed results

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants