Add a setting to disable fuzzy matching in quick open #99171

akbyrd · 2020-06-02T18:22:37Z

Issue Type: Bug

The quick open panel recently changed and since then searching has become extremely problematic for me. It's way too aggressive with fuzzy searching and it returns results in unexpected orders.

Take this image:

There are a number of issues visible.

Results 4 - 13 should not appear before results 14 - 15. Results 14 - 15 contain an entire word match. It makes no sense to prioritize matches with letters randomly matched piecemeal in the middle of words.
Results 4 - 13 and 16 - 24 should not exist at all. These matches are based on completely random letters in the middle of words. This is not at all the common way to search for files and it adds a huge amount of noise. Of the results in the list, 5 out of 24 results actually make reasonable sense. Matching at the beginning of words for snake case, camel case, and paths makes sense, but not individual letters in the middle of words.

Here's another example:

Results 11 - 12 are clearly the best matches and should be at the top.
None of the other results make sense and should not exist.

Compare these two searches:

Notice that adding a letter, which should only be able to restrict the search further, adds an entirely new result. Adding t to the end of "mul" (making it "mult") adds result 11.
Also notice that most of these results are double matching the same letters. In result 3 the l from "mul" and the l from "lifetime" are matching to the same letter in the result.

VS Code version: Code - Insiders 1.46.0-insider (2c1871d, 2020-06-01T10:34:11.096Z)
OS version: Windows_NT x64 10.0.18363


bpasero · 2020-06-03T06:01:05Z

@akbyrd the thing we did change is to run each set of characters that are separated by spaces on the full path. In other words, you can do multiple searches on the same path separating queries by space. If you simply don't use space, it should work as before.
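As a rough illustration of the behavior described above, the sketch below splits the query on spaces and checks each piece against the full path independently. It is not the actual VS Code scorer: `isSubsequence` and `matchesAllPieces` are hypothetical names, and a plain in-order subsequence test is assumed as a stand-in for the real fuzzy matching.

```typescript
// Hypothetical helper: true if every character of `piece` appears in
// `target` in order (a plain subsequence test), ignoring case.
function isSubsequence(piece: string, target: string): boolean {
  const p = piece.toLowerCase();
  const t = target.toLowerCase();
  let next = 0; // index of the next character of `p` still to be found
  for (let i = 0; i < t.length && next < p.length; i++) {
    if (t.charAt(i) === p.charAt(next)) {
      next++;
    }
  }
  return next === p.length;
}

// Assumed model of the described behavior: every space-separated piece
// must match the full path on its own, independent of the other pieces.
function matchesAllPieces(query: string, path: string): boolean {
  const pieces = query.split(" ").filter((piece) => piece.length > 0);
  return pieces.every((piece) => isSubsequence(piece, path));
}

console.log(matchesAllPieces("m life", "multiplayer_client_idl.files")); // true
console.log(matchesAllPieces("gr gr", "green.cpp")); // true
console.log(matchesAllPieces("cpp green", "src/green.cpp")); // true
```

Because each piece is checked separately, two pieces can land on the same characters (`gr gr` against `green.cpp`), and the order of the pieces does not matter.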

akbyrd · 2020-06-03T06:12:02Z

This seems unnecessarily dismissive.

The "multiple queries" change does explain the very last issue I mention: "results are double matching the same letters". But that doesn't address any of the other problems.

What additional info is required?

bpasero · 2020-06-03T06:21:06Z

@akbyrd no further info, please see #27317 for related issues. I think specifically #25925 applies.

akbyrd · 2020-06-03T16:40:35Z

That issue covers the sorting. I don't see anything in the meta-task that addresses the largest problem here: matching random letters in the middle of words. This is fundamentally what makes my search results so bad. Humans don't write search queries trying to pick random letters out of the middle of words. Just look at the images, the majority of results in every one are complete nonsense.

In the first image there are 24 results. Of those 5 actually make sense.
In the second image there are 15 results. Of those 2 are reasonable.
In the third image: 11 results, 2 are useful based on the query.

This is something I have to deal with every single day, hundreds of times a day. It's an absolute nuisance just to get to the file I want. In a large project it's borderline unusable. Please, please take this seriously. I don't want this to go on a backlog and die quietly.

An option to match words as a unit instead of letters would be sufficient.

bpasero · 2020-06-04T06:00:05Z

I added this issue to the top of #27317, which is meant as a resource to find all the issues reported, even if closed, because there are often very good examples (as in this issue) posted.

> the largest problem here: matching random letters in the middle of words

To clarify, you would prefer if letters are not matched in the middle of words? Can you make a suggestion how it should behave? Do you think there are too many random results?

akbyrd · 2020-06-04T17:46:03Z

Thanks!

> To clarify, you would prefer if letters are not matched in the middle of words?

Correct.

> Do you think there are too many random results?

Absolutely, yes. Frequently my queries result in 80% unhelpful matches. I don't mean that as hyperbole; that's the math on the examples above, which are representative of my common experience.

> Can you make a suggestion how it should behave?

Sure. I'll use the query in the first image: m life. In this query, life should match under only two conditions: consecutive letters (e.g. lifetime_job.cpp) or letters on word boundaries (e.g. land_is_far_enough.cpp). Consecutive letters should rank higher than word boundaries.

It should not match on arbitrary characters in the middle of words (e.g. multiplayer_client_idl.files). I can't speak for all users, but I have a difficult time believing that people intentionally search for individual letters in the middle of words or would reasonably expect multiplayer_client_idl.files to be a result for m life. Even if there are people who legitimately do this, there is no way it's the common case.

The other undesirable behavior is matching the same characters more than once. If I have a file named green grass.cpp and I search for gr gr that file should be at the top of the list. It's a 'perfect' match. If I have a file named green.cpp I don't expect that to be a match at all. Why would I have typed the second gr if I didn't expect it to match something new?

In the current implementation, both files appear to be ranked equally, so not only will green.cpp appear when it shouldn't, it will be the top result if it comes first in the workspace.

> the thing we did change is to run each set of characters that are separated by spaces on the full path. In other words, you can do multiple searches on the same path separating queries by space. If you simply don't use space, it should work as before.

I don't really understand what this means. I guess you run multiple queries then AND the results? I guess that gives us order-independence in the query? If so, I don't even want order independence. I typed my query in an intentional order and that carries information I don't want to be lost. I should know whether the path or filename will be matched first and be able to adjust my query based on whether I'm looking for a file that starts with "multiplayer" or a file in the "multiplayer" folder.

I may not understand the full use case here, but I can confidently say it causes problems and isn't something users should have to 'workaround' for the common case. This needs to be opt-in through different syntax.

To recap, I propose 3 changes that will vastly improve the search experience in my case and, I believe, most cases:

Discrete words in the query should match consecutive letters or letters on word boundaries.
Discrete words in the query should not match individual, non-consecutive letters (unless on word boundaries).
The query should be run as a single logical query, not multiple queries, to avoid matching the same letter multiple times.
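The three proposed rules can be sketched roughly as follows. This is only an illustration under stated assumptions, not a patch against VS Code: the names `wordStarts`, `matchWord` and `matchQuery` are hypothetical, the separator set and the 2/1 score weights are arbitrary, paths are assumed to be ASCII, and consuming the path left to right is just one assumed way to make the query behave as a single logical query. The `multiplayer/` folder in the sample paths is also assumed for illustration.

```typescript
type WordMatch = { kind: "consecutive" | "boundary"; end: number };

const SEPARATORS = "_-./\\ "; // assumed set of word separators

function isSeparator(ch: string): boolean {
  return ch.length === 1 && SEPARATORS.indexOf(ch) !== -1;
}

// Indices where a "word" starts: position 0, the character after a
// separator, or an uppercase letter that follows a lowercase one (camelCase).
function wordStarts(target: string): number[] {
  const starts: number[] = [];
  for (let i = 0; i < target.length; i++) {
    const ch = target.charAt(i);
    if (isSeparator(ch)) continue;
    const prev = i > 0 ? target.charAt(i - 1) : "";
    const afterSeparator = i === 0 || isSeparator(prev);
    const camelHump = prev !== prev.toUpperCase() && ch !== ch.toLowerCase();
    if (afterSeparator || camelHump) starts.push(i);
  }
  return starts;
}

// Match one query word at or after `from`. Consecutive letters are tried
// first; otherwise each letter must sit on a word start, in order.
function matchWord(word: string, target: string, from: number): WordMatch | null {
  const w = word.toLowerCase();
  const t = target.toLowerCase(); // same length as `target` for ASCII input
  const at = t.indexOf(w, from);
  if (at !== -1) return { kind: "consecutive", end: at + w.length };
  let next = 0;
  let end = from;
  for (const start of wordStarts(target)) {
    if (start < from || next === w.length) continue;
    if (t.charAt(start) === w.charAt(next)) {
      next++;
      end = start + 1;
    }
  }
  return next === w.length ? { kind: "boundary", end } : null;
}

// Run the whole query as one pass: each word must match after the previous
// word's match, so no character is used twice. Returns a score (higher is
// better) or null when the path should not appear at all.
function matchQuery(query: string, target: string): number | null {
  let from = 0;
  let score = 0;
  for (const word of query.split(" ")) {
    if (word.length === 0) continue;
    const m = matchWord(word, target, from);
    if (m === null) return null;
    score += m.kind === "consecutive" ? 2 : 1;
    from = m.end;
  }
  return score;
}

console.log(matchQuery("m life", "multiplayer/lifetime_job.cpp")); // 4
console.log(matchQuery("m life", "multiplayer/land_is_far_enough.cpp")); // 3
console.log(matchQuery("m life", "multiplayer_client_idl.files")); // null
console.log(matchQuery("gr gr", "green grass.cpp")); // 4
console.log(matchQuery("gr gr", "green.cpp")); // null
```

The sketch is greedy (it always takes the earliest consecutive match), so a real implementation would likely need backtracking or a proper scoring pass to avoid missing valid matches.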

bpasero · 2020-06-05T05:43:59Z

We had a lot of users asking for the support to type a folder name after the file name separated by space to further tweak the search results with additional queries. So dropping support for doing multiple queries that are separated by space is not an option. And I do not understand why you have to add the space in the first place, you don't have to. If you simply type all the letters in one sequence we will match them on the entire string.

As for showing too many irrelevant results, that is part of the "fuzzy" matching logic. To accommodate typing errors we rather want to show more results than none.

I think to move forward this would probably require a setting to not regress the experience for users that like the fuzzy matching.

akbyrd · 2020-06-05T06:52:14Z

> type a folder name after the file name separated by space to further tweak the search results with additional queries.

Sure, the search should work on the full path, I'm not advocating against that. There's no fundamental reason that needs to result in double matching the same characters.

> And I do not understand why you have to add the space in the first place, you don't have to.

That's the natural, intuitive thing to do. When I want to open my_favorite_file.cpp my instinct is not to type myfavoritefile.

> As for showing too many irrelevant results, that is part of the "fuzzy" matching logic. To accommodate typing errors we rather want to show more results than none.

This makes sense with reasonable constraints. If I miss 1 or 2 letters, or I type the wrong letter, sure give me a few sensible results. But matching every single character in isolation is not reasonable.
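One way to picture "reasonable constraints" is a bounded edit distance: a query word is accepted only if it is within a small number of single-character edits of a whole word in the file name. The sketch below is an assumption made for illustration, not something proposed in this thread or used by VS Code; `withinEdits` and `matchesWithTypos` are hypothetical names, and the comparison is against whole words only (no prefix matching).

```typescript
// True if `a` can be turned into `b` with at most `budget` single-character
// insertions, deletions or substitutions (bounded Levenshtein distance).
// The recursion branches only on a mismatch, so it stays cheap for the
// small budgets (1 or 2) this sketch is meant for.
function withinEdits(a: string, b: string, budget: number): boolean {
  if (budget < 0) return false;
  if (a.length === 0) return b.length <= budget;
  if (b.length === 0) return a.length <= budget;
  if (a.charAt(0) === b.charAt(0)) {
    return withinEdits(a.slice(1), b.slice(1), budget);
  }
  return (
    withinEdits(a.slice(1), b.slice(1), budget - 1) || // wrong letter typed
    withinEdits(a.slice(1), b, budget - 1) || // extra letter typed
    withinEdits(a, b.slice(1), budget - 1) // letter missed
  );
}

// Accept a query word if any word of the file name is within `budget` edits.
// The separator set is an assumption.
function matchesWithTypos(queryWord: string, fileName: string, budget: number): boolean {
  const words = fileName.toLowerCase().split(/[_\-.\/\\ ]+/);
  const q = queryWord.toLowerCase();
  return words.some((word) => word.length > 0 && withinEdits(q, word, budget));
}

console.log(matchesWithTypos("lifetme", "lifetime_job.cpp", 1)); // true
console.log(matchesWithTypos("life", "multiplayer_client_idl.files", 1)); // false
```

With a budget of 1, a single missed letter still finds `lifetime_job.cpp`, while scattered mid-word letters do not produce a match.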

When literally 80% of my results are noise that no reasonable person would expect, that's objectively bad.

> I think to move forward this would probably require a setting to not regress the experience for users that like the fuzzy matching.

In general, fuzzy matching is great. My desire isn't to disable it entirely, just to rein it in to where it yields more useful results. But I'll settle for being able to disable it if that's the only path forward.

roblourens assigned bpasero Jun 2, 2020

bpasero added the info-needed label (Issue requires more information from poster) Jun 3, 2020

bpasero added the *duplicate label (Issue identified as a duplicate of another issue(s)) Jun 3, 2020

bpasero mentioned this issue Jun 3, 2020

Improve ranking of elements in quick open #27317

Open

64 tasks

bpasero changed the title ~~Quick open search regressions~~ Add a setting to disable fuzzy matching in quick open Jun 5, 2020

bpasero added the feature-request (Request for new features or functionality) and quick-pick (Quick-pick widget issues) labels and removed the *duplicate and info-needed labels Jun 5, 2020

bpasero added this to the Backlog Candidates milestone Jun 5, 2020

bpasero removed their assignment Jun 5, 2020

bpasero reopened this Jun 5, 2020

bpasero added a commit that referenced this issue Aug 28, 2020

fuzzy score - add test for #99171

787ba75

github-actions bot locked and limited conversation to collaborators Sep 19, 2020
