onhold - [file-search] update file-search to prioritize exact and better fuzzy matches #5638
@kittaakos please let me know if it addresses your previous issues and is correct implementation-wise.
I see your point. On the other hand, it is nice that we support cancellation and search result limiting, but it produces incorrect results.
Make sure that this change does not harm performance in some cases. Someone has to research why we had such a limit in the first place. Maybe search the VS Code bugs and implementation as well; they should have the same issues.
I would favor correct results over performance. Otherwise, the file search is not usable.
I tend to agree that I'd prefer having correct (or better) results over a minor performance degradation.
I'll keep this comment to store some vscode issues:
Based on vscode's implementation, it does not look like they have the
+1 for merging it. If we hit a performance issue we can introduce a preference for the limit.
@kittaakos ok sounds good, do you want to test out the PR?
LGTM
For a larger project (tested with CDT), it is a bit slower, but negligibly so. The result information is more important here.
They have a limit: https://github.com/microsoft/vscode/blob/49c45742b979ee742af2fc778e99ac3af074bff9/src/vs/workbench/contrib/search/browser/openAnythingHandler.ts#L106 Not more than 512. Could you look into how they produce proper results without compromising on performance?
Thanks! Sure, I'll take a look :)
6f431ce to de813fb
@akosyakov @kittaakos @lmcbout I updated the code: the limit is kept (it is no longer removed), and the result list is still better.
Great, I am trying it now.
Is there a chance to run the search (
I thought of that also, let me try :)
👍 Nice, it is definitely better than it was.
Is there a chance to do exactly what VS Code does? They're using the same
de813fb to 535346b
Great!
It looks like it's already being tracked in #4548. I can see if I can find a way to address it as well.
@kittaakos is it necessary to fix #4548 in the PR?
@vince-fugnitto testing with latest commit 535346b,
The highlighting for the fuzzy matches at the end?
535346b to 73262de
Please tackle highlighting separately. It is quite involved. The issue is that we rely on Monaco for matching on the frontend, and it works differently from how we match on the backend. We should basically do the matching ourselves then. We need a new API on file search that provides information about matched indexes. The old API should not be changed, since it is used by other clients which are not interested in highlighting.
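To make the "matched indexes" idea concrete, here is a minimal sketch of what such a result could look like. Everything in it is an assumption for illustration: the `FileMatch` shape, the `matchWithIndexes` name, and the plain case-insensitive subsequence matching are hypothetical and are not the existing file-search API or Monaco's matcher.

```typescript
// Hypothetical result shape: the path plus the character positions that matched,
// so a frontend could highlight them without re-running its own matcher.
interface FileMatch {
    path: string;
    matchedIndexes: number[];
}

// Hypothetical matcher: greedy, case-insensitive subsequence match.
// Returns undefined when the pattern is not a subsequence of the path.
// Note: index positions assume lowercasing does not change string length (true for ASCII).
function matchWithIndexes(path: string, pattern: string): FileMatch | undefined {
    const s = path.toLowerCase();
    const p = pattern.toLowerCase();
    const matchedIndexes: number[] = [];
    let i = 0;
    for (let j = 0; j < s.length && i < p.length; j++) {
        if (s[j] === p[i]) {
            matchedIndexes.push(j);
            i++;
        }
    }
    return i === p.length ? { path, matchedIndexes } : undefined;
}
```

A separate result type like this would leave the old API untouched for clients that only need paths.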
    }
}, token);
// Perform searches for `exact` and `fuzzy` matches in parallel.
await Promise.all([
Why do we need to run the same command twice in parallel?
The initial idea was that I wanted to run the `exact` matches first to try to fill up the limit.
Exact matches will give us the best possible results from a user standpoint. If there is remaining space (the limit was not reached), then the remaining space is dedicated to `fuzzy` matches.
The initial problem was that we checked at the same time whether a result is an `exact` match and, if not, whether it is a `fuzzy` match. This led to the limit being hit with many more fuzzy matches, and the much better `exact` matches found later on were never seen.
After @kittaakos #5638 (comment), I tried to perform these searches in parallel to squeeze out any performance I could.
Please let me know if something can be optimized or if anything needs to be addressed better.
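The difference between the two strategies described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `isExactMatch` (case-insensitive substring) and `isFuzzyMatch` (case-insensitive subsequence) are hypothetical stand-ins for whatever matching the file search actually performs.

```typescript
// Hypothetical matcher: the pattern appears as a contiguous substring.
function isExactMatch(path: string, pattern: string): boolean {
    return path.toLowerCase().indexOf(pattern.toLowerCase()) !== -1;
}

// Hypothetical matcher: the pattern characters appear in order, possibly with gaps.
function isFuzzyMatch(path: string, pattern: string): boolean {
    const s = path.toLowerCase();
    const p = pattern.toLowerCase();
    let i = 0;
    for (let j = 0; j < s.length && i < p.length; j++) {
        if (s[j] === p[i]) {
            i++;
        }
    }
    return i === p.length;
}

// Single pass: exact and fuzzy matches compete for the same slots,
// so early fuzzy matches can use up the limit before an exact match is seen.
function collectSinglePass(paths: string[], pattern: string, limit: number): string[] {
    const results: string[] = [];
    for (const path of paths) {
        if (results.length >= limit) {
            break;
        }
        if (isExactMatch(path, pattern) || isFuzzyMatch(path, pattern)) {
            results.push(path);
        }
    }
    return results;
}

// Prioritized: exact matches fill the list first, fuzzy matches only take what is left.
function collectPrioritized(paths: string[], pattern: string, limit: number): string[] {
    const exact = paths.filter(p => isExactMatch(p, pattern));
    const fuzzy = paths.filter(p => !isExactMatch(p, pattern) && isFuzzyMatch(p, pattern));
    return exact.concat(fuzzy).slice(0, limit);
}
```

With the paths `src/f-a-b.ts`, `lib/fxaxb.ts`, `src/fab.ts`, the pattern `fab`, and a limit of 2, the single pass returns only the two fuzzy matches, while the prioritized version puts `src/fab.ts` first.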
> The initial idea was

It's all clear.
But since the two `doFind` runs execute asynchronously, how can you ensure that the exact matches fill the array first, and that the fuzzy matches come only after them? Perhaps my ripgrep knowledge is limited on this :/
> But since the two doFind runs execute asynchronously, how can you ensure that the exact matches will fill the array first?

I can update the code to go back to using two sets (`exactMatches` and `fuzzyMatches`) to ensure each search goes to its respective set.

> Perhaps my ripgrep knowledge is limited on this...

Mine is as well :(
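One way the two-set idea could address the ordering question is sketched below. It is only an illustration: `fakeSearch` is a hypothetical stand-in for a search run (the role `doFind` plays in the PR), the delays and paths are made up, and `mergeMatches` is an assumed helper name. The point is that the final order is decided in the merge step after both runs finish, not by which run completes first.

```typescript
// Hypothetical stand-in for one search run. The delay simulates the
// two runs finishing in an arbitrary order.
function fakeSearch(results: string[], delayMs: number, into: Set<string>): Promise<void> {
    return new Promise<void>(resolve => {
        setTimeout(() => {
            results.forEach(r => into.add(r));
            resolve();
        }, delayMs);
    });
}

// Merge after both runs are done: exact matches first, then fuzzy matches
// that are not already present, until the limit is reached.
function mergeMatches(exactMatches: Set<string>, fuzzyMatches: Set<string>, limit: number): string[] {
    const merged: string[] = [];
    exactMatches.forEach(m => {
        if (merged.length < limit) {
            merged.push(m);
        }
    });
    fuzzyMatches.forEach(m => {
        if (merged.length < limit && !exactMatches.has(m)) {
            merged.push(m);
        }
    });
    return merged;
}

async function findInParallel(limit: number): Promise<string[]> {
    const exactMatches = new Set<string>();
    const fuzzyMatches = new Set<string>();
    await Promise.all([
        // The exact run is deliberately the slower one here.
        fakeSearch(['src/fab.ts'], 20, exactMatches),
        fakeSearch(['src/f-a-b.ts', 'src/fab.ts'], 0, fuzzyMatches)
    ]);
    return mergeMatches(exactMatches, fuzzyMatches, limit);
}
```

Even though the fuzzy run finishes first in this sketch, `findInParallel(2)` resolves with the exact match ahead of the fuzzy one, because each run only writes to its own set.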
@kittaakos @akosyakov Do either of you have any ideas?
Fixes #5636

Fixes an issue where `exact` file results were not being displayed since `fuzzy` matches were added instead. Due to the limit present when searching for files, `exact` matches should be prioritized more while `fuzzy` matches should be used to fill up the result list if necessary. Adjusting the code means that better results are returned, and for an end-user, they get more consistent results in respect to their workspace.

Signed-off-by: Vincent Fugnitto <vincent.fugnitto@ericsson.com>
73262de to 0fbdb6b
@vince-fugnitto ok, let's go with parallel if you think it is better. Someone needs to study how VS Code does search and then think about what can be applied here.
+1 merging the proper fix only. No need to
Should I close the PR in favor of a better solution?
It is up to you. If you leave it open, please put an
The PR is onhold until a better solution is proposed.
I've received several offline complaints about it from users. It seems to be quite an annoying bug.
Do you think we should try a simple fix (like perhaps the PR), and potentially improve it further in the future?
Fixes #5636

by the file-search due to the limit. The limit meant that if we ever reached the quota of results allowed, some better results were never displayed. Instead, the logic was changed so that fuzzy matches are sorted by their score (how well they match the `searchPattern`), then are sent to the front end limited by the option. This means that all possible exact matches are sent, and the remaining best fuzzy matches along with them until the limit is reached.

Signed-off-by: Vincent Fugnitto <vincent.fugnitto@ericsson.com>
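The score-then-limit logic in this commit message can be sketched roughly as below. The scoring function is an assumption made up for illustration (one point per matched character, one bonus point when it directly follows the previous match); the commit does not say how the real score is computed. The sketch also assumes that "all possible exact matches are sent" means exact matches are never cut by the limit, and that fuzzy matches only fill the remaining slots.

```typescript
// Hypothetical score: -1 when the pattern is not a subsequence of the candidate,
// otherwise higher values mean the matched characters sit closer together.
function fuzzyScore(candidate: string, pattern: string): number {
    const s = candidate.toLowerCase();
    const p = pattern.toLowerCase();
    let score = 0;
    let last = -2; // index of the previous matched character
    let i = 0;
    for (let j = 0; j < s.length && i < p.length; j++) {
        if (s[j] === p[i]) {
            score += (j === last + 1) ? 2 : 1;
            last = j;
            i++;
        }
    }
    return i === p.length ? score : -1;
}

// All exact matches are kept; the best-scoring fuzzy matches fill what is left of the limit.
function rankResults(exact: string[], fuzzy: string[], pattern: string, limit: number): string[] {
    const ranked = fuzzy
        .map(path => ({ path, score: fuzzyScore(path, pattern) }))
        .filter(entry => entry.score >= 0)
        .sort((a, b) => b.score - a.score)
        .map(entry => entry.path);
    const remaining = Math.max(0, limit - exact.length);
    return exact.concat(ranked.slice(0, remaining));
}
```

For example, with the pattern `fab`, the candidate `fa-b.ts` outranks `f-a-b.ts` under this scoring, so it is the one that survives when only one fuzzy slot is left.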