Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

onhold - [file-search] update file-search to prioritize exact and better fuzzy matches #5638

Closed
wants to merge 1 commit into from

Conversation

vince-fugnitto
Copy link
Member

Fixes #5636

  • fixes an issue where better file results were not being displayed
    by the file-search due to the limit. The limit meant that if we
    ever reached the quota of results allowed some better results
    were never to be displayed. Instead, the logic was changed so
    that fuzzy matches are sorted by their score (how well they match
    the searchPattern), then are sent to the front end limited by
    the option. This means that all possible exact matches are sent
    and the remaining best fuzzy matches along with them until the limit
    is reached.

Signed-off-by: Vincent Fugnitto vincent.fugnitto@ericsson.com

@vince-fugnitto vince-fugnitto added bug bugs found in the application file search issues related to the file search labels Jul 4, 2019
@vince-fugnitto vince-fugnitto self-assigned this Jul 4, 2019
@vince-fugnitto
Copy link
Member Author

@kittaakos please let me know if it addresses your previous issues and is correct implementation wise.
The only thing I am worried about is having to remove the cancellation logic to opt for better results.

@kittaakos
Copy link
Contributor

The only thing I am worried about is having to remove the cancellation logic to opt for better results.

I see your point. On the other hand, it is nice that we support cancelation and search result limiting, but it produces incorrect results.

@akosyakov
Copy link
Member

akosyakov commented Jul 5, 2019

Make sure that this change does not harm performance in some cases. One has to research to figure out why we had such limit in the first place. Maybe search VS Code bugs and implementations as well they should have the same issues.

@kittaakos
Copy link
Contributor

Make sure that this change does not harm performance in some cases.

I would favor the correct results over performance. Otherwise, the file search is not usable.

@vince-fugnitto
Copy link
Member Author

Make sure that this change does not harm performance in some cases.

I would favor the correct results over performance. Otherwise, the file search is not usable.

I tend to agree that I'd prefer having correct (or better) results over a minor performance degradation.
I tested using theia as a workspace and with .gitignore files included and the performance is not horrible. I took a quick look at vscode and was not able to find them using a limit but I'll keep looking.

@vince-fugnitto
Copy link
Member Author

vince-fugnitto commented Jul 5, 2019

Make sure that this change does not harm performance in some cases. One has to research to figure out why we had such limit in the first place. Maybe search VS Code bugs and implementations as well they should have the same issues.

I'll keep this comment to store some vscode issues:

@vince-fugnitto
Copy link
Member Author

Based on vscode's implementation, it does not look like they have the limit in an attempt to support better search results, what's the best way forward?

@kittaakos
Copy link
Contributor

what's the best way forward

+1 for merging it. If we hit a performance issue we can introduce a preference for the limit.

@vince-fugnitto
Copy link
Member Author

what's the best way forward

+1 for merging it. If we hit a performance issue we can introduce a preference for the limit.

@kittaakos ok sounds good, do you want to test out the PR?

lmcbout
lmcbout previously approved these changes Jul 8, 2019
Copy link
Contributor

@lmcbout lmcbout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
For a larger project (Tested with CDT), it is a bit slower, but negligible. The result information is more important here

@akosyakov
Copy link
Member

Based on vscode's implementation, it does not look like they have the limit in an attempt to support better search results, what's the best way forward?

They have: https://github.com/microsoft/vscode/blob/49c45742b979ee742af2fc778e99ac3af074bff9/src/vs/workbench/contrib/search/browser/openAnythingHandler.ts#L106 Not more than 512.

Could you search how they can produce proper results and don't compromise on the performance?

@vince-fugnitto
Copy link
Member Author

Based on vscode's implementation, it does not look like they have the limit in an attempt to support better search results, what's the best way forward?

They have: https://github.com/microsoft/vscode/blob/49c45742b979ee742af2fc778e99ac3af074bff9/src/vs/workbench/contrib/search/browser/openAnythingHandler.ts#L106 Not more than 512.

Could you search how they can produce proper results and don't compromise on the performance?

Thanks! Sure, I'll take a look :)

@vince-fugnitto
Copy link
Member Author

@akosyakov @kittaakos @lmcbout

I updated the code, the limit is no longer removed to get the better result list.
Instead, I prioritize the collection of exact matches and if necessary fill up the remaining limit size with fuzzy matches. This means that better matches are displayed first, while fuzzy ones are displayed later and we get the results we expect :)

@kittaakos
Copy link
Contributor

I updated the code, the limit is no longer removed to get the better result list.

Great, I am trying it now.

I prioritize the collection of exact matches and if necessary fill up the remaining limit size with fuzzy matches

Is there a chance to run the search (this.doFind) in parallel? Instead of running them sequentially.

@vince-fugnitto
Copy link
Member Author

Is chance to run the search (this.doFind) in parallel? Instead of running them sequentially.

I thought of that also, let me try :)

@kittaakos
Copy link
Contributor

👍 Nice, it is definitely better than it was.
However, I still get different results than in VS Code;

  • the highlighting is incorrect:

Search for object:
Theia:
Screen Shot 2019-07-09 at 14 12 27

VS Code:
Screen Shot 2019-07-09 at 14 12 36


  • different results for fuzzy match:

Search for os.ts:
Theia:
Screen Shot 2019-07-09 at 14 15 58

VS Code:
Screen Shot 2019-07-09 at 14 15 36


Is there a chance to do exactly what VS Code does? They're using the same vscode-ripgrep lib, do not they?

@vince-fugnitto
Copy link
Member Author

+1 Nice, it is definitely better than it was.

Great!

  • the highlighting is incorrect:

It looks like it's already being tracked #4548, I can see if I can find a way to address it as well.

@vince-fugnitto
Copy link
Member Author

@kittaakos is it necessary to fix #4548 in the PR?

@lmcbout
Copy link
Contributor

lmcbout commented Jul 9, 2019

@vince-fugnitto testing with latest commit 535346b,
I see the same thing as @kittaakos when searching for "object"
Do you have an idea why the result is different on VSCode?

@vince-fugnitto
Copy link
Member Author

@vince-fugnitto testing with latest commit 535346b,
I see the same thing as @kittaakos when searching for "object"
Do you have an idea why the result is different on VSCode?

The highlighting for the fuzzy matches at the end?

@akosyakov
Copy link
Member

Please tackle highlighting separately. It is quite involving. The issue is that we rely on Monaco for matching on frontend and it works differently to how we match on the backend. We basically should do matching ourselves then. We need new API on file search that provides information about matched indexes. Old API should not be changes since it is used by other clients which are not interested in highlighting.

}
}, token);
// Perform searches for `exact` and `fuzzy` matches in parallel.
await Promise.all([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to run the same command twice in parallel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akosyakov

The initial idea was that I wanted to run the exact matches first to try to fill up the limit.
Exact matches will give us the best possible results from a user standpoint. If there would be remaining space (the limit was not reached), then the remaining space would be dedicated to fuzzy matches.

The initial problem was that we checked at the same time if a result is an exact match and if not is it a fuzzy match leading to the limit being hit with much more fuzzy matches (and never seeing much better exact matches later on).

After @kittaakos #5638 (comment), I tried to perform these searches in parallel to squeeze out any performance I could.

Please let me know if something can be optimized or if anything needs addressing better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial idea was

It's all clear.

But since the two doFind runs in asynchronously, how can you ensure that the exact matches will fill the array first? Then comes the rest with the fuzzy match? Perhaps my ripgrep knowledge is limited on this :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since the two doFind runs in asynchronously, how can you ensure that the exact matches will fill the array first?

I can update the code to go back to using two sets (exactMatches and fuzzyMatches) to ensure each search goes to their respective set.

Perhaps my ripgrep knowledge is limited on this...

Mine is as well :(

@vince-fugnitto
Copy link
Member Author

@kittaakos @akosyakov
I'm not sure I can address the problem but also keep the doFind in parallel.
If I perform the exact searches initially then fill up the limit with fuzzy matches, the results collected are much better for end-users.

Do either of you have any ideas?

Fixes #5636

Fixes an issue where `exact` file results were not being
displayed since `fuzzy` matches were added instead. Due
to the limit present when searching for files, `exact` matches
should be prioritized more while `fuzzy` matches should be
used to fill up the result list if necessary. Adjusting the
code means that better results are returned, and for an
end-user, they get more consistent results in respect to their
workspace.

Signed-off-by: Vincent Fugnitto <vincent.fugnitto@ericsson.com>
@akosyakov
Copy link
Member

@vince-fugnitto ok, let's go with parallel if you think it is better. Someone needs to study how VS Code does search and then think what can be applied here.

@kittaakos
Copy link
Contributor

kittaakos commented Jul 11, 2019

Someone needs to study how VS Code does search and then think what can be applied here.

+1 merging the proper fix only. No need to fix merge half-baked solutions.

@vince-fugnitto
Copy link
Member Author

+1 merging the proper fix only. No need to fix merge half-baked solutions.

Should I close the PR in favor of a better solution?

@kittaakos
Copy link
Contributor

Should I close the PR in favor of a better solution?

It is up to you. If you leave it open, please put an [on-hold] tag (or something similar) to the title. Thanks!

@vince-fugnitto vince-fugnitto changed the title [file-search] update file-search to prioritize exact and better fuzzy matches onhold - [file-search] update file-search to prioritize exact and better fuzzy matches Jul 11, 2019
@vince-fugnitto vince-fugnitto dismissed lmcbout’s stale review July 11, 2019 11:21

The PR is onhold until a better solution is proposed.

@akosyakov
Copy link
Member

I've received several offline complains about it from users. It seems to be quite annoying bug.

@vince-fugnitto
Copy link
Member Author

I've received several offline complains about it from users. It seems to be quite annoying bug.

Do you think we should try and simple fix (like perhaps the PR), and potentially improve it further in the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bugs found in the application file search issues related to the file search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[file-search] As a user I want exact matches to show up in file search
4 participants