onhold - [file-search] update file-search to prioritize exact and better fuzzy matches #5638

vince-fugnitto · 2019-07-04T12:33:58Z

Fixes an issue where better file results were not displayed by the
file-search because of the result limit. Once the quota of allowed
results was reached, better results found later were never displayed.
The logic was changed so that fuzzy matches are sorted by their score
(how well they match the searchPattern) and are then sent to the front
end, limited by the option. This means that all possible exact matches
are sent, followed by the best remaining fuzzy matches, until the limit
is reached.

Signed-off-by: Vincent Fugnitto <vincent.fugnitto@ericsson.com>
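
For illustration, the sort-then-limit idea in the description could look roughly like the sketch below. It is not the code in this PR: `ScoredMatch`, `scoreMatch` and `collectResults` are hypothetical names, and the two-level score is a toy stand-in for a real fuzzy scorer.

```typescript
// Hypothetical shape of a scored hit; not Theia's actual type.
interface ScoredMatch {
    uri: string;
    score: number; // higher means a closer match to the search pattern
}

// Toy scorer (an assumption for illustration): a substring match scores 2,
// a subsequence ("fuzzy") match scores 1, anything else is not a match.
function scoreMatch(pattern: string, candidate: string): number | undefined {
    const p = pattern.toLowerCase();
    const c = candidate.toLowerCase();
    if (c.indexOf(p) !== -1) {
        return 2;
    }
    let next = 0;
    for (let i = 0; i < c.length && next < p.length; i++) {
        if (c[i] === p[next]) {
            next++;
        }
    }
    return next === p.length ? 1 : undefined;
}

function collectResults(pattern: string, candidates: string[], limit: number): string[] {
    const scored: ScoredMatch[] = [];
    for (const uri of candidates) {
        const score = scoreMatch(pattern, uri);
        if (score !== undefined) {
            scored.push({ uri, score });
        }
    }
    // Sort first and truncate afterwards, so the limit cannot hide a better match.
    scored.sort((a, b) => b.score - a.score);
    return scored.slice(0, limit).map(match => match.uri);
}
```

With a limit of 2, the two exact matches for `object` are kept and a scattered fuzzy match is dropped, regardless of the order in which the candidates were found. The cost is that every candidate has to be scored before anything is returned.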

vince-fugnitto · 2019-07-04T12:34:48Z

@kittaakos please let me know if it addresses your previous issues and is correct implementation-wise.
The only thing I am worried about is having to remove the cancellation logic to opt for better results.

kittaakos · 2019-07-04T14:42:07Z

> The only thing I am worried about is having to remove the cancellation logic to opt for better results.

I see your point. On the other hand, it is nice that we support cancellation and search result limiting, but the search produces incorrect results.

akosyakov · 2019-07-05T07:36:01Z

Make sure that this change does not harm performance in some cases. One has to research why we had such a limit in the first place. Maybe search VS Code bugs and implementations as well; they should have the same issues.

kittaakos · 2019-07-05T07:37:51Z

> Make sure that this change does not harm performance in some cases.

I would favor the correct results over performance. Otherwise, the file search is not usable.

vince-fugnitto · 2019-07-05T12:45:06Z

> > Make sure that this change does not harm performance in some cases.
>
> I would favor the correct results over performance. Otherwise, the file search is not usable.

I tend to agree: I'd prefer correct (or better) results over a minor performance degradation.
I tested using theia as a workspace with .gitignore files included, and the performance is not horrible. I took a quick look at vscode and could not find them using a limit, but I'll keep looking.

vince-fugnitto · 2019-07-05T19:44:54Z

> Make sure that this change does not harm performance in some cases. One has to research why we had such a limit in the first place. Maybe search VS Code bugs and implementations as well; they should have the same issues.

I'll keep this comment to store some vscode issues:

vince-fugnitto · 2019-07-05T19:55:13Z

Based on vscode's implementation, it does not look like they have the limit, in an attempt to support better search results. What's the best way forward?

kittaakos · 2019-07-05T20:19:09Z

> what's the best way forward

+1 for merging it. If we hit a performance issue we can introduce a preference for the limit.

vince-fugnitto · 2019-07-05T20:27:53Z

> > what's the best way forward
>
> +1 for merging it. If we hit a performance issue we can introduce a preference for the limit.

@kittaakos ok sounds good, do you want to test out the PR?

lmcbout

LGTM
For a larger project (tested with CDT), it is a bit slower, but negligibly so. The quality of the results is more important here.

akosyakov · 2019-07-09T07:27:15Z

> Based on vscode's implementation, it does not look like they have the limit, in an attempt to support better search results. What's the best way forward?

They have: https://github.com/microsoft/vscode/blob/49c45742b979ee742af2fc778e99ac3af074bff9/src/vs/workbench/contrib/search/browser/openAnythingHandler.ts#L106 Not more than 512.

Could you look into how they produce proper results without compromising on performance?

vince-fugnitto · 2019-07-09T10:44:32Z

> > Based on vscode's implementation, it does not look like they have the limit, in an attempt to support better search results. What's the best way forward?
>
> They have: https://github.com/microsoft/vscode/blob/49c45742b979ee742af2fc778e99ac3af074bff9/src/vs/workbench/contrib/search/browser/openAnythingHandler.ts#L106 Not more than 512.
>
> Could you look into how they produce proper results without compromising on performance?

Thanks! Sure, I'll take a look :)

vince-fugnitto · 2019-07-09T12:05:33Z

@akosyakov @kittaakos @lmcbout

I updated the code; the limit is no longer removed to get a better result list.
Instead, I prioritize the collection of exact matches and, if necessary, fill up the remaining limit size with fuzzy matches. This means that better matches are displayed first, while fuzzy ones are displayed later, and we get the results we expect :)
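
A minimal sketch of that two-pass approach, assuming an in-memory list of candidate paths instead of the real ripgrep-backed search. `isExactMatch`, `isFuzzyMatch` and `findWithLimit` are hypothetical names used only for illustration.

```typescript
// Exact match here means a case-insensitive substring match (an assumption).
function isExactMatch(pattern: string, candidate: string): boolean {
    return candidate.toLowerCase().indexOf(pattern.toLowerCase()) !== -1;
}

// Fuzzy match here means the pattern characters appear in order (a subsequence).
function isFuzzyMatch(pattern: string, candidate: string): boolean {
    const p = pattern.toLowerCase();
    const c = candidate.toLowerCase();
    let next = 0;
    for (let i = 0; i < c.length && next < p.length; i++) {
        if (c[i] === p[next]) {
            next++;
        }
    }
    return next === p.length;
}

function findWithLimit(pattern: string, candidates: string[], limit: number): string[] {
    const results: string[] = [];
    // Pass 1: exact matches get first claim on the limited slots.
    for (const candidate of candidates) {
        if (results.length >= limit) {
            break;
        }
        if (isExactMatch(pattern, candidate)) {
            results.push(candidate);
        }
    }
    // Pass 2: fuzzy-only matches fill whatever space is left.
    for (const candidate of candidates) {
        if (results.length >= limit) {
            break;
        }
        if (!isExactMatch(pattern, candidate) && isFuzzyMatch(pattern, candidate)) {
            results.push(candidate);
        }
    }
    return results;
}
```

For the pattern `os.ts` and a limit of 3, a single pass over `src/hosts.ts`, `src/options.ts`, `common/os.ts`, `node/macos.ts` would stop before reaching the last file; the two-pass version returns both exact matches first and then one fuzzy match.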

kittaakos · 2019-07-09T12:09:26Z

> I updated the code; the limit is no longer removed to get a better result list.

Great, I am trying it now.

> I prioritize the collection of exact matches and if necessary fill up the remaining limit size with fuzzy matches

Is there a chance to run the searches (this.doFind) in parallel instead of running them sequentially?

vince-fugnitto · 2019-07-09T12:14:54Z

> Is there a chance to run the searches (this.doFind) in parallel instead of running them sequentially?

I thought of that also, let me try :)

kittaakos · 2019-07-09T12:21:10Z

👍 Nice, it is definitely better than it was.
However, I still get different results than in VS Code:

the highlighting is incorrect:

Search for `object` (Theia and VS Code screenshots not preserved).

different results for fuzzy match:

Search for `os.ts` (Theia and VS Code screenshots not preserved).

Is there a chance to do exactly what VS Code does? They're using the same vscode-ripgrep lib, aren't they?

vince-fugnitto · 2019-07-09T12:31:04Z

> 👍 Nice, it is definitely better than it was.

Great!

> the highlighting is incorrect:

It looks like it's already being tracked in #4548; I can see if I can find a way to address it as well.

vince-fugnitto · 2019-07-09T13:37:06Z

@kittaakos is it necessary to fix #4548 in the PR?

lmcbout · 2019-07-09T17:52:42Z

@vince-fugnitto testing with the latest commit 535346b,
I see the same thing as @kittaakos when searching for "object".
Do you have an idea why the result is different in VS Code?

vince-fugnitto · 2019-07-09T17:53:57Z

> @vince-fugnitto testing with the latest commit 535346b,
> I see the same thing as @kittaakos when searching for "object".
> Do you have an idea why the result is different in VS Code?

The highlighting for the fuzzy matches at the end?

akosyakov · 2019-07-10T08:12:21Z

Please tackle highlighting separately. It is quite involved. The issue is that we rely on Monaco for matching on the frontend, and it works differently from how we match on the backend. We basically should do the matching ourselves then. We need a new API on file search that provides information about matched indexes. The old API should not be changed, since it is used by other clients which are not interested in highlighting.
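
To make the "matched indexes" idea concrete, here is a rough sketch of what such a result could carry. `HighlightedMatch`, `matchIndexes` and `findWithIndexes` are hypothetical names, not an existing file-search API, and the greedy left-to-right subsequence matcher is only an assumption about how the backend might match.

```typescript
// Hypothetical result shape: the path plus the character positions that matched.
interface HighlightedMatch {
    uri: string;
    matchedIndexes: number[];
}

// Greedy left-to-right subsequence match that records each matched position.
// Lower-casing is assumed not to change string length, which holds for
// typical ASCII file paths.
function matchIndexes(pattern: string, candidate: string): number[] | undefined {
    const p = pattern.toLowerCase();
    const c = candidate.toLowerCase();
    const indexes: number[] = [];
    let next = 0;
    for (let i = 0; i < c.length && next < p.length; i++) {
        if (c[i] === p[next]) {
            indexes.push(i);
            next++;
        }
    }
    return next === p.length ? indexes : undefined;
}

// Separate entry point, so an index-free API could stay untouched for other clients.
function findWithIndexes(pattern: string, uris: string[]): HighlightedMatch[] {
    const matches: HighlightedMatch[] = [];
    for (const uri of uris) {
        const matchedIndexes = matchIndexes(pattern, uri);
        if (matchedIndexes !== undefined) {
            matches.push({ uri, matchedIndexes });
        }
    }
    return matches;
}
```

A frontend could then highlight exactly the returned positions instead of re-running its own matcher, which is where the mismatch described above comes from.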

akosyakov · 2019-07-10T08:17:07Z

packages/file-search/src/node/file-search-service-impl.ts

-                    }
-                }, token);
+                // Perform searches for `exact` and `fuzzy` matches in parallel.
+                await Promise.all([


Why do we need to run the same command twice in parallel?

@akosyakov

The initial idea was that I wanted to run the exact matches first to try to fill up the limit.
Exact matches will give us the best possible results from a user standpoint. If there is remaining space (the limit was not reached), then the remaining space is dedicated to fuzzy matches.

The initial problem was that we checked each result in a single pass, first for an exact match and otherwise for a fuzzy match. This led to the limit being hit with far more fuzzy matches, and much better exact matches found later on were never seen.

After @kittaakos's comment (#5638 (comment)), I tried to perform these searches in parallel to squeeze out any performance I could.

Please let me know if something can be optimized or if anything needs addressing better.

> The initial idea was

It's all clear.

But since the two doFind calls run asynchronously, how can you ensure that the exact matches will fill the array first, with the fuzzy matches coming after? Perhaps my ripgrep knowledge is limited on this :/

> But since the two doFind calls run asynchronously, how can you ensure that the exact matches will fill the array first?

I can update the code to go back to using two sets (exactMatches and fuzzyMatches) to ensure each search goes to its respective set.

> Perhaps my ripgrep knowledge is limited on this...

Mine is as well :(
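
As a rough sketch of the two-set idea: each search writes only to its own collection, and the merge happens after both have finished, so completion order no longer matters. `runSearch` below is a hypothetical in-memory stand-in for `this.doFind` (whose real signature is not shown in this thread), and the matchers are toy assumptions.

```typescript
type Matcher = (pattern: string, candidate: string) => boolean;

// Exact: case-insensitive substring match (an assumption for illustration).
const exactMatcher: Matcher = (pattern, candidate) =>
    candidate.toLowerCase().indexOf(pattern.toLowerCase()) !== -1;

// Fuzzy: pattern characters appear in order somewhere in the candidate.
const fuzzyMatcher: Matcher = (pattern, candidate) => {
    const p = pattern.toLowerCase();
    const c = candidate.toLowerCase();
    let next = 0;
    for (let i = 0; i < c.length && next < p.length; i++) {
        if (c[i] === p[next]) {
            next++;
        }
    }
    return next === p.length;
};

// Hypothetical stand-in for this.doFind: asynchronous, capped at `limit` hits.
async function runSearch(pattern: string, candidates: string[], limit: number, matcher: Matcher): Promise<string[]> {
    return candidates.filter(candidate => matcher(pattern, candidate)).slice(0, limit);
}

async function findInParallel(pattern: string, candidates: string[], limit: number): Promise<string[]> {
    // Each search fills its own bucket, so it does not matter which finishes first.
    const [exactMatches, fuzzyMatches] = await Promise.all([
        runSearch(pattern, candidates, limit, exactMatcher),
        runSearch(pattern, candidates, limit, fuzzyMatcher)
    ]);
    // Merge deterministically: exact matches first, then fuzzy ones not already present.
    const results = exactMatches.slice(0, limit);
    for (const candidate of fuzzyMatches) {
        if (results.length >= limit) {
            break;
        }
        if (results.indexOf(candidate) === -1) {
            results.push(candidate);
        }
    }
    return results;
}
```

The trade-off in this sketch is that the two searches cannot share a single counter: each may collect up to `limit` entries before the merge discards the surplus.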

vince-fugnitto · 2019-07-10T15:30:06Z

@kittaakos @akosyakov
I'm not sure I can address the problem and also keep the doFind calls in parallel.
If I perform the exact searches first and then fill up the limit with fuzzy matches, the results collected are much better for end users.

Do either of you have any ideas?

Fixes #5636

Fixes an issue where `exact` file results were not being displayed since `fuzzy` matches were added instead. Due to the limit present when searching for files, `exact` matches should be prioritized, while `fuzzy` matches should be used to fill up the result list if necessary. Adjusting the code means that better results are returned, and end users get more consistent results with respect to their workspace.

Signed-off-by: Vincent Fugnitto <vincent.fugnitto@ericsson.com>

akosyakov · 2019-07-11T09:43:36Z

@vince-fugnitto ok, let's go with parallel if you think it is better. Someone needs to study how VS Code does search and then think what can be applied here.

kittaakos · 2019-07-11T09:47:28Z

> Someone needs to study how VS Code does search and then think what can be applied here.

+1 merging the proper fix only. No need to ~~fix~~ merge half-baked solutions.

vince-fugnitto · 2019-07-11T10:31:02Z

> +1 merging the proper fix only. No need to ~~fix~~ merge half-baked solutions.

Should I close the PR in favor of a better solution?

kittaakos · 2019-07-11T11:19:10Z

> Should I close the PR in favor of a better solution?

It is up to you. If you leave it open, please put an [on-hold] tag (or something similar) in the title. Thanks!

The PR is on hold until a better solution is proposed.

akosyakov · 2019-08-02T10:34:15Z

I've received several offline complaints about it from users. It seems to be quite an annoying bug.

vince-fugnitto · 2019-08-02T12:09:39Z

> I've received several offline complaints about it from users. It seems to be quite an annoying bug.

Do you think we should try a simple fix (like perhaps this PR), and potentially improve it further in the future?

vince-fugnitto added the `bug` (bugs found in the application) and `file search` (issues related to the file search) labels Jul 4, 2019

vince-fugnitto requested review from lmcbout, kittaakos and akosyakov July 4, 2019 12:33

vince-fugnitto self-assigned this Jul 4, 2019

lmcbout previously approved these changes Jul 8, 2019

vince-fugnitto force-pushed the vf/file-search branch from 6f431ce to de813fb July 9, 2019 12:03

vince-fugnitto force-pushed the vf/file-search branch from de813fb to 535346b July 9, 2019 12:21

vince-fugnitto force-pushed the vf/file-search branch from 535346b to 73262de July 9, 2019 17:54

akosyakov mentioned this pull request Jul 10, 2019

file search does not highlight properly #4548

Open

akosyakov reviewed Jul 10, 2019

vince-fugnitto force-pushed the vf/file-search branch from 73262de to 0fbdb6b July 10, 2019 15:39

vince-fugnitto changed the title ~~[file-search] update file-search to prioritize exact and better fuzzy matches~~ onhold - [file-search] update file-search to prioritize exact and better fuzzy matches Jul 11, 2019

kittaakos removed their request for review September 17, 2019 11:52

vince-fugnitto closed this Nov 26, 2019

kittaakos mentioned this pull request Jul 22, 2020

[monaco] As an extension developer I want to customize which modules are loaded from monaco #8220

Closed

vince-fugnitto mentioned this pull request Feb 3, 2021

Quick File Open: Support for Search Queries with Whitespaces #8989

Merged

vince-fugnitto deleted the vf/file-search branch October 7, 2021 17:45

magnologan mentioned this pull request Jan 9, 2023

[Snyk] Security upgrade puppeteer from 2.1.1 to 3.0.0 magnologan/theia#6

Open
