Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 531785 - Auto infix search in Open Resource #12

Merged

Conversation

pbodnar
Copy link
Contributor

@pbodnar pbodnar commented Apr 17, 2022

Basic feature description

A new match rule RULE_SUBSTRING_MATCH is introduced in the
SearchPattern. The constant itself is copied as-is from JDT's variant
of SearchPattern, yet its usage and purpose differs.

When used and prefix search is not enforced, then:

  • search is done as if "*" was specified at the pattern start
  • CamelCase search doesn't require the 1st pattern char to be the very
    1st char of the matched string

Use this in Resource search dialogs for the default and file name
pattern, while also list "prefix matches" firstly and extend the matched
characters highlighting to still work correctly.
Path (container) and extension searches should not be affected by this.

Further changes:

  • users can start a search pattern with ">" to enforce the old
    prefix-only search

For seeing how autoInfixSearch differs from the old one, one can compare "SearchPatternAuto.java" with "InfixSearchPatternAuto.java".

Yet to be investigated / implemented

Historical issues identified that could / should be solved later on

  1. SearchPattern.RULE_CASE_SENSITIVE - this flag seems to be ignored by the actual SearchPattern implementation

  2. SearchPattern.camelCaseMatch() could be optimized to check strings' lengths firstly. Something like:

     int patternLength = patternEnd - patternStart;
     int nameLength = nameEnd - nameStart;
     if (nameLength < patternLength) {
     	return false;
     }
    
  3. Constants like SearchPattern.END_SYMBOL are just private and special chars checking is hard-coded in dependent classes like FilteredResourcesSelectionDialog

  4. Inefficient matches highlighting: ResourceItemLabelProvider.getMatchPositions() and related seem to contain some tricky code and logic. -> see Make matches highlighting in Resource dialog more simple and efficient #184

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from 5702845 to 8e68049 Compare April 17, 2022 09:41
@mickaelistria
Copy link
Contributor

For consistency, couldn't it be a SearchPattern.RULE_INFIX_MATCH or something like that?

@pbodnar
Copy link
Contributor Author

pbodnar commented Apr 18, 2022

For consistency, couldn't it be a SearchPattern.RULE_INFIX_MATCH or something like that?

Possibly yes, I will have to look at how it could be incorporated into the existing code base, but also how to make it easy to use for the client code. Maybe a related question to be sure: We want to keep the behavior backwards compatible for existing clients (so that they won't have to change their code in order to keep the old behavior), don't we?

@mickaelistria
Copy link
Contributor

We want to keep the behavior backwards compatible for existing clients (so that they won't have to change their code in order to keep the old behavior), don't we?

As much as possible yes. If some important feature does require to break some behavior, this needs to be discussed with the PMC for a decision.

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from c95f8c7 to b0fc7cd Compare May 16, 2022 20:57
@pbodnar
Copy link
Contributor Author

pbodnar commented May 30, 2022

So, as can be seen from the commits, I have implemented some other points from the todo-list in the description. The next big thing in the list is this one:

  • make sure found resources are properly highlighted (see existing ResourceItemLabelTest)

After this, I would probably move on to the last:

  • present the SearchPattern's autoInfixSearch feature on UI layer somehow

Plus of course, some documentation pages should be updated...


BTW The Jenkins build is currently failing on some error I don't quite understand. It is possible that the error is quite unrelated to commits in this PR, but I don't know really...


A final thought, after digging into this search topic more, I think that replacing the current "wildcards / camelCase / etc.-based searches" with a brand-new "fuzzy search" as known from the other IDEs out there (i. e. when writing simply & easily "illexc" finds "IllegalArgumentException") is the way to go. Yet, even finishing this one (which is rather an extension of the existing searches) could be a good start before moving on to a more "radical" change like that?

As usual, feedback and hints are welcome. :)

@@ -436,6 +441,21 @@ protected Comparator getItemsComparator() {
return 1;
}
}
// Prioritize names starting with the pattern
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be hopefully sufficient?

The existing comparator code is written in a way which doesn't seem to let us cover the code with corresponding unit tests easily. But hopefully it doesn't matter that much in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one protested, so assuming this sorting implementation is sufficient and closing this thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reopen for a while with a question...

The current comparator implementation part:

				// Prioritize names starting with the pattern
				char patternFirstChar = getFirstFileNameChar(pattern);
				if (patternFirstChar != 0) {
					patternFirstChar = Character.toLowerCase(patternFirstChar);
					m1 = patternFirstChar == Character.toLowerCase(s1.charAt(0));
					m2 = patternFirstChar == Character.toLowerCase(s2.charAt(0));
					if (!m1 || !m2) {
						if (m1) {
							return -1;
						}
						if (m2) {
							return 1;
						}
					}
				}

... is simple, but because of checking just the first letters, it wrongly evaluates for example name "Aabc" as a "name starting with the pattern" if the pattern is "ab". IMO the only 100% working solution would be either to de-facto re-implement the matching here (which is somewhat already done in the matches highlighting code) or to change the whole searching solution to keep metadata for every successful match. Until then, can we say that the solution above is "good enough" for now and maybe add some inline warning comment about it not being perfect?

@pbodnar
Copy link
Contributor Author

pbodnar commented Jun 2, 2022

Status update: I've tried some real-world manual testing, finding I need to fix handling of some edge cases in / with the new code. Approximately these:

  1. when entering just " " or "> ": ArrayIndexOutOfBoundsException
  2. nothing is found when typing-in (not copy-pasting) ">abc", because ">" already finds nothing (and "subFilter concept" kicks in, filtering an empty set on entering ">a" etc.)
  3. ResourceFilter should probably work with SearchPattern.initialPattern in order to handle dots (".") in the search string correctly, while pattern is being typed in the search dialog

@pbodnar
Copy link
Contributor Author

pbodnar commented Jun 25, 2022

Wow, it looks like everything is fine again :), I guess the previous build failures were given by the June release / freeze period somehow.

Anyway, as can be seen, I have (hopefully) coped with the 3 points described above - by the 3 follow-up commits. So if someone could review and give a feedback, it would be great. Mainly the 3rd point was quite challenging, as I had to drill through many historical tickets to find out what is going on there in the ResourceFilter class & retesting various scenarios manually (no easy tests automation in this area without a huge refactoring, it seems).

I'm also not sure if I have correctly selected to merge the master branch into mine, or if I should do a rebase instead. So please let me know, provided I should do this differently.

@HannesWell
Copy link
Member

HannesWell commented Jun 25, 2022

I'm also not sure if I have correctly selected to merge the master branch into mine, or if I should do a rebase instead. So please let me know, provided I should do this differently.

A rebase is perfectly fine. We try to avoid merge commits.
Usually it does not matter when a change happened at your computer and what rounds you spinned until the change is completed, what matters is what changes were finally committed to the master. And this is best tracked with a simple linear history where each commit is a self-contained change. So unless you have unrelated changes in this PR or large changes that can be understood more easily if applied incrementally you can squash your commits into one or at least few self-contained commits.

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from ab825db to 70ef7f4 Compare June 26, 2022 14:22
@pbodnar
Copy link
Contributor Author

pbodnar commented Jun 26, 2022

@HannesWell, thanks for your quick info. So while still probably quite far from being a fully implemented PR, I have at least master-rebased, reorganized & squashed the 8 commits into 3 relatively "self-contained" ones, which I believe can somewhat simplify the reviews.

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch 2 times, most recently from 27585f1 to 14b25b8 Compare July 2, 2022 09:52
@pbodnar
Copy link
Contributor Author

pbodnar commented Jul 2, 2022

OK, so I have finished some smaller amendments and the code is ready for review & testing now (except for the 2 remaining tasks from PR description). I think I will take a small break from this one. ;)

So @mickaelistria, @vogella or whoever is watching this issue, can you please have a look? It would be great to have some concrete suggestions before moving on, so that we avoid a major rework after everything is implemented. Thanks in advance. :)

@mickaelistria
Copy link
Contributor

I'm quite busy with a handful of high priority tasks. I'll try to review it next week if no other committer does look at it in the meantime (but having someone else to review would be very welcome!)

@vogella
Copy link
Contributor

vogella commented Jul 5, 2022

I will be soon on vacation for a few weeks so I cannot review at the moment. I suggest to squash the commits into one commit using interfactive rebase and force push to your branch. Also I would suggest to remove the WIP from the PR, WIP is usually something committers ignore as WIP indicates that the change is not yet complete but to me it looks (on a quick glance) that this change is actually ready for review and that you expecting feedback on it.

Thanks @pbodnar for working on this, it is really nice to see an issue being worked on that makes IntelliJ better than Eclipse

@vogella
Copy link
Contributor

vogella commented Jul 8, 2022

@pbodnar this one needs manual rebase.

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from 14b25b8 to 3dba2df Compare July 8, 2022 20:30
@pbodnar pbodnar changed the title [WIP] Bug 531785 - Auto infix search in Open Resource Bug 531785 - Auto infix search in Open Resource Jul 8, 2022
@pbodnar
Copy link
Contributor Author

pbodnar commented Jul 8, 2022

OK, freshly rebased on the latest master, squashed into 1 commit, updated the description and removed prefix "WIP". All ready for a review, hopefully. :)

Copy link
Contributor

@mickaelistria mickaelistria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be a MatchRule like others. Keeping this consistencyy would reduce the amount of new API to be created.

@mickaelistria
Copy link
Contributor

Also, I've looked a bit on the web to find a formal definiton of infix search and couldn't find something relatively standard. Can you please properly define it, or use another more standard term?

@pbodnar
Copy link
Contributor Author

pbodnar commented Jul 17, 2022

This should really be a MatchRule like others. Keeping this consistencyy would reduce the amount of new API to be created.

OK, I can see we could remove those 2 new constructors and (currently unused) isAutoInfixSearch() afterwards, right? I expect that for backwards compatibility reasons, we won't make the new rule the default one in the no-argument constructor, so can someone please suggest the best "Eclipse code-base" way of how to make the API developer-friendly, so that for example RULE_EXACT_MATCH | RULE_PREFIX_MATCH | RULE_PATTERN_MATCH | RULE_CAMELCASE_MATCH | RULE_BLANK_MATCH | RULE_INFIX_MATCH doesn't have to be copied to every client code which would like to use the new behavior? Maybe introduce a dedicated public constant(s) for the various rule sets? I can also imagine introducing a builder for the SearchPattern class, but that would probably go above the scope of this PR...?

Also, I've looked a bit on the web to find a formal definiton of infix search and couldn't find something relatively standard. Can you please properly define it, or use another more standard term?

Regarding the infix search term, there are many pages using this term, but I can see it doesn't have to be necessarily the best in our context (and it would deserve an explanatory sentence in the javadoc). What to use instead though? Page What is it called when you search the middle of a string instead of the beginning? suggests unanchored search, which comes from the world of regex, but maybe contains search is a "more natural" candidate, although it has a verb in its name?

@mickaelistria
Copy link
Contributor

JDT's SearchPattern calls it SUBSTRING_MATCH if I understand correctly. That's a good term to reuse.

@pbodnar
Copy link
Contributor Author

pbodnar commented Jul 23, 2022

OK, I will try to refactor the code - to introduce a new flag RULE_SUBSTRING_MATCH...

@vogella
Copy link
Contributor

vogella commented Dec 6, 2022

@pbodnar did you update your API baseline? I downloaded https://download.eclipse.org/eclipse/downloads/drops4/S-4.26RC2-202211231800/ and set my workspace to use this a API baseline in which I did not get the error you described. Lets see what CI thinks of this change.

@vogella
Copy link
Contributor

vogella commented Dec 6, 2022

@pbodnar unfortunately this change now conflicts with your Javadoc changes. Can you rebase your change onto master?

mickaelistria pushed a commit to eclipse-platform/eclipse.platform.common that referenced this pull request Dec 8, 2022
This covers changes implemented here:
eclipse-platform/eclipse.platform.ui#12.
In short, automatic prefix ("starts with") search got replaced with
automatic infix ("contains") search. Also, a new wildcard character
'>' was introduced to enforce the original prefix search.

To improve the docs at the same time:

* mention explicitly that the search is case-insensitive
* unify the way example searches are formulated
@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch 2 times, most recently from 978152a to 5fda965 Compare December 8, 2022 20:48
@pbodnar
Copy link
Contributor Author

pbodnar commented Dec 8, 2022

@vogella, thanks for your assistance. :)

So I have rebased, squashed and moved version / @since to 3.128.0 / 3.128. Not sure about the GitHub build (seems like a random error?), but the Jenkins build fails on this now:

21:00:02.429 [ERROR] Failed to execute goal org.eclipse.tycho.extras:tycho-p2-extras-plugin:4.0.0-SNAPSHOT:compare-version-with-baselines (compare-attached-artifacts-with-release)
on project org.eclipse.ui.ide: Only qualifier changed for (org.eclipse.ui.ide/3.20.0.v20221208-2047).
Expected to have bigger x.y.z than what is available in baseline (3.20.0.v20221027-2208) -> [Help 1]

Can you please help once again?

@mickaelistria
Copy link
Contributor

The version of org.eclipse.ui.ide in the MANIFEST.MF should be increased by +0.0.100 to signify some changes have happened compare to last release. Also, by the way, its requirement on org.eclipse.ui.workbench should set 3.128.0 as minimal version as it uses the new APIs there.

@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from 5fda965 to beb720c Compare December 8, 2022 22:44
@pbodnar pbodnar force-pushed the bug-531785-auto-infix-search branch from beb720c to ce1c3e8 Compare December 8, 2022 23:05
@mickaelistria
Copy link
Contributor

OK, but what about the org.eclipse.ui.internal.navigator.resources.actions.GotoResourceDialog which extends the extended org.eclipse.ui.dialogs.FilteredResourcesSelectionDialog? No need for "explicit" dependency here?

No, because it doesn't consume new API methods or fields. It's still compatible with the older one.

Apart from that, it looks like this will be necessary according to the latest build failure (as tests have been modified as well):

You got it ;)

... and hopefully that could be all?

Hopefully yes.

I wonder if these versions checks could be automated on localhost (dev's machine) somehow...

Some of those checks can already be seen in the IDE if you configure a PDE baseline referencing latest release.
Some others are currently not available in the IDE; and a local build isn't really faster than a remote one, so it's OK to just push to PR to test.

A new match rule `RULE_SUBSTRING_MATCH` is introduced in the
`SearchPattern`. The constant itself is copied as-is from JDT's variant
of `SearchPattern`, yet its usage and purpose differs.

When used and prefix search is not enforced, then:

* search is done as if "*" was specified at the pattern start
* CamelCase search doesn't require the 1st pattern char to be the very
1st char of the matched string

Use this in Resource search dialogs for the default and file name
pattern, while also list "prefix matches" firstly and extend the matched
characters highlighting to still work correctly.
Path (container) and extension searches should not be affected by this.

Further changes:

* users can start a search pattern with ">" to enforce the old
prefix-only search

Necessary refactoring:

* unify how `stringPattern` gets prepared in
`SearchPattern.initializePatternAndMatchRule()`
  * this also implies refactoring of camelCaseMatch()
* use `SearchPattern.initialPattern` for creating the extra patterns in
the dialog class
  * this fixes scenarios when the newly introduced ">" is used together
with "." (changes in pattern didn't get recognized)
  * this also means that most of the lines made by 2ff4063 are
removed as they are no longer necessary

Improve unit tests of `SearchPattern`:

* remove duplicated examples from javadocs
* newly cover basic edge cases
* use better examples for starts-vs-contains-like tests
* add sanity check that at least 1 testing resource matches some pattern
@pbodnar
Copy link
Contributor Author

pbodnar commented Dec 10, 2022

@mickaelistria, thanks for the tips. So I have done the 2 changes in manifests as proposed + another forced rebase. And finally "All checks have passed". :)

@mickaelistria mickaelistria merged commit f31b811 into eclipse-platform:master Dec 11, 2022
@mickaelistria
Copy link
Contributor

Thank you!
Can you please add a note to the N&N document as well ( https://github.com/eclipse-platform/www.eclipse.org-eclipse-news )?

pbodnar added a commit to pbodnar/www.eclipse.org-eclipse-news that referenced this pull request Dec 12, 2022
pbodnar added a commit to pbodnar/www.eclipse.org-eclipse-news that referenced this pull request Dec 15, 2022
pbodnar added a commit to pbodnar/www.eclipse.org-eclipse-news that referenced this pull request Dec 15, 2022
mickaelistria pushed a commit to eclipse-platform/www.eclipse.org-eclipse-news that referenced this pull request Dec 15, 2022
@vogella
Copy link
Contributor

vogella commented May 26, 2023

@pbodnar the new funcationality is super nice and easy to use, thanks a bunch for this.

One of my clients mentioned that it would also be very nice if the * in the middle of a work could be avoid, e.g. in the following example:

image

the match for TaskColumnPropertyAccessor continues if:

  • I continue to write column, e.g. taskcolumn
  • I write TCP

but not if I write taskproperti have to add an * to match this one, e.g. task*propert

Would you be interested in also implementing the automatic * in the middle of the search term? According to my client this is standard behavior in VSCode and IntelliJ.

@pbodnar
Copy link
Contributor Author

pbodnar commented May 26, 2023

@vogella, thanks for your feedback. I have actually kinda proposed this kind of change at https://bugs.eclipse.org/bugs/show_bug.cgi?id=531785#c13 - I called it "fuzzy search" there. :)

Anyway, yes, it seems doable (with matching algorithm itself being quite trivial?), with some major refactorings (API and UI changes would have to be consulted?), given I already somewhat understand the searching code behind. I'm just not sure if it still pays off when Eclipse Theia is on its way, and it is also harder for me to find some extra time for Eclipse contributions these days... Maybe when there will be more votes on this? ;)

@vogella
Copy link
Contributor

vogella commented Jun 20, 2023

Anyway, yes, it seems doable (with matching algorithm itself being quite trivial?),

If the matching algorithm can be adjusted easily why not change it without any API changes? I think it would be OK for our search to become better without offering API to go back to the "not so nice" behavior.

@pbodnar
Copy link
Contributor Author

pbodnar commented Jun 27, 2023

@vogella, good point, just extending the behavior could probably be the simplest way, provided it would by widely accepted. Question is, what with all those existing SearchPattern's flags like RULE_EXACT_MATCH, RULE_PREFIX_MATCH, RULE_PATTERN_MATCH etc., moreover when we wouldn't introduce any new one...?

Besides that, matching, sorting and highlighting of the matches are currently done independently of each other. As seen already within this MR, this has got some limits (mainly in regards to sorting and coping with duplicated executions of +- the same stuff). I can imagine a major redesign where output from matching wouldn't be just a boolean - instead, it would be some structured data which could be directly used within sorting and highlighting. But once again, this is not quite trivial to do, IMHO...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request noteworthy Noteworthy feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants