-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Search by type feature, a kind of sherlodoc in Merlin #1828
Conversation
cc @art-w |
d95be0c
to
f4d6973
Compare
a01654b
to
4a40993
Compare
The behaviour of the feature is very strange on Windows :/ |
In case it helps others review this PR, the general algorithm implemented is to:
The scoring uses an algorithm similar to sherlodoc to compare types, with enhancements by @xvw that I'm going to steal.
While this doesn't attempt to solve the much harder type isomorphism problem, it does rank types by similarity by preferring types that requires the less modifications to match the query, while tolerating user mistakes. (If the previous explanations are incomprehensible, there's an even worst description in French with drawings at around 21:00-23:00 in this presentation) @xvw added string levenshtein distance to compare type names while sherlodoc only checks for substring match. His normalization of the type ASTs is a really smart and simple way to handle polymorphic variables! He's also much more principled in his scoring, while sherlodoc had to resort to magic constants to produce results in a nice order... I have yet to test this PR to evaluate if the ordering feels right, but tuning the heuristic constants is highly subjective anyway :) The type scoring algorithm can be very expensive. If you run into performance issues on large environments, sherlodoc has an additional trick that might be useful in a follow-up PR: The values are pre-filtered to remove obviously bad candidates using the simpler Anyway, I'm super hyped by this PR, it will be excellent to not leave my editor while also getting better results specific to my codebase! |
Thank you very much for your feedback and for helping us with future reviews!
Of course! When I've got a bit more time, I'll try to make a small pass at at least integrating the polymorphic variable distinction (and potentially try to implement what I've got in mind for unsupported forms of type).
I don't know what the best approach is yet, but if I ever get any user feedback complaining about the search by term, I'll try out your approach.
As this patch is not based on indexing and does not allow a suffix tree to be expressed, as is the case with Sherlodoc, no pre-optimisation phase has been carried out for the time being. If users ever complain about performance, I'll do some experiments with filters, in the environment, by polarity! (Unlike Sherlodoc, which looks at all OPAM packets, here the lookup is only run on the environment, so the number of values is much smaller.) On the whole, I agree with the points you raise! And I'll keep all that in mind when we give our user feedback. For the moment, I think we need to move forward and collect user feedback to refine the heuristics and optimise (avoiding premature optimisations). |
I guess the Ci faisl on Windows because the standard libraries are slightly different. |
Is there a straightforward way to do it (make tests compatible)? |
No. It depends on the test :-) Here I would imagine that you could filter out results that are not in both sides ? But maybe not :-) |
Also, could you update |
Done in e47a488 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have to look at the distance calculations but this looks good so far 👍
src/analysis/type_search.ml
Outdated
| exception _ -> acc | ||
| _ -> | ||
let acc = compute_values ctx env (Some lident) acc in | ||
Env.fold_modules (fun name _ mdl acc -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wonder why lazy tries where used for polarity search...
7cf0b85
to
77bbfce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very good. My only worries are related to performances, I think we should make documentation fetching optional.
src/analysis/polarity_search.ml
Outdated
Locate.get_doc | ||
~config | ||
~env | ||
~local_defs | ||
~comments | ||
~pos | ||
(`User_input name) | ||
|> Type_search.doc_to_option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering about the cost of fetching the doc of all the returned items. It's not a trivial operation. Maybe we should provide an option to the command so that clients can opt-out from that. (I put the comment here but of course it is also valable for the new type-search results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about the approach proposed here a8688aa
La CI macos a aussi un problème: # File "tests/test-dirs/search-by-type-comparison-to-polarity-search.t", line 1, characters 0-0:
# /opt/homebrew/bin/git --no-pager diff --no-index --color=always -u _build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t _build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t.corrected
# diff --git a/_build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t b/_build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t.corrected
# index dc5aa5f..0ac9c7d 100644
# --- a/_build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t
# +++ b/_build/.sandbox/1535e15bb85fec87f6431da9c6ded6e4/default/tests/test-dirs/search-by-type-comparison-to-polarity-search.t.corrected
# @@ -37,11 +37,11 @@ potential failures, so lifting the result in an int option).
# "type": "string -> bool option"
# }
# {
# - "name": "Float.of_string_opt",
# + "name": "float_of_string_opt",
# "type": "string -> float option"
# }
# {
# - "name": "float_of_string_opt",
# + "name": "Float.of_string_opt",
# "type": "string -> float option"
# } |
a8688aa
to
c3e9c69
Compare
Rebased and formatted |
84c5106
to
cc4b4f2
Compare
Core search engine by type, strongly inspired by https://doc.sherlocode.com/. The library is globally dependency agnostic so that it can potentially be used one day as a base for other applications.
Currently, Merlin's code base only uses test CRAMs, although some functions are easier to maintain if they are unit-tested. It is therefore a ‘gradual’ introduction of unit tests into the code base.
95ab63b
to
50e9086
Compare
… after the limit is enforced.
This might help with short-paths, we should introduce tests showing it.
Search by type feature+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @xvw !
As we discussed, what this feature needs now is user feedback. Let's merge it 🙂
CHANGES: Thu Sep 26 18:48:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Canonicalize paths in occurrences. This helps deduplicate the results and show more user-friendly paths. (ocaml/merlin#1840) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Thu Sep 26 18:48:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Thu Sep 26 18:48:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Canonicalize paths in occurrences. This helps deduplicate the results and show more user-friendly paths. (ocaml/merlin#1840) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Canonicalize paths in occurrences. This helps deduplicate the results and show more user-friendly paths. (ocaml/merlin#1840) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828) [new release] merlin (3 packages) (4.17.1-501) CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828) [new release] merlin (3 packages) (4.17.1-414) CHANGES: Fri Sep 27 12:02:42 CEST 2024 + merlin binary - A new `WRAPPING_PREFIX` configuration directive that can be used to tell Merlin what to append to the current unit name in the presence of wrapping (ocaml/merlin#1788) - Add `-unboxed-types` and `-no-unboxed-types` as ocaml ignored flags (ocaml/merlin#1795, fixes ocaml/merlin#1794) - destruct: Refinement in the presence of optional arguments (ocaml/merlin#1800 ocaml/merlin#1807, fixes ocaml/merlin#1770) - Implement new expand-node command for expanding PPX annotations (ocaml/merlin#1745) - Implement new inlay-hints command for adding hints on a sourcetree (ocaml/merlin#1812) - Add `signature-help` command (ocaml/merlin#1720) - Implement new search-by-type command for searching values by types (ocaml/merlin#1828) - Fix dot-merlin-reader ignoring `SOURCE_ROOT` and `STDLIB` directives (ocaml/merlin#1839, ocaml/merlin#1803) + editor modes - vim: fix python-3.12 syntax warnings in merlin.py (ocaml/merlin#1798) - vim: Dead code / doc removal for previously deleted MerlinPhrase command (ocaml/merlin#1804) - emacs: Improve the way that result of polarity search is displayed (ocaml/merlin#1814) - emacs: Add `merlin-search-by-type`, `merlin-search-by-polarity` and change the behaviour of `merlin-search` to switch between `by-type` or `by-polarity` depending on the query (ocaml/merlin#1828)
This patch introduces a kind of Sherlodoc in Merlin!
Feature added
Adds to merlin's protocol the
search-by-type
command, which takes a query (ie:'a list list -> ’a list
) and returns a list of possible candidates in the current environment. The structure of the result is as follows:The result also include the location (using
with_location
) of thevalue
(useful to, from a search list, quickly jump to definition or look for usages usingproject-wide-occurences
).Comparison with
polarity-search
For the moment, I find that 100% of experiments give more convincing results with type search (which supports type parameters and functions, unlike polarity search) and in many case, I think that the query is more natural to write. For example:
int_of_string
-string +int
string -> int
int_of_string_opt
-string +option
string -> int option
List.flatten
-list +list
'a list list -> 'a list
There is also a test-suite that make a comparison for the same expectation between
by type
andby polarity
.Here is some direct examples:
Search by polarity
Search by type
Difference with Sherlodoc
At present, Sherlodoc is much more intelligent when it comes to indexing, which is not at all within the scope of this PR, which uses the same environment as that used for polarity search. In addition, Sherlodoc is better at comparing names and this PR distinguishes polymorphic variables (but both approach deal relatively properly with equivalence modulo isomorphism). In practice, this distinction is expressed as follows:
'foo list -> ('foo -> 'bar) -> 'bar list
is treated by sherlodoc where every variables arePoly
and the following PR use indexes (so we can distinguishfst
andsnd
for example).Client support
Currently, only the suppport for emacs has been added and uses the same output as the polarity search result. The emacs client introduces 2 new commands:
merlin-search-by-polarity
(that replacemerlin-search
)merlin-search-by-type
(that perform a searchby type)And
merlin-search
was changed to switch betweensearch-by-type
orsearch-by-polarity
(depending on the query, so if the query starts by-
or+
, it arbitrary decide that this is a polarity query).About the code base
The core of the feature is isolated inside a dedicated sub library :
sherlodoc
and has no dependencies and just deal with "computing distances between queries and values". The PR also introduceunit-test
(using Alcotest) andocamlformat
, currently only available forsrc/sherldoc
andtests/test-units
.Further work
Currently, I think that the PR is an huge improvement, because it leads (IMHO) in every case to a better result set than the polarity search but we can have improvement:
But I think we can improve progressively the feature according to user reviews.
More visual examples
Here we can see that
'foo list -> ('foo -> 'bar) -> 'bar list
is valid formap: ('a -> 'b) -> 'a list -> 'b list
!Here, we can see that
fst
andsnd
are properly ordered!