Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imenu: Add imenu functionality to tree-sitter #199

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mohkale
Copy link

@mohkale mohkale commented Dec 18, 2021

WIP imenu implementation for tree-sitter.

This works pretty much exactly like the highlighting functionality. We have a separate patterns file for querying imenu entries and a (not enabled by default) minor mode for forcing tree-sitter to be used for imenu.

@mohkale
Copy link
Author

mohkale commented Dec 18, 2021

@ubolonton

What I have submitted now works, but depending on the query it allows duplicate imenu entries with separate capture-scopes. For example:

@dataclasses.dataclass
class URL(object):
    url: str
    protocol: str = ""
    target: str = ""
    mimetype: str = ""
    desktop_file: str = None
    foo, bar = [1, 2]

    def foo(self):
        def bar():
            print('bar')
            print('hello world')

def foo(self):
    print('hello world')

We'll end up with 3 imenu entries for foo using the patterns file here. One for the method foo and two function foo since they overlap. Is there a way to prevent this from the query directly? I could also just ignore captures which have the same point as an earlier capture (should be relatively simple) but I'd rather avoid that if possible.

Also I'd like to be able to build the imenu recursively by enumerating through the buffer tree. For example here's the imenu response built by eglot using python language server. Notice how Class > URL works, but also all the fields of that class are also grouped into that class (Field > URL > url, Field > URL > protocol for example). Is there a way where I can run a query, and then beginning from the root node, walk through the tree, record the value of intermittent nodes (such as class) while deriving further nodes such as fields within that class? That would make this much more powerful and more importantly resillient. It would also avoid the issue with functions also being methods because I can only focus on the first capture at that node and skip later captures (adding a sort of precedence to the queries). Thoughts?

(("Class"
  ("URL"
   [...]
   #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>))
 ("Field"
  ("URL"
   ("url"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("protocol"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("target"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("mimetype"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("desktop_file"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("bar"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))
 ("Method"
  ("URL"
   ("foo"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))
 ("Function"
  ("foo"
   ("bar"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>))
  ("foo"
   [...]
   #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))

@mohkale
Copy link
Author

mohkale commented Dec 18, 2021

Okay, I tried implementing a recursive menu function using tsc-mapc-children but it won't work because the tree isn't consistent structured in a way that's easy to recurse. For example the python tree is:

module:
  decorated_definition:
    decorator:
      attribute:
        identifier:
        identifier:
    class_definition:
      identifier:
      argument_list:
        identifier:
      block:
        class_definition:
          identifier:
          argument_list:
            identifier:
          block:
            pass_statement:

The identifier, which we're building the entry from, is nested as a sibling of the body of the definition, not a parent. This means there's no smart way (that I can think of) to reliably associate a class X with a field in that class y, at least without assuming the structure of the tree.

First capture of a node now takes precedence over all later captures.
@mohkale mohkale mentioned this pull request Dec 18, 2021
20 tasks
Now you can capture, for example, a Class Name and a Field value in the
same identifier and tree-sitters index function will combine them into a
form that can reliably handle this.
@mohkale
Copy link
Author

mohkale commented Dec 18, 2021

Okay, I've done something that's borderline horrific 😨, but it does give a much more powerful imenu index.

Screenshot_20211218_201759

Essentially we do 2 things:

  1. Delete duplicate nodes in the query response. If two different patterns match the same node, we delete the later one and prioritise the earlier one.
  2. If a pattern captures more than one node, we nest into that node. That's how, in this case, the field names are all prefixed by the class in which their defined.

Warnings related to point 2: I'm not sure how tree-sitter orders multiple captures in response to a query. Hopefully it just matches the order in which it is declared in which case we can deterministically format each capture. Worst case we might end up with a class within a field "foo Foo" which completely ruins this experiment 😢.

For anyone wanting to try this out here's an updated patterns file for python.

(class_definition
 name: (identifier) @Class
 body: (block
        [(function_definition name: (identifier) @Method)
         (class_definition name: (identifier) @Class)
         (expression_statement
          (assignment
           left: [(identifier) @Field
                  (pattern_list (identifier) @Field)]))
         ]))

(class_definition name: (identifier) @Class)

(expression_statement
 (assignment
  left: [(identifier) @Variable
         (pattern_list (identifier) @Variable)]))

(function_definition
 name: (identifier) @Function)

@ethan-leba
Copy link

This looks cool! I'm not sure what the benefit of including this directly into the tree-sitter library vs providing your own package though. Do you have any reasons?

@mohkale
Copy link
Author

mohkale commented Dec 20, 2021

@ethan-leba

The end goal of tree sitter, I believe, is to be integrated into emacs core and eventually replace the old regexp based syntax highlighting and completion implementations. That's why, since imenu is a core part of that, that I think it should be integrated into tree-sitter proper. Plus this followed from an open issue ticket for adding imenu support so I think it fits. Plus if I move it to a separate package I'd have to reimplement the lookup for imenu query files just like we have for highlighting.

All things considered, I think there's a strong case for having this in tree sitter itself.

@ubolonton
Copy link
Collaborator

Thanks for looking into this! I haven't had time to check it out yet. I'll try to do so later this week.

Dealing with recursion is definitely the hardest part of designing the imenu integration IMO.

@mohkale mohkale mentioned this pull request Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants