Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: add several fields to SymbolInformation #154

Closed
olafurpg opened this issue May 15, 2023 · 10 comments
Closed

Proposal: add several fields to SymbolInformation #154

olafurpg opened this issue May 15, 2023 · 10 comments

Comments

@olafurpg
Copy link
Member

This issue is an umbrella for several proposed additions to SCIP, based on discussions with @donsbot on Mastodon .

SymbolInformation.display_name

This would be the name of the symbol, which is both helpful for local variables and avoids parsing the name from the symbol. The field could be name instead of display_name, we use display_name in SemanticDB to emphasize that this name is meant to be displayed (and should therefore not have special encoding for non-ASCII characters like emojis.

SymbolInformation.owner

Alternative name parent. The thinking with this field is that it avoids parsing the owner from the symbol, and it allows us to emit an owner for local symbols.

SymbolInformation.kind

An enum that specifies what kind of symbol this is (enum/interface/method/...). Currently, Descriptor.Suffix doesn't encode enough fine-grained information (and it's intentionally named "suffix" to emphasize that it's primarily related to the syntax of the symbol.

SymbolInformation.signature_documentation

A string-formatted rendering of the signature. Currently, indexers emit this information in the documentation field as markdown-formatted code blocks. Having a separate field makes it cleaner to extract only the signature. I propose we reserve the field SymbolInformation.signature for fully typed/structured signatures (not string-formatted signatures).

@varungandhi-src varungandhi-src changed the title Proposall: add several fields to SymbolInformation Proposal: add several fields to SymbolInformation May 17, 2023
@donsbot
Copy link
Contributor

donsbot commented May 23, 2023

SymbolInformation.kind

For symbol kinds, a variant of the lsif/vscode spec. E.g. currently used by Glean : https://github.com/facebookincubator/Glean/blob/main/glean/schema/source/codemarkup.types.angle#L38 ; these are used for search and outline views: to e.g. constraint a search to methods or classes only, or when searching by name, to show the kind of each result.

SymbolInformation.signature_documentation

this is somewhat important, as the scip index conflates "Documentation". e.g. in rust-analyzer the type signature and the documentation markdown are combined. They should ideally be separate "tables" in the output, keyed by scip.Symbol, as depending on the scenario you might only want to show the signature (e.g. a search page), or docs (an API page).

SymbolInformation.display_name

This is another search related application. Imagine taking the scip data and efficiently listing all symbols called 'vec'. We'd need a way to extrac the local name. Another scenario: search for foo::bar::vec , where 'bar' might be a subclass of something. In each case we need to know the precise local name and parent name ("qualified name"). These can be extracted by parsing the scip.Symbol but that's something the protocol could already do,

@olafurpg
Copy link
Member Author

olafurpg commented May 23, 2023

@donsbot Agreed! We have these pieces of information in SemanticDB for similar reasons. If we add these fields to SCIP, would you be interested in contributing the changes to rust-analyzer? I estimate it will be easy to update scip-java to emit the new fields. We can create a tracking issue for the remaining languages.

@donsbot
Copy link
Contributor

donsbot commented May 23, 2023

Yep we would likely want to extend rust-analyzer immediately to support these.

olafurpg added a commit that referenced this issue May 23, 2023
This is the first step towards closing #154.

Previously, the only way to determine the type of a symbol was through `SymbolDescriptor.Suffix`. This solution was not ideal since `Suffix` had very coarse grained types, and it was intentionally named "suffix" to indicate it was primarily used to determine the syntax of the symbol (not semantics).

This PR adds a new enum `SymbolInformation.Kind` to add first-class support to determine the type of a symbol.
@olafurpg
Copy link
Member Author

I opened a PR adding SymbolInformation.kind as a starting point. Will follow up with separate PRs adding the remaining fields.

olafurpg added a commit that referenced this issue May 25, 2023
Towards #154

Previously, there was no reliable way to get the name of a symbol. You
could parse it from the symbol descriptor but that only worked for
global symbols (not locals) and was error-prone when the indexer chooses
to a special encoding in the symbol for some reason (for example,
all-lowercase because symbols are case-insensitive).

Now, clients can use `SymbolInformation.display_name` to reliably get
the name of a symbol to render in API docs or some other application.
@olafurpg
Copy link
Member Author

@donsbot can you elaborate on your motivation for SymbolInformation.owner? I am OK with adding this field on the condition it should only be used for locals. My concern with adding this field for global symbols is that indexers will start to emit malformed global symbols. I think it's a very desirable property that clients can implement a lot of functionality with an array of symbols (string[]) without having to load a global symbol table (record<string, SymbolInformation>). If we limit SymbolInformation.owner to locals then this problem is alleviated since it's cheap to load a local-only symbol table.

I think it should be unavoidable that clients need to implement a symbol parser to walk up the symbol hierarchy. The Java logic for the symbol parser is ~150 lines of code and we could aim to provide similar APIs for all language bindings.

@donsbot
Copy link
Contributor

donsbot commented May 25, 2023

I was imagining most indexers usually know the "containing" symbol.

  • method -> class/interface
  • param -> method
  • class -> module (e.g. python)

which lets us build navigation by containing scope.

There's an equivalent relationship for parent by "inheritance" (e.g. A extends B).

So In this case I was just interested in the relationship between a symbol and its parent (at the scip.Symbol level, not the textual names).

Obviously if we have something like a/b/c() , we can figure out that a/b might be the containing parent. But then I need to search for scip.Symbol that match? or is this existent relationship present somewhere else?

@donsbot
Copy link
Contributor

donsbot commented May 25, 2023

The specific product use case is not just symbol outlines on. page, but API listings. E.g. generating cargo-like docs for a language: E.g. looking up "class C" to see its API:

Class C

- method 1
- method 2
- method 3

Inherits:
- A::method4
- B::method5

olafurpg added a commit that referenced this issue May 25, 2023
* Add `SymbolInformation.display_name`

Towards #154

Previously, there was no reliable way to get the name of a symbol. You
could parse it from the symbol descriptor but that only worked for
global symbols (not locals) and was error-prone when the indexer chooses
to a special encoding in the symbol for some reason (for example,
all-lowercase because symbols are case-insensitive).

Now, clients can use `SymbolInformation.display_name` to reliably get
the name of a symbol to render in API docs or some other application.

* Add example for `display_name`.
olafurpg added a commit that referenced this issue May 25, 2023
Towards #154

Previously, SCIP clients had to parse markdown from the
`SymbolInformation.documentation` field to display only the signature
information of a symbol (for example, method parameters and return
type).

With the new `SymbolInformation.signature_documentation` field, indexers
can now emit more structured information about the signature of a
class/method including optional hyperlinks to the referenced symbols.
@olafurpg
Copy link
Member Author

@donsbot You can parse the owner from the symbol fields to support those use-cases (assuming they're global symbols). The Java symbol parser is 150 lines of code and we have had a decent experience porting this logic to other languages with ChatGPT https://github.com/sourcegraph/scip-java/blob/main/scip-semanticdb/src/main/java/com/sourcegraph/scip_semanticdb/SymbolDescriptor.java#L21

I am open to add SymbolInformation.owner (or SymbolInformation.enclosing_symbol) to support the same functionality for local symbols.

Would that be a satisfiable solution? I'm not 100% against allowing owner/enclosing_symbol for global symbols but I am concerned it increases the risk that indexers stop emitting well-structured symbols as I mentioned in my last comment.

@donsbot
Copy link
Contributor

donsbot commented May 26, 2023

Ok sounds good. Yes I will need it for locals (and can derive it for globals).

olafurpg added a commit that referenced this issue May 30, 2023
* Add `SymbolInformation.signature_documentation`

Towards #154

Previously, SCIP clients had to parse markdown from the
`SymbolInformation.documentation` field to display only the signature
information of a symbol (for example, method parameters and return
type).

With the new `SymbolInformation.signature_documentation` field, indexers
can now emit more structured information about the signature of a
class/method including optional hyperlinks to the referenced symbols.

* Made `signature_documentation` non-repeated
olafurpg added a commit that referenced this issue May 30, 2023
Towards #154

Previously, it was not possible to determine the enclosing symbol of a
local because the syntax for local symbols has no hierarchy (unlike
global symbols). This PR closes the gap by adding a field
`SymbolInformation.enclosing_symbol` to allow documenting the "parent"
or "owner" of a local symbol.
olafurpg added a commit that referenced this issue May 31, 2023
Towards #154

Previously, it was not possible to determine the enclosing symbol of a
local because the syntax for local symbols has no hierarchy (unlike
global symbols). This PR closes the gap by adding a field
`SymbolInformation.enclosing_symbol` to allow documenting the "parent"
or "owner" of a local symbol.
olafurpg added a commit that referenced this issue Jun 4, 2023
* Add `SymbolInformation.enclosing_symbol`

Towards #154

Previously, it was not possible to determine the enclosing symbol of a
local because the syntax for local symbols has no hierarchy (unlike
global symbols). This PR closes the gap by adding a field
`SymbolInformation.enclosing_symbol` to allow documenting the "parent"
or "owner" of a local symbol.

* Add example for enclosing_symbol

* Regenerate bindings, and gitignore Haskell output
@olafurpg
Copy link
Member Author

olafurpg commented Jun 4, 2023

Closing this as fixed after #164 since all the proposed fields have been added to the spec now. However(!), none of the existing SCIP indexers have been updated yet to emit the new information. Feel free to open an issue in this repository or each SCIP indexer repository if you'd like the indexer to add support for these new fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants