hclsyntax: Introduce token-based parse methods #383
Conversation
Force-pushed from 45fbdd6 to 162e40d
After testing it a bit more I think it would be sensible to add some checks to the new methods.
This change introduces new methods to allow a two-phased approach where tokenization is done prior to parsing.
Force-pushed from 162e40d to fa7c453
I added some checks as mentioned above.
Sorry for the delay in looking at this. This looks good to me! I left a few minor things inline, but overall this looks great.
One thing I was wondering about is that in the `terraform-lsp` PR you linked to it only seems to be using `ParseBodyFromTokens` so far, and the other ones surprised me a little bit because we don't really have any API surface here for extracting an isolated set of tokens for a slice of a file's token stream. I don't mind including them if you have a use-case for them, but I'd like to understand more about how they will be used, since I think this is the first time we've introduced the idea of parsing "fragments" of input as a public API, and the ability to do that may constrain what refactorings are possible in the internals of the parser in future.
```go
rng := bi.Range()
diags = append(diags, &hcl.Diagnostic{
	Severity: hcl.DiagError,
	Summary:  fmt.Sprintf("Unexpected definition (%T)", bi),
```
Usually we've written errors like this with an end-user target audience in mind, so that a caller can use a call to this function to represent the assertion "the user should have written an attribute" and automatically get a good error message if the user didn't provide one.
With that said, I'm not totally sure about that framing for these new functions. It could well be that we consider it a programming error on the part of the caller to pass in tokens representing a block here, in which case I suppose this could be okay, although in cases like that HCL has typically used `panic` rather than error diagnostics so far. 🤔
As a compromise, what do you think about taking the text of the message HCL would normally return if the schema calls for attribute syntax but the user wrote a block, and reducing it to fit what we can determine here without a schema? For example:
```
Error: Unsupported block type

Blocks of type "example" are not expected here.
```
(To do this would, I realize, require type-asserting the `bi` to `(*Block)` first, so I guess there would still need to be a generic fallback for the should-never-happen case of it not being a block, but that could presumably be a panic because it would only ever happen if there were a bug inside `Parser.ParseBodyItem`.)
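A minimal sketch of that compromise, slotting into the quoted snippet above and reusing its `rng` variable; this is illustrative only, not code from the PR:

```go
if block, ok := bi.(*Block); ok {
	// Reuse the end-user-oriented wording HCL produces when a schema
	// calls for an attribute but the user wrote a block.
	diags = append(diags, &hcl.Diagnostic{
		Severity: hcl.DiagError,
		Summary:  "Unsupported block type",
		Detail:   fmt.Sprintf("Blocks of type %q are not expected here.", block.Type),
		Subject:  &rng,
	})
} else {
	// A body item that is neither an attribute nor a block would
	// indicate a bug inside Parser.ParseBodyItem, so panic.
	panic(fmt.Sprintf("unexpected body item type %T", bi))
}
```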
(I have similar feedback for the opposite case of `ParseBlockFromTokens` above, but I won't write it all out again. 😄)
```go
peeker := newPeeker(tokens, false)
```
```go
// Sanity checks to avoid surprises
```
Let's say "Initial checks" instead here, for inclusiveness. ❤️
Regarding the question above: TL;DR, the language server currently uses these methods, mostly as a form of optimization. I will try to share some more context below.

The main reason for doing tokenization in a separate step is that the language server needs to know where exactly in a parsed token stream a given position (e.g. the user's cursor) falls. This can be implemented in a few ways.
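For illustration, one such way is a simple scan over the retained token stream; this is a hypothetical helper, not code from this PR:

```go
import (
	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/hcl/v2/hclsyntax"
)

// tokenAtPos returns the token whose source range contains pos
// (e.g. the user's cursor), or nil if pos falls outside every
// token in the retained stream.
func tokenAtPos(tokens hclsyntax.Tokens, pos hcl.Pos) *hclsyntax.Token {
	for i := range tokens {
		if tokens[i].Range.ContainsPos(pos) {
			return &tokens[i]
		}
	}
	return nil
}
```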
With that said, I understand your concerns and I admit this may be premature optimization which doesn't come cheap from an API perspective. Perhaps a single array is just fine. I'm currently working on a proposal for a new parser for the language server, which also involves looking into this problem more deeply. I have not yet figured out how to provide accurate completion data within …
I'm going to close this as I found a different (more elegant?) way of resolving the same problem. The new API was just published in the form of …
This change introduces new methods to allow a two-phased approach where tokenization is done prior to parsing.
This is specifically useful in the context of a language server, so it can pass around a single interpretation of HCL blocks (tokens) instead of passing around both a block and its tokens.
Related: hashicorp/terraform-ls#125
New methods
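A minimal sketch of the intended two-phase flow, assuming the proposed `ParseBodyFromTokens` accepts the token stream produced by the existing `hclsyntax.LexConfig` (the exact signature is not shown in this thread):

```go
package main

import (
	"fmt"

	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/hcl/v2/hclsyntax"
)

func main() {
	src := []byte(`name = "example"`)

	// Phase 1: tokenize once. A language server can retain these
	// tokens for position lookups and other analyses.
	tokens, diags := hclsyntax.LexConfig(src, "example.hcl", hcl.InitialPos)
	if diags.HasErrors() {
		fmt.Println(diags)
		return
	}

	// Phase 2: parse the same tokens into a body, avoiding a second
	// lexing pass. (Assumed signature for the proposed method.)
	body, diags := hclsyntax.ParseBodyFromTokens(tokens)
	_ = body
	if diags.HasErrors() {
		fmt.Println(diags)
	}
}
```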