
refactor[parser]: remove ASTTokens #4364

Merged

Conversation

charles-cooper
Member

@charles-cooper charles-cooper commented Nov 18, 2024

What I did

remove `asttokens` from the parse machinery, since the method is buggy (see referenced bugs below) and slow (#4272). this brings down parse time (time spent in ast generation) by 40-70%.

fixes multiple tokenizer-related bugs, including the issues listed under "referenced bugs" in the commit message below.

How I did it

another way to adjust tokens

How to verify it

Commit message

this commit removes `asttokens` from the parse machinery, since the
method is buggy (see the referenced bugs below) and slow. this commit
brings down parse time (time spent in ast generation) by 40-70%.

the `mark_tokens()` machinery is replaced with a modified version of
the python `ast` module's `fix_missing_locations()` function, which
recurses through the AST and adds missing line info based on the
parent node.
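For reference, the approach can be sketched as follows. This is a simplified version of the stdlib's `ast.fix_missing_locations`, not the modified variant in this PR; the details here are illustrative only:

```python
import ast

def fix_missing_locations(node: ast.AST) -> ast.AST:
    """Recursively fill in missing line/column info from the parent node.

    Simplified sketch modeled on the stdlib's ast.fix_missing_locations;
    the modified version in this PR differs in the details.
    """
    def _fix(node, lineno, col_offset):
        if "lineno" in node._attributes:
            if not hasattr(node, "lineno"):
                node.lineno = lineno  # inherit from parent
            else:
                lineno = node.lineno  # propagate to children
        if "col_offset" in node._attributes:
            if not hasattr(node, "col_offset"):
                node.col_offset = col_offset
            else:
                col_offset = node.col_offset
        for child in ast.iter_child_nodes(node):
            _fix(child, lineno, col_offset)

    _fix(node, 1, 0)
    return node

# a synthetic node created with no location info inherits it from the tree
tree = ast.parse("x = 1")
tree.body.append(ast.Expr(value=ast.Constant(value=2)))
fix_missing_locations(tree)
print(tree.body[1].lineno, tree.body[1].col_offset)  # -> 1 0
```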

it also changes to a more consistent method for updating source
offsets that are modified by the `pre_parse` step, which fixes several
outstanding bugs with source location reporting.
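The idea behind the offset updates can be sketched as follows. The PR's pre-parser keeps a similar per-line table in `_col_adjustments` (visible in the diff later in this conversation); the helper names here are invented for illustration:

```python
from collections import defaultdict

# per-line cumulative column shift introduced by keyword rewriting
# (`record_replacement` / `to_original_col` are illustrative names)
_col_adjustments: dict[int, int] = defaultdict(int)

def record_replacement(lineno: int, original: str, replacement: str) -> None:
    # e.g. rewriting `interface` (9 chars) to `class` (5 chars) shifts
    # every following token on that line left by 4 columns
    _col_adjustments[lineno] += len(original) - len(replacement)

def to_original_col(lineno: int, col_offset: int) -> int:
    # map a column in the rewritten source back to the original source.
    # (simplification: a real implementation only shifts tokens that
    # appear after the replacement on the line)
    return col_offset + _col_adjustments[lineno]

record_replacement(3, "interface", "class")
print(to_original_col(3, 10))  # -> 14
```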

there were some cases where the line info fixup did not work; the
issues and corresponding workarounds are described as follows:

- some python AST nodes returned by `ast.parse()` are singletons, which
  we work around by deepcopying the AST before operating on it.
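  The singleton behavior is easy to demonstrate on CPython 3.8+, where expression-context nodes such as `ast.Load` are interned and shared (the assertions below are illustrative):

```python
import ast
import copy

# CPython (3.8+) interns expression-context nodes such as `ast.Load`,
# so the same object is shared across many positions in a tree
tree = ast.parse("a\nb\n")
ctx_a = tree.body[0].value.ctx  # the Load context of `a`
ctx_b = tree.body[1].value.ctx  # the Load context of `b`
assert ctx_a is ctx_b  # one shared singleton

# annotating location info on that singleton would leak into unrelated
# nodes; deepcopying the tree first yields fresh, safely mutable nodes
copied = copy.deepcopy(tree)
assert copied.body[0].value.ctx is not ctx_a
```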

- notably, there is an interaction between our AST annotation and
  `coverage.py` in the case of `USub`. in this commit we paper over the
  issue by simply always overriding line info for `USub` nodes. in the
  future, we should refactor `VyperNode` generation by bypassing the
  python AST annotation step entirely, which is a more proper fix to the
  problems encountered in this PR.

the `asttokens` package is not removed entirely since it still has a
limited usage inside of the natspec parser. we could remove it in a
future PR; for now it is out-of-scope.

referenced bugs:
- https://github.com/vyperlang/vyper/issues/2258
- https://github.com/vyperlang/vyper/issues/3059
- https://github.com/vyperlang/vyper/issues/3430
- https://github.com/vyperlang/vyper/issues/4139

Description for the changelog


@charles-cooper charles-cooper added this to the v0.4.1 milestone Jan 2, 2025
@charles-cooper charles-cooper added the release - must release blocker label Jan 2, 2025
@cyberthirst
Collaborator

i find the name `modification_offsets` confusing, it blends with the adjustments and imo doesn't capture the essence. can't we name it something like `original_types`?

@cyberthirst
Collaborator

can we generalize this code? it's pretty much the same thing 3 times

if string in VYPER_CLASS_TYPES and start[1] == 0:
    new_keyword = "class"
    toks = [TokenInfo(NAME, new_keyword, start, end, line)]
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    modification_offsets[newstart] = VYPER_CLASS_TYPES[string]
elif string in CUSTOM_STATEMENT_TYPES:
    new_keyword = "yield"
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    toks = [TokenInfo(NAME, new_keyword, start, end, line)]
    modification_offsets[newstart] = CUSTOM_STATEMENT_TYPES[string]
elif string in CUSTOM_EXPRESSION_TYPES:
    # a bit cursed technique to get untokenize to put
    # the new tokens in the right place so that modification_offsets
    # will work correctly.
    # (recommend comparing the result of parse with the
    # source code side by side to visualize the whitespace)
    new_keyword = "await"
    vyper_type = CUSTOM_EXPRESSION_TYPES[string]
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    # fixup for when `extcall/staticcall` follows `log`
    modification_offsets[newstart] = vyper_type
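One possible shape for the requested generalization is a table-driven helper. This is a hedged sketch: `rewrite_custom_keyword`, the table layout, and the toy registries below are invented here, and the real refactor may look different (in particular, the `extcall`/`staticcall`-after-`log` fixup in the third branch is not reproduced):

```python
from collections import defaultdict
from tokenize import NAME, TokenInfo

def rewrite_custom_keyword(
    string, start, end, line, newstart,
    modification_offsets, _col_adjustments, keyword_tables
):
    """Rewrite a vyper-specific keyword to a python keyword, if it matches.

    keyword_tables is a list of (registry, new_keyword, condition) triples;
    returns the replacement tokens, or None if nothing matched.
    """
    for registry, new_keyword, condition in keyword_tables:
        if string in registry and condition(start):
            toks = [TokenInfo(NAME, new_keyword, start, end, line)]
            # adjustments for following tokens on this line
            lineno, _ = start
            _col_adjustments[lineno] += len(string) - len(new_keyword)
            modification_offsets[newstart] = registry[string]
            return toks
    return None

# illustrative registries (the real ones live in vyper's pre-parser)
VYPER_CLASS_TYPES = {"interface": "interface"}
CUSTOM_STATEMENT_TYPES = {"log": "log"}
CUSTOM_EXPRESSION_TYPES = {"extcall": "extcall"}

KEYWORD_TABLES = [
    (VYPER_CLASS_TYPES, "class", lambda start: start[1] == 0),
    (CUSTOM_STATEMENT_TYPES, "yield", lambda start: True),
    (CUSTOM_EXPRESSION_TYPES, "await", lambda start: True),
]

mods: dict = {}
adjustments: dict = defaultdict(int)
toks = rewrite_custom_keyword(
    "interface", (3, 0), (3, 9), "interface Foo:\n", (3, 0),
    mods, adjustments, KEYWORD_TABLES,
)
print(toks[0].string, adjustments[3], mods[(3, 0)])  # -> class 4 interface
```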

@@ -166,6 +166,10 @@ class PreParser:
settings: Settings
# A mapping of class names to their original class types.
modification_offsets: dict[tuple[int, int], str]

# Magic adjustments
Collaborator


let's make a better comment hehe
