
refactor[parser]: remove ASTTokens #4364

Merged

Conversation

charles-cooper
Member

@charles-cooper charles-cooper commented Nov 18, 2024

What I did

remove `asttokens` from the parse machinery, since the method is buggy (see referenced bugs below) and slow (#4272). this brings down parse time (time spent in ast generation) by 40-70%.

fixes multiple tokenizer-related bugs, including the issues listed under "referenced bugs" in the commit message below.

How I did it

another way to adjust tokens

How to verify it

Commit message

this commit removes `asttokens` from the parse machinery, since the
method is buggy (see the referenced bugs below) and slow. this commit
brings down parse time (time spent in ast generation) by 40-70%.

the `mark_tokens()` machinery is replaced with a modified version of
the python `ast` module's `fix_missing_locations()` function, which
recurses through the AST and adds missing line info based on the
parent node.
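For reference, the approach can be sketched as follows. This is a simplified version of the stdlib's `ast.fix_missing_locations`, not the modified variant in this PR; the details here are illustrative only:

```python
import ast

def fix_missing_locations(node: ast.AST) -> ast.AST:
    """Recursively fill in missing line/column info from the parent node.

    Simplified sketch modeled on the stdlib's ast.fix_missing_locations;
    the modified version in this PR differs in the details.
    """
    def _fix(node, lineno, col_offset):
        if "lineno" in node._attributes:
            if not hasattr(node, "lineno"):
                node.lineno = lineno  # inherit from parent
            else:
                lineno = node.lineno  # propagate to children
        if "col_offset" in node._attributes:
            if not hasattr(node, "col_offset"):
                node.col_offset = col_offset
            else:
                col_offset = node.col_offset
        for child in ast.iter_child_nodes(node):
            _fix(child, lineno, col_offset)

    _fix(node, 1, 0)
    return node

# a synthetic node created with no location info inherits it from the tree
tree = ast.parse("x = 1")
tree.body.append(ast.Expr(value=ast.Constant(value=2)))
fix_missing_locations(tree)
print(tree.body[1].lineno, tree.body[1].col_offset)  # -> 1 0
```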

it also changes to a more consistent method for updating source
offsets that are modified by the `pre_parse` step, which fixes several
outstanding bugs with source location reporting.
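The idea behind the offset updates can be sketched as follows. The PR's pre-parser keeps a similar per-line table in `_col_adjustments` (visible in the diff later in this conversation); the helper names here are invented for illustration:

```python
from collections import defaultdict

# per-line cumulative column shift introduced by keyword rewriting
# (`record_replacement` / `to_original_col` are illustrative names)
_col_adjustments: dict[int, int] = defaultdict(int)

def record_replacement(lineno: int, original: str, replacement: str) -> None:
    # e.g. rewriting `interface` (9 chars) to `class` (5 chars) shifts
    # every following token on that line left by 4 columns
    _col_adjustments[lineno] += len(original) - len(replacement)

def to_original_col(lineno: int, col_offset: int) -> int:
    # map a column in the rewritten source back to the original source.
    # (simplification: a real implementation only shifts tokens that
    # appear after the replacement on the line)
    return col_offset + _col_adjustments[lineno]

record_replacement(3, "interface", "class")
print(to_original_col(3, 10))  # -> 14
```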

there were some cases where the line info fixup did not work; the
issues and corresponding workarounds are described as follows:

- some python AST nodes returned by `ast.parse()` are singletons, which
  we work around by deepcopying the AST before operating on it.
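  The singleton behavior is easy to demonstrate on CPython 3.8+, where expression-context nodes such as `ast.Load` are interned and shared (the assertions below are illustrative):

```python
import ast
import copy

# CPython (3.8+) interns expression-context nodes such as `ast.Load`,
# so the same object is shared across many positions in a tree
tree = ast.parse("a\nb\n")
ctx_a = tree.body[0].value.ctx  # the Load context of `a`
ctx_b = tree.body[1].value.ctx  # the Load context of `b`
assert ctx_a is ctx_b  # one shared singleton

# annotating location info on that singleton would leak into unrelated
# nodes; deepcopying the tree first yields fresh, safely mutable nodes
copied = copy.deepcopy(tree)
assert copied.body[0].value.ctx is not ctx_a
```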

- notably, there is an interaction between our AST annotation and
  `coverage.py` in the case of `USub`. in this commit we paper over the
  issue by simply always overriding line info for `USub` nodes. in the
  future, we should refactor `VyperNode` generation by bypassing the
  python AST annotation step entirely, which is a more proper fix to the
  problems encountered in this PR.

the `asttokens` package is not removed entirely since it still has a
limited usage inside of the natspec parser. we could remove it in a
future PR; for now it is out-of-scope.

referenced bugs:
- https://github.com/vyperlang/vyper/issues/2258
- https://github.com/vyperlang/vyper/issues/3059
- https://github.com/vyperlang/vyper/issues/3430
- https://github.com/vyperlang/vyper/issues/4139

Description for the changelog


@charles-cooper charles-cooper added this to the v0.4.1 milestone Jan 2, 2025
@charles-cooper charles-cooper added the release - must release blocker label Jan 2, 2025
@cyberthirst
Collaborator

i find the name `modification_offsets` confusing, it blends with the adjustments and imo doesn't capture the essence. can't we name it something like `original_types`?

@cyberthirst
Collaborator

can we generalize this code? it's pretty much the same thing 3 times

if string in VYPER_CLASS_TYPES and start[1] == 0:
    new_keyword = "class"
    toks = [TokenInfo(NAME, new_keyword, start, end, line)]
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    modification_offsets[newstart] = VYPER_CLASS_TYPES[string]
elif string in CUSTOM_STATEMENT_TYPES:
    new_keyword = "yield"
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    toks = [TokenInfo(NAME, new_keyword, start, end, line)]
    modification_offsets[newstart] = CUSTOM_STATEMENT_TYPES[string]
elif string in CUSTOM_EXPRESSION_TYPES:
    # a bit cursed technique to get untokenize to put
    # the new tokens in the right place so that modification_offsets
    # will work correctly.
    # (recommend comparing the result of parse with the
    # source code side by side to visualize the whitespace)
    new_keyword = "await"
    vyper_type = CUSTOM_EXPRESSION_TYPES[string]
    adjustment = len(string) - len(new_keyword)
    # adjustments for following tokens
    lineno, col = start
    _col_adjustments[lineno] += adjustment
    # fixup for when `extcall/staticcall` follows `log`
    modification_offsets[newstart] = vyper_type
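One possible shape for the requested generalization is a table-driven helper. This is a hedged sketch: `rewrite_custom_keyword`, the table layout, and the toy registries below are invented here, and the real refactor may look different (in particular, the `extcall`/`staticcall`-after-`log` fixup in the third branch is not reproduced):

```python
from collections import defaultdict
from tokenize import NAME, TokenInfo

def rewrite_custom_keyword(
    string, start, end, line, newstart,
    modification_offsets, _col_adjustments, keyword_tables
):
    """Rewrite a vyper-specific keyword to a python keyword, if it matches.

    keyword_tables is a list of (registry, new_keyword, condition) triples;
    returns the replacement tokens, or None if nothing matched.
    """
    for registry, new_keyword, condition in keyword_tables:
        if string in registry and condition(start):
            toks = [TokenInfo(NAME, new_keyword, start, end, line)]
            # adjustments for following tokens on this line
            lineno, _ = start
            _col_adjustments[lineno] += len(string) - len(new_keyword)
            modification_offsets[newstart] = registry[string]
            return toks
    return None

# illustrative registries (the real ones live in vyper's pre-parser)
VYPER_CLASS_TYPES = {"interface": "interface"}
CUSTOM_STATEMENT_TYPES = {"log": "log"}
CUSTOM_EXPRESSION_TYPES = {"extcall": "extcall"}

KEYWORD_TABLES = [
    (VYPER_CLASS_TYPES, "class", lambda start: start[1] == 0),
    (CUSTOM_STATEMENT_TYPES, "yield", lambda start: True),
    (CUSTOM_EXPRESSION_TYPES, "await", lambda start: True),
]

mods: dict = {}
adjustments: dict = defaultdict(int)
toks = rewrite_custom_keyword(
    "interface", (3, 0), (3, 9), "interface Foo:\n", (3, 0),
    mods, adjustments, KEYWORD_TABLES,
)
print(toks[0].string, adjustments[3], mods[(3, 0)])  # -> class 4 interface
```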

@@ -166,6 +166,10 @@ class PreParser:
settings: Settings
# A mapping of class names to their original class types.
modification_offsets: dict[tuple[int, int], str]

# Magic adjustments
Collaborator


let's make a better comment hehe
