'IndexError: list index out of range' raised calling vyper.compiler.compile_code with 54 double quotes #2258

Closed
agroce opened this issue Dec 14, 2020 · 3 comments
Labels
bug Bug that shouldn't change language semantics when fixed.

agroce commented Dec 14, 2020

Version Information

  • vyper Version (output of vyper --version): 0.2.8+commit.d145722
  • OS: ubuntu 20.04 (on docker on osx)
  • Python Version (output of python --version): Python 3.8.5
  • Environment (output of pip freeze):
    asttokens==2.0.4
    Cython==0.29.21
    dbus-python==1.2.16
    pycryptodome==3.9.9
    Pygments==2.3.1
    PyGObject==3.36.0
    python-afl==0.7.3
    PyYAML==5.3.1
    semantic-version==2.8.5
    six==1.15.0
    vyper==0.2.8

What's your issue about?

>>> import vyper.compiler
>>> vyper.compiler.compile_code('\r[[]]\ndef _(e:[],l:[]):\n    """"""""""""""""""""""""""""""""""""""""""""""""""""""\n    f.n()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 151, in compile_code
    return compile_codes(
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/opcodes.py", line 222, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 110, in compile_codes
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 105, in compile_codes
    out[contract_name][output_format] = OUTPUT_FORMATS[output_format](compiler_data)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/output.py", line 138, in build_bytecode_output
    return f"0x{compiler_data.bytecode.hex()}"
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 126, in bytecode
    self._bytecode = generate_bytecode(self.assembly)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 114, in assembly
    self._assembly = generate_assembly(self.lll_nodes)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 102, in lll_nodes
    self._gen_lll()
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 97, in _gen_lll
    self._lll_nodes, self._lll_runtime = generate_lll_nodes(self.global_ctx)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 90, in global_ctx
    self.vyper_module_folded, self.interface_codes
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 81, in vyper_module_folded
    self._vyper_module_folded = generate_folded_ast(self.vyper_module)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 74, in vyper_module
    self._vyper_module = generate_ast(self.source_code, self.source_id, self.contract_name)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 154, in generate_ast
    return vy_ast.parse_to_ast(source_code, source_id, contract_name)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/ast/utils.py", line 36, in parse_to_ast
    annotate_python_ast(py_ast, source_code, class_types, source_id, contract_name)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/ast/annotation.py", line 281, in annotate_python_ast
    tokens = asttokens.ASTTokens(source_code, tree=parsed_ast)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/asttokens.py", line 65, in __init__
    self.mark_tokens(self._tree)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/asttokens.py", line 76, in mark_tokens
    MarkTokens(self).visit_tree(root_node)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/mark_tokens.py", line 49, in visit_tree
    util.visit_tree(node, self._visit_before_children, self._visit_after_children)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/util.py", line 199, in visit_tree
    ret = postvisit(current, par_value, value)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/mark_tokens.py", line 92, in _visit_after_children
    nfirst, nlast = self._methods.get(self, node.__class__)(node, first, last)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/mark_tokens.py", line 189, in handle_attr
    name = self._code.next_token(dot)
  File "/usr/local/lib/python3.8/dist-packages/asttokens-2.0.4-py3.8.egg/asttokens/asttokens.py", line 141, in next_token
    while is_non_coding_token(self._tokens[i].type):
IndexError: list index out of range
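
The last frame above is a loop that advances an index over a token list with no bounds check. A minimal sketch of that failure pattern next to a guarded variant follows. The function names and the token-type lists are hypothetical, and this is not asttokens' code or a claim about the root cause; it only illustrates how such a scan can run past the end of the list.

```python
import tokenize

# Token types that carry no code (an assumption made for this sketch).
NON_CODING = {tokenize.NL, tokenize.COMMENT, tokenize.ENCODING}


def next_coding_index(token_types, start):
    """Unguarded scan: raises IndexError if only non-coding tokens remain."""
    i = start + 1
    while token_types[i] in NON_CODING:
        i += 1
    return i


def next_coding_index_checked(token_types, start):
    """Guarded scan: returns None instead of indexing past the end."""
    i = start + 1
    while i < len(token_types) and token_types[i] in NON_CODING:
        i += 1
    return i if i < len(token_types) else None


if __name__ == "__main__":
    trailing_noise = [tokenize.OP, tokenize.COMMENT, tokenize.NL]
    print(next_coding_index_checked(trailing_noise, 0))
    try:
        next_coding_index(trailing_noise, 0)
    except IndexError as exc:
        print("unguarded scan failed:", exc)
```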

Note that this (like all fuzzing bugs I'll report) is minimized: removing one of the quotes makes it properly raise a SyntaxException:

>>> vyper.compiler.compile_code('\r[[]]\ndef _(e:[],l:[]):\n    """""""""""""""""""""""""""""""""""""""""""""""""""""\n    f.n()')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/ast/pre_parser.py", line 112, in pre_parse
    token_list = list(tokenize(io.BytesIO(code_bytes).readline))
  File "/usr/lib/python3.8/tokenize.py", line 461, in _tokenize
    raise TokenError("EOF in multi-line string", strstart)
tokenize.TokenError: ('EOF in multi-line string', (3, 52))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 151, in compile_code
    return compile_codes(
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/opcodes.py", line 222, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 110, in compile_codes
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 105, in compile_codes
    out[contract_name][output_format] = OUTPUT_FORMATS[output_format](compiler_data)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/output.py", line 138, in build_bytecode_output
    return f"0x{compiler_data.bytecode.hex()}"
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 126, in bytecode
    self._bytecode = generate_bytecode(self.assembly)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 114, in assembly
    self._assembly = generate_assembly(self.lll_nodes)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 102, in lll_nodes
    self._gen_lll()
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 97, in _gen_lll
    self._lll_nodes, self._lll_runtime = generate_lll_nodes(self.global_ctx)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 90, in global_ctx
    self.vyper_module_folded, self.interface_codes
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 81, in vyper_module_folded
    self._vyper_module_folded = generate_folded_ast(self.vyper_module)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 74, in vyper_module
    self._vyper_module = generate_ast(self.source_code, self.source_id, self.contract_name)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 154, in generate_ast
    return vy_ast.parse_to_ast(source_code, source_id, contract_name)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/ast/utils.py", line 30, in parse_to_ast
    class_types, reformatted_code = pre_parse(source_code)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/ast/pre_parser.py", line 159, in pre_parse
    raise SyntaxException(e.args[0], code, e.args[1][0], e.args[1][1]) from e
vyper.exceptions.SyntaxException: EOF in multi-line string
  line 3:52 
       2 [[]]
  ---> 3 def _(e:[],l:[]):
  -----------------------------------------------------------^
       4     """""""""""""""""""""""""""""""""""""""""""""""""""""
@fubuloubu fubuloubu added the bug Bug that shouldn't change language semantics when fixed. label Dec 15, 2020

agroce commented Dec 23, 2020

If helpful, here's what seems to be clearly a variant (again, exactly 54 quotes in the minimal failure-inducing input):

>>> vyper.compiler.compile_code('implements:0\n""""""""""""""""""""""""""""""""""""""""""""""""""""""')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 151, in compile_code
    return compile_codes(
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/opcodes.py", line 222, in _wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 110, in compile_codes
    raise exc
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/__init__.py", line 105, in compile_codes
    out[contract_name][output_format] = OUTPUT_FORMATS[output_format](compiler_data)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/output.py", line 138, in build_bytecode_output
    return f"0x{compiler_data.bytecode.hex()}"
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 126, in bytecode
    self._bytecode = generate_bytecode(self.assembly)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 114, in assembly
    self._assembly = generate_assembly(self.lll_nodes)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 102, in lll_nodes
    self._gen_lll()
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 97, in _gen_lll
    self._lll_nodes, self._lll_runtime = generate_lll_nodes(self.global_ctx)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 90, in global_ctx
    self.vyper_module_folded, self.interface_codes
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/compiler/phases.py", line 82, in vyper_module_folded
    validate_semantics(self._vyper_module_folded, self.interface_codes)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/context/validation/__init__.py", line 10, in validate_semantics
    add_module_namespace(vyper_ast, interface_codes)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/context/validation/module.py", line 36, in add_module_namespace
    ModuleNodeVisitor(vy_module, interface_codes, namespace)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/context/validation/module.py", line 69, in __init__
    self.visit(node)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/context/validation/base.py", line 19, in visit
    visitor_fn(node)
  File "/usr/local/lib/python3.8/dist-packages/vyper-0.2.8-py3.8.egg/vyper/context/validation/module.py", line 134, in visit_AnnAssign
    interface_name = node.annotation.id
AttributeError: 'Int' object has no attribute 'id'
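
The final frame reads `node.annotation.id`, which only works when the annotation is a plain name. The sketch below reproduces the same shape with Python's own `ast` module rather than Vyper's AST classes, so the node types differ (`Constant` here, `Int` in Vyper). The helper name is hypothetical; it shows one way to reject a non-name annotation with a clear error instead of an AttributeError.

```python
import ast


def interface_name_from_annotation(source: str) -> str:
    """Return the annotation's identifier for a statement like `implements: ERC20`."""
    node = ast.parse(source).body[0]
    if not isinstance(node, ast.AnnAssign):
        raise ValueError("expected an annotated assignment")
    annotation = node.annotation
    # `implements: 0` parses fine, but the annotation is a constant with no `.id`.
    if not isinstance(annotation, ast.Name):
        raise ValueError(
            f"expected an interface name, got {type(annotation).__name__}"
        )
    return annotation.id


if __name__ == "__main__":
    print(interface_name_from_annotation("implements: ERC20"))
    try:
        interface_name_from_annotation("implements: 0")
    except ValueError as exc:
        print("rejected:", exc)
```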

agroce commented Dec 23, 2020

(Just trying to compile 54 quotes on their own, on the other hand, works fine, so the trigger is the context, not just that syntactic construction.)
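
For reference, 54 is a multiple of 6, so Python's tokenizer reads the run as nine empty triple-quoted strings. With 53 quotes, the last five open a triple-quoted string that is never closed, which matches the "EOF in multi-line string" error at column 52 (4 spaces of indentation plus 48 quotes) in the earlier traceback. A small sketch using the standard `tokenize` module to count the string tokens; the helper name is made up for illustration.

```python
import io
import tokenize


def count_string_tokens(n_quotes: int):
    """Tokenize a line of `n_quotes` double quotes.

    Returns the number of STRING tokens, or None if tokenization fails.
    """
    source = '"' * n_quotes + "\n"
    readline = io.BytesIO(source.encode("utf-8")).readline
    try:
        tokens = list(tokenize.tokenize(readline))
    except (tokenize.TokenError, SyntaxError):
        # The exception type depends on the Python version.
        return None
    return sum(1 for tok in tokens if tok.type == tokenize.STRING)


if __name__ == "__main__":
    print(count_string_tokens(54))
    print(count_string_tokens(53))
```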

charles-cooper added a commit that referenced this issue Jan 12, 2025
this commit removes `asttokens` from the parse machinery, since the
method is buggy (see below bugs) and slow. this commit brings down
parse time (time spent in ast generation) by 40-70%.

the `mark_tokens()` machinery is replaced with a modified version of
the `fix_missing_locations()` function from python's `ast` module, which
recurses through the AST and adds missing line info based on the parent node.

it also changes to a more consistent method for updating source
offsets that are modified by the `pre_parse` step, which fixes several
outstanding bugs with source location reporting.

there were some exceptions to the line info fixup working; the issues
and corresponding workarounds are as follows:

- some python AST nodes returned by `ast.parse()` are singletons, which
  we work around by deepcopying the AST before operating on it.

- notably, there is an interaction between our AST annotation and
  `coverage.py` in the case of `USub`. in this commit we paper over the
  issue by simply always overriding line info for `USub` nodes. in the
  future, we should refactor `VyperNode` generation by bypassing the
  python AST annotation step entirely, which is a more proper fix to the
  problems encountered in this PR.

the `asttokens` package is not removed entirely since it still has a
limited usage inside of the natspec parser. we could remove it in a
future PR; for now it is out-of-scope.

referenced bugs:
- #2258
- #3059
- #3430
- #4139
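
The commit message above describes filling in missing line info from the parent node. A minimal sketch of that idea using Python's `ast` module follows. It is close to what the standard `ast.fix_missing_locations()` does for `lineno` and `col_offset`, and it is not Vyper's modified version; the function name is hypothetical.

```python
import ast


def fill_missing_locations(node, lineno=1, col_offset=0):
    """Give each node that lacks a position the position of its nearest
    positioned ancestor, walking the tree top-down."""
    if "lineno" in node._attributes:
        if getattr(node, "lineno", None) is None:
            node.lineno = lineno
        else:
            lineno = node.lineno
    if "col_offset" in node._attributes:
        if getattr(node, "col_offset", None) is None:
            node.col_offset = col_offset
        else:
            col_offset = node.col_offset
    for child in ast.iter_child_nodes(node):
        fill_missing_locations(child, lineno, col_offset)
    return node


if __name__ == "__main__":
    # Hand-built nodes have no position until one is assigned.
    name = ast.Name(id="x", ctx=ast.Load())
    stmt = ast.Expr(value=name)
    stmt.lineno, stmt.col_offset = 3, 4
    fill_missing_locations(stmt)
    print(name.lineno, name.col_offset)
```

Passing the parent's position down the recursion keeps this a single pass over the tree, with no token stream to keep in sync with the AST.
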
@charles-cooper
Member

fixed in #4364
