feat: support address code attribute #2583
Codecov Report
@@            Coverage Diff             @@
##           master    #2583      +/-   ##
==========================================
+ Coverage   86.26%   86.61%   +0.34%
==========================================
  Files          90       91       +1
  Lines        9351     9413      +62
  Branches     2370     2354      -16
==========================================
+ Hits         8067     8153      +86
+ Misses        796      775      -21
+ Partials      488      485       -3
138845b to a35c233
The CI failure could be due to some flakiness of |
Looking good!
Will have to give a more thorough review later.
@hi-ogawa I think the approach looks good, I can review in more detail later. I like your use of Regarding the bounds check
The bounds check is used for safety reasons. Calldata is also zero-filled when the length exceeds the actual calldata, but we check the length anyway. I think for consistency, we should include the bounds check here.
@@ -96,6 +97,23 @@ def visit_For(self, node):
        self.expr_visitor.visit(node.iter)


def validate_address_code_attribute(node: vy_ast.Attribute, type_: BaseTypeDefinition) -> None:
I think this functionality is more appropriate in vyper/semantics/validation/local.py. It would be very clean if this functionality were put in the following function:
vyper/vyper/semantics/validation/local.py, lines 182 to 191 in 2f1584b
def visit_Attribute(self, node):
    if node.get("value.id") == "msg" and node.attr == "data":
        parent = node.get_ancestor()
        if parent.get("func.id") not in ("slice", "len"):
            raise SyntaxException(
                "msg.data is only allowed inside of the slice or len functions",
                node.node_source_code,
                node.lineno,
                node.col_offset,
            )
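A parent-based check of the same shape could cover `<address>.code` as well. The sketch below is only an illustration: it uses Python's standard `ast` module instead of `vy_ast`, and `CodeAttributeChecker` and `check` are hypothetical names, not part of Vyper. It flags any `.code` attribute that is not the first argument of a three-argument `slice(...)` call with an integer-literal length.

```python
import ast


class CodeAttributeChecker(ast.NodeVisitor):
    """Toy check: `<expr>.code` may only appear as the first argument of
    `slice(...)`, and the `length` argument must be an integer literal."""

    def __init__(self):
        self.errors = []    # (lineno, col_offset) of each offending node
        self._parents = []  # stack of ancestors of the node being visited

    def visit(self, node):
        # Track ancestors so visit_Attribute can inspect its parent.
        self._parents.append(node)
        try:
            super().visit(node)
        finally:
            self._parents.pop()

    def visit_Attribute(self, node):
        if node.attr == "code":
            # self._parents[-1] is `node` itself, so [-2] is its parent.
            parent = self._parents[-2] if len(self._parents) > 1 else None
            if not self._is_valid_slice(parent, node):
                self.errors.append((node.lineno, node.col_offset))
        self.generic_visit(node)

    @staticmethod
    def _is_valid_slice(parent, node):
        return (
            isinstance(parent, ast.Call)
            and isinstance(parent.func, ast.Name)
            and parent.func.id == "slice"
            and len(parent.args) == 3
            and parent.args[0] is node
            and isinstance(parent.args[2], ast.Constant)
            and isinstance(parent.args[2].value, int)
        )


def check(source):
    checker = CodeAttributeChecker()
    checker.visit(ast.parse(source))
    return checker.errors
```

Here `check("x = slice(a.code, 0, 10)")` reports no errors, while a bare `y = a.code` or a non-literal length is reported with its source position.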
I moved the validation from ExpressionAnnotationVisitor to FunctionNodeVisitor (5a5798d).
It seems I have a bug whichever of ExpressionAnnotationVisitor or FunctionNodeVisitor I choose.
With ExpressionAnnotationVisitor, I think some attribute nodes won't get traversed and will miss the validation.
With FunctionNodeVisitor, as shown on CI, get_exact_type_from_node will fail on the second visit, since it doesn't set up the correct namespaces, e.g. for a for loop.
vyper/vyper/semantics/validation/local.py, lines 174 to 180 in 2f1584b
def visit(self, node):
    super().visit(node)
    self.annotation_visitor.visit(node)
    attr_descendants = node.get_descendants(vy_ast.Attribute)
    for attr_descendant in attr_descendants:
        self.visit(attr_descendant)
As @charles-cooper suggested, it surely makes sense to do the validation in FunctionNodeVisitor, but I believe I need to make some changes. Let me think about it.
I don't think any typechecking should be done in the annotation visitor, since that merely propagates types to nodes. I'm a bit confused why your code wouldn't work in FunctionNodeVisitor. Is there some bug regarding typechecking in loops?
I haven't looked into it but I was just in the process of fixing some typechecking for list literals. cf. #2584 (comment), #2584 (comment) and #2587
Thanks for letting me know about that! I can get back to this tomorrow, and I will check whether it works. Sorry for the delay.
Regarding the reason why FunctionNodeVisitor doesn't work, I think it's because of the structure around this:
def visit(self, node):
    super().visit(node)  # <<<=== 1st visit (possibly `Attribute` won't be visited this time)
    self.annotation_visitor.visit(node)
    attr_descendants = node.get_descendants(vy_ast.Attribute)
    for attr_descendant in attr_descendants:
        self.visit(attr_descendant)  # <<<=== 2nd visit (I think, this is to make sure to visit all the `Attribute` nodes missed on 1st visit above)
The second visit of the Attribute doesn't set up the namespace properly. For example, from CI,
https://github.com/vyperlang/vyper/runs/4707801642?check_suite_focus=true#step:5:6264
for i in ...:
    ... _baz[i].x ...  # <<<=== try to visit this `Attribute` node
Here, the visitor reaches the Attribute node _baz[i].x without setting up the namespace for the for-loop variable i, and thus get_exact_type_from_node on the value node _baz[i] fails.
I haven't actually debugged it, so I might be wrong, but this is my intuition about the issue.
@charles-cooper Sorry for the break, and thanks for your patience.
I pushed commit f13e4f7 with the fix. I modified FunctionNodeVisitor (with FunctionNodeExpressionVisitor) to visit all the nodes down to the leaf nodes in a single pass.
I found this a more natural approach than grabbing the vy_ast.Attribute descendant nodes and validating certain properties.
I might be missing some reason why it wasn't implemented this way, so I would appreciate it if you could review this change for potential drawbacks.
Thanks a lot!
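The namespace problem and the single-pass alternative discussed in this thread can be illustrated with a toy model. Everything in this sketch is hypothetical (the node classes, `check_names`, `two_pass`, `single_pass`); it does not reproduce Vyper's visitors, only the idea that a node checked outside its enclosing scope cannot resolve the loop variable.

```python
from dataclasses import dataclass


@dataclass
class Name:
    id: str


@dataclass
class Subscript:
    value: object
    index: object


@dataclass
class Attribute:
    value: object
    attr: str


@dataclass
class For:
    target: str
    body: list


def check_names(node, namespace):
    """Stand-in for a type lookup: fails if a name is not in scope."""
    if isinstance(node, Name):
        if node.id not in namespace:
            raise KeyError(node.id)
    elif isinstance(node, Subscript):
        check_names(node.value, namespace)
        check_names(node.index, namespace)
    elif isinstance(node, Attribute):
        check_names(node.value, namespace)


def attributes_in(node):
    """Collect Attribute nodes anywhere below `node`, ignoring scopes."""
    if isinstance(node, For):
        return [a for stmt in node.body for a in attributes_in(stmt)]
    if isinstance(node, Attribute):
        return [node] + attributes_in(node.value)
    if isinstance(node, Subscript):
        return attributes_in(node.value) + attributes_in(node.index)
    return []


def two_pass(loop, namespace):
    # Models only the second pass: the loop variable was scoped to the
    # first pass, so it is missing when the attributes are revisited.
    for attr in attributes_in(loop):
        check_names(attr, namespace)


def single_pass(node, namespace):
    # Validate each node while the scope that encloses it is still active.
    if isinstance(node, For):
        inner = {**namespace, node.target: "int128"}
        for stmt in node.body:
            single_pass(stmt, inner)
    else:
        check_names(node, namespace)


# for i in ...: _baz[i].x
loop = For("i", [Attribute(Subscript(Name("_baz"), Name("i")), "x")])
module_scope = {"_baz": "Baz[3]"}
```

With these definitions, `single_pass(loop, module_scope)` completes, whereas `two_pass(loop, module_scope)` raises `KeyError` for `i`, which mirrors the failure seen on CI.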
I think your approach is natural and makes sense. It probably wasn't implemented that way because we didn't really have context-dependent checks like that. The only thing I would ask is to change the name of the class to indicate that it is private to this module - maybe _LocalExprVisitor
@charles-cooper @fubuloubu Thanks for the feedback! I'll address your comments as soon as I can.
May I ask for details on these "safety reasons"? Probably I'm missing some intuition and use cases, but I cannot see what could be "unsafe" when users get a zero-filled slice.
Interpreting filler bytes as "true zero" is a frequent source of errors
Ah, that indeed makes sense to me. Thanks for the clarification!
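To make the concern concrete, here is a minimal Python model of the two behaviours. It is a sketch, not EVM or Vyper code: `extcodecopy` and `checked_slice` are hypothetical helpers that only mimic zero-fill versus a bounds check on a byte string.

```python
def extcodecopy(code: bytes, start: int, length: int) -> bytes:
    """Model of the copy semantics: bytes past the end of `code` read as zero."""
    chunk = code[start:start + length]
    return chunk + b"\x00" * (length - len(chunk))


def checked_slice(code: bytes, start: int, length: int) -> bytes:
    """Same copy, but reject reads past the end instead of padding."""
    if start + length > len(code):
        raise ValueError("slice out of bounds")
    return extcodecopy(code, start, length)


short_code = bytes.fromhex("6001")       # 2 bytes of code
padded_code = bytes.fromhex("60010000")  # 4 bytes, ending in real zeros

# Without a bounds check the two results are indistinguishable.
same = extcodecopy(short_code, 0, 4) == extcodecopy(padded_code, 0, 4)
```

In this model `same` is `True`: the caller cannot tell filler bytes from real zero bytes, while `checked_slice(short_code, 0, 4)` raises instead of silently padding.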
@hi-ogawa I think this is almost ready to merge. Please make the name change for FunctionNodeExpressionVisitor.
Also I think for clarity in the LLL (since the introduced opcodes are a little ad hoc to begin with), we should use ~extcode_slice, ~selfcode_slice and ~calldata_slice. (Sorry I didn't suggest it before, I had to think about it for a while before coming to the conclusion.) Would you be willing to make this change for the msg.data slice code as well?
@charles-cooper Thanks for the suggestions! I agree that By the way, I was wondering but forgot about this. For these uses of adhoc lll opcodes ( |
Yes - they should be included in
"~extcode",
"~selfcode",
"~calldata",
@charles-cooper These are not exactly what was suggested in #2583 (review), but I chose these since I noticed that msg.data could also be used as len(msg.data), so ~calldata_slice and ~calldata_len would both be needed, and they would have to be distinguished in Expr.parse_Attribute by looking up its parent node etc.
I thought such a process would be an unnecessary complication.
What do you think about this?
Right I saw that! I think your reasoning is good.
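The reasoning above (one placeholder per data source, resolved by whichever built-in consumes it) can be sketched roughly as follows. This is a simplified model, not Vyper's implementation: `Node`, `parse_attribute`, `build_slice` and `build_len` are hypothetical stand-ins for `LLLnode`, `Expr.parse_Attribute`, `Slice.build_LLL` and the `len` built-in, and the `dst` operand is a placeholder for the destination buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    value: str
    args: list = field(default_factory=list)


def parse_attribute(obj: str, attr: str) -> Node:
    """Emit a placeholder; the consumer decides which opcode it becomes."""
    if obj == "msg" and attr == "data":
        return Node("~calldata")
    if attr == "code":
        if obj == "self":
            return Node("~selfcode")
        return Node("~extcode", [Node(obj)])  # keep the address operand
    raise ValueError(f"unsupported attribute {obj}.{attr}")


COPY_OPS = {
    "~calldata": "calldatacopy",
    "~selfcode": "codecopy",
    "~extcode": "extcodecopy",
}


def build_slice(src: Node, start: int, length: int) -> Node:
    """Replace the placeholder with the matching copy opcode."""
    if src.value not in COPY_OPS:
        raise ValueError(f"cannot slice {src.value}")
    operands = src.args + [Node("dst"), Node(str(start)), Node(str(length))]
    return Node(COPY_OPS[src.value], operands)


def build_len(src: Node) -> Node:
    """Only msg.data supports len(); code length has its own attribute."""
    if src.value == "~calldata":
        return Node("calldatasize")
    raise ValueError(f"len() is not supported for {src.value}")
```

Because the placeholder names only the source, `parse_attribute` never needs to inspect its parent node: `slice` and `len` each translate the same `~calldata` node in their own way.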
elif self.expr.attr == "code":
    addr = Expr.parse_value_expr(self.expr.value, self.context)
    if is_base_type(addr.typ, "address"):
        # These adhoc nodes will be replaced with a valid node in `Slice.build_LLL`
nice comment
arg = Expr(node.args[0], context).lll_node
if arg.value == "~calldata":
    return LLLnode.from_list(["calldatasize"], typ="uint256")
while we are here should we add len(<address>.code) too? 🤔
never mind -- @skellet0r points out we already have <address>.codesize!
great work. thank you
What I did
Support the code attribute for the address type, as described in #2427.
Accordingly, I updated the documentation (the address type section in docs/types.rst).
How I did it
Validation
The usage of <address>.code is restricted to slice(<address>.code, start, length), where length is a constant. This validation is done during ExpressionAnnotationVisitor.visit_Attribute.
I noticed that the slice(msg.data, ...) validation was implemented in FunctionNodeVisitor.visit_Attribute, but the type information is not available during FunctionNodeVisitor, whereas ExpressionAnnotationVisitor already makes use of get_exact_type_from_node, so I added the validation logic for <address>.code in ExpressionAnnotationVisitor.
Code generation
Step 1.
During Expr.parse_Attribute, detect <address>.code and generate a (temporary) LLL node ~extcode.
Step 2.
During Slice.build_LLL, detect whether the first argument of the slice built-in call is an ~extcode LLL node. If so, replace it with the codecopy or extcodecopy EVM opcode, depending on whether <address> is self or not.
How to verify it
Added new test cases in tests/parser/syntax/test_address_code.py, which cover three cases of <address>.code: extcodecopy is used; codecopy is used during deployment (i.e. in the constructor); codecopy is used after deployment.
Description for the changelog
Support the code attribute for the address type. Add code as a reserved keyword.
Cute Animal Picture
Concerns/Questions
It seems that the attributes of the address type (e.g. balance, codesize) are included in RESERVED_KEYWORDS, so I added code to it as well. However, I believe this would be a breaking change for people who used code as a variable name, a user-defined struct member, etc.
It might be possible not to include it in RESERVED_KEYWORDS, but, at the least, self.code won't be available to users anymore; e.g. existing code with code: public(uint256) would become a compile error.
Please let me know if there's any guideline for how such breaking changes should be treated.
The use of a temporary LLL node with ~extcode might be a bit strange. I noticed that msg.data uses an LLL node with value=0, location="calldata" to denote the special case. Compared to that, the ~extcode LLL node felt like a more explicit and concise way, so I went with it, but there might be a nicer way to do it. I would appreciate it if someone could advise me on that.
(EDIT) I noticed that slice(msg.data, ..., length) has an assertion to check the bound, but I didn't add that check for slice(<address>.code, ..., length). This is because, as far as I can tell (at least in the official go-ethereum EVM), codecopy and extcodecopy zero-fill the data when the length exceeds the actual bytecode (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/instructions.go#L357-L371). Please let me know if I'm missing some significance of such a bound check.
(EDIT) I think I should add something about "dynamic gas" for EXTCODECOPY (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/jump_table.go#L446-L453). I'm still researching this and hopefully I can figure out what to do. (dynamic gas cost estimation is done in 1be8c2c)
Thanks for the review!