feat: support address code attribute #2583

hi-ogawa · 2022-01-03T06:23:38Z

What I did

Support code attribute for address type as described in #2427
Accordingly, I updated the documentation (address type section in docs/types.rst).

Closes VIP: Accessing deployed byte code #2427

How I did it

Validation

the usage of <address>.code is restricted to be in slice(<address>.code, start, length) where length is a constant.
this validation is done during ExpressionAnnotationVisitor.visit_Attribute.
I noticed that slice(msg.data, ...) validation was implemented in FunctionNodeVisitor.visit_Attribute,
but the type information is not available during FunctionNodeVisitor whereas ExpressionAnnotationVisitor already makes use of get_exact_type_from_node, so I added a validation logic for <address>.code in ExpressionAnnotationVisitor.
Code generation
- Step 1.
  During Expr.parse_Attribute, detect <address>.code and generate (temporary) LLL node ~extcode.
- Step 2.
  During Slice.build_LLL, detect if the first argument of slice built-in call is ~extcode LLL node. If so, replace it with codecopy or extcodecopy evm opcode depending on whether <address> is self or not.

How to verify it

Added new test cases in tests/parser/syntax/test_address_code.py, which shows the three cases of <address>.code where

extcodecopy is used,
codecopy is used during deployment (i.e. in constructor),
codecopy is used after deployed.

Description for the changelog

Support code attribute for address type. Add code as a reserved keyword.

Cute Animal Picture

Concerns/Questions

It seems that the attributes of address type (e.g. balance, codesize) is included in RESERVED_KEYWORDS, so I added code into that as well. However, I believe this would be a breaking change for the people who used code as variable, user defined struct members, etc...
It might be possible not to include it in RESERVED_KEYWORDS, but, at least, the use of self.code won't be available to users anymore, e.g. if there is an existing code with code: public(uint256), that would become a compile error.
Please let me know if there's any guideline for how such breaking changes could be treated.
The use of temporary LLL node with ~extcode might be a bit strange. I noticed that msg.data uses an LLL node with value=0, location="calldata" to denote the special case. Compared to that, the use of the LLL node ~extcode felt more explit/concise way, so I went with it, but probably there might be a nicer way to do it. I would appreciate if someone could advice me on that.
(EDIT) I noticed that slice(msg.data, ..., length) has an assertion to check the bound, but I didn't put that check for slice(<address>.code, ..., length). This is because, as far as I can tell (at least in official go-ethereum evm), codecopy and extcodecopy would zero fill the data when the length exceeds the actual bytecode (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/instructions.go#L357-L371). Please let me know if I'm missing some significance of such bound check.
(EDIT) I think I should add something about "dynamic gas" for EXTCODECOPY (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/jump_table.go#L446-L453). I'm still on the way of research and hopefully I can figure out what to do. (dynamic gas cost estimation is done in 1be8c2c)

Thanks for the review!

codecov-commenter · 2022-01-03T06:35:28Z

Codecov Report

Merging #2583 (13f93a3) into master (23ed82c) will increase coverage by 0.34%.
The diff coverage is 97.87%.

@@            Coverage Diff             @@
##           master    #2583      +/-   ##
==========================================
+ Coverage   86.26%   86.61%   +0.34%     
==========================================
  Files          90       91       +1     
  Lines        9351     9413      +62     
  Branches     2370     2354      -16     
==========================================
+ Hits         8067     8153      +86     
+ Misses        796      775      -21     
+ Partials      488      485       -3

Impacted Files	Coverage Δ
vyper/semantics/namespace.py	`96.87% <ø> (ø)`
vyper/semantics/validation/annotation.py	`93.15% <ø> (-0.10%)`	⬇️
vyper/utils.py	`87.23% <ø> (+3.79%)`	⬆️
vyper/codegen/expr.py	`80.62% <85.71%> (+0.03%)`	⬆️
vyper/semantics/validation/local.py	`92.23% <98.52%> (+1.39%)`	⬆️
vyper/builtin_functions/functions.py	`88.26% <100.00%> (+0.08%)`	⬆️
vyper/codegen/lll_node.py	`94.02% <100.00%> (+0.04%)`	⬆️
vyper/semantics/types/value/address.py	`93.10% <100.00%> (+1.43%)`	⬆️
vyper/lll/compile_lll.py	`94.10% <0.00%> (-0.43%)`	⬇️
vyper/semantics/types/user/interface.py	`81.57% <0.00%> (-0.41%)`	⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23ed82c...13f93a3. Read the comment docs.

hi-ogawa · 2022-01-03T08:34:13Z

The CI failure could be due to some flakiness of deadline setting (https://github.com/vyperlang/vyper/runs/4688866148?check_suite_focus=true#step:5:85). I cannot reproduce the test failure locally.

fubuloubu

Looking good!

Will have to give a more thorough review later.

vyper/semantics/namespace.py

charles-cooper · 2022-01-03T16:19:38Z

@hi-ogawa I think the approach looks good, I can review in more detail later. I like your use of ~extcode to denote the special case.

Regarding the bounds check

(EDIT) I noticed that slice(msg.data, ..., length) has an assertion to check the bound, but I didn't put that check for slice(
.code, ..., length). This is because, as far as I can tell (at least in official go-ethereum evm), codecopy and extcodecopy would zero fill the data when the length exceeds the actual bytecode (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/instructions.go#L357-L371). Please let me know if I'm missing some significance of such bound check.

The bounds check is used for safety reasons. Calldata also zero fills when the length exceeds the actual calldata, but we check the length anyways. I think for consistency, we should include the bounds check here.

vyper/builtin_functions/functions.py

charles-cooper · 2022-01-03T21:30:31Z

vyper/semantics/validation/annotation.py

@@ -96,6 +97,23 @@ def visit_For(self, node):
            self.expr_visitor.visit(node.iter)


+def validate_address_code_attribute(node: vy_ast.Attribute, type_: BaseTypeDefinition) -> None:


I think this functionality is more appropriate in vyper/semantics/validation/local.py. It would be very clean if this functionality were put in the following function:

vyper/vyper/semantics/validation/local.py

Lines 182 to 191 in 2f1584b

def visit_Attribute(self, node):

if node.get("value.id") == "msg" and node.attr == "data":

parent = node.get_ancestor()

if parent.get("func.id") not in ("slice", "len"):

raise SyntaxException(

"msg.data is only allowed inside of the slice or len functions",

node.node_source_code,

node.lineno,

node.col_offset,

)

.

I moved the validation from ExpressionAnnotationVisitor to FunctionNodeVisitor (5a5798d).

It seems I have a bug whichever ExpressionAnnotationVisitor or FunctionNodeVisitor I choose.
With ExpressionAnnotationVisitor, I think some attribute nodes won't get traversed and miss the validation.
With FunctionNodeVisitor, as showed up on CI, get_exact_type_from_node will fail on the second visit since it doesn't setup correct namespaces for e.g. for loop.

vyper/vyper/semantics/validation/local.py

Lines 174 to 180 in 2f1584b

def visit(self, node):

super().visit(node)

self.annotation_visitor.visit(node)

attr_descendants = node.get_descendants(vy_ast.Attribute)

for attr_descendant in attr_descendants:

self.visit(attr_descendant)

As @charles-cooper suggested, it surely makes sense to do the validation in FunctionNodeVisitor, but I believe I need to make some change. Let me think about it.

I don't think any typechecking should be done in the annotation visitor since that merely propagates types to nodes. I'm a bit confused why you're code wouldn't work in FunctionNodeVisitor. Is there some bug regarding typechecking in loops?

I haven't looked into it but I was just in the process of fixing some typechecking for list literals. cf. #2584 (comment), #2584 (comment) and #2587

@hi-ogawa could you merge in master now that #2556 is merged and see if that fixes the issue

Thanks for letting me know about that! I can get back to this tomorrow and I will check that out if it works. Sorry for the delay.
Regarding the reason why FunctionNodeVisitor doesn't work, I think that's because of the structure around this:

def visit(self, node): super().visit(node) # <<<=== 1st visit (possibly `Attribute` won't be visited this time) self.annotation_visitor.visit(node) attr_descendants = node.get_descendants(vy_ast.Attribute) for attr_descendant in attr_descendants: self.visit(attr_descendant) # <<<=== 2nd visit (I think, this is to make sure to visit all the `Attribute` nodes missed on 1st visit above)

The way visiting the Attribute on 2nd time doesn't setup namespace properly. For example, from CI,

https://github.com/vyperlang/vyper/runs/4707801642?check_suite_focus=true#step:5:6264

for i in ...: ... _baz[i].x ... # <<<=== try to visit this `Attribute` node

here, this visits Attribute node _baz[i].x without setting up the namespace for the for-loop variable i and thus get_exact_type_from_node on the value node _baz[i] is failing.

I haven't actually debug it, so I might be wrong, but this is my intuition of the issue.

@charles-cooper Sorry for the break and thanks the patience.
I pushed the commit f13e4f7 for the fix. I modified FunctionNodeVisitor (with FunctionNodeExpressionVisitor) to visit all the nodes up to the leaf nodes in the single pass.
I found this more natural approach than grabbing vy_ast.Attribute descendent nodes and validating certain properties.
I might be missing some reasons why it hasn't implemented in this way, so I would appreciate if you could review this change for potential drawbacks.
Thanks a lot!

I think your approach is natural and makes sense. It probably wasn't implemented that way because we didn't really have context-dependent checks like that. The only thing I would ask is to change the name of the class to indicate that it is private to this module - maybe _LocalExprVisitor

hi-ogawa · 2022-01-04T03:52:58Z

@charles-cooper @fubuloubu Thanks for the feedback! I'll address your comments as soon as I can.

The bounds check is used for safety reasons.

May I ask the detail of this "safety reasons"? Probably I'm missing some intuition and use cases, but I cannot see what could be "unsafe" when users get zero-filled slice.

fubuloubu · 2022-01-04T04:57:01Z

May I ask the detail of this "safety reasons"? Probably I'm missing some intuition and use cases, but I cannot see what could be "unsafe" when users get zero-filled slice.

Interpreting filler bytes as "true zero" is a frequent source of errors

hi-ogawa · 2022-01-04T05:02:24Z

Interpreting filler bytes as "true zero" is a frequent source of errors

Ah, that indeed makes sense to me. Thanks for the clarification!

…Visitor`

vyper/semantics/namespace.py

charles-cooper

@hi-ogawa I think this is almost ready to merge. Please make the name change for FunctionNodeExpressionVisitor.

Also I think for clarity in the LLL (since the introduced opcodes are a little ad-hoc to begin with), we should use ~extcode_slice, ~selfcode_slice and ~calldata_slice. (Sorry I didn't suggest it before, I had to think about it for awhile before coming to the conclusion). Would you be willing to make this change for the msg.data slice code as well?

hi-ogawa · 2022-01-09T01:58:35Z

@charles-cooper Thanks for the suggestions! I agree that ~extcode is a bit hacky and it should be more explicit as you suggests. I will work on it and let you know when it's ready.

By the way, I was wondering but forgot about this. For these uses of adhoc lll opcodes (~extcode_slice, etc...), should they be included in VALID_LLL_MACROS? It seems to affect the behavior of LLLnode.cache_when_complex, but I couldn't tell which is preferred (include or not include) for "adhoc" lll opcodes.
Do you think it should be included there?

charles-cooper · 2022-01-09T02:07:04Z

By the way, I was wondering but forgot about this. For these uses of adhoc lll opcodes (~extcode_slice, etc...), should they be included in VALID_LLL_MACROS? It seems to affect the behavior of LLLnode.cache_when_complex, but I couldn't tell which is preferred (include or not include) for "adhoc" lll opcodes. Do you think it should be included there?

Yes - they should be included in VALID_LLL_MACROS

hi-ogawa · 2022-01-09T03:14:04Z

vyper/utils.py

+    "~extcode",
+    "~selfcode",
+    "~calldata",


@charles-cooper These are not exactly suggested in #2583 (review), but I chose these since I noticed that msg.data could be also used as len(msg.data), so ~calldata_slice and ~calldata_len will be needed, which would required to be distinguished in Expr.parse_Attribute by looking up its parent node etc...
I thought such process would be unnecessary complication.
What do you think about this?

@charles-cooper These are not exactly suggested in #2583 (review), but I chose these since I noticed that msg.data could be also used as len(msg.data), so ~calldata_slice and ~calldata_len will be needed, which would required to be distinguished in Expr.parse_Attribute by looking up its parent node etc... I thought such process would be unnecessary complication. What do you think about this?

Right I saw that! I think your reasoning is good.

charles-cooper · 2022-01-09T03:58:14Z

vyper/codegen/expr.py

+        elif self.expr.attr == "code":
+            addr = Expr.parse_value_expr(self.expr.value, self.context)
+            if is_base_type(addr.typ, "address"):
+                # These adhoc nodes will be replaced with a valid node in `Slice.build_LLL`


nice comment

charles-cooper · 2022-01-09T03:59:52Z

vyper/builtin_functions/functions.py

        arg = Expr(node.args[0], context).lll_node
+        if arg.value == "~calldata":
+            return LLLnode.from_list(["calldatasize"], typ="uint256")


while we are here should we add len(<address>.code) too? 🤔

never mind -- @skellet0r points out we already have <address>.codesize!

charles-cooper

great work. thank you

hi-ogawa added 2 commits January 3, 2022 15:44

feat: support address code attribute

3a25660

test: fix test_address_code_self when --no-optimize

a35c233

hi-ogawa force-pushed the 2427-feat-address-code-attribute branch from 138845b to a35c233 Compare January 3, 2022 06:49

fix: estimate dynamic gas cost of EXTCODECOPY

1be8c2c

fubuloubu reviewed Jan 3, 2022

View reviewed changes

vyper/semantics/namespace.py Show resolved Hide resolved

charles-cooper reviewed Jan 3, 2022

View reviewed changes

fix: assert slice bound for address.code

76fce00

refactor: move slice(address.code, ...) validation to `FunctionNode…

5a5798d

…Visitor`

hi-ogawa requested review from fubuloubu and charles-cooper January 4, 2022 22:04

fubuloubu approved these changes Jan 4, 2022

View reviewed changes

vyper/semantics/namespace.py Show resolved Hide resolved

hi-ogawa added 2 commits January 8, 2022 10:28

Merge branch 'master' into 2427-feat-address-code-attribute

9508c41

fix: add FunctionNodeExpressionVisitor to validate expression nodes

f13e4f7

charles-cooper requested changes Jan 8, 2022

View reviewed changes

refactor: rename to _LocalExpressionVisitor

593534a

hi-ogawa added 3 commits January 9, 2022 11:36

refactor: add adhoc nodes extcode/selfcode/calldata

9897f27

refactor: add helper function _build_adhoc_slice_node

2a76efc

refactor: refactor _build_adhoc_slice_node

13f93a3

hi-ogawa commented Jan 9, 2022

View reviewed changes

charles-cooper reviewed Jan 9, 2022

View reviewed changes

charles-cooper approved these changes Jan 9, 2022

View reviewed changes

charles-cooper merged commit 3a9a342 into vyperlang:master Jan 9, 2022

hi-ogawa deleted the 2427-feat-address-code-attribute branch January 9, 2022 04:16

charles-cooper mentioned this pull request Feb 23, 2022

VIP: Add .code element to address #2681

Closed

charles-cooper mentioned this pull request Mar 23, 2022

change <address>.codesize to len(address.code) #2720

Open

JorgeAtPaladin mentioned this pull request Apr 19, 2023

VIP: CODECOPY support (self.code) #3362

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support address code attribute #2583

feat: support address code attribute #2583

hi-ogawa commented Jan 3, 2022 •

edited

Loading

codecov-commenter commented Jan 3, 2022 •

edited

Loading

hi-ogawa commented Jan 3, 2022

fubuloubu left a comment

charles-cooper commented Jan 3, 2022

charles-cooper Jan 3, 2022

hi-ogawa Jan 4, 2022 •

edited

Loading

hi-ogawa Jan 5, 2022 •

edited

Loading

charles-cooper Jan 6, 2022

charles-cooper Jan 6, 2022

charles-cooper Jan 6, 2022

hi-ogawa Jan 7, 2022 •

edited

Loading

hi-ogawa Jan 8, 2022 •

edited

Loading

charles-cooper Jan 8, 2022

hi-ogawa commented Jan 4, 2022

fubuloubu commented Jan 4, 2022

hi-ogawa commented Jan 4, 2022

charles-cooper left a comment •

edited

Loading

hi-ogawa commented Jan 9, 2022

charles-cooper commented Jan 9, 2022 •

edited

Loading

hi-ogawa Jan 9, 2022

charles-cooper Jan 9, 2022

charles-cooper Jan 9, 2022

charles-cooper Jan 9, 2022

charles-cooper Jan 9, 2022

charles-cooper left a comment

		@@ -96,6 +97,23 @@ def visit_For(self, node):
		self.expr_visitor.visit(node.iter)


		def validate_address_code_attribute(node: vy_ast.Attribute, type_: BaseTypeDefinition) -> None:

	def visit_Attribute(self, node):
	if node.get("value.id") == "msg" and node.attr == "data":
	parent = node.get_ancestor()
	if parent.get("func.id") not in ("slice", "len"):
	raise SyntaxException(
	"msg.data is only allowed inside of the slice or len functions",
	node.node_source_code,
	node.lineno,
	node.col_offset,
	)

	def visit(self, node):
	super().visit(node)
	self.annotation_visitor.visit(node)

	attr_descendants = node.get_descendants(vy_ast.Attribute)
	for attr_descendant in attr_descendants:
	self.visit(attr_descendant)

feat: support address code attribute #2583

feat: support address code attribute #2583

Conversation

hi-ogawa commented Jan 3, 2022 • edited Loading

What I did

How I did it

How to verify it

Description for the changelog

Cute Animal Picture

Concerns/Questions

codecov-commenter commented Jan 3, 2022 • edited Loading

Codecov Report

hi-ogawa commented Jan 3, 2022

fubuloubu left a comment

Choose a reason for hiding this comment

charles-cooper commented Jan 3, 2022

Choose a reason for hiding this comment

hi-ogawa Jan 4, 2022 • edited Loading

Choose a reason for hiding this comment

hi-ogawa Jan 5, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hi-ogawa Jan 7, 2022 • edited Loading

Choose a reason for hiding this comment

hi-ogawa Jan 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hi-ogawa commented Jan 4, 2022

fubuloubu commented Jan 4, 2022

hi-ogawa commented Jan 4, 2022

charles-cooper left a comment • edited Loading

Choose a reason for hiding this comment

hi-ogawa commented Jan 9, 2022

charles-cooper commented Jan 9, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

charles-cooper left a comment

Choose a reason for hiding this comment

hi-ogawa commented Jan 3, 2022 •

edited

Loading

codecov-commenter commented Jan 3, 2022 •

edited

Loading

hi-ogawa Jan 4, 2022 •

edited

Loading

hi-ogawa Jan 5, 2022 •

edited

Loading

hi-ogawa Jan 7, 2022 •

edited

Loading

hi-ogawa Jan 8, 2022 •

edited

Loading

charles-cooper left a comment •

edited

Loading

charles-cooper commented Jan 9, 2022 •

edited

Loading