feat: add runtime code layout to initcode #3584
this commit adds the runtime code layout to the initcode payload (as a suffix), so that the runtime code can be analyzed without source code. this is particularly important for disassemblers, which need demarcations for where the data section starts as distinct from the runtime code segment itself.
the layout is:
CBOR-encoded list:
  runtime code length
  [<length of data section> for data section in runtime data sections]
  immutable section length
  {"vyper": (major, minor, patch)}
length of CBOR-encoded list + 2, encoded as two big-endian bytes.
note the specific format for the CBOR payload was chosen to avoid changing the last 13 bytes of the signature. that is, the last 13 bytes still look like b"\xa1evyper\x83...". this is because, as the last item in a list, its encoding does not change compared to being the only dict in the payload.
this commit also changes the meaning of the two footer bytes: they now indicate the length of the entire footer (including the two bytes indicating the footer length). the sole purpose of this is to be more intuitive, as the two footer bytes indicate the offset-from-the-end where the CBOR-encoded metadata starts, rather than the length of the CBOR payload (without the two length bytes).
lastly, this commit renames the internal `insert_vyper_signature=` kwarg to `insert_compiler_metadata=`, as the metadata includes more than just the vyper version now.
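the footer layout described in the commit message can be sketched in plain python. this is only an illustration, not the compiler's implementation: `cbor_head`, `cbor_encode` and `build_footer` are hypothetical helpers, the encoder handles only the few CBOR types the footer needs, and it assumes every integer and length is below 2**16. the sample values are the ones from the usage example in this thread.

```python
# minimal sketch of the footer layout, using a hand-rolled CBOR encoder
# (assumption: all integers and lengths are < 2**16)

def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65536:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    raise ValueError("value too large for this sketch")

def cbor_encode(obj) -> bytes:
    if isinstance(obj, int):
        return cbor_head(0, obj)  # major type 0: unsigned integer
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return cbor_head(3, len(raw)) + raw  # major type 3: text string
    if isinstance(obj, (list, tuple)):
        # major type 4: array, items follow with no terminator
        return cbor_head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        # major type 5: map, key/value pairs follow
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
        return cbor_head(5, len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj)!r}")

def build_footer(runtime_len, data_section_lens, immutables_len, version):
    payload = cbor_encode(
        [runtime_len, data_section_lens, immutables_len, {"vyper": version}]
    )
    # the two trailing bytes count the whole footer, themselves included
    return payload + (len(payload) + 2).to_bytes(2, "big")

footer = build_footer(98, [6], 32, (0, 3, 10))
print(footer.hex())
print(footer[-13:])  # starts with b"\xa1evyper\x83"
```

because the version dict is the last list item, the 13 bytes at the end of this footer are the dict encoding followed by the two length bytes, which is the property the commit message relies on.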
usage example
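import cbor2
# the last two bytes hold the footer length, i.e. the offset from the end where the metadata starts
offset = int.from_bytes(code[-2:], 'big')
signature = cbor2.loads(code[-offset:])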
for vyper 0.3.10 the encoded values are:
- length of runtime code (code without immutables, data sections, and signature), e.g. 4096
- list of lengths of data sections, e.g. [64, 128]
- total length of immutables, e.g. 384
- compiler version, e.g. {'vyper': [0, 3, 10]}
so you could even do this:
names = ['runtime_size', 'data_sizes', 'immutable_size', 'compiler']
dict(zip(names, signature))
# {'runtime_size': 98, 'data_sizes': [6], 'immutable_size': 32, 'compiler': {'vyper': [0, 3, 10]}}
the number of data sections may differ when compiled with --optimize codesize.
format rationale
a justification of why this format was chosen instead of just adding items to the dict.
vyper 0.3.4 added a cbor signature from which you can read the compiler version (#2860).
vyper 0.3.5 added a suffix with the section length, following solidity (#3009).
the implementation was a bit flawed: the suffix didn't always come at the very end and could be followed by immutables, rendering the length suffix useless.
everyone has resorted to just using a regex instead. we spent some time understanding cbor so that we don't break this compatibility.
the way cbor encodes fixed-size lists suits us well. for example, [1, 2, 3] is encoded as the bytes 83 01 02 03, with the 8 in 83 denoting a list and the 3 denoting its size. after that, all items come in their normal encodings with no terminator, so the regex for old vypers would just work.
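to illustrate the compatibility claim, here is a small sketch with a regex of the kind older tooling might use. the pattern and `find_version` are hypothetical (no real tool's regex is quoted in this thread), and the pattern assumes each version component is below 24 so that it fits in a single cbor byte. the two payloads are hand-encoded: the version dict on its own, and the same dict as the last item of a 4-item list.

```python
import re

# hypothetical pattern, not taken from any real tool.
# DOTALL matters: a patch version of 10 is the byte 0x0a (newline),
# which "." would not match otherwise.
VYPER_SIG = re.compile(rb"\xa1evyper\x83(.)(.)(.)", re.DOTALL)

# old-style payload: {"vyper": [0, 3, 4]} as the only item
old_payload = bytes.fromhex("a165767970657283000304")
# new-style payload: [98, [6], 32, {"vyper": [0, 3, 10]}]
new_payload = bytes.fromhex("84186281061820a16576797065728300030a")

def find_version(payload: bytes):
    """Return (major, minor, patch) if the signature is found, else None."""
    m = VYPER_SIG.search(payload)
    return tuple(group[0] for group in m.groups()) if m else None

old_version = find_version(old_payload)
new_version = find_version(new_payload)
print(old_version, new_version)
```

the same pattern finds the version in both payloads, since the dict bytes are identical whether the dict stands alone or closes the list.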
we have also changed the cbor size suffix to an offset, so you can simply read the suffix and then decode the metadata from code[-offset:]. note that you don't need to write code[-offset:-2], because cbor items are self-delimiting and the decoder knows where to stop.
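the self-delimiting behaviour can be seen with a tiny stdlib-only decoder. this is a sketch, not a replacement for cbor2: `cbor_decode` is a hypothetical helper that supports only small unsigned ints, text strings, arrays and maps, and the bytes in front of the footer are placeholder bytes standing in for real initcode.

```python
def cbor_decode(buf: bytes, pos: int = 0):
    """Decode one CBOR item starting at pos; return (value, next_pos)."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    pos += 1
    if info < 24:
        arg = info  # argument stored directly in the head byte
    elif info in (24, 25):
        width = 1 if info == 24 else 2
        arg = int.from_bytes(buf[pos:pos + width], "big")
        pos += width
    else:
        raise ValueError("argument width not supported in this sketch")
    if major == 0:  # unsigned integer
        return arg, pos
    if major == 3:  # text string of arg bytes
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # array of arg items
        items = []
        for _ in range(arg):
            item, pos = cbor_decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:  # map of arg key/value pairs
        out = {}
        for _ in range(arg):
            key, pos = cbor_decode(buf, pos)
            out[key], pos = cbor_decode(buf, pos)
        return out, pos
    raise ValueError("major type not supported in this sketch")

# placeholder bytes followed by a hand-encoded footer for
# [98, [6], 32, {"vyper": [0, 3, 10]}] plus the two length bytes
code = b"\x60\x00" * 4 + bytes.fromhex("84186281061820a16576797065728300030a0014")

offset = int.from_bytes(code[-2:], "big")
footer = code[-offset:]
metadata, end = cbor_decode(footer)
print(metadata, end, len(footer))
```

the decoder stops after the 4-item list, leaving exactly the two length bytes unread, so slicing them off beforehand is unnecessary.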
Codecov Report
@@            Coverage Diff             @@
##           master    #3584      +/-   ##
==========================================
- Coverage   89.05%   89.01%   -0.05%
==========================================
  Files          85       85
  Lines       11378    11390      +12
  Branches     2586     2590       +4
==========================================
+ Hits        10133    10139       +6
- Misses        821      825       +4
- Partials      424      426       +2
... and 1 file with indirect coverage changes
apparently, the byteorder argument to int.from_bytes is required in python 3.10 but optional in 3.11
this commit adds the runtime code layout to the initcode payload (as a suffix), so that the runtime code can be analyzed without source code. this is particularly important for disassemblers, which need demarcations for where the data sections (added in #3496) start as distinct from the runtime code segment itself.
note the specific format for the CBOR payload was chosen to avoid changing the last 13 bytes of the signature. that is, the last 13 bytes still look like b"\xa1evyper\x83...", this is because, as the last item in a list, its encoding does not change compared to being the only dict in the payload.
this commit also changes the meaning of the two footer bytes: they now indicate the length of the entire footer (including the two bytes indicating the footer length). the sole purpose of this is to be more intuitive as the two footer bytes indicate offset-from-the-end where the CBOR-encoded metadata starts, rather than the length of the CBOR payload (without the two length bytes).
lastly, this commit renames the internal `insert_vyper_signature=` kwarg to `insert_compiler_metadata=`, as the metadata includes more than just the vyper version now.