-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eof: Support DATALOADN
(EIP-7480)
#15456
Conversation
Thank you for your contribution to the Solidity compiler! A team member will follow up shortly. If you haven't read our contributing guidelines and our review checklist before, please do it now, this makes the reviewing process and accepting your contribution smoother. If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some problems here, the most important being that the PR makes the EOF instructions available in Yul even when EOF is not enabled. That has to be dealt with first, perhaps in a separate PR.
For the rest, see the comments below. Generally a mix of actual problems and small, easy to apply tweaks.
libevmasm/Assembly.cpp
Outdated
@@ -717,6 +717,11 @@ AssemblyItem Assembly::newImmutableAssignment(std::string const& _identifier) | |||
return AssemblyItem{AssignImmutable, h}; | |||
} | |||
|
|||
AssemblyItem Assembly::newDataLoadN(size_t offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssemblyItem Assembly::newDataLoadN(size_t offset) | |
AssemblyItem Assembly::newDataLoadN(size_t _offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in EthAssemblyAdapter::appendDataLoadN()
and AbstractAssembly::appendDataLoadN()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not fixed in Assembly.h
.
libevmasm/Instruction.h
Outdated
DATALOAD = 0xd0, ///< load data from EOF data section | ||
DATALOADN = 0xd1, ///< load data from EOF data section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DATALOAD = 0xd0, ///< load data from EOF data section | |
DATALOADN = 0xd1, ///< load data from EOF data section | |
DATALOAD = 0xd0, ///< load data from EOF data section (offset on the stack) | |
DATALOADN = 0xd1, ///< load data from EOF data section (offset as a 16-bit immediate value) |
libevmasm/Assembly.cpp
Outdated
auto appendedDataAndAuxDataSize = ret.bytecode.size() - dataStart; | ||
|
||
// If some data was already added to data section we need to update data section refs accordigly | ||
if (appendedDataAndAuxDataSize > 0) | ||
for (auto [pos, val] : dataSectionRef) | ||
setBigEndian(ret.bytecode, pos, 2, val + appendedDataAndAuxDataSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this actually make sense? If I understand it correctly, with this our DATALOADN
in assembly and dataloadn()
in Yul will not map 1:1 to the underlying EVM instruction, but instead will have an implicit offset that skips any data already embedded by the compiler in the binary (i.e. metadata and strings stored as data)? So they effectively only allow the user to access dynamic_aux_data
?
But then the spec says that reading dynamic_aux_data
is illegal:
DATALOADN
'simmediate + 32
must be withinpre_deploy_data_size
(see Data Section Lifecycle)
- the part of the data section which exceeds these bounds (the
dynamic_aux_data
portion) needs to be accessed usingDATALOAD
orDATACOPY
This seems wrong unless this code is going to change in a subsequent PR to append something more to static_aux_data
. Is that the idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think that implicitly modifying the offset like that will be very confusing to users looking at asm output. Maybe we should give it a different name so that it's clearer that it's not pure DATALOADN
but our custom builtin for accessing only a subset of data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, this is missing a validation that when you add appendedDataAndAuxDataSize
, the value still does not exceed 0xffff
. There is an assert in setBigEndianUint16()
, but this looks like something that can actually happen in normal usage and needs a runtime check. The user has no direct control over the size of data appended by the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks that during all the splitting and rebasing one important code section was missed. The fragment calculates the maximum offset accessed by the DataLoadN
and adjust the predeploy_data_size
(data section size value in the header) accordingly. We need definitely better tests also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree with different name. Any idea for the name? :) Moreover it's also related to my comment here #15456 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need definitely better tests also.
True, but for now I'd not worry about it too much. We need something and we definitely need coverage for things where there's a risk of non-EOF being affected, but Daniel's idea with initial EOF implementation is to basically speed through it to the point where we can run semantic tests. Once we get them to pass, we'll have some assurance that things can't be too wrong and we'll be able to focus on testing and ironing out specific parts.
So I'll definitely be pointing out spots where I think something is wrong if I notice, but also it is not that important to get them all right at this point. We're fine with it being unstable and incomplete to some extent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea for the name?
Maybe we should take a step back and talk about the design of this instruction first. Actually, what kinds of data we can have now? This is my current understanding:
- Explicit Yul data - comes from
data
sections other than".metadata"
in a Yul object. - Implicit Yul data - added in transition from Yul to assembly. E.g. long strings that in Yul are just literals, but become data when assembled.
- Metadata - comes from the
".metadata"
data section in a Yul object. - Static auxdata - comes from
RETURNCONTRACT
, not known at compilation time. - Dynamic auxdata - comes from
RETURNCONTRACT
, not known at compilation time and inaccessible viaDATALOADN
.
AFAIK metadata ends up in m_auxiliaryData
and implicit+explicit Yul data are both in m_data
. Static auxdata at compilation time is not in the binary at all.
It seems that your idea is to allow the user to access only (4). In that case auxdataloadn()
would be a descriptive name.
We may still want to add dataloadn()
in some form in addition to it, to make it possible to access named Yul data sections that way. Probably not dataloadn(offset)
since it needs a literal (so expressions like dataload(add(dataoffset('mydatasection'), 42))
would not work), but maybe dataloadn(data_name, offset)
? The extra argument should make it clear that the offset is not the same as in the EVM opcode.
Also, having things like auxdataoffset()
and staticauxdatasize()
might be useful in combination with datacopy()
, but we could leave those for later. I was originally going to suggest that you could do things like dataloadn(add(auxdataoffset(), 42))
to uniformly access auxdata and named data sections, but then realized that we would not be able to use expressions with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, this is missing a validation that when you add
appendedDataAndAuxDataSize
, the value still does not exceed0xffff
. There is an assert insetBigEndianUint16()
, but this looks like something that can actually happen in normal usage and needs a runtime check. The user has no direct control over the size of data appended by the compiler.
setBigEndian(ret.bytecode, refPosition, 2, ...)
checks that the value does not exceed 0xffff
. The template parameter ValueT
is size_t
in this case. It should be fine imo. https://github.com/ipsilon/solidity/blob/8f35fdb9a0525e2c04ed7cacc50542093ad41cb1/libevmasm/Assembly.cpp#L901
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setBigEndian()
assumes that it does not exceed 0xffff
. It's not a validation, it just says that it's up to the caller to ensure that this is true. If it's not true, we panic and get a compiler crash, spewing diagnostic info.
If in assembleEOF()
we know that it can actually be exceeded for legit reasons, then we need a proper validation (solRequire()
) to report a codegen error to the user and ensure that setBigEndian()
is not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I get it now. Fixed.
fc298f5
to
1229a8c
Compare
25276e2
to
d9a3173
Compare
2b53622
to
f5032c9
Compare
f5032c9
to
28e1163
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's more feedback after another pass through the PR. Overall, of the things that are not yet done, there are just two that I'd consider particularly important:
- Syntax tests ensuring that
auxdataloadn()
does not affect Yul when EOF is not enabled. - Validations for
auxdataloadn()
's argument inAsmAnalysis
. Also, that argument should be a number, not a string.
All the remaining stuff are simple things: getting the asserts and validations right, naming that better aligns with the spec to avoid making things confusing, docstring corrections, etc.
97689bc
to
983f773
Compare
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this looks fine in terms of correctness, but some of the older comments are still unresolved. Would be good to address those before we merge. It needs a rebase anyway since base PR was merged.
test/libyul/yulSyntaxTests/eof/auxdataloadn_in_eof_invalid_value.yul
Outdated
Show resolved
Hide resolved
All important issues were addressed. Still needs a rebase and a few tweaks before merge.
983f773
to
0baa9ac
Compare
0baa9ac
to
1c90f4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving already, but looks like the resolution of one of the older comments was somehow reverted. Would be good to tweak it before we merge.
1c90f4e
to
4f85f23
Compare
4f85f23
to
9cafc42
Compare
DATALOAD
andDATALOADN
instructions.DATALOADN
inevmasm::Assembly
Depends on #15467Merged.Depends on #15503Merged.