Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eof: Support DATALOADN (EIP-7480) #15456

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

rodiazet
Copy link
Contributor

@rodiazet rodiazet commented Sep 25, 2024

  • Add DATALOAD and DATALOADN instructions.
  • Support DATALOADN in evmasm::Assembly
  • Add smoke test

Depends on #15467 Merged.
Depends on #15503 Merged.

Copy link

Thank you for your contribution to the Solidity compiler! A team member will follow up shortly.

If you haven't read our contributing guidelines and our review checklist before, please do it now, this makes the reviewing process and accepting your contribution smoother.

If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.

cameel
cameel previously requested changes Sep 25, 2024
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some problems here, the most important being that the PR makes the EOF instructions available in Yul even when EOF is not enabled. That has to be dealt with first, perhaps in a separate PR.

For the rest, see the comments below. Generally a mix of actual problems and small, easy to apply tweaks.

@@ -717,6 +717,11 @@ AssemblyItem Assembly::newImmutableAssignment(std::string const& _identifier)
return AssemblyItem{AssignImmutable, h};
}

AssemblyItem Assembly::newDataLoadN(size_t offset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AssemblyItem Assembly::newDataLoadN(size_t offset)
AssemblyItem Assembly::newDataLoadN(size_t _offset)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in EthAssemblyAdapter::appendDataLoadN() and AbstractAssembly::appendDataLoadN().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not fixed in Assembly.h.

libevmasm/AssemblyItem.h Outdated Show resolved Hide resolved
libevmasm/Instruction.cpp Outdated Show resolved Hide resolved
DATALOAD = 0xd0, ///< load data from EOF data section
DATALOADN = 0xd1, ///< load data from EOF data section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DATALOAD = 0xd0, ///< load data from EOF data section
DATALOADN = 0xd1, ///< load data from EOF data section
DATALOAD = 0xd0, ///< load data from EOF data section (offset on the stack)
DATALOADN = 0xd1, ///< load data from EOF data section (offset as a 16-bit immediate value)

libevmasm/Assembly.cpp Outdated Show resolved Hide resolved
Comment on lines 1439 to 1465
auto appendedDataAndAuxDataSize = ret.bytecode.size() - dataStart;

// If some data was already added to data section we need to update data section refs accordigly
if (appendedDataAndAuxDataSize > 0)
for (auto [pos, val] : dataSectionRef)
setBigEndian(ret.bytecode, pos, 2, val + appendedDataAndAuxDataSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this actually make sense? If I understand it correctly, with this our DATALOADN in assembly and dataloadn() in Yul will not map 1:1 to the underlying EVM instruction, but instead will have an implicit offset that skips any data already embedded by the compiler in the binary (i.e. metadata and strings stored as data)? So they effectively only allow the user to access dynamic_aux_data?

But then the spec says that reading dynamic_aux_data is illegal:

DATALOADN's immediate + 32 must be within pre_deploy_data_size (see Data Section Lifecycle)

  • the part of the data section which exceeds these bounds (the dynamic_aux_data portion) needs to be accessed using DATALOAD or DATACOPY

This seems wrong unless this code is going to change in a subsequent PR to append something more to static_aux_data. Is that the idea?

Copy link
Member

@cameel cameel Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think that implicitly modifying the offset like that will be very confusing to users looking at asm output. Maybe we should give it a different name so that it's clearer that it's not pure DATALOADN but our custom builtin for accessing only a subset of data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, this is missing a validation that when you add appendedDataAndAuxDataSize, the value still does not exceed 0xffff. There is an assert in setBigEndianUint16(), but this looks like something that can actually happen in normal usage and needs a runtime check. The user has no direct control over the size of data appended by the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks that during all the splitting and rebasing one important code section was missed. The fragment calculates the maximum offset accessed by the DataLoadN and adjust the predeploy_data_size (data section size value in the header) accordingly. We need definitely better tests also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree with different name. Any idea for the name? :) Moreover it's also related to my comment here #15456 (comment)

Copy link
Member

@cameel cameel Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need definitely better tests also.

True, but for now I'd not worry about it too much. We need something and we definitely need coverage for things where there's a risk of non-EOF being affected, but Daniel's idea with initial EOF implementation is to basically speed through it to the point where we can run semantic tests. Once we get them to pass, we'll have some assurance that things can't be too wrong and we'll be able to focus on testing and ironing out specific parts.

So I'll definitely be pointing out spots where I think something is wrong if I notice, but also it is not that important to get them all right at this point. We're fine with it being unstable and incomplete to some extent.

Copy link
Member

@cameel cameel Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea for the name?

Maybe we should take a step back and talk about the design of this instruction first. Actually, what kinds of data we can have now? This is my current understanding:

  1. Explicit Yul data - comes from data sections other than ".metadata" in a Yul object.
  2. Implicit Yul data - added in transition from Yul to assembly. E.g. long strings that in Yul are just literals, but become data when assembled.
  3. Metadata - comes from the ".metadata" data section in a Yul object.
  4. Static auxdata - comes from RETURNCONTRACT, not known at compilation time.
  5. Dynamic auxdata - comes from RETURNCONTRACT, not known at compilation time and inaccessible via DATALOADN.

AFAIK metadata ends up in m_auxiliaryData and implicit+explicit Yul data are both in m_data. Static auxdata at compilation time is not in the binary at all.

It seems that your idea is to allow the user to access only (4). In that case auxdataloadn() would be a descriptive name.

We may still want to add dataloadn() in some form in addition to it, to make it possible to access named Yul data sections that way. Probably not dataloadn(offset) since it needs a literal (so expressions like dataload(add(dataoffset('mydatasection'), 42)) would not work), but maybe dataloadn(data_name, offset)? The extra argument should make it clear that the offset is not the same as in the EVM opcode.

Also, having things like auxdataoffset() and staticauxdatasize() might be useful in combination with datacopy(), but we could leave those for later. I was originally going to suggest that you could do things like dataloadn(add(auxdataoffset(), 42)) to uniformly access auxdata and named data sections, but then realized that we would not be able to use expressions with it.

Copy link
Contributor Author

@rodiazet rodiazet Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, this is missing a validation that when you add appendedDataAndAuxDataSize, the value still does not exceed 0xffff. There is an assert in setBigEndianUint16(), but this looks like something that can actually happen in normal usage and needs a runtime check. The user has no direct control over the size of data appended by the compiler.

setBigEndian(ret.bytecode, refPosition, 2, ...) checks that the value does not exceed 0xffff. The template parameter ValueT is size_t in this case. It should be fine imo. https://github.com/ipsilon/solidity/blob/8f35fdb9a0525e2c04ed7cacc50542093ad41cb1/libevmasm/Assembly.cpp#L901

Copy link
Member

@cameel cameel Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setBigEndian() assumes that it does not exceed 0xffff. It's not a validation, it just says that it's up to the caller to ensure that this is true. If it's not true, we panic and get a compiler crash, spewing diagnostic info.

If in assembleEOF() we know that it can actually be exceeded for legit reasons, then we need a proper validation (solRequire()) to report a codegen error to the user and ensure that setBigEndian() is not called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I get it now. Fixed.

libevmasm/AssemblyItem.cpp Outdated Show resolved Hide resolved
libevmasm/Assembly.cpp Outdated Show resolved Hide resolved
@rodiazet rodiazet force-pushed the eof-support-dataloadn branch 5 times, most recently from fc298f5 to 1229a8c Compare September 30, 2024 15:23
@cameel cameel added has dependencies The PR depends on other PRs that must be merged first EOF labels Sep 30, 2024
@rodiazet rodiazet force-pushed the eof-support-dataloadn branch 3 times, most recently from 25276e2 to d9a3173 Compare October 1, 2024 13:06
@cameel cameel removed the has dependencies The PR depends on other PRs that must be merged first label Oct 1, 2024
@rodiazet rodiazet force-pushed the eof-support-dataloadn branch 8 times, most recently from 2b53622 to f5032c9 Compare October 8, 2024 18:38
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's more feedback after another pass through the PR. Overall, of the things that are not yet done, there are just two that I'd consider particularly important:

  • Syntax tests ensuring that auxdataloadn() does not affect Yul when EOF is not enabled.
  • Validations for auxdataloadn()'s argument in AsmAnalysis. Also, that argument should be a number, not a string.

All the remaining stuff are simple things: getting the asserts and validations right, naming that better aligns with the spec to avoid making things confusing, docstring corrections, etc.

liblangutil/EVMVersion.cpp Outdated Show resolved Hide resolved
libevmasm/AssemblyItem.cpp Outdated Show resolved Hide resolved
libevmasm/AssemblyItem.cpp Show resolved Hide resolved
libevmasm/AssemblyItem.cpp Outdated Show resolved Hide resolved
libevmasm/Assembly.h Outdated Show resolved Hide resolved
libevmasm/Assembly.cpp Outdated Show resolved Hide resolved
libevmasm/Assembly.cpp Outdated Show resolved Hide resolved
libyul/backends/evm/AbstractAssembly.h Outdated Show resolved Hide resolved
libevmasm/Assembly.cpp Outdated Show resolved Hide resolved
@rodiazet rodiazet force-pushed the eof-support-dataloadn branch 3 times, most recently from 97689bc to 983f773 Compare October 11, 2024 14:23
@rodiazet

This comment was marked as resolved.

Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this looks fine in terms of correctness, but some of the older comments are still unresolved. Would be good to address those before we merge. It needs a rebase anyway since base PR was merged.

libyul/backends/evm/EVMDialect.cpp Outdated Show resolved Hide resolved
@cameel cameel dismissed their stale review October 11, 2024 23:09

All important issues were addressed. Still needs a rebase and a few tweaks before merge.

cameel
cameel previously approved these changes Oct 14, 2024
Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving already, but looks like the resolution of one of the older comments was somehow reverted. Would be good to tweak it before we merge.

libevmasm/AssemblyItem.cpp Outdated Show resolved Hide resolved
@cameel cameel enabled auto-merge October 15, 2024 11:11
@cameel cameel merged commit 4a3d0b2 into ethereum:develop Oct 15, 2024
73 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants