Allow verbatim in Solidity assembly blocks #12067
Comments
The reason is that we have to think about which optimizer steps to disable, and probably we have to disable almost all of them. Can you tell us more about the new opcodes in the fork? |
For now it's a proof of concept, and it's not even clear that this approach is the correct one. We want to add support for Solidity as a language for an experimental language-agnostic smart contracts platform. We don't need or want to emulate a whole Ethereum blockchain, since our smart contracts run in level 2; we only need to run the Solidity code and store the current state. What we need is some way to bridge the Solidity code with the underlying sandbox; and for that one idea is to add some custom opcodes, like some form of "syscalls", like "obtain the chain ID" or "publish an event". Again, maybe there is a better approach but for now we are experimenting with this. Thanks again! |
Wouldn't it be better to do this on the Yul level instead? Please feel free to schedule a call to discuss in more detail! |
What do you mean exactly? I know that I can write a contract in yul, and I can inject arbitrary bytecode with |
@dessaya To summarize why we haven't done this so far: the Yul optimizer wasn't designed keeping verbatim in mind. We can of course allow it in Solidity's inline assembly. The question is whether the Yul optimizer should be disabled completely or only partly; if partly, which steps? It would be useful if you could give a list of the additional opcodes that you would like to add this way, to get an idea of how this would work for your case. |
What I meant is to compile solidity to yul using |
We are still evaluating options, so we haven't settled on what the opcodes will look like exactly. But an example would be something very simple to retrieve a value from the context:
where the return value is provided by our sandbox; for example the (non-ethereum) address of the contract creator. @chriseth I was not aware that |
Instead of special opcodes it is also possible to use special addresses to exchange data (i.e. precompiles or system contracts on some chains). If you have complete control over your system, that may be a nicer way because all EVM-compatible languages (Vyper, Fe, etc.) could be made to work with it without changes. |
As mentioned we wanted to have this enabled originally, but the main problem here is somehow signalling what commitments the Since we have the Additionally we could think about introducing a "clobbered variables" list. |
Looking forward to this! So why not just do analysis on the ops to ensure there is no stack manipulation deeper than the "inputs" in the verbatim name, and that only 1 element is added to the stack at the end? If you have that guarantee plus memory safety, then while you don't get "full" mutability of the execution context, you do get a decent way there safely. If you want to enable this sort of check, maybe something like:
And if you don't have stack-safe, all bets are off on optimizations? Here it would be trivial to look op by op and ensure it's stack-safe, and if you break the safety contract there is a compiler error. And you can also easily run an analysis to check that storage purity in Solidity matches the verbatim opcodes. I think at this low level it's fine to force users to enumerate the safety they are guaranteeing the compiler, i.e. one could imagine:
To be clear I am not as knowledgeable on the compiler as y'all obviously, just my 2 cents |
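The op-by-op stack check suggested in the comment above could look roughly like the following Rust sketch. It is only an illustration, not solc code: `op_info`, `check_stack_safe`, and `StackCheckError` are hypothetical names, the opcode table is deliberately tiny, and the walk is purely linear (jumps are not modelled, so they are simply reported as unknown). The `inputs` and `outputs` arguments stand for the counts in a name like `verbatim_2i_1o`.

```rust
/// Stack effect of one opcode: items it needs/pops, items it pushes,
/// and the number of immediate bytes that follow it.
struct OpInfo {
    pops: usize,
    pushes: usize,
    immediate_len: usize,
}

/// Deliberately incomplete opcode table (assumption: illustration only).
fn op_info(opcode: u8) -> Option<OpInfo> {
    match opcode {
        // ADD, MUL, SUB
        0x01 | 0x02 | 0x03 => Some(OpInfo { pops: 2, pushes: 1, immediate_len: 0 }),
        // POP
        0x50 => Some(OpInfo { pops: 1, pushes: 0, immediate_len: 0 }),
        // MLOAD
        0x51 => Some(OpInfo { pops: 1, pushes: 1, immediate_len: 0 }),
        // PUSH1..PUSH32 carry 1..32 immediate bytes
        0x60..=0x7f => Some(OpInfo { pops: 0, pushes: 1, immediate_len: (opcode - 0x5f) as usize }),
        // DUPn needs n items and leaves n + 1
        0x80..=0x8f => {
            let n = (opcode - 0x7f) as usize;
            Some(OpInfo { pops: n, pushes: n + 1, immediate_len: 0 })
        }
        // SWAPn needs n + 1 items and leaves n + 1
        0x90..=0x9f => {
            let n = (opcode - 0x8e) as usize;
            Some(OpInfo { pops: n, pushes: n, immediate_len: 0 })
        }
        _ => None,
    }
}

#[derive(Debug, PartialEq)]
enum StackCheckError {
    UnknownOpcode { offset: usize, opcode: u8 },
    TruncatedImmediate { offset: usize },
    ReadsBelowInputs { offset: usize },
    WrongOutputCount { expected: usize, found: usize },
}

/// Walks `code` once and checks that it never touches stack slots below its
/// declared inputs and leaves exactly `outputs` items behind.
fn check_stack_safe(code: &[u8], inputs: usize, outputs: usize) -> Result<(), StackCheckError> {
    let mut depth = inputs;
    let mut pc = 0;
    while pc < code.len() {
        let opcode = code[pc];
        let info = op_info(opcode)
            .ok_or(StackCheckError::UnknownOpcode { offset: pc, opcode })?;
        if depth < info.pops {
            return Err(StackCheckError::ReadsBelowInputs { offset: pc });
        }
        depth = depth - info.pops + info.pushes;
        let next = pc + 1 + info.immediate_len;
        if next > code.len() {
            return Err(StackCheckError::TruncatedImmediate { offset: pc });
        }
        pc = next;
    }
    if depth == outputs {
        Ok(())
    } else {
        Err(StackCheckError::WrongOutputCount { expected: outputs, found: depth })
    }
}
```

With this table, the bytes `60 01` (PUSH1 1) pass as a 0-input, 1-output block, while a lone ADD declared with one input is rejected. Any opcode missing from the table is rejected as unknown, which is exactly the case where such an analysis stops being possible.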
One early goal for verbatim was to support opcodes that are not yet in the EVM. Other machines like OVM1, new proposed EIPs, other EVM like chains are example use cases. So we cannot really analyze the raw bytecode in general. User annotation is a good idea. However, we have to check what optimizations we can enable with that. The Optimizer wasn't designed with Verbatim in mind, so we'll have to really look at what steps can be broken by |
I'm coming in late to this conversation, but am exploring the possibility of using Optimization in our case isn't really a concern - this would be meant for prototyping and validation, and wouldn't be needed in the meanwhile for production-optimized code. |
One option is to just disallow the optimizer to be on if verbatim is used. Then slowly as the team analyzes what optimizations are possible for given annotations, they can be added. i.e.:
When there are experimental opcodes, the user declares the block as such. If that flag isn't present, stack analysis can be done. If it isn't present and the user uses an unknown opcode, the compiler would report something to the effect of:
For clarity, when talking about optimizations here, I assume we mean at the contract level, not the verbatim bytecode level? In general, I don't want the optimizer touching my verbatim (see #12951) anyway. I view the user annotation as a contract between the user and the compiler: some things are for the compiler, some things are for me. What I mean is things like
When thinking about the EVM, and what optimizations would be safe (on the contract level), it feels like if the following guarantees are made it should be fine:
It feels like if you have 1, 2, 3, and 4, and the verbatim code isn't modified by the compiler, the compiler can view it as an inlined function of sorts. If you don't, the optimizer takes a hike. I may be missing some things, and if anyone thinks of some safety contracts that need to hold it may be worthwhile to post them here? In my mind an MVP of this would be:
enum SafetyContractItem {
    MemSafe,
    StackSafe,
    Jumpless,
    StorageSafe,
    Experimental,
}
// Every annotation must be one of the four safety promises, never Experimental.
if verbatim_block
    .iter()
    .all(|safety_item| {
        matches!(
            safety_item,
            SafetyContractItem::MemSafe
                | SafetyContractItem::StackSafe
                | SafetyContractItem::Jumpless
                | SafetyContractItem::StorageSafe
        ) && !matches!(safety_item, SafetyContractItem::Experimental)
    })
{
    // All analyses must pass; a later result must not overwrite an earlier one.
    optimizations_possible = verbatim_block.analyze_stack_promise()
        && verbatim_block.analyze_storage_promise()
        && verbatim_block.analyze_jumpless_promise();
}
And otherwise throw an error if the optimizer is on. If only a subset of needed promises are made, then throw the error. This gets us most of the way there in my mind:
And I want to drive home that no one expects (nor likely wants) verbatim block interiors to be optimized by the compiler, so if that is a major hang-up, if possible just avoid that altogether. Again, I could definitely be off-base here, I'm just a layman trying to get a cool feature pushed through :). |
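The rule described in the comment above (all four promises present, no experimental flag, otherwise an error when the optimizer is on) can be written as a small self-contained Rust sketch. This is a reading of the proposal, not compiler code: `optimizations_possible`, `REQUIRED`, and the error text are hypothetical, and the per-promise analyses are left out.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SafetyContractItem {
    MemSafe,
    StackSafe,
    Jumpless,
    StorageSafe,
    Experimental,
}

/// The promises a verbatim block must make before the optimizer may run
/// (assumption: taken from the four guarantees discussed in the thread).
const REQUIRED: [SafetyContractItem; 4] = [
    SafetyContractItem::MemSafe,
    SafetyContractItem::StackSafe,
    SafetyContractItem::Jumpless,
    SafetyContractItem::StorageSafe,
];

/// Ok(true): the block may be treated like an inlined function.
/// Ok(false): promises are incomplete, but the optimizer is off anyway.
/// Err(..): promises are incomplete and the optimizer is on.
fn optimizations_possible(
    annotations: &[SafetyContractItem],
    optimizer_on: bool,
) -> Result<bool, String> {
    let declared: HashSet<SafetyContractItem> = annotations.iter().copied().collect();
    let complete = REQUIRED.iter().all(|item| declared.contains(item));
    let experimental = declared.contains(&SafetyContractItem::Experimental);

    if complete && !experimental {
        Ok(true)
    } else if optimizer_on {
        Err(format!(
            "verbatim block declares {:?}; the optimizer needs all of {:?} and no Experimental flag",
            annotations, REQUIRED
        ))
    } else {
        Ok(false)
    }
}
```

Collecting the annotations into a set makes the "only a subset was promised" case an explicit failure, which a plain `all` over the declared items would not catch.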
This issue has been marked as stale due to inactivity for the last 90 days. |
please dont let this issue die :) |
I want to use this to insert |
verbatim is dangerous and should not be added to Solidity without safeguards. At the very least it needs a lot of restrictions, such as having to declare it in the compilation configuration before it can be used. A contract that uses verbatim should not be verified by the source code verifier. Consider that verbatim makes it possible to write source code that compiles to any given bytecode; that means I can complete the verification of any contract as long as I know its bytecode. |
Yeah, verification is a good point. If we allow |
@Hellobloc is your point that I don't understand why that is a reason to not include it in solc. You can already obfuscate code today. And if you do that, a quick look at the "verified" code is enough to know you are doing that and that you are (maybe) a bad actor. Actually, I see something that is different: it makes it easier to get a meaningless "Verified" checkmark in an explorer. But IMO that's not reason enough to not add a useful feature. |
@fvictorio I feel like you don't seem to understand what I mean. Actually, the above issue is not feasible in some versions of contract source code validation, because solc adds an invalid-opcode byte fe at the end of the bytecode (this value is 00 in version 0.4). I have to say that Solidity really shocked me here, because this prevents loose assembly from being used for source forgery. I'm not sure if this is by design, but they must have given it a lot of consideration. I think it is worth discussing whether a new invalid ending value should be added inside Yul. And I'm curious about what you call obfuscated code, because I'm looking forward to having more options for constructing source code; maybe I'm missing something. |
That's fair, thanks for elaborating 🙂 |
My initial understanding of the concern was that it allowed you to verify a contract with meaningless code, which is sort of like obfuscating it. But I was thinking about the deployer doing it. I agree that this makes troll-verifying (I see a deployment on chain from a well-known account and immediately verify it using |
I feel like it shouldn't throw the question to the source code verifier. Actually I feel there is a solution to generate a new invalid value ending for |
we still need this 💯 |
A proposal for fixing the verification issue: Have an unreachable metadata that points to verbatim blocks:
contract A {
    function b() public returns (uint256) {
        uint256 x = 100;
        uint256 y;
        assembly ("memory-safe") {
            y := verbatim_0i_1o(hex"6001")
        }
        return x + y;
    }
}
Would loosely codegen into:
b_block:
    PUSH(100)
verbatim_block:
    PUSH(1)
    // mstore and return
// .. snip ..
VERBATIM_METADATA_START
NUM_VERBATIM_BLOCKS
VERBATIM_IDENT_OP
verbatim_block
verbatim_block_len
// if more than one verbatims
VERBATIM_IDENT_OP
verbatim_block2
verbatim_block2_len
This would make it easy on an integrator like etherscan to verify that only portions of the code are Sure this adds 3 bytes/verbatim block + 2, but i think thats fine personally. |
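To make the metadata proposal above concrete, here is a hedged Rust sketch of how an integrator might decode such a trailer and count how many bytes of the contract come from verbatim blocks. Everything about the encoding is an assumption made for illustration: the marker values `0xEF` and `0xEE` are placeholders, and each field is taken to be a single byte, which matches the "3 bytes per block + 2" estimate but limits offsets and lengths to 255.

```rust
/// Placeholder marker values (hypothetical; the proposal does not fix them).
const VERBATIM_METADATA_START: u8 = 0xEF;
const VERBATIM_IDENT_OP: u8 = 0xEE;

/// One decoded metadata entry: where a verbatim block starts and how long it is.
#[derive(Debug, Clone, Copy, PartialEq)]
struct VerbatimBlock {
    offset: usize,
    len: usize,
}

/// Decodes `START, NUM_BLOCKS, (IDENT, offset, len)*` with one byte per field.
/// Returns None if the trailer is malformed or has trailing bytes.
fn parse_verbatim_metadata(trailer: &[u8]) -> Option<Vec<VerbatimBlock>> {
    let (&start, rest) = trailer.split_first()?;
    if start != VERBATIM_METADATA_START {
        return None;
    }
    let (&count, mut rest) = rest.split_first()?;
    let mut blocks = Vec::with_capacity(count as usize);
    for _ in 0..count {
        match rest {
            [ident, offset, len, tail @ ..] if *ident == VERBATIM_IDENT_OP => {
                blocks.push(VerbatimBlock { offset: *offset as usize, len: *len as usize });
                rest = tail;
            }
            _ => return None,
        }
    }
    if rest.is_empty() { Some(blocks) } else { None }
}

/// Total number of code bytes covered by verbatim blocks, or None if any
/// block reaches past the end of the code. Blocks are assumed not to overlap.
fn verbatim_byte_count(code_len: usize, blocks: &[VerbatimBlock]) -> Option<usize> {
    let mut total = 0usize;
    for block in blocks {
        let end = block.offset.checked_add(block.len)?;
        if end > code_len {
            return None;
        }
        total += block.len;
    }
    Some(total)
}
```

An explorer could compare the count against the code length and, for instance, flag a "verified" contract whose bytecode is almost entirely verbatim.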
While Solidity doesn't support verbatim we must be creative... I implemented this workaround that doesn't require you to manually edit the bytecode:
contract ExampleImpl {
    // Workaround for calling arbitrary code from Solidity
    function _verbatim() private pure returns (uint256 output) {
        assembly ("memory-safe") {
            let ptr := mload(0x40)
            // Force a constant to be represented as 32 repeated '7E' in the runtime code.
            mstore(ptr, 0x7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E7E)
            // Return the result of the inline code
            output := mload(ptr)
        }
    }
    function add(uint256 a, uint256 b) external pure returns (uint256) {
        // Since we don't know the stack order, we store the two parameters in memory,
        // so the inline code can access them and sum a and b
        assembly ("memory-safe") {
            let ptr := mload(0x40) // get free memory ptr
            mstore(ptr, a)
            mstore(add(ptr, 32), b)
        }
        return _verbatim();
    }
}
contract Example is ExampleImpl {
    // OBS: The code MUST be exactly 32 bytes in size and push ONE value onto the stack.
    // PUSH22 0x40 MLOAD DUP1 PUSH1 0x20 ADD MLOAD SWAP1 MLOAD ADD
    bytes32 private constant INLINE_BYTECODE = 0x7500000000000000000000000000000000000000000040518060200151905101;
    constructor() payable {
        bytes memory runtimeCode = type(ExampleImpl).runtimeCode;
        assembly ("memory-safe") {
            let size := mload(runtimeCode)
            // Efficient algorithm to inject a bytecode in the contract by
            // replacing the code PUSH31 0x7E7E7E....
            // Initial search position.
            let ptr := add(runtimeCode, 32)
            // Efficient algorithm to find 32 repeated bytes (ex: 0x7E7E7E..) in a byte sequence
            for { let chunk := 1 } gt(chunk, 0) { ptr := add(ptr, chunk) } {
                // Transform all `0x7E` bytes into `0xFF`
                // 0x81 ^ 0x7E == 0xFF
                // Also transform all other bytes into something different than `0xFF`
                chunk := xor(mload(ptr), 0x8181818181818181818181818181818181818181818181818181818181818181)
                // Find the right most unset bit
                // (0x12345678FFFFFF + 1) & (~0x12345678FFFFFF) == 0x00000001000000
                chunk := and(add(chunk, 1), not(chunk))
                // Round down to the closest power of 2 multiple of 256
                // Ex: 2 ** 18 becomes 2 ** 16
                chunk := div(chunk, mod(chunk, 0xff))
                // Find the number of leading bytes different than `0x7E`.
                // Rationale:
                // Multiplying a number by a power of 2 is the same as shifting the bits to the left
                // 1337 * (2 ** 16) == 1337 << 16
                // Once the chunk is a multiple of 256 it always shifts entire bytes, we use this to
                // select a specific byte in a byte sequence.
                chunk :=
                    shr(248, mul(0x201f1e1d1c1b1a191817161514131211100f0e0d0c0b0a090807060504030201, chunk))
            }
            // Replace '0x7E7E...' by some arbitrary code
            mstore(ptr, INLINE_BYTECODE)
            // This code can be easily extended to run any arbitrary code of any size by appending it at the end of the runtime code
            // and using the `INLINE_BYTECODE` to jump to this location.
            return(add(runtimeCode, 32), mload(runtimeCode))
        }
    }
} |
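The core idea of the workaround above is "find the 32-byte run of 0x7E in the runtime code and overwrite it with the payload". As a rough off-chain illustration, the Rust sketch below does the same thing with a plain sliding-window scan; it does not reproduce the word-at-a-time bit trick used in the constructor. `find_placeholder` and `inject_inline_code` are hypothetical helper names.

```rust
/// The placeholder the compiler emits: 0x7E is PUSH31, so PUSH31 followed by
/// 31 bytes of 0x7E shows up as 32 repeated 0x7E bytes.
const PLACEHOLDER: [u8; 32] = [0x7E; 32];

/// Offset of the first 32-byte run of 0x7E in `code`, if any.
fn find_placeholder(code: &[u8]) -> Option<usize> {
    code.windows(PLACEHOLDER.len())
        .position(|window| window == &PLACEHOLDER[..])
}

/// Overwrites the placeholder with `payload` in place and returns the offset
/// that was patched, or None when no placeholder is present.
fn inject_inline_code(code: &mut [u8], payload: &[u8; 32]) -> Option<usize> {
    let at = find_placeholder(code)?;
    code[at..at + payload.len()].copy_from_slice(payload);
    Some(at)
}
```

Because the payload replaces the placeholder byte for byte, the code length and every jump destination after it stay unchanged, which is why the payload has to be exactly 32 bytes.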
@dessaya, have you managed to make this work? You can find ways to contact me on my GitHub Profile page. |
There is currently support for verbatim, which allows inserting arbitrary bytecode, but only when compiling in strict assembly mode. The verbatim group of functions is not available inside assembly blocks in Solidity code. Example:
When compiling with solc:
What is the motivation for disabling verbatim in Solidity? I understand that it must be used with care and only for very specific reasons. My use case is that I am targeting a forked version of the EVM interpreter with new opcodes. I'm currently unable to use solc to compile contracts targeting this fork.
Thanks!