Skip to content

Commit

Permalink
Automatically merged updates to draft EIP(s) 615
Browse files Browse the repository at this point in the history
Hi, I'm a bot! This change was automatically merged because:

 - It only modifies existing Draft or Last Call EIP(s)
 - The PR was approved or written by at least one author of each modified EIP
 - The build is passing
  • Loading branch information
gcolvin authored and corollari committed Mar 4, 2019
1 parent 94fc6df commit c6a37c5
Showing 1 changed file with 60 additions and 66 deletions.
126 changes: 60 additions & 66 deletions EIPS/eip-615.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,58 +4,51 @@ title: Subroutines and Static Jumps for the EVM
status: Draft
type: Standards Track
category: Core
author: Greg Colvin <greg@colvin.org>, Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth), Brooklyn Zelenka (@expede)
author: Greg Colvin <greg@colvin.org>, Brooklyn Zelenka <hello@brooklynzelenka.co> , Paweł Bylica (@chfast), Christian Reitwiessner(@chriseth)
discussion to: https://github.com/ethereum/EIPs/issues/615
created: 2016-12-10
edited: 2019-18-2
---

## Simple Summary

This EIP introduces new EVM opcodes and conditions on EVM code to support subroutines and static jumps, deprecates dynamic jumps, and thereby make EVM code amenable to linear-time static analysis. This will provide for better compilation, interpretation, transpilation, and formal analysis of EVM code.
In the 21st century, on a blockchain circulating billions of ETH, formal specification and verification are an essential tool against loss. Yet the design of the EVM makes this unnecessarily difficult. Further, the design of the EVM makes low-gas-cost, high-performance execution difficult. We propose to move forward with proposals to resolve these problems by tightening the security guarantees and pushing the performance limits of the EVM.

## Abstract

EVM code is currently difficult to statically analyze, hobbling a critical tool for preventing the many expensive bugs our blockchain has experienced. Futher, none of the current implementations of the Ethereum Virtual Machine—including the compilers—are sufficiently performant to meet the network's long-term demands. This proposal identifies a major reason for these issues, and proposes changes to the EVM specification to address the problem, making futher efforts towards a safer and more performant the EVM possible.

In particular, it imposes restrictions on EVM code and proposes new instructions to help provide for

* Better static analysis tools
* Much easier formalization
* Faster interpretation
* Near-linear-time compilation to native code, and to and from eWasm
* Easier code generation from other languages, including
* Solidity
* LLVM IR and thus many languages, incuding
* C, C++, Fortran, Haskell, Java, Ruby and Rust.

These goals are achieved by:

* Deprecating dynamic jumps
* Introducing subroutines—jumps with return support
* Disallowing pathological control flow and uses of the stack
EVM code is currently difficult to statically analyze, hobbling critical tools for preventing the many expensive bugs our blockchain has experienced. Further, none of the current implementations of the Ethereum Virtual Machine—including the compilers—are sufficiently performant to reduce the need for precompiles and otherwise meet the network's long-term demands. This proposal identifies dynamic jumps as a major reason for these issues, and proposes changes to the EVM specification to address the problem, making further efforts towards a safer and more performant the EVM possible.

We also propose to validate—in linear time—that EVM contracts correctly use subroutines, avoid misuse of the stack, and meet other safety conditions _before_ placing them on the blockchain. Validated code precludes most runtime exceptions and the need to test for them. And well-behaved control flow and use of the stack makes life easier for interpreters, compilers, formal analysis, and other tools.

## Motivation

Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since every jump can potentially be to any jump destination in the code, the number of possible paths through the code goes up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at validation time, further inhibiting static and formal analyses. But absent dynamic jumps code can be statically analyzed in linear time.
Currently the EVM supports dynamic jumps, where the address to jump to is an argument on the stack. These dynamic jumps obscure the structure of the code and thus mostly inhibit control- and data-flow analysis. This puts the quality and speed of optimized compilation fundamentally at odds. Further, since every jump can potentially be to any jump destination in the code, the number of possible paths through the code goes up as the product of the number of jumps by the number of destinations, as does the time complexity of static analysis. Many of these cases are undecidable at deployment time, further inhibiting static and formal analyses.

Absent dynamic jumps code can be statically analyzed in linear time. Static analysis includes validation, and much of optimization, compilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. And absent dynamic jumps, and with proper subroutines the EVM is a better target for code generation from other languages, including
* Solidity
* Vyper
* LLVM—languages with an LLVM backend include C, C++, Common Lisp, D, Fortran, Haskell, Java, Javascript, Kotlin, Lua, Objective-C, Pony, Pure, Python, Ruby, Rust, Scala, Scheme, and Swift.

Static analysis includes validation, and much of optimization, compilation, transpilation, and formal analysis; every part of the tool chain benefits when linear-time analysis is available. In particular, linear-time control-flow analysis means near-linear-time compilation of EVM code, and better data-flow analysis can help the compiler and the interpreter better track the size of the values on the stack and use native 64- and 32-bit operations when possible. Gas metering optimization also becomes much more tractable. Additionally, conditions which are statically checked at validation time don’t have to be checked repeatedly at runtime.
The result is that all of the following validations and optimizations can be done at deployment time with **linear** **`(n)`** **or** **near-linear** **`(n log n)`** **time complexity**
* The absence of most exception halting conditions can be validated.
* The maximum use of resources can be sometimes be calculated.
* Bytecode can be compiled to machine code.
* Compilation can optimize use of smaller registers.
* Compilation can optimize injection of gas metering.

Note that analyses of a contract’s bytecode before execution—such as optimizations performed before interpretation, compilation, and on-the-fly machine code generation—must be efficient and linear-time. Otherwise, specially crafted contracts can be used as attack vectors against clients that use static analysis of EVM code before the creation or execution of contracts.
Note that near-linear `(n log n)` time complexity is essential. Otherwise, specially crafted contracts can be used as attack vectors against any validations and optimizations.

## Specification

### Proposal

We propose to deprecate two existing instructions—`JUMP` and `JUMPI`. They take their argument on the stack, which means that unless the value is constant they can jump to any `JUMPDEST`. (In simple cases like `PUSH 0 JUMP` the value on the stack can be known to be constant, but in general it's difficult.) We must nonetheless continue to support them in old code.

Having deprecated `JUMP` and `JUMPI`, we propose new instructions to support their legitimate uses.
We propose to deprecate two existing instructions—`JUMP` and `JUMPI`—and propose new instructions to support their legitimate uses.

#### Preliminaries

These forms
* `INSTRUCTION x,`
* `INSTRUCTION n,`
* `INSTRUCTION x, y`
* `INSTRUCTION n, x ...`

Expand Down Expand Up @@ -138,7 +131,7 @@ We will adopt the following conventions to describe the machine state:

Defining the frame pointer so as to include the arguments is unconventional, but better fits our stack semantics and simplifies the remainder of the proposal.

The frame pointer and return stacks are internal to the subroutine mechanism, and not directly accessible to the program. This is necessary to prevent the program from modifying its state in ways that could be invalid.
The frame pointer and return stacks are internal to the subroutine mechanism, and not directly accessible to the program. This is necessary to prevent the program from modifying its own state in ways that could be invalid.

The first instruction of an array of EVM bytecode begins execution of a _main_ routine with no arguments, `SP` and `FP` set to 0, and with one value on the return stack—`code_size - 1`. (Executing the virtual byte of 0 after this offset causes an EVM to stop. Thus executing a `RETURNSUB` with no prior `JUMPSUB` or `JUMBSUBV`—that is, in the _main_ routine—executes a `STOP`.)

Expand Down Expand Up @@ -224,6 +217,45 @@ In practice, we must test at runtime for conditions 1 and 2—sufficient gas and

All of the remaining conditions we validate statically.


#### Costs & Codes

All of the instructions are `O(1)` with a small constant, requiring just a few machine operations each, whereas a `JUMP` or `JUMPI` must do an O(log n) binary search of an array of `JUMPDEST` offsets before every jump. With the cost of `JUMPI` being _high_ and the cost of `JUMP` being _mid_, we suggest the cost of `JUMPV` and `JUMPSUBV` should be _mid_, `JUMPSUB` and `JUMPIF` should be _low_, and`JUMPTO` should be _verylow_. Measurement will tell.

We suggest the following opcodes:
```
0xb0 JUMPTO
0xb1 JUMPIF
0xb2 JUMPV
0xb3 JUMPSUB
0xb4 JUMPSUBV
0xb5 BEGINSUB
0xb6 BEGINDATA
0xb7 RETURNSUB
0xb8 PUTLOCAL
0xb9 GETLOCAL
```

## Backwards Compatibility

These changes would need to be implemented in phases at decent intervals:
>**1.** If this EIP is accepted, invalid code should be deprecated. Tools should stop generating invalid code, users should stop writing it, and clients should warn about loading it.
>**2.** A later hard fork would require clients to place only valid code on the block chain. Note that despite the fork old EVM code will still need to be supported indefinitely.
If desired, the period of deprecation can be extended indefinitely by continuing to accept code not versioned as new—but without validation. That is, by delaying phase 2. Since we must continue to run old code this is not technically difficult.

## Rationale

This design was highly constrained by the existing EVM semantics, the requirement for eWasm compatibility, and the security demands of the Ethereum environment. It was also informed by the lead author's previous work implementing Java and Scheme interpreters. As such there was very little room for alternative designs.

As described above, the approach was simply to deprecate the problematic dynamic jumps, then ask what opcodes were necessary to provide for the features they supported. These needed to include those provided by eWasm, which themselves were modeled after typical hardware. The only real innovation was to move the frame pointer and the return pointer to their own stacks, so as to prevent any possibility of overwriting them. (Although Forth also uses a return stack.) This allowed for treating subroutine arguments as local variables, and facilitated the return of multiple values.

## Implementation

Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter)

## APPENDIX
### Validation

Validation comprises two tasks:
Expand Down Expand Up @@ -344,7 +376,7 @@ The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to t
// return to top or from recursion to JUMPSUB
if instruction is RETURNSUB
break;
return true;;
if instruction is JUMPSUB
{
Expand Down Expand Up @@ -380,44 +412,6 @@ The basic approach is to call `validate_subroutine(i, 0, 0)`, for `i` equal to t
}
```

#### Costs & Codes

All of the instructions are `O(1)` with a small constant, requiring just a few machine operations each, whereas a `JUMP` or `JUMPI` must do an O(log n) binary search of an array of `JUMPDEST` offsets before every jump. With the cost of `JUMPI` being _high_ and the cost of `JUMP` being _mid_, we suggest the cost of `JUMPV` and `JUMPSUBV` should be _mid_, `JUMPSUB` and `JUMPIF` should be _low_, and`JUMPTO` should be _verylow_. Measurement will tell.

We suggest the following opcodes:
```
0xb0 JUMPTO
0xb1 JUMPIF
0xb2 JUMPV
0xb3 JUMPSUB
0xb4 JUMPSUBV
0xb5 BEGINSUB
0xb6 BEGINDATA
0xb7 RETURNSUB
0xb8 PUTLOCAL
0xb9 GETLOCAL
```

## Backwards Compatibility

These changes would need to be implemented in phases at decent intervals:
>**1.** If this EIP is accepted, invalid code should be deprecated. Tools should stop generating invalid code, users should stop writing it, and clients should warn about loading it.
>**2.** A later hard fork would require clients to place only valid code on the block chain. Note that despite the fork old EVM code will still need to be supported indefinitely.
If desired, the period of deprecation can be extended indefinitely by continuing to accept code not versioned as new—but without validation. That is, by delaying phase 2. Since we must continue to run old code this is not technically difficult.

## Rationale

This design was highly constrained by the existing EVM semantics, the requirement for eWasm compatibility, and the security demands of the Ethereum environment. It was also informed by the lead author's previous work implementing Java and Scheme interpreters. As such there was very little room for alternative designs.

As described above, the approach was simply to deprecate the problematic dynamic jumps, then ask what opcodes were necessary to provide for the features they supported. These needed to include those provided by eWasm, which themselves were modeled after typical hardware. The only real innovation was to move the frame pointer and the return pointer to their own stacks, so as to prevent any possibility of overwriting them. (Although Forth also uses a return stack.) This allowed for treating subroutine arguments as local variables, and facilitated the return of multiple values.

## Implementation

Implementation of this proposal need not be difficult. At the least, interpreters can simply be extended with the new opcodes and run unchanged otherwise. The new opcodes require only stacks for the frame pointers and return offsets and the few pushes, pops, and assignments described above. Compiled code can use native call instructions, greatly improving performance. Further optimizations include minimizing runtime checks for exceptions, condensing gas metering, and otherwise taking advantage of validated code wherever possible. A lightly tested reference implementation is available in [Greg Colvin's Aleth fork.](https://github.com/gcolvin/aleth/tree/master/libaleth-interpreter)


## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

0 comments on commit c6a37c5

Please sign in to comment.