Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEP-481: Synchronous wasm submodules #481

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ Changes to the protocol specification and standards are called NEAR Enhancement
| [0399](https://github.com/near/NEPs/blob/master/neps/nep-0399.md) | Flat Storage | @Longarithm @mzhangmzz | Review |
| [0448](https://github.com/near/NEPs/blob/master/neps/nep-0448.md) | Zero-balance Accounts | @bowenwang1996 | Final |
| [0455](https://github.com/near/NEPs/blob/master/neps/nep-0455.md) | Parameter Compute Costs | @akashin @jakmeier | Final |
| [0481](https://github.com/near/NEPs/blob/master/neps/nep-0481.md) | Synchronous wasm submodules | @mooori @birchmd | New |

## Specification

Expand Down
215 changes: 215 additions & 0 deletions neps/nep-0481.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,215 @@
---
NEP: 481
Title: Synchronous wasm submodules
Authors: Moritz Zielke <moritz.zielke@aurora.dev>, Michael Birch <michael.birch@aurora.dev>
Status: New
DiscussionsTo: https://github.com/nearprotocol/neps/pull/0481
Type: Protocol
Version: 1.0.0
Created: 2023-05-12
LastUpdated: 2023-05-12
---

# Synchronous wasm submodules

## Summary

This NEP is an extension of [#480](https://github.com/near/NEPs/pull/480) which enables synchronous execution for namespaces on the same account. Additionally, it proposes a coroutine-like interface that would allow for synchronous-only namespaces with restricted capabilities. Such namespaces may be used as the foundation for certain types of account extensions in the future.

As a reminder, NEP-480 introduced the idea of a namespace. This is a named location where a Wasm contract can be stored within an account. Each namespace has its own code and storage, but shares storage staking and a Near balance with the account as a whole. Note that namespaces are distinct from sub-accounts. The latter is a completely separate account with its own namespaces, and may live in a separate shard. All namespaces for the same account (e.g. `alice.near:X` and `alice.near:Y`) are guaranteed to live in the same shard. This allows them to call each other synchronously, as proposed in this NEP.

The new host functions introduced in this NEP are:

- `sync_function_call` to call another namespace synchronously,
- `callback` to suspend execution of a synchronous call and return control-flow to the caller,
- `resume_sync_call` to resume a synchronous execution that previously suspended itself.

These host functions are all described in more detail below. Note that the usage of the new `callback` host function in a contract method means that method can only be called synchronously because `callback` panics when used in a VM directly created by a `FunctionCall` action (i.e. async execution).

## Motivation

Currently all interactions between NEAR contracts are asynchronous, which prevents use cases that require synchronous interactions. An example of such a use case is Aurora.

Aurora is a smart contract on NEAR which emulates an Ethereum ecosystem. To execute a function call on an Ethereum contract, it loads the contract and its state from storage, spins up an EVM interpreter to execute the contract’s EVM bytecode and persists state changes. Interpreting EVM bytecode inside a NEAR transaction is inefficient in terms of gas usage.

Given a [compiler](https://github.com/aurora-is-near/evm2near) to convert EVM bytecode into wasm bytecode targeted at NEAR, Aurora can [reduce its gas consumption](https://github.com/aurora-is-near/aurora-engine/pull/463) significantly. Instead of spinning up an EVM interpreter and interpreting a contract’s EVM bytecode it would invoke a namespace whose wasm is equivalent that EVM bytecode.

Significantly increasing the efficiency of Aurora can be beneficial for the NEAR ecosystem in general. Moreover, synchronous calling of namespaces is part of the broader [account extensions](https://gov.near.org/t/proposal-account-extensions-contract-namespaces/34227) feature.

## Specification

### Deployment

No new changes are needed for deployment over what is already specified by the account namespaces proposal.

### Starting synchronous calls

A synchronous call is started using a newly added host function, `sync_function_call`. It has the following (high-level[^1]) interface:

```rust
fn sync_function_call(namespace: &str, method_name: &str, arguments: &[u8], gas_limit: u64) -> SyncCallResult
```

where the return type `SyncCallResult` is defined as

```rust
enum SyncCallResult {
Success(Vec<u8>),
Callback(Vec<u8>),
Error(SyncExecutionError),
}
enum SyncExecutionError {
NamespaceNotFound,
MethodNotFound,
OutOfGas,
WasmTrap(WasmTrap),
AlreadyStarted,
NotStarted,
}
```

The arguments passed to `sync_function_call` specify the namespace to call and the method to use as an entry point, as well as the input data to pass to that method and the maximum amount of gas that can be spent on the execution. The gas limit given to a synchronous function call cannot exceed the remaining gas the current execution has. The return value indicates whether the execution was successful, suspended (more on this later), or encountered an error. The errors are self-explanatory: the namespace may not exist, or the method may not exist within the specified namespace, the execution may exceed the given gas limit, or the called namespace code could have reached a [trap](https://webassembly.github.io/spec/core/intro/overview.html#:~:text=target%20such%20constructs.-,Traps,-Under%20some%20conditions) (e.g. reached a panic in its own code). The `AlreadyStarted` and `NotStarted` errors are related to suspending and resuming synchronous calls, which will be discussed below. In the case the result is a success the returned bytes are the output from the method (returned via the `value_return` host function).

[^1]: The actual interface at the Wasm level will require pointers (represented as `u64` values) and data sizes, and use a register for the return data. But the high-level interface is easier to understand.

### Suspending and resuming synchronous calls

An additional host function is made available, `callback`. It has the following (high-level) interface:

```rust
fn callback(data: &[u8]) -> Vec<u8>
```

This host function is only available when the code is running as the callee of a synchronous call (otherwise it will panic, ending the contract execution). This host function suspends the current execution, returning the control flow to the contract which made the synchronous call. That contract will receive `SyncCallResult::Callback(data)` as the result of `sync_function_call`, where `data` is the same bytes the callee passed to the `callback` host function.

If this happens, the contract may resume the synchronous execution in the other namespace with another new host function, `resume_sync_call`. It has the following (high-level) interface

```rust
fn resume_sync_call(namespace: &str, response: &[u8]) -> SyncCallResult
```

where `SyncCallResult` is the same definition as in `sync_function_call`. The `response` argument in `resume_sync_call` becomes the output of the `callback` host function and control flow of the execution is returned to the callee namespace. The other error cases in `SyncExecutionError` are now clear: if a namespace suspends its execution then it cannot be restarted again via `sync_function_call` (the `AlreadyStarted` error), and if a namespace execution has never been started then it cannot be resumed via `resume_sync_call` (the `NotStarted` error).

This pattern intentionally follows the interface for a [coroutine](https://en.wikipedia.org/wiki/Coroutine). One reason to allow this pattern is performance; resuming execution on an existing Wasm runtime is cheaper than creating a fresh instance of the VM. In particular, if a contract needs to use the functionality in another namespace multiple times during its execution then it is cheaper if the interaction is mediated by `resume_sync_call`+`callback` instead of `sync_function_call`+`value_return`. This is similar to how web-sockets have better performance than repeated RPC calls. Another reason to have this ability is discussed further in the section on untrusted code.

Note that any namespace within the same account can be invoked synchronously via `sync_function_call`. If the namespace methods do not use the `callback` host function then the only possible results are `SyncCallResult::Success` or `SyncCallResult::Error` (`SyncCallResult::Callback` is impossible) and such functions could also be invoked asynchronously using the `FunctionCall` action (either coming directly in a transaction or from a promise created in a prior call). If a namespace method does use the `callback` function then it can be called via `sync_function_call` only (an asynchronous call via a `FunctionCall` action will cause a panic when `callback` is executed).

### Recursive synchronous calls

All namespaces are equal in the sense that they have access to all host functions (including the newly introduced `sync_function_call`, `resume_sync_call` and `callback`). However, not all host functions can be called under all conditions. It has already been mentioned above that `callback` can only be used if the execution is happening as the result of a synchronous call. Additionally, recursive synchronous calls are not allowed (though perhaps this restriction will be lifted in a future NEP). I.e. it is not allowed to call `sync_function_call` or `resume_sync_call` from a synchronous execution; they are only allowed in (asynchronous) executions triggered by a `FunctionCall` action.

### Behavior of host functions in synchronous calls

As stated above, all namespaces have access to all host functions but the host functions related to synchronous execution work differently depending on if the execution resulted from a synchronous call or not. Those are the exception; all other host functions behave the same regardless if they were called in an asynchronous or synchronous context. For example, the promises API will still create promises from the current account and currently executing namespace (as specified in NEP-480). The storage function calls will still interact with the storage of that namespace, again as specified in NEP-480.

### Handling error cases

If a synchronous call (freshly started or resumed) returns `SyncCallResult::Error` then the Wasm VM running the called namespace will have been shut down and any effects it might have caused (e.g. writing to storage, creating promises) are reverted. The contract which made the synchronous call is free to continue executing, handling the fact that this error happened in any way the developer chose. Note that because recursive synchronous calls are not allowed we need not worry about recursively reverting the effects of a synchronous execution.

### Untrusted code and the "app" pattern

A new potential pattern emerges from this design, which has implications for the account extensions feature as a whole. Suppose a namespace contains code which only uses the `callback` and `value_return` host functions (this is easy to check via static analysis of the Wasm code). Such code is safe to deploy as a namespace to any account without security risk, even knowing nothing else about the code (i.e. it is untrusted). The reason is because such code can only be pure functions (it does not have access to host functions to create effects like creating promises or writing to storage), or a sequence of pure functions with input controlled by a trusted source. The latter is the case where the code uses `callback` to interface with another (trusted) namespace; execution between `callback` calls can be modelled as pure functions (and the input is always from a trusted source because functions using `callback` cannot be invoked from external namespaces).

Now suppose we write an "OS-like" contract which exposes a particular ABI to all the namespaces it calls synchronously (i.e. a uniform way to interpret the input and output bytes that are passed by the new host functions related to synchronous execution). This ABI would include requests to cause effects (e.g. writing to storage, calling other namespaces -- synchronously or not) as well as rules (user-defined) for when to fulfill those requests. This would allow for arbitrarily complex permissions models to enforce the safety of the various extensions a user adds to their account. Such extensions would be like apps on a smart phone and the main OS contract would be like Android or iOS.

The specifics of such an OS contract and its app ABI are out of scope for the NEP. However we believe it should be done as a future NEP and that the possibility of such a future NEP is a strong argument to allow the coroutine-like functionality in this NEP even though it creates extra complexity.

## Reference implementation

A proof of concept (PoC) implementation is available at [this branch](https://github.com/birchmd/nearcore/tree/sync-wasm-poc). The following sections refer to this PoC.

### Status

The current PoC implementation is based on an older version of the specification (pre-namespaces). It allows writing "submodules" in Rust and executing them synchronously. The PoC will be revised to align with the new specification.

For now, [here](https://github.com/birchmd/nearcore/blob/67ce064e5a692c9af10f8b7ae23606218d78a244/runtime/near-test-contracts/test-submodule-rs/src/lib.rs) is a submodule implemented in Rust which calls host functions and interacts with the parent contract. It can be [compiled to wasm](https://github.com/birchmd/nearcore/blob/67ce064e5a692c9af10f8b7ae23606218d78a244/runtime/near-test-contracts/build.rs#L31) like a regular contract. In [this test](https://github.com/birchmd/nearcore/blob/67ce064e5a692c9af10f8b7ae23606218d78a244/integration-tests/src/tests/runtime/submodule.rs#L318-L346), the submodule is deployed and then executed via the parent contract by calling [execute_submodule_rs](https://github.com/birchmd/nearcore/blob/67ce064e5a692c9af10f8b7ae23606218d78a244/runtime/near-test-contracts/test-contract-rs/src/lib.rs#L1416-L1479).

### Implementation details

The wasm bytecode of namespaces is subject to the same instrumentation as the wasm bytecode of regular contracts. Control flow passing is handled on the host using [corosonsei coroutines](https://docs.rs/corosensei/latest/corosensei/). The host function to start a namespace spins up a virtual machine in which the namespaces’s bytecode is executed. This virtual machine is contained in a coroutine and when the namespace yields back to the caller, the host suspends the coroutine which interrupts execution of that namespace. Resuming the execution of the namespace is accomplished by resuming the coroutine.

The gist of the PoC is implemented in [wasmer2_runner.rs](https://github.com/birchmd/nearcore/blob/6cac4924997f658ec8b91a58e1513564402b7f1b/runtime/near-vm-runner/src/wasmer2_runner.rs).

### Outstanding TODOs

The current implementation is in a PoC state. The most important outstanding tasks are:

- Aligning the previous "submodules" design with the new namespaces design
- Gas accounting: Gas cost of operations that handle submodules need to be estimated and accounted for in the newly added host functions.
- Error handling:
- Not all errors related to submodules that may occur inside the host are handled properly (some errors are `unwrap()`’ed in the PoC).

## Alternatives

Possible alternatives for the synchronous execution of wasm bytecode are:

### An `eval` host functions

An `eval` host function that executes bytecode which it received as input parameter.

Pros:

- Might allow the execution of bytecode without previous deployment, which could reduce the complexity of some use cases.

Cons:

- Coroutines are meant to run concurrently and be resumable, which are features required for supporting the concurrent execution of submodules. Supporting this functionality with an `eval` function might be more involved.
- Handling traps in submodules is likely to be more challenging when the submodule is executed in the same virtual machine as the parent.
- In some programming languages similar `eval` functions have a notoriously bad reputation, see for instance [Python](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice/1832957#1832957) and [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval#never_use_eval!).

### User-facing sharding

The purpose of namespaces in this proposal is to ensure all state and code needed for the synchronous execution is co-located in the same shard. In Near's current design sharding is invisible to users in the sense that all accounts appear to be on separate shards (and therefore only asynchronous calls are possible between accounts). However, in practice there are many accounts on the same shard and if this information was surfaced to users then synchronous calls could be possible between those co-located accounts. The only thing that would change about this proposal is that namespaces would no longer be necessary and therefore it would no longer depend on NEP-480.
akhi3030 marked this conversation as resolved.
Show resolved Hide resolved

Pros:

- Proposal becomes independent of NEP-480.
- More use cases are possible because synchronous calls can extend beyond a single account.

Cons:

- Many open questions about how sharding should be exposed to users. Examples:
- Should we change the sharding boundaries? Today shards are split (somewhat arbitrarily) based on lexicographical ordering of the account IDs, but if users know about sharding should they be allowed to choose what shard the account exists on when it is created? Should a sub-account be located in the same shard as its parent by default?
- Should it be possible to move an account from one shard to another?
- How will new shards be introduced in the future without breaking user contracts (which may rely on the details of shard boundaries)? Would new shards simply be empty? Could an existing shard be split based on data about synchronous calls?
- May reduce the effectiveness of Near's sharding solution if users intentionally co-locate to a specific shard (e.g. because it contains a popular contract).

## Drawbacks

### Disallowing recursive synchronous calls

This restriction is primarily about avoiding complexity in this first implementation of synchronous execution. The main questions around loosening this restriction would be:

- What is the allowed call depth?
- How can we ensure that effects are recursively reverted (i.e. the effects of all synchronous sub-calls must be reverted if the current synchronous call encounters an error)?

It will be easier to answer these questions after the initial implementation of synchronous execution is complete. We do not think this limitation will prevent the feature from being useful for some cases in the meantime.

## Security Implications

## Future possibilities

## Changelog

### 1.0.0 - Initial Version

- Created 2023-05-12

#### Benefits

[Placeholder for Subject Matter Experts review for this version:]

- Benefit 1
- Benefit 2

#### Concerns

[Template for Subject Matter Experts review for this version:]
| # | Concern | Resolution | Status |
| - | - | - | - |
| 1 | | | |
| 2 | | | |

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).