Limited Replayability Tracking Issue #8197
We discussed this in yesterday’s protocol meeting. The summary is as follows:
We overall agreed that:
We then discussed it again today in the contract runtime meeting. Our current plan is to:
A quick overview of the existing traits / interfaces between the 3 runtimes could be useful to inform a decision about where a cut could be made. chain -> NightshadeRuntime
NightshadeRuntime <-> transaction runtime (
Corrections on the previous comments are welcome; I'm not sure I got all of this correct. But based on what I described there, my opinion is that a clean API for the transaction runtime would probably be a cleaned-up version of what currently sits between
So I just checked the current crate structure, and in order to publish crates on crates.io we would need to publish at least near-vm-runner and its whole dependency tree:
This makes me reconsider the idea of starting with manually publishing crates, even though I still think @nagisa’s idea from today’s meeting (which I’ll let you describe) is the best long-term. Maybe we’d be better off just copy-pasting near-vm-runner for today’s lowest-hanging-fruit goal? And then in a second step I think we should reintroduce the automated deploy-from-CI behavior, and get an auto-deploy of literally all crates with a major release on every nearcore version, ideally numbering that release with the latest protocol version this set of crates supports for clarity. It’d also make it easier for other teams to integrate limited replayability in their own workflows. As a side-note, I remember seeing people from the rust community angry at… was it solana? for doing exactly this kind of huge update-all-crates events regularly. So if we actually go with this solution we may want to sponsor even a moderate recurring amount to Rust Foundation, so that we could reasonably answer that we’re paying for the resources we’re using should people complain about it. @akhi3030 maybe that’s a thing you could look into? (or maybe we already do, which’d be awesome)
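As a sketch of why the whole tree is involved: cargo requires every dependency of a published crate to itself be available on crates.io, so the tree has to be published in dependency order. The dependency map below is invented for illustration (it is not the real near-vm-runner dependency tree):

```rust
use std::collections::BTreeMap;

// Hypothetical in-tree dependency map (crate -> crates it depends on).
fn deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("near-vm-runner", vec!["near-primitives-core", "near-vm-logic"]),
        ("near-vm-logic", vec!["near-primitives-core"]),
        ("near-primitives-core", vec![]),
    ])
}

// Depth-first topological sort: every crate appears after all of its in-tree
// dependencies, which is the order the `cargo publish` invocations must run in.
fn publish_order(graph: &BTreeMap<&'static str, Vec<&'static str>>) -> Vec<&'static str> {
    fn visit(
        name: &'static str,
        graph: &BTreeMap<&'static str, Vec<&'static str>>,
        done: &mut Vec<&'static str>,
    ) {
        if done.contains(&name) {
            return;
        }
        for &dep in graph.get(name).into_iter().flatten() {
            visit(dep, graph, done);
        }
        done.push(name);
    }
    let mut done = Vec::new();
    for &name in graph.keys() {
        visit(name, graph, &mut done);
    }
    done
}

fn main() {
    let order = publish_order(&deps());
    assert_eq!(order, ["near-primitives-core", "near-vm-logic", "near-vm-runner"]);
    println!("{order:?}");
}
```

With real crates the same order would be produced by a tool like `cargo publish --dry-run` run bottom-up; the point is only that the leaf crates gate everything above them.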
So here's the idea we came up with in our meeting today, which we think is, in the short term, the best and easiest way forward.
The cost of this is that every time we publish a runtime, the size of the final neard binary will increase by the size of a single runtime binary (sans whatever LLVM manages to deduplicate.) Another cost is linearly increasing compilation time, which I think we can deal with until we get more time to work out something better.
I think ideally we make the runtime crate(s) as self-contained as possible, reducing the number of crates the runtime is split into. There's no inherent reason why we need to maintain

One observation is that building a
There are some projects like that – this was handled pretty well by deprioritizing these crates in places like e.g. the docs.rs queue, where interactive uses of the service disproportionately suffer from added latency spikes. Otherwise publishing crates in batches is not a big deal AFAIU.
Right, so we probably do need to plan on spending quite some time refactoring, as the API of near-vm-runner will probably be much larger than the
Right, I don’t think it’s a big deal either, but PR is not a rational thing and if Pagoda can avoid a PR hit (or even turn it into positive PR) for like $10/mo it’s probably worth it :)
@Ekleog-NEAR @nagisa is the eventual goal to extend this idea to
We had hoped this would be a nearcore-wide thing, however when discussing this idea with other teams the overall conclusion was that no other team feels the impact of replayability requirements as much as the contract runtime does. For them the resources-to-benefit ratio is significantly worse than for the contract runtime, especially with how busy they are right now with stuff that's higher priority anyhow. Thus we're pursuing a design that the contract runtime team can implement on its own. Unfortunately, this also means that the solution is unlikely to be easily expandable past the runtime boundaries, or at least the end result wouldn't make much sense, holistically speaking. (This is my opinion, Leo’s may be different)
To me one of the most important goals of limited replayability is to improve the maintainability of the codebase overall. Restricting our attention to the contract runtime will fall short of that goal. That said, I think it is okay to move gradually and potentially start with contract runtime first, but the design should be extendable to other parts of the codebase. cc @akhi3030 @jakmeier
@bowenwang1996: I would also like to see this project improve the maintainability of the entire codebase. I think it is good to start with just runtime as the first milestone and then see where we are.
As I remember the discussion, the major pushback came because proposed solutions that work holistically require multiple neard binary versions on validator nodes. (During a protocol upgrade, they hot-switch from one to the other. Very tricky to do with RocksDB, network state, and so on, considering the time to upgrade is only one block time.) But regardless of the strategy, it feels like the RocksDB layer of storage and the network stack have their own concepts of versioning that are not easily solvable in a general way. For those components, we have to think carefully about backwards compatibility on each upgrade anyway, and keeping multiple versions of the code around might only make things worse. For the contract runtime OTOH, we can just include multiple versions in the same binary and pick them dynamically. This idea can be extended to the transaction runtime, if we can define appropriate interfaces. Those two components are also where we have the most protocol-version-dependent if-branches at the moment. Note that this doesn't drop support for old protocol versions. It only makes runtime code cleaner and easier to maintain.
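That "include multiple versions in the same binary and pick them dynamically" idea can be sketched as follows; the trait, type names, and version boundary are all invented for illustration and are not the actual nearcore API:

```rust
// Each frozen runtime version implements a common interface.
trait ContractRuntime {
    fn run(&self, code: &[u8]) -> Result<Vec<u8>, String>;
}

struct RuntimeV1;
struct RuntimeV2;

impl ContractRuntime for RuntimeV1 {
    fn run(&self, _code: &[u8]) -> Result<Vec<u8>, String> {
        Ok(b"executed by v1".to_vec())
    }
}

impl ContractRuntime for RuntimeV2 {
    fn run(&self, _code: &[u8]) -> Result<Vec<u8>, String> {
        Ok(b"executed by v2".to_vec())
    }
}

// A single dispatch point replaces protocol-version if-branches scattered
// throughout the runtime code. The boundary (60) is a made-up number.
fn runtime_for(protocol_version: u32) -> Box<dyn ContractRuntime> {
    if protocol_version < 60 {
        Box::new(RuntimeV1)
    } else {
        Box::new(RuntimeV2)
    }
}

fn main() {
    assert_eq!(runtime_for(59).run(b"wasm").unwrap(), b"executed by v1".to_vec());
    assert_eq!(runtime_for(62).run(b"wasm").unwrap(), b"executed by v2".to_vec());
    println!("ok");
}
```

The cost noted earlier in the thread falls out directly: every boxed version is linked into the binary, but old versions never change again, so they no longer constrain new development.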
@jakmeier the separation between what we do manually and what we implement programmatically makes sense. I guess one thing we need to consider is what to do in case there is a need to replay some old history with old versions. Today it is possible to do with the same binary, but if we manually remove some old versions then we need some other solution.
The simplest solution would probably be to offload the work to those who actually want to replay a complete history. They would have to check out old nearcore tags, compile them with the specified toolchain, and then use the different binary versions for different block height ranges. I'm not sure if anything more sophisticated is worth building. At least I am not aware of anyone actually doing such replays.
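That workflow could be supported with nothing more than a lookup table from height ranges to release tags; a hypothetical sketch (the heights and tags below are made up, not real nearcore releases):

```rust
// Map a block height to the nearcore release tag a replayer would have to
// check out and build to replay that block. Entirely illustrative data.
fn tag_for_height(height: u64) -> &'static str {
    // Each entry: (first height this release is responsible for, git tag),
    // sorted ascending by height.
    const RANGES: &[(u64, &str)] = &[
        (0, "1.20.0"),
        (50_000_000, "1.30.0"),
        (90_000_000, "1.35.0"),
    ];
    RANGES
        .iter()
        .rev()
        .find(|(start, _)| height >= *start)
        .map(|(_, tag)| *tag)
        .expect("ranges start at height 0")
}

fn main() {
    assert_eq!(tag_for_height(10), "1.20.0");
    assert_eq!(tag_for_height(60_000_000), "1.30.0");
    assert_eq!(tag_for_height(95_000_000), "1.35.0");
    println!("ok");
}
```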
I agree. I would wait and see if someone actually requests something more sophisticated before we devote resources to it.
* fix(state-sync): finalize using flat storage
* fix(state-sync): Applied values from state parts should be inlined when applicable (#9265)
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* clippy
* fmt
* logging
* scan from a hash
* refactor: add FlatStateValue constructor for on-disk values (#9269)

  This logic is used in multiple places, so it makes sense to abstract it behind a method in `FlatStateValue`. The `on_disk` name is chosen over `new` since this logic is only applicable for values to be persisted on disk. Later we will also add inlining for values as part of Flat Storage Delta; those will have a lower inlining threshold, so we will have another constructor.
* refactor: Move adjust-db tool to database tool (#9264)

  Refactoring of the `adjust-db` tool, moving it to the `database` toolset where the rest of the database tools live.
* feat: remove fixed shards (#9219)

  In this PR I'm removing the fixed shards field from the shard layout. I removed the `fixed_shards` field directly from the `ShardLayoutV1`. This implementation is quite hacky as it does not follow the typical flow for making protocol changes. This approach needs to be discussed and agreed on first, but here is an implementation just to get a feel for it.

  The motivation for removing fixed shards is that `fixed_shards` breaks the account id contiguity within a shard. A sub-account of a fixed shard is assigned to the same shard as the fixed account, but it may fall in the middle of the account range of another shard. For example, let's consider the following shard layout:

  ```
  fixed_shards = ['fixed']
  boundary_accounts = ['middle']
  ```

  In this shard layout we have three shards:
  - 0 - 'fixed' and all sub-accounts
  - 1 - [..'middle'] with exception for any sub-account of fixed
  - 2 - ['middle'..] with exception for any sub-account of fixed

  Because of the fixed account, shards 1 and 2 are now full of holes for any sub-accounts of 'fixed'. This property of fixed_shards makes it hard to perform any range operations on a shard, such as resharding or deletion.

  The proper way to do it would be to add ShardLayoutV2 - a copy of V1 with fixed_shards removed and the version bumped. Due to the way we store state data in storage - using the shard_uid as a prefix for the storage key - this approach would require a careful migration. One way to go about it would be to perform a null-resharding in the epoch preceding the shard layout version bump. There the node would have to create a copy of the state and store it in storage with the new version in shard_uid. Once the state is duplicated the node can keep applying changes to both states, same as in resharding. Such a migration would need to be prepared very carefully and would take a lot of time and effort.

  The reason why I believe that we can get away without a new shard layout version is that fixed_shards are not used in production and the shard version is not actually stored in the state. The ShardLayoutV1 with empty fixed_shards is semantically equivalent to ShardLayoutV1 with the fixed_shards field removed. It's worth mentioning that the migration pains would be removed if we decided to change the storage structure to one that doesn't have the shard_uid in the storage key.
* remove near-cache dep from near-vm-runner (#9273)

  Part of #8197
* fix(state-sync): Clear flat storage outside of ClientActor (#9268)

  Deleting multiple GBs from RocksDB is definitely taking more than 100ms. In case a node catches up, it can potentially make a chunk producer or a block producer miss its chunk/block. This PR moves deletion of Flat State to `SyncJobsActor` to avoid blocking `ClientActor`.
* remove near-stable-hasher dep from near-vm-runner (#9270)

  Part of #8197
* fix(state-sync): Applied values from state parts should be inlined when applicable (#9265)
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* fix(state-sync): Applied values from state parts should be inlined when applicable
* clippy
* fmt
* fix(state-sync): Finalize state sync using flat storage
* fix(state-sync): Finalize state sync using flat storage

---------

Co-authored-by: Anton Puhach <anton@near.org>
Co-authored-by: Jure Bajic <jure@near.org>
Co-authored-by: wacban <wacban@users.noreply.github.com>
Co-authored-by: Ekleog-NEAR <96595974+Ekleog-NEAR@users.noreply.github.com>
Co-authored-by: near-bulldozer[bot] <73298989+near-bulldozer[bot]@users.noreply.github.com>
One axis of consideration that I feel our summary of possible split points does not capture is where the serialized forms of chain data materialize. Unfortunately it seems that right now we have a bunch of that all over the place. Some of it is in
Ideally, I feel, we should end up in a situation where all of these schemas are part of our limited replayability story, which would mean that we'd need to pull in a bunch of
Rather than keeping these types inside near-vm-runner where they don't belong conceptually and interfere with limited replayability due to being schemas for serialization on-chain, we move them to near_primitives and expose the necessary interfaces to the `near-vm-runner` via `Externals`. With this none of the near-vm-runner code is on-chain schema sensitive, which is one less thing to worry about. “But wait,” you ask “shouldn’t limited-replayability help with not needing to think about what changes are made in the first place?” Yes, indeed it should be that way, but it isn’t entirely clear to me if we don’t or won’t have intentional or accidental functionality where the data is not serialized, but rather deserialized, even without a need to invoke the runtime to execute the code. This might be as simple a tool as something that shows the stored blockchain structure (e.g. near explorer.) (But more seriously, I have petitioned that it would make sense to have all schemas part of the limited replayability story as well, but that didn't really gain as much traction as I had hoped, so I’m just punting this problem upwards and away from the runtime crates.) cc #8197
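The boundary described above can be sketched as a trait; the `External` name follows the comment, but the method shape and mock host below are hypothetical simplifications, not the actual near-vm-runner API:

```rust
// Hypothetical sketch: the runtime never sees on-chain serialization schemas;
// it asks the host for already-resolved data through an externals trait.
trait External {
    // The host (which owns the on-chain schemas, e.g. in near_primitives)
    // resolves the key and returns plain bytes; the runtime never parses a
    // borsh/serde schema itself.
    fn storage_get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

struct MockHost;

impl External for MockHost {
    fn storage_get(&self, key: &[u8]) -> Option<Vec<u8>> {
        // In neard this would deserialize from the on-chain format on the
        // host side; the runtime only receives the result.
        if key == b"balance" { Some(vec![42]) } else { None }
    }
}

// The runtime is written against the trait object, so it carries no schema
// knowledge and is insulated from on-chain format changes.
fn run_contract(host: &dyn External) -> u8 {
    host.storage_get(b"balance").map(|v| v[0]).unwrap_or(0)
}

fn main() {
    assert_eq!(run_contract(&MockHost), 42);
    println!("ok");
}
```

The design choice here is that every schema-sensitive operation crosses the trait boundary, so freezing a runtime version freezes only its host-interface expectations, not any serialization format.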
Here the noted protocol features have been replaced with runtime configuration via parameters instead. This would be one solution/option to getting rid of compile-time features in contract runtime which is interfering with limited replayability (compile-time feature control means that all of the crates still need to be built as a single compilation unit with consistent options.) cc #8197
…eatures to runtime configuration (#9364) This PR is built on top of a previous PR (see the base branch). Here the noted protocol features have been replaced with runtime configuration via parameters instead. This would be one solution/option to getting rid of compile-time features in contract runtime which is interfering with limited replayability (compile-time feature control means that all of the crates still need to be built as a single compilation unit with consistent options.) cc @jakmeier cc #8197
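As an illustration of the feature-to-parameter move, here is a minimal hedged sketch; the field name and cost values are invented, not real nearcore parameters:

```rust
// Before: behavior baked in at compile time, forcing all runtime versions to
// be built as one compilation unit with consistent feature flags.
//
// #[cfg(feature = "protocol_feature_example")]
// fn op_cost() -> u64 { 3 }

// After: the same switch carried as data in the runtime configuration,
// loadable per protocol version without recompiling anything.
struct RuntimeConfig {
    example_feature_enabled: bool,
}

fn op_cost(config: &RuntimeConfig) -> u64 {
    if config.example_feature_enabled { 3 } else { 5 }
}

fn main() {
    let old = RuntimeConfig { example_feature_enabled: false };
    let new = RuntimeConfig { example_feature_enabled: true };
    assert_eq!(op_cost(&old), 5);
    assert_eq!(op_cost(&new), 3);
    println!("ok");
}
```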
An action item for me to move this forward is to:
Scope

Before this message the scope of this task was quite ambiguous. We were not sure if we wanted the split to be entirely at the boundary of the

Well, it turns out that the ambiguity has caused us more harm than good. The progress itself had to be put on hold for us to rethink the approach here. Eventually we realized that we should focus on putting out a working MVP prototype. Once we have a working MVP we can then plan our future steps without making this a huge change that touches upon everything in the system.

Note that this explicitly does not include into the limited replayability scheme some of the very stable common dependencies –

For the MVP let’s depend on

Fortunately there shouldn’t be too much remaining to achieve all of this.

Remaining work
I’ll ask, for the remaining dependencies of near-vm-runner: do we want to stabilize them, releasing a 1.0 for them like near-account-id, to disincentivize ourselves from making breaking changes to them?
The three dependencies that we have currently are

We clearly cannot freeze
This is not currently being worked on, so unassigning.
Today, any development on neard is required to maintain the exact behavior seen in all past versions of the program. Among other things, this is necessary to enable replay of the blockchain all the way back to genesis, as implemented by the various neard view-state sub-commands.
Some of the prerequisite infrastructure to reconcile the operational needs with the desire to modify the protocol already exists in the form of protocol versioning. Practice has nevertheless demonstrated that protocol versioning incurs a significant hit to development velocity. Every time a change is made, engineers need to carefully consider the interaction between the change and protocol-visible behaviour. In cases where the change is intentionally a breaking change to the protocol, care needs to be taken to also maintain the former behaviour for the previous version of the protocol. In extreme cases the only feasible way to maintain compatibility is to duplicate the affected subsystem in full. The first time this happens, logic to switch between different versions of the functionality must be implemented as well. Such logic acts to further impede future evolution of the implementation.
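The duplicated-behaviour burden can be seen in miniature in a single version-gated function; a hedged sketch, with a made-up version number and made-up cost formulas:

```rust
// Once a change is protocol-visible, the former behavior must be preserved
// verbatim behind a protocol-version check, forever. Nothing here is a real
// nearcore parameter; it only illustrates the branching pattern.
fn storage_cost(num_bytes: u64, protocol_version: u32) -> u64 {
    if protocol_version >= 53 {
        // Current behavior.
        num_bytes * 10
    } else {
        // Former behavior, kept so that old blocks replay identically.
        num_bytes * 9 + 100
    }
}

fn main() {
    // Replaying an old block must take the old branch...
    assert_eq!(storage_cost(10, 52), 190);
    // ...while new blocks take the new one.
    assert_eq!(storage_cost(10, 53), 100);
    println!("ok");
}
```

Multiply this pattern across every gas cost, limit, and semantic tweak and the velocity hit described above follows: each new branch must be written, reviewed, and then carried indefinitely.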
We are not able to consistently verify whether our efforts to maintain compatible behaviour are implemented correctly in practice. Verifying that protocol versioning is implemented correctly by replaying even portions of the history is an extremely time-consuming process (taking weeks to months per experiment) and requires significant effort to set up. This makes verification of code changes quite onerous, to the point where there has only been One Attempt, on behalf of the Contract Runtime team, back in 2021.
It would be ideal if we could make the compatibility requirements satisfiable by construction and remove the burden this functionality imposes on the development process.
See the proposal thread for further details.
Current status