Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace IAVL+ with SMT #6

Open
liamsi opened this issue Jan 4, 2021 · 21 comments
Open

Replace IAVL+ with SMT #6

liamsi opened this issue Jan 4, 2021 · 21 comments
Assignees
Labels
C:ORU This issue will be necessary for both ORUs and the LL-abci app ice-box

Comments

@liamsi
Copy link
Member

liamsi commented Jan 4, 2021

Use https://github.com/lazyledger/smt/ instead of https://github.com/cosmos/iavl which is the default state tree in the cosmos sdk.

Ideally, this would be done in a way that makes upstreaming this to the Cosmos-SDK easy - either as an additional optional store, or, as default that replaces IAVL. The Cosmos-SDK team seems open to the idea and might want to collaborate with them on this.

ref: cosmos#7100

@liamsi liamsi added the C:ORU This issue will be necessary for both ORUs and the LL-abci app label Jan 4, 2021
@tzdybal tzdybal self-assigned this Jan 6, 2021
@adlerjohn
Copy link
Member

adlerjohn commented Jan 6, 2021

There are two concerns with using an SMT over the IAVL+ tree:

  1. Pruning of old state. This happens naturally with the SMT, so there's no work that needs to be done here.
  2. Keeping old state. Necessary for archive nodes (ones that hold on to enough information to easily serve the state at any block height). One approach requires keeping track of all previous state deltas (i.e. changes with proofs from the state at one block height to the next). Coincidentally, these state deltas are used as part of the current immediate execution proposal.

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

Keeping old state. Necessary for archive nodes (ones that hold on to enough information to easily serve the state at any block height). One approach requires keeping track of all previous state deltas (i.e. changes with proofs from the state at one block height to the next). Coincidentally, these state deltas are used as part of the current immediate execution proposal.

Summarising discussion from Slack: in other words, this could involve keeping an older version of the tree on disk, and storing new nodes in memory/cache. The new nodes are then persisted to disk, and all obsolete nodes pruned. Once this persisting happens, the older version of the state is no longer available on disk (and would have to be re-computed from the chain to be recovered).

@liamsi
Copy link
Member Author

liamsi commented Jan 7, 2021

the older version of the state is no longer available on disk

If you do not keep around at least some older versions (e.g. every X heights), you would need start from genesis to recompute any state that isn't the most recent.

Also, @ethanfrey mentioned (on discord), that at least some past states might need to be efficiently queriable:

Support for persistent historical queries. At least allowing the last 10 or so blocks to be queried. I am pretty sure that this is required by ibc. In any case removing all historical queries is a major feature reduction that should be discussed carefully.

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

What's the use case for historical queries of old state?

@liamsi
Copy link
Member Author

liamsi commented Jan 7, 2021

@cwgoes @jackzampolin can you help us understand the requirements regarding querying past state in the context of IBC? Ethan Frey mentioned on discord that you (Jack) ran into some issues regarding past queries in the relayer?

@ethanfrey
Copy link

ethanfrey commented Jan 7, 2021

I took a brief look at https://github.com/lazyledger/smt/blob/master/go.mod and you don't seem to import any databases? What do you use to write to disk? (There is a whole lot of work in dbs to handle atomicity, crash recovery, etc that is not in the go standard library).

Update: I see you accept a generic MapStore to handle disk access. This doesn't allow batch atomic writes or any more advanced db functionality, and will need to be expanded to provide proper sdk support

@ethanfrey
Copy link

Here is my full discord comments related to committing to disk, historical versions:

  • Support for persistent historical queries. At least allowing the last 10 or so blocks to be queried. I am pretty sure that this is required by ibc. In any case removing all historical queries is a major feature reduction that should be discussed carefully.

  • All data atomically committed to disk upon abci commit message. There have been issues in the past of some substores with different heights when a node was shut down during commit phase. Also, not writing every block to disk was done in 0.38.0 and caused major headaches on restarts and any attempts to upgrade. Commit means commit.

I also think this historical query issue could be handled much better by low level dB functionality, like rockdb snapshots, which use os level CoW tricks to be highly efficient, rather than building this manually on top of then merklized store. This could remove all pruning logic and maintain historical queries, but only if the underlying kv store supports snapshots (or we try lvm snapshotting tricks, but that is too os dependent I think)

@ethanfrey
Copy link

I add this in particular to refute a suggestion above:

Summarising discussion from Slack: in other words, this could involve keeping an older version of the tree on disk, and storing new nodes in memory/cache. The new nodes are then persisted to disk, and all obsolete nodes pruned. Once this persisting happens, the older version of the state is no longer available on disk (and would have to be re-computed from the chain to be recovered).

This was tried in some iavl optimizations in 0.38.0 (flush to disk every 10 or 100 blocks) and caused huge problems. One poor implementation would never restart properly. Even after that was solved, it remained near impossible to properly upgrade nodes. For that we need everyone to shut down their nodes v1 at height H and either: dump full state at H to new genesis file, or start node v2 with upgrade handler to process block H+1. Neither of those are feasible if state at height H was not properly synced.

@ethanfrey
Copy link

We should be able to atomically commit each block. All in-progress work on txs inside a block is strictly in memory and only flushed to disk on abci Commit command, which must be safe. Currently, when substores use different backing dbs there is no way to guarantee atomicity (one substore may be updated, other not, the mutlistore table of "latest version" may be off). Bez had some ideas how to improve that, but for now, we should just plan one backing db for all state, with atomic commits.

As I said above, I think using db or os level solutions for serving older snapshots could be much easier and more efficient than the current iavl solution.

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

This was tried in some iavl optimizations in 0.38.0 (flush to disk every 10 or 100 blocks) and caused huge problems.

The "cache" could be on-disk too, I was just considering it from an algorithms perspective, in terms of how to implement pruning.

Support for persistent historical queries. At least allowing the last 10 or so blocks to be queried. I am pretty sure that this is required by ibc. In any case removing all historical queries is a major feature reduction that should be discussed carefully.

What's the use case for historical queries of old state? Why does the SDK require this?

By "allowing the last 10 or so blocks to be queried" do you mean querying the data within those blocks, or the state of the key/value store historically?

@ethanfrey
Copy link

By "allowing the last 10 or so blocks to be queried" do you mean querying the data within those blocks, or the state of the key/value store historically?

Allowing the state of the key/value store to be historically queried. All block data is generally available back to genesis and handled by tendermint (not the sdk app)

I cannot answer why this is important, just that is was considered important and does exist (with considerable work to make it work properly) and cannot be tossed without a deep discussion with the sdk team (or prior sdk team that build this)

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

cc @zmanian, any insight on the above? I'm not sure why the Cosmos SDK apps or IBC would need to be able to query recently stale state from the kv store. Even standard Ethereum nodes do not do this AFAIK. If apps need any "historical" data, it should use event logging.

@alexanderbez
Copy link

alexanderbez commented Jan 7, 2021

Ethereum and the SDK are drastically different (SDK doesn't have event logging). Historical queries are a critical part of an SDK-based app. Historical queries are used for many things (e.g. fetching old block and tx data, viewing account balances and rewards over time, performing analysis on distribution related data, etc...). Almost all clients (e.g. explorers, wallets, relayers, etc...) rely on historical queries in some capacity.

To summarize, we need the following from a logical store:

  • basic CRUD operations
  • historical queries with proofs (i.e. give me X at height/version H)
  • batching/atomic writes
  • (reverse) prefix iteration
  • inclusion proofs
  • exclusion proofs? (I'm not totally sure about this one)
  • range proofs

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

Ethereum and the SDK are drastically different (SDK doesn't have event logging). Historical queries are a critical part of an SDK-based app. Historical queries are used for many things (e.g. fetching old block and tx data, viewing account balances and rewards over time, performing analysis on distribution related data, etc...). Almost all clients (e.g. explorers, wallets, relayers, etc...) rely on historical queries in some capacity.

So I think there's two things to consider (to be clear):

  1. Cosmos SDK apps that need to access historical key->value data within the app's state machine itself
  2. Clients that need historical key->value data, to query old app data, e.g. block explorers

It's clear why (2) is necessary, and this is what would be achieved by e.g. an Ethereum archive node or a Bitcoin full node with indexing enabled. But is (1) the case? That would be similar to, for example, Ethereum requiring you to run an archive node to be able to validate transactions, which would be very unpractical.

Support for persistent historical queries is definitely doable (and is currently already the case for the current SMT implementation, as it doesn't support pruning at all currently, so you can simply query via old roots) - but I'm wondering why this is a requirement for e.g. IBC, as ideally this is something that only clients observing the chain for various purposes (e.g. explorers) should do.

@alexanderbez
Copy link

alexanderbez commented Jan 7, 2021

No, (1) is not a valid use-case. Modules do not use historical data.

@musalbas
Copy link
Member

musalbas commented Jan 7, 2021

I see, that makes things a lot clearer, thanks.

If it's infrastructure projects that need to query historical state, rather than the Cosmos SDK app's state machine itself, then maybe it would be feasible to run an instance of the Cosmos SDK app in "archival mode" to e.g. instruct the SMT to keep the historical data (basically, disable pruning). Will need to look more into how this currently works in the SDK/IAVL...

@alexanderbez
Copy link

maybe it would be feasible to run an instance of the Cosmos SDK app in "archival mode" to e.g. instruct the SMT to keep the historical data (basically, disable pruning).

Yes, as long as this is possible, then great! It would need to be configurable (e.g. disable/enable pruning). Just for reference, the way this currently works in the SDK is that we call an API that loads a versioned tree in memory during a query and if that tree doesn't exist, we gracefully return an error.

@ethanfrey
Copy link

Nice summary @alexanderbez and thanks for chiming in. Just two points for the proofs:

  • exclusion proofs? (I'm not totally sure about this one)
  • range proofs

Exclusion proofs are critical to IBC to allow timeouts (this packet was provably not received by height H).

Range proofs are not used nor required as far as I know. Even the IBC spec only refers to batch proofs, allowing us to prove eg. 20 items at once and compress a lot of the redundancy in the proofs, so it may be the size of maybe 3 or 4 single proofs.

In ICS23, I implemented inclusion, exclusion, and batch proofs. Batch proofs may also be provably range proofs if all keys are stored in order, but also work in many other circumstances (or for non-adjacent items)

@alexanderbez
Copy link

alexanderbez commented Jan 7, 2021

It has come to my attention that multiple stakeholders are in need of replacing the current usage of IAVL with a SMT or some equivalent (e.g. LL, Foam, Nomic, and Regen).

It is my suggestion that we all team up and try to flesh out an ADR on a minimal viable logical datastore interface that multiple projects/stakeholders can implement s.t. we can easily plug-n-play different implementations into the SDK's multi-store layer.

@cwgoes
Copy link

cwgoes commented Jan 11, 2021

Nice summary @alexanderbez and thanks for chiming in. Just two points for the proofs:

  • exclusion proofs? (I'm not totally sure about this one)
  • range proofs

Exclusion proofs are critical to IBC to allow timeouts (this packet was provably not received by height H).

Range proofs are not used nor required as far as I know. Even the IBC spec only refers to batch proofs, allowing us to prove eg. 20 items at once and compress a lot of the redundancy in the proofs, so it may be the size of maybe 3 or 4 single proofs.

In ICS23, I implemented inclusion, exclusion, and batch proofs. Batch proofs may also be provably range proofs if all keys are stored in order, but also work in many other circumstances (or for non-adjacent items)

Only inclusion and exclusion proofs are strictly required for the current incarnation of IBC. Batching might be nice in the future.

@hxrts
Copy link

hxrts commented Jan 12, 2021

Continuing discussion here for anyone who may have missed the comment on cosmos#7100

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:ORU This issue will be necessary for both ORUs and the LL-abci app ice-box
Projects
No open projects
Status: TODO
Development

No branches or pull requests

9 participants