Lower overhead for the solo-to-chain communication path #3763

Closed
michaelfig opened this issue Aug 26, 2021 · 6 comments
Labels: cosmic-swingset (package: cosmic-swingset), enhancement (New feature or request), solo (the solo node, packages/solo)


michaelfig (Member) commented Aug 26, 2021

What is the Problem Being Solved?

Our goal is to use IBC between the solo machine (the private off-chain client that connects to the chain and sends transactions to manipulate private and public objects) and the chain (#1670). This may take significant time, and we want our solo communications to be more scalable in the meantime.

The main problem is that Cosmos RPC nodes cannot answer RPC queries while they are running the JS VM as part of EndBlock.
We want the JS VM output to be available to the AppHash without blocking RPCs sent by independent solos. The current implementation of ag-solo subscribes to new-block events and polls the on-chain mailbox (i.e. makes an ag-cosmos-helper query swingset mailbox agoric1... call) on every new block. This runs afoul of the global ABCI lock (tendermint/tendermint#6899): N ag-solos put a load of N queries per block on the RPC nodes, and those RPC nodes spend around 8 seconds in EndBlock.

Description of the Design

We will move more of the work into Tendermint events. Subscribing to them is not subject to the ABCI lock, even though the subscribing WebSocket uses the same port (26657) as the other RPC queries.
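For illustration, a Tendermint event subscription is a JSON-RPC "subscribe" call sent as a text frame over the node's WebSocket endpoint (by default ws://<node>:26657/websocket); the node then pushes matching events on that connection instead of the client polling every block. A minimal sketch of building that message, assuming the solo filters on NewBlock events (events emitted during EndBlock are carried in the NewBlock event data). The exact query ag-solo uses is not stated here, so treat the query string and the helper name as hypothetical:

```python
import json


def subscribe_request(query: str, request_id: int = 1) -> str:
    """Return a Tendermint JSON-RPC 'subscribe' message as a JSON string.

    Send it as a text frame on the node's WebSocket endpoint, e.g.
    ws://localhost:26657/websocket. Matching events are then pushed to the
    client on the same connection, with no per-block query from the client.
    """
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "method": "subscribe",
            "id": request_id,
            "params": {"query": query},
        }
    )


# Hypothetical filter: EndBlock events arrive inside NewBlock event data.
message = subscribe_request("tm.event='NewBlock'")
print(message)
```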

  1. we will publish on-chain mailbox changes (including acked sequence numbers and messages from the chain) as Tendermint events.
  2. we will subscribe to those events via a WebSocket without hitting the global ABCI/RPC lock
  3. we (already) transfer the messages from those events into the ag-solo, and remove outbound messages from the message pool when the chain has acked them
  4. we (already) have a Nagle timer to batch transactions from the solo, we'll probably increase that timer to 2-3 seconds
  5. we'll add an "impatience" timer to deal with failures in the solo-to-chain submissions (which currently have several failure modes we can't distinguish). When that timer fires (say every 10 seconds, with backoff) without our having received a mailbox update event from the chain, we'll submit a new transaction with all the unacknowledged messages plus any new messages added in the meantime.
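The solo-side behaviour of steps 3 to 5 can be sketched as a small state machine. This is a hypothetical illustration, not the ag-solo code: the class name, the 2.5-second Nagle delay, and the doubling backoff capped at 80 seconds are assumptions, and the current time is passed in explicitly so the logic can be exercised without real timers. It also assumes that resubmitting already-delivered messages is harmless because the chain's mailbox ignores sequence numbers it has already seen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Hypothetical solo-side pool of outbound messages awaiting chain acks."""

    nagle_delay: float = 2.5        # batch new messages for this long (step 4)
    impatience_base: float = 10.0   # resend if no mailbox event by then (step 5)
    impatience_max: float = 80.0    # cap for the exponential backoff
    pending: dict[int, str] = field(default_factory=dict)  # seq -> message
    next_seq: int = 1
    flush_at: float | None = None   # when the Nagle timer fires
    resend_at: float | None = None  # when the impatience timer fires
    delay: float = 0.0              # current impatience delay

    def __post_init__(self) -> None:
        self.delay = self.impatience_base

    def add(self, msg: str, now: float) -> None:
        """Queue a message and start the Nagle timer if it is not running."""
        self.pending[self.next_seq] = msg
        self.next_seq += 1
        if self.flush_at is None:
            self.flush_at = now + self.nagle_delay

    def on_mailbox_event(self, acked_seq: int, now: float) -> None:
        """Step 3: drop messages the chain has acked and reset the backoff."""
        for seq in [s for s in self.pending if s <= acked_seq]:
            del self.pending[seq]
        self.delay = self.impatience_base
        self.resend_at = now + self.delay if self.pending else None

    def tick(self, now: float) -> list[tuple[int, str]] | None:
        """Return the batch to submit as one transaction, or None."""
        nagle_due = self.flush_at is not None and now >= self.flush_at
        impatient = self.resend_at is not None and now >= self.resend_at
        if not (nagle_due or impatient):
            return None
        if not self.pending:
            self.flush_at = self.resend_at = None
            return None
        if impatient:
            # No mailbox event arrived in time: back off before the next resend.
            self.delay = min(self.delay * 2, self.impatience_max)
        self.flush_at = None
        self.resend_at = now + self.delay
        # Every unacknowledged message goes out, including ones added since.
        return sorted(self.pending.items())


box = Outbox()
box.add("hello", now=0.0)
print(box.tick(1.0))    # -> None (Nagle timer still running)
print(box.tick(2.5))    # -> [(1, 'hello')]
print(box.tick(12.5))   # -> [(1, 'hello')] (impatience resend, delay now 20)
box.on_mailbox_event(acked_seq=1, now=13.0)
print(box.pending)      # -> {}
```

Keeping the impatience timer separate from the Nagle timer means a lost or rejected transaction is eventually retried even when the solo has nothing new to send.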

Security Considerations

The chain-cosmos-sdk.js communication does not do any light client verification. We need that feature from CosmJS and/or the solo-IBC relayer we choose to use. Until then, our solo client trusts the RPC server it connects to.

Fixing this would entail putting commitments to messages outbound from the chain in the KVStore, fetching them, and verifying their Merkle proofs before we accept a packet from the chain.
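As a rough illustration of that last step, here is a generic binary Merkle inclusion check in the style of RFC 6962 (a 0x00 prefix for leaves, 0x01 for inner nodes). This is only a sketch under stated assumptions: the Cosmos KVStore actually produces IAVL proofs in the ICS-23 format, which are structured differently, so the code shows just the core idea of recomputing a root from a leaf and its sibling hashes and comparing it with a trusted root (for a light client, one taken from a verified header). The leaf contents are hypothetical.

```python
from __future__ import annotations

import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def inner_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, proof: list[tuple[str, bytes]], root: bytes) -> bool:
    """Recompute the root from a leaf and its siblings (leaf level first).

    Each proof step is (side, sibling_hash), where side says whether the
    sibling sits to the "left" or "right" of the hash computed so far.
    """
    node = leaf_hash(leaf)
    for side, sibling in proof:
        if side == "left":
            node = inner_hash(sibling, node)
        elif side == "right":
            node = inner_hash(node, sibling)
        else:
            raise ValueError(f"unknown side {side!r}")
    return node == root


# Two-leaf example: root = H(0x01 || H(0x00 || a) || H(0x00 || b)).
msg_a, msg_b = b"ack:5", b"msg:6"  # hypothetical outbound commitments
root = inner_hash(leaf_hash(msg_a), leaf_hash(msg_b))
print(verify_inclusion(msg_a, [("right", leaf_hash(msg_b))], root))     # -> True
print(verify_inclusion(b"forged", [("right", leaf_hash(msg_b))], root))  # -> False
```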

Test Plan

Create a test network with many ag-solos connected to it and verify that the RPC node stays responsive.

michaelfig added the enhancement label Aug 26, 2021
michaelfig assigned michaelfig and warner and unassigned michaelfig Aug 26, 2021
michaelfig added the cosmic-swingset and solo labels Aug 26, 2021
michaelfig changed the title from "Steps forward for the solo-to-chain communication path" to "Lower overhead for the solo-to-chain communication path" Aug 26, 2021
zmanian (Contributor) commented Aug 26, 2021

It's possible to hash events into Tendermint headers allowing for light client proofs.

I can go look for an example, but this issue is a place to start:

tendermint/tendermint#5113

michaelfig (Member, Author) commented

Looks like that proposal was declined: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-058-event-hashing.md#status

Or do you know of something coming up?

zmanian (Contributor) commented Aug 26, 2021

There were some flaws in tendermint/tendermint#4845. I feel like they could be addressed for this use case.

dckc (Member) commented Sep 1, 2021

@dtribble just noted cosmos/cosmos-sdk#10045

With the recent changes to IAVL submitted by terra (cc @YunSuk-Yeo) the reason for why we need to route through tendermint is no longer present. We should revert it ...

michaelfig (Member, Author) commented
I did some more testing, and found that the next bottleneck for RPC nodes is suboptimal Tendermint locking: tendermint/tendermint#6899. The benchmarks I've done there by removing some (apparently unnecessary) locks show that we should be able to get ~2800 queries/second out of each RPC node. Between that and Agoric-specific optimisations (like this issue's proposed use of Tendermint events to notify clients), there is light at the end of the tunnel.

I hope it's not an approaching train. :)

michaelfig (Member, Author) commented
We have landed a working version of this plan. Block verification has further to go, but that's already tracked in #3803.
