Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account sequence mismatch causes unnecessary delay in relaying #2249

Closed
5 tasks
ancazamfir opened this issue May 31, 2022 · 0 comments · Fixed by #2298
Closed
5 tasks

Account sequence mismatch causes unnecessary delay in relaying #2249

ancazamfir opened this issue May 31, 2022 · 0 comments · Fixed by #2298
Assignees
Labels
A: bug Admin: something isn't working
Milestone

Comments

@ancazamfir
Copy link
Collaborator

ancazamfir commented May 31, 2022

Summary of Bug

Sometimes Tx simulation failures due to account sequence mismatch causes hermes to unnecessary delay the relaying.
Background:

  • hermes simulates every transaction by calling the simulateTx (application gRPC). If successfully simulated, it uses the gas to determine the Tx fee and then
  • it broadcast_tx_sync-s the transaction (tendermint RPC), i.e. sent to tendermint mempool, which calls the application checkTx.
  • In both cases the verification happens on a copy of the application state, aka checkTx state.
  • the checkTx state is also updated by the application after a block is being committed (deliverTx is run for all Tx-es in a block)

Here are the details of the issue:

  • hermes simulates and broadcast_tx_sync-s N transactions with following sequence numbers: [1,.., K, K+1, ...,N]
    • check Tx state last account sequence is N.
  • tendermint creates a block with [1,..,K]. This involves calling Commit() which:
    • updates the application by calling deliverTx for [1,..,K] and then recreating the check Tx state with latest application state:
      • check Tx state last account sequence is K.
    • then it updates the mempool. Part of the mempool update is recheckTxs() which calls checkTx for [K+1,..,N]
      • check Tx state last account sequence is K+x.
  • hermes tries to simulate a transaction with sequence number N+1 sometime before recheckTxs() above finishes. It gets the error:
    • "account sequence mismatch, expected K+x, got N+1: incorrect account sequence", with x being the number of Tx-es that have been rechecked
  • hermes recaches to K+x and tries to simulate again a bit later only to get: "account sequence mismatch, expected N+1, got K+x: incorrect account sequence"
  • then recaches again to N+1 and the simulation is usually successful as the recheckTxs() above is finished already.

I changed the code locally to use default gas on simulation failure and things look much better. This is because the mempool lock is grabbed during Commit() so broadcast_tx_sync for N+1 in the middle of recheckTxs() will wait for checkTx on [K+1, .., N]

In summary the difference is:

  • simulateTx gRPC works on a checkTx state that can be concurrently updated by the recheckTxs()
  • broadcast_tx_sync RPC will be executed and the call to checkTx done only after recheckTxs() has finished.

Also checked the Go relayer and the simulation is retried a few times in the hope of catching and getting successful at the end of recheckTxs().

Version

all

Steps to Reproduce

  • create two chains (used gaia v7.0.0 that uses tendermint v0.34.14)
  • create channel
  • send 1000 ft-transfer-s
  • start hermes and check logs

Acceptance Criteria

Packet relaying should continue in this case, i.e. deal differently with the account mismatch error when got > expected. Specifically in this case, either use the default gas or retry the simulation (probably the latter).
(TBD - asking SDK for a simulateTx flag that skips the account sequence check. But if this is made available it will probably be in later releases)
Note that for got < expected we still need to recache as this is an indication that some other entity has used the same account/ wallet.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: bug Admin: something isn't working
Projects
No open projects
Status: Closed
Development

Successfully merging a pull request may close this issue.

2 participants