Skip to content
This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Bug: Evm tx randomly timed out #1554

Closed
VictorTrustyDev opened this issue Dec 15, 2022 · 8 comments
Closed

Bug: Evm tx randomly timed out #1554

VictorTrustyDev opened this issue Dec 15, 2022 · 8 comments
Assignees

Comments

@VictorTrustyDev
Copy link
Contributor

VictorTrustyDev commented Dec 15, 2022

System info:

  • ethermint v0.20.0-rc4 on Mac OS 12.2
  • evmos v10.0.0-rc2 on Ubuntu 22.04.1 LTS
  • go 1.19.4
  • typescript + web3 to deploy & execute contracts

Steps to reproduce:

  1. Create local node freshly
  2. Submit some evm tx

Expected behavior:
Tx should be executed

Actual behavior:

web3 deploy smart contract 3-nft721.sol on evmos chain
err Error: Transaction was not mined within 750 seconds, please make sure your transaction was properly sent. Be aware that it might still be mined!
    at Object.TransactionError (.../escraw/scripts/integration/nodejs/node_modules/web3-core-helpers/lib/errors.js:87:21)
    at .../escraw/scripts/integration/nodejs/node_modules/web3-core-method/lib/index.js:419:49
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
}
deploy smart contract WETH.sol on ethermint chain
err Error: Transaction was not mined within 750 seconds, please make sure your transaction was properly sent. Be aware that it might still be mined!
    at Object.TransactionError (.../ethermint/tests/escan/node_modules/web3-core-helpers/lib/errors.js:87:21)
    at .../ethermint/tests/escan/node_modules/web3-core-method/lib/index.js:419:49
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
}
execute function registerCollect() of WETH.sol on ethermint chain
err Error: Transaction was not mined within 750 seconds, please make sure your transaction was properly sent. Be aware that it might still be mined!
    at Object.TransactionError (.../ethermint/tests/escan/node_modules/web3-core-helpers/lib/errors.js:87:21)
    at .../ethermint/tests/escan/node_modules/web3-core-method/lib/index.js:419:49
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5) {
}

I'm building integration test for escan, I have script which creates ~40 txs (both cosmos and evm txs) on local evmos node

The scripts run gracefully multiple times with evmos v9.1.0

Now when I cherry-pick the tests into v10.0.0-rc2, I can not finish the script because it always randomly timed out in one of my evm txs.
(I didn't check the v10.0.0-rc1 yet)

The problem also happens with ethermintd local node and my integration test for ethermintd.
Balance of the wallet used to deploy contract didn't changed.
The only log of ethermintd node (log level = error):

11:53PM ERR Failed to read request err="websocket: close 1006 (abnormal closure): unexpected EOF" module=rpc-server protocol=websocket remote={"IP":"127.0.0.1","Port":60980,"Zone":""} server=node
11:53PM ERR error while stopping connection error="already stopped" module=rpc-server protocol=websocket server=node

and it was printed multiple times.

This issue:

  • 3/3 times happened when I run my own integration test for evmos node (18 evm txs, failed at 8th or 9th evm tx)
  • 4/7 times happened when I run my own integration test for ethermintd node (4 evm txs in 2 batches, randomly failed at 1st or 2nd batch)
Copy link
Contributor

Thank you, could you provide more info about the transaction and contract you were interacting with to try to reproduce the issue?

@VictorTrustyDev
Copy link
Contributor Author

VictorTrustyDev commented Dec 16, 2022

@fedekunze
https://github.com/VictorTrustyDev/ethermint/commit/89e468fa9d477c66fab4349f207ebe9ed002aa3b

Steps:

  1. Clone & checkout branch & make install
  2. npm i in ./tests/escan
  3. Back to ethermint root dir
  4. Execute ./tests/escan/init-test-node.sh and wait server up 8545
  5. Execute ./tests/escan/test.sh and wait, it retries 50 times with 4 tx each attempt

@VictorTrustyDev
Copy link
Contributor Author

VictorTrustyDev commented Dec 16, 2022

You can also do git cherry-pick 89e468fa9d477c66fab4349f207ebe9ed002aa3b to bring the script into other branches

git checkout v0.19.3
make install
git cherry-pick 89e468fa9d477c66fab4349f207ebe9ed002aa3b
// => Success 50/50

git checkout v0.20.0-rc2
make install
git cherry-pick 89e468fa9d477c66fab4349f207ebe9ed002aa3b
// => Success 50/50

git checkout v0.20.0-rc3
make install
git cherry-pick 89e468fa9d477c66fab4349f207ebe9ed002aa3b
// => Hang at the 1st and 5th attemps (still executing...)

So probably something wrong between rc2 and rc3
Any chance it panicked somewhere but no error shown?

@fedekunze fedekunze moved this to 📋 Backlog in Ethermint Board Dec 16, 2022
@GAtom22 GAtom22 moved this from 📋 Backlog to 🏗 In progress in Ethermint Board Dec 19, 2022
@GAtom22
Copy link
Contributor

GAtom22 commented Dec 20, 2022

Hi @VictorTrustyDev, thanks for reporting this issue! I investigated and seems that after running a Tx in ReCheck mode, it is not running it on ModeDeliver. The cycle should be one of these 2:

runTx(ModeCheck) -> runTx(DeliverMode)     // These are the tests that pass
runTx(ModeCheck) -> runTx(ModeReCheck) -> runTx(DeliverMode)     // These are the tests that fail

If getting errors on the runTx(ModeCheck) or runTx(ModeReCheck) should return an error. But in this case there are no errors, and the runTx(DeliverMode) it's not triggered.

This basically means that the Tx never changes the state, so it does not exist on the chain. As result, the test gets a timeout error because they don't get any Tx receipt.

I'm still trying to figure out why this is happening. I'll keep you updated

@VictorTrustyDev
Copy link
Contributor Author

Hi @GAtom22, so, as I guess before ^ with clue, the problem might comes between rc2 and rc3 so I tried to build one by one every commit between 2 versions following this list and the txs start stucking from commit feed84b09cdc46820faddbababf3f69ce881789a of #1455

@VictorTrustyDev VictorTrustyDev changed the title Evm tx randomly timed out Bug: Evm tx randomly timed out Dec 21, 2022
@GAtom22
Copy link
Contributor

GAtom22 commented Dec 21, 2022

Hi @VictorTrustyDev, thanks for your comment! Indeed it is the AnteHandlers refactor what introduced the bug. Particularly the corresponding to the EthGasConsumeDecorator. On ReCheckMode the gasWanted is not updated and there's an error on tendermint that it is not bubbled up. The error comes from the postCheck function at the mempool logic:

The error message is "gas wanted -1 is negative", but this error is only logged at debug level.

A solution is to set a new context with a gasWanted = 0 on ReCheckMode. See PR #1566 as an example of this.

@VictorTrustyDev
Copy link
Contributor Author

Thanks @GAtom22 for your detailed explanation

@GAtom22
Copy link
Contributor

GAtom22 commented Dec 22, 2022

Issue addressed on PR #1566

@GAtom22 GAtom22 closed this as completed Dec 22, 2022
Repository owner moved this from 🏗 In progress to ❌ Canceled in Ethermint Board Dec 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Status: Canceled
Development

No branches or pull requests

3 participants