Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geth crashes when tracing Txs after 20 mins of 'code': -32000, 'message': 'tracing failed: max fee per gas less than block base fee #23397

Closed
danielemiliogarcia opened this issue Aug 13, 2021 · 26 comments · Fixed by #23580
Labels

Comments

@danielemiliogarcia
Copy link

danielemiliogarcia commented Aug 13, 2021

System information

Geth version: Version: 1.10.7-stable
Parameters: geth --config config.toml --verbosity 5 --cache.preimages --txlookuplimit=0 --http --http.addr "0.0.0.0" --http.port 8545 --cache=4096 --pprof --pprof.addr "0.0.0.0" --metrics --metrics.expensive 2
OS & Version: Linux, Ubuntu Server 20.04 LTS

Expected behaviour

Runs stable and do not crash

Actual behaviour

Completely crashes after approx 20 mins of code': -32000, 'message': 'tracing failed: max fee per gas less than block base fee

Grafana-geth-crashes

Steps to reproduce the behavior

debug_traceCall' pending txs in a loop

subscribe to pending TX and call a trace for each using 'debug_traceCall'

Backtrace

logs in full verbosity in the link to download .. it crashes without any stack trace

https://www.mediafire.com/file/y4urjkkcufafj1z/geth.log.zip/file

the full log file contains the logs for 3 consecutive crashes between markers in this graphic

Selection_125

You need to tale 3 hours to logs in order to match, as grafana is in my Timezone GMT-3 and logs are in UTC

summary:

WARN [08-06|18:04:01.349] Served debug_traceCall                   conn=127.0.0.1:43720 reqid=1      t=264.239602ms  err="tracing failed: max fee per gas less than block base fee: address 0xF5BB32B5ff84b35A3278254507995BA3b6F6d90a, maxFeePerGas: 75767908243 baseFee: 83135955851"
TRACE[08-06|18:04:01.452] << FINDNODE/v4                           id=42958123c194893e addr=134.175.181.177:30333           err=nil
DEBUG[08-06|18:04:01.334] Fetching batch of transactions           id=17767c8f8a899314 conn=inbound         count=2
WARN [08-06|18:04:01.341] Served debug_traceCall                   conn=127.0.0.1:43724 reqid=1      t=268.652939ms  err="tracing failed: max fee per gas less than block base fee: address 0x0b6c000D64923f57638b536183A3Ca72427b1c5F, maxFeePerGas: 78000000000 baseFee: 83135955851"
DEBUG[08-06|18:04:01.455] Served eth_getTransactionByHash          conn=127.0.0.1:43686 reqid=345598 t=97.027453ms
DEBUG[08-06|18:04:01.462] Served eth_getTransactionByHash          conn=127.0.0.1:43666 reqid=345597 t=6.306427ms
TRACE[08-06|18:04:01.242] Pooled new future transaction            hash=907c92..c20933 from=0x97E48AC4d875CE810209dCf3da9b2fbEe39dC775 to=0x030bA81f1c18d280636F32af80b9AAd02Cf0854e
TRACE[08-06|18:04:01.467] Discarding already known transaction     hash=e610b8..7cf973
DEBUG[08-06|18:04:01.358] Served eth_getTransactionByHash          conn=127.0.0.1:43654 reqid=345594 t="202.102µs"
TRACE[08-06|18:04:01.368] Failed RLPx handshake                    addr=46.137.153.17:31303             conn=inbound         err=EOF
DEBUG[08-06|18:04:01.578] Served eth_getTransactionByHash          conn=127.0.0.1:43468 reqid=345595 t=339.026728ms
DEBUG[08-06|18:04:01.439] Served eth_getTransactionByHash          conn=127.0.0.1:43652 reqid=345599 t=94.684858ms
DEBUG[08-06|18:04:01.569] Served eth_getTransactionByHash          conn=127.0.0.1:43718 reqid=345590 t=339.557873ms
TRACE[08-06|18:04:01.352] Accepted connection                      addr=44.224.78.181:39478
DEBUG[08-06|18:04:01.430] Fetching batch of transactions           id=15ad06a91ec7ab61 conn=inbound         count=2
TRACE[08-06|18:04:01.290] Sent transaction announcements           id=15265c73d385b8eb conn=inbound         count=1
DEBUG[08-06|18:04:01.559] Served eth_getTransactionByHash          conn=127.0.0.1:43638 reqid=345592 t=106.597535ms
DEBUG[08-06|18:04:01.567] Served eth_getTransactionByHash          conn=127.0.0.1:43656 reqid=345596 t=104.689494ms
DEBUG[08-06|18:04:01.772] Fetching batch of transactions           id=7d2d9e2cd93f2923 conn=inbound         count=1
DEBUG[08-06|18:04:01.753] Fetching batch of transactions           id=499191c60e1d6cbd conn=inbound         count=2
DEBUG[08-06|18:04:01.903] Fetching batch of transactions           id=198a460acbc30f45 conn=inbound         count=1
DEBUG[08-06|18:04:01.994] Fetching batch of transactions           id=68fc6b474ae47eb0 conn=inbound         count=1
TRACE[08-06|18:04:01.696] FINDNODE failed                          id=dbc8d2d5815df3fd failcount=1       dropped=false err="RPC timeout"
DEBUG[08-06|18:04:02.000] Fetching batch of transactions           id=c71fee3ad4b0a215 conn=inbound         count=2
DEBUG[08-06|18:04:02.000] Fetching batch of transactions           id=7e6489d422046b30 conn=dyndial         count=6
DEBUG[08-06|18:04:02.000] Fetching batch of transactions           id=bf3f5d5608bcdac3 conn=inbound         count=1
DEBUG[08-06|18:04:02.000] Fetching batch of transactions           id=c53aeb5882e29b6b conn=inbound         count=2
DEBUG[08-06|18:04:01.679] Fetching batch of transactions           id=1abd8e310455259e conn=dyndial         count=1
DEBUG[08-06|18:04:01.685] Fetching batch of transactions           id=33989d6eb9446268 conn=inbound         count=2
DEBUG[08-06|18:04:02.012] Fetching batch of transactions           id=c76354f06e11ad59 conn=dyndial         count=2
DEBUG[08-06|18:04:02.012] Fetching batch of transactions           id=7883fd099b12f77f conn=inbound         count=1
DEBUG[08-06|18:04:02.013] Fetching batch of transactions           id=855172553759520b conn=inbound         count=1
DEBUG[08-06|18:04:02.017] Fetching batch of transactions           id=cf743aa1c9d39528 conn=inbound         count=17
TRACE[08-06|18:04:02.019] Accepted connection                      addr=209.133.220.13:9320
TRACE[08-06|18:04:02.018] Pooled new future transaction            hash=34ad6b..47121e from=0xb5F8340d82051026573d21B288948Ddc4832bC22 to=0xA4e5961B58DBE487639929643dCB1Dc3848dAF5E

 ... time down ... 6 minutes after... the next line in the log is..

INFO [08-06|18:10:16.382] Starting pprof server                    addr=http://0.0.0.0:6060/debug/pprof
INFO [08-06|18:10:16.383] Starting Geth on Ethereum mainnet...
@0xMarto
Copy link

0xMarto commented Aug 19, 2021

I can confirm the issue. ☝️

It's happening to me too..... the node keeping crashing and start re-syncing after many debug_traceCall returning with the error
tracing failed: max fee per gas less than block base fee.

At first I thought it was some kind of memory leak but I didn't find any memory peaks (so I guess it's not that).
So many some run out of pointers? I dunno....

@MariusVanDerWijden
Copy link
Member

Hmm you're tracing on a full node... On light clients, if the peer that you fetch the state from disappears it will fail with this error.
Can you link the transaction where tracing fails for?
Tracing should only fail with this error if the underlying state is somehow corrupted.

@gelfand
Copy link

gelfand commented Aug 20, 2021

It uses baseFee from current block instead of baseFee from next block.

@holiman
Copy link
Contributor

holiman commented Aug 26, 2021

What parameters are you using the the call? Are you tracing old transactions or non-executed transactions on pending ?

@holiman
Copy link
Contributor

holiman commented Aug 26, 2021

it crashes without any stack trace

Regarding this -- how do you run geth? Does it run in a daemon mode, or docker, or some tmux ?

@rjl493456442
Copy link
Member

@gelfand
https://geth.ethereum.org/docs/rpc/ns-debug#debug_tracecall
Tracing call is happened on your specified block. So using the basefee of the specified block is the expected behavior.

@0xMarto
Copy link

0xMarto commented Aug 26, 2021

The main issue here is not if the debug_traceCall fails or not..... The error is "verbose" enough (it's clearly because I'm trying to trace a mempool tx in the last block context and fails with that error msg).

The issue is that after some time (20 min) of getting this error over and over again trying to debug_traceCall different mempool txs, the node crashes and start re syncing (not all traced txs fail but some do).

@holiman
Yes, the trace call was made with non-executed tx from the mempool (simply: 1. get all mempool txs & 2. iterate and trace)

@gelfand
Copy link

gelfand commented Aug 26, 2021

@gelfand
https://geth.ethereum.org/docs/rpc/ns-debug#debug_tracecall
Tracing call is happened on your specified block. So using the basefee of the specified block is the expected behavior.

yeah, was wrong sorry.

@gelfand
Copy link

gelfand commented Aug 26, 2021

What parameters are you using the the call? Are you tracing old transactions or non-executed transactions on pending ?

Seems he traces unsorted pooled transactions from other nodes, because it shouldn't fail if he traces executable(pending) transactions.

@danielemiliogarcia
Copy link
Author

What parameters are you using the the call? Are you tracing old transactions or non-executed transactions on pending ?

pending txs over latest block

@danielemiliogarcia
Copy link
Author

it crashes without any stack trace

Regarding this -- how do you run geth? Does it run in a daemon mode, or docker, or some tmux ?

systemd daemon in the Linux distribution described at the top (issue description)

Linux, Ubuntu Server 20.04 LTS

also with the parameters of the description

geth --config config.toml --verbosity 5 --cache.preimages --txlookuplimit=0 --http --http.addr "0.0.0.0" --http.port 8545 --cache=4096 --pprof --pprof.addr "0.0.0.0" --metrics --metrics.expensive 2

@s1na
Copy link
Contributor

s1na commented Aug 30, 2021

Does this repro script resemble what you're doing? I do get a lot of max fee per gas less than block base fee but geth hasn't crashed yet after ~45 mins.

https://gist.github.com/s1na/abfde280e575c5de1937098a7ea2cb09

@danielemiliogarcia
Copy link
Author

Does this repro script resemble what you're doing? I do get a lot of max fee per gas less than block base fee but geth hasn't crashed yet after ~45 mins.

https://gist.github.com/s1na/abfde280e575c5de1937098a7ea2cb09

I'm doing single HTTP RPC requests, I'm not using websockets like in your snippet, i have recently tested in latest Geth version v1.10.8 and its going down again in the same way, can we chat in the Geth discord? perhaps i can hit your node, so you see live how it goes down @s1na

@danielemiliogarcia
Copy link
Author

danielemiliogarcia commented Sep 3, 2021

Same behavior for version

Geth
Version: 1.10.8-stable
Git Commit: 26675454bf93bf904be7a43cce6b3f550115ff90
Architecture: amd64
Go Version: go1.16.4
Operating System: linux
GOPATH=
GOROOT=go

logs -> https://www.mediafire.com/file/44rfebp8o4sqdps/geth.log.crash-09-03.zip/file

Selection_150

how to reproduce:

for each new tx that arrives to the mempool (subscription to pending txs)

hit the node with and RPC HTTP request using debug_traceCall, the tx data, the current pending block hash and the tracer 4byteTracer

pseudo code example:

for each TX in the mempool

        curl -H "Content-Type: application/json" -X POST  localhost:8545 --data '{"jsonrpc":"2.0","method":"debug_traceCall","params":[{TX}, "BLOCK_HASH_FOR_PENDING_BLOCK", {"tracer": "4byteTracer", "timeout": "120s"}],"id":1}'
   

concrete instantiated example:

curl -H "Content-Type: application/json" -X POST  localhost:8545 --data '{"jsonrpc":"2.0","method":"debug_traceCall","params":[{"from": "0xD2C82F2e5FA236E114A81173e375a73664610998", "to": "0x8D1f2eBFACCf1136dB76FDD1b86f1deDE2D23852", "gas": "0x7a120", "gasPrice": "0x1c7e188c5d", "value": "0x0", "data": "0x2da03409000000000000000000000000f6316c97e6071d29af2a00b17e12d4623e0a7a63000000000000000000000000dac17f958d2ee523a2206206994597c13d831ec7", "hash": "0xac10a4cfadd3af3fb65ccc75ee2b75fcbcb66db58029ce50b705762a6d4e11a0", "nonce": "0xf9bfd"}, "0xd86f22972570b386c868d465639756b51663f8fa9f1ed3c039d0c0b9fddc7e9a", {"tracer": "4byteTracer", "timeout": "120s"}],"id":1}'

result:

{'jsonrpc': '2.0', 'id': 1, 'error': {'code': -32000, 'message': 'tracing failed: max fee per gas less than block base fee: address 0x57AC908af48e60654b5338A5d6119Cc7476dc4dE, maxFeePerGas: 80000000000 baseFee: 98934815142'}}

@holiman @s1na @martinlsanchez

@holiman
Copy link
Contributor

holiman commented Sep 15, 2021

I have a suspicion that this is due to "timeout": "120s". I'm not sure if the timeout fully stops the js engine. If the timeout is on a surface-level, leaving the js execution running internally, then sending more requests to the machine may cause overload. After a certain time, you'll have N tracers executing, all competing for the same resources -- even though N-1 ones will be silently discarded after completion.

We'll investigate.

@holiman
Copy link
Contributor

holiman commented Sep 15, 2021

Could you please try #23580 , and see if it fixes the issue?

@danielemiliogarcia
Copy link
Author

@holiman please give me some time to try, if it fails again can we re-open the isuue?

@holiman
Copy link
Contributor

holiman commented Sep 16, 2021 via email

@danielemiliogarcia
Copy link
Author

i've tried, and its still crashing.. with latest master

Geth
Version: 1.10.9-unstable
Git Commit: 7ada89d
Git Commit Date: 20210916
Architecture: amd64
Go Version: go1.15.5
Operating System: linux
GOPATH=
GOROOT=go

Selection_245

@holiman do you want a new file with full verbosity logs?

@holiman
Copy link
Contributor

holiman commented Sep 16, 2021 via email

@danielemiliogarcia
Copy link
Author

@s1na
Copy link
Contributor

s1na commented Sep 17, 2021

This might be related to #23552. I'm watching mainnet txes and every once in a while there's a really heavy one (>20m gas) and tracing one of those crashed my node. Can you also please share the last tx your script was tracing before the crash?

@danielemiliogarcia
Copy link
Author

This might be related to #23552. I'm watching mainnet txes and every once in a while there's a really heavy one (>20m gas) and tracing one of those crashed my node. Can you also please share the last tx your script was tracing before the crash?

geth logs

WARN [09-16|21:04:54.731] Served debug_traceCall                   conn=127.0.0.1:56006 reqid=1      t=277.359578ms err="tracing failed: max fee per gas less than block base fee: address 0x17CaE83761bd0E9201Af28717BE285cE1e0Ee156, maxFeePerGas: 110000000000 baseFee: 135815335515"

my logs

{'from': '0x17CaE83761bd0E9201Af28717BE285cE1e0Ee156', 'to': '0xa440467f6d5fBd62F6eEf01192caA52850Aa1D5F', 'gas': '0x97fe8', 'gasPrice': '0x199c82cc00', 'value': '0x16345785d8a0000', 'data': '0xad2307960000000000000000000000000000000000000000000000000000000000000005', 'hash': '0x2f2ebd8af2ef8782084685013548a184c3e2313927ea7d3d61648a7e61d24f17', 'nonce': '0x2d'}

tx: https://etherscan.io/tx/0x2f2ebd8af2ef8782084685013548a184c3e2313927ea7d3d61648a7e61d24f17
@s1na

@holiman
Copy link
Contributor

holiman commented Nov 1, 2021

From the last log, where you included a stack dump:

$ more geth.log | grep worker.go | wc -l
230

It's these two that are bottling up:

sync.runtime_SemacquireMutex(0xc000277f1c, 0xc0248d6400)
	/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc000277f10)
	/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/ethereum/go-ethereum/miner.(*worker).pending(0xc000277dc0, 0x0, 0x0)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/worker.go:252 +0x4f
--
sync.runtime_SemacquireMutex(0xc000277f1c, 0xe33100)
	/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc000277f10)
	/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/ethereum/go-ethereum/miner.(*worker).pendingBlock(0xc000277dc0, 0x0)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/worker.go:263 +0x46
--

One is via EstimateGas:

sync.runtime_SemacquireMutex(0xc000277f1c, 0xc01a644100)
	/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc000277f10)
	/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/ethereum/go-ethereum/miner.(*worker).pending(0xc000277dc0, 0x0, 0x0)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/worker.go:252 +0x4f
github.com/ethereum/go-ethereum/miner.(*Miner).Pending(0xc000322360, 0xab5e310f716a, 0x188ea4)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/miner.go:155 +0x2f
github.com/ethereum/go-ethereum/eth.(*EthAPIBackend).StateAndHeaderByNumber(0xc0003da9e0, 0x11c15e0, 0xc02fc2de40, 0xfffffffffffffffe, 0x1e75340, 0x0, 0x0, 0x5208)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/eth/api_backend.go:92 +0x124
github.com/ethereum/go-ethereum/internal/ethapi.(*PublicBlockChainAPI).doCall(0xc000076a00, 0x11c15e0, 0xc02fc2de40, 0xac3fb60b90b0f65c, 0xb55023a96ab1ef61, 0xff71d458, 0xc0293e2e60, 0x551ae, 0x0, 0x0, ...)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/internal/ethapi/api.go:700 +0x12f
github.com/ethereum/go-ethereum/internal/ethapi.(*PublicBlockChainAPI).EstimateGas.func1(0x551ae, 0x11c1501)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/internal/ethapi/api.go:791 +0x8e
github.com/ethereum/go-ethereum/internal/ethapi.(*PublicBlockChainAPI).EstimateGas(0xc000076a00, 0x11c15e0, 0xc02fc2de40, 0xac3fb60b90b0f65c, 0xb55023a96ab1ef61, 0xff71d458, 0xc0293e2e60, 0x551ae, 0x0, 0x0, ...)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/internal/ethapi/api.go:800 +0xe5

And the other seem related to the same thing

sync.runtime_SemacquireMutex(0xc000277f1c, 0xe33100)
	/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc000277f10)
	/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/ethereum/go-ethereum/miner.(*worker).pendingBlock(0xc000277dc0, 0x0)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/worker.go:263 +0x46
github.com/ethereum/go-ethereum/miner.(*Miner).PendingBlock(0xc000322360, 0xe33120)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/miner/miner.go:164 +0x2f
github.com/ethereum/go-ethereum/eth.(*EthAPIBackend).BlockByNumber(0xc0003da9e0, 0x11c15e0, 0xc0283696c0, 0xfffffffffffffffe, 0x41e0db, 0xc00001e000, 0xe9c180)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/eth/api_backend.go:79 +0xfe
github.com/ethereum/go-ethereum/internal/ethapi.(*PublicBlockChainAPI).EstimateGas(0xc000076a00, 0x11c15e0, 0xc0283696c0, 0x4c5379758efd9550, 0x7a9dd36ab9d02fc4, 0xaab2d0a8, 0xc02241d0e0, 0x0, 0x0, 0x0, ...)
	/home/admin/gonex/build/_workspace/src/github.com/ethereum/go-ethereum/internal/ethapi/api.go:779 +0x1ec
reflect.Value.call(0xc008bb4280, 0xc0001f99c0, 0x13, 0xfa881d, 0x4, 0xc01f518d80, 0x3, 0x4, 0x0, 0xc01f518d80, ...)
	/usr/local/go/src/reflect/value.go:447 +0x454

Unfortunately, I can't reconcile the line numbers with the actual revision 7ada89d, so not sure exactly what's going on.
However, it does not look like "I'm doing single HTTP RPC requests," -- but if you are, then I guess the client side times out before the server side does, and issues the next request while the first one is still waiting for a lock.

The pending block is "owned" by the miner. There are several things that need access to it:

  • The miner modifies it to add new transactions,
  • The miner modifies it to add new uncles,
  • The miner modifies it when there's a new head block,
  • Various RPC endpoints access it to evaluate on top of

In this case, I suspect that the constant accesses stall eachother, and eventually the system goes OOM because various channels can no longer be handled.

@holiman holiman reopened this Nov 1, 2021
@holiman
Copy link
Contributor

holiman commented Nov 1, 2021

In this case, you're not actually interested in the pending block per se, just the most recent state. So the accesses to the pendingblock is actually pretty moot.

@danielemiliogarcia
Copy link
Author

@holiman im closing this issue as is not happening any more on last version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants