Lighthouse 4.0.0 and "context canceled" Error #7172
Comments
To be clear, this is with Lighthouse. Lighthouse complains that Erigon no longer responds on the Engine API. We've seen this on post-Shapella Goerli and pre-Shapella mainnet. It's very reproducible, so it should be fairly easy to debug. |
@yorickdowne The versions are specified verbatim in the main message, I just messed up the title. I will be upgrading the I suspect that even if the error is with Lighthouse, Erigon should probably not become unresponsive and require a forced restart. |
"This error only appeared after upgrading Erigon to
|
Entirely my bad, typo is fixed 👍 |
I can reproduce this on Goerli with Erigon source-compiled from |
Debug logs. Failure around 11:49 UTC. The logs aren't very instructive, alas. |
Clarification: Erigon doesn't ignore SIGTERM. It just is unable to actually shut down entirely.
And this is where it remains until forcibly killed |
Uploaded trace logs to here: https://www.dropbox.com/s/qim77hurooysz6b/erigonstall.trace.log.tar.gz?dl=0 Issue occurs around 16:15 |
Given that Erigon was OK with LH v3.5.1 I suspect this is related to the |
We came across the same issue once we updated to lighthouse
It basically looped over that continuously. Updating to lighthouse |
We have a user report that this is not fixed with Erigon |
Maybe related: syncing gets stuck for me as well. I've tried lots of versions and am now running 2.38, which seems a bit more stable for me, but I still have the problem.
Note: When erigon gets stuck |
Can confirm. Erigon 2.41.0 and Lighthouse 4.0.1. Now testing: erigon/v2.41.0-dev-540af96e/linux-amd64/go1.20.2
EDIT: That didn't take long. Crashed with "context canceled" after a few minutes. Testing with this patch now:
diff --git a/cmd/rpcdaemon/commands/engine_api.go b/cmd/rpcdaemon/commands/engine_api.go
index e581bcb54..3a3d294dd 100644
--- a/cmd/rpcdaemon/commands/engine_api.go
+++ b/cmd/rpcdaemon/commands/engine_api.go
@@ -409,7 +409,6 @@ var ourCapabilities = []string{
 	"engine_getPayloadV2",
 	"engine_exchangeTransitionConfigurationV1",
 	"engine_getPayloadBodiesByHashV1",
-	"engine_getPayloadBodiesByRangeV1",
 }
 
 func (e *EngineImpl) ExchangeCapabilities(fromCl []string) []string {
|
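Editor's sketch of why dropping that one entry could change behaviour. It assumes the consensus client only calls a method that the execution client advertises through the capability exchange. The helper `supports` and the variable names are hypothetical. Only the method strings come from the patch, and this is not Erigon's or Lighthouse's actual code.

```go
package main

import "fmt"

// byRange is the method string removed by the patch above.
const byRange = "engine_getPayloadBodiesByRangeV1"

// supports reports whether method is in the list advertised by the
// execution client. Hypothetical helper, for illustration only.
func supports(advertised []string, method string) bool {
	for _, m := range advertised {
		if m == method {
			return true
		}
	}
	return false
}

// unpatchedList mirrors the tail of the capability list shown in the diff.
func unpatchedList() []string {
	return []string{
		"engine_getPayloadV2",
		"engine_exchangeTransitionConfigurationV1",
		"engine_getPayloadBodiesByHashV1",
		"engine_getPayloadBodiesByRangeV1",
	}
}

func main() {
	unpatched := unpatchedList()
	patched := unpatched[:3] // same list without the ByRange entry

	fmt.Println(supports(unpatched, byRange)) // true
	fmt.Println(supports(patched, byRange))   // false
}
```

Under that assumption, a client talking to the patched build would never send the ByRange request. That would avoid the stall without fixing whatever causes it.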
Same issue here, also with erigon 2.41.0 and lighthouse 4.0.1. The patch from @mwpastore appears to improve the situation. |
observing the same issue, trying out the patch |
same here |
I encountered the same issue with Prysm as well. I also captured a pprof while it was hanging. |
Or a similar issue. Prysm doesn't use getPayloadBodiesByRangeV1 according to Nishant. The symptoms seen are similar: context canceled and Erigon no longer responds on RPC/Engine. Either this is the same issue as Lighthouse and it's got nothing to do with ByRange, or it's a separate issue. |
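Editor's sketch of where the text "context canceled" comes from in a Go program. It is the error string of `context.Canceled`, which a function sees when its caller cancels the context before the work finishes. The functions `slowCall` and `canceledCall` are hypothetical stand-ins, and nothing here says which caller or which request is involved in Erigon.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall stands in for a long-running request handler (hypothetical).
// It returns early with ctx.Err() if the caller cancels.
func slowCall(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// canceledCall starts slowCall and cancels it shortly afterwards,
// the way a caller that gives up on a request would.
func canceledCall() error {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(10 * time.Millisecond)
		cancel()
	}()
	return slowCall(ctx)
}

func main() {
	err := canceledCall()
	fmt.Println(err)                              // context canceled
	fmt.Println(errors.Is(err, context.Canceled)) // true
}
```

The message on its own only says that something upstream gave up on the request. It does not identify what was slow, which fits the observation that both Lighthouse and Prysm show the same symptom.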
Erigon v2.41.0 |
Seems like related to |
Still up and running with the above patch after about 20 hours. |
We're being asked to test #7199 |
On it. |
@mwpastore were you able to test this? |
@ethDreamer It's running now but it's only been about an hour. |
Someone said he tested #7199 before it got merged and was still having issues. I'm testing it now myself as well. |
thanks - keep us updated :) |
Time to restart these tests on v2.42 |
2.42 stayed stable overnight for 12 hours on 9 servers. As far as I can tell, this is fixed. |
…rigontech#7199) Should hopefully help with Issue erigontech#7172
System information
erigon version 2.40.1-stable
Linux 6.1.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Jan 2023 16:28:15 +0000 x86_64 GNU/Linux
Behaviour
"context canceled"
.v4.0.0
. I have tested it with the v4.0.1-rc.0 hotfix as well, and the error remains. I suspect that is the root cause of the issue.
Logs are located in the following gist: JP-Ellis/30aeca74b7760ea603f709309051aefc