4.0.3: p2p connections get dropped frequently #1349

Closed
matthewdarwin opened this issue Jun 28, 2023 · 7 comments · Fixed by #1366 or #1376
Labels
bug Something isn't working 👍 lgtm OCI Work exclusive to OCI team

Comments

@matthewdarwin

matthewdarwin commented Jun 28, 2023

Looking at nodeos logs on 4.0.3, p2p connections between 4.0.3 nodes are getting dropped frequently. Downgrading to 3.2.3 on a single node seems to improve things. Exact scenario is not clear.

This was noticed because the BP node is producing blocks with a low number of transactions. After downgrading the BP node to 3.2.3, the transaction count went back to the expected number.

@heifner heifner added the bug Something isn't working label Jun 28, 2023
@heifner
Member

heifner commented Jun 28, 2023

Unclear if related, but the logs show a lot of duplicate connection closing. This seems to be because of: eosnetworkfoundation/mandel#756 which has been there since 3.1.

@matthewdarwin
Author

The nodes are configured to connect to each other, so there should be a duplicate dropped connection. We've had it set up that way for 4+ years.

@bhazzard bhazzard added 👍 lgtm and removed triage labels Jun 29, 2023
@heifner heifner self-assigned this Jun 29, 2023
@heifner heifner added the OCI Work exclusive to OCI team label Jun 29, 2023
@BenjaminGormanPMP BenjaminGormanPMP added this to the Leap v4.0.4 milestone Jun 29, 2023
@heifner heifner moved this from Todo to In Progress in Team Backlog Jun 29, 2023
@heifner
Member

heifner commented Jun 30, 2023

  • Add producer to the "Block #317926534 trx idle: 283us out of 200046us, success: 3, 593us, fail: 86, 197687us, transient: 0, 0us, other: 1483us" log output.
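
For anyone reading these lines out of their own logs, the fields can be pulled apart with a short script. This is a minimal Python sketch, assuming the line always has the shape quoted above and that each success/fail/transient pair is a count followed by elapsed microseconds (an interpretation of the output, not something confirmed in this thread). The function and field names are hypothetical.

```python
import re

# Sample line copied from the log output quoted above.
LINE = ("Block #317926534 trx idle: 283us out of 200046us, success: 3, 593us, "
        "fail: 86, 197687us, transient: 0, 0us, other: 1483us")

# Assumed layout: <count>, <elapsed>us for success/fail/transient.
PATTERN = re.compile(
    r"Block #(?P<block>\d+) trx idle: (?P<idle_us>\d+)us out of (?P<total_us>\d+)us, "
    r"success: (?P<success_n>\d+), (?P<success_us>\d+)us, "
    r"fail: (?P<fail_n>\d+), (?P<fail_us>\d+)us, "
    r"transient: (?P<transient_n>\d+), (?P<transient_us>\d+)us, "
    r"other: (?P<other_us>\d+)us"
)

def parse_trx_idle(line):
    """Return the numeric fields of a 'trx idle' log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

stats = parse_trx_idle(LINE)
fail_share = stats["fail_us"] / stats["total_us"]
print(f"block {stats['block']}: {fail_share:.1%} of the window spent on failed trxs")
# prints: block 317926534: 98.8% of the window spent on failed trxs
```

On the sample line the idle, success, fail, transient and other times add up exactly to the 200046us total, and almost all of that window went to the 86 failed transactions, with only 3 succeeding.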

@matthewdarwin
Author

Also add these idle metrics to Prometheus in 5.0. I'll create a separate issue for that.

@matthewdarwin
Author

matthewdarwin commented Jul 1, 2023

Kevin can add more details, but the low number of transactions seems to be because read-mode = head is the default in 4.0; read-mode = speculative seems to work better. Consider whether documentation or other action is appropriate.

@matthewdarwin
Author

There is still investigation needed on the p2p connections dropping, but it is probably less urgent and not related to 4.0 specifically (i.e., the issue pre-dates 4.0).

@heifner
Member

heifner commented Jul 3, 2023

It appears something is severing the connection between a couple of your nodes. There is no indication that this is related to nodeos, and it appears to happen with both 4.0.x and 3.2.x. I am adding some additional paranoid socket-shutdown changes and making sure a reconnect happens quickly once a connection is determined to be dead.

I recommend decreasing your p2p-keepalive-interval-ms setting. The default is 10000 (10 seconds); reducing it to 5000 lets dead connections be detected more quickly.
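
As a config fragment, the suggested change would look roughly like this. It assumes the option is set in the node's config.ini; the file location and the surrounding options are assumptions and will differ per deployment.

```ini
# nodeos config.ini (file location is an assumption; adjust for your setup)
# Default is 10000 (10 seconds). A lower value means dead peers are noticed sooner.
p2p-keepalive-interval-ms = 5000
```

Since the nodes here connect to each other, the same value would presumably be applied on each of them.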

heifner added a commit that referenced this issue Jul 6, 2023
[4.0] Close connection on aysnc_read with a closed socket
heifner added a commit that referenced this issue Jul 7, 2023
[4.0 -> main] Close connection on aysnc_read with a closed socket
@github-project-automation github-project-automation bot moved this from Awaiting Review to Done in Team Backlog Jul 7, 2023
5 participants