RED-64/65: Remove golden ticket specific logic when adding nodes to standby map #18

Merged: 1 commit, Apr 23, 2024

Conversation

urnotsam
Contributor

https://linear.app/shm/issue/RED-65/secondary-issue-problems-removing-standby-nodes-mentioned-in-logs
https://linear.app/shm/issue/RED-64/nodes-fail-to-fetch-cycles-in-betanext

Summary: GT'd nodes should go through the regular process of getting added to newJoinRequest instead of getting added directly to standby map.
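
A minimal sketch of the idea in the summary, not the actual diff. All names here (`JoinRequest`, `goldenTicket`, `newJoinRequests`, `standbyNodes`, `addJoinRequest`, `processPendingJoinRequests`) are hypothetical stand-ins, and the once-per-cycle processing step is an assumption. The point it illustrates: a GT'd request gets no special branch and enters the same pending queue as any other request, so only one code path ever writes to the standby map.

```typescript
// Hypothetical types and names for illustration only; not the project's real code.
interface JoinRequest {
  publicKey: string;
  goldenTicket: boolean; // assumed flag marking a GT'd node
}

// Assumed stand-ins for the pending join request queue and the standby map.
const newJoinRequests: JoinRequest[] = [];
const standbyNodes = new Map<string, JoinRequest>();

// Every request, GT'd or not, is queued the same way.
// There is deliberately no "if (req.goldenTicket) standbyNodes.set(...)" shortcut.
function addJoinRequest(req: JoinRequest): boolean {
  const pending = newJoinRequests.some((r) => r.publicKey === req.publicKey);
  if (pending || standbyNodes.has(req.publicKey)) return false; // duplicate request
  newJoinRequests.push(req);
  return true;
}

// Assumed once-per-cycle step: the only place that writes to the standby map.
function processPendingJoinRequests(): string[] {
  const added: string[] = [];
  for (const req of newJoinRequests) {
    standbyNodes.set(req.publicKey, req);
    added.push(req.publicKey);
  }
  newJoinRequests.length = 0; // clear the queue for the next cycle
  return added;
}
```

With a single write path, the standby map and whatever bookkeeping accompanies a processed join request cannot drift apart just because a node was GT'd.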

linear bot commented Apr 23, 2024

RED-65 Secondary issue: problems removing standby nodes mentioned in logs

--failed to remove standby node 007e6753138444f599ebdf526ab35dce9a079c659a3ae20bd222c684f826ca60 count: 0

--failed to remove standby node c3b9619b91dd60713aa617959bb8595ab3881ad22b798af564b4caf2a7547a94 count: 0

--failed to remove standby node cc0d219e4f7ed2a74749713a0da20310ad9f8e22b54ab07f93e9906bcceb9d88 count: 0

--failed to remove standby node 83fdafdb2ee9f273580843532dfb66ce5d1e7e025e7e57bd32ef3928d3a420f0 count: 0

--failed to remove standby node 94ec5218c6bdad8629b7e7c9f2240c74b09e2a834975387ecf72c754526d4699 count: 0

RED-64 nodes fail to fetch cycles in betanext

ISSUE SUMMARY:

When trying to GT more nodes in betanext, most of them seem to crash back out. Very shortly after a node goes active, it fails to sync cycle records. It retries for a few minutes, then tries (and possibly fails) to apop itself, but it does restart and "stay dead" to avoid joining back in as a zombie.

Logs are available for a node with this issue and will be attached.

In the observed results, many nodes were GT'd. This was outside of our design specs, but later, when we GT'd nodes at a slower pace, they were still bombing out. I think the issue is worth sorting out.

The ID of the node that we have detailed logs of is: 16bee48798015fcef2575e260a8db7440fe28f6fceae991835a47aad4e98931d

Take a look at p2p logs from [2024-04-19T21:09:03.515] selected/syncing to [2024-04-19T21:10:24.190]

Some notable logs that repeat:

[2024-04-19T21:09:48.000] [INFO] p2p - CycleCreator: Q4: start: C220 Q4

[2024-04-19T21:09:48.002] [INFO] p2p - CycleCreator: Q4: END: myC:220  C220 Q4 Certified cycle record: 220

[2024-04-19T21:10:02.998] [WARN] p2p - CycleCreator: cc: !prevRecord. Fetech now. cct3

[2024-04-19T21:10:03.194] [INFO] p2p - Sync: syncNewCycles: myNewest=219 netNewest=220

[2024-04-19T21:10:03.485] [WARN] p2p - Sync: Type validation failed for cycleRecord: safetyMode is required

[2024-04-19T21:10:03.488] [ERROR] p2p - Sync: syncNewCycles: next record does not fit with prev record.

[2024-04-19T21:10:03.488] [WARN] p2p - Sync: syncNewCycles: no progress in the last 5 attempts

[2024-04-19T21:10:03.488] [WARN] p2p - CycleCreator: CycleCreator: fetchLatestRecord: synced record not newer CycleChain.newest.counter: 219 oldCounter: 219

[2024-04-19T21:10:03.488] [WARN] p2p - CycleCreator: cc: cycleCreator: Could not get fetch prevRecord. Trying again in 1 sec...  cct3

[2024-04-19T21:10:04.778] [INFO] p2p - Sync: syncNewCycles: myNewest=219 netNewest=220

[2024-04-19T21:10:05.068] [WARN] p2p - Sync: Type validation failed for cycleRecord: safetyMode is required

[2024-04-19T21:10:05.070] [ERROR] p2p - Sync: syncNewCycles: next record does not fit with prev record.
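
The repeating pattern above (`myNewest=219 netNewest=220` on every attempt) can be picked out of a p2p log mechanically. The following is a rough sketch for triage only; `parseSyncAttempts` and `longestStall` are hypothetical helper names, and the regular expression assumes the log line format shown in this issue.

```typescript
// Hypothetical triage helpers; the line format is taken from the logs in this issue.
interface SyncAttempt {
  myNewest: number;
  netNewest: number;
}

const SYNC_LINE = /syncNewCycles: myNewest=(\d+) netNewest=(\d+)/;

// Extract every "myNewest=X netNewest=Y" pair, in log order.
function parseSyncAttempts(log: string): SyncAttempt[] {
  const attempts: SyncAttempt[] = [];
  for (const line of log.split("\n")) {
    const m = SYNC_LINE.exec(line);
    if (m) attempts.push({ myNewest: Number(m[1]), netNewest: Number(m[2]) });
  }
  return attempts;
}

// Longest run of consecutive attempts where the node is behind the network
// and its own newest cycle counter has not moved.
function longestStall(attempts: SyncAttempt[]): number {
  let best = 0;
  let run = 0;
  let prev: number | undefined;
  for (const a of attempts) {
    const behind = a.myNewest < a.netNewest;
    run = behind ? (a.myNewest === prev ? run + 1 : 1) : 0;
    prev = a.myNewest;
    best = Math.max(best, run);
  }
  return best;
}
```

A stall length that keeps growing while `netNewest` stays ahead matches the "no progress in the last 5 attempts" warning seen in the excerpt.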

Some initial random theories:

<<TODO: Replace this with a short summary of the issue.>>


ISSUE REPRO STEPS:

  1. Create local network with 10 nodes
  2. When in processing mode create 20 more nodes
  3. GT all the joining nodes at once: npx hardhat put_admin_certificate --all-joining --golden-ticket
  4. Observe [ERROR] p2p - Sync: syncNewCycles: next record does not fit with prev record. in p2p logs
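
Step 4 can be checked without reading the log by eye. This is a small sketch under the assumption that the p2p log has already been read into a string; `countReproSignature` is a hypothetical helper name, and the signature text is copied from the error line quoted in this issue.

```typescript
// Signature copied from the error line quoted in the repro steps.
const REPRO_SIGNATURE = "syncNewCycles: next record does not fit with prev record.";

// Count [ERROR] lines that carry the repro signature in the given log text.
function countReproSignature(log: string): number {
  return log
    .split("\n")
    .filter((line) => line.includes("[ERROR]") && line.includes(REPRO_SIGNATURE))
    .length;
}
```

A count above zero on a freshly GT'd node suggests the issue reproduced; a count of zero alone does not prove the node synced correctly.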

EXPECTED RESULT:

<<TODO: Replace this with your expected results.>>


MERGE REQUESTS:

<<TODO: Enter Repository Name>>

Pull Request Link: <<TODO: Insert PR-LINK>>

GPT Review Link: <<TODO: Insert GPT-Review-Link>>

Jenkins Test Link: <<TODO: Insert Jenkins Test Job Link>>


ADDITIONAL INSTRUCTIONS:

<<TODO: Insert additional instructions for assignee.>>

betanext_rc17_t2_debug_34.145.126.178_9001.zip

betanext_rc17_cycles-cycles-176-326.txt

@afostr afostr merged commit 7f0340b into dev Apr 23, 2024
2 checks passed
@mhanson-github mhanson-github deleted the RED-64-RED-65 branch August 17, 2024 03:53