-
Notifications
You must be signed in to change notification settings - Fork 716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TierTwo] Fix and improve several synchronization problems #2659
[TierTwo] Fix and improve several synchronization problems #2659
Conversation
no functional change.
Added another commit, co-authored with zebra. -- Added issue to PR description as well -- |
c31bba9
to
84678a8
Compare
Final update: fixed an scenario which was causing the node to reject and not connect budget system votes, removed extra tier two sync msgs roundtrips (unneeded network flood), expanded test coverage and more:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great set of changes 👍 Code ACK with a couple of suggestions.
Will run some test.
84678a8
to
2fcea7f
Compare
After further convos with zebra about the orphan maps cleanup functionality, added the following changes:
Ready to go now. |
da5348b
to
c2b8426
Compare
updated per feedback, seen maps cleaned during the orphan votes cleanup process as well. |
ecc3e4b
to
5894675
Compare
Last update:
Let's give it a good testnet testing round now :). |
94e5228
to
e8c6fec
Compare
if we received for example a single proposal +50 seconds ago, then once the budget sync starts (right after this call), it will look like the sync is finished. Will not wait to receive any budget data and declare the sync over.
MNW initial sync is mostly for historical reasons, get +20k items per peer (+120k items total for the previous threshold of 6) is a high amount of duplicated data relayed over the wire. 89k is still very high but.. it's something that we have to live with until v6.0.
…s flow. As we are increase the `RequestedMNAttempt` right after calling `SwitchToNextAsset()`, the mn list sync starts with an invalid extra sync attempt.
Before, this function was being totally skipped for the tiertwo_networksync dispatcher function. The dispatcher internally processes `syncstatusaccount` messages and return true, which means that the message was processed, there by, it was never reaching the `masternodeSync.ProcessMessage` call in net_processing.
And clear the orphan maps as well.
68224cc
to
6f79df3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 6f79df3
Having spent the past few days stress testing this on mainnet (which has a much more realistic amount of layer 2 overall data than testnet), there were indeed some inconsistent results prior to the most recent push, some of which have stabilized when doing repeated testing. Results so far (each test was done 50 times on linux and 50 times on macOS):
^ Some test runs without any on-disk layer 2 cache data resulted in a partial budget vote count, others did not. the ones that "failed" had a noticeably extended period of time spent on the MNW portion of the sync. This test does not yet have a large number of test runs yet due to the extended amount of intentional downtime required
Mostly on macOS, encountered numerous times where the initial layer 2 sync ended with partial vote counts. An incremental sync afterwards would recoup the missing votes. I also have observed a peer-banning issue right after the node catches up with the chain tip and starts the layer 2 sync that looks to be related to mnping messages, but unrelated to the changes here. |
800ed43 Move MESSAGE_START_SIZE into CMessageHeader (furszy) 3296b9e net_processing:PushTierTwoMsg remove unneeded chain height (furszy) 57817e5 net: do not log "notfound" message as unknown. (furszy) Pull request description: Grouped few findings that discovered testing #2659. * The "not found" message is not being processed properly. Logging an unnecessary "unknown message type" error. It become even more evident syncing a peer running 2659 from a normal network peer, because of the non-updated peer blockage from relaying budget items that described there (the `ClearSeen` bug..). * Then some extra refactoring, removing unneeded chain height argument in `PushTierTwoGetDataRequest` and adapted 2c09a52 to our sources as well. ACKs for top commit: random-zebra: utACK 800ed43 Fuzzbawls: ACK 800ed43 Tree-SHA512: 9db984be2baecaf996ca37f27cb41b91d3047e9e987716f15e1d677b78aab67f4e4f9ff16af62fb68ffba41ead7cb88ccd49dc4ab0f88e55e5bdb8380d174957
Are you testing this PR between updated nodes or only against the current network? Some important points that are all correlated if you are connecting your node to the network:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Threw more nodes running this into the connection pool, all good now.
ACK 6f79df3
no functional change. Github-Pull: bitcoin#2659 Rebased-From: fd7d569
if we received for example a single proposal +50 seconds ago, then once the budget sync starts (right after this call), it will look like the sync is finished. Will not wait to receive any budget data and declare the sync over. Github-Pull: bitcoin#2659 Rebased-From: 7349750
MNW initial sync is mostly for historical reasons, get +20k items per peer (+120k items total for the previous threshold of 6) is a high amount of duplicated data relayed over the wire. 89k is still very high but.. it's something that we have to live with until v6.0. Github-Pull: bitcoin#2659 Rebased-From: 7020310
…s flow. As we are increase the `RequestedMNAttempt` right after calling `SwitchToNextAsset()`, the mn list sync starts with an invalid extra sync attempt. Github-Pull: bitcoin#2659 Rebased-From: 2c3b0e8
Before, this function was being totally skipped for the tiertwo_networksync dispatcher function. The dispatcher internally processes `syncstatusaccount` messages and return true, which means that the message was processed, there by, it was never reaching the `masternodeSync.ProcessMessage` call in net_processing. Github-Pull: bitcoin#2659 Rebased-From: ae28000
…ngle budget finalization. Neither block them if it's us who started the budget sync process. Co-authored-by: random-zebra <random.zebra@protonmail.com> Github-Pull: bitcoin#2659 Rebased-From: 5f596b8
…ed to broadcast an inv if the peer is asking for the item directly..) and stop walking through the two maps (budgets and proposals), locking their mutexes, when the peer is requesting a single proposal/budfin. Github-Pull: bitcoin#2659 Rebased-From: 31e40a4
Github-Pull: bitcoin#2659 Rebased-From: 4864739
…al vote reception and single finalization sync based on an orphan finalization vote. Github-Pull: bitcoin#2659 Rebased-From: 32196b1
…a single vote, discarding the previous stored item every time that a new orphan arrives. Blocking any follow-up reception of the orphan votes that were discarded, prohibiting its connection forever. Github-Pull: bitcoin#2659 Rebased-From: 62e9f5e
…(1) single items sync requests and (2) full budget sync requests. Github-Pull: bitcoin#2659 Rebased-From: 62ae606
Github-Pull: bitcoin#2659 Rebased-From: d04f5be
Github-Pull: bitcoin#2659 Rebased-From: 51e85ff
Github-Pull: bitcoin#2659 Rebased-From: fa2ab10
…reset. The seen maps are used to answer inventory requests (to get the item serialized without locking the main budget maps mutexes). If they are cleared prior to the full budget invs relay, the node will not be able to respond to any follow-up getdata.. Plus, decrease full budget data relay to be done once per week instead of once every 14 days. Github-Pull: bitcoin#2659 Rebased-From: 34e9c23
Essentially a node dropping the budget data and waiting until any of the connected peers relays the complete budget data once more (this happens every 28 blocks in regtest). Github-Pull: bitcoin#2659 Rebased-From: d60a105
And clear the orphan maps as well. Github-Pull: bitcoin#2659 Rebased-From: 6f79df3
c9b1909 [net] Remove assert(nMaxInbound > 0) (MarcoFalke) d629df0 [Refactor] Add ReloadMapSeen function to reload "seen" votes (random-zebra) 7427d44 test: add coverage for the incremental budget sync flow. (furszy) 241627c Budget incremental sync: do not clear seen maps when relay status is reset. (furszy) 61477a5 [Refactor] Budget: check mn-enabled/vote-time before adding to mapSeen (random-zebra) 318fc7c Budget: limit orphan votes maps to 10k entries max. (furszy) 55f3ef9 [Refactor] Simplify CBudgetManager::CheckOrphanVotes() Github-Pull: #2659 Rebased-From: d04f5be (random-zebra) e5ad197 Budget: Decouple the all-in-one `Budget::Sync()` into two functions: (1) single items sync requests and (2) full budget sync requests. (furszy) 5fd74e4 tiertwo sync: fix orphans votes maps linking single prop/bud hash to a single vote, discarding the previous stored item every time that a new orphan arrives. Blocking any follow-up reception of the orphan votes that were discarded, prohibiting its connection forever. (furszy) 4cdb587 test: add coverage for single proposal sync based on an orphan proposal vote reception and single finalization sync based on an orphan finalization vote. (furszy) 979bb2c budget: generalize budget full sync items relay. (furszy) 736363d Budget sync: Remove useless extra item sync p2p msgs roundtrip (no need to broadcast an inv if the peer is asking for the item directly..) and stop walking through the two maps (budgets and proposals), locking their mutexes, when the peer is requesting a single proposal/budfin. (furszy) fe6fe56 Tiertwo: do not block peers if they requested a single proposal or single budget finalization. (furszy) ebbef96 masternodesync: connect the not-connected ProcessSyncStatusMsg. (furszy) 281c061 masternodesync: fix extra `RequestedMNAttempt` increment in the sporks flow. (furszy) d95fb9b masternodesync: decrease mnw sync threshold to 4 peers. (furszy) cfc15b6 masternodesync: reset last budget item time before start syncing it. (furszy) 94f1c52 masternodesync: re-format peer version check. (furszy) 946444e Fix ignoring tx data requests when fPauseSend is set on a peer (Matt Corallo) 10cedd0 GUI governance: add missing strings to the translations process. (furszy) dc10e17 GUI governance: fix proposal name and URL size limit validation. (furszy) 594acc6 [GUI] Fix governance nav button hover css (Fuzzbawls) fd8d844 GUI: governance, IBD check before opening the proposal creation wizard. (furszy) 33f5dd8 [GUI] Fix proposal tooltip menu location skewing (Fuzzbawls) 55619cc GUI: Differentiate Budget payment transaction records from MNs block reward records. (furszy) 139596e Refactor: Move MN block reward to chainparams. (furszy) 1a55752 GUI: Fix remaining item naming conflicts (Fuzzbawls) 5925150 GUI: Use sorted layout/spacer names in governance UIs (Fuzzbawls) dfeb3c7 [GUI] Don't show UTXO locking options for shield notes (Fuzzbawls) 6dd96b9 [BUG] GUI: invalid locking of shield notes in coin control Github-Pull: #2661 Rebased-From: d281cde (random-zebra) 1d7f86e Refactor: remove circ depend. primitives/transaction <-> script/standard (random-zebra) cb3a5b4 Refactor: remove circ dependency checkpoints <-> chainparams (random-zebra) e02e70e Refactor: remove circ dependency checkpoints <-> validation (random-zebra) e1a5d0a Break circular dependency: init -> * -> init by extracting shutdown.h (random-zebra) 2d29c42 Cleanup: remove unused init.h includes Github-Pull: #2646 Rebased-From: b09422f (random-zebra) 95c7f2e cleanup: fix header includes sorting and copy Github-Pull: #2642 Rebased-From: 525c2e7 (random-zebra) fdd4045 scripted-diff: replace boost::optional with Optional<> wrapper (random-zebra) Pull request description: List of PRs included: * #2642 * #2646 * #2643 * #2644 * #2661 * #2662 * #2663 * #2671 * #2664 * #2670 * #2665 * #2672 * #2659 * #2677 * #2682 ACKs for top commit: random-zebra: ACK c9b1909 Fuzzbawls: ACK c9b1909 Tree-SHA512: 34871cd7d527302ec6bd3924f4ca4322116952c8b69317831b2652165fbf1c566f893dace9a2998823c6781c4e578c2eeb9bc367fa39f632235aee7f45a68306
A never ending story, more improvements and corrections for the tier two initial synchronization.
This time, the following points are tackled:
Fix a bad workflow that causes the budget data initial synchronization to be skipped.
if we received for example a proposal +50 seconds ago, while we are synchronizing another asset, then once the budget sync starts, it will look like the sync is finished when it's not. Not waiting to receive any budget data, declaring the sync completed.
Decrease
mnw
sync threshold to 4 peers to "minimize" the insane amount of network flood and initial tier two sync delay.MNW initial sync is mostly for historical reasons, getting +20k items per peer (+120k items total for the previous threshold of 6) is a high amount of duplicated data relayed over the wire. 80k is still very high but.. it's something that we have to live with until v6.0 (which removes entirely the need of
mnw
messages..).Fix extra sync attempt for the mnlist asset badly incremented in the sporks sync flow.
Connect the not-connected
ProcessSyncStatusMsg
(exmasternodesync.ProcessMessage()
). Thus why the tier two sync stats counters are always in 0, this function was being skipped.The node, after triggering the internal budget data relay process (relaying budget data to every known peer), is invalidly blocking every remote peer, for an hour, for follow-up budget sync requests.
The orphans votes maps are linking a single prop/bud parent hash to a single vote, discarding the previous stored item every time that a new orphan arrives. Blocking any follow-up reception of the orphan votes that were discarded, prohibiting its connection. (edge case scenario: splitting risk)
Added test coverage for:
Removed unneeded extra items network sync roundtrip: No need to trigger an inv-based sync when the peer sent us a single proposal sync request. And stop walking through the budgets and proposals maps, locking their mutexes, on every single peer unique proposal/bud_fin item request.
Generalized the budget full sync items relay process.
Let's aim to include this work on the coming v5.4.0.