[Merged by Bors] - sync2: multipeer: fix edge cases #6447
Conversation
This adds the possibility to take a connection from the pool, use it via the Executor interface, and return it later when it's no longer needed. This avoids connection pool overhead in cases where a lot of queries need to be made but read transactions are not needed. Using read transactions instead of plain connections has the side effect of blocking WAL checkpoints.
Using a single connection for the many SQL queries executed during sync avoids noticeable overhead from SQLite connection pool delays. This change also fixes memory overuse in DBSet: when initializing DBSet from a database table, there's no need to use an FPTree with a big preallocated pool for the new entries that are added during recent sync.
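To illustrate the take-a-connection-and-return-it pattern in general terms (this is not the project's actual Executor API — the sketch below uses Go's standard database/sql package and assumes the modernc.org/sqlite driver purely as stand-ins):

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "modernc.org/sqlite" // assumed SQLite driver choice, for illustration only
)

func main() {
	db, err := sql.Open("sqlite", "state.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx := context.Background()

	// Take a single connection out of the pool once...
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// ...and return it to the pool when the batch of queries is done.
	defer conn.Close()

	// Every query below reuses the same connection, avoiding per-query pool
	// checkout overhead, and no read transaction is held open that would
	// block WAL checkpoints.
	for i := 0; i < 1000; i++ {
		var n int
		if err := conn.QueryRowContext(ctx,
			"SELECT count(*) FROM sqlite_master").Scan(&n); err != nil {
			log.Fatal(err)
		}
	}
}
```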
Split sync could become blocked when there were slow peers. Their subranges are assigned to other peers, and there were bugs causing indefinite blocking and panics in these cases. Moreover, after other peers manage to sync the slow peers' subranges ahead of them, we need to interrupt syncing against the slow peers, as it's no longer needed. In multipeer sync, when every peer has failed to sync, e.g. due to a temporary connection interruption, we don't need to wait for the full sync interval; a shorter wait time between retries is enough.
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.

Coverage Diff
              develop   #6447    +/-
  Coverage      79.8%   79.9%
  Files           353     353
  Lines         46540   46602    +62
+ Hits          37161   37248    +87
+ Misses         7268    7244    -24
+ Partials       2111    2110     -1

☔ View full report in Codecov by Sentry.
interval := time.Duration(
	float64(mpr.cfg.SyncInterval) *
		(1 + mpr.cfg.SyncIntervalSpread*(rand.Float64()*2-1)))
I'm a bit confused by this interval calculation. Would it be simpler if SyncIntervalSpread were defined as a time.Duration and gave the maximum deviation from the interval?

interval := mpr.cfg.SyncInterval + rand.N(mpr.cfg.SyncIntervalSpread)

This would uniformly generate a duration in [SyncInterval, SyncInterval+SyncIntervalSpread), while the current definition gives [SyncInterval, SyncInterval+2*SyncIntervalSpread), which is a bit odd to me?
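For context, a self-contained sketch of what this proposal would look like; the local variables here merely stand in for the mpr.cfg fields and are assumptions for illustration:

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"time"
)

func main() {
	// Assumed example values standing in for the config fields.
	syncInterval := 5 * time.Minute
	syncIntervalSpread := time.Minute // maximum deviation, defined as a duration

	// rand.N (math/rand/v2) returns a uniform value in [0, n), so the result
	// lies in [syncInterval, syncInterval+syncIntervalSpread).
	interval := syncInterval + rand.N(syncIntervalSpread)
	fmt.Println(interval)
}
```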
The idea was for SyncIntervalSpread to be a floating-point number in 0..1, giving intervals in [SyncInterval - SyncInterval*SyncIntervalSpread, SyncInterval + SyncInterval*SyncIntervalSpread]. We could of course also use MinSyncInterval and MaxSyncInterval, but I'm not sure which is more convenient. My idea was that if I e.g. want the actual sync interval to be uniformly spread across SyncInterval +/- 25%, I just set SyncIntervalSpread to 0.25.
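A minimal sketch of the fractional-spread semantics described above (the helper name is hypothetical, not taken from the actual code):

```go
package main

import (
	"fmt"
	"math/rand/v2"
	"time"
)

// randomizedInterval is a hypothetical helper illustrating the semantics above:
// spread is a fraction in [0, 1], and the result is uniformly distributed in
// [interval*(1-spread), interval*(1+spread)).
func randomizedInterval(interval time.Duration, spread float64) time.Duration {
	return time.Duration(float64(interval) * (1 + spread*(rand.Float64()*2-1)))
}

func main() {
	// With spread 0.25, the actual interval is SyncInterval +/- 25%.
	fmt.Println(randomizedInterval(5*time.Minute, 0.25))
}
```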
Maybe I should reflect this simpler explanation in the comments, incl. godoc comments for the config struct
I would prefer the Min- and Max- config options, but if you think a fractional spread is easier, then go for that. But please add some explanation - to the config and/or here - of what the values mean 🙂
Will submit another PR for config validation

bors merge
Build failed:

Unrelated failure in

bors merge
Build failed:

This is unrelated to db changes in this PR

bors merge
Pull request successfully merged into develop. Build succeeded:
Motivation
Split sync could become blocked when there were slow peers. Their
subranges are assigned to other peers, and there were bugs causing
indefinite blocking and panics in these cases. Moreover, after other
peers manage to sync the slow peers' subranges ahead of them, we need
to interrupt syncing against the slow peers, as it's no longer needed.
In multipeer sync, when every peer has failed to sync, e.g. due to a
temporary connection interruption, we don't need to wait for the full
sync interval; a shorter wait time between retries is enough.
Description
This fixes the aforementioned multipeer sync issues and adds tests.
It also adds sync interval randomization to avoid network load spikes.