Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix task handling bugs, so peers are more likely to be available #3191

Merged
merged 6 commits into from
Dec 19, 2021

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Dec 9, 2021

Motivation

Some of our crawler and connection rates can lead to weird interactions between crawlers, such as flapping or task synchronisation.

This is unexpected work in sprint 24, but it's related to the upcoming fanout and notfound work.

Solution

Timer changes:

  • Use prime numbers for crawler and heartbeat intervals, so they are almost always happening at different times
  • Make sure each peer gets at least one connection attempt in each reconnection interval, even if the address book is full
  • Add tests for relative timings
  • Move some tests into their own modules

Task order changes:

Review

@jvff or @dconnolly can review this PR.

This is urgent because syncs are very slow, and #3191 or #3244 should help fix that.

This PR is based on PR #3167, because they change the same import lines.

Reviewer Checklist

  • Code implements Specs and Designs
  • Tests for Expected Behaviour
  • Tests for Errors

Follow Up Work

@teor2345 teor2345 added C-bug Category: This is a bug P-Medium I-slow Problems with performance or responsiveness I-integration-fail Continuous integration fails, including build and test failures A-network Area: Network protocol updates or fixes labels Dec 9, 2021
@teor2345 teor2345 added this to the 2021 Sprint 24 milestone Dec 9, 2021
@teor2345 teor2345 requested a review from jvff December 9, 2021 23:00
@teor2345 teor2345 self-assigned this Dec 9, 2021
@teor2345 teor2345 enabled auto-merge (squash) December 9, 2021 23:43
@teor2345 teor2345 disabled auto-merge December 9, 2021 23:50
@teor2345 teor2345 marked this pull request as draft December 9, 2021 23:51
@teor2345 teor2345 force-pushed the crawler-timing-tweaks branch from bdca9c4 to 3badaf3 Compare December 9, 2021 23:51
@teor2345 teor2345 changed the base branch from main to drop-far-future-blocks December 9, 2021 23:51
@teor2345
Copy link
Contributor Author

teor2345 commented Dec 9, 2021

I've marked this as draft until PR #3167 merges, because they change the same import lines.

@teor2345
Copy link
Contributor Author

The Google Cloud build failed, I opened:

@teor2345 teor2345 changed the title Tweak network timings so peers are more likely to be available Fix task handling so peers are more likely to be available Dec 16, 2021
@teor2345 teor2345 changed the title Fix task handling so peers are more likely to be available Fix task handling bugs, so peers are more likely to be available Dec 16, 2021
@teor2345 teor2345 force-pushed the drop-far-future-blocks branch from 95a329e to 03e8299 Compare December 16, 2021 04:47
@teor2345 teor2345 force-pushed the crawler-timing-tweaks branch from de6b673 to 10f7341 Compare December 16, 2021 04:47
@teor2345 teor2345 force-pushed the drop-far-future-blocks branch from 03e8299 to 8d2a425 Compare December 17, 2021 00:44
@teor2345 teor2345 force-pushed the crawler-timing-tweaks branch from 10f7341 to 94b6f52 Compare December 17, 2021 00:44
@teor2345 teor2345 added P-High and removed P-Medium labels Dec 17, 2021
@teor2345
Copy link
Contributor Author

I've marked this PR and all its dependencies as a high priority, because this is a known bug that has actually happened before, and seems to still be happening.

I'm not sure if this PR fixes the bug, but it's an easy first step.

@mpguerra
Copy link
Contributor

I've marked this PR and all its dependencies as a high priority, because this is a known bug that has actually happened before, and seems to still be happening.

I'm not sure if this PR fixes the bug, but it's an easy first step.

@teor2345 should this still be in draft and are you looking for an additional review at this time? I see @jvff already approved a previous version of this PR a while ago.

conradoplg
conradoplg previously approved these changes Dec 17, 2021
Copy link
Collaborator

@conradoplg conradoplg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is just waiting for #3167 to be merged and is a draft just to prevent it from merging before that one.

It looks good - I don't have much experience with this part of the code but was approved before, so I'm approving it now and leave it unmerged for a bit if anyone else wants to take a look.

jvff
jvff previously approved these changes Dec 17, 2021
Base automatically changed from drop-far-future-blocks to main December 17, 2021 16:31
@conradoplg conradoplg dismissed stale reviews from jvff and themself via 4f13056 December 17, 2021 16:33
@conradoplg conradoplg force-pushed the crawler-timing-tweaks branch from 94b6f52 to 4f13056 Compare December 17, 2021 16:33
@conradoplg conradoplg marked this pull request as ready for review December 17, 2021 16:33
@conradoplg
Copy link
Collaborator

I rebased after merging #3167, @jvff could you please check and reapprove?

@teor2345
Copy link
Contributor Author

teor2345 commented Dec 19, 2021

I believe this is just waiting for #3167 to be merged and is a draft just to prevent it from merging before that one.

Yep, that's the only way to stop a GitHub PR that is based on another PR from merging into the wrong branch.

@mpguerra here's the process we agreed on a few months ago:

If there's something else I need to do, let's talk about it at the Engineering sync. And then let's document it in the onboarding doc.

@teor2345 teor2345 merged commit 6cbd7dc into main Dec 19, 2021
@teor2345 teor2345 deleted the crawler-timing-tweaks branch December 19, 2021 23:02
@teor2345
Copy link
Contributor Author

This got approved then rebased, and it's urgent, so I just admin-merged it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-network Area: Network protocol updates or fixes C-bug Category: This is a bug I-integration-fail Continuous integration fails, including build and test failures I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants