Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry to GrpcScheduler #324

Merged

Conversation

chrisstaite-menlo
Copy link
Collaborator

@chrisstaite-menlo chrisstaite-menlo commented Oct 19, 2023

Description

Add support for retrying on the GrpcScheduler following the same model as the GrpcStore.

Fixes #322

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Run the GrpcScheduler with Reclient. It is no longer failing with concurrency issues, but it does appear that there are other issues still (see #323)

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@aaronmondal , could you review this one too?

Reviewed 1 of 4 files at r1, all commit messages.
Reviewable status: 1 of 4 files reviewed, all discussions resolved (waiting on @aaronmondal)

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @chrisstaite-menlo)


cas/scheduler/grpc_scheduler.rs line 52 at r1 (raw file):

impl GrpcScheduler {
    pub fn new(config: &config::schedulers::GrpcScheduler) -> Result<Self, Error> {
        let jitter_amt = config.retry.jitter;

WDYT about defining the jitter_fn in new and remove the indirection via a separate new_with_jitter? It would reduce configurability a bit, but would save a public API function.


cas/scheduler/grpc_scheduler.rs line 97 at r1 (raw file):

        self.retrier
            .retry(
                retry_config,

nit: We should really find a better way to handle the retry stuff in general. This is so hard to read. Tracking this in #335

Copy link
Collaborator Author

@chrisstaite-menlo chrisstaite-menlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @aaronmondal)


cas/scheduler/grpc_scheduler.rs line 52 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

WDYT about defining the jitter_fn in new and remove the indirection via a separate new_with_jitter? It would reduce configurability a bit, but would save a public API function.

The purpose of injecting the jitter function is to ensure that this is reliably testable. You don't really want randomness to be injected into tests by this function. This follows the design from all of the other uses of Retry.


cas/scheduler/grpc_scheduler.rs line 97 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: We should really find a better way to handle the retry stuff in general. This is so hard to read. Tracking this in #335

Yeah, it's not great... but it does allow you to place the functional block in a central point. Will be nice to see what better solution you come up with.

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @chrisstaite-menlo)

@aaronmondal aaronmondal merged commit 21519ce into TraceMachina:main Oct 27, 2023
11 of 13 checks passed
blakehatch pushed a commit to blakehatch/nativelink that referenced this pull request Nov 14, 2023
blakehatch pushed a commit to blakehatch/nativelink that referenced this pull request Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GrpcScheduler retry
3 participants