Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(scheduler): implement and register block builder rpc service #15248

Merged
merged 4 commits into from
Dec 9, 2024

Conversation

ashwanthgoli
Copy link
Contributor

@ashwanthgoli ashwanthgoli commented Dec 4, 2024

What this PR does / why we need it:

  • Block scheduler is updated to implement BlockBuilderServiceServer rpc interface and registered with grpc server.
  • Builders now poll for Jobs from the scheduler
  • GetJob is now a server-side streaming rpc which allows builders to wait on the server for pending jobs.
  • JobQueue uses a cond var to wake up pending routines if there are any new pending jobs

minor improvements:

  • uses int32 for partition field in types.Job to maintain consistency
  • updates loadRecords to return any backoff errors

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

pkg/blockbuilder/scheduler/queue.go Outdated Show resolved Hide resolved
panic("unimplemented")
func (s *BlockScheduler) CompleteJob(_ context.Context, req *proto.CompleteJobRequest) (*proto.CompleteJobResponse, error) {
s.queue.MarkComplete(req.Job.Id)
return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return non-nil response types to satisfy caller expectations even though they're basically empty structs.

},
})

return nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return non-nil response types to satisfy caller expectations even though they're basically empty structs.

@@ -70,7 +70,12 @@ func (t *GRPCTransport) SendGetJobRequest(ctx context.Context, req *GetJobReques
BuilderId: req.BuilderID,
}

resp, err := t.GetJob(ctx, protoReq)
client, err := t.GetJob(ctx, protoReq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think creating streaming clients for every RPC is any better than using the non-streaming RPCs we had prior. These are generally run in a loop

@owen-d
Copy link
Member

owen-d commented Dec 5, 2024

Some examples of how we generally use streaming methods. e.g. scheduler.proto shows a few bidirectional streaming:

service SchedulerForFrontend {
  // After calling this method, both Frontend and Scheduler enter a loop. Frontend will keep sending ENQUEUE and
  // CANCEL requests, and scheduler is expected to process them. Scheduler returns one response for each request.
  //
  // Long-running loop is used to detect broken connection between frontend and scheduler. This is important for both
  // parties... if connection breaks, frontend can cancel (and possibly retry on different scheduler) all pending
  // requests sent to this scheduler, while scheduler can cancel queued requests from given frontend.
  rpc FrontendLoop(stream FrontendToScheduler) returns (stream SchedulerToFrontend) {}
}

// Scheduler interface exposed to Queriers.
service SchedulerForQuerier {
  // After calling this method, both Querier and Scheduler enter a loop, in which querier waits for
  // "SchedulerToQuerier" messages containing HTTP requests and processes them. After processing the request,
  // querier signals that it is ready to accept another one by sending empty QuerierToScheduler message.
  //
  // Long-running loop is used to detect broken connection between scheduler and querier. This is important
  // for scheduler to keep a list of connected queriers up-to-date.
  rpc QuerierLoop(stream QuerierToScheduler) returns (stream SchedulerToQuerier) {}

  // The querier notifies the query-scheduler that it started a graceful shutdown.
  rpc NotifyQuerierShutdown(NotifyQuerierShutdownRequest) returns (NotifyQuerierShutdownResponse);
}

https://grpc.io/docs/languages/go/basics/#defining-the-service is also a good resource

It might be better to leave it as unary for now and dedicate a single PR to bidirectional streaming GRPC later

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 5, 2024
@ashwanthgoli ashwanthgoli marked this pull request as ready for review December 5, 2024 07:26
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner December 5, 2024 07:26
@owen-d owen-d merged commit c519ab6 into main Dec 9, 2024
59 checks passed
@owen-d owen-d deleted the scheduler-rpc branch December 9, 2024 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants