Make number of scheduler workers reloadable #11593

Merged Jan 6, 2022 (32 commits). The file changes below show the diff from 2 of these commits.

Commits:
1908187 Working POC (angrycub, Nov 20, 2021)
0071e55 Unexport setupNewWorkers; improve comments (angrycub, Nov 23, 2021)
763671a Added some VSCode codetours (angrycub, Nov 23, 2021)
339316a Update shutdown to use context (angrycub, Nov 30, 2021)
1a985b3 Apply suggestions from code review (angrycub, Dec 1, 2021)
22f93b7 Implement GET for SchedulerWorker API + tests (angrycub, Dec 1, 2021)
16f9dd4 Merge branch 'f-reload-num-schedulers' of github.com:hashicorp/nomad … (angrycub, Dec 1, 2021)
48428e7 Wired API, refactors, more testing (angrycub, Dec 3, 2021)
1258128 Merge branch 'main' into f-reload-num-schedulers (angrycub, Dec 6, 2021)
1845577 Fix linter complaints (angrycub, Dec 6, 2021)
9c4e5c4 Updating worker to cache EnabledScheduler list (angrycub, Dec 6, 2021)
0d8b7ec Refactor `unsafe...` func names to `...Locked` (angrycub, Dec 8, 2021)
f5bb227 Passing enabled schedulers list to worker (angrycub, Dec 10, 2021)
292518b Add note about scheduler death (angrycub, Dec 10, 2021)
1337f04 Worker API refactor (angrycub, Dec 10, 2021)
bd345e0 Made handler methods public for OpenAPI, remove unused test bool (angrycub, Dec 10, 2021)
31687cd Implement SchedulerWorker status part 1 (angrycub, Dec 10, 2021)
3739987 Fix broken Pause logic; split WorkloadWaiting status (angrycub, Dec 11, 2021)
7fe5949 Added scheduler info api (angrycub, Dec 11, 2021)
60d53fa Added worker info api to api package (angrycub, Dec 11, 2021)
3d755aa bugfixes (angrycub, Dec 11, 2021)
4ee6b8c Adding stringer to build deps (angrycub, Dec 13, 2021)
71dab36 Changing route to /v1/agent/schedulers (angrycub, Dec 20, 2021)
1dc9f96 Adding docs for scheduler worker api (angrycub, Dec 21, 2021)
0417332 Adding API test for bad worker info (angrycub, Dec 22, 2021)
420a158 Add changelog message (angrycub, Dec 23, 2021)
fd016de typo in changelog 🤦 (angrycub, Dec 23, 2021)
167c6a3 Incorporate API code review feedback (angrycub, Jan 3, 2022)
f4f610b Incorporate api-docs feedback (angrycub, Jan 4, 2022)
689fa77 Updates to worker/leader code from code review (angrycub, Jan 4, 2022)
982c397 Fix test response type (angrycub, Jan 5, 2022)
7581957 Set both statuses in markStopped so they are atomic (angrycub, Jan 6, 2022)

command/agent/agent_endpoint.go (8 changes: 0 additions, 8 deletions)
@@ -826,11 +826,3 @@ type agentSchedulerWorkerConfig struct {
 	NumSchedulers     int      `json:"num_schedulers"`
 	EnabledSchedulers []string `json:"enabled_schedulers"`
 }
-type agentSchedulerWorkerConfigRequest struct {
-	agentSchedulerWorkerConfig
-}
-
-type agentSchedulerWorkerConfigResponse struct {
-	NumSchedulers     int      `json:"num_schedulers"`
-	EnabledSchedulers []string `json:"enabled_schedulers"`
-}
nomad/server.go (6 changes: 2 additions, 4 deletions)
@@ -226,10 +226,8 @@ type Server struct {
 	vault VaultClient
 
 	// Worker used for processing
-	workers                  []*Worker
-	currentNumSchedulers     int
-	currentEnabledSchedulers []string
-	workerLock               sync.RWMutex
+	workers    []*Worker
+	workerLock sync.RWMutex
 
 	// aclCache is used to maintain the parsed ACL objects
 	aclCache *lru.TwoQueueCache
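This diff removes currentNumSchedulers and currentEnabledSchedulers from Server, leaving only workers guarded by workerLock. A minimal sketch of how a reload could rebuild the worker slice under that RWMutex, assuming a stop-everything-then-recreate policy. The pool, worker, reload, and count names are hypothetical and not Nomad's API:

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for Nomad's Worker, reduced to the
// cached scheduler list and a stopped flag.
type worker struct {
	enabledSchedulers []string
	stopped           bool
}

// stop marks the worker as finished; a real worker would also end its
// dequeue loop.
func (w *worker) stop() { w.stopped = true }

// pool is a hypothetical stand-in for the workers/workerLock fields that
// remain on Server after this diff.
type pool struct {
	workerLock sync.RWMutex
	workers    []*worker
}

// reload stops every current worker and starts num new ones that cache the
// given scheduler list. The write lock is held for the whole swap, so a
// reader holding the read lock never sees a partially rebuilt slice.
func (p *pool) reload(num int, enabled []string) {
	p.workerLock.Lock()
	defer p.workerLock.Unlock()

	for _, w := range p.workers {
		w.stop()
	}

	next := make([]*worker, 0, num)
	for i := 0; i < num; i++ {
		next = append(next, &worker{
			// Each worker gets its own copy of the list.
			enabledSchedulers: append([]string(nil), enabled...),
		})
	}
	p.workers = next
}

// count reports the current pool size under the read lock.
func (p *pool) count() int {
	p.workerLock.RLock()
	defer p.workerLock.RUnlock()
	return len(p.workers)
}

func main() {
	p := &pool{}
	p.reload(4, []string{"service", "batch"})
	old := p.workers
	p.reload(2, []string{"service"})
	fmt.Println(p.count(), old[0].stopped) // 2 true
}
```

Replacing every worker is the simplest policy to reason about; a variant could start or stop only the difference in count when the scheduler list is unchanged. The diff shown here does not say which approach the PR takes.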
nomad/worker.go (13 changes: 9 additions, 4 deletions)
@@ -60,6 +60,10 @@ type Worker struct {
 	paused bool
 	stopped bool // Indicates that the worker is in a terminal state; read with Stopped()
 
+	// the Server.Config.EnabledSchedulers value is not safe for concurrent access, so
+	// the worker needs a cached copy of it. Workers are stopped if this value changes.
+	enabledSchedulers []string
+
 	ctx      context.Context
 	cancelFn context.CancelFunc
 
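The new comment says Server.Config.EnabledSchedulers is not safe for concurrent access, so each worker keeps a cached copy. A minimal sketch of why a copy taken under a lock is safer than sharing the slice, assuming a hypothetical serverConfig guarded by a sync.RWMutex (how Nomad's GetSchedulerWorkerConfig builds its result is not reproduced here):

```go
package main

import (
	"fmt"
	"sync"
)

// serverConfig is a hypothetical stand-in for the part of the server
// configuration that a reload may modify; mu guards enabledSchedulers.
type serverConfig struct {
	mu                sync.RWMutex
	enabledSchedulers []string
}

// snapshot copies the enabled scheduler list while holding the read lock.
// The caller owns the returned slice and can keep reading it without any
// lock, even if the config is edited later.
func (c *serverConfig) snapshot() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return append([]string(nil), c.enabledSchedulers...)
}

// worker mirrors the enabledSchedulers field added in the diff.
type worker struct {
	enabledSchedulers []string
}

func main() {
	cfg := &serverConfig{enabledSchedulers: []string{"service", "batch", "system"}}

	cached := &worker{enabledSchedulers: cfg.snapshot()}

	cfg.mu.RLock()
	aliased := &worker{enabledSchedulers: cfg.enabledSchedulers} // shares the backing array
	cfg.mu.RUnlock()

	// An in-place edit of the server's list, made under the write lock.
	cfg.mu.Lock()
	cfg.enabledSchedulers[2] = "sysbatch"
	cfg.mu.Unlock()

	// In a concurrent program, reading aliased without the lock would race
	// with the write above; here everything runs on one goroutine.
	fmt.Println(cached.enabledSchedulers)  // [service batch system]
	fmt.Println(aliased.enabledSchedulers) // [service batch sysbatch]
}
```

The read lock keeps the copy from racing with a reload that reassigns the field, and the copy keeps later in-place edits from reaching a running worker.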
@@ -79,9 +83,10 @@ type Worker struct {
 // NewWorker starts a new worker associated with the given server
 func NewWorker(ctx context.Context, srv *Server) (*Worker, error) {
 	w := &Worker{
-		srv:   srv,
-		start: time.Now(),
-		id:    uuid.Generate(),
+		srv:               srv,
+		start:             time.Now(),
+		id:                uuid.Generate(),
+		enabledSchedulers: srv.GetSchedulerWorkerConfig().EnabledSchedulers,
 	}
 	w.logger = srv.logger.ResetNamed("worker").With("worker_id", w.id)
 	w.pauseCond = sync.NewCond(&w.pauseLock)
@@ -206,7 +211,7 @@ func (w *Worker) dequeueEvaluation(timeout time.Duration) (
 	eval *structs.Evaluation, token string, waitIndex uint64, shutdown bool) {
 	// Setup the request
 	req := structs.EvalDequeueRequest{
-		Schedulers:       w.srv.config.EnabledSchedulers,
+		Schedulers:       w.enabledSchedulers,
 		Timeout:          timeout,
 		SchedulerVersion: scheduler.SchedulerVersion,
 		WriteRequest: structs.WriteRequest{