Backport of Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true into release/1.5.x #16661

Closed
130 commits
c5d54ab
Prepare release 1.5.0-rc.1
schmichael Feb 27, 2023
0deb5c4
Generate files for 1.5.0-rc.1 release
hc-github-team-nomad-core Feb 27, 2023
b739144
Prepare for next release
hc-github-team-nomad-core Feb 27, 2023
cf5b14c
Merge pull request #16284 from hashicorp/post-1.5.0-rc.1-release
jrasell Mar 2, 2023
ba6d20b
prepare release 1.5.0
jrasell Mar 1, 2023
01d049e
Generate files for 1.5.0 release
hc-github-team-nomad-core Mar 1, 2023
4734c02
Prepare for next release
hc-github-team-nomad-core Mar 1, 2023
646a82b
Merge release 1.5.0 files
jrasell Mar 2, 2023
a9bb8e9
client: use RPC address and not serf after initial Consul discovery (…
tgross Mar 2, 2023
fbd0dcb
tests: add functionality to skip a test if it's not running in CI and…
farbodahm Mar 2, 2023
9102a24
deps: update go-plugin to 1.4.9 (#16292)
tgross Mar 2, 2023
bbd41c8
handle `FSM.Apply` errors in `raftApply` (#16287)
tgross Mar 2, 2023
f88e3b0
[ui, helios] Toast Component (#16099)
philrenaud Mar 2, 2023
f553dc8
Merge pull request #16293 from hashicorp/post-1.5.0-release
schmichael Mar 2, 2023
273b76a
cli: sort Node value in `nomad operator raft list-peers` command (#16…
dttung2905 Mar 2, 2023
f89910d
Add namespace argument to the job verification help text (#16243)
Valefant Mar 2, 2023
64d27c6
docs: fix typos in task-api.mdx and workload-identity.mdx (#16309)
aofei Mar 3, 2023
2ec6575
api: add new test case for force-leave (#16260)
dttung2905 Mar 3, 2023
a4f7926
service: fix regression in task access to list/read endpoint (#16316)
tgross Mar 3, 2023
0e824d3
cli: use shared logic for resolving job prefix (#16306)
lgfa29 Mar 3, 2023
158d6a9
docs: fix alloc stop `no_shutdown_delay` (#16282)
lgfa29 Mar 3, 2023
ceed255
remove backcompat support for non-atomic job registration (#16305)
tgross Mar 3, 2023
b24dddc
api: set last index and request time on alloc stop (#16319)
lgfa29 Mar 3, 2023
b07af57
client: don't emit task shutdown delay event if not waiting (#16281)
lgfa29 Mar 3, 2023
a57f97e
[ui] Fix: Wildcard-datacenter system/sysbatch jobs stopped showing cl…
philrenaud Mar 6, 2023
78bcd32
deps: update test to 0.6.2 for new functions (#16326)
shoenig Mar 6, 2023
605f155
CI: delete test-link-rewrites.yml (#16354)
ashleemboyer Mar 6, 2023
6f52a91
scheduler: correctly detect inplace update with wildcard datacenters …
tgross Mar 7, 2023
03d6a8c
docs: note that secrets dir is usually mounted `noexec` (#16363)
tgross Mar 7, 2023
003a567
cli: support `json` and `t` on `acl binding-rule info` command. (#16357)
jrasell Mar 7, 2023
b677ec7
docs: add 1.5.0, 1.4.5, and 1.3.10 pause regression upgrade note. (#1…
jrasell Mar 7, 2023
b3f7559
docker: fix bug where network pause containers would be erroneously r…
shoenig Mar 7, 2023
24af468
e2e: fix permissions on nomad data directory (#16376)
shoenig Mar 7, 2023
5d5740b
Outage recovery link fix (#16365)
philrenaud Mar 7, 2023
37e9eca
build(deps): bump golang.org/x/crypto from 0.5.0 to 0.7.0 (#16337)
dependabot[bot] Mar 8, 2023
99f43c1
Update ioutil library references to os and io respectively for comman…
lhaig Mar 8, 2023
962b65f
Update ioutil library references to os and io respectively for e2e he…
lhaig Mar 8, 2023
0e74431
Update ioutil library references to os and io respectively for API an…
lhaig Mar 8, 2023
3160c76
deps: Update ioutil library references to os and io respectively for …
lhaig Mar 8, 2023
48e7d70
deps: Update ioutil deprecated library references to os and io respec…
lhaig Mar 8, 2023
fcd51dc
[ui] Fix: New toast notifications no longer last forever (#16384)
philrenaud Mar 8, 2023
40ab325
e2e: setup nomad permissions correctly (client vs. server) (#16399)
shoenig Mar 8, 2023
95359b8
client: disable running artifact downloader as nobody (#16375)
shoenig Mar 8, 2023
1227615
Updated who-uses-nomad to add Behavox (#16339)
Oloremo Mar 9, 2023
c36d3bd
scheduling: prevent self-collision in dynamic port network offerings …
tgross Mar 9, 2023
0f7ad3b
cli: fix help output format on `job init` command. (#16407)
jrasell Mar 9, 2023
1dd3203
docs: update content-conformance package (#16412)
Mar 9, 2023
4fdb5c4
cli: remove hard requirement on `list-jobs` (#16380)
lgfa29 Mar 9, 2023
730adaa
env/aws: update ec2 cpu info data (#16417)
schmichael Mar 9, 2023
712c669
cli: add `-json` and `-t` flag for `alloc checks` command (#16405)
Juanadelacuesta Mar 10, 2023
419c4bf
allocrunner: fix health check monitoring for Consul services (#16402)
lgfa29 Mar 10, 2023
9fefc18
e2e fixes: cli output, timing issue, and some cleanups (#16418)
schmichael Mar 10, 2023
d0ddd5e
acl: prevent privilege escalation via workload identity
tgross Mar 3, 2023
669495b
Generate files for 1.5.1 release
hc-github-team-nomad-core Mar 10, 2023
6c91cc8
Prepare for next release
hc-github-team-nomad-core Mar 10, 2023
172f49f
Merge release 1.5.1 files
tgross Mar 13, 2023
2a0e45b
Merge pull request #16445 from hashicorp/post-1.5.1-release
tgross Mar 13, 2023
a34925f
deps: remove replace statement for go-discover (#16304)
shoenig Mar 13, 2023
12688f2
scheduler: add simple benchmark for tasksUpdated (#16422)
shoenig Mar 13, 2023
b6d6cc4
scheduler: refactor system util tests (#16416)
tgross Mar 13, 2023
5febe9b
build(deps): bump go.uber.org/goleak from 1.2.0 to 1.2.1 (#16439)
dependabot[bot] Mar 13, 2023
5f37b2f
build: update from go1.20.1 to go1.20.2 (#16427)
schmichael Mar 13, 2023
f3a527b
doc: Update `nomad fmt` doc to run against non-deprecated HCL2 jobspe…
dttung2905 Mar 13, 2023
b2c8732
plugin: add missing fields to `TaskConfig` (#16434)
lgfa29 Mar 13, 2023
f2bfbfa
acl: update job eval requirement to `submit-job` (#16463)
lgfa29 Mar 13, 2023
a42a33f
cgv1: do not disable cpuset manager if reserved interface already exi…
shoenig Mar 13, 2023
c70bbd1
agent: trim space when parsing X-Nomad-Token header (#16469)
tgross Mar 14, 2023
101e5d0
docs: clarify migration behavior under `nomad alloc stop` (#16468)
tgross Mar 14, 2023
eaf22f2
cli: Add `-json` and `-t` flags to `namespace status` command (#16442)
Juanadelacuesta Mar 14, 2023
362f752
Updated trial license link and wording
tunzor Mar 14, 2023
d5e0130
Merge pull request #16484 from hashicorp/tunzor-patch-1
tunzor Mar 14, 2023
1a01e87
scheduler: annotate tasksUpdated with reason and purge DeepEquals (#1…
shoenig Mar 14, 2023
bdf468c
cli: fix login help output formatting. (#16502)
jrasell Mar 15, 2023
e4963b9
test: set BuildDate in default TestAgent config (#16499)
gulducat Mar 15, 2023
323abf7
build: fix `test-nomad` make target when running locally. (#16506)
jrasell Mar 16, 2023
098650e
artifact: use specific version link for zipbomb artifact (#16513)
shoenig Mar 16, 2023
ea727df
artifact: do not set process attributes on darwin (#16511)
shoenig Mar 16, 2023
46ae102
docs: dispatch_payload and jobs api docs had some weirdness (#16514)
schmichael Mar 16, 2023
995ab41
artifact: git needs more files for private repositories (#16508)
shoenig Mar 16, 2023
8684183
client: don't use `Status` RPC for Consul discovery (#16490)
tgross Mar 16, 2023
282e3bc
Enable ACLs on E2E test clients (#16530)
schmichael Mar 16, 2023
57a3cbe
docs: add binding-rule selector escape example on Windows PS (#16273)
jrasell Mar 17, 2023
76649df
acl: fix canonicalization of OIDC auth method mock (#16534)
pkazmierczak Mar 17, 2023
ed498f8
nsd: always set deregister flag after deregistration of group (#16289)
shoenig Mar 17, 2023
b95b105
cli: nomad login command should not require a -type flag and should r…
pkazmierczak Mar 17, 2023
1cfa95e
tls enforcement flaky tests (#16543)
shoenig Mar 17, 2023
cd8615d
Spelling update (#16553)
DocAdam Mar 20, 2023
151147b
cli: Add `json` and `-t` flags to `server members` command (#16444)
Juanadelacuesta Mar 20, 2023
26b4fcc
cli: add `-json` and `-t` flags to `quota status` command (#16485)
Juanadelacuesta Mar 20, 2023
cc110f4
Add `-json` flag to `quota inspect` command (#16478)
Juanadelacuesta Mar 20, 2023
0071844
[ui] Perform common job tasks with keyboard shortcuts (#16378)
philrenaud Mar 20, 2023
96740b5
docs: remove Java and Scala SDKs from supported list. (#16555)
jrasell Mar 20, 2023
aacc7c6
dev: remove use of cfssl and use Nomad CLI for TLS certs. (#16145)
jrasell Mar 20, 2023
695df42
contrib: mock driver (#16573)
tgross Mar 20, 2023
fb08518
client/metadata: fix crasher caused by AllowStale = false (#16549)
schmichael Mar 20, 2023
a633b79
changelog: update #16427 to improvement (#16565)
lgfa29 Mar 21, 2023
a90df9d
contrib: architecture guide to the drainer (#16569)
tgross Mar 21, 2023
5309325
Update csi_plugin.mdx (#16584)
Suselz Mar 21, 2023
a73a399
Windows fixes for e2e tests (#16592)
schmichael Mar 21, 2023
aece7b0
E2E: fix events tests (#16595)
tgross Mar 21, 2023
337a8d2
e2e: sleep to ensure logs are picked up (#16596)
schmichael Mar 21, 2023
4d31fd3
taskapi: use HasSuffix to detect errors from rpcs (#16594)
schmichael Mar 21, 2023
39ec124
docs: detail support for Nomad checks in service block. (#16598)
jrasell Mar 22, 2023
cb9ce8b
Fix broken test for quotas CLI (#16610)
Juanadelacuesta Mar 22, 2023
2a22d71
[ui] Copyable server and client attribute values (#16548)
philrenaud Mar 22, 2023
1a53d9c
Post 1.5.2 release (#16614)
schmichael Mar 22, 2023
23b3647
drainer: test refactoring to clarify behavior around delete/down node…
tgross Mar 23, 2023
1061ddd
ci: send notification when prepare is complete (#16627)
lgfa29 Mar 23, 2023
fffdbdf
cli: job restart command (#16278)
lgfa29 Mar 23, 2023
b84c455
docs: added section of needed ACL rules for Nomad UI (#16494)
ron-savoia Mar 24, 2023
72ad885
scheduler: fix reconciliation of reconnecting allocs (#16609)
lgfa29 Mar 24, 2023
6626965
style: rename ForceRun to ForceEval, for clarity (#16617)
Juanadelacuesta Mar 27, 2023
51249fc
Multiple instances of a periodic job are run simultaneously, when pro…
jrasell Mar 21, 2023
e9850f3
Multiple instances of a periodic job are run simultaneously, when pro…
Juanadelacuesta Mar 21, 2023
3c858a9
style: refactor force run function
Juanadelacuesta Mar 22, 2023
4c59344
fix: remove defer and inline unlock for speed optimization
Juanadelacuesta Mar 22, 2023
8ac3e0e
Update nomad/leader.go
Juanadelacuesta Mar 22, 2023
90db021
Update nomad/leader_test.go
Juanadelacuesta Mar 22, 2023
23807bd
Update nomad/leader_test.go
Juanadelacuesta Mar 22, 2023
eb6cd35
Update nomad/leader_test.go
Juanadelacuesta Mar 22, 2023
f4c24bc
Update nomad/leader_test.go
Juanadelacuesta Mar 22, 2023
a2ce7f0
backport of commit f4c24bc8763d5ebf0eebbaddb7e84cfd2a39dc2a
Juanadelacuesta Mar 22, 2023
6cbe024
backport of commit c762dc873be5b66a78a5e3ae0b1476471abc4508
Juanadelacuesta Mar 22, 2023
2c363fd
backport of commit f4352f0eb675c82e4c8cd41f4e8bcb4c917cc873
Juanadelacuesta Mar 22, 2023
a7260c0
backport of commit 2a9a785b433a2e8bacfeff732e7153c7c53a6961
Juanadelacuesta Mar 22, 2023
2385f05
backport of commit f841f4f06ba0fe73cf05c70db5744bd465d9cbf4
Juanadelacuesta Mar 22, 2023
124700a
backport of commit 186f982b2f206e67404acae2017c5ecd9a177b74
Juanadelacuesta Mar 22, 2023
096cb3b
Merge f4c24bc8763d5ebf0eebbaddb7e84cfd2a39dc2a into backport/b-gh-110…
hc-github-team-nomad-core Mar 27, 2023
4fd336a
backport of commit b3eacaae4a858a9c30971b33cd93850f7318989a
Juanadelacuesta Mar 22, 2023
b431198
Merge branch 'release/1.5.x' into backport/b-gh-11052/uniquely-glowin…
Juanadelacuesta Mar 27, 2023
32 changes: 31 additions & 1 deletion nomad/leader.go
@@ -774,16 +774,46 @@ func (s *Server) restorePeriodicDispatcher() error {
continue
}

// Skip the job if it prohibits overlap and already has instances
// running.
needed, err := s.isNewEvalNeeded(job)
if err != nil {
return fmt.Errorf("failed to get job status: %v", err)
}
if !needed {
continue
}

if _, err := s.periodicDispatcher.ForceEval(job.Namespace, job.ID); err != nil {
logger.Error("force run of periodic job failed", "job", job.NamespacedID(), "error", err)
return fmt.Errorf("force run of periodic job %q failed: %v", job.NamespacedID(), err)
}
- logger.Debug("periodic job force runned during leadership establishment", "job", job.NamespacedID())

+ logger.Debug("periodic job force run during leadership establishment", "job", job.NamespacedID())
}

return nil
}

// isNewEvalNeeded reports whether a new evaluation must be created when the
// periodic dispatcher is restored. No evaluation is needed when the job
// prohibits overlap and already has running instances.
func (s *Server) isNewEvalNeeded(job *structs.Job) (bool, error) {

if job.Periodic.ProhibitOverlap {
running, err := s.periodicDispatcher.dispatcher.RunningChildren(job)
if err != nil {
return false, fmt.Errorf("failed to determine if periodic job %q has running children: %w", job.NamespacedID(), err)
}

if running {
return false, nil
}
}

return true, nil
}

// schedulePeriodic is used to do periodic job dispatch while we are leader
func (s *Server) schedulePeriodic(stopCh chan struct{}) {
evalGC := time.NewTicker(s.config.EvalGCInterval)
162 changes: 160 additions & 2 deletions nomad/leader_test.go
@@ -411,7 +411,7 @@ func TestLeader_PeriodicDispatcher_Restore_NoEvals(t *testing.T) {
now := time.Now()

// Sleep till after the job should have been launched.
- time.Sleep(3 * time.Second)
+ time.Sleep(5 * time.Second)

// Restore the periodic dispatcher.
s1.periodicDispatcher.SetEnabled(true)
@@ -438,13 +438,35 @@ func TestLeader_PeriodicDispatcher_Restore_NoEvals(t *testing.T) {
}
}

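// mockJobEvalDispatcher is a test double for JobEvalDispatcher. It records
// whether DispatchJob was called and returns a configurable RunningChildren
// result.
type mockJobEvalDispatcher struct {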
forceEvalCalled, children bool
evalToReturn *structs.Evaluation
JobEvalDispatcher
}

func (mjed *mockJobEvalDispatcher) DispatchJob(_ *structs.Job) (*structs.Evaluation, error) {
mjed.forceEvalCalled = true
return mjed.evalToReturn, nil
}

func (mjed *mockJobEvalDispatcher) RunningChildren(_ *structs.Job) (bool, error) {
return mjed.children, nil
}

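// testPeriodicJob_OverlapEnabled returns a periodic test job with
// ProhibitOverlap set to true, so overlapping launches are prohibited despite
// what the helper's name suggests.
func testPeriodicJob_OverlapEnabled(times ...time.Time) *structs.Job {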
job := testPeriodicJob(times...)
job.Periodic.ProhibitOverlap = true
return job
}

func TestLeader_PeriodicDispatcher_Restore_Evals(t *testing.T) {
ci.Parallel(t)

s1, cleanupS1 := TestServer(t, func(c *Config) {
c.NumSchedulers = 0
})
defer cleanupS1()

testutil.WaitForLeader(t, s1.RPC)

// Inject a periodic job that triggered once in the past, should trigger now
@@ -465,7 +487,15 @@ func TestLeader_PeriodicDispatcher_Restore_Evals(t *testing.T) {
}

// Create an eval for the past launch.
- s1.periodicDispatcher.createEval(job, past)
+ eval, _ := s1.periodicDispatcher.createEval(job, past)

md := &mockJobEvalDispatcher{
children: false,
evalToReturn: eval,
JobEvalDispatcher: s1,
}

s1.periodicDispatcher.dispatcher = md

// Flush the periodic dispatcher, ensuring that no evals will be created.
s1.periodicDispatcher.SetEnabled(false)
@@ -475,6 +505,7 @@ func TestLeader_PeriodicDispatcher_Restore_Evals(t *testing.T) {

// Restore the periodic dispatcher.
s1.periodicDispatcher.SetEnabled(true)

s1.restorePeriodicDispatcher()

// Ensure the job is tracked.
@@ -495,6 +526,133 @@ func TestLeader_PeriodicDispatcher_Restore_Evals(t *testing.T) {
if last.Launch == past {
t.Fatalf("restorePeriodicDispatcher did not force launch")
}

must.True(t, md.forceEvalCalled, must.Sprint("failed to force job evaluation"))
}

func TestLeader_PeriodicDispatcher_No_Overlaps_No_Running_Job(t *testing.T) {
ci.Parallel(t)

s1, cleanupS1 := TestServer(t, func(c *Config) {
c.NumSchedulers = 0
})
defer cleanupS1()
testutil.WaitForLeader(t, s1.RPC)

// Inject a periodic job that triggered once in the past, should trigger now
// and once in the future.
now := time.Now()
past := now.Add(-1 * time.Second)
future := now.Add(10 * time.Second)

job := testPeriodicJob_OverlapEnabled(past, now, future)
req := structs.JobRegisterRequest{
Job: job,
WriteRequest: structs.WriteRequest{
Namespace: job.Namespace,
},
}
_, _, err := s1.raftApply(structs.JobRegisterRequestType, req)
must.NoError(t, err)

// Create an eval for the past launch.
eval, err := s1.periodicDispatcher.createEval(job, past)
must.NoError(t, err)

md := &mockJobEvalDispatcher{
children: false,
evalToReturn: eval,
}

s1.periodicDispatcher.dispatcher = md

// Flush the periodic dispatcher, ensuring that no evals will be created.
s1.periodicDispatcher.SetEnabled(false)

// Sleep till after the job should have been launched.
time.Sleep(3 * time.Second)

// Restore the periodic dispatcher.
s1.periodicDispatcher.SetEnabled(true)
must.NoError(t, s1.restorePeriodicDispatcher())

// Ensure the job is tracked.
tuple := structs.NamespacedID{
ID: job.ID,
Namespace: job.Namespace,
}
must.MapContainsKey(t, s1.periodicDispatcher.tracked, tuple, must.Sprint("periodic job not restored"))

// Check that an eval was made.
ws := memdb.NewWatchSet()
last, err := s1.fsm.State().PeriodicLaunchByID(ws, job.Namespace, job.ID)
must.NoError(t, err)
must.NotNil(t, last)

if last.Launch == past {
t.Fatalf("restorePeriodicDispatcher did not force launch")
}

must.True(t, md.forceEvalCalled, must.Sprint("failed to force job evaluation"))
}

func TestLeader_PeriodicDispatcher_No_Overlaps_Running_Job(t *testing.T) {
ci.Parallel(t)

s1, cleanupS1 := TestServer(t, func(c *Config) {
c.NumSchedulers = 0
})
defer cleanupS1()
testutil.WaitForLeader(t, s1.RPC)

// Inject a periodic job that triggered once in the past, should trigger now
// and once in the future.
now := time.Now()
past := now.Add(-1 * time.Second)
future := now.Add(10 * time.Second)

job := testPeriodicJob_OverlapEnabled(past, now, future)
req := structs.JobRegisterRequest{
Job: job,
WriteRequest: structs.WriteRequest{
Namespace: job.Namespace,
},
}
_, _, err := s1.raftApply(structs.JobRegisterRequestType, req)
must.NoError(t, err)

// Create an eval for the past launch.
eval, err := s1.periodicDispatcher.createEval(job, past)
must.NoError(t, err)

md := &mockJobEvalDispatcher{
children: true,
evalToReturn: eval,
}

s1.periodicDispatcher.dispatcher = md

// Flush the periodic dispatcher, ensuring that no evals will be created.
s1.periodicDispatcher.SetEnabled(false)

// Sleep till after the job should have been launched.
time.Sleep(3 * time.Second)

// Restore the periodic dispatcher.
s1.periodicDispatcher.SetEnabled(true)
must.NoError(t, s1.restorePeriodicDispatcher())

// Ensure the job is tracked.
tuple := structs.NamespacedID{
ID: job.ID,
Namespace: job.Namespace,
}
must.MapContainsKey(t, s1.periodicDispatcher.tracked, tuple, must.Sprint("periodic job not restored"))

must.False(t, md.forceEvalCalled, must.Sprint("evaluation forced with job already running"))
}

func TestLeader_PeriodicDispatch(t *testing.T) {
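The tests above replace the dispatcher with a struct that embeds the `JobEvalDispatcher` interface and overrides selected methods. The following minimal sketch shows that embedding pattern on its own. The `dispatcher` interface, `realDispatcher`, and `mockDispatcher` are hypothetical names chosen for illustration and are not part of Nomad.

```go
package main

import "fmt"

// dispatcher is a hypothetical two-method interface used only for this sketch.
type dispatcher interface {
	Dispatch(job string) (string, error)
	Running(job string) (bool, error)
}

// realDispatcher is a hypothetical production implementation.
type realDispatcher struct{}

func (realDispatcher) Dispatch(job string) (string, error) {
	return "eval-for-" + job, nil
}

func (realDispatcher) Running(job string) (bool, error) {
	return false, nil
}

// mockDispatcher embeds the interface and overrides only Dispatch. Calls to
// Running are promoted from the embedded value.
type mockDispatcher struct {
	dispatcher
	dispatchCalled bool
}

// Dispatch records the call and returns a canned evaluation ID.
func (m *mockDispatcher) Dispatch(job string) (string, error) {
	m.dispatchCalled = true
	return "mock-eval", nil
}

func main() {
	m := &mockDispatcher{dispatcher: realDispatcher{}}
	var d dispatcher = m

	id, _ := d.Dispatch("backup")     // handled by the mock
	running, _ := d.Running("backup") // falls through to realDispatcher
	fmt.Println(id, running, m.dispatchCalled)
}
```

If the embedded interface is left nil, calling a method the mock does not override panics with a nil pointer dereference, so a mock of this shape is only safe when every method the code under test calls is either overridden or backed by a real value.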
4 changes: 1 addition & 3 deletions nomad/periodic.go
@@ -278,10 +278,10 @@ func (p *PeriodicDispatch) removeLocked(jobID structs.NamespacedID) error {
// subsequent eval.
func (p *PeriodicDispatch) ForceEval(namespace, jobID string) (*structs.Evaluation, error) {
p.l.Lock()
+ defer p.l.Unlock()

// Do nothing if not enabled
if !p.enabled {
- p.l.Unlock()
return nil, fmt.Errorf("periodic dispatch disabled")
}

@@ -291,11 +291,9 @@ func (p *PeriodicDispatch) ForceEval(namespace, jobID string) (*structs.Evaluati
}
job, tracked := p.tracked[tuple]
if !tracked {
- p.l.Unlock()
return nil, fmt.Errorf("can't force run non-tracked job %q (%s)", jobID, namespace)
}

- p.l.Unlock()
return p.createEval(job, time.Now().In(job.Periodic.GetLocation()))
}

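The `ForceEval` change swaps explicit `Unlock` calls on each return path for a single deferred unlock. The practical difference is how long the lock is held: with `defer`, the mutex stays locked while the trailing call runs. The sketch below illustrates this with a hypothetical `guarded` type; it assumes Go 1.18 or later for `sync.Mutex.TryLock` and does not reproduce Nomad's `PeriodicDispatch`.

```go
package main

import (
	"fmt"
	"sync"
)

// guarded is a hypothetical type with a mutex-protected flag, loosely modeled
// on a dispatcher that must be enabled before it does work.
type guarded struct {
	mu      sync.Mutex
	enabled bool
}

// runDeferred releases the lock with defer, so the lock is still held while
// work runs.
func (g *guarded) runDeferred(work func()) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.enabled {
		return fmt.Errorf("disabled")
	}
	work()
	return nil
}

// runInline unlocks explicitly on every return path, releasing the lock
// before work runs.
func (g *guarded) runInline(work func()) error {
	g.mu.Lock()
	if !g.enabled {
		g.mu.Unlock()
		return fmt.Errorf("disabled")
	}
	g.mu.Unlock()
	work()
	return nil
}

// lockFree reports whether the mutex could be acquired at this instant.
func (g *guarded) lockFree() bool {
	if g.mu.TryLock() {
		g.mu.Unlock()
		return true
	}
	return false
}

func main() {
	g := &guarded{enabled: true}
	var heldDeferred, heldInline bool
	_ = g.runDeferred(func() { heldDeferred = !g.lockFree() })
	_ = g.runInline(func() { heldInline = !g.lockFree() })
	fmt.Println("lock held during work (defer): ", heldDeferred)
	fmt.Println("lock held during work (inline):", heldInline)
}
```

The deferred form cannot miss an unlock on a newly added return path, at the cost of holding the lock for the duration of the final call. The inline form keeps the critical section short but needs an `Unlock` before every `return`.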