Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of Multiple instances of a periodic job are run simultaneously, when prohibit_overlap is true into release/1.3.x #16659

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #16583 to be assessed for backporting due to the inclusion of the label backport/1.3.x.

WARNING automatic cherry-pick of commits failed. Commits will require human attention.

The below text is copied from the body of the original PR.


This PR addresses the bug reported on #11052

When a leader change happens, the periodic dispatcher on the new leader starts by re running all periodic jobs by force, without checking if there is an instance of the said job already.
A new check is introduced that skips the job if prohibit_overlap is set and there is already a instance running.

jorgemarey and others added 30 commits January 30, 2023 09:31
* Change api Fields for expose and paths

* Add changelog entry

* changelog: add deprecation notes about connect fields

* api: minor style tweaks

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
* Ensure infra_image gets proper label used for reconciliation

Currently infra containers are not cleaned up as part of the dangling container
cleanup routine. The reason is that Nomad checks if a container is a Nomad owned
container by verifying the existence of the: `com.hashicorp.nomad.alloc_id` label.

Ensure we set this label on the infra container as well.

* fix unit test

* changelog: add entry

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
While you can use any string value for a variable Item's key name
using characters that are outside of the set [unicode.Letter,
unicode.Number,`_`] will require the `index` function for direct
access.
… multiple tags (#15962)

* docker: set force=true on remove image to handle images referenced by multiple tags

This PR changes our call of docker client RemoveImage() to RemoveImageExtended with
the Force=true option set. This fixes a bug where an image referenced by more than
one tag could never be garbage collected by Nomad. The Force option only applies to
stopped containers; it does not affect running workloads.

* docker: add note about image_delay and multiple tags
Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to unreclaimable memory.

Prior to this change allocations and evaluations for batch jobs were never garbage collected until the batch job was explicitly stopped. The new `batch_eval_gc_threshold` server configuration controls how often they are collected. The default threshold is `24h`.
* refact: add conditional error handling

* test: test conditional logic
This pile was deprecated when we starting using HCP Consul for e2e
instead of standing up our own cluster and managing Consuls at test
runtime.
The ACL token decoding was not correctly handling time duration
syntax such as "1h" which forced people to use the nanosecond
representation via the HTTP API.

The change adds an unmarshal function which allows this syntax to
be used, along with other styles correctly.
* consul: reset consul token on job during registration of a reversion

* e2e: add test for reverting a job with a consul service

* cl: fixup cl entry
Also allows for default value of `datacenters = ["*"]`
* Add `bridge_network_hairpin_mode` client config setting
* Add node attribute: `nomad.bridge.hairpin_mode`
* Changed format string to use `%q` to escape user provided data
* Add test to validate template JSON for developer safety

Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>
* Extend variables under the nomad path prefix to allow for job-templates (#15570)

* Extend variables under the nomad path prefix to allow for job-templates

* Add job-templates to error message hinting

* RadioCard component for Job Templates (#15582)

* chore: add

* test: component API

* ui: component template

* refact: remove  bc naming collission

* styles: remove SASS var causing conflicts

* Disallow specific variable at nomad/job-templates (#15681)

* Disallows variables at exactly nomad/job-templates

* idiomatic refactor

* Expanding nomad job init to accept a template flag (#15571)

* Adding a string flag for templates on job init

* data-down actions-up version of a custom template editor within variable

* Dont force grid on job template editor

* list-templates flag started

* Correctly slice from end of path name

* Pre-review cleanup

* Variable form acceptance test for job template editing

* Some review cleanup

* List Job templates test

* Example from template test

* Using must.assertions instead of require etc

* ui: add choose template button (#15596)

* ui: add new routes

* chore: update file directory

* ui: add choose template button

* test: button and page navigation

* refact: update var name

* ui: use `Button` component from `HDS` (#15607)

* ui: integrate  buttons

* refact: remove  helper

* ui: remove icons on non-tertiary buttons

* refact: update normalize method for key/value pairs (#15612)

* `revert`: `onCancel` for `JobDefinition`

The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button.

* style: increase button size

* style: keep lime green

* ui: select template (#15613)

* ui: deprecate unused component

* ui: deprecate tests

* ui: jobs.run.templates.index

* ui: update logic to handle templates

* refact: revert key/value changes

* style: padding for cards + buttons

* temp: fixtures for mirage testing

* Revert "refact: revert key/value changes"

This reverts commit 124e95d.

* ui: guard template for unsaved job

* ui: handle reading template variable

* Revert "refact: update normalize method for key/value pairs (#15612)"

This reverts commit 6f5ffc9.

* revert: remove test fixtures

* revert: prettier problems

* refact: test doesnt need filter expression

* styling: button sizes and responsive cards

* refact: remove route guarding

* ui: update variable adapter

* refact: remove model editing behavior

* refact: model should query variables to populate editor

* ui: clear qp on exit

* refact: cleanup deprecated API

* refact: query all namespaces

* refact: deprecate action

* ui: rely on  collection

* refact: patch deprecate transition API

* refact: patch test to expect namespace qp

* styling: padding, conditionals

* ui: flashMessage on 404

* test: update for o(n+1) query

* ui: create new job template (#15744)

* refact: remove unused code

* refact: add type safety

* test: select template flow

* test: add data-test attrs

* chore: remove dead code

* test: create new job flow

* ui: add create button

* ui: create job template

* refact: no need for wildcard

* refact:  record instead of delete

* styling: spacing

* ui: add error handling and form validation to job create template (#15767)

* ui: handle server side errors

* ui: show error to prevent duplicate

* refact: conditional namespace

* ui: save as template flow (#15787)

* bug:  patches failing tests associated with `pretender` (#15812)

* refact: update assertion

* refact: test set-up

* ui: job templates manager view (#15815)

* ui: manager list view

* test: edit flow

* refact: deprecate column-helper

* ui: template edit and delete flow (#15823)

* ui: manager list view

* refact: update title

* refact: update permissions

* ui: template edit page

* bug: typo

* refact: update toast messages

* bug:  clear selections on exit (#15827)

* bug:  clear controllers on exit

* test: mirage config changes (#15828)

* refact: deprecate column-helper

* style: update z-index for HDS

* Revert "style: update z-index for HDS"

This reverts commit d3d87ce.

* refact: update delete button

* refact: edit redirect

* refact: patch reactivity issues

* styling: fixed width

* refact: override defaults

* styling: edit text causing overflow

* styling:  add inline text

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

* bug: edit `text` to `template`

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

* test:  delete flow job templates (#15896)

* refact: edit names

* bug:  set correct ref to store

* chore: trim whitespace:

* test: delete flow

* bug: reactively update view (#15904)

* Initialized default jobs (#15856)

* Initialized default jobs

* More jobs scaffolded

* Better commenting on a couple example job specs

* Adapter doing the work

* fall back to epic config

* Label format helper and custom serialization logic

* Test updates to account for a never-empty state

* Test suite uses settled and maintain RecordArray in adapter return

* Updates to hello-world and variables example jobspecs

* Parameterized job gets optional payload output

* Formatting changes for param and service discovery job templates

* Multi-group service discovery job

* Basic test for default templates (#15965)

* Basic test for default templates

* Percy snapshot for manage page

* Some late-breaking design changes

* Some copy edits to the header paragraphs for job templates (#15967)

* Added some init options for job templates (#15994)

* Async method for populating default job templates from the variable adapter

---------

Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>
This PR updates golangci-lint to work better with go1.20 - the previous
version would cause in oom on 'make check'.
If a deployment fails, the `deployment status` command can get a nil deployment
when it checks for a rollback deployment if there isn't one (or at least not one
at the time of the query). Fix the panic.
#15997)

recommend .nomad.hcl for job files instead of .nomad (without .hcl)
* nomad job init -> example.nomad.hcl
* update docs
Add `identity` jobspec block to expose workload identity tokens to tasks.

---------

Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
* Removed line from previous implementation
* remove import

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
* largely a doc-ification of this commit message:
  d476780
  this doesn't spell out all the possible failure modes,
  but should be a good starting point for folks.

* connect: add doc link to envoy bootstrap error

* add Unwrap() to RecoverableError
  mainly for easier testing
Service jobs should have unique allocation Names, derived from the
Job.ID. System jobs do not have unique allocation Names because the index is
intended to indicated the instance out of a desired count size. Because system
jobs do not have an explicit count but the results are based on the targeted
nodes, the index is less informative and this was intentionally omitted from the
original design.

Update docs to make it clear that NOMAD_ALLOC_INDEX is always zero for 
system/sysbatch jobs

Validate that `volume.per_alloc` is incompatible with system/sysbatch jobs.
System and sysbatch jobs always have a `NOMAD_ALLOC_INDEX` of 0. So
interpolation via `per_alloc` will not work as soon as there's more than one
allocation placed. Validate against this on job submission.
Juanadelacuesta and others added 15 commits March 27, 2023 15:38
…hibit_overlap is true

Fixes #11052
When restoring periodic dispatcher, all periodic jobs are forced without checking for previous childre.
…hibit_overlap is true

Fixes #11052
When restoring periodic dispatcher, all periodic jobs are forced without checking for previous children.
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
@hc-github-team-nomad-core hc-github-team-nomad-core requested a review from a team March 27, 2023 15:25
@hc-github-team-nomad-core hc-github-team-nomad-core requested a review from a team as a code owner March 27, 2023 15:25
@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-gh-11052/steadily-adapting-tortoise branch from 6dc287e to 1b61b24 Compare March 27, 2023 15:25
@hc-github-team-nomad-core hc-github-team-nomad-core requested review from marianoasselborn and alvin-huang and removed request for a team March 27, 2023 15:25
@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-gh-11052/steadily-adapting-tortoise branch from 1b61b24 to 5d06ed8 Compare March 27, 2023 15:25
@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-gh-11052/steadily-adapting-tortoise branch from 5d06ed8 to 1b61b24 Compare March 27, 2023 15:25
@github-actions
Copy link

Ember Asset Size action

As of 6041961

Files that stayed the same size 🤷‍:

File raw gzip
nomad-ui.js 0 B 0 B
vendor.js 0 B 0 B
nomad-ui.css 0 B 0 B
vendor.css 0 B 0 B

@github-actions
Copy link

Ember Test Audit comparison

release/1.3.x 6041961 change
passes 1301 1301 0
failures 0 0 0
flaky 0 0 0
duration 9m 01s 584ms 9m 25s 722ms +24s 138ms

@Juanadelacuesta Juanadelacuesta deleted the backport/b-gh-11052/steadily-adapting-tortoise branch May 10, 2023 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet