Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of [15090] Ensure no leakage of evaluations for batch jobs. into release/1.4.x #15989

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #15097 to be assessed for backporting due to the inclusion of the label backport/1.4.x.

The below text is copied from the body of the original PR.


Prior to 2409f72 the code compared the modification index of a job to itself. Afterwards, the code compared the creation index of the job to itself. In either case there should never be a case of re-parenting of allocs causing the evaluation to trivially always result in false, which leads to memory leakage as reported here:

#15090
#14842
#4532

The comments within the code and tests are self-contradictory. Some state that evals of batch jobs should never be GCed others claim that they should be GCed if a new job version is created (as determined by comparing the indexes). I believe that the latter is the expected state.

Thus, we compare the creation index of the allocation with the modification index of the job to determine whether or not an alloc belongs to the current job version or not.

This pull request specifically:

  1. Removes references to alloc.Job which is unsafe given the deserialization of allocs from memDB which may include invalid pointers
  2. Fixes the logical unity described above (compare creation with modification and not creation with creation)
  3. Fixes the test (the test breaks the assumption of alloc reparenting, thus testing the mock rather than production code)
  4. Adds more tests

Any and all feedback welcome.

For more details, please see the issues linked. Specifically #15090 describes the issue in detail.

EDIT:
I am assuming here that an update to the job results in an eval and consequently allocs for the job. If this is not correct than we must take into account the ordering of evals/allocs which has not been done prior.

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/15090-infinite-memory-growth/kindly-many-bulldog branch from 859fd7a to af29a22 Compare January 31, 2023 18:34
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit 9f5bd92 into release/1.4.x Jan 31, 2023
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/15090-infinite-memory-growth/kindly-many-bulldog branch January 31, 2023 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants