scheduler: prevent -Inf in job anti affinity scoring #18039

Closed
nvanthao wants to merge 1 commit

Conversation

nvanthao
Contributor

Similar to #17198, we are seeing a -Inf job anti-affinity score:

"job-anti-affinity": -Inf,

I have not yet been able to replicate the issue. My understanding is that with a desiredCount of 0 there should be no allocation placements, so no ranking or feasibility code should run:

		return nil
	}
	// Compute the placements
	place := make([]placementResult, 0, len(results.place))
	for _, p := range results.place {
		s.queuedAllocs[p.taskGroup.Name] += 1
		place = append(place, p)
	}
	destructive := make([]placementResult, 0, len(results.destructiveUpdate))
	for _, p := range results.destructiveUpdate {
		s.queuedAllocs[p.placeTaskGroup.Name] += 1
		destructive = append(destructive, p)
	}
	return s.computePlacements(destructive, place)

That being said, it is happening and I believe the fix is straightforward.

I'll update the PR with more unit tests, if possible, once we understand the root cause.

@tgross
Member

tgross commented Jul 24, 2023

@nvanthao can you confirm which version of Nomad you're using? You had [NOMAD-401] in the title here for some reason as well. If that's a Jira reference, well we don't use Jira in Nomad Eng so if you can do me (and our community members) a favor and put the details of whatever the problem is here in the PR or a new issue, that'd be swell.

@tgross tgross changed the title [NOMAD-401] scheduler: prevent -Inf in job anti affinity scoring scheduler: prevent -Inf in job anti affinity scoring Jul 24, 2023
@nvanthao
Contributor Author

Thanks @tgross,

The Nomad version is 1.3.8, and this is the recent job history:

Version     = 76
Stable      = false
Submit Date = 2023-07-21T13:40:00Z

Diff        =
+/- Job: "<job-id>"
+/- Task Group: "<tg>"
  +/- Count: "3" => "0"
      Task: "<task>"
 
Version     = 75
Stable      = true
Submit Date = 2023-07-21T05:40:00Z
Diff        =
+/- Job: "<job-id>"
+/- Task Group: "<tg>"
  +/- Count: "0" => "3"
      Task: "<task>"

Version     = 74
Stable      = false
Submit Date = 2023-07-20T19:16:53Z
Diff        =
+/- Job: "<task>"
+/- Task Group: "<tg>"
  +/- Count: "1" => "0"
      Task: "<task>"

I don't know how to replicate this issue yet.

@tgross
Member

tgross commented Jul 26, 2023

@nvanthao the fix described in #17198 is not present in Nomad 1.3.8. It shipped in the (currently unsupported) 1.3.15. See the 1.3.15 changelog.

I'm going to close this for now. If you can reproduce on a currently supported version of Nomad with the fix, I'd be happy to revisit.

@tgross tgross closed this Jul 26, 2023