Limit "resources.cores" does not work after nomad client restart #16291

Closed
TheSpbra1n opened this issue Mar 2, 2023 · 2 comments · Fixed by #16467
Labels: stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/resource-utilization, type/bug

@TheSpbra1n
Hello.
We tested resources.cores for Nomad jobs and found a bug: after the Nomad client is restarted, the cores limit is no longer enforced.

Nomad version

Nomad v1.4.2 (039d70e)

Operating system and Environment details

Codename: focal

Reproduction steps

  1. Run job:
# job name
job "cpu-loader" {
  region      = "global"
  datacenters = ["dc1"]

  type = "service"

  meta {
      ...
  }

  group "cpu-loader-group" {
    count = 1

    # task
    task "cpu-loader-task" {
      kill_signal = "SIGTERM"
      driver = "docker"

      config {
        image        = "docker-image"
      }

      env {
          ...
      }

      resources {
        memory = 1000
        cores  = 4
      }
    }
  }
}
  2. Stop job
  3. Restart nomad client
  4. Run job again

(For the loader we used https://github.com/vikyd/go-cpu-load.)

Expected Result

The job uses only 4 cores.

Actual Result

The job uses more than 4 cores.
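
One way to confirm this is to compare the size of the task's cpuset with the cores value from the job spec. The following is a minimal sketch, not part of the original report: it assumes you have already read the contents of the task's cpuset.cpus file (its location depends on the driver and cgroup layout and is not shown in this issue), and it assumes the list contains no overlapping ranges. The countCPUs helper is a hypothetical name used only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCPUs counts the CPUs in a cpuset list string such as "0-3,8".
// Entries are single CPU numbers or inclusive ranges, separated by commas.
// Overlapping entries are not de-duplicated (assumption: input has none).
func countCPUs(list string) (int, error) {
	list = strings.TrimSpace(list)
	if list == "" {
		return 0, nil
	}
	total := 0
	for _, part := range strings.Split(list, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return 0, fmt.Errorf("bad cpuset entry %q: %w", part, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(hi))
			if err != nil {
				return 0, fmt.Errorf("bad cpuset entry %q: %w", part, err)
			}
		}
		if end < start {
			return 0, fmt.Errorf("bad cpuset range %q", part)
		}
		total += end - start + 1
	}
	return total, nil
}

func main() {
	const wantCores = 4 // matches "cores = 4" in the job above

	// Sample inputs only; real values come from the task's cpuset.cpus file.
	for _, sample := range []string{"0-3", "0-7"} {
		n, err := countCPUs(sample)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("cpuset %q has %d CPUs, within limit: %v\n", sample, n, n <= wantCores)
	}
}
```

If the count is larger than the cores value, the task is not pinned to the reserved set, which matches the behaviour described here.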

RobloxMatt commented Mar 10, 2023

We've seen something that sounds like this problem as well. Do you see something like this in your logs?

[WARN] client.cpuset.v1: failed to ensure reserved cpuset.cpus interface exists; disable cpuset management: error="mkdir /sys/fs/cgroup/cpuset/nomad/reserved: file exists"

shoenig moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Mar 13, 2023
shoenig added this to the 1.5.x milestone Mar 13, 2023
shoenig added the stage/accepted label and removed the stage/needs-investigation label Mar 13, 2023

shoenig (Member) commented Mar 13, 2023

Thanks for the report @TheSpbra1n and for the logs @RobloxMatt. It seems this was broken during a refactoring in 0a3d57f, where we (I) missed the edge case in which the cgroup structure already exists when the Client initializes. It should only affect machines using cgroups v1.
