client: ensure minimal cgroup controllers enabled #15027

Merged: 3 commits, Oct 24, 2022

shoenig (Member) commented Oct 24, 2022

This PR fixes a bug where Nomad could not operate properly on operating
systems whose root cgroup.subtree_control enables a set of controllers that
does not include the minimal set Nomad needs.

Nomad needs these controllers enabled to operate:

  • cpuset
  • cpu
  • io
  • memory
  • pids

Nomad now ensures these controllers are enabled during Client initialization,
adding them to cgroup.subtree_control as necessary. This should be particularly
helpful on the RHEL/CentOS/Fedora family of systems. Ubuntu systems should be
unaffected as they enable all controllers by default.

Fixes: #14494

Backports to 1.4.x and 1.3.x (1.2.x predates cgroups v2 support)

client/lib/cgutil/cpuset_manager_v2.go
  parentAbs := filepath.Join(CgroupRoot, parent)
  if err := os.MkdirAll(parentAbs, 0o755); err != nil {
- logger.Warn("failed to ensure nomad parent cgroup exists; disable cpuset management", "error", err)
+ logger.Error("failed to ensure nomad parent cgroup exists; disable cpuset management", "error", err)
Review comment from a Member:

Can we change these log entries to say "disabling cpuset management"? "disable" sounds like we're telling the cluster administrator to do it.

shoenig (Member, Author) commented Oct 24, 2022

Spot check on a CentOS 9 AMI:

[centos@ip-172-31-72-243 ~]$ ./nomad node status -self -verbose | grep cpu 
cpu.arch                                 = amd64
cpu.frequency                            = 2200
cpu.modelname                            = AMD EPYC 7571
cpu.numcores                             = 2
cpu.reservablecores                      = 2
cpu.totalcompute                         = 4400
[centos@ip-172-31-72-243 ~]$ ./nomad node status -self -verbose | grep 'os\.'
os.name                                  = centos
os.signals                               = SIGSTOP,SIGTRAP,SIGUSR2,SIGHUP,SIGILL,SIGIO,SIGIOT,SIGABRT,SIGALRM,SIGPROF,SIGTSTP,SIGXFSZ,SIGKILL,SIGTERM,SIGWINCH,SIGXCPU,SIGTTIN,SIGTTOU,SIGUSR1,SIGNULL,SIGCONT,SIGPIPE,SIGQUIT,SIGFPE,SIGSEGV,SIGBUS,SIGINT,SIGSYS
os.version                               = 9

github-actions bot locked as resolved and limited conversation to collaborators Feb 22, 2023
Labels: backport/1.3.x, backport/1.4.x
Linked issue: cgroups: unable to initialize cpuset manager on CentOS9