Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce CPU usage running large numbers of clients #5951

Merged
merged 6 commits into from
Jul 19, 2019
Merged

Conversation

langmartin
Copy link
Contributor

raw_exec uses the universal executor, not the executor linux. on linux, this still executes the job in a cgroup, which gives us a handle to use the cgroup process list rather than scanning all processes on the client machine.

Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the right approach but I have few suggestions.

It might be good to rely on memory/cpu cgroup to get aggregate usage from kernel directly, rather than get all pids and then query and aggregate for each pid usage in user space; but that can be done in a follow up PR.

configureBasicCgroups(cfg)
err := configureBasicCgroups(cfg)
if err != nil {
e.logger.Debug("cgroup create", "error", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a warning or an error - maybe something like. Also would be nice to wrap the error.

Suggested change
e.logger.Debug("cgroup create", "error", err)
e.logger.Warn("failed to create cgroup", "error", err)
return fmt.Errorf("failed to create cgroups: %v", err)

if e.resConCtx.isEmpty() {
go e.pidCollector.collectPids(e.processExited, getAllPids)
} else {
go e.pidCollector.collectPids(e.processExited, e.getAllPids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the same function name to refer to a executor method as well as a package function with different implementations make it a bit harder to follow code. Apparently, we do that already with libcontainer executor :(.

Also, l.getAllPids and resConCtx.isEmpty are only defined on linux, so compilation fails on Windows and macOS, e.g. https://ci.appveyor.com/project/hashicorp/nomad/builds/25919504 . See the pattern of executor_universal_linux.go, executor_windows.go, and executor_unix.go .

I would suggest refactoring this to have a universal executor getAllPids method that calls helpers to look up pids (e.g. getPidsByCgroup vs getPidsByScanningProcesses)?

return rc.groups == nil
}

func (e *UniversalExecutor) getAllPids() (map[int]*nomadPid, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move method to executor_universal_linux.go file as it's a UniversalExecutor linux method.

paths = rc.groups.Paths
} else {
paths[""] = rc.groups.Path
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cgroups, we want to use a single cgroup hierarchy. Checking multiple hierarchy would lead to duplicate values in the case of multiple cgroups attached.

In Nomad, we always create the freezer cgroup only in configureBasicCgroups[1], so that's an appropriate one to use. For comparison, docker/runc requires devices cgroup to be mounted, so they only check the devices subsystem[2]

[1]

// Manually create freezer cgroup
cfg.Cgroups.Paths = map[string]string{}
root, err := cgroups.FindCgroupMountpointDir()
if err != nil {
return err
}
if _, err := os.Stat(root); err != nil {
return err
}
freezer := cgroupFs.FreezerGroup{}
subsystem := freezer.Name()
path, err := cgroups.FindCgroupMountpoint("", subsystem)

[2] https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/fs/apply_raw.go#L286-L289

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in configureBasicCgroups, we make an empty Freezer and ask it it's name, which constantly returns "freezer". Should I do that here? or use just the string over there?

@langmartin langmartin requested review from nickethier and removed request for preetapan July 18, 2019 21:18
@langmartin langmartin merged commit e34c60e into master Jul 19, 2019
@langmartin langmartin deleted the b-cpu-pid-scan branch July 19, 2019 15:07
Copy link
Member

@nickethier nickethier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If theres a good way to get a test for resourceContainerContext.getAllPidsByCgroup that would be nice. That can be looked into post merge.

I agree with @notnoop that there may be more efficient means to get some of these stats if the mem/cpu cgroups are being utilized but that can be filed and done later too.

Nice work!

@github-actions
Copy link

github-actions bot commented Feb 6, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce CPU utilization in executors when scanning for PIDs.
4 participants