Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a flag for cpu_hard_limit #3825

Merged
merged 8 commits into from
Feb 9, 2018
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions client/driver/docker.go
Original file line number Diff line number Diff line change
Expand Up @@ -120,6 +120,13 @@ const (
// https://docs.docker.com/engine/reference/run/#block-io-bandwidth-blkio-constraint
dockerBasicCaps = "CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID," +
"SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE"

// This is cpu.cfs_period_us: the length of a period.
// The default values is 100 microsecnds represented in nano Seconds.
// Below is the documnentation:
// https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
// https://docs.docker.com/engine/admin/resource_constraints/#cpu
defaultCFSPeriod = 100000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for a task allocated 10% of the CPU it could pause for 90ms at a time? That seems awfully long, perhaps 10000 (10ms) or 1000 (1ms) increments would be a better level of granularity to ensure responsiveness for low priority interactive services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if a task is allocated 10% of CPU and we keep the default 100 microseconds cfs_peroid then it will be paused for 90 microseconds. We can change that to 10 ms, for that we will have to pass CPUPeriod as well (which would always be 10 ms )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: Sorry, I meant "ms" as in milli seconds. I know that kernel doc uses "ms" to mean microseconds, so I'm sorry for confusing the matter. Within Nomad code/comments/docs let's always use "ms" for millis and "us" or "μs" for micros.

I'd be in favor of lowering this to 10000μs (10ms) to minimize pause duration unless somebody has a strong preference.

Since it's only used for an internal calculation tweaking it in the future should be ok (and it should probably even be configurable once we make this cross-driver).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael The default for cfs period in docker is 100000, changing this would mean people's experience of using the quota flag would differ between using normal docker cli and Nomad. If you want a lower value, maybe introduce another flag to tweak that but keep the default as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael I am testing the changes that you recommended and running into an issue. When I specify cpu.cfs_period_us=10000 (milliseconds) and when I want to use 6% of CPU, I will have to specify cpu.cfs_quota_us=600(milliseconds). When I do that, it gives me an error: CPU cfs quota cannot be less than 1ms (i.e. 1000).

Copy link
Member

@schmichael schmichael Feb 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diptanu has a good point. Matching Docker's setting is the right thing to do. Sorry for the noise!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although now I'm confused by the error @jaininshah9 mentioned. According to Docker's API docs this setting should be in microseconds (like the kernel itself expects), not nanoseconds as the code comment implies.

The Docker docs lead me to believe this should be the value?

defaultCFSPeriod = 100

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael You may be right, that got me confused as well, I am digging into the error and will let you know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think initially I made a mistake mentioning that it's a nanosecond. The reason I mentioned nanosecond it is because the docker docs says (see screenshot below) the default value is 100 microseconds, which is wrong, it should say 100 milliseconds (which is what kernel doc says).

screen shot 2018-02-06 at 6 53 23 pm

I also tested and the docker API does not do any translation. Whatever value we pass in CPUPeriod is what we find under /sys/fs/cgroups...
So the value is the code is correct, I need to update the comment on it though

Sorry for the confusion I created. I will add a change which has a clear comment on what the defaultCFSPeriod refers to.

)

type DockerDriver struct {
Expand Down Expand Up @@ -217,6 +224,7 @@ type DockerDriverConfig struct {
CapAdd []string `mapstructure:"cap_add"` // Flags to pass directly to cap-add
CapDrop []string `mapstructure:"cap_drop"` // Flags to pass directly to cap-drop
ReadonlyRootfs bool `mapstructure:"readonly_rootfs"` // Mount the container’s root filesystem as read only
CPUHardLimit bool `mapstructure:"cpu_hard_limit"` // Enforce CPU hard limit.
}

func sliceMergeUlimit(ulimitsRaw map[string]string) ([]docker.ULimit, error) {
Expand Down Expand Up @@ -674,6 +682,9 @@ func (d *DockerDriver) Validate(config map[string]interface{}) error {
"readonly_rootfs": {
Type: fields.TypeBool,
},
"cpu_hard_limit": {
Type: fields.TypeBool,
},
},
}

Expand Down Expand Up @@ -1119,6 +1130,12 @@ func (d *DockerDriver) createContainerConfig(ctx *ExecContext, task *structs.Tas
VolumeDriver: driverConfig.VolumeDriver,
}

// Calculate CPU Quota
if driverConfig.CPUHardLimit {
percentTicks := float64(task.Resources.CPU) / shelpers.TotalTicksAvailable()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use d.node.Resources.CPU instead as we don't always properly detect the available CPU so some users have to override it.

hostConfig.CPUQuota = int64(percentTicks * defaultCFSPeriod)
}

// Windows does not support MemorySwap/MemorySwappiness #2193
if runtime.GOOS == "windows" {
hostConfig.MemorySwap = 0
Expand All @@ -1137,6 +1154,9 @@ func (d *DockerDriver) createContainerConfig(ctx *ExecContext, task *structs.Tas

d.logger.Printf("[DEBUG] driver.docker: using %d bytes memory for %s", hostConfig.Memory, task.Name)
d.logger.Printf("[DEBUG] driver.docker: using %d cpu shares for %s", hostConfig.CPUShares, task.Name)
if driverConfig.CPUHardLimit {
d.logger.Printf("[DEBUG] driver.docker: using %d cpu quota (cpu-period: %d) for %s", hostConfig.CPUQuota, defaultCFSPeriod, task.Name)
}
d.logger.Printf("[DEBUG] driver.docker: binding directories %#v for %s", hostConfig.Binds, task.Name)

// set privileged mode
Expand Down