-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: add setting for configuring cpuManagerPolicy #1638
Conversation
072e2d7
to
c584417
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to add a setting for cpu-manager-reconcile-period
while we're at it.
Could you backport this option to k8s 1.18 and 1.19? I think the migration process will not make any problems if you set "none" as default value. |
2ff61ea
to
a259e4c
Compare
Push above
|
sources/api/migration/migrations/v1.1.3/cpu-manager-policy/src/main.rs
Outdated
Show resolved
Hide resolved
a259e4c
to
ad7063d
Compare
Push above improving error handle for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the CPU state migration, it might be better to just always remove the state file on boot, when we're sure no pods are running.
I'd recommend trying it without the migration, and adding this line to all the kubernetes-tmpfiles.conf
snippets instead:
r! /var/lib/kubelet/cpu_manager_state
That still doesn't make the setting safe to switch on a running system, but it at least means a reboot will always fix it.
sources/api/migration/migrations/v1.1.3/cpu-manager-policy/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/cpu-manager-policy/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/cpu-manager-policy/src/main.rs
Outdated
Show resolved
Hide resolved
Oops - we'll still need the migration to fix up the downgrade path when going back to a version that didn't have the |
ad7063d
to
b40cedd
Compare
Push above
|
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
|
||
impl Migration for CpuManagerPolicyCleaner { | ||
fn forward(&mut self, input: MigrationData) -> Result<MigrationData> { | ||
println!("CpuManagerPolicyCleaner forward has no work to do on upgrade."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to explain why it doesn't have work to do on upgrade. We're not changing the value of a setting or anything. It's just that we're adding a behavior where we always remove the cpu_manager_state
checkpoint file, therefore we don't need to explicitly remove the file during forward migration.
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
b40cedd
to
d0bc9be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push above improve KubernetesDurationValue
regex and improve comments/variable name
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
sources/api/migration/migrations/v1.1.3/kubelet-cpu-manager-state/src/main.rs
Outdated
Show resolved
Hide resolved
d0bc9be
to
943417f
Compare
Push above:
|
README.md
Outdated
@@ -358,6 +358,11 @@ The following settings are optional and allow you to further configure your clus | |||
* `settings.kubernetes.kube-api-burst`: The burst to allow while talking with kubernetes. | |||
* `settings.kubernetes.container-log-max-size`: The maximum size of container log file before it is rotated. | |||
* `settings.kubernetes.container-log-max-files`: The maximum number of container log files that can be present for a container. | |||
* `settings.kubernetes.cpu-manager-policy`: Specifies the CPU manager policy. Possible values are `static` and `none`. Defaults to `none`. If you want to allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We usually wrap at the sentence break. This seems like kind of a wide wrapping.
943417f
to
b1ba1e4
Compare
push above improve README. |
pass cpu-manager-policy argument to kubelet
upgrade: remove the cpu manager policy state file on boot downgrade: add a migration `kubelet-cpu-manager-state` to remove the cpu manager policy state file
pass cpu-manager-reconcile-period argument to kubelet
…oncile-period adds migration for two new settings `settings.kubernetes.cpu-manager-policy` and `settings.kubernetes.cpu-manager-reconcile-period`
b1ba1e4
to
48638ed
Compare
Push above improve README according to Tom's comments |
Issue number:
Part of #1628
Description of changes:
Fluent-bit does not work with
cpuPolicyManager: static
, and it does work withcpuPolicyManager: none
. We exposekubernetes.cpu-manager-policy
as a setting, so customers can specifycpuManagerPolicy
tonone
and not be blocked by this issue while running fluent bit.kubernetes.cpu-manager-policy
as a settingkubernetes.cpu-manager-reconcile-policy
as a settingSolution about deal with the issue that changing the CPU manager policy caused issues on upgrade/downgrade
error
The reason why upgrade/downgrade of CPU manager policy migration is configured policy "static" differs from state checkpoint policy "none". Therefore, I use migration tool to delete cpu-manager-policy checkpoint for every migration process. No orchestrated workload would be running when the migrator is running, so it should be safe to delete that file before kubelet starts.
Testing done:
cpu-manager-policy
cpu-manager-reconcile-policy
cpu-manager-policy
cpu-manager-reconcile-policy
cpu-manager-policy
launching instance with the AMI which contains new setting and check if
cpu-manager-policy
has been configured expectantly. Meanwhile, configure it asnone
and test if fluent-bit pod work as expected.Step1 check if
cpu-manager-policy
has been configured expectantly:Test with
cpu-manager-policy= "none"
Test with default ("none")
Test with
cpu-manager-policy= "static"
Step2 configure it as
none
and test if fluent-bit pod work as expected:result (web server works perfectly ):
Migration test:
upgrade
Step1: Upgrade to v1.1.3
cat var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-3","checksum":611748604}
Step2: check if cpu-manager-policy works as expected after upgrading
Step3: Specify new setting
cpu-manage-policy
through control containerdowngrade
Step1: Check migration binary
Step2: Downgrade to previous verison
Step3: check if cpu-manager-policy works as expected after downgrading
Step4: Check if
cpu-manager-policy
have been removedcpu-manager-reconcile-period
launching instance with the AMI which contains new setting and check if
cpu-manager-reconcile-period
has been configured expectantly.Step1 check if
cpu-manager-reconcile-period
has been configured expectantly:Test with
cpu-manager-reconcile-period = 100s
Test with
cpu-manager-reconcile-period = 10m
Test with
cpu-manager-reconcile-period = 1h
Test with
cpu-manager-reconcile-period = 1h0m20s
Test with
cpu-manager-reconcile-period = 10m20s
Test with
cpu-manager-reconcile-period = 1h10m2s
Migration test:
upgrade
Step1: Upgrade to v1.1.3
Step2: Specify new setting
cpu-manager-reconcile-period
through control containerdowngrade
Step1: Check migration binary
Step2: Downgrade to previous verison
Step3: Check if
cpu-manager-policy
have been removedTerms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.