Change/add params to support CIS scan #1295
Conversation
packages/release/release-sysctl.conf (Outdated)
kernel.panic_on_oops = 1

# Overcommit handling mode - 1: Always overcommit
vm.overcommit_memory = 1
Do you have any information on why the overcommit mode would need to be changed from the default of 0 to 1?
I don't know if this would be appropriate for all variants. In particular, this would be an unexpected change for the aws-ecs-1 variant, as the AL2-based ECS-optimized AMIs inherit AL2's default setting of 0.
Do you have any information on why the overcommit mode would need to be changed from the default of 0 to 1?
Per kubernetes/kubernetes#14935 it's done to disable the heuristics that could let a "best effort" container cause malloc to fail for a "guaranteed" container.
I don't know if this would be appropriate for all variants.
If ECS doesn't have different QoS classes then it may not apply. I tend to want consistent behavior across variants here, though. The kernel's heuristics can introduce some surprising behaviors - running containers as root or not can give different results; applications can see unexpected malloc failures even if they're within their limits.
Allowing containers to exceed their reservation and killing them when they do is often much easier for operators to reason about.
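For reference, a quick way to inspect and temporarily flip the mode on a running host (a sketch using standard sysctl/procfs tooling, nothing Bottlerocket-specific assumed):

# Show the current overcommit handling mode: 0 = heuristic, 1 = always overcommit, 2 = never overcommit
sysctl vm.overcommit_memory
cat /proc/sys/vm/overcommit_memory

# Switch to "always overcommit" for the current boot only; a sysctl.d snippet is
# needed to make the change persistent across reboots
sudo sysctl -w vm.overcommit_memory=1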
If ECS doesn't have different QoS classes then it may not apply.
ECS does not have different QoS classes; there are no "best effort" or "guaranteed" containers. Instead, ECS controls memory limits on containers via cgroups; every container has at least a memory limit or a memory reservation.
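For context, a minimal illustration (mine, not from the discussion above) of the hard limit vs. reservation distinction ECS drives through cgroups, expressed as the equivalent Docker flags:

# Hard limit: the container is OOM-killed if it exceeds 512 MiB (cgroup memory limit).
# Reservation: a soft limit of 256 MiB, reclaimed toward only under host memory pressure.
docker run --memory 512m --memory-reservation 256m amazonlinux:2 sleep 60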
@bcressey and I dug into this a bit more together. The kernel behavior for OVERCOMMIT_GUESS (0, the default) changed in Linux 5.2. The new behavior is that allocations only fail if the requested virtual memory allocation exceeds the physical RAM + swap space, rather than using more complicated heuristics to guess. This brings the behavior closer to OVERCOMMIT_ALWAYS (1), except that "wild allocations" will still fail.
Bottlerocket's kernel is 5.4, meaning that it includes this change. With that said, my opinion is that we should leave the kernel configured with the default behavior (OVERCOMMIT_GUESS) in all variants; the QoS issues described in kubernetes/kubernetes#14935 shouldn't happen in Bottlerocket.
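A quick way to see the relevant numbers on a node (standard /proc fields; kernel version and values will vary):

# RAM + swap is roughly the ceiling a single allocation request is checked against
# under OVERCOMMIT_GUESS on 5.2+; CommitLimit/Committed_AS only matter for mode 2.
uname -r
cat /proc/sys/vm/overcommit_memory
grep -E 'MemTotal|SwapTotal|CommitLimit|Committed_AS' /proc/meminfo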
@samuelkarp found that my understanding is out-of-date, and the kernel documentation is no longer accurate since torvalds/linux@8c7829b. There's no special treatment of root or processes with CAP_SYS_ADMIN since kernel 5.2+.
The difference is now:
- overcommit_memory = 0 only blocks requests for more memory than is physically present in RAM + swap
- overcommit_memory = 1 allows all requests
I doubt the Kubernetes check is still relevant for newer kernels. It could be relaxed upstream to allow either 0 or 1, but the application code would need to account for the running kernel version, which is not recommended as a general practice.
The two paths I see are to patch our kubelets so they expect 0 instead of 1, or ship a sysctl snippet for kubelet so that we can apply the expected value and take advantage of the other checks for protected settings. I prefer the second approach.
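To make the second path concrete, a rough sketch of what such a snippet could contain, assuming the values kubelet verifies when --protect-kernel-defaults is enabled; the file path is illustrative, not the layout Bottlerocket actually uses:

# /etc/sysctl.d/90-kubelet.conf (illustrative path)
# Values kubelet's --protect-kernel-defaults check expects
vm.overcommit_memory = 1
vm.panic_on_oom = 0
kernel.panic = 10
kernel.panic_on_oops = 1
kernel.keys.root_maxkeys = 1000000
kernel.keys.root_maxbytes = 25000000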
@felipeac - can you split this into a few commits based on discussion? The second one would add
This should be fixed properly with aquasecurity/kube-bench#808 and bottlerocket-os/bottlerocket#1295. But as these are known failures, this should not be a failure in our CI. Work around this by patching the kube-bench config for Bottlerocket config paths, and allowing for a known number of failures.
Hi @felipeac, just checking in to see if you need any help with the split.
@felipeac - Thanks again for your contribution here! You've had the bad luck of touching some files - the various Kubernetes specs - that have seen more churn over the last few weeks as we've added new features. If it helps, I'm happy to do the rebase to solve the merge conflicts and make that last tweak. Let me know if you have any objections, otherwise I'll plan to pick this up next week.
@bcressey - Sorry about the delay. No worries, I'll push those changes tomorrow. Thanks!
@felipeac - thanks! This looks good to me. My lingering concern is around ...; I need to do some more testing to see if this is going to be a problem. Worst case, this would end up blocked on the backport of the newer BPF helpers to the 5.4 kernel, which is currently in progress.
Ah, that issue is a kernel "warning" so it'd be controlled by ...
Summary
Issue number: #1294
This PR updates configs to achieve good results on a CIS scan using kube-bench.
These were my results:
After updating the kubelet location in kube-bench:
Issue opened on kube-bench: aquasecurity/kube-bench#808
After building a new Bottlerocket AMI with the changes requested in this PR
Testing done:
I've only tested it by deploying a new image and running the CIS scan, but I'm more than happy to run any required tests.
Important: I've only validated those changes in an EKS cluster running Kubernetes 1.18. Eventually, if this PR gets approved, I'll apply and validate the same change for different versions.
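For anyone reproducing the scan, a sketch of the kind of invocation involved (not necessarily the exact command used here; the patched config directory name is just an example):

# Run only the worker-node checks, pointing kube-bench at a config directory
# patched with Bottlerocket's kubelet paths
kube-bench run --targets node --config-dir ./cfg-bottlerocket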
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.
PS: Thanks to @gregdek for the support