al2023: Use base_runtime_spec to set default rlimits #1794

Merged: 1 commit into main from al2023-rlimits on May 31, 2024

Conversation

@cartermckinnon (Member) commented on May 10, 2024:

Issue #, if available:

Fixes #1746

Description of changes:

This uses the base_runtime_spec option of containerd to set RLIMIT_NOFILE defaults that are much lower than 2^63-1, the current default on AL2023. This approach lets us define the limits for containers without changing containerd's own limits.
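For context, containerd's CRI plugin exposes base_runtime_spec as a per-runtime option; below is a minimal sketch of the relevant config.toml fragment, assuming the version 2 config schema and a hypothetical /etc/containerd/base-runtime-spec.json path (the path and surrounding template the AMI actually uses may differ):

```toml
# Point the runc runtime at a base OCI spec file; containerd then uses that
# file as the starting spec for every container instead of its built-in default.
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  base_runtime_spec = "/etc/containerd/base-runtime-spec.json"
```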

I've gone with 65536:1048576 (soft:hard), with the understanding that programs using select(2) will need the user to define lower limits to avoid undefined behavior. In practice, it's more common for our users to run programs that do not raise their limits appropriately, such as Envoy, so I think this is an acceptable default today. Bottlerocket uses the same values, so the consistency is beneficial: https://github.com/bottlerocket-os/bottlerocket/blob/fcf71a47c0ff005327ecde870d8a70877fc89196/sources/models/shared-defaults/oci-defaults-containerd-cri-resource-limits.toml

These defaults can be overridden by a new NodeConfig field, Spec.Containerd.BaseRuntimeSpec. This field behaves the same as Spec.Kubelet.Config -- it's an unstructured JSON/YAML document that will be merged on top of the default, allowing partial overrides.
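For illustration, a partial override could look something like the NodeConfig sketch below; this assumes the field surfaces as spec.containerd.baseRuntimeSpec as described above, and the rlimit values shown are placeholders, not recommendations:

```yaml
# Example NodeConfig overriding only process.rlimits; everything else in the
# base OCI spec falls through to the AMI's built-in default after the merge.
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  containerd:
    baseRuntimeSpec:
      process:
        rlimits:
          - type: RLIMIT_NOFILE
            soft: 1024     # placeholder value
            hard: 65536    # placeholder value
```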

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@cartermckinnon (Member Author):

/ci
+workflow:os_distros al2023

Contributor:

@cartermckinnon roger that! I've dispatched a workflow. 👍

Comment on lines 6 to 8 of the new base runtime spec:
"type": "RLIMIT_NOFILE",
"soft": 65536,
"hard": 1048576
@ndbaker1 (Member) commented on May 10, 2024:

Appreciate making this configurable, but do we have to set a default?
Asking based on the time we updated limits in #1535 on AL2, but I see the benefit of making this a breaking AL2023 change to set the default and follow Bottlerocket's example.

@cartermckinnon (Member Author) replied on May 10, 2024:

Yeah, we have to set a lower default in order to fix #1746. We don't want folks to have to pass baseRuntimeSpec to run stuff as common as Redis.

The issues caused by #1535 were mostly related to the 1024 effective soft limit (which is the "correct" config but is nevertheless incompatible with a lot of software written in the container age). That's why we use a much higher soft limit here, as well as a higher hard limit. I think this is fairly unlikely to be disruptive, and there's an easy mitigation if it is (setting baseRuntimeSpec).

Contributor:

@cartermckinnon the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
| --- | --- | --- |
| 1.23 / al2023 | success ✅ | failure ❌ |
| 1.24 / al2023 | success ✅ | failure ❌ |
| 1.25 / al2023 | success ✅ | failure ❌ |
| 1.26 / al2023 | success ✅ | failure ❌ |
| 1.27 / al2023 | success ✅ | failure ❌ |
| 1.28 / al2023 | success ✅ | failure ❌ |
| 1.29 / al2023 | success ✅ | failure ❌ |
| 1.30 / al2023 | failure ❌ | skipped ⏭️ |

@cartermckinnon (Member Author) commented on May 10, 2024:

Argh, we have to pass a complete OCI spec instead of just process.rlimits. I'm going to merge the NodeConfig's baseRuntimeSpec with a fully-formed default, because if someone wants to override the rlimits, they shouldn't have to pass all the other junk.

@ps-jay commented on May 23, 2024:

Thanks for your work on this @cartermckinnon

@cartermckinnon force-pushed the al2023-rlimits branch 7 times, most recently from c694e6d to cfcccdf on May 29, 2024 at 23:29
@cartermckinnon (Member Author):

/ci
+workflow:os_distros al2023

Contributor:

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor:

@cartermckinnon the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
| --- | --- | --- |
| 1.23 / al2023 | success ✅ | failure ❌ |
| 1.24 / al2023 | success ✅ | failure ❌ |
| 1.25 / al2023 | success ✅ | failure ❌ |
| 1.26 / al2023 | success ✅ | failure ❌ |
| 1.27 / al2023 | success ✅ | failure ❌ |
| 1.28 / al2023 | success ✅ | failure ❌ |
| 1.29 / al2023 | success ✅ | failure ❌ |
| 1.30 / al2023 | success ✅ | failure ❌ |

@dims (Member) commented on May 30, 2024:

@cartermckinnon are these the same limits used in Bottlerocket?

@cartermckinnon (Member Author):

@dims yep!

@cartermckinnon (Member Author):

Need to fix something up with the capabilities block in the latest rev; this conformance test is failing:

[sig-node] Security Context When creating a pod with privileged should run the container as unprivileged when false [LinuxOnly] [NodeConformance] [Conformance]

@cartermckinnon (Member Author):

/ci
+workflow:os_distros al2023

Contributor:

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor:

@cartermckinnon the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
| --- | --- | --- |
| 1.23 / al2023 | success ✅ | failure ❌ |
| 1.24 / al2023 | success ✅ | failure ❌ |
| 1.25 / al2023 | success ✅ | failure ❌ |
| 1.26 / al2023 | success ✅ | failure ❌ |
| 1.27 / al2023 | success ✅ | failure ❌ |
| 1.28 / al2023 | success ✅ | failure ❌ |
| 1.29 / al2023 | success ✅ | failure ❌ |
| 1.30 / al2023 | success ✅ | failure ❌ |

@cartermckinnon (Member Author):

/ci
+workflow:os_distros al2023

Contributor:

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor:

@cartermckinnon the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
| --- | --- | --- |
| 1.23 / al2023 | success ✅ | success ✅ |
| 1.24 / al2023 | success ✅ | success ✅ |
| 1.25 / al2023 | success ✅ | success ✅ |
| 1.26 / al2023 | success ✅ | success ✅ |
| 1.27 / al2023 | success ✅ | success ✅ |
| 1.28 / al2023 | success ✅ | success ✅ |
| 1.29 / al2023 | success ✅ | success ✅ |
| 1.30 / al2023 | success ✅ | success ✅ |

@cartermckinnon (Member Author):

Alright, there we go. Problem was CAP_NET_ADMIN.

@cartermckinnon force-pushed the al2023-rlimits branch 4 times, most recently from e4f8dc6 to 35ddf3c on May 31, 2024 at 20:14
Comment on the new base runtime spec file (@@ -0,0 +1,174 @@)

Member:

Out of curiosity, how has this been generated? Is there any reference I can read to understand more, in case we need to debug something here?

@cartermckinnon (Member Author):

I derived this from a few sources:

  1. The default base OCI spec given by ctr oci spec.
  2. An actual spec for a running container, grabbed with ctr container info (see the sketch below).
  3. The base spec used by Bottlerocket for this same purpose (providing RLIMIT overrides): https://github.com/bottlerocket-os/bottlerocket/blob/b3e476f382dbaf6bb6bea20d99d3bf143662cef6/packages/containerd/containerd-cri-base-json#L4

The spec itself is thoroughly documented here: https://github.com/opencontainers/runtime-spec
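For reference, a rough sketch of how those two ctr commands might be used on a node to inspect specs; the k8s.io namespace and output path are assumptions, so adjust for your environment:

```bash
# Dump containerd's built-in default OCI spec for comparison or editing.
sudo ctr oci spec > /tmp/default-oci-spec.json

# List containers under the CRI namespace, then print the full info
# (including the OCI spec) for one of them; CONTAINER_ID is a placeholder.
sudo ctr --namespace k8s.io containers list
sudo ctr --namespace k8s.io containers info CONTAINER_ID
```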

@ndbaker1 (Member) left a comment:

LGTM, thanks for tackling this 👏

@cartermckinnon merged commit b15c2b7 into main on May 31, 2024
10 checks passed
@cartermckinnon deleted the al2023-rlimits branch on May 31, 2024 at 23:49
@bjhaid commented on Jun 4, 2024:

When will an AMI containing this be released?

@bjhaid commented on Jun 6, 2024:

There was a pre-release 3 hours ago, https://github.com/awslabs/amazon-eks-ami/releases/tag/v20240605, that does not contain this PR. @ndbaker1 @Issacwww what needs to be done to get these changes into the next AMI release?

@cartermckinnon (Member Author):

This will be included in an AMI next week.

@msvechla commented:

Is there an update on when this will be released? We are waiting for this hotfix.

@bjhaid commented on Jun 13, 2024:

@cartermckinnon the week is almost over 😢. We are actively blocked from completing our upgrade, and this AMI will unblock us.

atmosx pushed a commit to gathertown/amazon-eks-ami that referenced this pull request on Jun 18, 2024
@cartermckinnon (Member Author):

This is going out today in v20240615: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20240615


Successfully merging this pull request may close the following issue: AL2023: OOMKills caused by very high file descriptor limits (#1746).