
Customize node configuration: add pod-max-pids to avoid PID exhaustion #2276

Closed
tdihp opened this issue Apr 18, 2021 · 14 comments
Labels
feature-request Requested Features

Comments


tdihp commented Apr 18, 2021

What happened:

Applications can allocate too many threads, exhausting the node's PID space; kubelet/containerd then hit EAGAIN when trying to create a new thread with pthread_create. We observe PLEG failures and the node going NotReady because of a single offending application.

What you expected to happen:

Add pod-pid-limits (kubelet's --pod-max-pids) as an option for custom node configuration. Configuring a smaller per-pod value should protect node readiness from any single runaway pod.
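As a rough illustration of why a per-pod cap helps (not from the original report): a per-pod PID budget can be derived from the kernel's overall PID ceiling and the node's max pods. The helper below is a hypothetical sketch; the function name, the reserve value, and the example numbers are all assumptions for illustration.

```python
def pod_max_pids(kernel_pid_max, max_pods, system_reserve=2048):
    """Hypothetical helper: split the kernel PID space evenly across pods,
    after reserving headroom for kubelet, containerd, and OS daemons."""
    usable = kernel_pid_max - system_reserve
    if usable <= 0:
        raise ValueError('system reserve exceeds kernel pid_max')
    return usable // max_pods

# e.g. with the common default pid_max of 32768 and 110 pods per node:
print(pod_max_pids(32768, 110))  # 279
```

With a cap like this in place, a thread-bursting pod hits its own limit instead of starving kubelet and the container runtime of PIDs.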

How to reproduce it (as minimally and precisely as possible):

Start a test Python pod in two steps and wait roughly 6 minutes for the node to go NotReady:

kubectl run -it --rm --restart=Never --image=python:3-slim python -- python
import time
import threading

def thread_burst(n=1000, t=600):
    """Start up to n threads, each sleeping for t seconds."""
    threads = []
    for i in range(n):
        thread = threading.Thread(target=time.sleep, args=(t,))
        try:
            thread.start()
        except Exception as e:
            # EAGAIN from pthread_create surfaces here once the PID
            # space (or a pids cgroup limit) is exhausted.
            print('got exception when starting thread: %s' % e)
            break
        threads.append(thread)
    return threads

def main():
    threads = []
    while True:
        print('bursting threads')
        threads.extend(thread_burst(n=1000))
        time.sleep(10)

main()

Environment:

  • Kubernetes version (use kubectl version): 1.19.7
@ghost ghost added the triage label Apr 18, 2021

ghost commented Apr 18, 2021

Hi tdihp, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!


tdihp commented Apr 18, 2021

related: #323


ghost commented Apr 20, 2021

Triage required from @Azure/aks-pm


ghost commented Apr 25, 2021

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Apr 25, 2021

ghost commented May 10, 2021

Issue needing attention of @Azure/aks-leads

2 similar comments

ghost commented May 26, 2021

Issue needing attention of @Azure/aks-leads


ghost commented Jun 10, 2021

Issue needing attention of @Azure/aks-leads

@justindavies
Contributor

A PR has been raised to enable this in the Azure CLI for the next release, and the documentation will also be updated. I'll keep this open until it has been completed.

@ghost ghost removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Jun 17, 2021
@justindavies justindavies added the feature-request Requested Features label Jun 17, 2021
@ghost ghost removed the triage label Jun 17, 2021

zacharias33 commented Aug 30, 2021

Hi, I am experiencing a similar issue on Kubernetes 1.19.11. The PID space of a random node gets exhausted, and my only workaround for now is to restart that node. Are there any updates on this feature?


thunter1000 commented Sep 27, 2021

Hi, this is also impacting us on Kubernetes 1.19.11. When the PID space is exhausted it takes down our calico-node pod, which impacts everything on the node. Restarting the node does seem to resolve the issue. Has this change been released yet?

@ghost ghost added the action-required label Mar 26, 2022

ghost commented Mar 31, 2022

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Mar 31, 2022

ghost commented Apr 15, 2022

Issue needing attention of @Azure/aks-leads


tdihp commented Apr 19, 2022

For users affected by this, I'd suggest identifying the culprit application causing the exhaustion.
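One quick way to find the offender is to count threads per process from /proc on the affected node. The sketch below is an illustration, not part of the original comment; it assumes a Linux node with procfs mounted and uses the Threads: field from /proc/<pid>/status.

```python
import os
import re

def thread_count(status_text):
    """Parse the Threads: field from a /proc/<pid>/status blob."""
    m = re.search(r'^Threads:\s+(\d+)', status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0

def top_thread_users(limit=5):
    """List (threads, name, pid) for the heaviest thread users, highest first."""
    counts = []
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/status' % pid) as f:
                text = f.read()
        except OSError:
            continue  # process exited while we were scanning
        name = re.search(r'^Name:\s+(\S+)', text, re.MULTILINE)
        counts.append((thread_count(text), name.group(1) if name else '?', pid))
    return sorted(counts, reverse=True)[:limit]
```

Running top_thread_users() on the node (e.g. from a privileged debug pod with the host's /proc) should put a thread-bursting workload like the repro above at the top of the list.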

@ghost ghost removed action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Apr 19, 2022

tdihp commented Apr 19, 2022

Closing, as custom node configuration is now GA with podMaxPids.
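For anyone landing here later: a minimal kubelet config sketch for AKS custom node configuration (field name per the AKS custom node configuration docs; the value 1000 is illustrative, size it for your workloads):

```json
{
  "podMaxPids": 1000
}
```

This is passed at node pool creation, e.g. with the documented --kubelet-config flag on az aks nodepool add; verify the exact flag and schema against the current AKS docs.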

@tdihp tdihp closed this as completed Apr 19, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 19, 2022