Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daemonize thar-be-settings to avoid zombie processes #1507

Merged
merged 2 commits into from
Apr 21, 2021

Conversation

tjkirch
Copy link
Contributor

@tjkirch tjkirch commented Apr 20, 2021

Issue number:

Better fix for #1380. (Replaces #1384.)

Description of changes:

    Revert "apiserver: reap exited child processes"
    
    This reverts commit 5c9eb7c0a5302405738aa47eba8e3a0c5f2116b4.
    
    The waitpid thread was intercepting non-zombie children, because it waits on
    pid -1 (any child), and does so very rapidly.  In some cases, this means things
    like `Command::new().status()` that waitpid internally are racing the waitpid
    thread, lose the race, and can't find their own child.
    
    This was seen as apiserver (rarely) failing to launch thar-be-updates when an
    updates API call came in; in fact, t-b-u would run, but apiserver couldn't
    report its status and so had to fail.
    Daemonize thar-be-settings to avoid zombie processes
    
    5c9eb7c0a530 did this with a waitpid loop, but that would (rarely) catch
    processes freshly spawned by Command::new before status() could wait for them
    itself, causing a "No child processes" error for the caller of Command.status.
    
    Instead, this adds a --daemon flag to thar-be-settings, used by apiserver, that
    forks quickly after starting, so it reparents itself to init.  Interactive use
    of thar-be-settings wouldn't pass --daemon, so you can still give settings on
    the CLI for debugging purposes.

Testing done:

Similar to #1384:

  • Confirmed that apiclient set leaves no zombies.
  • Confirmed that a tight loop of apiclient set is OK; I see thar-be-settings --daemon running with a parent PID of 1, changes happen successfully, no zombies.
  • Confirmed that manual PATCH, /tx/commit, /tx/apply was OK because that follows a slightly different code path to call thar-be-settings.
  • Confirmed that calling thar-be-settings manually, without --daemon, used stdin interactively OK. echo bla | t-b-s also works. --all works too, with no stdin. --daemon --all works too, and does obviously fork.
  • systemctl status running, pod ran OK.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

This reverts commit 5c9eb7c.

The waitpid thread was intercepting non-zombie children, because it waits on
pid -1 (any child), and does so very rapidly.  In some cases, this means things
like `Command::new().status()` that waitpid internally are racing the waitpid
thread, lose the race, and can't find their own child.

This was seen as apiserver (rarely) failing to launch thar-be-updates when an
updates API call came in; in fact, t-b-u would run, but apiserver couldn't
report its status and so had to fail.
Copy link
Contributor

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good solution.

sources/api/thar-be-settings/src/main.rs Outdated Show resolved Hide resolved
sources/api/thar-be-settings/src/main.rs Show resolved Hide resolved
5c9eb7c did this with a waitpid loop, but that would (rarely) catch
processes freshly spawned by Command::new before status() could wait for them
itself, causing a "No child processes" error for the caller of Command.status.

Instead, this adds a --daemon flag to thar-be-settings, used by apiserver, that
forks quickly after starting, so it reparents itself to init.  Interactive use
of thar-be-settings wouldn't pass --daemon, so you can still give settings on
the CLI for debugging purposes.
@tjkirch
Copy link
Contributor Author

tjkirch commented Apr 21, 2021

^ Clarify the comment, per @webern.

@tjkirch tjkirch merged commit 6cc444d into bottlerocket-os:develop Apr 21, 2021
@tjkirch tjkirch deleted the catch-kidnapper branch April 21, 2021 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants