Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use sd-notify to signal readyness #35

Merged
merged 2 commits into from
Oct 18, 2022
Merged

Conversation

flokli
Copy link
Contributor

@flokli flokli commented Oct 7, 2022

Provide a way to signal nsncd has successfully started up, and is ready to accept connections.

A more reliable way would be to only signal this once at least one worker has started up, but this is good enough (tm) for now.

This can be used in a systemd.service file with Type=notify.

@leifwalsh
Copy link
Collaborator

@flokli can you tell us a bit more about why you want to add this? What can you do with this change that you can't without?

I know this isn't the change you're proposing, but just FYI if you're doing systemd things, we tried using socket activation once and hit a deadlock: systemd only activates one service at a time, so if the activation of another service tries to do a NSS lookup, that will make systemd enqueue the activation of nsncd, but won't actually try to start it until the other service completes its own startup (which then won't proceed since it's waiting for nsncd to respond to the lookup). #10

@leifwalsh
Copy link
Collaborator

Actually, from #9:

I think it's also best to avoid readiness notification, because I don't totally understand whether the time between starting nsncd and getting the readiness notification counts as a running job and what that means for the job queue. We should just make nsncd a Type=simple service. This might cause race conditions with services that require nsncd (i.e., services with a foreign libc) that try to say Requires=nsncd.service, because they don't know when the NSCD socket has actually been created. If that causes a problem in practice for anyone, we should come back and add readiness notification and test it a bit more.

So...it sounds like @geofft considered that the change you're proposing might be good, but also maybe could cause similar job queue problems? I think we just never tested this out and if you want it it might be good but we should test carefully.

I'll wait to hear more about your use case first though.

@leifwalsh leifwalsh requested a review from geofft October 10, 2022 01:26
@leifwalsh leifwalsh added the enhancement New feature or request label Oct 10, 2022
@flokli
Copy link
Contributor Author

flokli commented Oct 10, 2022

So...it sounds like @geofft considered that the change you're proposing might be good, but also maybe could cause similar job queue problems? I think we just never tested this out and if you want it it might be good but we should test carefully.

I'll wait to hear more about your use case first though.

I plan using nsncd on NixOS, as a drop-in replacement for glibc-nscd (pending #37). In NixOS, as there's no common /usr/lib, glibc code paths in client applications don't know how to load any "non-glibc" nss modules listed in /etc/nsswitch.conf, so we absolutely need to do the lookups via the nscd socket - otherwise lookups provided by external NSS modules fail.

We put nscd.service with a RequiredBy=nss-lookup.target nss-user-lookup.target (and even think about pulling it in in early boot), so systemd waits until nscd is ready before starting services that have After= set to one of these targets.

glibc-nscd does fork when it's ready (Type=forking), but nsncd is currently using Type=simple, so systemd has no idea when the service is ready, starts it as soon as it started the nsncd process, while that one might not have created the unix socket, to accept connections. So downstream clients fail to connect.

This change lets nsncd signal readyness via https://docs.rs/sd-notify/latest/sd_notify/ once the nscd unix socket has been created, which is more precise.

I don't see how this could cause any deadlocking of service startup.

The issue @geofft mentioned was probably caused by the socket activation, and systemd only activating one service at a time (?) 1

Footnotes

  1. even though that smells like a bug that should be investigated, but let's leave it out of scope for here.

@flokli
Copy link
Contributor Author

flokli commented Oct 13, 2022

With #36 merged, I rebased this.

It now only adds the sd-notify to Cargo.lock, and all the upgrade noise is gone.

Copy link
Collaborator

@geofft geofft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I'm convinced that this cannot deadlock: the problem is that when one job asks systemd to start up the NSCD, systemd won't do it until the job is done. Here we're not asking systemd to do anything. We're informing systemd that we're ready, but we're already listening and handling requests on our own. systemd is asking us if we've started up so it can enqueue downstream jobs. Even if the time between process start and readiness notification counts as a job to systemd (which I'm still unclear about) and prevents other jobs from running, there aren't other jobs, i.e., nsncd itself doesn't need to activate anything else.

(And yes, I believe this is a systemd bug or at least design flaw, and I was surprised by the response to that freedesktop bug report linked in #9. But if systemd is designed around a concept of jobs that are serially executed, it seems like a pain to fix, and it seems like the maintainers are saying "You're not supposed to use systemd this way.")

@@ -93,6 +93,8 @@ fn main() -> Result<()> {
std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o777))?;
spawn_acceptor(&mut wg, &logger, listener, tx, handoff_timeout);

let _ = sd_notify::notify(true, &[NotifyState::Ready]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was curious about this but a) that's the example in the Rust crate and b) the C docs specifically say, "...it is generally recommended to ignore the return value of this call."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://docs.rs/sd-notify/latest/src/sd_notify/lib.rs.html#105

You don't want to fail if informing about readyness failed, e.g. if the service is /not/ run via systemd and/or the communication with the unix socket not possible.

See also https://gist.github.com/grawity/6e5980981dccf66f554bbebb8cd169fc.

flokli and others added 2 commits October 18, 2022 11:28
Provide a way to signal nsncd has successfully started up, and is ready
to accept connections.

A more reliable way would be to only signal this once at least one
worker has started up, but this is good enough (tm) for now.

This can be used in a systemd.service file with Type=notify.
@codecov-commenter
Copy link

Codecov Report

Base: 30.18% // Head: 30.04% // Decreases project coverage by -0.14% ⚠️

Coverage data is based on head (9fa26c5) compared to base (1d0bf2b).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
- Coverage   30.18%   30.04%   -0.15%     
==========================================
  Files           5        5              
  Lines         212      213       +1     
==========================================
  Hits           64       64              
- Misses        148      149       +1     
Impacted Files Coverage Δ
src/main.rs 0.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@geofft
Copy link
Collaborator

geofft commented Oct 18, 2022

I'm gonna merge this by itself and do an internal release to make sure nothing exciting happens. (The changes are de minimis so we don't need to bother with a CLA for this particular PR.)

@geofft geofft merged commit 706d416 into twosigma:main Oct 18, 2022
@flokli flokli deleted the sd-notify branch November 5, 2022 08:29
@leifwalsh leifwalsh mentioned this pull request Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants