DynamicUser is broken #50273
Comments
Also related: #36297

Also related: #50042 (comment) It seems like

When I run
It seems to try to load all the
Anything but the right place. So loading custom

Okay so it actually does seem to load the libraries, after suggestion from @dezgeg
Then why is

cc @Ekleog

Also cc @peterhoeg who originally fixed this

You can use https://github.com/systemd/systemd/blob/master/src/nss-systemd/nss-systemd.c#L190

In

The call is being made. I more and more suspect this being a systemd issue

The response looks as expected.

Indeed. So why is the systemd unit then failing on

Ah I also get this in the logs:
Here is the full trace:

This error seems to be coming from which leads us to:

After hacking around with @Mic92 this turns out to be a cache invalidation problem. Since we only use This is already disabled when So the simplest option seems to be to also use this config when Note (this should probably be documented somewhere) we can't fully disable the cache, as then

Awesome detective work!
Systemd provides an option for allocating DynamicUsers, which we want to use in NixOS to harden service configuration. However, we discovered that the user wasn't allocated properly for services. After some digging this turned out to be, of course, a cache inconsistency problem.

When a DynamicUser is created, systemd first checks whether the requested user already exists statically. If it does, it bails out. If it doesn't, systemd continues with allocating the user. However, because of that check, nscd stores the fact that the user does not exist in its negative cache. When the service then tries to look up which user is associated with its uid (by calling whoami, for example), the lookup should reach libnss_systemd.so. Instead it is answered from the cache, which reports that the user doesn't exist, so the caller is told that no user is associated with the uid. This continues for the duration of the negative cache entry.

If the service doesn't immediately look up its username, the bug is not triggered, as the cache entry will have expired around that time. However, if the service is quick enough, it can end up in a situation where the user is incorrectly reported not to exist.

Preferably, we would not be using nscd at all. But we need it because glibc loads the nss modules named in /etc/nsswitch.conf by searching the global LD_LIBRARY_PATH. Because LD_LIBRARY_PATH is not set globally (as that would lead to impurities and ABI issues), glibc will fail to find any nss modules. Instead, as a hack, we start nscd with LD_LIBRARY_PATH set for only that service. Glibc forwards all nss lookups to nscd, which then respects the LD_LIBRARY_PATH and only reads from locations specified in the NixOS config. This way we can load nss modules in a pure fashion. However, I think by accident, we just copied over the default settings of nscd, which actually caches user and group lookups.
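The race described in this commit message can be illustrated with a toy model. This is only a sketch: `CachingResolver` and `start_dynamic_user_service` are hypothetical names, the dict stands in for the nss modules, time is passed in explicitly, and a single key is used for both the existence check and the later lookup, which is a simplification of what systemd and nscd actually do.

```python
class CachingResolver:
    """Toy stand-in for nscd: caches hits and misses for a TTL in seconds."""

    def __init__(self, backend, positive_ttl, negative_ttl):
        self.backend = backend            # dict: key -> uid, stands in for the nss modules
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self.cache = {}                   # key -> (result, expires_at)

    def lookup(self, key, now):
        entry = self.cache.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]               # served from cache, backend is not consulted
        result = self.backend.get(key)
        ttl = self.positive_ttl if result is not None else self.negative_ttl
        self.cache[key] = (result, now + ttl)
        return result


def start_dynamic_user_service(resolver, users, key, uid, now):
    # 1. Check that the user does not already exist (this records a negative entry).
    if resolver.lookup(key, now) is not None:
        raise RuntimeError("user already exists statically")
    # 2. Allocate the dynamic user.
    users[key] = uid
    # 3. The service immediately looks itself up, like whoami would.
    return resolver.lookup(key, now)


if __name__ == "__main__":
    users = {}
    cached = CachingResolver(users, positive_ttl=600, negative_ttl=20)
    print("with negative cache:", start_dynamic_user_service(cached, users, "svc", 61000, now=0.0))

    users = {}
    uncached = CachingResolver(users, positive_ttl=0, negative_ttl=0)
    print("with TTL 0:", start_dynamic_user_service(uncached, users, "svc", 61000, now=0.0))
```

With a non-zero negative TTL the immediate lookup is answered from the stale negative entry; with both TTLs at 0 every lookup reaches the backend.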
We already disable this when sssd is enabled, as it interferes with the correct working of libnss_sss.so, which already does its own caching of LDAP requests. (See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/usingnscd-sssd)

Because nscd caching is now also interfering with libnss_systemd.so, and probably with other nss modules as well, let's just pre-emptively disable caching for all options related to users and groups for now, but keep it for hosts and services lookups.

Note that we cannot just put this in /etc/nscd.conf:

enable-cache passwd no

as this will actually cause glibc to _not_ forward the call to nscd at all, and thus never reach the nss modules. Instead we set the negative and positive cache TTLs to 0 seconds as a workaround. This way, glibc will always forward requests to nscd, but results will never be cached.

Fixes NixOS#50273
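As a rough sketch of what the workaround amounts to, the snippet below renders an nscd.conf fragment that keeps the passwd and group caches enabled but sets their TTLs to 0. The directive names (`enable-cache`, `positive-time-to-live`, `negative-time-to-live`) are standard nscd.conf keywords; the helper `render_nscd_conf` and the TTL values for hosts and services are illustrative assumptions, not the values used by the NixOS module.

```python
def render_nscd_conf(uncached=("passwd", "group"), cached=None):
    """Render an nscd.conf fragment (hypothetical helper, for illustration only)."""
    if cached is None:
        # Illustrative TTLs (positive, negative) in seconds; not taken from NixOS.
        cached = {"hosts": (600, 20), "services": (600, 20)}

    lines = []
    for db in uncached:
        # Keep the cache "enabled" so glibc still forwards lookups to nscd,
        # but make every entry expire immediately.
        lines.append(f"enable-cache {db} yes")
        lines.append(f"positive-time-to-live {db} 0")
        lines.append(f"negative-time-to-live {db} 0")
    for db, (positive, negative) in cached.items():
        lines.append(f"enable-cache {db} yes")
        lines.append(f"positive-time-to-live {db} {positive}")
        lines.append(f"negative-time-to-live {db} {negative}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_nscd_conf(), end="")
```

The point of the design is that `enable-cache ... yes` combined with a TTL of 0 keeps nscd in the lookup path while making sure nothing is ever served stale.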
Since upstream systemd merged their nscd cache invalidation fix and released it in v240, and we're on v242 now, should some caching be turned on again? Seems to me like it'd be nice, but I can't imagine getent lookups are hugely performance-critical. 🤷♂️ Either way though, perhaps you folks would be good reviewers for my pull request #64268, which gets rid of various years-old cruft from the nscd module. If non-zero TTLs are still worthwhile, I'd hope my patches would be merged first.
Issue description
Currently, DynamicUser seems to be broken. I thought this was a systemd issue, but they claim it's because we misconfigured nss-systemd. systemd/systemd#10740
However, I thought we configured this correctly since 18.09 since this commit 72a64ea
and
seems to indeed confirm this. Yet DynamicUser is still currently broken in 18.09.
I have a feeling nscd isn't correctly loading nss_modules that aren't shipped with glibc.
This would mean mymachines and myhostname aren't loaded either if that's the case
Steps to reproduce
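One way to observe the symptom from inside a service, as a sketch rather than the reporter's own reproduction: ask NSS for the name belonging to the current uid, which is essentially what whoami does. Python's standard `pwd` module goes through the same glibc lookup path; `username_for_uid` is a hypothetical helper name.

```python
import os
import pwd


def username_for_uid(uid):
    """Return the user name NSS reports for uid, or None if there is no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # This is the failure mode discussed in this issue: the uid exists,
        # but the lookup says no user is associated with it.
        return None


if __name__ == "__main__":
    uid = os.getuid()
    name = username_for_uid(uid)
    if name is None:
        print(f"no user found for uid {uid}")
    else:
        print(f"uid {uid} is {name}")
```

Run as the ExecStart of a unit with DynamicUser enabled, a `None` result right after startup would match the behaviour described here.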
Technical details
Please run nix-shell -p nix-info --run "nix-info -m" and paste the results.