DynamicUser is broken #50273

Closed
arianvp opened this issue Nov 12, 2018 · 17 comments · Fixed by #50316

Comments

@arianvp
Member

arianvp commented Nov 12, 2018

Issue description

DynamicUser currently seems to be broken. I thought this was a systemd
issue, but upstream says it's because we misconfigured nss-systemd:
systemd/systemd#10740

However, I thought we had configured this correctly as of 18.09, in commit
72a64ea

and

[arian@t430s:~/Projects/rfcs]$ cat /etc/nsswitch.conf 
passwd:    files mymachines systemd
group:     files mymachines systemd
shadow:    files

hosts:     files mymachines dns myhostname
networks:  files

ethers:    files
services:  files
protocols: files
rpc:       files

seems to indeed confirm this. Yet DynamicUser is still currently broken in 18.09.
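
A quick way to check this mechanically is sketched below. It is a minimal parser, not a full implementation of the nsswitch.conf grammar, and the helper names `parse_nsswitch` and `missing_systemd` are made up for illustration.

```python
import re


def parse_nsswitch(text):
    """Map each NSS database to its ordered list of sources."""
    databases = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        name, sources = line.split(":", 1)
        # Drop bracketed action items such as [NOTFOUND=return]
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        databases[name.strip()] = sources.split()
    return databases


def missing_systemd(databases):
    """Return the user/group databases that do not list the systemd source."""
    return [db for db in ("passwd", "group")
            if "systemd" not in databases.get(db, [])]


# Excerpt of the file shown above
SAMPLE = """\
passwd:    files mymachines systemd
group:     files mymachines systemd
shadow:    files
hosts:     files mymachines dns myhostname
"""

print(missing_systemd(parse_nsswitch(SAMPLE)))
```

On a real machine the text would come from reading /etc/nsswitch.conf instead of `SAMPLE`. An empty list only says the file mentions the module, not that the module can actually be loaded.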

I have a feeling nscd isn't correctly loading NSS modules that aren't shipped with glibc.
If that's the case, mymachines and myhostname aren't being loaded either.

Steps to reproduce

$ sudo systemd-run --pty --property=DynamicUser=yes --property=User=iamatest whoami
Running as unit: run-u347.service
Press ^] three times within 1s to disconnect TTY.
/run/current-system/sw/bin/whoami: cannot find name for user ID 63915
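
For context, whoami essentially asks NSS for the passwd entry of its effective uid and prints this error when none comes back. A rough Python equivalent, handy for testing from inside a unit, might look like this (the function names are illustrative only):

```python
import os
import pwd


def name_for_uid(uid):
    """Return the user name NSS reports for uid, or None if no entry exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # getpwuid found no passwd entry through any configured NSS source
        return None


def whoami():
    """Mimic whoami: resolve the effective uid, or describe the failure."""
    uid = os.geteuid()
    name = name_for_uid(uid)
    if name is None:
        return f"cannot find name for user ID {uid}"
    return name


print(whoami())
```

Running this under the same `systemd-run` invocation should fail in the same way as long as the lookup for the dynamic uid fails.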

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the
results.

@arianvp
Member Author

arianvp commented Nov 12, 2018

Related #50011 #49228

@arianvp
Member Author

arianvp commented Nov 12, 2018

Also related: #36297

@arianvp
Member Author

arianvp commented Nov 12, 2018

Also related: #50042 (comment)

It seems like nscd isn't properly loading all nss_modules

@arianvp
Member Author

arianvp commented Nov 12, 2018

When I run

sudo systemctl stop nscd
sudo strace nscd -d

It seems to try to load all the NSS modules from a very strange location, confirming my suspicion that nscd isn't loading any of the NSS modules outside of glibc:

[arian@t430s:~/Projects/nixpkgs]$ sudo strace nscd -d 2>&1 | grep nss
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_mymachines.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_mymachines.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_myhostname.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_myhostname.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)


Anything but the right place. So loading custom nss modules seems to be broken.
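
Output like the above gets hard to read quickly. The sketch below condenses it into one line per module; it assumes the `openat(...) = N` format shown in the trace, and the names `OPENAT` and `summarize_nss_loads` are invented for this example.

```python
import re

# Matches strace lines that try to open a libnss_<module>.so.2 file
OPENAT = re.compile(
    r'openat\(AT_FDCWD, "(?P<path>[^"]*/libnss_(?P<module>\w+)\.so\.2)".*\) = (?P<ret>-?\d+)'
)


def summarize_nss_loads(strace_text):
    """Map each NSS module to the first path it opened from, or None if all attempts failed."""
    result = {}
    for line in strace_text.splitlines():
        match = OPENAT.search(line)
        if not match:
            continue
        module = match.group("module")
        result.setdefault(module, None)
        # A non-negative return value is a file descriptor, i.e. success
        if int(match.group("ret")) >= 0 and result[module] is None:
            result[module] = match.group("path")
    return result


# Four lines copied from the trace above
SAMPLE = """\
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/run/opengl-driver/lib/libnss_mymachines.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_mymachines.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
"""

for module, path in summarize_nss_loads(SAMPLE).items():
    print(module, "->", path or "NOT FOUND")
```

A module that maps to `None` was searched for but never opened successfully in that particular run.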

@arianvp arianvp changed the title DynamicUser support is broken Loading NSS Modules that are not in glibc is broken Nov 12, 2018
@arianvp arianvp changed the title Loading NSS Modules that are not in glibc is broken DynamicUser is broken because Loading NSS Modules that are not in glibc is broken Nov 12, 2018
@arianvp
Member Author

arianvp commented Nov 12, 2018

Okay, so after a suggestion from @dezgeg, it actually does seem to load the libraries:

[arian@t430s:~]$ sudo strace -E LD_LIBRARY_PATH=/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib  nscd -d 2>&1 | grep nss
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib/libnss_mymachines.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "/nix/store/pb1vvbzzm0q9axbcbv9989mbz10f881z-systemd-239/lib/libnss_myhostname.so.2", O_RDONLY|O_CLOEXEC) = 4

Then why is DynamicUser not working?

@arianvp arianvp changed the title DynamicUser is broken because Loading NSS Modules that are not in glibc is broken DynamicUser is broken Nov 12, 2018
@arianvp
Member Author

arianvp commented Nov 12, 2018

cc @Ekleog

@arianvp
Member Author

arianvp commented Nov 12, 2018

Also cc @peterhoeg who originally fixed this

@Mic92
Member

Mic92 commented Nov 13, 2018

You can use busctl and watch for dbus calls named LookupDynamicUserByName to the org.freedesktop.systemd1.Manager service

https://github.com/systemd/systemd/blob/master/src/nss-systemd/nss-systemd.c#L190

@Mic92
Member

Mic92 commented Nov 13, 2018

In strace you should see a unix socket connection opened to dbus, followed by sendmsg system calls on the resulting file descriptor.

@arianvp
Member Author

arianvp commented Nov 13, 2018

The call is being made. I increasingly suspect this is a systemd issue.

‣ Type=method_call  Endian=l  Flags=0  Version=1  Priority=0 Cookie=2
  Sender=:1.49  Destination=org.freedesktop.systemd1  Path=/org/freedesktop/systemd1  Interface=org.freedesktop.systemd1.Manager  Member=LookupDynamicUserByName
  UniqueName=:1.49
  MESSAGE "s" {
          STRING "iamatest";
  };
‣ Type=method_call  Endian=l  Flags=0  Version=1  Priority=0 Cookie=2
  Sender=:1.51  Destination=org.freedesktop.systemd1  Path=/org/freedesktop/systemd1  Interface=org.freedesktop.systemd1.Manager  Member=LookupDynamicUserByUID
  UniqueName=:1.51
  MESSAGE "u" {
          UINT32 63915;
  };
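
The same two lookups can also be issued by hand rather than waiting for nss-systemd to make them. The sketch below only assembles `busctl call` command lines from the destination, path, interface, member and signatures visible in the trace above; the helper names are hypothetical, and whether the call succeeds depends on the running systemd.

```python
import subprocess

# Values taken from the captured messages above
SERVICE = "org.freedesktop.systemd1"
OBJECT = "/org/freedesktop/systemd1"
INTERFACE = "org.freedesktop.systemd1.Manager"


def lookup_by_uid_argv(uid):
    """busctl command line for LookupDynamicUserByUID (signature "u")."""
    return ["busctl", "call", SERVICE, OBJECT, INTERFACE,
            "LookupDynamicUserByUID", "u", str(uid)]


def lookup_by_name_argv(name):
    """busctl command line for LookupDynamicUserByName (signature "s")."""
    return ["busctl", "call", SERVICE, OBJECT, INTERFACE,
            "LookupDynamicUserByName", "s", name]


def run(argv):
    """Run the command and return (exit code, combined output)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


print(" ".join(lookup_by_uid_argv(63915)))
```

Calling `run(lookup_by_uid_argv(63915))` while the transient unit is alive asks systemd directly, bypassing nscd and the NSS module, which helps narrow down where a wrong answer comes from.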

@Mic92
Member

Mic92 commented Nov 13, 2018

The response looks as expected.

@arianvp
Member Author

arianvp commented Nov 13, 2018

Indeed. So why is the systemd unit then failing on whoami?

@arianvp
Member Author

arianvp commented Nov 13, 2018

Ah I also get this in the logs:

‣ Type=error  Endian=l  Flags=1  Version=1  Priority=0 Cookie=2232  ReplyCookie=2
  Sender=:1.0  Destination=:1.167
  ErrorName=org.freedesktop.systemd1.NoSuchDynamicUser  ErrorMessage="Dynamic user iamatest does not exist."
  UniqueName=:1.0
  MESSAGE "s" {
          STRING "Dynamic user iamatest does not exist.";
  };
‣ Type=error  Endian=l  Flags=1  Version=1  Priority=0 Cookie=2233  ReplyCookie=2
  Sender=:1.0  Destination=:1.168
  ErrorName=org.freedesktop.systemd1.NoSuchDynamicUser  ErrorMessage="Dynamic user ID 63915 does not exist."
  UniqueName=:1.0
  MESSAGE "s" {
          STRING "Dynamic user ID 63915 does not exist.";
  };

Here is the full trace:
https://gist.github.com/arianvp/afa176048d1ad5ffd0698ddd5af729d2

@arianvp
Member Author

arianvp commented Nov 13, 2018

After hacking around with @Mic92, this turns out to be a cache invalidation problem.
See systemd/systemd#10740 (comment)

We only use nscd to trick libc into loading NSS modules from an absolute path, not for its caching behaviour, so the recommended course of action seems to be to disable all caching that nscd provides (at least for passwd and group, perhaps not for hosts).

Caching is already disabled when sssd is enabled (https://github.com/NixOS/nixpkgs/blob/61d125b8425da501f07765197186ed7351a55f48/nixos/modules/services/misc/nscd-sssd.conf)

So the simplest option seems to be to use this config when sssd is disabled as well.

Note (this should probably be documented somewhere): we can't fully disable the cache, because then nscd will not preload the NSS modules and delegates that to glibc instead, and glibc cannot find the modules because they are not in the global LD_LIBRARY_PATH. To trick nscd into loading these modules from a pure path, we keep caching enabled but set all TTLs to 0.
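
As a rough sketch of the workaround described in the note above, the passwd and group sections of nscd.conf might look like the following. The directive names are standard nscd.conf options; the exact file that ends up in nixpkgs is not reproduced here and may differ.

```
# Keep the cache "enabled" so that lookups still go through nscd,
# but expire every entry immediately.
enable-cache            passwd  yes
positive-time-to-live   passwd  0
negative-time-to-live   passwd  0

enable-cache            group   yes
positive-time-to-live   group   0
negative-time-to-live   group   0
```

The hosts and services sections are left out on purpose, since caching those is not what breaks DynamicUser.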

@arianvp arianvp mentioned this issue Nov 13, 2018
@peterhoeg
Member

Awesome detective work!

flokli pushed a commit to arianvp/nixpkgs that referenced this issue Dec 5, 2018
Systemd provides an option for allocating DynamicUsers,
which we want to use in NixOS to harden service configuration.
However, we discovered that the user wasn't allocated properly
for services. After some digging this turned out to be, of course,
a cache inconsistency problem.

When a DynamicUser is created, systemd checks beforehand
whether the requested user already exists statically. If it does,
it bails out. If it doesn't, systemd continues with allocating the
user.

However, as a result of that existence check, nscd stores
the fact that the user does not exist in its negative cache.
When the service tries to look up which user is associated with its
uid (by calling whoami, for example), the lookup is meant to reach
libnss_systemd.so. However, it is answered from the cache instead,
which reports that there is no user associated with the uid.
It will continue to do so for the lifetime of the cache entry.
If the service doesn't immediately look up its username, this bug
is not triggered, as the cache entry will have expired by then.
However, if the service is quick enough, it might end up
in a situation where it's incorrectly reported that the
user doesn't exist.

Preferably, we would not be using nscd at all. But we need to
use it because glibc finds the nss modules named in /etc/nsswitch.conf
by looking relative to the global LD_LIBRARY_PATH. Because LD_LIBRARY_PATH
is not set globally (as that would lead to impurities and ABI issues),
glibc will fail to find any nss modules.
Instead, as a hack, we start up nscd with LD_LIBRARY_PATH set
for only that service. Glibc will forward all nss lookups to
nscd, which will then respect the LD_LIBRARY_PATH and only
read from locations specified in the NixOS config.
This way we can load nss modules in a pure fashion.

However, I think by accident, we just copied over the default
settings of nscd, which actually cache user and group lookups.
We already disable this when sssd is enabled, as it interferes
with the correct working of libnss_sss.so, which already
does its own caching of LDAP requests.
(See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/usingnscd-sssd)

Because nscd caching now also interferes with libnss_systemd.so,
and probably with other nss modules too, let's just pre-emptively
disable caching for all options related to users and groups for now,
but keep it for hosts and services lookups.

Note that we cannot just put in /etc/nscd.conf:
enable-cache passwd no

as this will actually cause glibc to _not_ forward the call to nscd
at all, and thus never reach the nss modules. Instead we set
the negative and positive cache TTLs to 0 seconds as a workaround.
This way, glibc will always forward requests to nscd, but results
will never be cached.

Fixes NixOS#50273
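
The race described in the commit message above can be illustrated with a toy model. This is a deliberately simplified sketch, not how nscd is implemented: a plain dict stands in for the NSS modules, a single uid key stands in for whichever lookup got cached, time is passed in explicitly, and the TTL of 20 is an arbitrary illustrative value.

```python
class NegativeCachingResolver:
    """Toy model of a caching lookup daemon in front of a user database."""

    def __init__(self, backend, negative_ttl):
        self.backend = backend            # dict uid -> name, stands in for the NSS modules
        self.negative_ttl = negative_ttl  # how long a "not found" answer is remembered
        self._misses = {}                 # uid -> time at which the cached miss expires

    def lookup(self, uid, now):
        expires = self._misses.get(uid)
        if expires is not None and now < expires:
            return None                   # answered from the negative cache
        name = self.backend.get(uid)
        if name is None and self.negative_ttl > 0:
            self._misses[uid] = now + self.negative_ttl
        else:
            self._misses.pop(uid, None)
        return name


def simulate(negative_ttl):
    """Return what an early and a late lookup see after the user is allocated."""
    users = {}
    resolver = NegativeCachingResolver(users, negative_ttl)
    resolver.lookup(63915, now=0)          # existence check before allocation: a miss
    users[63915] = "iamatest"              # the dynamic user is allocated
    early = resolver.lookup(63915, now=1)  # service looks itself up right away
    late = resolver.lookup(63915, now=negative_ttl + 1)
    return early, late


print(simulate(20))  # early lookup hits the stale negative entry
print(simulate(0))   # with a zero TTL nothing stale is ever served
```

With a non-zero negative TTL the early lookup still sees the stale miss, while a TTL of 0 always reaches the backend, which is the behaviour the workaround relies on.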
arianvp added a commit to arianvp/nixpkgs that referenced this issue Dec 12, 2018
@jameysharp
Contributor

Since upstream systemd merged their nscd cache invalidation fix and released it in v240, and we're on v242 now, should some caching be turned on again? Seems to me like it'd be nice, but I can't imagine getent lookups are hugely performance-critical. 🤷‍♂️

Either way though, perhaps you folks would be good reviewers for my pull request #64268, which gets rid of various years-old cruft from the nscd module. If non-zero TTLs are still worthwhile, I'd hope my patches would be merged first.
