The user "nobody" missing in /etc/passwd #1197

Open
ahjohannessen opened this issue May 17, 2022 · 17 comments

@ahjohannessen

I am using Hashicorp Nomad with its exec task driver and it uses chroot, and it expects a nobody user in /etc/passwd because it runs as that user by default. I can see that nobody exists somewhere on my system:

# Fedora CoreOS 35.20220424.3.0
[core@f05 ~]$ getent passwd nobody
nobody:x:99:99:Nobody:/:/sbin/nologin
[core@f05 ~]$ cat /etc/passwd
root:x:0:1:Nomad user:/root:/bin/bash
core:x:1000:1000:CoreOS Admin:/var/home/core:/bin/bash
consul:x:992:1001:Consul user:/var/home/consul:/bin/bash
coredns:x:985:100::/:/usr/sbin/nologin
[core@f05 ~]$

However, the exec driver fails as it cannot find the nobody user in /etc/passwd.

Is there a way, via Ignition or otherwise, to make the nobody user part of /etc/passwd?

Creating the user via useradd does not work; it fails because the user already exists.

@bgilbert
Contributor

bgilbert commented May 17, 2022

It's in /usr/lib/passwd, which is read by the altfiles NSS module listed in /etc/nsswitch.conf. It sounds as though Nomad isn't properly querying user accounts through NSS?
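
To illustrate the lookup order described above, here is a minimal Go sketch that checks /etc/passwd first and then /usr/lib/passwd, roughly what the files and altfiles sources do in sequence. The helper names findPasswdEntry and lookupInFiles are hypothetical, and the sketch ignores everything else NSS handles (sss, systemd, caching). It is only meant to show why a program that reads /etc/passwd alone would miss nobody on this system.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// findPasswdEntry returns the passwd-format line whose first
// colon-separated field (the login name) equals name.
func findPasswdEntry(content, name string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(line, name+":") {
			return line, true
		}
	}
	return "", false
}

// lookupInFiles checks each file in order and reports the entry and the
// file it was found in. Unreadable files are skipped in this sketch.
func lookupInFiles(name string, paths ...string) (string, string, bool) {
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		if entry, ok := findPasswdEntry(string(data), name); ok {
			return entry, p, true
		}
	}
	return "", "", false
}

func main() {
	// Same order as "files altfiles" in /etc/nsswitch.conf.
	entry, path, ok := lookupInFiles("nobody", "/etc/passwd", "/usr/lib/passwd")
	if !ok {
		fmt.Println("nobody: not found in either file")
		return
	}
	fmt.Printf("%s: %s\n", path, entry)
}
```

On a system like the one in the report, this would be expected to find the entry in /usr/lib/passwd rather than /etc/passwd.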

@ahjohannessen
Author

ahjohannessen commented May 18, 2022

Hi @bgilbert - I created an issue on nomad GH and got this reply:

Nomad looks up the nobody user by making a query to Go's standard library user.Lookup function

https://github.com/hashicorp/nomad/blob/v1.3.0/client/allocdir/fs_unix.go#L42

https://pkg.go.dev/os/user#Lookup

which on *nix defers to getpwnam

https://man7.org/linux/man-pages/man3/getpwnam.3.html

If there is a more correct way to perform this operation, we're welcome to ideas.

I am not familiar with getpwnam or whether it exists on Fedora CoreOS. Also, I wonder if this is related?

@bgilbert
Contributor

getpwnam(3) is a glibc function, and is available on Fedora CoreOS, regardless of whether nscd is available. It should properly query NSS modules, rather than just reading /etc/passwd. So it seems as though this should work correctly.

You mentioned a chroot; is Nomad querying password entries inside the chroot or outside? If inside, how is the chroot created?

@ahjohannessen
Author

You mentioned a chroot; is Nomad querying password entries inside the chroot or outside? If inside, how is the chroot created?

I do not know. Nomad itself is running as root. I can ask the nomad devs.

@travier
Member

travier commented May 18, 2022

Not really a solution but as a workaround, manually copying the nobody entry from /usr/... to /etc/... should work.
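
As a rough sketch of that workaround in Go: the program below takes the nobody line from /usr/lib/passwd and appends it to the contents of /etc/passwd only if /etc/passwd does not already define that name. The function names entryFor and mergeEntry are hypothetical, and the program prints the merged result instead of writing it, so the change can be reviewed first. The same approach would presumably apply to the group files.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// entryFor returns the line in passwd-format content whose login name is name.
func entryFor(content, name string) (string, bool) {
	for _, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(line, name+":") {
			return line, true
		}
	}
	return "", false
}

// mergeEntry returns local with the vendor entry for name appended.
// It reports false and returns local unchanged when local already has
// the name or when vendor does not define it.
func mergeEntry(local, vendor, name string) (string, bool) {
	if _, ok := entryFor(local, name); ok {
		return local, false
	}
	entry, ok := entryFor(vendor, name)
	if !ok {
		return local, false
	}
	if local != "" && !strings.HasSuffix(local, "\n") {
		local += "\n"
	}
	return local + entry + "\n", true
}

func main() {
	local, err := os.ReadFile("/etc/passwd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	vendor, err := os.ReadFile("/usr/lib/passwd")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	merged, changed := mergeEntry(string(local), string(vendor), "nobody")
	if !changed {
		fmt.Println("nothing to do")
		return
	}
	// Printed, not written: review before replacing /etc/passwd.
	fmt.Print(merged)
}
```

Because mergeEntry is a no-op when the name is already present, running it a second time against the merged contents changes nothing.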

@ahjohannessen
Author

@bgilbert I got this answer:

The lookup is happening outside the chroot (this is all part of Nomad creating the chroot)

@bgilbert
Contributor

Okay, then I'd expect everything to work properly. Sounds like this will need some debugging.

@jdoss
Contributor

jdoss commented Sep 28, 2022

I hit this snag randomly today while testing Nomad 1.4.0-rc1. Nomad now seems to care if the Nobody user is missing from /etc/passwd and /etc/group. All of my jobs fail with:

failed to setup alloc: pre-run hook "alloc_dir" failed: user: unknown user nobody

and manually adding the entries:

/etc/passwd

nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin

/etc/group

nobody:x:65534:

allows jobs to start, but trying to add the nobody user via Butane results in more failures because nobody is already on the system...

$ bupy template nomad.bu.j2 ../apex-butanevars.yaml --show
variant: fcos
version: 1.4.0
passwd:
  users:
**snip**
  - name: nomad
    system: true
  - name: vault
    system: true
  - name: consul
    system: true
  - name: nobody
    system: true
    uid: 65534
    shell: /sbin/nologin

Results in failure because the Nobody user is already half present.

Can we just ensure that Nobody has the correct entries in /etc/passwd and /etc/group if it is already present in FCOS?

@cgwalters
Member

I am using Hashicorp Nomad with its exec task driver and it uses chroot

I suspect the problem is that this is statically linked Go code, and hence it's not using NSS so it's not picking up altfiles:

$ grep altfiles /etc/nsswitch.conf 
passwd:     files altfiles sss systemd
group:      files altfiles sss systemd

The fix here should be on the Hashicorp side; no one should be using chroot() in modern times. Instead, spawn a proper container runtime (crun/bwrap/etc) and run code in there, hence picking up glibc nsswitch logic from the target OS.
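
For anyone who wants to check which sources glibc consults on a given host, the following Go sketch reads /etc/nsswitch.conf and lists the sources configured for the passwd database. The function name sourcesFor is hypothetical and the parser is deliberately simplified: it assumes the usual "database: source ..." layout and skips bracketed action items. A Go binary built without cgo, or with the osusergo build tag, does not go through glibc for os/user lookups and reads /etc/passwd directly, so it never sees sources such as altfiles.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sourcesFor returns the NSS sources listed for a database (for example
// "passwd") in nsswitch.conf-style text. Bracketed action items such as
// [NOTFOUND=return] are skipped. It returns nil if the database is absent.
func sourcesFor(conf, database string) []string {
	for _, line := range strings.Split(conf, "\n") {
		// Strip trailing comments.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != database+":" {
			continue
		}
		var sources []string
		inAction := false
		for _, f := range fields[1:] {
			if inAction || strings.HasPrefix(f, "[") {
				// Stay in "action" mode until the closing bracket.
				inAction = !strings.HasSuffix(f, "]")
				continue
			}
			sources = append(sources, f)
		}
		return sources
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/etc/nsswitch.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sources := sourcesFor(string(data), "passwd")
	fmt.Println("passwd sources:", sources)
	for _, s := range sources {
		if s == "altfiles" {
			fmt.Println("altfiles is enabled; users may live in /usr/lib/passwd")
		}
	}
}
```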

@cgwalters
Member

Please file a bug with them.

@jdoss
Contributor

jdoss commented Sep 28, 2022

@cgwalters I am not using the exec task driver. I am using the podman driver. If FCOS has users present on the system, it should have entries in the standard files otherwise things can break.

I don't have an issue filing an issue on their end but I will have to work around this until a fix is present because the nobody user is half present in FCOS. IMO that is a bug with FCOS not specifically just Nomad.

What is the technical reason for not having these common users present in /etc/passwd and /etc/group?

@cgwalters
Member

What is the technical reason for not having these common users present in /etc/passwd and /etc/group?

It's because /etc is machine local state (owned by you), whereas /usr/lib/passwd is owned by the OS vendor (us). Having them in distinct files means we can change which "system users" are shipped without affecting user state.

@jdoss
Contributor

jdoss commented Sep 28, 2022

Thanks for taking the time to explain it Colin. Makes sense.

I opened an issue to see if I can get Hashicorp to address the problem in their up-and-coming new version. I will work around it in the meantime.

@tgross

tgross commented Sep 29, 2022

👋 Nomad developer here! For clarity, the problem @jdoss hit comes from building with osusergo, which was a fix to avoid a crash we were seeing when making user.Lookup calls with the CGO version. We were aware it was going to break NSS users and it was intended to be temporary. A fix for that is in flight for Nomad 1.4.0 GA (or RC2).

Even with that fix, the original problem reported here by @ahjohannessen is still open in Nomad as hashicorp/nomad#13047, and we're still puzzled by it. The user lookup happens in the Nomad agent as it sets up a directory for the task (which gets bind-mounted into the runc container; there's no chroot here, just the usual pivot_root of runc, unless the user has specifically opted in to legacy behavior). But as far as I can tell there's nothing unexpected going on on the Fedora CoreOS side.

@dustymabe
Member

@tgross if you take just the code from Nomad that does the user lookup (you should be able to distill it down to a Go file of a few lines), compile it, and run it on an FCOS node, does it give you the correct or incorrect result?

@tgross

tgross commented Sep 30, 2022

Yup, that works as we'd expect, which is why it's still a head scratcher! Our equivalent code in Nomad is fs_unix.go#L42, and this is all happening in the agent running as root. But here's a minimal repro that works just fine:

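package main

import (
	"fmt"
	"os"
	"os/user"
)

func main() {
	// With cgo enabled (and no osusergo build tag), user.Lookup calls
	// into libc, so the result reflects the host's NSS configuration.
	// The variable is named u to avoid shadowing the user package.
	u, err := user.Lookup("nobody")
	if err != nil {
		fmt.Printf("failed: %v\n", err)
		os.Exit(1)
	}
	if u == nil {
		fmt.Println("failed: no such user")
		os.Exit(1)
	}

	fmt.Printf("uid=%s\n", u.Uid)
}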

Running on the most recently-published Fedora CoreOS stable container image:

$ docker run -it --name coreos -v $(pwd):/src -w /src \
    quay.io/fedora/fedora-coreos:stable /bin/bash

# check for the user
bash-5.1# grep nobody /usr/lib/passwd
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/usr/sbin/nologin
nobody:x:99:99:Kernel Overflow User:/:/usr/sbin/nologin

bash-5.1# grep nobody /etc/passwd

# install go and build environment
bash-5.1# curl -Lso /tmp/go1.19.1.tar.gz https://go.dev/dl/go1.19.1.linux-amd64.tar.gz
bash-5.1# mkdir /var/usrlocal
bash-5.1# tar -C /var/usrlocal -xzf /tmp/go1.19.1.tar.gz
bash-5.1# export PATH=$PATH:/usr/local/go/bin
bash-5.1# go version
go version go1.19.1 linux/amd64
bash-5.1# mkdir -p /var/roothome

# run it
bash-5.1# go run .
uid=99

But as far as we're concerned, this is definitely a Nomad issue and not a Fedora CoreOS one.

@dustymabe
Member

Does anybody have the error message that a user sees handy? Is what @jdoss posted in #1197 (comment) the right one, or was that just the issue that got fixed in 1.4.0 GA or RC2?
