permanent changes in ownership of /dev/null #3674

Closed
akitzing opened this issue Nov 30, 2022 · 7 comments · Fixed by #3707

Comments

@akitzing

I hope I am in the right place here.

Since the update from SLES 15 SP3 to SP4, the ownership of the /dev/null device as seen by container users is passed on to /dev/null on the node.
We could also reproduce this behavior on newly deployed SP4 systems.
For example, the container user with UID 100 becomes the owner of the host's /dev/null. Ownership then reverts to root, and this repeats over and over.
The problem probably appeared with the change of the runc package from 1.0.3 to 1.1.3. Version 1.1.4 brought no improvement.
The servers have the latest patches installed.
Changing the ownership of /dev/null on the node does not affect the /dev/null device in the container. Conversely, changing the ownership inside the container is not transferred to the node.
There are no bidirectional effects.

Any ideas?

@kolyshkin
Contributor

Please post a repro here:

  • what you do
  • what you actually see
  • what you expect to see

@zimo-github

Hi!
Maybe I can contribute some details.

what you do
We have a setup of docker swarm with six nodes on SLES15SP4, managed with Portainer, with several stacks deployed.

what you actually see
Since the above-mentioned runc update, we see that the device /dev/null on the swarm node changes ownership from time to time.
Every few seconds the owner (or owner and group) changes to the UID/GID of a user from inside one of the containers running on that node, for example 5001:root or 999:999. After a few seconds it changes back to root:root.
This is detected by our IDS (AIDE); otherwise we might not even have noticed it, because it affects neither the mode nor the file type (/dev/null stays a device).
Additionally, we check our systems for unknown owners of system files. Since these UIDs are not known to the underlying node's passwd, they come up in this check.
The change in ownership can be seen with "ls" and with "stat".
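
The unknown-owner check mentioned above could look roughly like the following Python sketch. It is not the check used in this report; the function names `describe_owner` and `has_unknown_owner` are made up for illustration, and the only assumption is that "unknown" means the UID or GID has no entry in the host's passwd or group database.

```python
import grp
import os
import pwd


def describe_owner(path):
    """Return (uid, gid, user_name, group_name) for `path`.

    A name is None when the host has no passwd/group entry for the ID,
    which is what happens when a container-only UID owns a host file.
    """
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = None  # UID is not known to the host
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = None  # GID is not known to the host
    return st.st_uid, st.st_gid, user, group


def has_unknown_owner(path):
    """True if the owner or the group of `path` is unknown to the host."""
    _, _, user, group = describe_owner(path)
    return user is None or group is None
```

Calling `has_unknown_owner("/dev/null")` while /dev/null is owned by something like 5001:root would be expected to return True on a node where UID 5001 does not exist.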

what you expect to see
We expect /dev/null to be continuously owned by root:root.
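
For anyone who wants to watch for the flapping without an IDS, a small polling loop along these lines might be enough. This is only a sketch: the names `owner_of`, `transitions`, `poll` and `watch` and the default interval are assumptions, and polling can miss changes that are reverted between two samples.

```python
import os
import time


def owner_of(path):
    """Return the (uid, gid) pair that currently owns `path`."""
    st = os.stat(path)
    return st.st_uid, st.st_gid


def transitions(samples):
    """Yield (previous, current) wherever two consecutive samples differ."""
    previous = None
    for current in samples:
        if previous is not None and current != previous:
            yield previous, current
        previous = current


def poll(path="/dev/null", interval=0.5, count=20):
    """Sample the owner of `path` `count` times, `interval` seconds apart."""
    for _ in range(count):
        yield owner_of(path)
        time.sleep(interval)


def watch(path="/dev/null", interval=0.5, count=20):
    """Print a timestamped line for every ownership change that is observed."""
    for before, after in transitions(poll(path, interval, count)):
        print(f"{time.strftime('%H:%M:%S')} {path}: {before} -> {after}")
```

On a healthy node `watch()` should print nothing, since every sample would be (0, 0), that is root:root.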

Regards,
Zimo

@kolyshkin
Contributor

> We have a setup of docker swarm with six nodes on SLES15SP4, managed with Portainer, with several stacks deployed.

Well, this is not good enough for me to reproduce this locally.

@akitzing
Author

akitzing commented Jan 5, 2023

We use kernel 5.14.21-150400.24.11-default and the following packages:

  • docker-20.10.17_ce-150000.166.1.x86_64
  • runc-1.1.3-150000.30.1.x86_64

We use Portainer's "Community Edition 2.16.1".

Docker configuration from /etc/docker/daemon.json:

{
  "log-level": "warn",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  },
  "bridge": "none",
  "storage-driver": "overlay2",
  "experimental": true,
  "metrics-addr": "0.0.0.0:9323",
  "insecure-registries": ["<registry-uri>", "<registry-uri>:5001"]
}

Since the symptom is not bound to a specific stack, it shouldn't be necessary to publish any stacks here.
It may be worth noting that the UIDs that temporarily own /dev/null are mainly those of database users such as postgres or mysql.

@Dzejrou
Contributor

Dzejrou commented Jan 20, 2023

> Well, this is not good enough for me to reproduce this locally.

Not sure whether it helps, but I managed to reproduce this issue (or at least a very similar one) using docker exec -u on an already running container (SLES12SP5 with runc-1.0.3-16.18.1, so maybe a SUSE-specific bug):

Terminal 1:

germ52:~ # docker run -ti alpine sh
/ # adduser test
...
passwd: password for test changed by root
/ #

Terminal 2:

germ52:~ # ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Aug 1 14:12 /dev/null
germ52:~ # docker exec -u test 0ad6d3064e9d ls
germ52:~ # ls -l /dev/null
crw-rw-rw- 1 test root 1, 3 Aug 1 14:12 /dev/null

After downgrading to runc-1.0.3-16.18.1 the issue disappears (similarly to the original report above).

Note: The owner is changed in the host system, but not inside the container (not sure whether that is relevant).
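
The two-terminal session above could be scripted roughly as follows. This is an untested sketch, not something run on an affected system: the helper names (`docker`, `host_owner`, `reproduce`), the `sleep 300` keep-alive and the use of busybox `adduser -D` (create the user without setting a password) are assumptions added for illustration. It needs a working docker CLI and enough privileges to talk to the daemon.

```python
import os
import subprocess


def docker(*args):
    """Run a docker CLI command and return its stdout with whitespace stripped."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def host_owner(path="/dev/null"):
    """Return the (uid, gid) that owns `path` on the host."""
    st = os.stat(path)
    return st.st_uid, st.st_gid


def reproduce(image="alpine", user="test"):
    """Follow the steps above and return host /dev/null ownership (before, after)."""
    # Start a long-running container so that "docker exec" has a target.
    container = docker("run", "-d", image, "sleep", "300")
    try:
        # Create an unprivileged user inside the container.
        docker("exec", container, "adduser", "-D", user)
        before = host_owner()
        # The step that triggered the ownership change in the session above.
        docker("exec", "-u", user, container, "ls")
        after = host_owner()
    finally:
        docker("rm", "-f", container)
    return before, after
```

If `reproduce()` returns two different pairs, for example (0, 0) and (1000, 0), the host's /dev/null changed owner during the exec, which matches the output shown in Terminal 2.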

Since this happens during docker exec, couldn't the cause be in #3355, specifically the removal of:

https://github.com/opencontainers/runc/pull/3355/files#diff-84fffb149f8b86c903e09613cbebad12de596f4139d54e288388b1458e08b8ecL419

I am only assuming that docker exec redirects stdout/stdin/stderr to /dev/null.
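
To make the suspected mechanism concrete: if a runtime hands the stdio descriptors of a new process to the target user with fchown, and one of those descriptors happens to be /dev/null, the chown lands on the shared device node unless that case is skipped. The Python sketch below only illustrates such a guard. It is not runc's code (runc is written in Go), the names `is_dev_null` and `fix_stdio_owner` are hypothetical, and whether the lines removed in #3355 did exactly this is an assumption based on the comment above.

```python
import os
import stat


def is_dev_null(fd):
    """True if `fd` refers to the same character device as /dev/null."""
    fd_stat = os.fstat(fd)
    null_stat = os.stat("/dev/null")
    return stat.S_ISCHR(fd_stat.st_mode) and fd_stat.st_rdev == null_stat.st_rdev


def fix_stdio_owner(uid, fds=(0, 1, 2), chown=os.fchown):
    """Give the stdio descriptors to `uid`, skipping any that are /dev/null.

    Returns the descriptors that were actually chowned. `chown` is injectable
    so the logic can be exercised without root privileges.
    """
    changed = []
    for fd in fds:
        if is_dev_null(fd):
            continue  # chowning this would change the shared device node
        chown(fd, uid, -1)  # gid -1 leaves the group unchanged
        changed.append(fd)
    return changed
```

Without the `is_dev_null` check, every descriptor would be chowned, which would be consistent with the owner of /dev/null following whichever user the last exec ran as.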

@zimo-github

Thanks a lot for looking into this!

Since we use SLES15SP4, we can only go back to runc-1.0.3-27.1, which would leave us with more "outdated" packages that we're supposed to keep up to date.

We never had a clue what could cause this, and it never came up in direct connection with a specific action inside a container.

Anyway, I can reproduce this exactly as described above on SLES15SP4 with runc-1.1.4-150000.36.1 and the alpine image 3.17.1.

If I open an SR with SUSE to point them to this, I assume they will point back here until this is changed back in runc and a new package release can be built.

Thanks again to anyone contributing!

@Dzejrou
Contributor

Dzejrou commented Jan 23, 2023

> Thanks a lot for looking into this!
> ...
> If I open an SR with SUSE to point them to this, I assume they will point back here until this is changed back in runc and a new package release can be built.

Well, I am looking into this because somebody opened an SR with SUSE ;)
