This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
support User= in systemd for running rootless services #12778
Comments
This is a limitation on the systemd side. They will only accept notifications, or PID files, that are created by or sent by root, for security reasons - even if the User and Group of the unit file are explicitly set to start the process as a non-root user. Their recommendation was to start the container as a user service of the user in question via |
Previous discussion: #9642 |
Thank you both. For now I've worked around it by managing the service under the user's systemd which is clunky to say the least. I don't understand systemd's security argument - if the process is run as a given user, why would systemd not allow that user's process to send sd_notify? Who else could? But I guess this is no flaw of podman. #9642 mentions some code changes that need to happen to podman for sd_notify, what are those? And have they progressed since March? I guess you could close this issue or use it to track progress. |
Yes, there is some progress. The main PID is now communicated via sd notify but there are still some remaining issues. For instance, |
I think the next big thing to tackle is finding a way how to lift the |
But even that is rejected by systemd, as seen in the logs above. |
I fear there's not much Podman can do at the moment. |
Only after solving this problem can become truly rootless. So I have to keep using the root account for now. |
Is there a quick overview of the best current approach or workaround for starting podman containers with systemd as a specific non-root user? Furthermore, if a container is run as root, is there a way to change the ownership of files and directories created inside the container (in a bound volume) to a specific host user?
use use |
The services need to be started and managed as the specific non-root user. Using the |
For the moment my workaround is to run such containers in a systemd --user. This means that for every system service I want to run as a rootless container, I need to create a separate system user, enable linger, and run a separate systemd --user instance for that user. It works but it's clunky, e.g. restarting Nginx is Inside these rootless containers root is mapped to the system user, which is a different uid for each service. If something inside the containers runs as non-root, that gets mapped to a high-numbered host uid by default. However with some magic on the host you can map a specific non-root uid in the container to a host uid of your choice, which can then be mapped to a different non-root uid in a different container running under a different user. I should probably document my setup one of these days... |
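The per-service setup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `svc-nginx` and `container-nginx.service` are hypothetical names, the unit file is assumed to already sit in that user's `~/.config/systemd/user/`, and the `--machine=USER@` form of `systemctl --user` needs a reasonably recent systemd. Rootless Podman also needs subordinate UID/GID ranges for the account; whether `useradd` allocates them depends on the distribution and flags. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the "one user per service" workaround. Prints the commands by
# default; set DRY_RUN=0 and run as root to actually execute them.
# The user and unit names used below are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_service_user() {
    user="$1"
    unit="$2"
    # One dedicated account per service.
    run useradd --create-home "$user"
    # Start this user's systemd instance at boot and keep it running
    # without a login session.
    run loginctl enable-linger "$user"
    # Address that user's service manager from a root shell.
    run systemctl --user --machine="$user@" enable --now "$unit"
}

setup_service_user svc-nginx container-nginx.service
```

Repeating the call with a different user and unit gives each service its own host UID, which is the isolation this workaround is after.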
@Gchbg If you are running a recent systemd version (for instance by running Fedora 35), I think you could run
No need to set |
Is that
How does that relate to what @Gchbg and @eriksjolund wrote above? Do I have to run several instances of systemd or is there another way? For systemd beginners like me, it is quite difficult to understand the various layers of abstraction and user permission between systemd, host processes and containers. After all, I would assume that this is the use case for 80 % of the users: run some container service that gets restarted automatically when the machine boots and that is as restricted as possible (by means of user permissions). |
I got it wrong: modifying the UID and GID via env requires entrypoint.sh. https://docs.docker.com/engine/security/userns-remap/ |
With the |
So far #13236 is an issue. To be sure it's working, we need a pull request :) |
A friendly reminder that this issue had no activity for 30 days. |
For now I use tmux to run the systemctl service from rootless podman. It works even after I detach or close the ssh connection, because it keeps the user logged in 🤣 |
Could you please describe this in more detail? I'm curious how it compares to my workaround. |
It's just a simple workaround: since I keep tmux running, I'm always logged in, so the systemd user service keeps running. It's as simple as that, and just a silly approach for me for now. Anyway, loginctl should close this issue, I think. I talked with folks on /r/podman, but it requires the root user first to allow the user service to keep running in the background after boot. |
Just a quick heads-up: The commandline from #12778 (comment):
can be simplified to:
|
A friendly reminder that this issue had no activity for 30 days. |
Hello @vrothberg 👋 I tested the As @hmoffatt and @eriksjolund mentioned, it seems to be possible to notify with very simple examples. This simple unit does work OK (systemd 253):
Can you be more specific on why it’s an issue with podman? Is it because of the forking? |
See #12778 (comment). |
It seems a part of the problem is to set Quote from Git commit
I tried to use OpenFile= to set MAINPID in a test (without using Podman) but it didn't work. Then I tried another test (also without using Podman) where I managed to set the MAINPID by using
mytest_notifymainpid source code contains
senderpid is here the PID of the program that I started with
An untested idea: Let Podman send the Output from journalctl
|
It seems to work. I tried out an echo server that listens on TCP port 908.
The echo server replied The file /etc/systemd/system/echo.socket contains:
The port number is smaller than 1024. An unprivileged user does not
The file /etc/systemd/system/echo.service contains:
A summary of the proof-of-concept demo
As soon as libpod/container_internal.go has sent
notify-mainpid is a little program I wrote as a proof-of-concept:

    #include <systemd/sd-daemon.h>
    #include <cstdio>
    #include <cstdlib>
    #include <format>
    #include <fstream>
    #include <string>

    int main(int argc, char *argv[]) {
      if (argc != 2) {
        fprintf(stderr, "error: incorrect number of arguments\n");
        return 1;
      }
      // argv[1] is a file containing the PID that should become MAINPID.
      std::ifstream mainpid_stream(argv[1]);
      pid_t mainpid;
      if (!(mainpid_stream >> mainpid)) {
        fprintf(stderr, "error: could not read a PID from %s\n", argv[1]);
        return 1;
      }
      // SYSTEMD_EXEC_PID holds the PID of the process systemd started (podman).
      const char *podmanpidstr = getenv("SYSTEMD_EXEC_PID");
      if (podmanpidstr == nullptr) {
        fprintf(stderr, "error: SYSTEMD_EXEC_PID is not set\n");
        return 1;
      }
      pid_t podmanpid = atoi(podmanpidstr);
      // Send the notification on behalf of the podman process.
      std::string msg = std::format("MAINPID={}\nREADY=1", mainpid);
      sd_pid_notify(podmanpid, 0, msg.c_str());
      return 0;
    }

notify-mainpid sends a notification message on behalf of the podman process (the current MAINPID) and notifies systemd that MAINPID should be equal to the conmon PID. I created a branch where I put the code. There is room for a lot of improvements, for example replacing the racy solution with the 5 seconds delay with something else. This demo was for |
I've been following this issue for a while. And have had a few tries at it as well. I think I've gotten it to work now, with Admittedly, this is for a hobby Minecraft server, but I've been thinking of using the same pattern for actual production stuff as well. So, I wonder if I'm missing something here as this does seem to work. :) If it's only the notify things, then I assume I can go forward with this solution as stopping, starting and so on seem to be fine. Logs also work via journald. With the conmon name and PID, but the container name is in the journald JSON output, so I can grab it from there if necessary. I've generated the unit file with the |
Thanks for sharing, @quulah.
With |
I've kind of forgotten everything that's been tried, but what's wrong with using |
The only supported way of running Podman inside of systemd is via the units generated via Those generated units use
|
But |
At the moment, it is unsupported but this issue is meant to find means where it can be supported. |
Maybe I'm wrong here (I didn't read the entire thread). But, since the user is already defined as lingering, what about running the systemd service as a user service ( |
@ygalblum, in most cases it's a UX issue. It's easier to manage rootless services as the root user when the services make use of |
Specifically what I would like is to be able to use |
So you want to run podman as a separate user or do you want your containers all running with different users. You could use --userns=auto for the second option. |
Sure, that runs each container in a separate namespace on the container side, but they're still all running as the same user on the host side. If there is any sort of vulnerability that allows breaking out of the container, then there's no isolation left, and if that user is root then it's game over. |
@tomhughes If you use SELinux and mount with |
@vrothberg Thanks, the wrong PID being selected would indeed be a source of confusion and weirdness. :) I also realized that I'd be missing out on auto update, as to my limited understanding that requires visibility to both the systemd unit and the container, since the label is checked, image pulled and unit then restarted. With a root unit and user container that would probably require some workarounds. I'm also mostly after the UX here, so that it's similar to how any other service is managed. For what it's worth I realized that systemd has better support now for managing user units as root. And has had for a while, but I've missed this.
While this is not quite And I can do the configuring with Quadlet which seems like the way to go now. Generating systemd units hasn't been a problem, since I can leverage Ansible for that, but Quadlet is a nice abstraction. |
@quulah Are you suggesting that in a file at
|
@tomhughes No, --userns=auto runs everything as a different user. There is no overlap with the root user or the user who ran the container. The UID running podman is not in the user namespace of the container processes. There is some risk in just running Podman, which could be mitigated by running as a different user for each run. However, from an SELinux point of view, there is a chance that the container MCS ranges could overlap. There is no guarantee that containers run by two different users will not get the same SELinux label. SELinux separation is only guaranteed for a single podman database. |
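One way to check which host user a container's root actually maps to is to read the `uid_map` of a container process from the host. A minimal sketch, assuming the standard `uid_map` line format (first ID inside the namespace, first ID outside, range length); the function name and the sample line are hypothetical, not taken from a real container:

```shell
#!/bin/sh
# Each uid_map line reads: <first ID inside ns> <first ID outside ns> <length>.
# Prints the outside (host) UID that container UID 0 maps to, or nothing
# if UID 0 is not mapped. Reads the files given as arguments, or stdin.
host_uid_for_container_root() {
    awk '$1 == 0 { print $2; exit }' "$@"
}

# Hypothetical sample shaped like /proc/<pid>/uid_map of a container process.
printf '0 524288 65536\n' | host_uid_for_container_root
```

On a real host the function would be pointed at `/proc/<pid>/uid_map` for a process inside the container. A result of 0 means root in the container is root on the host.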
@quulah the latest version of the Ansible Role for Podman supports Quadlet as well. |
@markstos Not really, just documenting the fact that you need to use |
This is also what I need, and what seems to be a pretty valid use case. Additionally, for persistent state one can still use I actually almost got it working but ran against a brick wall with this issue, i.e. Hacking away with things like (I got the normal |
Since I don't think it was mentioned yet, you should probably not do that since it can happen that the Podman process is killed while the container keeps running (see #9642 (reply in thread)). |
I'm still confused why we're all having problems with this; clearly using User= is not the recommended/supported approach. So the recommended/supported approach really is to run containers as root? Am I missing something and people generally think that's ok? Non-containerised services wouldn't be running as root right? So why is it ok to run containerised services as root? |
Personally, I have taken to just running it as a user service with lingering enabled. It still lets me start it on boot, and manage/observe it via systemctl and journalctl. |
I think at this point we should change this to a discussion. User= causes lots of issues with running podman and rootless support is fairly easy. I also recommend that people look at using rootful with --userns=auto, which will run your containers each in a unique user namespace. |
Is this a BUG REPORT or FEATURE REQUEST?
/kind bug
Description
I want to have a systemd system service that runs a rootless container under an isolated user, but systemd rejects the sd_notify call and terminates the service.
A similar problem was mentioned in #5572, which seems to have been closed without a resolution.
Happy to help tracking this down.
Steps to reproduce the issue:
podman generate systemd --new:
Describe the results you received:
Describe the results you expected:
Nginx runs until the end of time.
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes and yes.
Additional environment details (AWS, VirtualBox, physical, etc.):
Machine is a VM.