F34: systemd.path doesn't always work on first boot #861
It would be useful to know the last release of FCOS where it worked. Then we can pinpoint the package set and try to determine the culprit. Even better would be to go through our

@dustymabe FCOS

That one has

Just for clarity, Ignition creates
Yeah, I've observed systemd path units not activating recently as well. It does seem to be around the F34/systemd 248 timeframe and doesn't get a ton of eyes (Typhoon bare-metal and DigitalOcean use path units, but most platforms don't). When I observe a path unit ignoring file existence, no amount of ssh'ing in to touch the file, move the file away and back, write the file, etc. can trick the waiting path unit into activating, even though the file exists. It's unusual behavior. What CAN activate the path unit is restarting the path unit (as the OP mentioned) or touching the parent directory. I'm guessing this may be somehow related to systemd path units being implemented atop

The file (

A workaround seems to be to temporarily use
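To make the expected semantics concrete: a `PathExists=` unit should activate its service as soon as the watched path exists. Below is a minimal Python sketch of that mental model only. It polls for illustration, whereas real systemd uses inotify on the parent directory (which is why touching the parent can nudge a stuck watch); names and timings here are invented for the example.

```python
import os
import tempfile
import threading
import time

def watch_path_exists(path, on_activate, interval=0.05, timeout=5.0):
    """Rough model of a PathExists= watch: wait until the path exists,
    then trigger the unit's activation callback once.
    (Real systemd uses inotify events rather than polling.)"""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            on_activate(path)
            return True
        time.sleep(interval)
    return False

activated = []
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "kubeconfig")
    # Create the watched file shortly after the watch starts, like the
    # kubeconfig being moved into /etc/kubernetes during first boot.
    threading.Timer(0.2, lambda: open(target, "w").close()).start()
    ok = watch_path_exists(target, activated.append)

print(ok, activated)
```

In this model the watcher activates as soon as the file appears; the bug in this issue is that the real inotify-backed watch never fires even though the file exists.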
Actually, probably
Thanks @dghubble - the extra context helps. We might be able to isolate a small reproducer (i.e., excluding Typhoon/Kubernetes) and home in on the problem now.
I may have a smaller repro:

```yaml
---
variant: fcos
version: 1.2.0
systemd:
  units:
    - name: hello.service
      contents: |
        [Unit]
        Description=Hello
        [Service]
        ExecStart=/usr/bin/yes
        [Install]
        WantedBy=multi-user.target
    - name: hello.path
      enabled: true
      contents: |
        [Unit]
        Description=Watch hello
        [Path]
        PathExists=/etc/kubernetes/kubeconfig
        [Install]
        WantedBy=multi-user.target
storage:
  directories:
    - path: /etc/kubernetes
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "MY-PUBKEY"
```

SSH to the machine and try to create
Now here is where I may be going insane. Let's name our directory something else, like

```yaml
---
variant: fcos
version: 1.2.0
systemd:
  units:
    - name: hello.service
      contents: |
        [Unit]
        Description=Hello
        [Service]
        ExecStart=/usr/bin/yes
        [Install]
        WantedBy=multi-user.target
    - name: hello.path
      enabled: true
      contents: |
        [Unit]
        Description=Watch hello
        [Path]
        PathExists=/etc/hello/kubeconfig <- rename
        [Install]
        WantedBy=multi-user.target
storage:
  directories:
    - path: /etc/hello <- rename
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "MY-PUBKEY"
```
@dghubble In our case, not all of the machines fail every time when provisioning a cluster. Maybe reprovisioning a couple of times with
I've been able to see the behaviors mentioned in each setup, reliably each time, though perhaps with even more attempts case 2 could fail too. This at least shows you don't need a Kubernetes cluster to repro, just a single machine and a Butane config, if someone else can confirm. I used stable 34.20210529.3.0.
Thanks for the reproducer! Looks like the culprit is our good old friend SELinux:
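If the SELinux diagnosis holds, a possible stopgap until a fixed policy lands would be to re-kick the path unit after first boot, mirroring the manual `systemctl restart kubelet.path` that reporters used. The Butane fragment below is a hypothetical, untested sketch: the unit name is invented, and it only helps if it runs after the kubeconfig is already in place, since a freshly (re)started `PathExists=` unit re-checks existence directly on start rather than relying on inotify.

```yaml
# Hypothetical workaround sketch, not from this thread; the real fix was
# the updated SELinux policy noted later in the issue.
variant: fcos
version: 1.2.0
systemd:
  units:
    - name: rekick-kubelet-path.service   # invented name
      enabled: true
      contents: |
        [Unit]
        Description=Restart kubelet.path once on first boot (workaround sketch)
        After=kubelet.path
        ConditionFirstBoot=yes
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/systemctl restart kubelet.path
        [Install]
        WantedBy=multi-user.target
```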
Adds two tests for the ability of `systemd` to read and watch files labeled with `kubernetes_file_t`.
See: https://bugzilla.redhat.com/show_bug.cgi?id=1973418
See: coreos/fedora-coreos-tracker#861
See: containers/container-selinux#135
Opened https://bugzilla.redhat.com/show_bug.cgi?id=1980560 to track this.
Adds a test for the ability of `systemd` to watch files labeled with `kubernetes_file_t`.
See: coreos/fedora-coreos-tracker#861
See: containers/container-selinux#135
Adds a test for the ability of `systemd` to watch files labeled with `kubernetes_file_t`.
See: coreos/fedora-coreos-tracker#861
See: containers/container-selinux#135
Co-authored-by: Dusty Mabe <dusty@dustymabe.com>
The fix for this went into testing stream release
The fix for this went into stable stream release
Describe the bug
We are using Typhoon (https://typhoon.psdn.io/fedora-coreos/bare-metal/) to provision Fedora CoreOS and Kubernetes on bare metal. Since upgrading to F34 (last tested version 34.20210518.3.0), we see the following systemd.path unit not activating (often, but not always) on first boot:
The directory is created by Ignition:
From the logs we see that the `kubelet.path` is created by Ignition and started by systemd, and the `kubeconfig` is moved to `/etc/kubernetes/kubeconfig`, but the `kubelet.path` is still `waiting`:

Restarting the `kubelet.path` fixes it.

Is it a systemd bug? What could we do to avoid the manual step in provisioning?

Expected behavior

`kubelet.path` is running as soon as `/etc/kubernetes/kubeconfig` exists.

Actual behavior

A manual restart of `kubelet.path` is needed after the first boot.

System details