
umount failures from udev-worker process in Fedora 38+ #1475

Closed
jlebon opened this issue Apr 18, 2023 · 12 comments
Labels: jira (for syncing to jira)

Comments

@jlebon
Member

jlebon commented Apr 18, 2023

We see logs like this:

Apr 18 16:23:47 localhost (udev-worker)[534]: vda2: Process '/bin/sh -c '/bin/umount -fl /dev/vda2 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda2 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[536]: vda3: Process '/bin/sh -c '/bin/umount -fl /dev/vda3 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda3 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[535]: vda1: Process '/bin/sh -c '/bin/umount -fl /dev/vda1 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda1 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[529]: vda4: Process '/bin/sh -c '/bin/umount -fl /dev/vda4 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda4 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[535]: vda1: Process '/bin/sh -c '/bin/umount -fl /dev/vda1 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda1 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[536]: vda3: Process '/bin/sh -c '/bin/umount -fl /dev/vda3 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda3 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[529]: vda4: Process '/bin/sh -c '/bin/umount -fl /dev/vda4 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda4 that was still mounted, data may have been corrupted'' failed with exit code 32.
Apr 18 16:23:47 localhost (udev-worker)[534]: vda2: Process '/bin/sh -c '/bin/umount -fl /dev/vda2 && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/vda2 that was still mounted, data may have been corrupted'' failed with exit code 32.

Full logs: umount-error.journal.txt
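
For reference, the RUN command embedded in those messages implies a udev rule along these lines. This is only a reconstruction from the log text above, not the exact file shipped in the image: the match keys and the %k substitution are assumptions, and only the shell command comes from the logs.

# Sketch of the hot-remove rule implied by the journal messages above (match keys assumed).
ACTION=="remove", SUBSYSTEM=="block", \
  RUN+="/bin/sh -c '/bin/umount -fl /dev/%k && /usr/bin/logger -p daemon.warn -s WARNING: hot-removed /dev/%k that was still mounted, data may have been corrupted'"

Note that umount(8) exits with code 32 ("mount failure") when the target is not mounted, so whenever the removed device had nothing mounted on it, the chained command fails and udev logs exactly the kind of failure shown above.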

@dustymabe
Member

I think this isn't uncommon. It's happening on every F38 boot.

Here are some logs from the testing build we did last Friday: 38-20230414-2-0.journal.txt

@dustymabe dustymabe pinned this issue May 2, 2023
@dustymabe
Member

Pinned this ticket in our repo since I think it needs some attention.

@jlebon
Member Author

jlebon commented May 17, 2023

This should be fixed by GoogleCloudPlatform/guest-configs#51.

There's an interesting backstory to this. This originally came into FCOS as part of coreos/fedora-coreos-config#160. However, the rule was slightly tweaked from upstream: it had an additional ENV{ID_VENDOR}=="Google" check so that it only applied to GCP.

The rule was then removed in coreos/fedora-coreos-config#162 on the basis that it made FCOS act differently on GCP than on other platforms (due to the check that we added).

It was then re-added to FCOS as part of coreos/fedora-coreos-config#2350 but this time without the added check. This meant that it now ran on every platform, and not just GCP.

However, the rule itself had a logic error which made the system emit spurious warnings. And since it now ran everywhere, it was much easier to see it happen in QEMU logs like we did here.

Meanwhile, since this rule was introduced, systemd started using private mounts for systemd-udevd.service, which means that the rule doesn't actually work correctly. For this reason and others, I've proposed simply deleting it upstream.
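
As a quick check of that private-mounts behavior (a hedged aside: the expected output below is what current systemd should report, not something taken from the logs in this issue):

# systemd-udevd runs in a private mount namespace on recent systemd, so an
# umount issued from a udev RUN program only detaches the mount inside
# udevd's own namespace, not in the host's mount table.
systemctl show systemd-udevd.service --property=PrivateMounts
# expected: PrivateMounts=yes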

We could manually delete our copy too, though we'll do #1494 soon, which will also take care of it (assuming the upstream PR is merged and the package is bumped). We'll still want to drop it from RHCOS, though, since we'll be carrying the other rules manually there until the -udev subpackage becomes available.

@cgwalters
Member

Also related downstream https://issues.redhat.com/browse/OCPBUGS-13754

@bgilbert
Contributor

OCPBUGS-13754 appears unrelated, except that it's also broadly about the GCP rules.

@cgwalters
Member

Hm, the reason I linked these is jlebon's comment:

it had an additional ENV{ID_VENDOR}=="Google" check so that it only applied to GCP.

which seems quite relevant, because I hit this just today with e.g.:

$ cosa run --qemu-image fedora-coreos-38.20230514.2.0-qemu.x86_64.qcow2 --qemu-nvme
...
[root@cosa-devsh ~]# ls -al /dev/disk/by-id/
total 0
drwxr-xr-x. 2 root root 440 May 17 20:32 .
drwxr-xr-x. 9 root root 180 May 17 20:32 ..
lrwxrwxrwx. 1 root root  13 May 17 20:32 google-primary-disk -> ../../nvme0n1
lrwxrwxrwx. 1 root root  15 May 17 20:32 google-primary-disk-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root  15 May 17 20:32 google-primary-disk-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root  15 May 17 20:32 google-primary-disk-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root  15 May 17 20:32 google-primary-disk-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root  13 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk -> ../../nvme0n1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root  13 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk_1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk_1-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk_1-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk_1-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-QEMU_NVMe_Ctrl_primary-disk_1-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root  13 May 17 20:32 nvme-nvme.1b36-7072696d6172792d6469736b-51454d55204e564d65204374726c-00000001 -> ../../nvme0n1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-nvme.1b36-7072696d6172792d6469736b-51454d55204e564d65204374726c-00000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-nvme.1b36-7072696d6172792d6469736b-51454d55204e564d65204374726c-00000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-nvme.1b36-7072696d6172792d6469736b-51454d55204e564d65204374726c-00000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root  15 May 17 20:32 nvme-nvme.1b36-7072696d6172792d6469736b-51454d55204e564d65204374726c-00000001-part4 -> ../../nvme0n1p4
[root@cosa-devsh ~]# 
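
Those google-* links line up with the NVMe serial that cosa assigns to the QEMU NVMe controller ("primary-disk", also visible in the nvme-QEMU_NVMe_Ctrl_primary-disk entries above). That is consistent with a by-id naming rule keyed only on the device serial and lacking any Google vendor guard, roughly of this shape (a sketch of the kind of rule involved, not the exact upstream rules file):

# Sketch: a rule that builds google-<serial> symlinks from the NVMe serial
# alone will also match a QEMU NVMe device whose serial happens to be
# "primary-disk". Only the symlink naming pattern is grounded in the output above.
KERNEL=="nvme*n*", ENV{DEVTYPE}=="disk", ATTRS{serial}=="?*", \
  SYMLINK+="disk/by-id/google-$attr{serial}"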

@bgilbert
Contributor

Ah, yup, you're right. The downstream report has the aliases showing up on bare metal nodes.

@jlebon
Member Author

jlebon commented May 19, 2023

Also related downstream https://issues.redhat.com/browse/OCPBUGS-13754

For reference, I've opened GoogleCloudPlatform/guest-configs#52 for this.

@dustymabe
Member

dustymabe commented Jun 5, 2023

This should be fixed by coreos/fedora-coreos-config#2450

@dustymabe dustymabe added the status/pending-testing-release (Fixed upstream. Waiting on a testing release.) and status/pending-next-release (Fixed upstream. Waiting on a next release.) labels Jun 5, 2023
@dustymabe dustymabe unpinned this issue Jun 6, 2023
@dustymabe
Member

The fix for this went into next stream release 38.20230609.1.0. Please try out the new release and report issues.

@dustymabe
Member

The fix for this went into testing stream release 38.20230609.2.1. Please try out the new release and report issues.

@dustymabe dustymabe changed the title from "Odd udev-worker initrd log entries complaining about umount failures in Fedora 38+" to "umount fails from udev-worker process in Fedora 38+" Jun 13, 2023
@dustymabe dustymabe added the status/pending-stable-release (Fixed upstream and in testing. Waiting on stable release.) label and removed the status/pending-testing-release (Fixed upstream. Waiting on a testing release.) and status/pending-next-release (Fixed upstream. Waiting on a next release.) labels Jun 13, 2023
@dustymabe dustymabe changed the title from "umount fails from udev-worker process in Fedora 38+" to "umount failures from udev-worker process in Fedora 38+" Jun 13, 2023
@dustymabe
Member

The fix for this went into stable stream release 38.20230609.3.0.

@dustymabe dustymabe removed the status/pending-stable-release (Fixed upstream and in testing. Waiting on stable release.) label Jun 27, 2023