Test iso-offline-install on multipath on ppc64le and aarch64 failing coreos-ignition-unique-boot.service check #1373
Closed
jlebon changed the title on Jan 9, 2023 from "Test iso-offline-install on multipath on ppc64le failing coreos-ignition-unique-boot.service check" to "Test iso-offline-install on multipath on ppc64le and aarch64 failing coreos-ignition-unique-boot.service check"
The ppc64le one happened on rawhide and f37. The aarch64 one on f37 with the following diff:
It doesn't happen all the time, so there seems to be a flaky component to it.
This seems to have happened again in:
Just saw this on
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue on Jan 18, 2023:

We're hitting an issue right now where `coreos-ignition-unique-boot.service` (backed by `rdcore`) is failing on multipath with:

```
Error: System has 2 devices with a filesystem labeled 'boot': ["/dev/sdb3", "/dev/mapper/mpatha3"]
```

The unique label detection code in `rdcore` determines whether multiple lower-level devices actually refer to the same higher-level device (e.g. multipath or RAID1) by looking at the filesystem UUID. It uses blkid to query device UUIDs.

libblkid maintains a cache of devices to avoid reprobing all devices all the time. This cache normally gets updated (I *think* via udev, but I'm not sure) when changes occur. But something changed recently at least in the multipath case where the cache is only updated for the multipathed device, but not the underlying backing paths. This then leads `rdcore` to think that they're separate devices.

We probably should make `rdcore` smarter here in how it handles multipath devices, but still we don't want to have this stale cache around for the sake of other tools relying on it.

We started hitting this more frequently starting with kernel v6.0.17, but the issue triggers equally as easily on v6.0.16 when reproduced artificially. So I think we've just been lucky so far that this hasn't bit us (possibly we raced with another service that helped refresh the cache).

There's likely a bug here either in the kernel, or multipath or blkid. This is tracked by https://bugzilla.redhat.com/show_bug.cgi?id=2162151. Until then, nuke the blkid cache to force a reprobe on the next call.

Closes: coreos/fedora-coreos-tracker#1373
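To make the failure mode in the commit message above easier to picture, here is a minimal Python sketch of UUID-based deduplication. It is not `rdcore`'s actual code (which is written in Rust); the function names `group_by_uuid` and `label_is_unique`, the dictionary layout, and the sample UUID are all hypothetical. It also assumes, purely for illustration, that a stale cache shows up as a missing UUID on a backing path.

```python
from collections import defaultdict


def group_by_uuid(devices, label):
    """Group the paths of all devices carrying `label` by filesystem UUID.

    `devices` is a list of dicts with "path", "label" and "uuid" keys
    (a hypothetical stand-in for what a blkid query would return).
    """
    groups = defaultdict(list)
    for dev in devices:
        if dev["label"] != label:
            continue
        # A device without a known UUID cannot be matched to any other
        # device, so it is keyed on its own path and forms its own group.
        key = dev["uuid"] or dev["path"]
        groups[key].append(dev["path"])
    return dict(groups)


def label_is_unique(devices, label):
    """Return True if at most one distinct filesystem carries `label`."""
    return len(group_by_uuid(devices, label)) <= 1


# Hypothetical sample data: two backing paths plus the multipath device.
fresh = [
    {"path": "/dev/sda3", "label": "boot", "uuid": "1234-abcd"},
    {"path": "/dev/sdb3", "label": "boot", "uuid": "1234-abcd"},
    {"path": "/dev/mapper/mpatha3", "label": "boot", "uuid": "1234-abcd"},
]
# Same devices, but one backing path has no usable UUID (assumed stale entry).
stale = [dict(d) for d in fresh]
stale[1]["uuid"] = None

print(label_is_unique(fresh, "boot"))
print(label_is_unique(stale, "boot"))
```

With consistent UUIDs all three paths collapse into one group and the label counts as unique. Once one backing path loses its UUID it can no longer be tied to the multipath device, which is the kind of mismatch the commit message attributes to the stale blkid cache.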
Filed https://bugzilla.redhat.com/show_bug.cgi?id=2162151.
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue on Jan 19, 2023, with the same commit message as above.
HuijingHei pushed two commits to HuijingHei/fedora-coreos-config that referenced this issue on Oct 10, 2023, each with the same commit message as above.
coreos-installer runs successfully and reboots the machine, and then:
I suspect something is going wrong with `rdcore verify-unique-fs-label`'s multipath detection.
Attachment: iso-offline-install.zip