Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISO mount still racing? #437

Closed
cgwalters opened this issue May 29, 2020 · 18 comments
Closed

ISO mount still racing? #437

cgwalters opened this issue May 29, 2020 · 18 comments
Assignees
Labels
jira For syncing to Jira. Only works for issues (i.e. not PRs)

Comments

@cgwalters
Copy link
Member

Even after #411 I just saw this in a CI run for RHCOS:

"Found device QEMU_DVD-ROM rhcos-46.82.202005292045-0."
"Mounting /run/media/iso..."
"mount: /run/media/iso: /dev/sr0 already mounted or mount point busy."
"run-media-iso.mount: Mount process exited, code=exited status=32"
"run-media-iso.mount: Failed with result 'exit-code'."
"Failed to mount /run/media/iso."

I'm 87.2% sure we have that PR, though I need to triple check.

@dustymabe
Copy link
Member

dustymabe commented Jun 18, 2020

I just saw it locally:

[    5.131654] ata1.01: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100                                                                                                                                                                                                                                          
[    5.134278] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5                                                                                                                                                                                                              
[    5.136662] scsi 0:0:1:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5                                                                                                                                                                                                              
[    5.167197] scsi 0:0:0:0: Attached scsi generic sg0 type 0                                                                                                                                                                                                                                            
[    5.168497] scsi 0:0:1:0: Attached scsi generic sg1 type 5                                                                                                                                                                                                                                            
[    5.175112] sd 0:0:0:0: [sda] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)                                                                                                                                                                                                                    
[    5.176458] sd 0:0:0:0: [sda] Write Protect is off                                                                                               
[    5.177427] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.190230] sr 0:0:1:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    5.191576] cdrom: Uniform CD-ROM driver Revision: 3.20
[    5.306952] e1000 0000:00:02.0 eth0: (PCI:33MHz:32-bit) 52:54:00:47:bc:0e
[    5.308249] e1000 0000:00:02.0 eth0: Intel(R) PRO/1000 Network Connection
[    5.311430] systemd-udevd[618]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.                         
[    5.311575] e1000 0000:00:02.0 ens2: renamed from eth0                                                                                           
[    5.338689] sd 0:0:0:0: [sda] Attached SCSI disk
[  OK  ] Found device QEMU_DVD-ROM rhcos-46.82.202006181745-0.   
[    5.450194] systemd[1]: Found device QEMU_DVD-ROM rhcos-46.82.202006181745-0.
         Mounting /run/media/iso...                                                                                                                 
[    5.452506] systemd[1]: Mounting /run/media/iso...           
[    5.516750] mount[657]: mount: /run/media/iso: /dev/sr0 already mounted or mount point busy.
[FAILED] Failed to mount /run/media/iso.           
See 'systemctl status run-media-iso.mount' for details.    
[    5.524089] systemd[1]: run-media-iso.mount: Mount process exited, code=exited status=32
[DEPEND] Dependency failed for /sysroot.                          
[DEPEND] Dependency failed for Initrd Root File System.                                                                                             
[    5.527851] systemd[1]: run-media-iso.mount: Failed with result 'exit-code'.
[DEPEND] Dependency failed for Ignition (files).                                                                                                    
[    5.529851] systemd[1]: Failed to mount /run/media/iso.   
[DEPEND] Dependency failed for Ignition Complete.                                                                                                   
[DEPEND] Dependency failed for Initrd Default Target. 

@dustymabe
Copy link
Member

dustymabe commented Jun 18, 2020

reproduced 3 times in a row with:

virt-install --name cdrom --ram 4500 --vcpus 4 --disk size=20,cache=unsafe --accelerate --network bridge=virbr0 --graphics=none --rng random --cdrom /var/b/images/rhcos-46.82.202006181745-0-live.x86_64.iso --boot=uefi

I stop at the boot prompt and edit the kargs to tell it to do an install and then it fails. <-- You don't need to do that part in order to reproduce

@dustymabe
Copy link
Member

Interesting.. I think memory plays a role here. If I bump to --ram 6500 it's less consistent to fail.

@dustymabe
Copy link
Member

I think --boot=uefi plays a role too.

@dustymabe
Copy link
Member

FTR: I can't reproduce with an FCOS ISO (fedora-coreos-32.20200615.2.0-live.x86_64.iso) vs a RHCOS one (rhcos-46.82.202006181745-0-live.x86_64.iso)

@jlebon jlebon added the jira For syncing to Jira. Only works for issues (i.e. not PRs) label Jun 22, 2020
@dustymabe
Copy link
Member

Colin has a PR over in #501 that does seem to help the problem. I crafted a set up where I use the exact same ISO except just switching out the live generator with the ro setting. The problem reproduces pretty easily without the ro setting. It's harder to reproduce with the ro setting. I did 10 tests and it didn't fail until I got to #7. Then it failed for #8 and #9, but succeeded for #10. I think it's easier to reproduce when the VM host is under load. I also found it harder to reproduce when I had more than one network card attached to the VM (i.e. many things can affect the race condition).

@rollandf
Copy link

@dustymabe Are you testing with a single VM each time or a few in parallel?

@cgwalters
Copy link
Member Author

You're right, it is just less racy now 😢 Right after my 10/10 boots (which before would always be like 6/10) I got a failure. Next time, I'm testing 100 boots at least.

I think the big hammer to use here is
After=systemd-udev-settle.service
but the thing is that's unfortunate to use exactly on the potentially large bare metal servers with lots of devices that are likely targets of the ISO.

The DefaultDependencies=no we have sprinkled all around are suspicious too.

Also I do see some of our services doing After=systemd-udevd.service.

@jlebon
Copy link
Member

jlebon commented Jun 29, 2020

For those of you able to reproduce this, can you add systemd.log_level=debug systemd.log_target=console systemd.journald.forward_to_console=1? Given that it's racy, it's possible that'll interfere with it. Though if you do get a repro with more logging, it might help diagnose this further.

@cgwalters cgwalters self-assigned this Jun 29, 2020
@cgwalters
Copy link
Member Author

OK so I looked at the output of what systemd-fstab-generator and dracut do on a traditional system and I noticed a few things; working on this.

@cgwalters
Copy link
Member Author

Currently playing with this:

diff --git a/overlay.d/05core/usr/lib/dracut/modules.d/20live/live-generator b/overlay.d/05core/usr/lib/dracut/modules.d/20live/live-generator
index 5ab8b55..fd8b8e9 100755
--- a/overlay.d/05core/usr/lib/dracut/modules.d/20live/live-generator
+++ b/overlay.d/05core/usr/lib/dracut/modules.d/20live/live-generator
@@ -78,14 +78,19 @@ else
     mkdir -p /run/media/iso
     isosrc=dev/disk/by-label/${isoroot}
     isosrc_escaped=$(systemd-escape -p --suffix=device "${isosrc}")
+    initrd_rootdev_target_d="${UNIT_DIR}"/initrd-root-device.target.d
+    mkdir -p "${initrd_rootdev_target_d}"
+    cat > "${initrd_rootdev_target_d}/50-root-device.conf" <<EOF
+[Unit]
+After=${isosrc_escaped}
+Requires=${isosrc_escaped}
+EOF
     cat >"${UNIT_DIR}/run-media-iso.mount" <<EOF
 # Automatically generated by live-generator
 
 [Unit]
-DefaultDependencies=false
+After=initrd-root-device.target local-fs-pre.target
 Before=initrd-root-fs.target
-After=${isosrc_escaped}
-Requires=${isosrc_escaped}
 
 [Mount]
 What=/${isosrc}
@@ -98,8 +103,8 @@ EOF
 # Automatically generated by live-generator
 
 [Unit]
-DefaultDependencies=false
 Before=initrd-root-fs.target
+After=run-media-iso.mount
 Requires=run-media-iso.mount
 
 [Mount]
@@ -113,11 +118,10 @@ common_etcvar_unit() {
     cat << EOF
 # Automatically generated by live-generator
 [Unit]
-DefaultDependencies=false
-
 # Make sure /sysroot is mounted first, since we're mounting under there
-Requires=initrd-root-fs.target
-After=initrd-root-fs.target
+Requires=sysroot.mount
+After=sysroot.mount
+Before=initrd-root-fs.target
 
 # Make sure our tmpfs is available
 RequiresMountsFor=/writable

@cgwalters
Copy link
Member Author

One notable thing is I think our overlay hack for /etc really needs to be Before=initrd-root-fs.target and not After= it's part of assembling the root filesystem. I don't offhand have a theory for why that would cause the mount failure, but I noticed it when looking at dependencies.

@cgwalters
Copy link
Member Author

I was trying to address the race and SELinux enforcing separately, but it turns out they're related; the hack we're carrying to do the sed also has weak dependencies. So any attempt to fix this should involve using enforcing=0 on the kernel cmdline by default in the ISO rather than the hack we're carrying (or as I'm trying now, fix both at once).

Other notes

  • I can't see anything obvious in the logs corresponding to failures; whatever is causing this isn't emitting anything to the journal

A few avenues to explore

  • Write a systemtap module that logs mount invocations and the process
  • diff the fcos/rhcos initramfs modules and look for any differences in udev rules (my most likely suspect)
  • write tooling to more efficiently iterate on changing the initramfs for the ISO
  • Implement Add storage: volatile: true ignition#935 (which is probably generally useful) and see if that makes this reproduce outside of the ISO/live path
  • deduplicate the live-generator with our regular rootfs mount logic so we e.g. have a similar abstract dependency chain for finding the rootfs

@jlebon
Copy link
Member

jlebon commented Jun 30, 2020

I guess one major hack we could do meanwhile is switch the mount unit to a systemd service that retries mounting a bunch of times until it succeeds.

@cgwalters
Copy link
Member Author

One chain of possibility I ran through here is that the different Ignition versions might be a factor - ignition-mount.service is a critical serialization point right now because it orders ostree-prepare-root.service. But since we made that a stub in spec2x I think things are OK there.

I guess one major hack we could do meanwhile is switch the mount unit to a systemd service that retries mounting a bunch of times until it succeeds.

Yeah. I am trying out now doing After=systemd-udev-settle.service but if that doesn't pan out, will try that.

@jlebon
Copy link
Member

jlebon commented Jun 30, 2020

Write a systemtap module that logs mount invocations and the process

You can also try using ftrace probes via the kernel cmdline. Here's what I was playing with (but with the qemu image):

(host)$ cosa run -c --kargs 'ftrace=function ftrace_filter=__x64_sys_mount rd.break'
...
(in emergency shell)
sh-5.0# mount -t tracefs tracefs /sys/kernel/tracing
sh-5.0# cat /sys/kernel/tracing/trace
# tracer: function
#
# entries-in-buffer/entries-written: 54/54   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
         systemd-1     [001] ....     2.645388: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.645472: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.645499: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.645640: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647609: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647641: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647674: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647790: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647859: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.647909: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.648006: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.648078: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.648113: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.662276: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.662502: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.662644: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.663151: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.663269: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.672740: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.673183: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.673352: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.673455: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.673597: __x64_sys_mount <-do_syscall_64
         systemd-1     [001] ....     2.673867: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.500794: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.500801: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.500978: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.501064: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.501718: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.502493: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.502868: __x64_sys_mount <-do_syscall_64
      (md-udevd)-350   [002] ....     3.502878: __x64_sys_mount <-do_syscall_64
           mount-403   [003] ....     3.994890: __x64_sys_mount <-do_syscall_64
      (-network)-431   [000] ....     5.385530: __x64_sys_mount <-do_syscall_64
      (-network)-431   [000] ....     5.385554: __x64_sys_mount <-do_syscall_64
      (-network)-431   [000] ....     5.385596: __x64_sys_mount <-do_syscall_64
      (-network)-431   [000] ....     5.385602: __x64_sys_mount <-do_syscall_64
      (tup-user)-435   [000] ....     5.401413: __x64_sys_mount <-do_syscall_64
      (tup-user)-435   [000] ....     5.401421: __x64_sys_mount <-do_syscall_64
      (tup-user)-435   [000] ....     5.401480: __x64_sys_mount <-do_syscall_64
      (tup-user)-435   [000] ....     5.401491: __x64_sys_mount <-do_syscall_64
           mount-433   [000] ....     5.406290: __x64_sys_mount <-do_syscall_64
           mount-441   [001] ....     5.425259: __x64_sys_mount <-do_syscall_64
           mount-488   [003] ....     5.672217: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.658932: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659030: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659072: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659085: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659145: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659154: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659166: __x64_sys_mount <-do_syscall_64
 ostree-prepare--555   [001] ....     6.659182: __x64_sys_mount <-do_syscall_64
           mount-561   [002] ....     6.742653: __x64_sys_mount <-do_syscall_64
           mount-706   [002] ....    22.126260: __x64_sys_mount <-do_syscall_64

I'm not sure though if there's a way for it to pretty print the function arguments too. But it might be enough to figure out what other process is doing mounts.

Another approach if this all happens after the cmdline hook is to rd.break=cmdline and then systemd-run strace ....

@cgwalters
Copy link
Member Author

You can also try using ftrace probes via the kernel cmdline.

Cool! I had not tried ftrace before. Definitely an easy way to set up some tracing from the shell in an initramfs without extra tools/setup.

For now I'm focusing on SELinux.

@cgwalters
Copy link
Member Author

I think this is fixed by #499

c4rt0 pushed a commit to c4rt0/fedora-coreos-config that referenced this issue Mar 27, 2023
FAQ: Link to building.md and enhance it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira For syncing to Jira. Only works for issues (i.e. not PRs)
Projects
None yet
Development

No branches or pull requests

4 participants