This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

bootengine: some udev rules are in rootfs but missing in initramfs #2481

Closed
r7vme opened this issue Jul 30, 2018 · 16 comments

r7vme commented Jul 30, 2018

Issue Report

A proper way to detect disks in Azure was added here: udev rules create symlinks like /dev/disk/azure/scsi1/lun0. Unfortunately, I cannot use them from Ignition.

The following config fails with: Timed out waiting for device dev-disk-azure-scsi1-lun0.device

storage:
  filesystems:
    - name: docker
      mount:
        device: /dev/disk/azure/scsi1/lun0
        format: xfs
        wipe_filesystem: true
        label: docker
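The unit name in that error is systemd's escaped form of the configured path. Ignition waits for the systemd device unit that corresponds to /dev/disk/azure/scsi1/lun0, and that unit only appears once udev has created the symlink. Below is a minimal Python sketch of the path-to-unit-name conversion. It assumes the escaping rules described in systemd.unit(5) and systemd-escape(1): drop empty path components, turn "/" into "-", and write every byte other than ASCII letters, digits, ":", "_" and "." as a \xNN escape, including a leading ".". escape_path and device_unit are hypothetical helper names, not part of Ignition or systemd.

```python
# Bytes that systemd keeps as-is when escaping a path into a unit name.
VALID_BYTES = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789:_."
)


def escape_path(path: str) -> str:
    """Escape a path roughly the way `systemd-escape --path` does (simplified sketch)."""
    # Collapse duplicate slashes and strip leading/trailing ones.
    components = [c for c in path.split("/") if c]
    if not components:
        return "-"  # the root directory escapes to "-"
    out = []
    for i, byte in enumerate("/".join(components).encode("utf-8")):
        if byte == ord("/"):
            out.append("-")
        elif byte in VALID_BYTES and not (i == 0 and byte == ord(".")):
            out.append(chr(byte))
        else:
            # "-", "\", other punctuation, non-ASCII bytes and a leading "."
            out.append("\\x%02x" % byte)
    return "".join(out)


def device_unit(path: str) -> str:
    """Name of the systemd device unit that tracks `path`."""
    return escape_path(path) + ".device"
```

With these rules, device_unit("/dev/disk/azure/scsi1/lun0") yields dev-disk-azure-scsi1-lun0.device, the unit named in the error. A "-" inside a path component is escaped, so /dev/disk/by-id/a-b becomes dev-disk-by\x2did-a\x2db.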

Inside the debug console, I see no rules for Azure (which is probably expected, as the initramfs does not supply them):

:/# ls /usr/lib/udev/rules.d/
10-dm.rules                  63-md-raid-arrays.rules    80-drivers.rules
13-dm-disk.rules             64-btrfs.rules             80-net-setup-link.rules
50-udev-default.rules        64-md-raid-assembly.rules  90-vconsole.rules
60-block.rules               71-seat.rules              95-dm-notify.rules
60-cdrom_id.rules            73-seat-late.rules         99-systemd.rules
60-persistent-storage.rules  75-net-description.rules
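One quick way to find which rules the real root filesystem ships but the initramfs lacks is to compare the two rules.d directories by file name. This is a minimal sketch. It assumes both directories are visible from one shell, for example the booted rootfs plus an initramfs image unpacked into a scratch directory. It compares file names only. It does not diff rule contents or check that helpers referenced by PROGRAM or IMPORT{program} are also present.

```python
from __future__ import annotations

from pathlib import Path


def rule_names(rules_dir: Path) -> set[str]:
    """Return the *.rules file names in a udev rules directory (empty if it is missing)."""
    if not rules_dir.is_dir():
        return set()
    return {p.name for p in rules_dir.glob("*.rules")}


def missing_in_initramfs(rootfs_rules: Path, initramfs_rules: Path) -> list[str]:
    """Sorted names of rules present in the rootfs but absent from the initramfs."""
    return sorted(rule_names(rootfs_rules) - rule_names(initramfs_rules))
```

For example, missing_in_initramfs(Path("/usr/lib/udev/rules.d"), Path("/tmp/initrd/usr/lib/udev/rules.d")) returns the rule files that exist only on the rootfs side. In that call, /tmp/initrd is a hypothetical unpack location.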

I assume the same applies to Ignition on AWS (NVMe disks).

It would be really great to use Ignition "filesystems", as it has many advantages over systemd units that format and mount filesystems.

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.7.0
VERSION_ID=1745.7.0
BUILD_ID=2018-06-14-0909
PRETTY_NAME="Container Linux by CoreOS 1745.7.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Azure

Expected Behavior

Filesystem created on /dev/disk/azure/scsi1/lun0

Actual Behavior

Boot fails because Ignition times out waiting for the device.

Reproduction Steps

  1. Create an Azure VM with an Ignition config containing the filesystems snippet above.
  2. The VM fails with a provisioning timeout.

Other Information

N/A

r7vme changed the title from "Unable to use Azure disk (and AWS nvme disks) symlinks with ignition" to "Unable to format Azure disk (and AWS nvme disks) with ignition" Jul 30, 2018
r7vme pushed a commit to giantswarm/giantnetes-terraform that referenced this issue Jul 30, 2018
Switch to systemd units. See coreos/bugs#2481
@ajeddeloh

Whoops. Looks like we forgot to pull them into the initramfs as well. Should grab coreos/init#268 while we're at it.

r7vme commented Aug 7, 2018

Label "jira" probably means that this bug is already tracked internally. :)

Do you have a rough estimate?

lucab commented Aug 7, 2018

@r7vme yes, but work has not started on it yet. You'll likely see a pingback as soon as it is picked up.

r7vme commented Aug 28, 2018

Hi, is there any way I can help with this issue? Some high-level steps would help.

We (and, I assume, many others who use Ignition with AWS or Azure) are waiting for this fix. Bringing back systemd units that format disks is the last thing I want to do :)

lucab commented Aug 28, 2018

@r7vme at a very high level, the problem is that bootengine does not install any of the udev-related cloud bits that are used in the real rootfs. At a low level, fixing it means setting up proper dependencies between the ebuilds, installing those bits via a dracut module, and testing the result in Azure.

I haven't personally looked into this, so I don't know whether it was simply overlooked or whether it is already done somewhere and just some bits are missing.

r7vme commented Aug 30, 2018

@lucab thanks for the instructions.

I've created a PR, but I'm not 100% sure about the ebuild you mentioned; I'm not sure how I can use ebuilds in dracut. I've only added a udev rule for Azure disks.

lucab commented Aug 30, 2018

@r7 where did you source those udev rules from? Are they already in CL rootfs somewhere?

We should probably have them normally in CL root first, and then tell dracut to copy them to the initramfs. The ebuilds for the two components are in our overlay: bootengine and coreos-init. From those packages we assemble the content of the images.

r7vme commented Aug 30, 2018

where did you source those udev rules from? Are they already in CL rootfs somewhere?

https://github.com/coreos/init/blob/master/udev/rules.d/66-azure-storage.rules

We should probably have them normally in CL root first, and then tell dracut to copy them to the initramfs.

Aha, I can do that.

r7vme commented Aug 30, 2018

@lucab updated my PR.

r7vme commented Aug 31, 2018

Thanks for the review. I agree that this issue should only be closed once AWS is fixed too. Should it be reopened, then?

lucab reopened this Aug 31, 2018
lucab changed the title from "Unable to format Azure disk (and AWS nvme disks) with ignition" to "bootengine: some udev rules are in rootfs but missing in initramfs" Aug 31, 2018

lucab commented Aug 31, 2018

Ack, re-opened. While the specific Azure+SCSI usecase should be unblocked, there may be a few more items missing (like AWS NVMe) and we should check that relevant scripts and helpers/utilities are present in the initramfs.

r7vme commented Aug 31, 2018

PR that bumps bootengine in coreos-overlay: coreos/coreos-overlay#3396

r7vme commented Sep 26, 2018

PR for AWS disks: coreos/bootengine#149

seh commented Dec 3, 2018

I find that even with the AWS EBS rules in place, I can use "storage.filesystems" with an unstable path like /dev/nvme1n1 in the "storage.filesystems.mount.device" field. However, I can't use the more stable name, like /dev/sdf, that I have to supply to AWS when attaching the volume. Ignition consistently times out waiting for the corresponding systemd "device unit" to start, as noted in coreos/coreos-overlay#3366.

In my EC2 instance, I can see the rules present that @r7vme had originally pointed out were missing:

% grep -C3 'Elastic Block' /usr/lib/udev/rules.d/90-cloud-storage.rules 
## AWS EBS NVMe names
## https://github.com/coreos/bugs/issues/2399
# NVMe devices
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", ATTRS{serial}=="?*", SYMLINK+="disk/by-id/nvme-$attr{model}_$attr{serial}-ns-%n", OPTIONS+="string_escape=replace"
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", ATTRS{serial}=="?*", PROGRAM="cloud_aws_ebs_nvme_id -d /dev/%k", SYMLINK+="%c"
# NVMe partitions
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", ATTRS{serial}=="?*", IMPORT{program}="cloud_aws_ebs_nvme_id -n /dev/%k"
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", ATTRS{serial}=="?*", ENV{_NS_ID}=="?*", SYMLINK+="disk/by-id/nvme-$attr{model}_$attr{serial}-ns-$env{_NS_ID}-part%n", OPTIONS+="string_escape=replace"
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", ATTRS{serial}=="?*", ENV{_NS_ID}=="?*", PROGRAM="cloud_aws_ebs_nvme_id -d /dev/%k", SYMLINK+="%c%n"

# TODO: Anyone else support friendly names?

Those rules do indeed work, but is the problem that they wind up working too late for Ignition to use when creating the filesystems?

% ls -l /dev/sdf
lrwxrwxrwx. 1 root root 7 Dec  3 18:46 /dev/sdf -> nvme1n1
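The /dev/sdf name is not derived from the kernel name. The PROGRAM="cloud_aws_ebs_nvme_id -d /dev/%k" rules ask a helper for the name that was requested when the volume was attached, and SYMLINK+="%c" turns the helper's output into the link. Below is a minimal Python sketch of how such a helper can recover that name from a raw NVMe Identify Controller buffer. The layout is an assumption borrowed from Amazon's ebsnvme-id script, not from cloud_aws_ebs_nvme_id's source. In that layout, the 4096-byte structure carries the model number at bytes 24-63 and a vendor-specific area starting at byte 3072. The first 32 bytes of that area are assumed to hold the attach-time name (for example "sdf" or "/dev/sdf"), padded with spaces or NULs. Issuing the Identify ioctl itself is out of scope, and ebs_block_device is a hypothetical name.

```python
from typing import Optional

IDENTIFY_SIZE = 4096              # size of the NVMe Identify Controller data structure
MODEL_OFFSET, MODEL_LEN = 24, 40  # "mn" (model number) field, space padded
VS_OFFSET = 3072                  # start of the vendor-specific area
BDEV_LEN = 32                     # ASSUMED (per Amazon's ebsnvme-id): attach-time name length
EBS_MODEL = "Amazon Elastic Block Store"


def _field(buf: bytes, offset: int, length: int) -> str:
    """Decode a fixed-width ASCII field, dropping NUL and space padding."""
    raw = buf[offset:offset + length].split(b"\0", 1)[0]
    return raw.decode("ascii", "replace").strip()


def ebs_block_device(identify: bytes) -> Optional[str]:
    """Return the attach-time device name (e.g. "sdf") of an EBS volume, or None."""
    if len(identify) != IDENTIFY_SIZE:
        raise ValueError("expected a 4096-byte Identify Controller buffer")
    if _field(identify, MODEL_OFFSET, MODEL_LEN) != EBS_MODEL:
        return None  # not an EBS controller, so there is no vendor-specific name to trust
    name = _field(identify, VS_OFFSET, BDEV_LEN)
    # udev SYMLINK values are relative to /dev, so drop an explicit prefix.
    if name.startswith("/dev/"):
        name = name[len("/dev/"):]
    return name or None
```

Returning the name without the /dev/ prefix matches how the rule uses it, because SYMLINK paths are relative to /dev. For partitions, the last rule appends the kernel partition number (%c%n), so nvme1n1p1 would get /dev/sdf1.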

seh commented Dec 4, 2018

Please see #2531 for related trouble.

r7vme pushed a commit to giantswarm/giantnetes-terraform that referenced this issue Dec 20, 2018
Requires CoreOS 1995.0.0.

We can now use predictable disk names with ignition
as coreos/bugs#2481 was fixed.

TODO: Enable docker disk wipe
r7vme pushed a commit to giantswarm/giantnetes-terraform that referenced this issue Dec 24, 2018
* Use persistent paths for disks

Requires CoreOS 1995.0.0.

We can now use predictable disk names with ignition
as coreos/bugs#2481 was fixed.

TODO: Enable docker disk wipe

* Wipe docker disks for master VMs

* Use 1995.0.0 in CI

r7vme commented Dec 28, 2018

For everyone who comes here: we finally switched to 1995.0.0 (currently in the alpha channel), which has the fixes for both AWS and Azure.
