AWS Fedora CoreOS missing /dev/xvd* symlinks #601

Open
velothump opened this issue Aug 12, 2020 · 17 comments

@velothump

CoreOS used to symlink /dev/xvd* to the /dev/nvme devices: coreos/bugs#2399.

We're attempting to migrate Kubernetes workloads, and the kubelet is attempting to mount the /dev/xvd* devices.

```
State: idle
Deployments:
* ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200715.3.0 (2020-07-27T11:36:29Z)
                    Commit: a3b08ee51b1d950afd9d0d73f32d5424ad52c7703a6b5830e0dc11c3a682d869
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
```

@lucab
Contributor

lucab commented Aug 12, 2020

Thanks for the report. This is a duplicate of #104.

@davdunc from AWS was having a look at this. Our hope is that they eventually manage to upstream this, possibly into systemd default udev rules: systemd/systemd#11532.

@velothump
Author

Thanks @lucab. I thought I was almost there using the files from https://github.com/coreos/fedora-coreos-config/pull/476/files, but it turns out the nvme binary isn't available on FCOS. Are there any plans to add it? And do you have any suggestions for working around this in the meantime?
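For context on why the nvme binary matters: as we understand it, EBS stores the block device name requested at attach time in the vendor-specific area of the NVMe Identify Controller data, and helper scripts read that page with `nvme id-ctrl --raw-binary`. The Python sketch below shows only the parsing step. It assumes the layout used by ec2-utils' `ebsnvme-id` (a 32-byte, space-padded name at byte offset 3072); `ebs_device_name` is a hypothetical name, not something taken from the PR.

```python
# Byte layout ASSUMED from ec2-utils' ebsnvme-id, not from the PR.
VS_OFFSET = 3072  # start of the vendor-specific area in the 4096-byte page
BDEV_LEN = 32     # size of the space-padded device-name field


def ebs_device_name(id_ctrl: bytes) -> str:
    """Return the device name EBS reports (e.g. "xvdf"), minus any /dev/ prefix."""
    if len(id_ctrl) < VS_OFFSET + BDEV_LEN:
        raise ValueError("identify-controller data is too short")
    raw = id_ctrl[VS_OFFSET:VS_OFFSET + BDEV_LEN]
    # Strip the space padding (and tolerate NUL padding as well).
    name = raw.decode("ascii").strip(" \x00")
    if name.startswith("/dev/"):
        name = name[len("/dev/"):]
    return name


if __name__ == "__main__":
    # Synthetic page: only the device-name field is filled in.
    page = bytearray(4096)
    page[VS_OFFSET:VS_OFFSET + BDEV_LEN] = b"/dev/xvdf".ljust(BDEV_LEN, b" ")
    print(ebs_device_name(bytes(page)))  # prints: xvdf
```

On a real host the page would come from `nvme id-ctrl --raw-binary /dev/nvme1n1`, which is why nvme-cli has to be present for this kind of udev helper to work at all.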

@lucab
Contributor

lucab commented Aug 12, 2020

@velothump good catch! I think it's quite safe to say we should bake it into FCOS config. The utility is packaged in Fedora as nvme-cli.

@wernerb

wernerb commented Nov 14, 2020

This has been a head-scratcher for a couple of hours.
Whatever script you call from the udev rule's PROGRAM key (e.g. cloud_aws_ebs_nvme_id) must reference /usr/sbin/nvme by its full path rather than a bare nvme: nvme is not on udev's PATH, so the call fails and no symlinks are created.

Is there any traction on this? This udev configuration is essential for anyone using AWS.
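The absolute-path advice can be made explicit in the helper itself. Below is a minimal Python sketch, assuming nvme-cli lives at `/usr/sbin/nvme` as reported in this comment; `id_ctrl_command` and `read_id_ctrl` are hypothetical names. It rejects a bare `nvme` up front and runs the command with an empty environment to show that an absolute path needs no PATH lookup.

```python
from __future__ import annotations

import subprocess

# ASSUMPTION: nvme-cli is installed at /usr/sbin/nvme.
NVME = "/usr/sbin/nvme"


def id_ctrl_command(device: str, nvme: str = NVME) -> list[str]:
    """Build the nvme-cli call, insisting on an absolute binary path.

    A helper run from a udev PROGRAM key should not rely on PATH lookup,
    so a bare "nvme" is rejected.
    """
    if not nvme.startswith("/"):
        raise ValueError(f"nvme must be an absolute path, got {nvme!r}")
    return [nvme, "id-ctrl", "--raw-binary", device]


def read_id_ctrl(device: str) -> bytes:
    """Return the raw Identify Controller page for `device` (e.g. /dev/nvme1n1)."""
    # env={} simulates a caller with no PATH at all; the call still works
    # because the executable is given by absolute path.
    result = subprocess.run(
        id_ctrl_command(device), capture_output=True, check=True, env={}
    )
    return result.stdout
```

`read_id_ctrl` needs root and a real NVMe device; `id_ctrl_command` can be checked anywhere.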

@wmedlar

wmedlar commented May 27, 2021

Bumping this issue for traction. We're running into this while migrating our Kubernetes clusters to m5 instances and not sure where to go from here.

@wcurry

wcurry commented May 27, 2021

Bumping this issue for traction. We're running into this while migrating our Kubernetes clusters to m5 instances and not sure where to go from here.

More detail: We added a script that calls nvme to get the block device name, plus a udev rule file (both delivered via Ignition), but Ignition appears to try to format the /dev/xvd* partitions before the symlinks are created, and it times out. Is this expected? Is there a race condition we can work around?

@bgilbert
Contributor

Ignition normally waits for device nodes before trying to use them. Could you post logs of the failure? One thing to check is that you've included nvme and the udev rules in the initramfs, not just in the root filesystem.
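To see why missing rules in the initramfs surface as a timeout rather than an immediate error, here is a generic Python sketch of "wait for a device node". It is an illustration only, not Ignition's code (Ignition is written in Go and uses its own mechanism), and the 90-second default is an arbitrary placeholder.

```python
import os
import time


def wait_for_device(path: str, timeout: float = 90.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists, giving up after `timeout` seconds.

    Illustration only: the default timeout is a placeholder, not
    Ignition's real value.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If no udev rule in the initramfs ever creates `/dev/xvdf`, a call like `wait_for_device("/dev/xvdf")` can only end by returning `False` after the full timeout.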

@wcurry

wcurry commented May 27, 2021

We have been booting vanilla upstream FCOS AMIs. What you're suggesting would require building our own images, right?

@bgilbert
Contributor

Oh, I see what you meant. Yes, that's right. You can't use an Ignition config to add udev rules affecting the behavior of Ignition.

@wcurry

wcurry commented May 27, 2021

That makes sense. Hopefully this is an easy process. We are under a lot of pressure to move to m5 instances. It looks like Amazon Linux creates the symlink for you (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html).

@bgilbert
Contributor

Can you use a hybrid approach as a workaround? If you install the udev rules into the rootfs via Ignition, that should make the kubelet happy, and in the Ignition config itself you can use the underlying /dev/nvme* device paths.
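A rough Python sketch of that hybrid Ignition config, assuming a stream that accepts Ignition spec 3.2.0. The rules file name, the helper path `/usr/local/bin/ebs-nvme-id`, and the udev rule text (modeled on the ec2-utils style) are all hypothetical placeholders; the helper itself would have to be shipped separately, e.g. as another `storage.files` entry with mode `0o755`.

```python
import json
from urllib.parse import quote

# All names below are hypothetical placeholders.
RULES_PATH = "/etc/udev/rules.d/90-ebs-nvme-symlinks.rules"
HELPER_PATH = "/usr/local/bin/ebs-nvme-id"  # must be shipped separately

# ASSUMPTION: a disk rule in the ec2-utils style; %k is the kernel name
# and %c is whatever the PROGRAM helper prints.
RULE = (
    'KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", '
    'ATTRS{model}=="Amazon Elastic Block Store", '
    f'PROGRAM="{HELPER_PATH} /dev/%k", SYMLINK+="%c"\n'
)


def data_url(text: str) -> str:
    """Inline file contents as a percent-encoded data URL."""
    return "data:," + quote(text)


def ignition_config(data_device: str = "/dev/nvme1n1") -> dict:
    return {
        "ignition": {"version": "3.2.0"},
        "storage": {
            # Written to the real root filesystem, so the kubelet sees the
            # /dev/xvd* symlinks; Ignition itself never uses these rules.
            "files": [
                {
                    "path": RULES_PATH,
                    "mode": 0o644,
                    "contents": {"source": data_url(RULE)},
                }
            ],
            # Inside the Ignition config, refer to the raw NVMe node.
            "filesystems": [
                {"device": data_device, "format": "xfs", "wipeFilesystem": True}
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ignition_config(), indent=2))
```

Hard-coding `/dev/nvme1n1` assumes a stable NVMe enumeration order, which AWS does not guarantee.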

@wcurry

wcurry commented May 27, 2021

From the link:

> Occasionally, devices can respond to discovery in a different order in subsequent instance starts, which causes the device name to change.

I don't think that's going to be reliable.
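The underlying point can be shown in a few lines of Python: NVMe kernel names may swap between boots, but the name EBS reports for each volume does not, so a symlink keyed on the reported name keeps pointing at the right volume. The boot data below is invented for illustration, and `symlink_targets` is a hypothetical helper.

```python
from __future__ import annotations


def symlink_targets(observed: dict[str, str]) -> dict[str, str]:
    """Invert {kernel node: EBS-reported name} into {symlink: kernel node}."""
    targets: dict[str, str] = {}
    for kernel_name, reported in observed.items():
        if reported in targets:
            raise ValueError(f"two devices report the name {reported!r}")
        targets[reported] = kernel_name
    return targets


# Hypothetical observations from two boots of the same instance, where
# the two volumes were discovered in a different order.
boot1 = {"nvme1n1": "xvdf", "nvme2n1": "xvdg"}
boot2 = {"nvme1n1": "xvdg", "nvme2n1": "xvdf"}

if __name__ == "__main__":
    for boot in (boot1, boot2):
        t = symlink_targets(boot)
        print(f"/dev/xvdf -> /dev/{t['xvdf']}")
```

Run as a script, it prints `/dev/xvdf -> /dev/nvme1n1` and then `/dev/xvdf -> /dev/nvme2n1`: the kernel node moved, the symlink name did not.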

@davdunc
Contributor

davdunc commented May 27, 2021

I would love to work in tandem, so if we can start to identify individual rules in ec2-utils and then milestone on specific use cases and get it done sooner that would be ideal. Are we still targeting moving this into systemd?

@dustymabe
Member

> I would love to work in tandem, so if we can start to identify individual rules in ec2-utils and then milestone on specific use cases and get it done sooner that would be ideal.

@lucab - would you want to work with @davdunc on this?

> Are we still targeting moving this into systemd?

I think that was (is?) the plan.

@lucab
Contributor

lucab commented Jun 2, 2021

@davdunc the rules we are talking about are basically https://github.com/coreos/fedora-coreos-config/pull/476/files. They originate in Container Linux (https://github.com/coreos/init/pull/268/files) and are based on what ec2-utils does, minus the Python part.
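The rules themselves are not reproduced in this thread. Assuming they follow the ec2-utils pattern of one rule for whole disks (`SYMLINK+="%c"`) and one for partitions (`SYMLINK+="%c%n"`, where `%n` is the kernel's partition number), the names they produce can be sketched in Python as below. `symlink_name` is a hypothetical helper, not part of either PR.

```python
import re

# Kernel names of NVMe namespaces ("nvme1n1") and their partitions ("nvme1n1p2").
_NVME_NAME = re.compile(r"nvme\d+n\d+(?:p(\d+))?")


def symlink_name(kernel_name: str, reported: str) -> str:
    """Return the symlink name for a disk or partition node.

    Mirrors SYMLINK+="%c" for disks and SYMLINK+="%c%n" for partitions,
    where `reported` plays the role of %c (the helper's output).
    """
    m = _NVME_NAME.fullmatch(kernel_name)
    if m is None:
        raise ValueError(f"not an NVMe namespace or partition: {kernel_name!r}")
    partition = m.group(1)
    return reported if partition is None else reported + partition
```

For example, `symlink_name("nvme1n1", "xvdf")` gives `"xvdf"` and `symlink_name("nvme1n1p2", "xvdf")` gives `"xvdf2"`, matching the `/dev/xvdf2` path a kubelet or fstab entry would expect.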

@dustymabe
Member

@davdunc has taken an action in today's community meeting:

```
13:34:48  dustymabe | #action davdunc to put a package review in for
                    | ec2-net-utils and brainstorm on how we can use that for #601
```

@dustymabe
Copy link
Member

We discussed this in the community meeting today.

```
12:50:52   dustymabe | #info for #601 after davadunc talked with Amazon engineers
                     | the code in ec2-utils and ec2-net-utils might be a good
                     | reference for other distributions, but is not promised to
                     | work. They encourage us to write our own with those as a
                     | reference.
```
