AWS Fedora CoreOS low NVMe io timeout #605

Open
paulnivin opened this issue Aug 17, 2020 · 2 comments

Comments

@paulnivin

Per https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#timeout-nvme-ebs-volumes, EBS NVMe volumes need an I/O timeout greater than the Linux kernel default of 30 seconds, and AWS recommends 4294967295. Container Linux fixed this in coreos/bugs#2484, but FCOS AMIs still use the upstream Linux default of 30 seconds. For reliable operation on AWS, FCOS needs to raise the default I/O timeout for EBS NVMe volumes.

The AWS-recommended approach is the boot param nvme_core.io_timeout=4294967295. https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1842562 goes a step further: after maxing out the global boot param, it uses udev to lower the per-device timeouts for EC2 instance storage NVMe drives back to the Linux default.
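For context on the magnitudes: 4294967295 is 2**32 - 1, the largest unsigned 32-bit value. Per the kernel documentation, the nvme_core.io_timeout boot param is in seconds, while the per-device sysfs attribute queue/io_timeout (the one udev rules set) is in milliseconds, so the same number means quite different durations. A quick sketch of the arithmetic, assuming those units:

```python
# Sanity-check the magnitudes of the io_timeout values discussed in this issue.
# Assumption (per the kernel docs): the nvme_core.io_timeout boot param is in
# seconds, while the per-device sysfs attribute queue/io_timeout is in milliseconds.

U32_MAX = 2**32 - 1  # 4294967295, the value AWS recommends

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def karg_timeout_years(seconds: int) -> float:
    """Duration of an nvme_core.io_timeout value (seconds), in years."""
    return seconds / SECONDS_PER_YEAR


def sysfs_timeout_days(milliseconds: int) -> float:
    """Duration of a queue/io_timeout value (milliseconds), in days."""
    return milliseconds / 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"nvme_core.io_timeout={U32_MAX}: ~{karg_timeout_years(U32_MAX):.1f} years")
    print(f"queue/io_timeout={U32_MAX}: ~{sysfs_timeout_days(U32_MAX):.1f} days")
    print(f"queue/io_timeout=30000: {30000 / 1000:.0f} seconds")
```

With these units, 4294967295 in a udev rule works out to roughly 49.7 days rather than the ~136 years the boot param gives, though both are far beyond the 30-second default; 30000 in a udev rule is exactly that 30-second default.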

Here's an FCC snippet that instead sets I/O timeouts per NVMe device type via udev (the maximum io_timeout for EBS, 30s for instance storage). Note that this applies later in boot than the nvme_core kernel argument would:

variant: fcos
version: 1.1.0
storage:
  files:
    - path: /etc/udev/rules.d/01-nvme.rules
      contents:
        inline: |
          ACTION=="add", KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", ATTR{queue/io_timeout}="4294967295"
          ACTION=="add", KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon EC2 NVMe Instance Storage", ATTR{queue/io_timeout}="30000"

# rpm-ostree status
State: idle
Deployments:
* ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 32.20200726.3.1 (2020-08-12T05:29:32Z)
                    Commit: 2579b41aa614c3a40b9e24ff0b9dd288f99222dc3ed3a527ef0d8e8667196ff5
              GPGSignature: Valid signature by 97A1AE57C3A2372CCA3A4ABA6C13026D12C944D0
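After booting with the FCC above, one way to confirm the rules took effect is to compare each NVMe namespace's model with its queue/io_timeout in sysfs. A minimal Python sketch of such a check, assuming the sysfs layout /sys/block/nvme*n*/device/model and /sys/block/nvme*n*/queue/io_timeout, and assuming the kernel reads back exactly the millisecond value that was written. The function names are hypothetical, and sysfs_root is a parameter only so the check can run against a fake tree:

```python
from __future__ import annotations

from pathlib import Path

# Expected queue/io_timeout values in milliseconds, mirroring the udev rules above.
EXPECTED_MS = {
    "Amazon Elastic Block Store": 4294967295,
    "Amazon EC2 NVMe Instance Storage": 30000,
}


def read_nvme_timeouts(sysfs_root: str = "/sys") -> dict[str, tuple[str, int]]:
    """Hypothetical helper: map each NVMe namespace to (model, io_timeout_ms)."""
    result = {}
    for dev in sorted(Path(sysfs_root, "block").glob("nvme*n*")):
        # NVMe pads the model string with spaces; udev ignores trailing
        # whitespace when matching ATTRS{model}, so strip it here as well.
        model = (dev / "device" / "model").read_text().strip()
        timeout_ms = int((dev / "queue" / "io_timeout").read_text())
        result[dev.name] = (model, timeout_ms)
    return result


def find_mismatches(sysfs_root: str = "/sys") -> list[str]:
    """Hypothetical helper: list AWS namespaces whose io_timeout is unexpected."""
    bad = []
    for name, (model, timeout_ms) in read_nvme_timeouts(sysfs_root).items():
        expected = EXPECTED_MS.get(model)  # None for non-AWS models: skipped
        if expected is not None and timeout_ms != expected:
            bad.append(f"{name}: {model} has {timeout_ms} ms, expected {expected} ms")
    return bad


if __name__ == "__main__":
    problems = find_mismatches()
    print("\n".join(problems) if problems else "all AWS NVMe io_timeouts match")
```

Devices with other model strings are ignored, so the same check can run on non-AWS hosts without reporting false mismatches.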

@jlebon
Member

jlebon commented Aug 19, 2020

Hmm, right, this rings a bell from RHEL Atomic Host (AH) too.

I prefer the udev rules approach over a kernel arg, since we don't really have platform-specific kargs in our images (except the Ignition platform ID, but that one is kinda special).

On the topic of NVMe udev rules in AWS, this came up recently in #601, which pointed to making them part of systemd (systemd/systemd#11532). This could be another rule we add there, and that way it benefits all systemd-based distros.

@dustymabe
Member

cc @davdunc
