This repository has been archived by the owner on Sep 18, 2020. It is now read-only.

Update grub.cfg #834

Closed
wants to merge 2 commits into from

@Alalk Alalk commented Aug 10, 2018

This updates the GRUB config to use the NVMe defaults required by AWS. It should (eventually) resolve the failure to pass status checks.
The maximum timeout value only works on kernel versions 4.15 and above; on kernels below 4.15, the maximum nvme_core.io_timeout is 255.

coreos/bugs#2464
coreos/bugs#2484
coreos/bugs#2371
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#timeout-nvme-ebs-volumes

@ajeddeloh
Contributor

Thanks for hunting this down. This should probably go in the ec2 specific config instead unless there's a reason to do it on all platforms. I can get around to it at some point, but if you want to implement it there instead, go ahead.

I'm not sure how 4.14.x kernels would handle a value bigger than 255, so maybe also PR the build-1800 (current stable) and build-1855 (alpha that is about to become beta) branches with it set to 255. Regardless of how they'd handle it, it would be more obvious what the value being used is.

@Alalk
Author

Alalk commented Aug 10, 2018

You're welcome.
I did not know about the EC2-specific config >.<. I'll update the pull request.

As for 255 vs. 4294967295, according to the AWS docs for NVMe:

NVMe EBS volumes use the default NVMe driver provided by the operating system. Most operating systems specify a timeout for I/O operations submitted to NVMe devices. The default timeout is 30 seconds and can be changed using the nvme_core.io_timeout boot parameter (or the nvme.io_timeout boot parameter for Linux kernels prior to version 4.6). For an experience similar to EBS volumes attached to Xen instances, we recommend setting this to the highest value possible. For Amazon Linux AMI 2017.09.01 (or greater), and for Linux kernels with version 4.15 or greater, the maximum is 4294967295. Prior to Linux 4.15, the maximum is 255 seconds. If you are using a current version of the Amazon Linux AMI, we have already increased the timeout.

I'm moving nvme_core.io_timeout=4294967295 (and reducing it to 255 for pre-4.15 kernels) and nvme_core.max_retries=10 to the EC2 config:
https://github.com/coreos/coreos-overlay/blob/master/coreos-base/oem-ec2-compat/files/grub-ec2.cfg
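
The version cutoffs in the AWS excerpt quoted above can be sketched as a small helper. This is only an illustration of the quoted rules, not code from this PR or from CoreOS: the function name `nvme_timeout_arg` is hypothetical, and the release-string parsing is an assumption about how a kernel version might be supplied.

```python
import re

# Limits taken from the AWS documentation excerpt quoted in this thread.
MAX_TIMEOUT_BEFORE_4_15 = 255           # equals 2**8 - 1
MAX_TIMEOUT_FROM_4_15 = 4294967295      # equals 2**32 - 1


def nvme_timeout_arg(release: str) -> str:
    """Return the io_timeout kernel argument for a release like '4.14.63-coreos'.

    Hypothetical helper: it only encodes the two cutoffs from the AWS text
    (parameter rename at 4.6, larger maximum at 4.15).
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))

    # Kernels prior to 4.6 use nvme.io_timeout instead of nvme_core.io_timeout.
    param = "nvme_core.io_timeout" if version >= (4, 6) else "nvme.io_timeout"
    # Kernels prior to 4.15 accept at most 255 seconds.
    value = MAX_TIMEOUT_FROM_4_15 if version >= (4, 15) else MAX_TIMEOUT_BEFORE_4_15
    return f"{param}={value}"
```

For example, `nvme_timeout_arg("4.14.63-coreos")` would pick the 255 limit, which matches the suggestion to use 255 on the 4.14-based branches.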
@ajeddeloh
Contributor

I meant that I'm not sure what a 4.14 kernel will do if handed 4294967295 instead of 255.

@Alalk
Author

Alalk commented Aug 10, 2018

Oh, I tried that. The AMI freaks out and throws boot errors. Unfortunately, I don't have the logs handy from when I tried that on CoreOS stable.
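
A combination like the one described here (a pre-4.15 kernel handed a timeout above 255) could be flagged before booting an image. The following is a rough sketch under stated assumptions: the function name `oversized_nvme_timeout` is hypothetical, the 255 limit comes from the AWS text quoted earlier in this thread, and nothing here models what the kernel actually does with an oversized value.

```python
import re
from typing import Optional

# Maximum io_timeout for kernels before 4.15, per the AWS documentation quote.
MAX_IO_TIMEOUT_BEFORE_4_15 = 255


def oversized_nvme_timeout(cmdline: str, release: str) -> Optional[int]:
    """Return the configured io_timeout if it exceeds the pre-4.15 limit.

    Hypothetical check: returns None when the setting is absent, the release
    string cannot be parsed, or the value is within the quoted limit.
    """
    version = re.match(r"(\d+)\.(\d+)", release)
    # Accept both spellings of the parameter (nvme. and nvme_core. prefixes).
    setting = re.search(
        r"(?:^|\s)nvme(?:_core)?\.io_timeout=(\d+)(?=\s|$)", cmdline
    )
    if version is None or setting is None:
        return None

    kernel = (int(version.group(1)), int(version.group(2)))
    timeout = int(setting.group(1))
    if kernel < (4, 15) and timeout > MAX_IO_TIMEOUT_BEFORE_4_15:
        return timeout
    return None
```

On a running instance, the inputs could come from the contents of `/proc/cmdline` and from `platform.release()`; here they are plain strings so the check can be run anywhere.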

@Alalk Alalk closed this Aug 10, 2018
@Alalk Alalk deleted the nvme-aws-timeout-patch branch August 10, 2018 18:28