image: improve AWS performance by retiring idle=poll option #3387

Merged
merged 1 commit into main from burgerdev/100-pct
Oct 4, 2024

burgerdev
Contributor

@burgerdev burgerdev commented Oct 4, 2024

Context

The Linux kernel command-line parameter idle=poll was introduced in mid-2023 to work around a hypervisor issue at AWS. That issue has been fixed since the end of 2023.

idle=poll has some negative consequences; see the kernel docs linked below. For example, it effectively keeps the CPU from ever going idle under light load, which results in the symptoms explained in #3383. I took these measurements on m6a.xlarge machines before and after the change proposed here:

/ # cat /proc/cmdline
roothash=de436e547b918bf0411d9738485c431918e068a051241db88bc099c0782da5e4 preempt=full rd.shell=0 rd.emergency=reboot loglevel=8 selinux=1 enforcing=0 audit=0 constellation.debug console=ttyS0 constel.csp=aws idle=poll mitigations=auto constel.attestation-variant=aws-sev-snp
/ # sysbench --threads=1 cpu run | grep -F "events per second"
    events per second:  1124.96
/ # sysbench --threads=2 cpu run | grep -F "events per second"
    events per second:  2187.34
/ # sysbench --threads=4 cpu run | grep -F "events per second"
    events per second:  2716.90
/ # iostat -c
Linux 6.6.30-100.constellation.fc40.x86_64 (b360f3f934b9)       10/04/24        _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.06    0.00   71.65    5.77    0.15   20.38
/ # cat /proc/cmdline 
roothash=91c73aa9caf458e8b15537ae608a1690a0f0f852e7835ebc49b5e3be1d5d1068 preempt=full rd.shell=0 rd.emergency=reboot loglevel=8 selinux=1 enforcing=0 audit=0 constellation.debug console=ttyS0 constel.csp=aws mitigations=auto,nosmt constel.attestation-variant=aws-sev-snp
/ # sysbench --threads=1 cpu run | grep -F "events per second"
    events per second:  1877.43
/ # sysbench --threads=2 cpu run | grep -F "events per second"
    events per second:  3364.97
/ # sysbench --threads=4 cpu run | grep -F "events per second"
    events per second:  3403.21
/ # iostat -c
Linux 6.6.30-100.constellation.fc40.x86_64 (490e83bc2711)       10/04/24        _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    7.68    4.75    0.03   86.79
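
To put the sysbench numbers above in relative terms, here is a small sketch that computes the percentage change in events per second between the two runs. The helper name `speedup` is made up for illustration and is not part of the PR; the inputs are copied from the transcripts above.

```shell
#!/bin/sh
# speedup BEFORE AFTER
# Prints the relative change in throughput as a percentage with one decimal.
# "speedup" is a hypothetical helper name, not something from the PR.
speedup() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f%%\n", (after / before - 1) * 100 }'
}

speedup 1124.96 1877.43   # 1 thread:  prints 66.9%
speedup 2187.34 3364.97   # 2 threads: prints 53.8%
speedup 2716.90 3403.21   # 4 threads: prints 25.3%
```

The gain shrinks at 4 threads, which would be consistent with the second run using mitigations=auto,nosmt on a 4-CPU machine, though the measurements alone do not prove that.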

Proposed change(s)

  • Remove idle=poll
  • Align the mitigations with the other CSPs: mitigations=auto,nosmt (previously mitigations=auto)
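
One way to confirm on a running node that both changes took effect is to look for the exact parameters in /proc/cmdline. This is a minimal sketch, assuming a POSIX shell on the node; `has_param` is a hypothetical helper, not part of Constellation.

```shell
#!/bin/sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
# "has_param" is a hypothetical helper name used only for this sketch.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Read the running kernel's command line; stays empty if /proc is unavailable.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_param "$cmdline" "idle=poll"; then
    echo "idle=poll is still set"
else
    echo "idle=poll is not set"
fi

if has_param "$cmdline" "mitigations=auto,nosmt"; then
    echo "mitigations=auto,nosmt is set"
else
    echo "mitigations=auto,nosmt is not set"
fi
```

Matching whole words avoids false positives from parameters that merely contain the string as a prefix or suffix.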

Related issue

Additional info

Checklist

@burgerdev burgerdev added the bug fix Fixing a bug label Oct 4, 2024
@burgerdev burgerdev added this to the v2.19.0 milestone Oct 4, 2024
@burgerdev burgerdev requested a review from msanft as a code owner October 4, 2024 13:29

netlify bot commented Oct 4, 2024

Deploy Preview for constellation-docs canceled.

Name Link
🔨 Latest commit 6c481f2
🔍 Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/66ffedaa2cb6e60008086118

@burgerdev burgerdev merged commit bd31361 into main Oct 4, 2024
17 checks passed
@burgerdev burgerdev deleted the burgerdev/100-pct branch October 4, 2024 15:01
Development

Successfully merging this pull request may close these issues.

100% CPU utilization of AWS EC2 Worker Nodes
2 participants