Available CPU MHz Varying Wildly for Same Instance Type #7681

Closed
herter4171 opened this issue Apr 9, 2020 · 15 comments · Fixed by #7828

Comments

@herter4171

Nomad version

Nomad v0.10.4 (f750636)

Operating system and Environment details

Amazon Linux 2 with a fixed head node and an auto-scaling group of c5.24xlarge instances, with scaling driven by Nomad state using a custom cloud metric.

Issue

The number of MHz available on a node varies wildly. For the exact same instance type (96 cores, 3 GHz stock, 3.9 GHz max), I'm seeing totals as low as 160,000 MHz and as high as 340,000 MHz. I just launched three c5.24xlarge nodes, and their reported MHz totals are:

  • 305184
  • 280128
  • 340800
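Assuming the fingerprinted total is the per-core MHz multiplied by the core count, these three totals can be divided back out and compared with what the 3 GHz stock clock would give (96 × 3000 = 288,000 MHz). A minimal sketch; `perCoreMHz` is a hypothetical helper, and the inputs are the numbers reported above:

```go
package main

import "fmt"

// perCoreMHz divides a fingerprinted total back out into an implied per-core speed.
// Assumption: total = cores × per-core MHz.
func perCoreMHz(totalMHz, cores int) int {
	return totalMHz / cores
}

func main() {
	const cores = 96
	const stockMHz = 3000 // "3 GHz stock" for c5.24xlarge, per the report above

	expected := cores * stockMHz
	observed := []int{305184, 280128, 340800}

	fmt.Printf("expected total at stock clock: %d MHz\n", expected)
	for _, total := range observed {
		fmt.Printf("total %d MHz -> %d MHz per core (%+d MHz vs. stock total)\n",
			total, perCoreMHz(total, cores), total-expected)
	}
}
```

The implied per-core speeds come out at 2918 to 3550 MHz, which is consistent with momentary readings of a CPU moving between its stock and boost clocks rather than a fixed rated value.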

I'd rather not hard-wire cpu_total_compute in the client config, and everything I've read says Nomad computes the MHz as the core count multiplied by the rated clock speed, not the current one.

Having the MHz vary like this causes jobs not to be placed, even when the node actually has the capacity. Would a short-term fix be to load all but one core to 100%, launch the Nomad client, and then remove the load? The docs I've read say Nomad uses the stock clock speed, so I'm kind of at a loss here.

Reproduction steps

Launch a few instances of the same type with the Nomad client started on boot (I'm using systemctl). The MHz each client reports in the web UI should vary appreciably.

@jrasell
Member

jrasell commented Apr 10, 2020

Hi @herter4171 and thanks for the detail in this issue. In order to help diagnose this problem would you be able to provide the output of the following two commands from a couple of the instances where you are seeing this behaviour?

cat /proc/cpuinfo |grep 'cpu MHz'
cat /proc/cpuinfo |grep 'cpu cores'

@shoenig
Member

shoenig commented Apr 10, 2020

It seems like parsing cpu MHz out of /proc/cpuinfo is only going to get us the current clock speed, which could vary widely given power states, etc. Has Nomad always determined clock speed this way? We should be getting the rated speed instead, e.g.:

$ lscpu | grep MHz
CPU MHz:                         3899.997
CPU max MHz:                     4700.0000
CPU min MHz:                     400.0000

@herter4171
Author

Hi @jrasell, thank you for the response! The output of those grep commands is a bit lengthy because there are 96 cores, so here is some truncated output.

For the first instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1843.994
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

For the second instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1677.167
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

For the third instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1506.577
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

Given the nproc output, I'm guessing the "cpu cores" value of 24 implies there are four physical processors.

$ nproc
96

@dvusboy
Copy link

dvusboy commented Apr 10, 2020

Nomad uses gopsutil's cpu.InfoStat to get the CPU MHz. On Linux, gopsutil by default reads /sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq to determine the maximum frequency of the CPU, but if that read fails it falls back on the value from /proc/cpuinfo. You should check that sysfs path on your VMs, @herter4171.

@herter4171
Author

herter4171 commented Apr 10, 2020

Hi @dvusboy, the lay of the land is that I'm using Amazon Linux 2 pretty much out of the box. That platform has /sys/devices/system/cpu, and under it are cpu0, cpu1, and so on. None of the cpu* subdirectories has a cpufreq directory, so I'm not sure how to proceed. Is there something I can do to populate it? This seems like a pretty major detail for supporting Nomad on Amazon Linux 2, and I'd like to avoid switching distros.

@dvusboy

dvusboy commented Apr 10, 2020

@herter4171 By cpuN, I meant, substituting N with some non-negative integer. Since cpufreq is not there, I'd say you don't have access to the actual maximum frequency, and gopsutil defaults to MHz out of cpuinfo, which is the current frequency. It would explain what you're seeing.

@herter4171
Author

@dvusboy, I belatedly picked up on that and edited my last comment accordingly. Can I do something to make Amazon Linux 2 play ball with Nomad, or can something be done on the Nomad side to fix this? One idea is spawning yes > /dev/null & on all but one core before launching the Nomad client so that Nomad sees the actual MHz, but I'd really appreciate proper support for this platform. I can't be the only one running Nomad on Amazon Linux 2, after all.

@dvusboy

dvusboy commented Apr 10, 2020

I suppose you can use cpu_total_compute in the client configuration to override the fingerprinted values.

@herter4171
Author

@dvusboy, I'm aware of that option, and I don't think it addresses the core issue. Nomad should be able to determine the available MHz on its own.

@herter4171
Author

Hi @dvusboy and company, after rooting around a bit, I can see the difficulty in getting the rated clock speed on Amazon Linux 2 without assuming sudo access. In case it helps on your end, here is what I've put in place when initializing a Nomad client:

# Get the max rated core speed (MHz) from the DMI processor records
CORE_MAX_MHZ=$(sudo dmidecode -t processor \
    | grep '^\s*Max Speed' \
    | head -n 1 \
    | awk '{print $3}')

# Multiply by the number of logical CPUs to get total MHz
TOTAL_MHZ=$((CORE_MAX_MHZ * $(nproc)))

I'd still like to see this functionality become native instead of depending on my hacky Bash, but I'm equipped to move on if there's not interest in pursuing this. Thanks for the help so far.

@shoenig shoenig self-assigned this Apr 13, 2020
@herter4171
Author

Hey @shoenig, I'm having a bit of additional difficulty in spite of my fix. Even though I've set the client stanza like I described and verified the updated value is reflected in Nomad, jobs still fail to be placed due to this other hidden limit shown in my screenshot. I'm a bit confused, because 262144 MHz / 96 cores = 2.73 GHz/core, and that's above the rated speed of 2.5 GHz and well below the max of 3.5 GHz.
[Screenshot of the placement error not preserved]
I'd hope to be able to move on with things, but this is still holding things back, I'm afraid.

@shoenig
Member

shoenig commented Apr 14, 2020

I'm thinking this is actually a problem on all EC2 instances, not just Amazon Linux 2. On an Ubuntu micro instance:

ubuntu@ip-172-31-82-121:~$ cpupower frequency-info
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  CPUs which run at the same hardware frequency: Not Available
  CPUs which need to have their frequency coordinated by software: Not Available
  maximum transition latency:  Cannot determine or is not supported.
Not Available
  available cpufreq governors: Not Available
  Unable to determine current policy
  current CPU frequency: Unable to call hardware
  current CPU frequency:  Unable to call to kernel
  boost state support:
    Supported: no
    Active: no
ubuntu@ip-172-31-82-121:~$ # there is no cpufreq/cpuinfo_max_freq
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpu0
cache  crash_notes  crash_notes_size  driver  firmware_node  hotplug  node0  power  subsystem  topology  uevent
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpufreq  # empty

If there's any good news, it's that CPU cgroup management seems unaffected:

Allocated Resources
CPU           Memory          Disk
250/2400 MHz  32 MiB/983 MiB  300 MiB/6.6 GiB

Allocation Resource Utilization
CPU         Memory
0/2400 MHz  388 KiB/983 MiB

Host Resource Utilization
CPU            Memory           Disk
2400/2400 MHz  146 MiB/983 MiB  1.4 GiB/8.0 GiB  # loaded deliberately 
[ec2-user@ip-172-31-94-218 proc]$ cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	11	3	1
cpu	9	3	1
cpuacct	9	3	1
blkio	10	3	1
memory	6	3	1
devices	5	25	1
freezer	4	3	1
net_cls	2	3	1
perf_event	8	3	1
net_prio	2	3	1
hugetlb	7	3	1
pids	3	3	1

I'm going to keep researching and asking around, but I suspect this may boil down to parsing the rated CPU speed out of the CPU model name string. Hacky as that may be, it should be more accurate than parsing cpu MHz, which is tantamount to using a random number.
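Parsing the rated speed out of the model name might look roughly like this. A minimal sketch, assuming the Intel-style "@ 3.00GHz" suffix; `ratedMHzFromModel` is a hypothetical name, and the two model strings are illustrative examples of an Intel name with the suffix and an AMD name without one:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ratedRe matches the "@ 3.00GHz" suffix that Intel model names commonly carry.
var ratedRe = regexp.MustCompile(`@\s*([0-9]+(?:\.[0-9]+)?)\s*GHz`)

// ratedMHzFromModel extracts the rated speed from a "model name" string.
// ok is false when the name carries no frequency.
func ratedMHzFromModel(model string) (mhz float64, ok bool) {
	m := ratedRe.FindStringSubmatch(model)
	if m == nil {
		return 0, false
	}
	ghz, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}
	return ghz * 1000, true
}

func main() {
	for _, model := range []string{
		"Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz",
		"AMD EPYC 7571",
	} {
		if mhz, ok := ratedMHzFromModel(model); ok {
			fmt.Printf("%-45s -> %.0f MHz\n", model, mhz)
		} else {
			fmt.Printf("%-45s -> no rated speed in name\n", model)
		}
	}
}
```

The `ok` result matters: a name without a frequency has to fall through to some other source rather than yield a guessed number.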

@herter4171
Author

Hey @shoenig, thanks for the digging. One thing I've noticed about using the model name is that certain instance types, like the "memory optimized" ones, use AMD chips that don't include the rated frequency in the name the way Intel processors tend to. Also, I think the driver error in the screenshot from my last comment is related to this issue, since it requires an MHz value between the rated and max speeds. I'd be happy to open a separate issue for that if it would muddy the waters here.

@shoenig
Member

shoenig commented Apr 21, 2020

Another possibility might be to modify gopsutil to briefly load a single CPU thread and take measurements of the current speed, the maximum of which would be presumed to be the max CPU speed.

I put together a quick demo to check whether this works before submitting the idea upstream:

$ for i in {1..10}; do ./loadcpu && sleep 3 && echo ""; done
read current speed: 800.04
loaded max speed:   3900.70

read current speed: 1924.65
loaded max speed:   3901.08

read current speed: 1495.16
loaded max speed:   3900.33

read current speed: 2826.81
loaded max speed:   3900.00

read current speed: 3400.18
loaded max speed:   3902.43

read current speed: 1979.91
loaded max speed:   3900.95

read current speed: 2627.13
loaded max speed:   3900.19

read current speed: 889.96
loaded max speed:   3901.62

read current speed: 3391.65
loaded max speed:   3902.97

read current speed: 906.17
loaded max speed:   3900.63

shoenig added a commit that referenced this issue Apr 29, 2020
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`cpu MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps the `CPU max MHz` value reported by `lscpu`, but those
values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since gopsutil cannot determine the max CPU speed, it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo), use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal, as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo, since right
now Nomad miscalculates available CPU resources on all instance types.
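A best-effort lookup table of this kind might be shaped roughly as follows. This is a hypothetical sketch, not the table from the commit: `cpuSpec`, `awsCPUTable`, and `totalCompute` are illustrative names, and the single c5.24xlarge row uses the 96 cores and 3 GHz quoted earlier in this issue rather than values transcribed from AWS documentation. The instance type itself would come from somewhere outside this code, for example the EC2 instance metadata service (`meta-data/instance-type`); what happens for unknown types is also an assumption here:

```go
package main

import "fmt"

// cpuSpec is one row of a best-effort table keyed by EC2 instance type.
// MHz is stored as an integer to avoid float rounding when multiplying.
type cpuSpec struct {
	Cores int // vCPUs for the instance type
	MHz   int // per-core speed
}

// Illustrative entry only; a real table would be transcribed from the AWS
// instance-type pages and extended as new types appear.
var awsCPUTable = map[string]cpuSpec{
	"c5.24xlarge": {Cores: 96, MHz: 3000},
}

// totalCompute returns cores × MHz for a known instance type. ok is false for
// unknown types; in this sketch the caller would then keep whatever value it
// had before (the fingerprinted one, or an operator-supplied cpu_total_compute).
func totalCompute(instanceType string) (mhz int, ok bool) {
	spec, ok := awsCPUTable[instanceType]
	if !ok {
		return 0, false
	}
	return spec.Cores * spec.MHz, true
}

func main() {
	for _, it := range []string{"c5.24xlarge", "m9.hypothetical"} {
		if mhz, ok := totalCompute(it); ok {
			fmt.Printf("%s: %d MHz\n", it, mhz)
		} else {
			fmt.Printf("%s: not in table, keep fingerprinted value\n", it)
		}
	}
}
```

Keying on the instance type makes the result independent of the CPU's power state at fingerprint time, at the cost the commit message acknowledges: every new instance type needs a new row.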
@shoenig shoenig added this to the 0.11.2 milestone Apr 29, 2020
@github-actions

github-actions bot commented Nov 8, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022