Available CPU MHz Varying Wildly for Same Instance Type #7681

herter4171 · 2020-04-09T20:46:57Z

Nomad version

Nomad v0.10.4 (f750636)

Operating system and Environment details

Amazon Linux 2 with a fixed head node and an auto-scaling group of c5.24xlarge instances, with scaling driven by Nomad state using a custom cloud metric.

Issue

The number of MHz available on a node varies wildly. For the exact same instance type (96 cores, 3 GHz stock, 3.9 GHz max), I'm seeing values as low as 1.6E5 MHz and as high as 3.4E5 MHz. Just now, I launched 3 c5.24xlarge nodes, and their max MHz values are:

305184
280128
340800

I'd rather not hard-wire cpu_total_compute in the client config, and everything else I've read claims Nomad sets the MHz based on core count multiplied by rated clock speed rather than current.

Having MHz vary like this causes jobs to not be placed, even when the node actually has the capacity. Would a short-term fix be forcing all but one core to 100%, launching the Nomad client, and taking the load off of CPU? The docs I've read claim Nomad uses stock clock speed, so I'm kind of at a loss here.
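To make the spread concrete, each reported total can be divided by the 96 logical cores to get the per-core speed Nomad effectively assumed on that node. This is only a small arithmetic sketch; the helper name `per_core_mhz` is made up for illustration.

```shell
# Hypothetical helper: integer per-core MHz from a node's total MHz.
per_core_mhz() {
    total_mhz=$1
    cores=$2
    echo $(( total_mhz / cores ))
}

# The three totals reported above, each from a 96-core c5.24xlarge.
for total in 305184 280128 340800; do
    echo "$total MHz total -> $(per_core_mhz "$total" 96) MHz per core"
done
```

The per-core results (3179, 2918 and 3550 MHz) differ from node to node even though the hardware is identical, which is what one would expect if the value were a reading of current speed rather than a rated speed.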

Reproduction steps

Launch a few instances of the same type with the Nomad client running on boot (I'm using systemctl). Rated MHz for each client in the web UI should vary appreciably.

jrasell · 2020-04-10T08:29:57Z

Hi @herter4171 and thanks for the detail in this issue. In order to help diagnose this problem would you be able to provide the output of the following two commands from a couple of the instances where you are seeing this behaviour?

cat /proc/cpuinfo |grep 'cpu MHz'
cat /proc/cpuinfo |grep 'cpu cores'

shoenig · 2020-04-10T17:18:44Z

Seems like parsing cpu MHz out of /proc/cpuinfo is only going to get us the current clock speed, which could vary widely given power states, etc. Has Nomad always determined clock speed this way? We should be getting the rated speed instead, e.g.

$ lscpu | grep MHz
CPU MHz:                         3899.997
CPU max MHz:                     4700.0000
CPU min MHz:                     400.0000

herter4171 · 2020-04-10T17:52:06Z

Hi @jrasell, thank you for the response! The output of those grep commands is a bit lengthy because there are 96 cores, so here is some truncated output.

For the first instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1843.994
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

For the second instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1677.167
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

For the third instance,

$ cat /proc/cpuinfo | grep 'cpu MHz' | head -n 1
cpu MHz         : 1506.577
$ cat /proc/cpuinfo |grep 'cpu cores' | head -n 1
cpu cores       : 24

Given the nproc output, I'm guessing the "cpu cores" value of 24 implies there are four physical processors.

$ nproc
96

dvusboy · 2020-04-10T19:20:21Z

Nomad uses gopsutil.cpu.InfoStat to get the CPU MHz. By default on Linux, it reads /sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq to determine the maximum frequency of the CPU, but it falls back on the value from /proc/cpuinfo if that fails. You should check that sysfs path on your VMs, @herter4171.

herter4171 · 2020-04-10T20:31:57Z

Hi @dvusboy, the lay of the land is that I'm using Amazon Linux 2 pretty much out of the box. That platform has /sys/devices/system/cpu, and from there it's cpu0 and so on. The subdirectories for cpu* don't have a cpufreq directory, so I'm not sure how to proceed. Is there something I can do to populate that? This seems like a pretty major detail for supporting Nomad on Amazon Linux 2, and I'd like to avoid switching distros.

dvusboy · 2020-04-10T20:47:53Z

@herter4171 By cpuN, I meant, substituting N with some non-negative integer. Since cpufreq is not there, I'd say you don't have access to the actual maximum frequency, and gopsutil defaults to MHz out of cpuinfo, which is the current frequency. It would explain what you're seeing.
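The lookup order described here can be approximated in a few lines of shell. This is a sketch of the behaviour as explained in the thread, not gopsutil's actual code; the function name `cpu_mhz_estimate` is hypothetical, and both file paths are parameters so the function can be pointed at test fixtures.

```shell
# Sketch of the lookup order described above (not gopsutil's actual code).
# $1: sysfs max-frequency file, $2: cpuinfo-style file; both have defaults.
cpu_mhz_estimate() {
    sysfs_file=${1:-/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq}
    cpuinfo_file=${2:-/proc/cpuinfo}

    if [ -r "$sysfs_file" ]; then
        # cpuinfo_max_freq is reported in kHz; convert to MHz.
        awk '{ printf "%.3f\n", $1 / 1000; exit }' "$sysfs_file"
    else
        # Fallback: first "cpu MHz" line, which is the *current* speed.
        awk -F': *' '/^cpu MHz/ { print $2; exit }' "$cpuinfo_file"
    fi
}
```

Calling `cpu_mhz_estimate` with no arguments on a host that lacks the cpufreq directory takes the second branch, so the result changes from one call to the next as the clock speed changes.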

herter4171 · 2020-04-10T21:00:36Z

@dvusboy, I belatedly picked up on that and edited my last comment accordingly. Can I do something to make Amazon Linux 2 play ball for Nomad, or can something be done on the Nomad side of things to fix this? One idea I have is spawning yes > /dev/null & for all but one core before launching the Nomad client to make Nomad recognize actual MHz, but I'd really appreciate some support for the given platform. I can't be the only guy running Nomad on Amazon Linux 2, after all.

dvusboy · 2020-04-10T22:33:56Z

I suppose you can use cpu_total_compute in the client configuration to override the fingerprinted values.

herter4171 · 2020-04-11T00:50:05Z

@dvusboy, I'm aware of that option, and I don't think it addresses the core issue. Nomad should be capable enough to set available MHz.

herter4171 · 2020-04-13T16:13:35Z

Hi @dvusboy and company, after rooting around a bit, I can see the difficulty in getting rated clock speed on Amazon Linux 2 without assumed access to sudo. In case it helps on your end, what I've put in place for initializing a Nomad client is as follows.

# Get max rated core speed (MHz) from the first processor entry in the DMI table
CORE_MAX_MHZ=$(sudo dmidecode -t processor \
    | grep '^\s*Max Speed' \
    | head -n 1 \
    | awk '{print $3}')

# Multiply by number of logical cores to get total MHz
TOTAL_MHZ=$((CORE_MAX_MHZ * $(nproc)))

I'd still like to see this functionality become native instead of depending on my hacky Bash, but I'm equipped to move on if there's not interest in pursuing this. Thanks for the help so far.
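For completeness, a computed total like this can be handed to Nomad through the cpu_total_compute option in the client stanza mentioned earlier in the thread. The snippet below is only a sketch: the helper name `render_client_config` is hypothetical, and the example value assumes 96 cores at 3000 MHz each.

```shell
# Render a minimal client stanza that pins total compute (in MHz).
# Hypothetical helper; cpu_total_compute is the client option discussed above.
render_client_config() {
    total_mhz=$1
    cat <<EOF
client {
  enabled           = true
  cpu_total_compute = ${total_mhz}
}
EOF
}

# Example: 96 cores at an assumed 3000 MHz each.
render_client_config $((96 * 3000))
```

Redirecting the output into whichever file the client loads at startup (the location depends on the installation) overrides the fingerprinted value on that node.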

herter4171 · 2020-04-13T19:18:16Z

Hey @shoenig, I'm having a bit of additional difficulty in spite of my fix. Even though I've set the client stanza like I described and verified the updated value is reflected in Nomad, jobs still fail to be placed due to this other hidden limit shown in my screenshot. I'm a bit confused, because 262144 MHz / 96 cores = 2.73 GHz/core, and that's above the rated speed of 2.5 GHz and well below the max of 3.5 GHz.

I'd hope to be able to move on with things, but this is still holding things back, I'm afraid.

shoenig · 2020-04-14T15:54:57Z

I'm thinking this is actually a problem on all EC2 instances, not just Amazon Linux 2. On an Ubuntu micro:

ubuntu@ip-172-31-82-121:~$ cpupower frequency-info
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
  CPUs which run at the same hardware frequency: Not Available
  CPUs which need to have their frequency coordinated by software: Not Available
  maximum transition latency:  Cannot determine or is not supported.
Not Available
  available cpufreq governors: Not Available
  Unable to determine current policy
  current CPU frequency: Unable to call hardware
  current CPU frequency:  Unable to call to kernel
  boost state support:
    Supported: no
    Active: no

ubuntu@ip-172-31-82-121:~$ # there is no cpufreq/cpuinfo_max_freq
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpu0
cache  crash_notes  crash_notes_size  driver  firmware_node  hotplug  node0  power  subsystem  topology  uevent
ubuntu@ip-172-31-82-121:~$ ls /sys/devices/system/cpu/cpufreq  # empty

If there's any good news, the CPU cgroup management seems unaffected

Allocated Resources
CPU           Memory          Disk
250/2400 MHz  32 MiB/983 MiB  300 MiB/6.6 GiB

Allocation Resource Utilization
CPU         Memory
0/2400 MHz  388 KiB/983 MiB

Host Resource Utilization
CPU            Memory           Disk
2400/2400 MHz  146 MiB/983 MiB  1.4 GiB/8.0 GiB  # loaded deliberately

[ec2-user@ip-172-31-94-218 proc]$ cat /proc/cgroups
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	11	3	1
cpu	9	3	1
cpuacct	9	3	1
blkio	10	3	1
memory	6	3	1
devices	5	25	1
freezer	4	3	1
net_cls	2	3	1
perf_event	8	3	1
net_prio	2	3	1
hugetlb	7	3	1
pids	3	3	1

I'm going to keep researching and asking around, but I suspect this may boil down to parsing the rated CPU speed out of the CPU model name string. Hacky as that may be, it should be more accurate than parsing cpu MHz, which is tantamount to using a random number.

herter4171 · 2020-04-14T16:59:56Z

Hey @shoenig, thanks for the digging. One thing I've noticed about using the model name is that certain instance types, like "memory optimized," use AMD chips that don't have the rated frequency in the name like Intel procs tend to. Also, I think the driver error I'm seeing in the pic from my last comment is related to this issue, since it's requiring a value for MHz between rated and max. I'd be happy to open a separate thread for that if it's going to muddy the waters here, though.

shoenig · 2020-04-21T21:31:43Z

Another possibility might be to modify gopsutil to briefly load a single CPU thread and take measurements of the current speed, the maximum of which would be presumed to be the max CPU speed.

I put together a quick demo to check whether this works before submitting the idea upstream:

$ for i in {1..10}; do ./loadcpu && sleep 3 && echo ""; done
read current speed: 800.04
loaded max speed:   3900.70

read current speed: 1924.65
loaded max speed:   3901.08

read current speed: 1495.16
loaded max speed:   3900.33

read current speed: 2826.81
loaded max speed:   3900.00

read current speed: 3400.18
loaded max speed:   3902.43

read current speed: 1979.91
loaded max speed:   3900.95

read current speed: 2627.13
loaded max speed:   3900.19

read current speed: 889.96
loaded max speed:   3901.62

read current speed: 3391.65
loaded max speed:   3902.97

read current speed: 906.17
loaded max speed:   3900.63
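The loadcpu program itself is not shown in the thread. The following is only a rough shell approximation of the idea (briefly load one thread, sample the current speed while it runs, keep the maximum); all function names are made up for illustration, and the cpuinfo path is a parameter so the sampling can be exercised against a fixture file.

```shell
# Largest value from a stream of numbers, one per line (original text kept).
max_of() {
    awk 'NR == 1 || $1 + 0 > max + 0 { max = $1 } END { if (NR) print max }'
}

# Current speed of every CPU listed in a cpuinfo-style file.
all_current_mhz() {
    awk -F': *' '/^cpu MHz/ { print $2 }' "${1:-/proc/cpuinfo}"
}

# Spin one busy loop for roughly a second, sample all CPUs while it runs,
# and report the highest reading seen.
loaded_max_mhz() {
    ( end=$(( $(date +%s) + 1 ))
      while [ "$(date +%s)" -le "$end" ]; do :; done ) &
    spinner=$!
    for i in 1 2 3 4 5; do
        all_current_mhz "$@"
        sleep 0.2
    done | max_of
    wait "$spinner"
}
```

Sampling every CPU rather than only the first avoids having to know which core the busy loop was scheduled on. Whether the highest reading really reaches the rated maximum depends on the host, so this is a heuristic rather than a measurement of the specification.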

Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field). This is because the max CPU frequency is not available by reading anything on the EC2 instance itself. Normally on Linux one would look at e.g. `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq` or perhaps parse the values from the `CPU max MHz` field in `/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the EC2 metadata service. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed, it defaults to the current CPU speed, which could be basically any number between 0 and the true max. This is particularly bad on large, powerful reserved instances, which often idle at ~800 MHz while Nomad does its fingerprinting (typically IO bound). Nomad then uses that value as the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least not without sudo), use a best-effort lookup table. This table was generated by going through every instance type in AWS documentation and copy-pasting the numbers. https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal, as future instance types will need to be added as they are introduced to AWS. However, using the table should only be an improvement over the status quo, since right now Nomad miscalculates available CPU resources on all instance types.
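The lookup-table idea can be illustrated with a short shell sketch. This is not the code from the pull request: the function names are hypothetical, and the single entry uses the figures quoted in this issue for c5.24xlarge (96 cores, 3 GHz stock) rather than values taken from Nomad's actual table.

```shell
# Hypothetical lookup: instance type -> "cores mhz_per_core".
# The one entry uses the figures quoted in this issue; a real table would
# carry one line per instance type.
ec2_cpu_spec() {
    case "$1" in
        c5.24xlarge) echo "96 3000" ;;
        *) return 1 ;;   # unknown type: caller keeps the fingerprinted value
    esac
}

# Total MHz for a known instance type; prints nothing and fails if unknown.
ec2_total_mhz() {
    spec=$(ec2_cpu_spec "$1") || return 1
    set -- $spec
    echo $(( $1 * $2 ))
}

ec2_total_mhz c5.24xlarge   # 288000
```

On an instance, the type string could be read from the metadata service (http://169.254.169.254/latest/meta-data/instance-type), assuming that endpoint is reachable from the client. Unknown types fail the lookup, so a caller can fall back to the existing fingerprint instead of guessing.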


github-actions · 2022-11-08T02:31:53Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

jrasell added type/bug theme/fingerprint stage/waiting-reply labels Apr 10, 2020

stale bot removed stage/waiting-reply labels Apr 10, 2020

jrasell added the stage/waiting-reply label Apr 10, 2020

stale bot removed the stage/waiting-reply label Apr 10, 2020

shoenig self-assigned this Apr 13, 2020

herter4171 mentioned this issue Apr 15, 2020

Docker Driver Fails With Upper Limit of 262144 CPU Shares #7731

Open

shoenig mentioned this issue Apr 29, 2020

env_aws: use best-effort lookup table for CPU performance in EC2 #7828

Merged

shoenig closed this as completed in #7828 Apr 29, 2020

shoenig added this to the 0.11.2 milestone Apr 29, 2020

github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022
