cpu_total_compute not working #2638

Closed
xtracredit opened this issue May 12, 2017 · 12 comments

Comments

@xtracredit

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Output from nomad version
Nomad v0.5.6

Operating system and Environment details

SUSE 12
s390x

Issue

The client stanza option cpu_total_compute is not working. /proc/cpuinfo on this platform has no MHz field, so the detected frequency defaults to zero. I want to set the value with cpu_total_compute, but it has no effect. I also tried an x86 VM running in VirtualBox, without any luck.

Reproduction steps

Create a client configuration with the value set.
client {
enabled = true
servers = ["x.x.x.x"]
cpu_total_compute = 4200
}
Run nomad.
nomad node-status --verbose <node_id>
cpu.totalcompute = 0
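
One way to confirm the missing field before starting the agent is to look for a `cpu MHz` line directly. This is a minimal sketch that assumes the x86-style `cpu MHz` label; `cpuinfo_mhz` is a hypothetical helper name, not part of Nomad.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first "cpu MHz" value from a cpuinfo-style
# file, or return non-zero when the field is absent (as reported here on s390x).
cpuinfo_mhz() {
    local file="${1:-/proc/cpuinfo}"
    awk -F': *' '
        tolower($1) ~ /^cpu mhz[ \t]*$/ { print $2; found = 1; exit }
        END { exit !found }
    ' "$file"
}

# Usage:
#   cpuinfo_mhz || echo "no MHz field; cpu_total_compute must be set manually"
```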

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Job file (if appropriate)

@dadgar
Contributor

dadgar commented May 12, 2017

Can you give the client logs when it just starts up at debug level?

@dadgar
Contributor

dadgar commented May 12, 2017

We currently use cpu_total_compute only when there is an error detecting the CPU. So I have a feeling that no error is being returned and thus cpu_total_compute is being skipped!

@xtracredit
Author

xtracredit commented May 12, 2017

Here are the debug logs.

2017-05-12T13:19:16.748780-04:00 lxxp1 systemd[1]: Started Nomad.
2017-05-12T13:19:16.752532-04:00 lxxp1 nomad[37987]: Loaded configuration from /etc/nomad/client.hcl
2017-05-12T13:19:16.752654-04:00 lxxp1 nomad[37987]: ==> Starting Nomad agent...
2017-05-12T13:19:20.859435-04:00 lxxp1 nomad[37987]: ==> Nomad agent configuration:
2017-05-12T13:19:20.859639-04:00 lxxp1 nomad[37987]: Atlas: <disabled>
2017-05-12T13:19:20.859729-04:00 lxxp1 nomad[37987]: Client: true
2017-05-12T13:19:20.859812-04:00 lxxp1 nomad[37987]: Log Level: DEBUG
2017-05-12T13:19:20.859894-04:00 lxxp1 nomad[37987]: Region: us-east (DC: dc1)
2017-05-12T13:19:20.859979-04:00 lxxp1 nomad[37987]: Server: false
2017-05-12T13:19:20.860061-04:00 lxxp1 nomad[37987]: Version: 0.5.6
2017-05-12T13:19:20.860142-04:00 lxxp1 nomad[37987]: ==> Nomad agent started! Log data will stream in below:
2017-05-12T13:19:20.860223-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.750634 [INFO] client: using state directory /var/lib/nomad/client
2017-05-12T13:19:20.860306-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.750659 [INFO] client: using alloc directory /var/lib/nomad/alloc
2017-05-12T13:19:20.860390-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752150 [DEBUG] client: built-in fingerprints: [arch cgroup consul cpu host memory network nomad signal storage vault env_aws env_gce]
2017-05-12T13:19:20.860471-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752239 [INFO] fingerprint.cgroups: cgroups are available
2017-05-12T13:19:20.860556-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.752396 [DEBUG] client: fingerprinting cgroup every 15s
2017-05-12T13:19:20.860643-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753301 [INFO] fingerprint.consul: consul agent is available
2017-05-12T13:19:20.860725-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753373 [DEBUG] fingerprint.cpu: frequency: 0 MHz
2017-05-12T13:19:20.860806-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753376 [DEBUG] fingerprint.cpu: core count: 1
2017-05-12T13:19:20.860887-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.753489 [DEBUG] client: fingerprinting consul every 15s
2017-05-12T13:19:20.860969-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.756224 [DEBUG] fingerprint.network: Detected interface eth0 with IP 10.10.11.67 during fingerprinting
2017-05-12T13:19:20.861051-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.756804 [DEBUG] fingerprint.network: link speed for eth0 set to 10
2017-05-12T13:19:20.861132-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:16.757553 [DEBUG] client: fingerprinting vault every 15s
2017-05-12T13:19:20.861213-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:18.757545 [DEBUG] fingerprint.env_gce: Could not read value for attribute "machine-type"
2017-05-12T13:19:20.861299-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:18.757552 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
2017-05-12T13:19:20.861381-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.757684 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
2017-05-12T13:19:20.861462-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.757702 [DEBUG] client: applied fingerprints [arch cgroup consul cpu host memory network nomad signal storage]
2017-05-12T13:19:20.861544-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857162 [DEBUG] driver.docker: using client connection initialized from environment
2017-05-12T13:19:20.861633-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857391 [DEBUG] client: fingerprinting rkt every 15s
2017-05-12T13:19:20.861720-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857980 [DEBUG] driver.exec: exec driver is enabled
2017-05-12T13:19:20.861802-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.857986 [DEBUG] client: available drivers [java docker exec]
2017-05-12T13:19:20.861910-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858034 [INFO] client: Node ID "90c465a2-c888-342d-a551-b6021c71ee77"
2017-05-12T13:19:20.861995-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858261 [DEBUG] client: fingerprinting docker every 15s
2017-05-12T13:19:20.862077-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.858274 [DEBUG] client: fingerprinting exec every 15s
2017-05-12T13:19:20.862159-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.859343 [DEBUG] client: updated allocations at index 2760 (total 0) (pulled 0) (filtered 0)
2017-05-12T13:19:20.862252-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.859379 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
2017-05-12T13:19:20.863293-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.862943 [INFO] client: node registration complete
2017-05-12T13:19:20.863399-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:20.862965 [DEBUG] client: periodically checking for node changes at duration 5s
2017-05-12T13:19:20.867789-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:20 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
2017-05-12T13:19:20.871818-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:20 [INFO] agent: Synced check '289cc7e1737904489a34a4705d50e2dea3a55881'
2017-05-12T13:19:26.153037-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:26.152924 [DEBUG] client: state updated to ready
2017-05-12T13:19:27.910406-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:27.910039 [DEBUG] http: Request /v1/agent/servers (307.222µs)
2017-05-12T13:19:27.915296-04:00 lxxp1 consul[4642]: 2017/05/12 13:19:27 [INFO] agent: Synced check '289cc7e1737904489a34a4705d50e2dea3a55881'
2017-05-12T13:19:37.911399-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:37.911152 [DEBUG] http: Request /v1/agent/servers (51.724µs)
2017-05-12T13:19:47.912582-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:47.912343 [DEBUG] http: Request /v1/agent/servers (550.765µs)
2017-05-12T13:19:57.913724-04:00 lxxp1 nomad[37987]: 2017/05/12 13:19:57.913417 [DEBUG] http: Request /v1/agent/servers (314.099µs)
2017-05-12T13:20:07.914753-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:07.914481 [DEBUG] http: Request /v1/agent/servers (57.11µs)
2017-05-12T13:20:17.915731-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:17.915462 [DEBUG] http: Request /v1/agent/servers (269.01µs)
2017-05-12T13:20:27.916676-04:00 lxxp1 nomad[37987]: 2017/05/12 13:20:27.916414 [DEBUG] http: Request /v1/agent/servers (55.357µs)

@dadgar
Contributor

dadgar commented May 12, 2017

Hmm yeah! Looks like my prediction was right. Will get this fixed for 0.6!

@dadgar dadgar added this to the v0.6.0 milestone May 12, 2017
@samisil

samisil commented Jun 22, 2017

Is it possible to permit the override even when it's possible to detect the cpu?

@dadgar
Contributor

dadgar commented Jun 22, 2017

@samisil Not currently but the fix to this issue would be exactly that. Hopefully will be part of 0.6.0
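
The precedence described here, where a configured value wins over detection, might be sketched as follows. This is an illustration in shell, not Nomad's actual Go code; `pick_total_compute` and its arguments are hypothetical names, and the error text only paraphrases the idea that a value must be set manually when nothing can be detected.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick_total_compute DETECTED_MHZ CORES [OVERRIDE]
# Prints the total compute in MHz, preferring a positive override.
pick_total_compute() {
    local mhz="$1" cores="$2" override="${3:-0}"
    # A configured override takes precedence, even if detection succeeded.
    if [ "$override" -gt 0 ]; then
        echo "$override"
        return 0
    fi
    # Otherwise fall back to detected frequency times core count.
    local total=$(( mhz * cores ))
    if [ "$total" -gt 0 ]; then
        echo "$total"
        return 0
    fi
    echo "cannot detect total compute; set cpu_total_compute manually" >&2
    return 1
}
```

With the values from the log above (0 MHz, 1 core) and an override of 4200, this sketch prints 4200 instead of falling through to zero.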

@samisil

samisil commented Jun 22, 2017 via email

@balupton

balupton commented Apr 29, 2018

Just ran into this too. Perhaps dmidecode -t 4 can be used for arm64 CentOS machines (Scaleway ARM64-2GB servers).

uname -a output is:

Linux par1_slave_0 4.14.33-mainline-rev1 #1 SMP Sun Apr 8 12:40:59 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux

/usr/local/bin/nomad agent -config=/etc/systemd/system/nomad.d output is:

==> Starting Nomad agent...
==> Error starting agent: client setup failed: fingerprinting failed: cannot detect cpu total compute. CPU compute must be set manually using the client config option "cpu_total_compute"

cat /proc/cpuinfo output is:

processor	: 0
BogoMIPS	: 200.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer	: 0x43
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0x0a1
CPU revision	: 1

processor	: 1
BogoMIPS	: 200.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer	: 0x43
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0x0a1
CPU revision	: 1

processor	: 2
BogoMIPS	: 200.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer	: 0x43
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0x0a1
CPU revision	: 1

processor	: 3
BogoMIPS	: 200.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
CPU implementer	: 0x43
CPU architecture: 8
CPU variant	: 0x1
CPU part	: 0x0a1
CPU revision	: 1

dmidecode -t 4 output is:

# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0400, DMI type 4, 42 bytes
Processor Information
	Socket Designation: CPU 0
	Type: Central Processor
	Family: Other
	Manufacturer: QEMU
	ID: 00 00 00 00 00 00 00 00
	Version: 1.0
	Voltage: Unknown
	External Clock: Unknown
	Max Speed: 2000 MHz
	Current Speed: 2000 MHz
	Status: Populated, Enabled
	Upgrade: Other
	L1 Cache Handle: Not Provided
	L2 Cache Handle: Not Provided
	L3 Cache Handle: Not Provided
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Core Count: 1
	Core Enabled: 1
	Thread Count: 1
	Characteristics: None

Handle 0x0401, DMI type 4, 42 bytes
Processor Information
	Socket Designation: CPU 1
	Type: Central Processor
	Family: Other
	Manufacturer: QEMU
	ID: 00 00 00 00 00 00 00 00
	Version: 1.0
	Voltage: Unknown
	External Clock: Unknown
	Max Speed: 2000 MHz
	Current Speed: 2000 MHz
	Status: Populated, Enabled
	Upgrade: Other
	L1 Cache Handle: Not Provided
	L2 Cache Handle: Not Provided
	L3 Cache Handle: Not Provided
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Core Count: 1
	Core Enabled: 1
	Thread Count: 1
	Characteristics: None

Handle 0x0402, DMI type 4, 42 bytes
Processor Information
	Socket Designation: CPU 2
	Type: Central Processor
	Family: Other
	Manufacturer: QEMU
	ID: 00 00 00 00 00 00 00 00
	Version: 1.0
	Voltage: Unknown
	External Clock: Unknown
	Max Speed: 2000 MHz
	Current Speed: 2000 MHz
	Status: Populated, Enabled
	Upgrade: Other
	L1 Cache Handle: Not Provided
	L2 Cache Handle: Not Provided
	L3 Cache Handle: Not Provided
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Core Count: 1
	Core Enabled: 1
	Thread Count: 1
	Characteristics: None

Handle 0x0403, DMI type 4, 42 bytes
Processor Information
	Socket Designation: CPU 3
	Type: Central Processor
	Family: Other
	Manufacturer: QEMU
	ID: 00 00 00 00 00 00 00 00
	Version: 1.0
	Voltage: Unknown
	External Clock: Unknown
	Max Speed: 2000 MHz
	Current Speed: 2000 MHz
	Status: Populated, Enabled
	Upgrade: Other
	L1 Cache Handle: Not Provided
	L2 Cache Handle: Not Provided
	L3 Cache Handle: Not Provided
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Core Count: 1
	Core Enabled: 1
	Thread Count: 1
	Characteristics: None

So for me the magic becomes:

cpu_total_compute="$(dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//' | awk '{s+=$1} END {print s}')"
cat > ../data/local/conf/nomad_slave.json <<EOF
{
	"client": {
		"cpu_total_compute": ${cpu_total_compute},
		"enabled": true
	}
}
EOF
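
If `dmidecode` is missing or run without root, the pipeline above yields an empty string and the generated JSON is invalid. A hedged sketch of a guard follows; `write_client_config` is a hypothetical helper name, and the output path is whatever your setup uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the client config only when the computed value
# is a positive integer; otherwise leave the target file untouched.
write_client_config() {
    local total="$1" out="$2"
    case "$total" in
        ''|*[!0-9]*)
            echo "invalid cpu_total_compute: '$total'" >&2
            return 1
            ;;
    esac
    if [ "$total" -le 0 ]; then
        echo "cpu_total_compute must be greater than zero" >&2
        return 1
    fi
    cat > "$out" <<EOF
{
	"client": {
		"cpu_total_compute": ${total},
		"enabled": true
	}
}
EOF
}

# Usage:
#   write_client_config "$cpu_total_compute" ../data/local/conf/nomad_slave.json
```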

@schmichael
Member

Thanks for the tip @balupton! I put it in a new issue - #4233 - so we don't lose track of it!

@Legogris

Legogris commented Feb 7, 2020

dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//'

@balupton This will only give you the compute per core, so if you have more than a single core it will be underreported. This should do the trick (note that it will not work if you have more than one CPU with different frequencies or core counts):

# Frequency in MHz of the first processor entry (assumes all entries match).
cpu_freq="$(sudo dmidecode -t 4 | grep 'Current Speed' | sed 's/.*: //' | sed 's/ .*//' | head -n 1)"
# Total enabled cores, summed over all processor entries.
cpu_count="$(sudo dmidecode -t 4 | grep 'Core Enabled' | sed 's/.*: //' | sed 's/ .*//' | awk '{s+=$1} END {print s}')"
cpu_total_compute=$(( cpu_count * cpu_freq ))

kennethkalmer added a commit to kennethkalmer/ansible-nomad that referenced this issue Jan 3, 2021
@roylez

roylez commented Jun 5, 2021

A more compact version

dmidecode -t 4 |awk '/Current Speed/ {s=$3} /Core Enabled/ {c=$3} END {print s*c}'
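
On machines where `dmidecode -t 4` lists several processor entries, a variant that multiplies speed by enabled cores per entry and sums the results would also cover sockets that differ. This is a sketch only, assuming each entry prints `Current Speed` before `Core Enabled` as in the output earlier in this thread; `total_compute_from_dmi` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dmidecode -t 4` output on stdin and print the sum
# over all processor entries of (Current Speed in MHz * Core Enabled).
total_compute_from_dmi() {
    awk '
        /Current Speed:/ { speed = $3 }           # e.g. "Current Speed: 2000 MHz"
        /Core Enabled:/  { total += speed * $3 }  # e.g. "Core Enabled: 1"
        END { print total + 0 }                   # prints 0 when nothing matched
    '
}

# Usage (dmidecode needs root):
#   sudo dmidecode -t 4 | total_compute_from_dmi
```

Reading from stdin keeps the parsing separate from the privileged call, so the function can be checked against saved `dmidecode` output.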

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022