
Nomad v0.6.0-dev arm32v7 "network: no networks available" #3005

Closed
minusdelta opened this issue Aug 10, 2017 · 12 comments

@minusdelta

Nomad version

Nomad v0.6.0-dev c075349, from #2963 (comment)

Operating system and Environment details

  • Ubuntu 16.04.3 LTS
  • test setup, 1 server (amd64), 1 client (arm32v7)

Issue

nomad plan http-test.job

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "http" (failed to place 1 allocation):
    * Resources exhausted on 1 nodes
    * Dimension "network: no networks available" exhausted on 1 nodes

Verifying via the node API:

curl -s 127.0.0.1:4646/v1/node/0d7d067d-2abf-44a9-87b4-f4461ae79061 |jq -rM .Resources
{
  "CPU": 5472,
  "MemoryMB": 1985,
  "DiskMB": 27064,
  "IOPS": 0,
  "Networks": []
}

but on the client:

ip -o l sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP ...

ip -o -4 a sh eth0
2: eth0    inet 169.254.155.20/16 brd 169.254.255.255 scope global eth0 ...

ethtool eth0 |tail -9
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Current message level: 0x00000000 (0)
Link detected: yes

cat /sys/class/net/eth0/speed
1000

Even setting the link speed explicitly doesn't help:

/etc/nomad.d/client.hcl

client {
  enabled       = true
  servers       = [ "nomad.service.consul:4647" ]
  network_interface = "eth0"
  network_speed = 100
}
@dadgar
Contributor

dadgar commented Aug 10, 2017

Can you set your client's log level to DEBUG, start it up, and provide the logs?

@minusdelta
Author

@dadgar, sorry, I forgot those, but there's nothing important there (to me). Here is a replay:

     Loaded configuration from /etc/nomad.d/client.hcl
 ==> Starting Nomad agent...
 ==> Nomad agent configuration:
                 Client: true
              Log Level: DEBUG
                 Region: global (DC: dc1)
                 Server: false
                Version: 0.6.0dev
 ==> Nomad agent started! Log data will stream in below:
     2017/08/11 10:51:40.015429 [INFO] client: using state directory /var/lib/nomad/client
     2017/08/11 10:51:40.016443 [INFO] client: using alloc directory /var/lib/nomad/alloc
     2017/08/11 10:51:40.036089 [DEBUG] client: built-in fingerprints: [arch cgroup consul cpu host memory network nomad signal storage vault env_aws env_gc
     2017/08/11 10:51:40.037582 [INFO] fingerprint.cgroups: cgroups are available
     2017/08/11 10:51:40.038201 [DEBUG] client: fingerprinting cgroup every 15s
     2017/08/11 10:51:40.068199 [INFO] fingerprint.consul: consul agent is available
     2017/08/11 10:51:40.068566 [DEBUG] client: fingerprinting consul every 15s
     2017/08/11 10:51:40.072231 [DEBUG] fingerprint.cpu: frequency: 1368 MHz
     2017/08/11 10:51:40.072341 [DEBUG] fingerprint.cpu: core count: 4
     2017/08/11 10:51:40.105728 [DEBUG] fingerprint.network: setting link speed to user configured speed: 100
     2017/08/11 10:51:40.120143 [DEBUG] client: fingerprinting vault every 15s
     2017/08/11 10:51:42.120788 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
     2017/08/11 10:51:43.266089 [DEBUG] fingerprint.env_gce: Could not read value for attribute "machine-type"
     2017/08/11 10:51:43.266186 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
     2017/08/11 10:51:43.266323 [DEBUG] client: applied fingerprints [arch cgroup consul cpu host memory network nomad signal storage]
     2017/08/11 10:51:43.267257 [DEBUG] driver.docker: using client connection initialized from environment
     2017/08/11 10:51:43.267510 [DEBUG] client: fingerprinting rkt every 15s
     2017/08/11 10:51:43.286995 [DEBUG] driver.exec: exec driver is enabled
     2017/08/11 10:51:43.287218 [DEBUG] client: available drivers [docker exec]
     2017/08/11 10:51:43.287248 [DEBUG] client: fingerprinting docker every 15s
     2017/08/11 10:51:43.287328 [DEBUG] client: fingerprinting exec every 15s
     2017/08/11 10:51:43.307753 [INFO] client: Node ID "ce94e4d8-c62f-7eba-bf4a-293edca206cc"
     2017/08/11 10:51:43.315813 [DEBUG] client: updated allocations at index 4361 (total 0) (pulled 0) (filtered 0)
     2017/08/11 10:51:43.316628 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
     2017/08/11 10:51:43.345182 [INFO] client: node registration complete
     2017/08/11 10:51:43.345605 [DEBUG] client: periodically checking for node changes at duration 5s
     2017/08/11 10:51:43.568818 [DEBUG] consul.sync: registered 1 services, 1 checks; deregistered 0 services, 0 checks
     2017/08/11 10:51:50.293970 [DEBUG] http: Request /v1/agent/servers (4.350197ms)
     2017/08/11 10:51:51.706383 [DEBUG] client: state updated to ready
     2017/08/11 10:52:00.304277 [DEBUG] http: Request /v1/agent/servers (2.173911ms)
     2017/08/11 10:52:10.316009 [DEBUG] http: Request /v1/agent/servers (3.750804ms)

... the last two lines repeat every 10s.

@minusdelta
Author

OK, a short follow-up [after accepting the fact that Nomad on ARM wants to be babysat]:
going back to v0.5.6 on the client, the job starts up immediately.

@jstoja

jstoja commented Aug 16, 2017

Hello guys,

I'm having the same issue on amd64; the hosts have four Ethernet interfaces in bonding mode. After the upgrade from 0.5.6 to 0.6.0, all the allocs stayed, but if we try to plan new ones, we get the same message:

- WARNING: Failed to place all allocations.
  Task Group "http" (failed to place 1 allocation):
    * Resources exhausted on 5 nodes
    * Dimension "network: no networks available" exhausted on 5 nodes

The logs in debug show the following lines:

[...]
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.448166 [INFO] fingerprint.cgroups: cgroups are available
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.448281 [DEBUG] client: fingerprinting cgroup every 15s
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.449814 [INFO] fingerprint.consul: consul agent is available
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.449957 [DEBUG] client: fingerprinting consul every 15s
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.451236 [DEBUG] fingerprint.cpu: frequency: 2397 MHz
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.451243 [DEBUG] fingerprint.cpu: core count: 16
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.521172 [DEBUG] fingerprint.network: link speed for eth0 set to 1000
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:15.524414 [DEBUG] client: fingerprinting vault every 15s
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:17.524523 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.524732 [DEBUG] fingerprint.env_gce: Could not read value for attribute "machine-type"
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.524745 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.524761 [DEBUG] client: applied fingerprints [arch cgroup consul cpu host memory network nomad signal storage]
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.566736 [DEBUG] driver.docker: using client connection initialized from environment
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.566836 [DEBUG] client: fingerprinting rkt every 15s
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.569636 [DEBUG] driver.exec: exec driver is enabled
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.569658 [WARN] driver.raw_exec: raw exec is enabled. Only enable if needed
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.569668 [DEBUG] client: available drivers [qemu docker exec raw_exec]
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.569724 [DEBUG] client: fingerprinting docker every 15s
Aug 16 10:51:25 lux4 nomad[4004]: 2017/08/16 10:51:19.569769 [DEBUG] client: fingerprinting exec every 15s
[...]

I wasn't having the issue with 0.5.6.
Edit: after downgrading to 0.5.6, the jobs are being scheduled again.

@jstoja

jstoja commented Aug 18, 2017

@minusdelta What was the network interfaces configuration on your side?

@minusdelta
Author

@jstoja Absolutely nothing fancy like a 4x aggregation here ... (abbreviated version):

docker0   Link encap:Ethernet  HWaddr 02:42: ..
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0

eth0      Link encap:Ethernet  HWaddr 02:81: ..
          inet addr:169.254.155.20  Bcast:169.254.255.255  Mask:255.255.0.0

eth0.1816 Link encap:Ethernet  HWaddr 02:81: ..

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0

eth0.1816 is a macvlan interface, therefore it has no IP.

The only special things here are:

  • eth0 with an IPv4LL (link-local) address
  • the first octet of the MAC address means "locally administered"

I observed exactly the same issue with a CoreOS test instance (VM) with only one (v)NIC and no macvlan.

@dadgar I think the "platform-arm" label doesn't fit here anymore, unfortunately.

@nealmchugh

I ran into this problem on Ubuntu 16.04.3 on amd64. I solved it on Nomad 0.6.0 by remembering that I use LXC as well and had a bridge interface. Once I switched from the equivalent of eth0 to br0 in the Nomad config file, all was well.

@dadgar
Contributor

dadgar commented Aug 21, 2017

Anyone have a Vagrant image or Terraform config that brings something up to reproduce this? I don't have any Raspberry Pis lying around.

@minusdelta
Author

@dadgar IMHO most "nomad-on-arm" users (#1693, #2291) aren't on RPi (depending on the model, that could also mean ARMv6) but come from https://www.scaleway.com/baremetal-cloud-servers : ARMv7 = "C1".
Perhaps that could also be a way to reproduce (a 64-bit/ARMv8 offering is there too).

@dadgar
Contributor

dadgar commented Aug 22, 2017

@minusdelta So you are using "C1" and experiencing this?

@angrycub
Contributor

It would seem that this commit might have something to do with it, if it's always a link-local address.

ad00ec8

dadgar added a commit that referenced this issue Aug 23, 2017
This PR changes the fingerprint handling of network interfaces that only
contain link local addresses. The new behavior is to prefer globally
routable addresses and if none are detected, to fall back to link local
addresses if the operator hasn't disallowed it. This gives us pre 0.6
behavior for interfaces with only link local addresses but 0.6+ behavior
for IPv6 interfaces that will always have a link-local address.

Fixes #3005

/cc diptanuc
@github-actions

github-actions bot commented Dec 6, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2022