fingerprint: should be aware of available containernetworking plugin #14022

Closed
shoenig opened this issue Aug 4, 2022 · 2 comments · Fixed by #15473
Member

shoenig commented Aug 4, 2022

To use bridge networking mode, one must first install containernetworking plugins into /opt/cni/bin (or where configured).

Nomad doesn't fingerprint the availability of these plugins, so you find out about their absence only when a task is set to run and crash-loops looking for them. Tracking down the problem ends up taking a number of steps:

Allocs dying:

Allocations
ID        Node ID   Task Group  Version  Desired  Status  Created    Modified
e28426a9  1d5216fa  fake        0        run      failed  1m57s ago  1m52s ago
ff79d1d6  1d5216fa  fake        0        stop     failed  3m ago     1m57s ago
b1da3290  1d5216fa  fake        0        stop     failed  3m34s ago  3m ago

No alloc logs:

➜ nomad alloc logs e2
Error reading file: Unexpected response code: 404 (task "faketask" not started yet. No logs available)

Client logs:

    2022-08-04T14:17:37.815-0500 [WARN]  client.alloc_runner.runner_hook: failed to configure network: alloc_id=87fe5f48-74a4-5afe-0308-5436d0fe961f err="plugin type=\"loopback\" failed (add): failed to find plugin \"loopback\" in path [/opt/cni/bin]" attempt=1

At this point, if you're lucky, you remember to look up the instructions under Consul Service Mesh (?) for setting up the plugins that make bridge networking work:

https://www.nomadproject.io/docs/integrations/consul-connect#cni-plugins
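The failure in the client log can be confirmed by hand before digging through documentation. What follows is a minimal sketch, assuming a POSIX-style shell; `missing_cni_plugins` is a hypothetical helper written for illustration and is not part of Nomad. It takes the plugin directory and a list of plugin names, and prints every name that is not an executable regular file in that directory.

```shell
# Hypothetical helper, not a Nomad command.
# Usage: missing_cni_plugins DIR NAME...
# Prints each NAME that is not an executable regular file in DIR.
# Returns 0 when every plugin is present, 1 when at least one is missing.
missing_cni_plugins() {
    dir=$1
    shift
    rc=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ] || [ ! -x "$dir/$name" ]; then
            printf '%s\n' "$name"
            rc=1
        fi
    done
    return "$rc"
}
```

For the warning shown above, the call would be `missing_cni_plugins /opt/cni/bin loopback`. The directory is passed explicitly because the plugin path may be configured to something other than /opt/cni/bin.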

Member Author

shoenig commented Dec 5, 2022

Note to self: these are the plugins that get invoked for bridge networking, and for which we should be able to add implicit job constraints:

execve("/opt/cni/bin/bridge",
execve("/opt/cni/bin/firewall",
execve("/opt/cni/bin/host-local",
execve("/opt/cni/bin/loopback",
execve("/opt/cni/bin/portmap",

via:

sudo strace -fe trace=clone,fork,execve /opt/bin/nomad agent -dev

and some grep
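The exact grep is not recorded here, so the following is only one plausible reconstruction. It is a sketch that assumes `grep -o` is available (GNU and BSD grep both support it); `cni_plugins_from_strace` is a hypothetical name. It reads strace output on stdin and prints the unique basenames of binaries under the given directory that appear as the first argument of `execve(`.

```shell
# Hypothetical filter, not the command actually used in this thread.
# Usage: strace ... 2>&1 | cni_plugins_from_strace [DIR]
# DIR defaults to /opt/cni/bin and is used as part of a grep pattern,
# so it is assumed to contain no regex metacharacters that matter.
cni_plugins_from_strace() {
    dir=${1:-/opt/cni/bin}
    # Keep only the execve("DIR/<name>" fragment of each matching line,
    # drop the trailing quote, strip everything up to the last slash,
    # then de-duplicate.
    grep -o "execve(\"$dir/[^\"]*\"" | sed -e 's/"$//' -e 's|.*/||' | sort -u
}
```

Because strace writes its trace to stderr, the pipeline would look like `sudo strace -fe trace=clone,fork,execve /opt/bin/nomad agent -dev 2>&1 | cni_plugins_from_strace /opt/cni/bin`. The agent's own log lines pass through the same stream and are discarded by the filter.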

Member Author

shoenig commented Jan 9, 2023

#15473 adds the CNI plugin fingerprinting. Until that change propagates to the supported versions of Nomad, we need to hold off on merging #15473, which adds implicit constraints; otherwise the upgrade path will cause breakage in the form of tasks that can no longer be scheduled.

Basically I think if we wait until Nomad 1.6 we should be reasonably safe to make this change.

@shoenig shoenig removed their assignment Jan 9, 2023
@tgross tgross modified the milestones: 1.6.0, 1.7.0 Jun 23, 2023
@shoenig shoenig modified the milestones: 1.7.0, 1.7.x Dec 1, 2023