Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reloading agent does detect CNI plugin changes (due to FingerprintManager.Reload() not being invoked) #14614

Closed
pruiz opened this issue Sep 17, 2022 · 5 comments

Comments

@pruiz
Copy link
Contributor

pruiz commented Sep 17, 2022

Nomad version

Nomad v1.3.5 (1359c25)

Operating system and Environment details

Linux/amd64

Issue

While trying to integrate cilium with nomad (see: #12120), I've found nomad agent's is not checking changes on CNI during SIGHUP, which means it cannot detect changes applied to CNI's configuration files. However, by inspecting the code, all the needed infrastructure is in place, it just looks like the call to FingerprintManager.Reload() is missing from SIGHUP handling code.

Reproduction steps

start nomad
make changes to /etc/cni/net.d/*
kill -HUP $NOMAD_PID

Expected Result

Reloading of CNI config changes, including errors on logfile in case CNI config can't be processed. At any point, at least the message "reloading fingerprinter" from ./client/fingerprint_manager.go:114 should appear on log file.

Actual Result

CNI changes are ignored, and nothing (related to fingerprint manager, nor CNI) appears on logfile.

Job file (if appropriate)

N/A

@lgfa29
Copy link
Contributor

lgfa29 commented Sep 19, 2022

Hi @pruiz 👋

Thanks for the report. Client config reload is kind of a tricky area because there are all sorts of dangerous race conditions that can happen during this process, so for now there is a very specific list of configuration values that are reloaded:
https://www.nomadproject.io/docs/configuration#configuration-reload

That being said, I was investigation the history of this part of the code and noticed that it was added in this PR: #7518, but fingerprintManager.Reload() is never called, as you've noticed. But in addition to that, the CNIFingerprint.Reload is a no-op, so I was curious if you were able to successfully reload the CNI config using your PR changes?

@lgfa29 lgfa29 self-assigned this Sep 19, 2022
@lgfa29 lgfa29 added this to Needs Triage in Nomad - Community Issues Triage via automation Sep 19, 2022
@lgfa29 lgfa29 moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Sep 19, 2022
@pruiz
Copy link
Contributor Author

pruiz commented Sep 20, 2022

Hi @lgfa29,

From what I see, Reload() on cni code is a noop, because the way it works is the fingerprint manager, after calling Reload, re-does the actual fingerprinting by calling 'fingerprint' on each (reloadable) fingerprint implementation again, see this code from fingerprint manager:

image

As I understood, 'Reload' exists so each 'reloadable fingerprint' implementation can free o release resources before being "reloaded", but CNI fp does not need to release anything, so Reload is a noop in this case, as everything happens when calling fingerprint method again. Or at last that I came to when digging the code.

Regards
Pablo

@lgfa29
Copy link
Contributor

lgfa29 commented Sep 22, 2022

Ah right, I missed that detail that the fingreprinter will run again thanks!

So it does seem like we may have forgotten to invoke the FingerprintManager.Reloadmethod during the initial implementation 😞

I will try to get some tests for this, probably in our end-to-end test suite. But thanks for the report and the PR!

@pruiz
Copy link
Contributor Author

pruiz commented Sep 25, 2022

@lgfa29 sure, np. Glad to help ;)

@tgross tgross assigned tgross and unassigned lgfa29 Oct 3, 2022
@tgross tgross added this to the 1.4.x milestone Oct 3, 2022
@tgross tgross moved this from Triaging to In Progress in Nomad - Community Issues Triage Oct 4, 2022
@tgross tgross closed this as completed in 536260c Oct 6, 2022
Nomad - Community Issues Triage automation moved this from In Progress to Done Oct 6, 2022
hc-github-team-nomad-core added a commit that referenced this issue Oct 6, 2022
…SIGHUP (Fixes #14614) into release/1.4.x (#14836)

This pull request was automerged via backport-assistant
@github-actions
Copy link

github-actions bot commented Feb 4, 2023

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Development

No branches or pull requests

3 participants