
NFD will remove and re-add node labels if nfd-worker pod is deleted (and re-created by the nfd-worker DS) #1752

Open
adrianchiris opened this issue Jul 1, 2024 · 12 comments
Labels: kind/bug

@adrianchiris
Contributor

What happened:

NFD will remove any node labels associated with the NodeFeature of a specific node if the nfd-worker pod of that node gets deleted.
After the pod is deleted it gets re-created by the DaemonSet, which then recreates the NodeFeature CR for the node, and the labels come back (same goes for annotations and extendedResources).

Workloads that rely on such labels in their nodeSelector/affinity will get disrupted, as they will be removed and re-scheduled.

This happens because nfd-worker creates the NodeFeature CR with an OwnerReference pointing to itself (the worker pod) [1]

[1]
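
For illustration only (a minimal sketch, not the actual nfd-worker code behind [1]): an owner reference that points at the worker Pod ties the NodeFeature's lifetime to that Pod, so the Kubernetes garbage collector removes the NodeFeature as soon as the Pod is deleted, and the node labels derived from it are removed in turn.

```go
// Sketch only (not NFD's code): building an owner reference that points at the
// nfd-worker Pod. Any object carrying this reference is garbage-collected by
// Kubernetes once the Pod is deleted.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podOwnerRef returns an OwnerReference for the given nfd-worker Pod.
// NewControllerRef marks it as the controlling owner.
func podOwnerRef(pod *corev1.Pod) metav1.OwnerReference {
	return *metav1.NewControllerRef(pod, corev1.SchemeGroupVersion.WithKind("Pod"))
}
```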

What you expected to happen:

In the end I'd expect the labels not to get removed if the nfd-worker pod gets restarted.
Going further into the details, I'd expect the NodeFeature CR not to be deleted if the pod is deleted.

This can be achieved by setting the owner reference to the nfd-worker DaemonSet, which is not as ephemeral as the pods it creates (see the sketch below).
In addition, to deal with redeploying the DaemonSet with different selectors/affinity/tolerations, the gc component can be extended to clean up NodeFeature objects for nodes that are not intended to run nfd-worker pods.
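
A minimal sketch of the first part, assuming the worker can resolve its own DaemonSet (for example by walking the Pod's owner references); the function name is illustrative:

```go
// Sketch only: owning the NodeFeature by the nfd-worker DaemonSet rather than
// the Pod. The NodeFeature then survives Pod deletion/restart, yet is still
// garbage-collected by Kubernetes when the DaemonSet is deleted (e.g. on
// helm uninstall).
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// daemonSetOwnerRef returns an OwnerReference for the nfd-worker DaemonSet.
func daemonSetOwnerRef(ds *appsv1.DaemonSet) metav1.OwnerReference {
	return *metav1.NewControllerRef(ds, appsv1.SchemeGroupVersion.WithKind("DaemonSet"))
}
```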

How to reproduce it (as minimally and precisely as possible):

  • Deploy NFD v0.15.0 or newer (I used master) with NodeFeatureAPI enabled.
  • Delete one of the NFD worker pods.
  • See the NodeFeature get deleted and re-created (kubectl get nodefeatures -A -w).
  • Get the node labels in a loop and see the labels get deleted and re-created.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.30 (but it reproduces on any version)
  • Cloud provider or hardware configuration: local setup
  • OS (e.g: cat /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Network plugin and version (if this is a network-related bug): N/A
  • Others: N/A
@ArangoGutierrez
Contributor

I like the first part of the idea: "This can be achieved by setting the owner reference to the nfd-worker DaemonSet, which is not as ephemeral as the pods it creates."

But for the second part, "the gc component can be extended to clean up NodeFeature objects for nodes that are not intended to run nfd-worker pods": how will GC know which nodes are tainted for the worker? A label?

@adrianchiris
Contributor Author

adrianchiris commented Jul 2, 2024

> But for the second part, "the gc component can be extended to clean up NodeFeature objects for nodes that are not intended to run nfd-worker pods": how will GC know which nodes are tainted for the worker? A label?

This bit is intended to handle updates of the nfd-worker DS selectors/affinity/tolerations, where nfd-worker pods may get removed from some nodes in the cluster. This can be an additional improvement (separate PR?) as I'm not sure how often this will happen.

What I was thinking of for the GC flow in this case:

  • List NodeFeature CRs in the current namespace.
  • For each CR, determine whether the nfd-worker DaemonSet is expected to schedule an nfd-worker pod on that node.
    • This requires GETting the node and the nfd-worker DS and checking selectors, affinity and tolerations against the node object (see the sketch below).
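
A rough sketch of that check, assuming the GC knows the worker DaemonSet's namespace and name (the function and parameter names are made up for illustration). Only the pod template's nodeSelector is matched here; a complete version would also evaluate node affinity and taints/tolerations:

```go
// Sketch only: would the nfd-worker DaemonSet place a pod on this node?
// Only the pod template's nodeSelector is checked; node affinity and
// taints/tolerations are left out for brevity.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/kubernetes"
)

// workerTargetsNode reports whether the nfd-worker DaemonSet's nodeSelector
// matches the labels of the given node.
func workerTargetsNode(ctx context.Context, c kubernetes.Interface, namespace, dsName, nodeName string) (bool, error) {
	ds, err := c.AppsV1().DaemonSets(namespace).Get(ctx, dsName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	node, err := c.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	sel := labels.SelectorFromSet(ds.Spec.Template.Spec.NodeSelector)
	return sel.Matches(labels.Set(node.Labels)), nil
}
```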

Other ideas are welcome :)

@ArangoGutierrez
Contributor

> This requires GETting the node and the nfd-worker DS and checking selectors, affinity and tolerations against the node object.

I would prefer to go for something like finalizers initially and check if that's enough. Having an annotation that the GC can read, and if present, not remove the NF from the Node. Would this be enough?

@adrianchiris
Contributor Author

> I would prefer to go for something like finalizers initially and check if that's enough. Having an annotation that the GC can read, and if present, not remove the NF from the Node. Would this be enough?

How would this work? Who adds/removes the finalizer?
AFAIU finalizers prevent deletion; in our case we want to trigger deletion for "orphaned" NFs.

@ArangoGutierrez
Contributor

Yeah, you are right. After thinking about it, your idea is the right approach.

@ArangoGutierrez
Contributor

I think we can split this issue into 2 action items:

  • Change the ownerReference from the Pod to the DaemonSet.
  • Add extra logic to the GC to check whether a NodeFeature is orphaned before deletion.

@ArangoGutierrez
Contributor

First PR to address this issue:

@marquiz
Contributor

marquiz commented Jul 4, 2024

I was exploring this very issue in the v0.16 cycle but didn't come up with any good solution (lack of bandwidth). I was pondering three possible solutions, two of which have been mentioned here:

  1. Have a grace period (in the nfd-master) after an NF delete event, before updating the node
  2. Use finalizers
  3. Copy the worker pod's owner refs to the NF

Of these, 2) isn't viable (AFAIU), as the finalizer will only delay the deletion of the NF but you cannot undelete it when the new worker pod comes up. For 1) I did some prototyping, but the code ended up hairy (at least in my hands) and felt like a probable source of caveats and other problems. So maybe 3) would be the least problematic solution.

For the possible GC improvements (if we need/want them), could we exploit the owner refs for that, too? E.g. see if the owner pod of the NF exists; if not, mark the NF as orphaned. On the next GC round, if the NF is still orphaned (and the owner pod UID hasn't changed), delete the NF.
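
A possible shape for that two-round rule, as a minimal sketch. It assumes the NodeFeature carries an owner reference to the worker Pod, so the GC can look up that Pod and its UID; the tracker type and names are made up for illustration:

```go
// Sketch only: two-round orphan handling for NodeFeatures in nfd-gc.
// Round N:   owner Pod missing -> remember the NodeFeature and the owner UID.
// Round N+1: still missing with the same owner UID -> delete the NodeFeature.
package sketch

import "k8s.io/apimachinery/pkg/types"

// orphanTracker remembers, per NodeFeature (namespace/name key), the UID of
// the owner Pod that was found missing on the previous GC round.
type orphanTracker map[string]types.UID

// shouldDelete decides whether the NodeFeature should be deleted this round.
// ownerPodUID is the Pod UID taken from the NodeFeature's owner reference.
func (t orphanTracker) shouldDelete(nfKey string, ownerPodUID types.UID, ownerPodExists bool) bool {
	if ownerPodExists {
		// Owner is back (or the NF was re-created by a new pod): clear the mark.
		delete(t, nfKey)
		return false
	}
	if prev, marked := t[nfKey]; marked && prev == ownerPodUID {
		// Orphaned for two consecutive rounds with an unchanged owner UID.
		return true
	}
	// First round we see it orphaned: just mark it and wait for the next round.
	t[nfKey] = ownerPodUID
	return false
}
```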

Thoughts?

@adrianchiris
Contributor Author

> For the possible GC improvements (if we need/want them), could we exploit the owner refs for that, too? E.g. see if the owner pod of the NF exists; if not, mark the NF as orphaned. On the next GC round, if the NF is still orphaned (and the owner pod UID hasn't changed), delete the NF.

Hey @marquiz!
I didn't understand how this would work.

If we set the NF ownerReference to the nfd-worker DS, then if the DS is deleted (e.g. helm uninstall of NFD) k8s will garbage-collect the NF objects, which is what we want.

If the user updates the DaemonSet (e.g. changes the pod template node affinity via helm update), then the worker DS will be updated and pods will get re-created, potentially running on different nodes. In this case the owner of the NF is still the same DS.

Is this not the case?
If it is, then for this case I don't think we can use the OwnerReference, can we?

@marquiz
Contributor

marquiz commented Jul 4, 2024

> I didn't understand how this would work.

This would rely on having two owner references, to both the DS and the Pod.

a) If the DS is deleted, then both the DS and the Pod are gone -> the NF will be GC'd by Kubernetes.

b) If only the Pod is gone but the DS remains -> the NF will not be GC'd by Kubernetes. However, nfd-gc could detect this situation and GC the NF.
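
A minimal sketch of how the worker could set those two references when creating the NodeFeature (the function name is illustrative; only one reference may be marked as the controller):

```go
// Sketch only: a NodeFeature owned by both the nfd-worker DaemonSet and the
// worker Pod. Kubernetes GC only deletes the object once all listed owners are
// gone, so deleting just the Pod leaves the NodeFeature for nfd-gc to handle,
// while deleting the DaemonSet (which also removes its Pods) cascades as before.
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodeFeatureOwnerRefs builds both owner references; only the DaemonSet is
// marked as the controller, since at most one controller reference is allowed.
func nodeFeatureOwnerRefs(ds *appsv1.DaemonSet, pod *corev1.Pod) []metav1.OwnerReference {
	dsRef := *metav1.NewControllerRef(ds, appsv1.SchemeGroupVersion.WithKind("DaemonSet"))
	podRef := metav1.OwnerReference{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.Name,
		UID:        pod.UID,
	}
	return []metav1.OwnerReference{dsRef, podRef}
}
```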

Makes sense?

@adrianchiris
Contributor Author

> Makes sense?

Yes it does, thanks.

@ArangoGutierrez
Contributor
