Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPA] check OwnerRef against TargetRef to confirm VPA/Pod association #6460

Merged

Conversation

dbenque
Copy link
Contributor

@dbenque dbenque commented Jan 19, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

The function GetControllingVPAForPod tries to match pods with VPA. The problem is that it does it based on label only.

If some pods have label that match the labelSelector that was compute for the VPA the program will consider that the pod is controlled by the VPA. This is not always tru.

For example an orphan pod (no controller) can match the labelSelector but it should not be associated with any VPA! We had
that particular case in our infra. Another theoric case is that 2 deployments having same set of labels (or overlapping set) could result in bad assignment pod<=>vpa.

To fix these kind of issue, in this PR, we propose to validate that the function GetControllingVPAForPod validates that the pod matches in terms of labels but also in terms of ownerRef<=>targetRef.

The first commit is just code move to expose the ControllerFetecher to other packages.
The second commit adds the missing bits to have the ownerRef<=>targetRef validation.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 19, 2024
@k8s-ci-robot k8s-ci-robot added area/vertical-pod-autoscaler size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 19, 2024
@kwiesmueller
Copy link
Member

/assign @sophieliu15

@dbenque
Copy link
Contributor Author

dbenque commented Jan 22, 2024

@mwielgus
As discussed during the SIG Meeting here is a reference to some kubernetes code showing that an orphan pod (not having ownerRef) is not considered as part of the deployment even if the labelSelector matches:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L553-L581

This function also shows that if the pod matches in terms of label, it is not considered as part as the deployment if the ownerReference chain (Pod=>ReplicaSet=>Deployment) is not correct.

@@ -37,7 +40,22 @@ func parseLabelSelector(selector string) labels.Selector {
}

func TestGetMatchingVpa(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a testing case where VPA does not select a pod when its selector matches but the target ref does not match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was missing, I have added a test where selector match but not the ownerRef, here it is:
a5cae3b

@@ -52,7 +70,7 @@ func TestGetMatchingVpa(t *testing.T) {
name: "matching selector",
pod: podBuilder.Get(),
vpas: []*vpa_types.VerticalPodAutoscaler{
vpaBuilder.WithUpdateMode(vpa_types.UpdateModeAuto).WithName("auto-vpa").Get(),
vpaBuilder.WithUpdateMode(vpa_types.UpdateModeAuto).WithName("auto-vpa").WithTargetRef(targetRef).Get(),
Copy link

@sophieliu15 sophieliu15 Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I am wrong. I thought the target ref of vpa is usually "Deployment" kind. E.g., https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#example-vpa-configuration
Are we assigning ReplicationController kind here?

Once the code is submitted, does it still work for vpas using Deployment as target reference?

Copy link
Contributor Author

@dbenque dbenque Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is using a controllerfetcher.FakeControllerFetcher to retrieve the owner reference. The implementation of this mock returns the same reference that it receives (here). Not reflecting the reality of the ownerRef chain is not so important because here we are not testing the code of the embedded controllerfetcher.

So for this test, the only important thing is that the reference set in the pod is the same that the one set in the VPA. We are left with 2 options if we stick to deployments, none reflecting really the reality:
Option 1: vpa.targetRef=> Deployment and pod.ownerRef=> Deployment (which is not possible in reality)
Option 2: pod.ownerRef=> ReplicaSet and pod.ownerRef=> Replicaset (which is not happening in reality)

In fact there is a 3rd option that we can use in the test and the would be more correct compare to reality: we use Statefulset. Here is a commit that does that change:
3c47994

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, the functional code does traverse ownerRefs (pod -> replica-set -> deployment)?
If this test is really not meant to test the owner chain resolution (and if we really have another test making sure that still happens correctly) I'm okay with a surreal test case given it's explained in a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbenque dbenque force-pushed the david.benque/validate-owner-ref branch from 58d117c to a5cae3b Compare January 29, 2024 13:55
@sophieliu15
Copy link

/lgtm

cc: @kwiesmueller for final code review and approval.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2024
Copy link
Member

@kwiesmueller kwiesmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me.
I'd prefer that we clarify with API review if this change is okay. I think we'll need to cut this as a behavior change.

@@ -52,7 +70,7 @@ func TestGetMatchingVpa(t *testing.T) {
name: "matching selector",
pod: podBuilder.Get(),
vpas: []*vpa_types.VerticalPodAutoscaler{
vpaBuilder.WithUpdateMode(vpa_types.UpdateModeAuto).WithName("auto-vpa").Get(),
vpaBuilder.WithUpdateMode(vpa_types.UpdateModeAuto).WithName("auto-vpa").WithTargetRef(targetRef).Get(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, the functional code does traverse ownerRefs (pod -> replica-set -> deployment)?
If this test is really not meant to test the owner chain resolution (and if we really have another test making sure that still happens correctly) I'm okay with a surreal test case given it's explained in a comment.

vertical-pod-autoscaler/pkg/utils/vpa/api.go Show resolved Hide resolved
@kwiesmueller
Copy link
Member

@sophieliu15 can you look into the API question? Feel free to loop me in if help is needed.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2024
@dbenque
Copy link
Contributor Author

dbenque commented Jan 31, 2024

@kwiesmueller , following your comment, I am adding a comment into the test:
4f9f840

@sophieliu15 sophieliu15 removed their assignment Jan 31, 2024
@sophieliu15
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2024
Copy link

@sophieliu15 sophieliu15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me but I don't have approval rights. Please contact other reviewers on the list for review and approval. Thank you!

Copy link
Member

@kwiesmueller kwiesmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dbenque, kwiesmueller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 1, 2024
@k8s-ci-robot k8s-ci-robot merged commit a2f4cac into kubernetes:master Feb 1, 2024
6 checks passed
@dbenque
Copy link
Contributor Author

dbenque commented May 28, 2024

Complete with fix:
#6724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants