MultiKueue skip garbage collection for disconnected clients. #2369

Merged
merged 1 commit into from
Jun 6, 2024

@trasc trasc commented Jun 6, 2024

What type of PR is this?

/kind bug
/kind failing-test
/kind flake

What this PR does / why we need it:

MultiKueue now skips garbage collection for disconnected clients. Otherwise, a panic can occur if the GC runs between the time a remoteClient is created and its initial connection to the worker cluster.
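
The guard described above can be illustrated with a minimal, self-contained sketch. This is not the Kueue implementation: the `workerClient` type, the `connect` and `runGC` methods, and the idea that the panic comes from a nil client are all assumptions made for illustration. Only the `connecting` flag and the constructor name `newRemoteClient` appear in this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// workerClient is a hypothetical stand-in for a worker-cluster client.
type workerClient struct{ name string }

func (w *workerClient) listWorkloads() []string {
	return []string{w.name + "/wl-1"}
}

// remoteClient is a simplified, hypothetical version of the type in this PR.
type remoteClient struct {
	client     *workerClient // assumed to be nil until the first connection succeeds
	connecting atomic.Bool
}

// newRemoteClient marks the client as connecting from the moment it exists.
func newRemoteClient() *remoteClient {
	rc := &remoteClient{}
	rc.connecting.Store(true)
	return rc
}

// connect simulates a successful initial connection (hypothetical helper).
func (rc *remoteClient) connect(name string) {
	rc.client = &workerClient{name: name}
	rc.connecting.Store(false)
}

// runGC returns the number of remote workloads inspected and whether a pass
// ran at all. While the client is still connecting, the pass is skipped, so
// the nil client is never dereferenced.
func (rc *remoteClient) runGC() (int, bool) {
	if rc.connecting.Load() {
		return 0, false
	}
	return len(rc.client.listWorkloads()), true
}

func main() {
	rc := newRemoteClient()

	n, ran := rc.runGC()
	fmt.Println(n, ran) // 0 false

	rc.connect("worker1")
	n, ran = rc.runGC()
	fmt.Println(n, ran) // 1 true
}
```

In this sketch, removing the early return in `runGC` would make the first call dereference a nil `client` and panic, which is one plausible shape for the failure the description mentions.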

Which issue(s) this PR fixes:

Fixes #2365

Special notes for your reviewer:

Does this PR introduce a user-facing change?

MultiKueue: Skip garbage collection for disconnected clients which could occasionally result in panic.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 6, 2024

netlify bot commented Jun 6, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 5117780
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66618883134ba80008a36d96

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 6, 2024
@trasc
Contributor Author

trasc commented Jun 6, 2024

/test pull-kueue-test-integration-main

@trasc trasc force-pushed the multikueue-int-test-flaky branch from ba5dbee to cbad495 Compare June 6, 2024 08:41
@trasc
Contributor Author

trasc commented Jun 6, 2024

/test pull-kueue-test-integration-main

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 6, 2024
@trasc
Contributor Author

trasc commented Jun 6, 2024

/test pull-kueue-test-integration-main

@trasc trasc force-pushed the multikueue-int-test-flaky branch from cbad495 to 1e4ef0c Compare June 6, 2024 09:30
@trasc trasc changed the title MultiKueue flaky integration test MultiKueue skip garbage collection for disconnected clients. Jun 6, 2024
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 6, 2024
@trasc trasc marked this pull request as ready for review June 6, 2024 09:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@trasc trasc force-pushed the multikueue-int-test-flaky branch from 1e4ef0c to 5117780 Compare June 6, 2024 09:59
@trasc
Contributor Author

trasc commented Jun 6, 2024

/cc @mimowo

@k8s-ci-robot k8s-ci-robot requested a review from mimowo June 6, 2024 09:59
Contributor

@mimowo mimowo left a comment

LGTM overall, one question.

Also, it seems that this issue can affect users. If so, please add a relevant release note and we could cherry-pick.

@@ -98,6 +98,7 @@ func newRemoteClient(localClient client.Client, wlUpdateCh, watchEndedCh chan<-
localClient: localClient,
origin: origin,
}
rc.connecting.Store(true)
Contributor

Is this needed for the skip itself or it fixes another scenario?

Contributor Author

@trasc trasc Jun 6, 2024

It is needed; it's one of the two key parts of the fix. The flaky panic would come up when the GC ran during the initial connect attempt.
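
One plausible reading of why the constructor line matters, assuming `connecting` is a `sync/atomic` `Bool`: its zero value is `false`, so a freshly created client would not look like it is connecting unless the constructor sets the flag. The sketch below uses hypothetical names (`client`, `newWithoutFlag`, `newWithFlag`, `gcWouldSkip`) and is not taken from the Kueue code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// client is a hypothetical stand-in; only the flag matters here.
type client struct {
	connecting atomic.Bool
}

// newWithoutFlag leaves the flag at its zero value, which is false.
func newWithoutFlag() *client {
	return &client{}
}

// newWithFlag marks the client as connecting as soon as it is created.
func newWithFlag() *client {
	c := &client{}
	c.connecting.Store(true)
	return c
}

// gcWouldSkip mirrors a guard of the form "skip the GC while connecting".
func gcWouldSkip(c *client) bool {
	return c.connecting.Load()
}

func main() {
	fmt.Println(gcWouldSkip(newWithoutFlag())) // false
	fmt.Println(gcWouldSkip(newWithFlag()))    // true
}
```

Under that assumption, a skip check alone would not cover the window before the first connection attempt starts, which is consistent with the author calling the constructor change one of the two key parts.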

Contributor

ack

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 6, 2024
@trasc
Contributor Author

trasc commented Jun 6, 2024

> LGTM overall, one question.
>
> Also, it seems that this issue can affect users. If so, please add a relevant release note and we could cherry-pick.

done

@mimowo
Contributor

mimowo commented Jun 6, 2024

/assign
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2024
@mimowo
Contributor

mimowo commented Jun 6, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c99d3e4db8be53d05cc64479cb301b4e3475b471

@mimowo
Contributor

mimowo commented Jun 6, 2024

/release-note-edit

[MultiKueue] Skip garbage collection for disconnected clients which could occasionally result in panic.

To make it more relatable to users.

@k8s-ci-robot k8s-ci-robot merged commit 704af2f into kubernetes-sigs:main Jun 6, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.8 milestone Jun 6, 2024
@mimowo
Contributor

mimowo commented Jun 6, 2024

/cherry-pick release-0.7

@k8s-infra-cherrypick-robot

@mimowo: new pull request created: #2370

In response to this:

/cherry-pick release-0.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@trasc trasc deleted the multikueue-int-test-flaky branch June 6, 2024 11:30
Fiona-Waters pushed a commit to Fiona-Waters/kueue that referenced this pull request Jun 25, 2024
@alculquicondor
Contributor

/release-note-edit

MultiKueue: Skip garbage collection for disconnected clients which could occasionally result in panic.

@alculquicondor
Contributor

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 18, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[MultiKueue] integration tests fail occasionally
5 participants