feat: add e2e test for ttl seconds after finished in jobset #511

Merged


dejanzele
Contributor

CHANGELOG

  • add e2e test for ttl seconds after finished in jobset

This PR is an improvement for #279

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 13, 2024
@k8s-ci-robot
Contributor

Hi @dejanzele. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 13, 2024

netlify bot commented Apr 13, 2024

Deploy Preview for kubernetes-sigs-jobset canceled.

🔨 Latest commit: 4988703
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/661e7b2ad66a7800084b94ee

@dejanzele
Contributor Author

/cc @ahg-g

@k8s-ci-robot k8s-ci-robot requested a review from ahg-g April 13, 2024 22:57
@kannon92
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 14, 2024

// Create JobSet.
ginkgo.By("creating jobset with ttl seconds after finished")
js := pingTestJobSetSubdomain(ns).TTLSecondsAfterFinished(5).Obj()
Contributor

Let's not have the TTL test also be doing an unrelated networking test, this can cause issues/flakiness unrelated to the TTL feature being tested here. We should isolate the feature being tested to reduce flakiness and make debugging test failures easier.

Contributor

To avoid flakiness, we need to create the jobset with a finalizer to ensure that the jobset doesn't get deleted before we check that it exists and is complete; and then before we do the JobSetDeleted check, we remove the finalizer.
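This suggestion relies on standard Kubernetes finalizer semantics. Deleting an object that still carries finalizers only sets its deletion timestamp, and the API server removes the object once the finalizer list is empty. Below is a toy, stdlib-only Go model of that behavior. The `store` and `object` types and the finalizer name are hypothetical and are not the controller-runtime or JobSet API.

```go
package main

import (
	"fmt"
	"time"
)

// object is a hypothetical, heavily simplified stand-in for a Kubernetes object.
type object struct {
	finalizers        []string
	deletionTimestamp *time.Time
}

// store is a hypothetical toy API server holding objects by name.
type store struct {
	objects map[string]*object
}

// Delete only marks the object while finalizers remain; otherwise it is removed.
func (s *store) Delete(name string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	if len(obj.finalizers) > 0 {
		now := time.Now()
		obj.deletionTimestamp = &now
		return
	}
	delete(s.objects, name)
}

// RemoveFinalizer drops one finalizer; a marked object with no finalizers left is removed.
func (s *store) RemoveFinalizer(name, finalizer string) {
	obj, ok := s.objects[name]
	if !ok {
		return
	}
	kept := obj.finalizers[:0]
	for _, f := range obj.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.finalizers = kept
	if obj.deletionTimestamp != nil && len(obj.finalizers) == 0 {
		delete(s.objects, name)
	}
}

// Exists reports whether the object is still visible to readers.
func (s *store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

func main() {
	s := &store{objects: map[string]*object{
		"js": {finalizers: []string{"example.com/test-finalizer"}},
	}}
	s.Delete("js")              // e.g. TTL cleanup fires before the test's assertions run
	fmt.Println(s.Exists("js")) // true: the test can still check completion
	s.RemoveFinalizer("js", "example.com/test-finalizer")
	fmt.Println(s.Exists("js")) // false: now the deletion check can run
}
```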

Contributor

@danielvegamyhre danielvegamyhre left a comment

Thanks for working on this, left one comment.

gomega.Expect(k8sClient.Create(ctx, js)).Should(gomega.Succeed())

// We'll need to retry getting this newly created jobset, given that creation may not immediately happen.
gomega.Eventually(k8sClient.Get(ctx, types.NamespacedName{Name: js.Name, Namespace: js.Namespace}, &jobset.JobSet{}), timeout, interval).Should(gomega.Succeed())
Contributor

if create succeeded above, we shouldn't need to use eventually here

Comment on lines 123 to 124
ginkgo.By("checking all jobs were created successfully")
gomega.Eventually(util.NumJobs, timeout, interval).WithArguments(ctx, k8sClient, js).Should(gomega.Equal(util.NumExpectedJobs(js)))
Contributor

Please remove this check; we may hit a race where the jobset is completed and marked as deleted before we get here.


@dejanzele dejanzele force-pushed the feat/jobset-ttl-e2e-test branch 2 times, most recently from 1122e1d to ea522c4 Compare April 16, 2024 00:10
@dejanzele
Contributor Author

@danielvegamyhre @ahg-g can you take another look please?

Comment on lines 204 to 210
ginkgo.By("removing jobset finalizers")
for i, f := range js.Finalizers {
if f == finalizer {
js.Finalizers = append(js.Finalizers[:i], js.Finalizers[i+1:]...)
}
}
gomega.Expect(k8sClient.Update(ctx, js)).Should(gomega.Succeed())
Contributor

wrap this in Eventually to handle potential update conflicts


// Check jobset is cleaned up after ttl seconds.
ginkgo.By("checking jobset is cleaned up after ttl seconds")
util.JobSetDeleted(ctx, k8sClient, &fresh, timeout)
Contributor

use js instead of fresh here


// We get the latest version of the jobset before removing the finalizer
var fresh jobset.JobSet
gomega.Expect(k8sClient.Get(ctx, types.NamespacedName{Name: js.Name, Namespace: js.Namespace}, &fresh)).Should(gomega.Succeed())
Contributor

move this get to inside removeJobSetFinalizer within Eventually (see comment there)

@dejanzele dejanzele force-pushed the feat/jobset-ttl-e2e-test branch from ea522c4 to 4988703 Compare April 16, 2024 13:20
@dejanzele
Contributor Author

@ahg-g thanks for the review, I have addressed your comments

@ahg-g
Contributor

ahg-g commented Apr 16, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 16, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, dejanzele

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 16, 2024
@ahg-g
Contributor

ahg-g commented Apr 16, 2024

Thanks! One thing I usually do is force a bug into the test to check that it fails, just to verify that it works as expected. Did you do that?

@k8s-ci-robot k8s-ci-robot merged commit 8342871 into kubernetes-sigs:main Apr 16, 2024
12 checks passed
@dejanzele
Contributor Author

dejanzele commented Apr 16, 2024

@ahg-g What kind of bug? For example, the test fails with a conflict if you don't get the latest version of the jobset before updating it, and it also fails if you delete the jobset manually in the middle of the test run. I tried both, and each one forced a failure.

@danielvegamyhre danielvegamyhre mentioned this pull request Aug 19, 2024
20 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
5 participants