Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if workerGroupSpec is not empty before accessing it #549

Conversation

ChristianZaccaria
Copy link
Contributor

Issue link

Jira: https://issues.redhat.com/browse/RHOAIENG-6614

What changes have been made

  • The failing Ray Tests were caused by an index out of range error in the webhook logic. The workerGroupSpec was attempted to be accessed when it doesn't exist.
  • Check is now made to verify it's not empty before accessing it.

Verification steps

  1. Have all components deployed including this version of the CFO and Kueue.
  2. Deploy KubeRay-Operator in the same namespace as referenced in the DSCI default-dsci
  3. In the KubeRay repo, run the previously failing test:
    go test -timeout 30m -v ./test/e2e/rayjob_lightweight_test.go ./test/e2e/support.go
  4. Test should pass.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@sutaakar
Copy link
Contributor

sutaakar commented May 9, 2024

@ChristianZaccaria Can you please provide a unit test verifying the changes?

@ChristianZaccaria
Copy link
Contributor Author

@sutaakar I added unit tests covering these changes. Please let me know if there are other test cases I should be covering here.

Also, I am thinking of adding a RayCluster template in the test support file and perhaps append items to it in the tests to reduce the amount of code in the test file. Is this a good approach or what is the best practices on this? Thank you! :)

@sutaakar
Copy link
Contributor

@ChristianZaccaria Creating one valid RayCluster CR and then modify it is a possible approach, may increase test readability.

Comment on lines 369 to 574
warnings, err := rcWebhook.ValidateCreate(test.Ctx(), runtime.Object(validRayCluster))
t.Run("Expected no warnings or errors on call to ValidateCreate function", func(t *testing.T) {
g := gomega.NewWithT(t)
g.Expect(warnings).To(gomega.BeNil())
g.Expect(err).To(gomega.BeNil())
})
Copy link
Contributor Author

@ChristianZaccaria ChristianZaccaria May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sutaakar I only then realised that this basically covers all test cases as it will fail if the elements to be validated are not in the defined RayCluster in the test. Perhaps the tests after this one are not needed?

Also, If I do the same for the Default function, would all the other tests/assertions not be required too?

Open to thoughts on this before continuing, thanks!

Copy link
Contributor Author

@ChristianZaccaria ChristianZaccaria May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking perhaps the assertions made for the Default (mutating) function is required to compare outputs.

For the validating functions such as ValidateCreate, as long as no errors are returned (meaning all elements exist in the defined RayCluster), the validation should succeed. - Making any other assertion tests not necessary. Correct me if I'm wrong

Copy link
Contributor

@sutaakar sutaakar May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ValidateCreate and ValidateUpdate it is enough to have one positive test, providing negative tests to validate incorrect modification and its detection.

Copy link
Contributor

@sutaakar sutaakar May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or possibly to have several tests, providing valid RayClusters, i.e. more elements in workerGroupSpecs and such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully, latest changes in this PR reflects the above. - One positive test case, providing negative tests for manipulated fields. Thanks!

@ChristianZaccaria ChristianZaccaria force-pushed the webhook-index-range branch 2 times, most recently from b80e2bd to ecafab2 Compare May 10, 2024 17:35
Comment on lines 96 to 110
t.Run("Expected no errors on call to Default function", func(t *testing.T) {
g := gomega.NewWithT(t)
g.Expect(err).To(gomega.BeNil())
})

t.Run("Expected required OAuth proxy container for the head group", func(t *testing.T) {
g := gomega.NewWithT(t)
g.Expect(func() bool {
for _, container := range validRayCluster.Spec.HeadGroupSpec.Template.Spec.Containers {
if container.Name == oauthProxyContainerName {
return true
}
}
return false
}()).To(gomega.BeTrue(), "Expected the OAuth proxy container to be present in the list of head group containers")
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boilerplate isn't needed, you can leverage test directly as it embeds gomega:

	test.Expect(err).NotTo(HaveOccurred(), "Expected no errors on call to Default function")
	test.Expect(validRayCluster.Spec.HeadGroupSpec.Template.Spec.Containers).
		To(ContainElement(WithTransform(ContainerName, Equal(oauthProxyContainerName))), "Expected the OAuth proxy container to be present in the list of head group containers")

With helper functions in the bottom (to be relocated into codeflare-common):

func ContainerName(container corev1.Container) string {
	return container.Name
}

Copy link
Contributor

@sutaakar sutaakar May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this approach avoids BeTrue() checks for element with specific value, which often don't provide meaningful context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test can also check number of elements, making the assertion more structured:

	test.Expect(validRayCluster.Spec.HeadGroupSpec.Template.Spec.Containers).
		To(
			And(
				HaveLen(1),
				ContainElement(WithTransform(ContainerName, Equal(oauthProxyContainerName))),
			),
			"Expected the OAuth proxy container to be present in the list of head group containers",
		)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was very enlightening, thanks Karel for sharing such great expertise!

@ChristianZaccaria
Copy link
Contributor Author

@sutaakar latest changes include one positive test for each "main" function, and several negative tests to cover the full functionality of the raycluster_webhook.go file.

invalidRayCluster := validRayCluster.DeepCopy()

t.Run("Negative: Expected errors on call to ValidateUpdate function due to EnableIngress set to True", func(t *testing.T) {
invalidRayCluster.Spec.HeadGroupSpec.EnableIngress = &trueBool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, there is a dedicate funtion to create pointer for values:

Suggested change
invalidRayCluster.Spec.HeadGroupSpec.EnableIngress = &trueBool
invalidRayCluster.Spec.HeadGroupSpec.EnableIngress = support.Ptr(true)

@sutaakar
Copy link
Contributor

@ChristianZaccaria Can you also adjust test-unit Makefile target to run the new unit test (currently it doesn't run anything). That way the unit test will be executed for PRs.

Otherwise this PR looks great, good job.

@ChristianZaccaria
Copy link
Contributor Author

@sutaakar I made the last few changes, tests are now running as part of this PR check in the Unit tests workflow. Thank you!

@ChristianZaccaria
Copy link
Contributor Author

I made a PR to move the core functions used here to the codeflare-common repository. Should we merge that and then make the changes here or what is the appropriate approach? Thanks @sutaakar

@sutaakar
Copy link
Contributor

/lgtm

I made a project-codeflare/codeflare-common#48 used here to the codeflare-common repository. Should we merge that and then make the changes here or what is the appropriate approach?

That is up to you, can be merged now and refactored later or it can wait.

@sutaakar
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 28, 2024
@sutaakar
Copy link
Contributor

/approve

Copy link

openshift-ci bot commented May 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sutaakar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 4ba15ae into project-codeflare:main May 29, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants