KEP: 2170: Adding cel validations on TrainingRuntime/ClusterTrainingRuntime CRDs #2313
51408e5 to 0e9654d
Fixes: #2219
0e9654d to 1c323fb
Signed-off-by: Akshay Chitneni <achitneni@apple.com>
1c323fb to d023258
Thanks for this @akshaychitneni !
I left a few comments.
@@ -173,6 +176,8 @@ type TorchMLPolicySource struct {
// Supported values: `auto`, `cpu`, `gpu`, or int value.
// TODO (andreyvelich): Add kubebuilder validation.
// Defaults to `auto`.
// +kubebuilder:default="auto"
// +kubebuilder:validation:XValidation:rule="self in ['auto', 'cpu', 'gpu'] || type(self) == int", message="NumProcPerNode must be auto,cpu,gpu strings or int value"
What do you think about this message, which is similar to the torch distributed message at https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py#L438?
Suggested change:
- // +kubebuilder:validation:XValidation:rule="self in ['auto', 'cpu', 'gpu'] || type(self) == int", message="NumProcPerNode must be auto,cpu,gpu strings or int value"
+ // +kubebuilder:validation:XValidation:rule="self in ['auto', 'cpu', 'gpu'] || type(self) == int", message="NumProcPerNode must be equal to auto, cpu, gpu, or int value"
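For reference, the semantics that the field comment describes ("`auto`, `cpu`, `gpu`, or int value") can be sketched in plain Go. This is only an illustration, not the CEL rule or any code from this PR: `validNumProcPerNode` is a hypothetical helper, and it assumes the value arrives as a string, so an integer is recognized by parsing rather than by a type check. The sign of the integer is not checked here.

```go
package main

import (
	"fmt"
	"strconv"
)

// validNumProcPerNode is a hypothetical helper (not part of the PR).
// It reports whether v is one of the keywords auto, cpu, gpu,
// or a string that parses as a base-10 integer.
func validNumProcPerNode(v string) bool {
	switch v {
	case "auto", "cpu", "gpu":
		return true
	}
	// Not a keyword: accept only values that parse as an integer.
	_, err := strconv.Atoi(v)
	return err == nil
}

func main() {
	for _, v := range []string{"auto", "cpu", "gpu", "4", "tpu", ""} {
		fmt.Printf("%q valid=%t\n", v, validNumProcPerNode(v))
	}
}
```

If the schema types the field as a string, a `type(self) == int` branch in the CEL rule would presumably never match, which is why the sketch parses the string instead.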
@@ -173,6 +176,8 @@ type TorchMLPolicySource struct {
// Supported values: `auto`, `cpu`, `gpu`, or int value.
// TODO (andreyvelich): Add kubebuilder validation.
// Defaults to `auto`.
// +kubebuilder:default="auto"
// +kubebuilder:validation:XValidation:rule="self in ['auto', 'cpu', 'gpu'] || type(self) == int", message="NumProcPerNode must be auto,cpu,gpu strings or int value"
NumProcPerNode *string `json:"numProcPerNode,omitempty"`
@tenzen-y @akshaychitneni Should we use the intstr type for NumProcPerNode here?
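To make the question concrete, here is a rough sketch of how an int-or-string field behaves when decoding JSON. It deliberately avoids the real dependency: `intOrString` is a hypothetical, simplified stand-in for `intstr.IntOrString` from `k8s.io/apimachinery/pkg/util/intstr`, and `torchPolicy` and `parse` are illustrative names that do not appear in the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intOrString is a hypothetical, simplified stand-in for
// k8s.io/apimachinery/pkg/util/intstr.IntOrString.
type intOrString struct {
	IsInt  bool
	IntVal int32
	StrVal string
}

// UnmarshalJSON accepts either a JSON string or a JSON integer.
func (v *intOrString) UnmarshalJSON(data []byte) error {
	if len(data) > 0 && data[0] == '"' {
		v.IsInt = false
		return json.Unmarshal(data, &v.StrVal)
	}
	v.IsInt = true
	return json.Unmarshal(data, &v.IntVal)
}

// torchPolicy is an illustrative struct, not the PR's TorchMLPolicySource.
type torchPolicy struct {
	NumProcPerNode *intOrString `json:"numProcPerNode,omitempty"`
}

// parse decodes a JSON document into a torchPolicy.
func parse(raw string) (*torchPolicy, error) {
	var p torchPolicy
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	for _, raw := range []string{`{"numProcPerNode": 4}`, `{"numProcPerNode": "auto"}`} {
		p, err := parse(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %+v\n", raw, *p.NumProcPerNode)
	}
}
```

With a type like this, users could write either `numProcPerNode: 4` or `numProcPerNode: auto` in a manifest, and the validation rule could distinguish the two cases by type.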
@@ -209,13 +214,15 @@ type MPIMLPolicySource struct {

// Implementation name for the MPI to create the appropriate hostfile.
// Defaults to OpenMPI.
// +kubebuilder:default="OpenMPI"
MPIImplementation *MPIImplementation `json:"mpiImplementation,omitempty"`

// Directory where SSH keys are mounted.
SSHAuthMountPath *string `json:"SSHAuthMountPath,omitempty"`
@tenzen-y @alculquicondor Do we have a default directory for SSH?
I can see one here: https://github.com/kubeflow/mpi-operator/blob/master/pkg/apis/kubeflow/v2beta1/types.go#L190-L191
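If a default is wanted, the combined defaulting behavior could look roughly like the sketch below. Everything here is illustrative: `mpiPolicy` and `applyDefaults` are hypothetical names, `OpenMPI` comes from the default marker in the diff above, and `/root/.ssh` is an assumption about what the linked mpi-operator code uses, so it should be verified before being adopted.

```go
package main

import "fmt"

// mpiPolicy is a hypothetical, trimmed-down stand-in for MPIMLPolicySource.
type mpiPolicy struct {
	MPIImplementation *string
	SSHAuthMountPath  *string
}

const (
	// Taken from the +kubebuilder:default marker in the diff.
	defaultMPIImplementation = "OpenMPI"
	// Assumption: believed to match the linked mpi-operator default; verify.
	assumedSSHAuthMountPath = "/root/.ssh"
)

// applyDefaults fills in unset fields and leaves user-provided values alone.
func applyDefaults(p *mpiPolicy) {
	if p.MPIImplementation == nil {
		v := defaultMPIImplementation
		p.MPIImplementation = &v
	}
	if p.SSHAuthMountPath == nil {
		v := assumedSSHAuthMountPath
		p.SSHAuthMountPath = &v
	}
}

func main() {
	p := &mpiPolicy{}
	applyDefaults(p)
	fmt.Println(*p.MPIImplementation, *p.SSHAuthMountPath)
}
```

In the CRD itself the same effect would come from a `+kubebuilder:default` marker on the field, as is already done for `MPIImplementation`.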
gomega.Expect(k8sClient.Create(ctx, created)).Should(gomega.Succeed())
gomega.Expect(created).Should(gomega.BeComparableTo(wantTrainingRuntime(), util.IgnoreObjectMetadata))
},
ginkgo.Entry("Should succeed to default TorchMLPolicySource.NumProcPerNode=auto",
Could you please add a test case for the MPI default values as well?
What this PR does / why we need it:
Add CEL validations on the runtime CRDs for the v2 APIs.
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):
Fixes #
Checklist: