V1: Introduce the CustomResourceValidationExpressions feature (CEL validation) #1708
/assign |
Through my investigation, I found that we cannot replace all validations with CEL validation because some rules exceed the CEL cost budget. For example, consider replacing the validation below (training-operator/pkg/apis/kubeflow.org/v1/pytorch_validation.go, Lines 69 to 72 in 4dd0d09)
with the following CEL validation:
// ReplicaSpec is a description of the replica
type ReplicaSpec struct {
...
// Template is the object that describes the pod that
// will be created for this replica. RestartPolicy in PodTemplateSpec
// will be overridden by RestartPolicy in ReplicaSpec
// +kubebuilder:validation:XValidation:rule="has(self.spec.containers) && self.spec.containers.all(c, has(c.image) && size(c.image) > 0)",message=""
Template v1.PodTemplateSpec `json:"template,omitempty"`
...
So, I think that we need to introduce the webhook validation instead of replacing the current validation with CEL validation. @kubeflow/wg-training-leads WDYT? |
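For comparison, the CEL rule quoted above corresponds roughly to the Go check below, which is the kind of logic a validating webhook could run instead. This is only a sketch: `Container`, `PodSpec`, and `validateContainerImages` are hypothetical, simplified stand-ins and not the actual training-operator or `k8s.io/api/core/v1` definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Container and PodSpec are hypothetical, reduced stand-ins for the
// corresponding core/v1 types; only the fields the rule reads are kept.
type Container struct {
	Name  string
	Image string
}

type PodSpec struct {
	Containers []Container
}

// validateContainerImages roughly mirrors the CEL rule
//   has(self.spec.containers) &&
//   self.spec.containers.all(c, has(c.image) && size(c.image) > 0)
// by requiring the containers list to be set and every image to be non-empty.
func validateContainerImages(spec PodSpec) error {
	if spec.Containers == nil {
		return errors.New("spec.containers must be set")
	}
	for i, c := range spec.Containers {
		if c.Image == "" {
			return fmt.Errorf("spec.containers[%d] (%q): image must be set", i, c.Name)
		}
	}
	return nil
}

func main() {
	ok := PodSpec{Containers: []Container{{Name: "pytorch", Image: "example.com/train:latest"}}}
	bad := PodSpec{Containers: []Container{{Name: "pytorch"}}}
	fmt.Println("valid spec:", validateContainerImages(ok))
	fmt.Println("invalid spec:", validateContainerImages(bad))
}
```

Unlike a CEL rule, a check like this is not subject to the CEL cost budget, but it needs a webhook server (and its certificates) to run at admission time.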
I remember we had a discussion regarding webhooks. The trade-off was ease of deployment (without webhooks) vs. clean validation (with webhooks). |
@johnugeorge Are they concerned about certs for the webhook? |
Yes. That was the major roadblock, from what I remember. |
@johnugeorge I think we can remove webhook installation barriers once we introduce cert generation logic similar to katib. WDYT? Actually, I have experience implementing internal cert generation logic in 2 components (kubeflow/katib, kubernetes-sigs/kueue). |
Maybe we can generalize the Katib cert generator and then just import it into the training-operator. cc: @andreyvelich |
@tenzen-y Shall we move this to the next release? |
Does that mean training-operator v1.7 or v1.8? |
We are cutting the first 1.7 RC in a few days. I am afraid we cannot complete testing in time if we aim to land this feature in 1.7. What are your thoughts on moving it to 1.8? |
It makes sense. We can work on this feature for v1.8. Actually, I don't have enough bandwidth for this feature. |
Also, we can create another issue for the webhook, since this issue is aimed at CEL validation. |
Thank you for creating this @tenzen-y. |
Anyway, I will create an issue to discuss the webhook, since CEL validation isn't enough due to exceeding the cost budget. |
/remove-area 1.7.0 |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
Moved to 1.9 release |
I agreed with this decision in the offline meeting with @johnugeorge. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/remove-lifecycle stale |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
/kind feature
Since k8s v1.25, the CustomResourceValidationExpressions feature to validate CRDs using Common Expression Language (CEL) without webhook servers is enabled by default.
Since this feature helps surface CRD validation errors to users, we should introduce it once we stop supporting K8s v1.24.
For example, if the container name of replicaSpec is invalid, for now, we only output that error to controller logs; once we introduce CEL validation, we can return that error to end-users.
training-operator/pkg/apis/kubeflow.org/v1/tensorflow_validation.go
Lines 69 to 74 in 69813fb
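The kind of check described above can be sketched in Go as follows. This is an illustration only, not the code at the referenced lines: the `Container` type, the `validateContainerName` helper, and the expected name `"tensorflow"` are assumptions. With CEL, a rule using the `exists` macro (for example `self.spec.containers.exists(c, c.name == 'tensorflow')`, again an assumed rule) could reject such a resource at admission time, so the user would see the error instead of it only appearing in controller logs.

```go
package main

import "fmt"

// Container is a hypothetical, reduced stand-in for the core/v1 Container type.
type Container struct {
	Name  string
	Image string
}

// validateContainerName (hypothetical helper) returns an error unless at
// least one container carries the expected name.
func validateContainerName(containers []Container, want string) error {
	for _, c := range containers {
		if c.Name == want {
			return nil
		}
	}
	return fmt.Errorf("no container named %q found in replica spec", want)
}

func main() {
	containers := []Container{{Name: "main", Image: "example.com/tf:latest"}}
	if err := validateContainerName(containers, "tensorflow"); err != nil {
		// Without admission-time validation, an error like this is only
		// visible in the controller logs, not to the user who applied the job.
		fmt.Println("validation failed:", err)
	}
}
```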