Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-2170: Implement skeleton webhook servers #2251

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,9 @@ manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and Cust
output:crd:artifacts:config=manifests/base/crds \
output:rbac:artifacts:config=manifests/base/rbac \
output:webhook:artifacts:config=manifests/base/webhook
$(CONTROLLER_GEN) "crd:generateEmbeddedObjectMeta=true" paths="./pkg/apis/kubeflow.org/v2alpha1/..." \
output:crd:artifacts:config=manifests/v2/base/crds
$(CONTROLLER_GEN) "crd:generateEmbeddedObjectMeta=true" "webhook" paths="./pkg/apis/kubeflow.org/v2alpha1/...;./pkg/webhook.v2/..." \
output:crd:artifacts:config=manifests/v2/base/crds \
output:webhook:artifacts:config=manifests/v2/base/webhook

generate: controller-gen ## Generate apidoc, sdk and code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
$(CONTROLLER_GEN) object:headerFile="hack/boilerplate/boilerplate.go.txt" paths="./pkg/apis/..."
Expand Down
2 changes: 2 additions & 0 deletions manifests/v2/base/webhook/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
resources:
- manifests.yaml
66 changes: 66 additions & 0 deletions manifests/v2/base/webhook/manifests.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
name: validating-webhook-configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it similar to Katib webhooks ?

Suggested change
name: validating-webhook-configuration
name: training.kubeflow.org

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is auto-generated name by controller-gen.
So, we can not change this name.

Copy link
Member

@andreyvelich andreyvelich Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that we discussed before that we can rename it via Kustomize patch: https://github.com/kubeflow/training-operator/blob/master/manifests/base/webhook/patch.yaml.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. But we do not add any manifests in this PR because I added only manifests generated by controller-gen so that we can start up the webhook servers during envtest.

In other words, these manifests are at least manifests to launch webhook servers during integration testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think #2208 has the responsibility to add manifests not generated by controller-tools. Any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. @tenzen-y Please can you add the comment in #2208 that we should add the Kustomize patch to update those names ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will describe the detailed task in #2208.

webhooks:
- admissionReviewVersions:
- v1
clientConfig:
service:
name: webhook-service
namespace: system
path: /validate-kubeflow-org-v2alpha1-clustertrainingruntime
failurePolicy: Fail
name: validator.clustertrainingruntime.kubeflow.org
rules:
- apiGroups:
- kubeflow.org
apiVersions:
- v2alpha1
operations:
- CREATE
- UPDATE
resources:
- clustertrainingruntimes
sideEffects: None
- admissionReviewVersions:
- v1
clientConfig:
service:
name: webhook-service
namespace: system
path: /validate-kubeflow-org-v2alpha1-trainingruntime
failurePolicy: Fail
name: validator.trainingruntime.kubeflow.org
rules:
- apiGroups:
- kubeflow.org
apiVersions:
- v2alpha1
operations:
- CREATE
- UPDATE
resources:
- trainingruntimes
sideEffects: None
- admissionReviewVersions:
- v1
clientConfig:
service:
name: webhook-service
namespace: system
path: /validate-kubeflow-org-v2alpha1-trainjob
failurePolicy: Fail
name: validator.trainjob.kubeflow.org
rules:
- apiGroups:
- kubeflow.org
apiVersions:
- v2alpha1
operations:
- CREATE
- UPDATE
resources:
- trainjobs
sideEffects: None
53 changes: 53 additions & 0 deletions pkg/webhook.v2/clustertrainingruntime_webhook.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
/*
Copyright 2024 The Kubeflow Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package webhookv2

import (
"context"

"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

kubeflowv2 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v2alpha1"
)

type ClusterTrainingRuntimeWebhook struct{}

func setupWebhookForClusterTrainingRuntime(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
For(&kubeflowv2.ClusterTrainingRuntime{}).
WithValidator(&ClusterTrainingRuntimeWebhook{}).
Complete()
}

// +kubebuilder:webhook:path=/validate-kubeflow-org-v2alpha1-clustertrainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=clustertrainingruntimes,verbs=create;update,versions=v2alpha1,name=validator.clustertrainingruntime.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*ClusterTrainingRuntimeWebhook)(nil)

func (w *ClusterTrainingRuntimeWebhook) ValidateCreate(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *ClusterTrainingRuntimeWebhook) ValidateUpdate(context.Context, runtime.Object, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *ClusterTrainingRuntimeWebhook) ValidateDelete(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}
11 changes: 10 additions & 1 deletion pkg/webhook.v2/webhook.go → pkg/webhook.v2/setup.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,15 @@ package webhookv2

import ctrl "sigs.k8s.io/controller-runtime"

func Setup(ctrl.Manager) (string, error) {
func Setup(mgr ctrl.Manager) (string, error) {
if err := setupWebhookForClusterTrainingRuntime(mgr); err != nil {
return "ClusterTrainingRuntime", err
}
if err := setupWebhookForTrainingRuntime(mgr); err != nil {
return "TrainingRuntime", err
}
if err := setupWebhookForTrainJob(mgr); err != nil {
return "TrainJob", err
}
Comment on lines +23 to +30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenzen-y Do you return these names only to handle this error: https://github.com/kubeflow/training-operator/blob/master/cmd/training-operator.v2alpha1/main.go#L150C5-L152 ?
Do you think it is necessary ?
If some of the webhook will fail, we will get this info since we traceback error from the underlying file: ClusterTrainingRuntime, TrainingRuntime, or TrainJob.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe that this name is helpful because this is not for validation error, this is for set up error.
This error can be seen when the startup webhook server failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, but without this name, can we see the file traceback of this error during the webhook setup ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that the traceback will show the path where the error occurs. However, I still believe this helps understand the components in which failure occurs since this component name (ex. TrainJob) can be shown as a top-level log, and this top-level log allows us to avoid digging into nested traceback logs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good!

return "", nil
}
53 changes: 53 additions & 0 deletions pkg/webhook.v2/trainingruntime_webhook.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
/*
Copyright 2024 The Kubeflow Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package webhookv2

import (
"context"

"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

kubeflowv2 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v2alpha1"
)

type TrainingRuntimeWebhook struct{}

func setupWebhookForTrainingRuntime(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
For(&kubeflowv2.TrainingRuntime{}).
WithValidator(&TrainingRuntimeWebhook{}).
Complete()
}

// +kubebuilder:webhook:path=/validate-kubeflow-org-v2alpha1-trainingruntime,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=trainingruntimes,verbs=create;update,versions=v2alpha1,name=validator.trainingruntime.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*TrainingRuntimeWebhook)(nil)

func (w *TrainingRuntimeWebhook) ValidateCreate(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *TrainingRuntimeWebhook) ValidateUpdate(context.Context, runtime.Object, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *TrainingRuntimeWebhook) ValidateDelete(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}
andreyvelich marked this conversation as resolved.
Show resolved Hide resolved
36 changes: 36 additions & 0 deletions pkg/webhook.v2/trainjob_webhook.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,39 @@ limitations under the License.
*/

package webhookv2

import (
"context"

"k8s.io/apimachinery/pkg/runtime"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/webhook"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

kubeflowv2 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v2alpha1"
)

type TrainJobWebhook struct{}

func setupWebhookForTrainJob(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr).
For(&kubeflowv2.TrainJob{}).
WithValidator(&TrainJobWebhook{}).
Complete()
}

// +kubebuilder:webhook:path=/validate-kubeflow-org-v2alpha1-trainjob,mutating=false,failurePolicy=fail,sideEffects=None,groups=kubeflow.org,resources=trainjobs,verbs=create;update,versions=v2alpha1,name=validator.trainjob.kubeflow.org,admissionReviewVersions=v1

var _ webhook.CustomValidator = (*TrainJobWebhook)(nil)

func (w *TrainJobWebhook) ValidateCreate(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *TrainJobWebhook) ValidateUpdate(context.Context, runtime.Object, runtime.Object) (admission.Warnings, error) {
return nil, nil
}

func (w *TrainJobWebhook) ValidateDelete(context.Context, runtime.Object) (admission.Warnings, error) {
return nil, nil
}
28 changes: 27 additions & 1 deletion test/integration/framework/framework.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,11 @@ package framework

import (
"context"
"crypto/tls"
"fmt"
"net"
"path/filepath"
"time"

"github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega"
Expand All @@ -31,9 +35,11 @@ import (
"sigs.k8s.io/controller-runtime/pkg/log/zap"
"sigs.k8s.io/controller-runtime/pkg/manager"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/webhook"

kubeflowv2 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v2alpha1"
controllerv2 "github.com/kubeflow/training-operator/pkg/controller.v2"
webhookv2 "github.com/kubeflow/training-operator/pkg/webhook.v2"
)

type Framework struct {
Expand All @@ -45,7 +51,10 @@ func (f *Framework) Init() *rest.Config {
log.SetLogger(zap.New(zap.WriteTo(ginkgo.GinkgoWriter), zap.UseDevMode(true)))
ginkgo.By("bootstrapping test environment")
f.testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{filepath.Join("..", "..", "..", "manifests", "v2", "base", "crds")},
CRDDirectoryPaths: []string{filepath.Join("..", "..", "..", "manifests", "v2", "base", "crds")},
WebhookInstallOptions: envtest.WebhookInstallOptions{
Paths: []string{filepath.Join("..", "..", "..", "manifests", "v2", "base", "webhook")},
},
ErrorIfCRDPathMissing: true,
}
cfg, err := f.testEnv.Start()
Expand All @@ -55,6 +64,7 @@ func (f *Framework) Init() *rest.Config {
}

func (f *Framework) RunManager(cfg *rest.Config) (context.Context, client.Client) {
webhookInstallOpts := &f.testEnv.WebhookInstallOptions
gomega.ExpectWithOffset(1, kubeflowv2.AddToScheme(scheme.Scheme)).NotTo(gomega.HaveOccurred())

// +kubebuilder:scaffold:scheme
Expand All @@ -70,18 +80,34 @@ func (f *Framework) RunManager(cfg *rest.Config) (context.Context, client.Client
Metrics: metricsserver.Options{
BindAddress: "0", // disable metrics to avoid conflicts between packages.
},
WebhookServer: webhook.NewServer(
webhook.Options{
Host: webhookInstallOpts.LocalServingHost,
Port: webhookInstallOpts.LocalServingPort,
CertDir: webhookInstallOpts.LocalServingCertDir,
}),
})
gomega.ExpectWithOffset(1, err).NotTo(gomega.HaveOccurred(), "failed to create manager")

failedCtrlName, err := controllerv2.SetupControllers(mgr)
gomega.ExpectWithOffset(1, err).NotTo(gomega.HaveOccurred(), "controller", failedCtrlName)
failedWebhookName, err := webhookv2.Setup(mgr)
gomega.ExpectWithOffset(1, err).NotTo(gomega.HaveOccurred(), "webhook", failedWebhookName)

go func() {
defer ginkgo.GinkgoRecover()
err = mgr.Start(ctx)
gomega.ExpectWithOffset(1, err).NotTo(gomega.HaveOccurred(), "failed to run manager")
}()

dialer := &net.Dialer{Timeout: time.Second}
addrPort := fmt.Sprintf("%s:%d", webhookInstallOpts.LocalServingHost, webhookInstallOpts.LocalServingPort)
gomega.Eventually(func(g gomega.Gomega) {
var conn *tls.Conn
conn, err = tls.DialWithDialer(dialer, "tcp", addrPort, &tls.Config{InsecureSkipVerify: true})
g.Expect(err).Should(gomega.Succeed())
g.Expect(conn.Close()).Should(gomega.Succeed())
}).Should(gomega.Succeed())
return ctx, k8sClient
}

Expand Down
52 changes: 52 additions & 0 deletions test/integration/webhook.v2/clustertrainingruntime_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
/*
Copyright 2024 The Kubeflow Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package webhookv2

import (
"github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"github.com/kubeflow/training-operator/test/integration/framework"
)

var _ = ginkgo.Describe("ClusterTrainingRuntime Webhook", ginkgo.Ordered, func() {
var ns *corev1.Namespace

ginkgo.BeforeAll(func() {
fwk = &framework.Framework{}
cfg = fwk.Init()
ctx, k8sClient = fwk.RunManager(cfg)
})
ginkgo.AfterAll(func() {
fwk.Teardown()
})

ginkgo.BeforeEach(func() {
ns = &corev1.Namespace{
TypeMeta: metav1.TypeMeta{
APIVersion: corev1.SchemeGroupVersion.String(),
Kind: "Namespace",
},
ObjectMeta: metav1.ObjectMeta{
GenerateName: "clustertrainingruntime-webhook-",
},
}
gomega.Expect(k8sClient.Create(ctx, ns)).To(gomega.Succeed())
})
})
Loading
Loading