Add OpenTelemetryCollector.Spec.UpgradeStrategy #620

Merged
17 changes: 17 additions & 0 deletions README.md
@@ -57,6 +57,18 @@ The `config` node holds the `YAML` that should be passed down as-is to the under

At this point, the Operator does *not* validate the contents of the configuration file: if the configuration is invalid, the instance will still be created but the underlying OpenTelemetry Collector might crash.


### Upgrades

As noted above, the OpenTelemetry Collector format continues to evolve. However, the operator makes a best-effort attempt to upgrade all managed `OpenTelemetryCollector` resources.

In certain scenarios, it may be desirable to prevent the operator from upgrading particular `OpenTelemetryCollector` resources. For example, when a resource is configured with a custom `.Spec.Image`, end users may wish to manage its configuration themselves rather than have the operator upgrade it. This can be configured on a per-resource basis via the exposed property `.Spec.UpgradeStrategy`.

When a resource's `.Spec.UpgradeStrategy` is set to `none`, the operator will skip that instance during the upgrade routine.

The only other acceptable value, and the default, is `automatic`.
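For example, a resource that opts out of automatic upgrades might look like the following sketch (the name and collector config are illustrative, not taken from this PR):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector   # hypothetical name
spec:
  mode: deployment
  # The operator will skip this instance during its upgrade routine.
  upgradeStrategy: none
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [logging]
```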


### Deployment modes

The `CustomResource` for the `OpenTelemetryCollector` exposes a property named `.Spec.Mode`, which can be used to specify whether the collector should run as a `DaemonSet`, `Sidecar`, or `Deployment` (default). Look at [this sample](https://github.com/open-telemetry/opentelemetry-operator/blob/main/tests/e2e/daemonset-features/00-install.yaml) for reference.
@@ -215,6 +227,11 @@ The possible values for the annotation can be

The OpenTelemetry Operator follows the same versioning as the operand (OpenTelemetry Collector) up to the minor part of the version. For example, the OpenTelemetry Operator v0.18.1 tracks OpenTelemetry Collector 0.18.0. The patch part of the version indicates the patch level of the operator itself, not that of OpenTelemetry Collector. Whenever a new patch version is released for OpenTelemetry Collector, we'll release a new patch version of the operator.

By default, the OpenTelemetry Operator ensures consistent versioning between itself and the managed `OpenTelemetryCollector` resources. That is, if the OpenTelemetry Operator is based on version `0.40.0`, it will create resources with an underlying OpenTelemetry Collector at version `0.40.0`.

When a custom `Spec.Image` is used with an `OpenTelemetryCollector` resource, the OpenTelemetry Operator will not manage this versioning and upgrading. In this scenario, it is best practice to keep the OpenTelemetry Operator version in sync with the underlying core version: given an `OpenTelemetryCollector` resource with `Spec.Image` set to a custom image based on OpenTelemetry Collector version `0.40.0`, it is recommended to keep the OpenTelemetry Operator at version `0.40.0`.
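A minimal sketch of such a resource (the image name and registry are hypothetical, not from this PR) pins a custom image and opts out of operator-managed upgrades:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: custom-image-collector   # hypothetical name
spec:
  # Custom image built on OpenTelemetry Collector 0.40.0; keep the
  # operator at 0.40.x so core versions stay aligned.
  image: registry.example.com/my-otelcol:0.40.0
  upgradeStrategy: none
```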


### OpenTelemetry Operator vs. Kubernetes vs. Cert Manager

We strive to be compatible with the widest range of Kubernetes versions as possible, but some changes to Kubernetes itself require us to break compatibility with older Kubernetes versions, be it because of code incompatibilities, or in the name of maintainability.
4 changes: 4 additions & 0 deletions apis/v1alpha1/opentelemetrycollector_types.go
@@ -25,6 +25,10 @@ type OpenTelemetryCollectorSpec struct {
	// +required
	Config string `json:"config,omitempty"`

	// UpgradeStrategy represents how the operator will handle upgrades to the CR when a newer version of the operator is deployed
	// +optional
	UpgradeStrategy UpgradeStrategy `json:"upgradeStrategy"`

	// Args is the set of arguments to pass to the OpenTelemetry Collector binary
	// +optional
	Args map[string]string `json:"args,omitempty"`
3 changes: 3 additions & 0 deletions apis/v1alpha1/opentelemetrycollector_webhook.go
@@ -43,6 +43,9 @@ func (r *OpenTelemetryCollector) Default() {
	if len(r.Spec.Mode) == 0 {
		r.Spec.Mode = ModeDeployment
	}
	if len(r.Spec.UpgradeStrategy) == 0 {
		r.Spec.UpgradeStrategy = UpgradeStrategyAutomatic
	}

	if r.Labels == nil {
		r.Labels = map[string]string{}
	}
29 changes: 29 additions & 0 deletions apis/v1alpha1/upgrade_strategy.go
@@ -0,0 +1,29 @@
// Copyright The OpenTelemetry Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package v1alpha1

type (
	// UpgradeStrategy represents how the operator will handle upgrades to the CR when a newer version of the operator is deployed
	// +kubebuilder:validation:Enum=automatic;none
	UpgradeStrategy string
)

const (
	// UpgradeStrategyAutomatic specifies that the operator will automatically apply upgrades to the CR.
	UpgradeStrategyAutomatic UpgradeStrategy = "automatic"

	// UpgradeStrategyNone specifies that the operator will not apply any upgrades to the CR.
	UpgradeStrategyNone UpgradeStrategy = "none"
)
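The upgrade routine that consumes this enum is not part of this diff, but its use can be sketched as follows. This is a hypothetical, trimmed-down stand-in (the `collector` struct and `shouldUpgrade` helper are illustrative, not the operator's actual API), assuming the routine simply skips any instance whose strategy is `none`:

```go
package main

import "fmt"

// UpgradeStrategy mirrors the enum defined in apis/v1alpha1/upgrade_strategy.go.
type UpgradeStrategy string

const (
	UpgradeStrategyAutomatic UpgradeStrategy = "automatic"
	UpgradeStrategyNone      UpgradeStrategy = "none"
)

// collector is a hypothetical stand-in for an OpenTelemetryCollector resource.
type collector struct {
	Name            string
	UpgradeStrategy UpgradeStrategy
}

// shouldUpgrade is a hypothetical helper: anything other than an explicit
// "none" is treated as eligible for upgrade (the admission webhook defaults
// an empty value to "automatic" anyway).
func shouldUpgrade(c collector) bool {
	return c.UpgradeStrategy != UpgradeStrategyNone
}

func main() {
	instances := []collector{
		{Name: "managed", UpgradeStrategy: UpgradeStrategyAutomatic},
		{Name: "pinned", UpgradeStrategy: UpgradeStrategyNone},
	}
	for _, c := range instances {
		fmt.Printf("%s upgrade=%v\n", c.Name, shouldUpgrade(c))
	}
}
```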
@@ -689,6 +689,13 @@ spec:
type: string
type: object
type: array
          upgradeStrategy:
            description: UpgradeStrategy represents how the operator will handle
              upgrades to the CR when a newer version of the operator is deployed
            enum:
            - automatic
            - none
            type: string
          volumeClaimTemplates:
            description: VolumeClaimTemplates will provide stable storage using
              PersistentVolumes. Only available when the mode=statefulset.
@@ -690,6 +690,13 @@ spec:
type: string
type: object
type: array
          upgradeStrategy:
            description: UpgradeStrategy represents how the operator will handle
              upgrades to the CR when a newer version of the operator is deployed
            enum:
            - automatic
            - none
            type: string
          volumeClaimTemplates:
            description: VolumeClaimTemplates will provide stable storage using
              PersistentVolumes. Only available when the mode=statefulset.
84 changes: 80 additions & 4 deletions controllers/suite_test.go
@@ -15,29 +15,46 @@
package controllers_test

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"sync"
	"testing"
	"time"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/util/retry"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"

	"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
	// +kubebuilder:scaffold:imports
)

var (
	k8sClient  client.Client
	testEnv    *envtest.Environment
	testScheme *runtime.Scheme = scheme.Scheme
	ctx        context.Context
	cancel     context.CancelFunc
)

func TestMain(m *testing.M) {
	ctx, cancel = context.WithCancel(context.TODO())
	defer cancel()

	testEnv = &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")},
		WebhookInstallOptions: envtest.WebhookInstallOptions{
			Paths: []string{filepath.Join("..", "config", "webhook")},
		},
	}

	cfg, err := testEnv.Start()
	if err != nil {
		fmt.Printf("failed to start testEnv: %v", err)
@@ -56,6 +73,65 @@ func TestMain(m *testing.M) {
		os.Exit(1)
	}

	// start webhook server using Manager
	webhookInstallOptions := &testEnv.WebhookInstallOptions
	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		Scheme:             testScheme,
		Host:               webhookInstallOptions.LocalServingHost,
		Port:               webhookInstallOptions.LocalServingPort,
		CertDir:            webhookInstallOptions.LocalServingCertDir,
		LeaderElection:     false,
		MetricsBindAddress: "0",
	})
	if err != nil {
		fmt.Printf("failed to start webhook server: %v", err)
		os.Exit(1)
	}

	if err := (&v1alpha1.OpenTelemetryCollector{}).SetupWebhookWithManager(mgr); err != nil {
		fmt.Printf("failed to SetupWebhookWithManager: %v", err)
		os.Exit(1)
	}

	ctx, cancel = context.WithCancel(context.TODO())
	defer cancel()
	go func() {
		if err = mgr.Start(ctx); err != nil {
			fmt.Printf("failed to start manager: %v", err)
			os.Exit(1)
		}
	}()

	// wait for the webhook server to get ready
	wg := &sync.WaitGroup{}
	wg.Add(1)
	dialer := &net.Dialer{Timeout: time.Second}
	addrPort := fmt.Sprintf("%s:%d", webhookInstallOptions.LocalServingHost, webhookInstallOptions.LocalServingPort)
	go func(wg *sync.WaitGroup) {
		defer wg.Done()
		if err = retry.OnError(wait.Backoff{
			Steps:    20,
			Duration: 10 * time.Millisecond,
			Factor:   1.5,
			Jitter:   0.1,
			Cap:      time.Second * 30,
		}, func(error) bool {
			return true
		}, func() error {
			// #nosec G402
			conn, err := tls.DialWithDialer(dialer, "tcp", addrPort, &tls.Config{InsecureSkipVerify: true})
			if err != nil {
				return err
			}
			_ = conn.Close()
			return nil
		}); err != nil {
			fmt.Printf("failed to wait for webhook server to be ready: %v", err)
			os.Exit(1)
		}
	}(wg)
	wg.Wait()

	code := m.Run()

	err = testEnv.Stop()
9 changes: 9 additions & 0 deletions docs/api.md
@@ -508,6 +508,15 @@ OpenTelemetryCollectorSpec defines the desired state of OpenTelemetryCollector.
Toleration to schedule OpenTelemetry Collector pods. This is only relevant to daemonsets, statefulsets and deployments<br/>
</td>
<td>false</td>
</tr><tr>
<td><b>upgradeStrategy</b></td>
<td>enum</td>
<td>
UpgradeStrategy represents how the operator will handle upgrades to the CR when a newer version of the operator is deployed<br/>
<br/>
<i>Enum</i>: automatic, none<br/>
</td>
<td>false</td>
</tr><tr>
<td><b><a href="#opentelemetrycollectorspecvolumeclaimtemplatesindex">volumeClaimTemplates</a></b></td>
<td>[]object</td>
84 changes: 80 additions & 4 deletions internal/webhookhandler/webhookhandler_suite_test.go
@@ -15,29 +15,46 @@
package webhookhandler_test

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"sync"
	"testing"
	"time"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/util/retry"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"

	"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
	// +kubebuilder:scaffold:imports
)

var (
	k8sClient  client.Client
	testEnv    *envtest.Environment
	testScheme *runtime.Scheme = scheme.Scheme
	ctx        context.Context
	cancel     context.CancelFunc
)

func TestMain(m *testing.M) {
	ctx, cancel = context.WithCancel(context.TODO())
	defer cancel()

	testEnv = &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
		WebhookInstallOptions: envtest.WebhookInstallOptions{
			Paths: []string{filepath.Join("..", "..", "config", "webhook")},
		},
	}

	cfg, err := testEnv.Start()
	if err != nil {
		fmt.Printf("failed to start testEnv: %v", err)
@@ -56,6 +73,65 @@ func TestMain(m *testing.M) {
		os.Exit(1)
	}

	// start webhook server using Manager
	webhookInstallOptions := &testEnv.WebhookInstallOptions
	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		Scheme:             testScheme,
		Host:               webhookInstallOptions.LocalServingHost,
		Port:               webhookInstallOptions.LocalServingPort,
		CertDir:            webhookInstallOptions.LocalServingCertDir,
		LeaderElection:     false,
		MetricsBindAddress: "0",
	})
	if err != nil {
		fmt.Printf("failed to start webhook server: %v", err)
		os.Exit(1)
	}

	if err := (&v1alpha1.OpenTelemetryCollector{}).SetupWebhookWithManager(mgr); err != nil {
		fmt.Printf("failed to SetupWebhookWithManager: %v", err)
		os.Exit(1)
	}

	ctx, cancel = context.WithCancel(context.TODO())
	defer cancel()
	go func() {
		if err = mgr.Start(ctx); err != nil {
			fmt.Printf("failed to start manager: %v", err)
			os.Exit(1)
		}
	}()

	// wait for the webhook server to get ready
	wg := &sync.WaitGroup{}
	wg.Add(1)
	dialer := &net.Dialer{Timeout: time.Second}
	addrPort := fmt.Sprintf("%s:%d", webhookInstallOptions.LocalServingHost, webhookInstallOptions.LocalServingPort)
	go func(wg *sync.WaitGroup) {
		defer wg.Done()
		if err = retry.OnError(wait.Backoff{
			Steps:    20,
			Duration: 10 * time.Millisecond,
			Factor:   1.5,
			Jitter:   0.1,
			Cap:      time.Second * 30,
		}, func(error) bool {
			return true
		}, func() error {
			// #nosec G402
			conn, err := tls.DialWithDialer(dialer, "tcp", addrPort, &tls.Config{InsecureSkipVerify: true})
			if err != nil {
				return err
			}
			_ = conn.Close()
			return nil
		}); err != nil {
			fmt.Printf("failed to wait for webhook server to be ready: %v", err)
			os.Exit(1)
		}
	}(wg)
	wg.Wait()

	code := m.Run()

	err = testEnv.Stop()