feat!: monitoring multiple clusters #266

Merged: 4 commits, Jan 24, 2024

65 changes: 65 additions & 0 deletions README.md
@@ -83,6 +83,71 @@ you will be able to see the Results objects of the analysis after some minutes (
"details": "The error message means that the service in Kubernetes doesn't have any associated endpoints, which should have been labeled with \"control-plane=controller-manager\". \n\nTo solve this issue, you need to add the \"control-plane=controller-manager\" label to the endpoint that matches the service. Once the endpoint is labeled correctly, Kubernetes can associate it with the service, and the error should be resolved.",
```

## Monitor multiple clusters

The `k8sgpt.ai` Operator allows monitoring multiple clusters by providing a `kubeconfig` value.

This feature is useful if you embrace Platform Engineering, such as running a fleet of Kubernetes clusters for multiple stakeholders.
It is especially suited to Cluster API-based infrastructures, where the `k8sgpt.ai` Operator is installed in the Cluster API management cluster:
the cluster responsible for creating the required seed clusters according to the infrastructure provider.

Once a Cluster API-based cluster has been provisioned, a `kubeconfig` Secret named according to the convention `${CLUSTERNAME}-kubeconfig` will be available in the same namespace:
the conventional Secret data key is `value`, and it can be used to instruct the `k8sgpt.ai` Operator to monitor a remote cluster without deploying any resources to the seed cluster.

```
$: kubectl get clusters
NAME PHASE AGE VERSION
capi-quickstart Provisioned 8s v1.28.0

$: kubectl get secrets
NAME TYPE DATA AGE
capi-quickstart-kubeconfig Opaque 1 8s
```
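
The kubeconfig itself can be extracted by decoding the conventional `value` key, e.g.:

```
$: kubectl get secret capi-quickstart-kubeconfig -o jsonpath='{.data.value}' | base64 -d
```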

> **A security concern**
>
> If your setup requires a least-privilege approach,
> a different `kubeconfig` must be provided, since the Cluster API-generated one is bound to the `admin` user, which has `cluster-admin` permissions.
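
As a sketch of such a least-privilege setup (all names are illustrative, and the exact rules depend on the analyzers you enable), a read-only `ClusterRole` can be bound to a dedicated `ServiceAccount` on the monitored cluster, whose token then backs the custom `kubeconfig`:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8sgpt-readonly
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8sgpt-readonly
rules:
  # Read-only access: enough to analyze resources, no write permissions.
  - apiGroups: ["", "apps", "batch", "networking.k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8sgpt-readonly
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: k8sgpt-readonly
subjects:
  - kind: ServiceAccount
    name: k8sgpt-readonly
    namespace: kube-system
```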


Once you have a valid `kubeconfig`, a `k8sgpt` instance can be created as follows.

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
name: capi-quickstart
namespace: default
spec:
ai:
anonymized: true
backend: openai
language: english
model: gpt-3.5-turbo
secret:
key: api_key
name: my_openai_secret
kubeconfig:
key: value
name: capi-quickstart-kubeconfig
```

Once applied, the `k8sgpt.ai` Operator will create the `k8sgpt.ai` Deployment using the seed cluster `kubeconfig` defined in the `/spec/kubeconfig` field.

The resulting `Result` objects will be available in the same Namespace where the `k8sgpt.ai` instance has been deployed,
labelled with the following keys:

- `k8sgpts.k8sgpt.ai/name`: the `k8sgpt.ai` instance Name
- `k8sgpts.k8sgpt.ai/namespace`: the `k8sgpt.ai` instance Namespace
- `k8sgpts.k8sgpt.ai/backend`: the AI backend (if specified)

Thanks to these labels, the results can be filtered by monitored cluster, as shown below,
without polluting the seed cluster with the `k8sgpt.ai` CRDs or consuming its compute resources,
and while keeping the AI backend credentials confidential.
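
For example, the results produced for the `capi-quickstart` instance defined above can be listed with a label selector:

```
$: kubectl get results -l k8sgpts.k8sgpt.ai/name=capi-quickstart,k8sgpts.k8sgpt.ai/namespace=default
```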

> If the `/spec/kubeconfig` field is missing, the `k8sgpt.ai` Operator will monitor the cluster it has been deployed to:
> this is achieved by mounting the provided `ServiceAccount`.
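
For comparison, a minimal sketch of an in-cluster instance simply omits the `kubeconfig` stanza, making the Operator fall back to the mounted `ServiceAccount`:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: in-cluster
  namespace: default
spec:
  ai:
    anonymized: true
    backend: openai
    language: english
    model: gpt-3.5-turbo
    secret:
      key: api_key
      name: my_openai_secret
```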

## Remote Cache

<details>
Expand Down
3 changes: 3 additions & 0 deletions api/v1alpha1/k8sgpt_types.go
@@ -124,6 +124,9 @@ type K8sGPTSpec struct {
RemoteCache *RemoteCacheRef `json:"remoteCache,omitempty"`
Integrations *Integrations `json:"integrations,omitempty"`
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// Define the kubeconfig the Deployment must use.
// If empty, the Deployment will use the ServiceAccount provided by Kubernetes itself.
Kubeconfig *SecretRef `json:"kubeconfig,omitempty"`
}

const (
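For reference, the `SecretRef` type reused by the new `Kubeconfig` field pairs a Secret name with a data key, consistent with the CRD schema shown below; a minimal sketch of it (the actual definition lives elsewhere in this API package):

```go
// SecretRef selects a key within a Secret in the instance's namespace.
type SecretRef struct {
	// Name of the Secret, e.g. "capi-quickstart-kubeconfig".
	Name string `json:"name,omitempty"`
	// Key within the Secret's data, e.g. "value".
	Key string `json:"key,omitempty"`
}
```
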
5 changes: 5 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default.

10 changes: 10 additions & 0 deletions config/crd/bases/core.k8sgpt.ai_k8sgpts.yaml
@@ -115,6 +115,16 @@ spec:
type: boolean
type: object
type: object
kubeconfig:
description: Define the kubeconfig the Deployment must use. If empty,
the Deployment will use the ServiceAccount provided by Kubernetes
itself.
properties:
key:
type: string
name:
type: string
type: object
noCache:
type: boolean
nodeSelector:
18 changes: 11 additions & 7 deletions controllers/k8sgpt_controller.go
@@ -22,11 +22,6 @@ import (

corev1alpha1 "github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha1"

- kclient "github.com/k8sgpt-ai/k8sgpt-operator/pkg/client"
- "github.com/k8sgpt-ai/k8sgpt-operator/pkg/integrations"
- "github.com/k8sgpt-ai/k8sgpt-operator/pkg/resources"
- "github.com/k8sgpt-ai/k8sgpt-operator/pkg/sinks"
- "github.com/k8sgpt-ai/k8sgpt-operator/pkg/utils"
"github.com/prometheus/client_golang/prometheus"
v1 "k8s.io/api/apps/v1"
kcorev1 "k8s.io/api/core/v1"
@@ -37,6 +32,12 @@ import (
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/metrics"

+ kclient "github.com/k8sgpt-ai/k8sgpt-operator/pkg/client"
+ "github.com/k8sgpt-ai/k8sgpt-operator/pkg/integrations"
+ "github.com/k8sgpt-ai/k8sgpt-operator/pkg/resources"
+ "github.com/k8sgpt-ai/k8sgpt-operator/pkg/sinks"
+ "github.com/k8sgpt-ai/k8sgpt-operator/pkg/utils"
)

const (
@@ -151,7 +152,7 @@ func (r *K8sGPTReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
// Check and see if the instance is new or has a K8sGPT deployment in flight
deployment := v1.Deployment{}
err = r.Get(ctx, client.ObjectKey{Namespace: k8sgptConfig.Namespace,
- Name: "k8sgpt-deployment"}, &deployment)
+ Name: k8sgptConfig.Name}, &deployment)
if client.IgnoreNotFound(err) != nil {
k8sgptReconcileErrorCount.Inc()
return r.finishReconcile(err, false)
@@ -260,7 +261,10 @@ func (r *K8sGPTReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctr
// no longer are relevant, we can do this by using the resultSpec composed name against
// the custom resource name
resultList := &corev1alpha1.ResultList{}
- err = r.List(ctx, resultList)
+ err = r.List(ctx, resultList, client.MatchingLabels(map[string]string{
+ 	"k8sgpts.k8sgpt.ai/name":      k8sgptConfig.Name,
+ 	"k8sgpts.k8sgpt.ai/namespace": k8sgptConfig.Namespace,
+ }))
if err != nil {
k8sgptReconcileErrorCount.Inc()
return r.finishReconcile(err, false)
2 changes: 1 addition & 1 deletion pkg/client/client.go
@@ -54,7 +54,7 @@ func GenerateAddress(ctx context.Context, cli client.Client, k8sgptConfig *v1alp
// Get service IP and port for k8sgpt-deployment
svc := &corev1.Service{}
err := cli.Get(ctx, client.ObjectKey{Namespace: k8sgptConfig.Namespace,
- Name: "k8sgpt"}, svc)
+ Name: k8sgptConfig.Name}, svc)
if err != nil {
return "", nil
}
84 changes: 59 additions & 25 deletions pkg/resources/k8sgpt.go
@@ -17,6 +17,7 @@ package resources
import (
"context"
err "errors"
"fmt"

"github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha1"
"github.com/k8sgpt-ai/k8sgpt-operator/pkg/utils"
@@ -29,6 +30,7 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/util/retry"
"k8s.io/utils/ptr"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)
@@ -39,15 +41,14 @@ type SyncOrDestroy int
const (
SyncOp SyncOrDestroy = iota
DestroyOp
- DeploymentName = "k8sgpt-deployment"
)

// GetService Create service for K8sGPT
func GetService(config v1alpha1.K8sGPT) (*corev1.Service, error) {
// Create service
service := corev1.Service{
ObjectMeta: metav1.ObjectMeta{
- Name: "k8sgpt",
+ Name: config.Name,
Namespace: config.Namespace,
OwnerReferences: []metav1.OwnerReference{
{
@@ -62,7 +63,7 @@ func GetService(config v1alpha1.K8sGPT) (*corev1.Service, error) {
},
Spec: corev1.ServiceSpec{
Selector: map[string]string{
"app": DeploymentName,
"app": config.Name,
},
Ports: []corev1.ServicePort{
{
@@ -178,14 +179,14 @@ func GetClusterRole(config v1alpha1.K8sGPT) (*r1.ClusterRole, error) {
}

// GetDeployment Create deployment with the latest K8sGPT image
- func GetDeployment(config v1alpha1.K8sGPT) (*appsv1.Deployment, error) {
+ func GetDeployment(config v1alpha1.K8sGPT, outOfClusterMode bool) (*appsv1.Deployment, error) {

// Create deployment
image := config.Spec.Repository + ":" + config.Spec.Version
replicas := int32(1)
deployment := appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
- Name: DeploymentName,
+ Name: config.Name,
Namespace: config.Namespace,
OwnerReferences: []metav1.OwnerReference{
{
@@ -202,13 +203,13 @@ func GetDeployment(config v1alpha1.K8sGPT) (*appsv1.Deployment, error) {
Replicas: &replicas,
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"app": DeploymentName,
"app": config.Name,
},
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"app": DeploymentName,
"app": config.Name,
},
},
Spec: corev1.PodSpec{
@@ -273,6 +274,35 @@ func GetDeployment(config v1alpha1.K8sGPT) (*appsv1.Deployment, error) {
},
},
}
if outOfClusterMode {
Contributor: shouldn't we standardize the code based on the Spec?

    if config.Spec.Kubeconfig {

Contributor (Author):

> Shouldn't we handle all these changes by updating the version of the k8sgpt resource

It depends on the API governance of the project; however, sticking to the best practices of Kubernetes Operator development, as agreed upon in several Cluster API projects and in Kubernetes itself, adding fields to a specification doesn't require a version bump, since it's a feature addition that only requires an update of the deployed Operator.

> This would allow us to maintain two versions of the controller,

In Operator development, it's highly discouraged to have two controllers for the same resource, even though it has multiple API versions. Rather, one version is elected as the Hub: it is the one stored in etcd and is used as the "hub" to convert back and forth from the other versions.

> preventing any breaking changes and avoiding complicating the migration process

As I stated before, adding fields is not considered a breaking change. The migration process can be achieved by leveraging the Operator SDK/controller-runtime machinery, such as the conversion webhook, which translates API objects between versions transparently for the end user. Adding a conversion requires a much larger code base, as well as webhook management (which requires CA handling): it should be put in place if and only if definitely required, and that doesn't seem to be the case here.
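
For context, a minimal sketch of that hub-and-spoke machinery using controller-runtime's `conversion` package, assuming a hypothetical `v1alpha2` hub (no such version is introduced by this PR; all names are illustrative):

```go
package v1alpha1

import (
	"sigs.k8s.io/controller-runtime/pkg/conversion"

	v1alpha2 "github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha2" // hypothetical hub version
)

// The hub version itself only carries a marker method:
//   func (*v1alpha2.K8sGPT) Hub() {}

// ConvertTo converts this (spoke) v1alpha1 K8sGPT into the v1alpha2 hub.
func (src *K8sGPT) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1alpha2.K8sGPT)
	dst.ObjectMeta = src.ObjectMeta
	// ...field-by-field copy of Spec and Status...
	return nil
}

// ConvertFrom converts the v1alpha2 hub back into this v1alpha1 spoke.
func (dst *K8sGPT) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1alpha2.K8sGPT)
	dst.ObjectMeta = src.ObjectMeta
	// ...field-by-field copy of Spec and Status...
	return nil
}
```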

// No need of ServiceAccount since the Deployment will use
// a kubeconfig pointing to an external cluster.
deployment.Spec.Template.Spec.ServiceAccountName = ""
deployment.Spec.Template.Spec.AutomountServiceAccountToken = ptr.To(false)

kubeconfigPath := fmt.Sprintf("/tmp/%s", config.Name)

deployment.Spec.Template.Spec.Containers[0].Args = append(deployment.Spec.Template.Spec.Containers[0].Args, fmt.Sprintf("--kubeconfig=%s/kubeconfig", kubeconfigPath))
deployment.Spec.Template.Spec.Containers[0].VolumeMounts = append(deployment.Spec.Template.Spec.Containers[0].VolumeMounts, corev1.VolumeMount{
Name: "kubeconfig",
ReadOnly: true,
MountPath: kubeconfigPath,
})
deployment.Spec.Template.Spec.Volumes = append(deployment.Spec.Template.Spec.Volumes, corev1.Volume{
Name: "kubeconfig",
VolumeSource: v1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: config.Spec.Kubeconfig.Name,
Items: []corev1.KeyToPath{
{
Key: config.Spec.Kubeconfig.Key,
Path: "kubeconfig",
},
},
},
},
})
}
if config.Spec.AI.Secret != nil {
password := corev1.EnvVar{
Name: "K8SGPT_PASSWORD",
@@ -347,35 +377,39 @@ func Sync(ctx context.Context, c client.Client,

var objs []client.Object

- 	svc, er := GetService(config)
- 	if er != nil {
- 		return er
- 	}
- 
- 	objs = append(objs, svc)
- 
- 	svcAcc, er := GetServiceAccount(config)
- 	if er != nil {
- 		return er
- 	}
- 
- 	objs = append(objs, svcAcc)
- 
- 	clusterRole, er := GetClusterRole(config)
- 	if er != nil {
- 		return er
- 	}
- 
- 	objs = append(objs, clusterRole)
- 
- 	clusterRoleBinding, er := GetClusterRoleBinding(config)
- 	if er != nil {
- 		return er
- 	}
- 
- 	objs = append(objs, clusterRoleBinding)
- 
- 	deployment, er := GetDeployment(config)
+ 	outOfClusterMode := config.Spec.Kubeconfig != nil
+ 
+ 	if !outOfClusterMode {
+ 		svcAcc, er := GetServiceAccount(config)
+ 		if er != nil {
+ 			return er
+ 		}
+ 
+ 		objs = append(objs, svcAcc)
+ 
+ 		clusterRole, er := GetClusterRole(config)
+ 		if er != nil {
+ 			return er
+ 		}
+ 
+ 		objs = append(objs, clusterRole)
+ 
+ 		clusterRoleBinding, er := GetClusterRoleBinding(config)
+ 		if er != nil {
+ 			return er
+ 		}
+ 
+ 		objs = append(objs, clusterRoleBinding)
+ 	}
+ 
+ 	svc, er := GetService(config)
+ 	if er != nil {
+ 		return er
+ 	}
+ 
+ 	objs = append(objs, svc)
+ 
+ 	deployment, er := GetDeployment(config, outOfClusterMode)
if er != nil {
return er
}