
✨ support work driver config for cluster manager. #381

Conversation

@morvencao (Member) commented Mar 19, 2024

Summary

This PR adds the capability to configure the work driver for the work controller via the cluster manager.

Related issue(s)

https://issues.redhat.com/browse/ACM-10422
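
For context, a minimal sketch of the resulting API surface, based on the test fixture quoted later in this conversation (the package paths are my assumption):

package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	operatorapiv1 "open-cluster-management.io/api/operator/v1"
)

// newClusterManagerWithDriver builds a ClusterManager whose work controller
// uses the given driver; "kube" is the default, and any non-kube driver
// additionally expects a work-driver-config secret in the operator namespace.
func newClusterManagerWithDriver(driver operatorapiv1.WorkDriverType) *operatorapiv1.ClusterManager {
	return &operatorapiv1.ClusterManager{
		ObjectMeta: metav1.ObjectMeta{Name: "cluster-manager"},
		Spec: operatorapiv1.ClusterManagerSpec{
			WorkConfiguration: &operatorapiv1.WorkConfiguration{
				WorkDriver: driver,
			},
		},
	}
}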

@morvencao morvencao force-pushed the br_work_driver_config branch 2 times, most recently from 91137ad to d43748a Compare March 20, 2024 02:26
codecov bot commented Mar 20, 2024

Codecov Report

Attention: Patch coverage is 69.04762%, with 13 lines in your changes missing coverage. Please review.

Project coverage is 62.44%. Comparing base (bcbe4d2) to head (6f5c4ee).
Report is 29 commits behind head on main.

Files Patch % Lines
...nagercontroller/clustermanager_secret_reconcile.go 64.70% 8 Missing and 4 partials ⚠️
...stermanagercontroller/clustermanager_controller.go 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #381      +/-   ##
==========================================
+ Coverage   61.54%   62.44%   +0.89%     
==========================================
  Files         133      134       +1     
  Lines       14078    11432    -2646     
==========================================
- Hits         8665     7139    -1526     
+ Misses       4664     3527    -1137     
- Partials      749      766      +17     
Flag Coverage Δ
unit 62.44% <69.04%> (+0.89%) ⬆️


@morvencao morvencao force-pushed the br_work_driver_config branch 3 times, most recently from 0433e14 to 691dc14 Compare March 20, 2024 09:38
@morvencao morvencao changed the title ✨ [WIP] support work driver config for cluster manager. ✨ support work driver config for cluster manager. Mar 20, 2024
@morvencao (Member Author):

/unhold

@morvencao morvencao force-pushed the br_work_driver_config branch 5 times, most recently from 048830f to c5f756d Compare March 20, 2024 12:01
@@ -209,6 +216,17 @@ func (n *clusterManagerController) sync(ctx context.Context, controllerContext f
}
managementClient := n.operatorKubeClient // We assume that operator is always running on the management cluster.

// If the work driver is not kube, we need to sync the work driver config secret
if workDriver != operatorapiv1.WorkDriverTypeKube {
if err := helpers.SyncWorkConfigSecret(ctx, n.operatorKubeClient, clusterManagerNamespace); err != nil {
Member: I think there is already a sync secret func in library-go.

Member Author: Replaced this with SyncSecret from library-go.
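
For reference, a sketch of the switched-to helper, resourceapply.SyncSecret from openshift/library-go (signature as I recall it; the secret name matches this PR, the wrapper function is illustrative):

package sketch

import (
	"context"

	"github.com/openshift/library-go/pkg/operator/events"
	"github.com/openshift/library-go/pkg/operator/resource/resourceapply"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// syncWorkDriverConfig copies the work-driver-config secret from the
// operator namespace into the cluster-manager namespace. Two behaviors
// come up later in this thread: SyncSecret deletes the target when the
// source disappears, and it does NOT return an error for a missing source.
func syncWorkDriverConfig(ctx context.Context, client kubernetes.Interface,
	recorder events.Recorder, operatorNamespace, clusterManagerNamespace string) error {
	_, _, err := resourceapply.SyncSecret(ctx, client.CoreV1(), recorder,
		operatorNamespace, "work-driver-config",
		clusterManagerNamespace, "work-driver-config",
		[]metav1.OwnerReference{})
	return err
}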

@@ -209,6 +216,17 @@ func (n *clusterManagerController) sync(ctx context.Context, controllerContext f
}
managementClient := n.operatorKubeClient // We assume that operator is always running on the management cluster.

// If the work driver is not kube, we need to sync the work driver config secret
if workDriver != operatorapiv1.WorkDriverTypeKube {
Member: What happens if the clustermanager is created first and the secret is created afterwards?

Member Author: The operator will initially error, but a subsequent reconcile will succeed once the secret is created.
Currently we don't watch the secret; do we need that?

Member: I think we need to reflect that the secret is missing.

Member Author: Added a condition to the clustermanager CR's status to reflect that.
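
A sketch of what such a condition could look like, using apimachinery's condition helper (the type and reason strings are illustrative; the thread later settles on a general "SecretSynced" type):

package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	operatorapiv1 "open-cluster-management.io/api/operator/v1"
)

// setSecretMissingCondition surfaces a missing work-driver-config secret on
// the ClusterManager status, so users can see why the work controller is
// not fully configured instead of only finding errors in the operator log.
func setSecretMissingCondition(cm *operatorapiv1.ClusterManager) {
	meta.SetStatusCondition(&cm.Status.Conditions, metav1.Condition{
		Type:    "SecretSynced",
		Status:  metav1.ConditionFalse,
		Reason:  "WorkDriverConfigSecretNotFound",
		Message: "work-driver-config secret is required when the work driver is not kube",
	})
}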

@morvencao morvencao force-pushed the br_work_driver_config branch from dea5883 to 63c8853 Compare March 21, 2024 01:53
@@ -46,10 +46,15 @@ spec:
args:
- "/work"
- "manager"
- "--work-driver=kube"
- "--work-driver={{ .WorkDriver }}"
{{ if eq .WorkDriver "kube" }}
Member: Should it be like this?

{{ if .HostedMode }}
{{ if eq .WorkDriver "kube" }}

Is hosted mode only relevant in the case workDriver == kube?

@morvencao (Member Author) commented Mar 21, 2024: Yes, per my understanding.
It's not clear how hosted mode would work for the other drivers.

@@ -92,16 +97,28 @@ spec:
volumeMounts:
- name: tmpdir
mountPath: /tmp
{{ if eq .WorkDriver "kube" }}
Member: Same question: is hosted mode only relevant in the case workDriver == kube?

Member Author: Same as above.

@morvencao morvencao force-pushed the br_work_driver_config branch 3 times, most recently from 8e56daf to 48a5f00 Compare March 25, 2024 04:41
@morvencao morvencao force-pushed the br_work_driver_config branch from 48a5f00 to a559f7e Compare March 29, 2024 01:29
{{ if .HostedMode }}
- "--kubeconfig=/var/run/secrets/hub/kubeconfig"
{{ end }}
{{ else }}
- "--cloudevents-client-id=$(POD_NAME)"
Member: Not sure if using the pod name is a good choice; the pod name changes after an upgrade. Should the client ID already be in the config file?

Member Author: The client ID is not in the config file; it's in the cloudevents source options, see: https://github.com/open-cluster-management-io/sdk-go/blob/7dbdd1b5c63da7b832705501185138a5e5c530ba/pkg/cloudevents/generic/options/mqtt/sourceoptions.go#L21

Per my understanding, it's acceptable if the client ID changes, since the MQTT broker only uses it to differentiate clients; the key is to make sure each publisher/subscriber has a different client ID.

Member: I am more concerned about whether the client ID is part of authentication/authorization, which would be impacted by a change of client ID. cc @skeeey

Member: I remember that for some MQTT brokers, like AWS IoT, the client ID is needed for authn/authz, but a wildcard can be used in the policy, so I think you need a fixed-format ID here.

Member: There is a link, openshift-online/maestro#28 (comment), where we discussed this.

Member Author: Thanks, I updated this to "--cloudevents-client-id=work-controller-$(POD_NAME)" with the fixed prefix work-controller-, so in the future authn/authz can be done with the wildcard work-controller-*.
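
In Go terms, the effective ID is just the fixed prefix plus the pod name (POD_NAME being injected via the downward API in the deployment template); a trivial sketch:

package sketch

import (
	"fmt"
	"os"
)

// workClientID returns the MQTT client ID used by the work controller: a
// stable "work-controller-" prefix (so broker-side authn/authz can match
// work-controller-*) plus the unique pod name to keep clients distinct.
func workClientID() string {
	return fmt.Sprintf("work-controller-%s", os.Getenv("POD_NAME"))
}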

}

// GetOperatorNamespace is used to get the namespace where the operator is running.
func GetOperatorNamespace() (string, error) {
Member: I thought we already get the namespace from the controllerContext.

Member Author: The work-driver-config secret is supposed to be created in the same namespace as the operator (open-cluster-management by default), not in the operand namespace (open-cluster-management-hub by default), because the operand namespace may not exist before the clustermanager is created.

Member: Yes, but you can get the namespace from the controllerContext, which resolves it the same way this func does.
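
For reference, a sketch of the conventional way such a helper resolves the namespace in-cluster (which is also what controllerContext does, hence the comment above):

package sketch

import (
	"fmt"
	"os"
	"strings"
)

// GetOperatorNamespace reads the namespace the operator pod is running in
// from the service account mount; a minimal sketch of the usual approach.
func GetOperatorNamespace() (string, error) {
	nsBytes, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace")
	if err != nil {
		return "", fmt.Errorf("failed to read operator namespace: %w", err)
	}
	return strings.TrimSpace(string(nsBytes)), nil
}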

}
}
}
} else {
Member: Why do we need to remove it? The else section here seems unnecessary to me; we can just return a condition with true when the kube driver is set.

Member Author: This is for the case where a non-kube driver is changed to the kube driver; the condition should then be removed. Makes sense?

Member: I think we can always add the condition.

Member Author: Done, removed this part.

}
}
return err
} else {
Member: Too many nested ifs; the code here should be refactored. Wouldn't SyncSecret return a NotFound error? I think we do not need to get and then sync; directly syncing and returning the error should be OK.

Member Author: SyncSecret doesn't return an error when the source secret doesn't exist; that's why I added the check before SyncSecret. Also added these comments:

// check the secret containing work driver config explicitly because
// resourceapply.SyncSecret below won't return err if the source secret is not found
_, err = n.operatorKubeClient.CoreV1().Secrets(operatorNamespace).Get(ctx, helpers.WorkDriverConfig, metav1.GetOptions{})

@@ -75,6 +75,7 @@ func newClusterManager(name string) *operatorapiv1.ClusterManager {
},
WorkConfiguration: &operatorapiv1.WorkConfiguration{
FeatureGates: []operatorapiv1.FeatureGate{featureGate},
WorkDriver: operatorapiv1.WorkDriverTypeKube,
Member: A unit test for the other driver types would be needed.

Member Author: Added UT for syncing the work driver config secret and the condition check.
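
A sketch of what such a test could look like, building on the newClusterManager fixture above (the mqtt driver constant and the asserted condition are assumptions for illustration):

package clustermanagercontroller

import (
	"testing"

	operatorapiv1 "open-cluster-management.io/api/operator/v1"
)

// TestSyncNonKubeWorkDriver sketches coverage for a non-kube driver: without
// a work-driver-config secret in the operator namespace, the sync should
// surface a false "SecretSynced"-style condition on the ClusterManager.
func TestSyncNonKubeWorkDriver(t *testing.T) {
	clusterManager := newClusterManager("testhub")
	clusterManager.Spec.WorkConfiguration.WorkDriver = operatorapiv1.WorkDriverTypeMqtt

	// Run the controller sync against fake clients here and assert that the
	// ClusterManager's status conditions report the missing source secret.
}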

@@ -209,6 +217,71 @@ func (n *clusterManagerController) sync(ctx context.Context, controllerContext f
}
managementClient := n.operatorKubeClient // We assume that operator is always running on the management cluster.

if config.MWReplicaSetEnabled {
Member: This if is not necessary; we should always sync.

Member Author: Removed this.

clusterManagerFinalizer = "operator.open-cluster-management.io/cluster-manager-cleanup"
clusterManagerApplied = "Applied"
clusterManagerProgressing = "Progressing"
workDriverConfigSecretSynced = "WorkDriverConfigSecretSynced"
Member: Maybe just "SecretSynced". We will also need to sync other secrets for the image pull credential later; these could share the same condition, with different reasons and messages.

FYI @zhiweiyin318

Member Author: Updated this to a general value: "SecretSynced".

@morvencao morvencao force-pushed the br_work_driver_config branch 3 times, most recently from ec24242 to 6fdf148 Compare April 2, 2024 02:14
@morvencao morvencao force-pushed the br_work_driver_config branch from f6f5ed0 to e9df745 Compare April 19, 2024 08:13
@morvencao morvencao force-pushed the br_work_driver_config branch 5 times, most recently from 3f80cd5 to 80940b5 Compare April 23, 2024 04:32
@morvencao (Member Author):

Another look? @qiujian16 @zhiweiyin318

return err
}
// TODO: set condition to indicate if the secret is synced successfully
// check the source secret exist before syncing and set condition if not exist.
Member: I think @qiujian16's comment is to add a separate reconciler like

&webhookReconcile{cache: n.cache, recorder: n.recorder, hubKubeClient: hubClient, kubeClient: managementClient},
to only handle the secrets sync.

#397 (comment)

Member Author: Done, added a new secret reconciler.
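
A self-contained sketch of that reconciler pattern; every name below approximates the PR's internals and is illustrative rather than the exact upstream definition:

package main

import (
	"context"
	"fmt"
)

type reconcileState int

const (
	reconcileContinue reconcileState = iota
	reconcileStop
)

// clusterManagerReconcile is the per-concern reconciler interface; the real
// one also receives the ClusterManager CR and the rendered hub config.
type clusterManagerReconcile interface {
	reconcile(ctx context.Context) (reconcileState, error)
}

// secretReconcile only handles syncing secrets from the operator namespace
// into the cluster-manager namespace, keeping the main controller lean.
type secretReconcile struct {
	operatorNamespace string
	targetNamespace   string
}

func (c *secretReconcile) reconcile(ctx context.Context) (reconcileState, error) {
	fmt.Printf("syncing secrets %s -> %s\n", c.operatorNamespace, c.targetNamespace)
	return reconcileContinue, nil
}

func main() {
	reconcilers := []clusterManagerReconcile{
		&secretReconcile{"open-cluster-management", "open-cluster-management-hub"},
		// ...webhookReconcile and the other reconcilers follow here
	}
	for _, r := range reconcilers {
		state, err := r.reconcile(context.Background())
		if err != nil || state == reconcileStop {
			break
		}
	}
}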

@morvencao morvencao force-pushed the br_work_driver_config branch 2 times, most recently from a989e1c to 11233c5 Compare April 23, 2024 07:54
if _, err := c.operatorKubeClient.CoreV1().Secrets(c.operatorNamespace).Get(ctx,
workDriverConfig, metav1.GetOptions{}); errors.IsNotFound(err) {
// TODO: set condition if the source secret doesn't exist,
return cm, reconcileStop,
Member: In helpers.SyncSecret, the synced secret will be deleted if the source secret is not found. So if we return here, the synced workDriverConfig secret will not be deleted when the source workDriverConfig is deleted. Not sure if this is expected?

Member Author: The source secret should not be deleted, per my understanding.

Member: Can we have an empty secret slice, and append the "work-driver-config" secret if the feature gate is enabled?

@morvencao (Member Author) commented Apr 23, 2024: Updated this with a secret set instead of a secret slice, to avoid duplicated resyncs.
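
A sketch of the set-based approach, assuming apimachinery's generic sets package (the image-pull secret name is hypothetical, echoing the earlier review comment):

package sketch

import (
	"k8s.io/apimachinery/pkg/util/sets"
)

// secretsToSync collects the secrets the controller must copy; a set avoids
// syncing the same secret twice when multiple features request it.
func secretsToSync(nonKubeDriver, imagePullCredential bool) sets.Set[string] {
	secretNames := sets.New[string]()
	if nonKubeDriver {
		// only non-kube work drivers need the driver config secret
		secretNames.Insert("work-driver-config")
	}
	if imagePullCredential {
		// hypothetical name for the pull secret mentioned earlier
		secretNames.Insert("open-cluster-management-image-pull-credentials")
	}
	return secretNames
}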

@morvencao morvencao force-pushed the br_work_driver_config branch 2 times, most recently from b2ea778 to 0725a66 Compare April 23, 2024 11:15
Signed-off-by: morvencao <lcao@redhat.com>
@morvencao morvencao force-pushed the br_work_driver_config branch from 0725a66 to 95c0509 Compare April 23, 2024 11:42
for _, secretName := range secretNames.UnsortedList() {
// check the source secret explicitly as the 'helpers.SyncSecret' doesn't return an error
// when the source secret is not found.
if _, err := c.operatorKubeClient.CoreV1().Secrets(c.operatorNamespace).Get(ctx,
Member: Let's remove this additional get for now; it can happen that this should return success even when the secret does not exist.

Member Author: Removed this.

config.ClusterManagerNamespace,
secretName,
[]metav1.OwnerReference{},
); err != nil {
Member: Actually, we should append the errors to an error slice and return an aggregated error at the end.

Member Author: Added aggregated error.
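
A sketch of the aggregated-error pattern, using apimachinery's errors utility so one failing secret doesn't hide failures of the others (the wrapper function is illustrative):

package sketch

import (
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// syncAll runs syncOne for every secret, collecting failures instead of
// bailing at the first one, then returns them as a single aggregated error
// (NewAggregate returns nil for an empty slice).
func syncAll(secretNames []string, syncOne func(name string) error) error {
	var errs []error
	for _, name := range secretNames {
		if err := syncOne(name); err != nil {
			errs = append(errs, err)
		}
	}
	return utilerrors.NewAggregate(errs)
}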

Signed-off-by: morvencao <lcao@redhat.com>
@morvencao morvencao force-pushed the br_work_driver_config branch from 7289013 to 6f5c4ee Compare April 24, 2024 03:24
@morvencao (Member Author):

kindly ping @qiujian16

@zhiweiyin318 (Member):

/lgtm

@qiujian16 (Member) left a comment:

/approve


openshift-ci bot commented Apr 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: morvencao, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7154863 into open-cluster-management-io:main Apr 24, 2024
14 checks passed
@morvencao morvencao deleted the br_work_driver_config branch April 24, 2024 08:49