Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change/local only medusaconfig #1267

Merged
merged 54 commits into from
Apr 16, 2024
Merged

Conversation

Miles-Garnsey
Copy link
Member

@Miles-Garnsey Miles-Garnsey commented Apr 3, 2024

What this PR does:

Deprecates functionality allowing the usage of MedusaConfigurations from remote namespaces within the K8ssandraCluster.

Ensures that the webhook throws an error when one is created.

Replaces the existing secret copy mechanism with the standard ReplicatedSecret replication mechanism, ensuring that the downstream target secret will be updated in future.

Which issue(s) this PR fixes:
Fixes #1266, #1217

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

CHANGELOG/RELEASE-NOTES.md Show resolved Hide resolved
CHANGELOG/CHANGELOG-1.14.md Outdated Show resolved Hide resolved
controllers/k8ssandra/medusa_reconciler.go Outdated Show resolved Hide resolved
controllers/k8ssandra/medusa_reconciler.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Apr 5, 2024

Codecov Report

Attention: Patch coverage is 76.29630% with 32 lines in your changes are missing coverage. Please review.

Project coverage is 57.56%. Comparing base (1864ddb) to head (62157fd).
Report is 3 commits behind head on main.

❗ Current head 62157fd differs from pull request most recent head 4e73cb4. Consider uploading reports for the commit 4e73cb4 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1267      +/-   ##
==========================================
+ Coverage   57.43%   57.56%   +0.12%     
==========================================
  Files         103      103              
  Lines       10808    10895      +87     
==========================================
+ Hits         6208     6272      +64     
- Misses       4063     4082      +19     
- Partials      537      541       +4     
Files Coverage Δ
apis/medusa/v1alpha1/medusaconfiguration_types.go 5.55% <ø> (ø)
apis/k8ssandra/v1alpha1/k8ssandracluster_types.go 43.20% <0.00%> (-1.10%) ⬇️
...pis/k8ssandra/v1alpha1/k8ssandracluster_webhook.go 74.13% <85.71%> (+1.41%) ⬆️
pkg/utils/hash.go 0.00% <0.00%> (ø)
...ntrollers/medusa/medusaconfiguration_controller.go 79.36% <76.47%> (-3.66%) ⬇️
controllers/replication/secret_controller.go 65.21% <71.42%> (+0.20%) ⬆️
controllers/k8ssandra/medusa_reconciler.go 71.71% <80.28%> (+5.66%) ⬆️

... and 4 files with indirect coverage changes

@Miles-Garnsey
Copy link
Member Author

I almost have this sorted but I do have one reservation. I've gone through and made all of the MedusaConfigs in the existing tests share a namespace with the K8ssandraCluster. This allows the tests to pass because the webhook doesn't reject the request.

However, there is a potential issue here where we're not actually testing the new ReplicatedSecret functionality anywhere, because the tests now all assume the new codepath. I think perhaps we're going to need to have an isolated test which disables the webhook and that will allow MedusaConfigs to be created in a separate namespace to the K8ssandraCluster.

@Miles-Garnsey
Copy link
Member Author

Miles-Garnsey commented Apr 10, 2024

Implemented in this PR so far:

  1. Fix kustomize so that it uses a project-local version not system version (critical for local runs now that we need to pin versions).
  2. Fix controller-gen etc. so that it works on ARM macs.
  3. Fix port used by registry container so that it works on Macs.
  4. Prevent future creation of K8ssandraClusters with non-ns-local MedusaConfigs.
  5. Update existing logic to reconcile Medusa bucket secrets using ReplicatedSecrets.
  6. Fix a latent bug in the implementation of reconcileMedusa where the controller would look for the MedusaConfig in the DC's namespace, not under the MedusaConfigurationRef's namespace reference (ec6e2ba). This would have caused this functionality to fail under most circumstances.
  7. Fix another long standing bug where it appears that ReplicatedSecrets were not namespace scoped - i.e. they would actually pick up secrets whose labels match their selector from the whole cluster, not just their local ns.

This is far beyond the original scope of the PR, I can create tickets for these for posterity if we want them, but given how much time I’ve spent already I thought I’d just leave it for now unless someone wants them.

Not implemented in this PR so far:

  1. Cleanup of the ReplicatedSecrets that we're now creating when they're not local. We do add an OwnerReference if the ReplicatedSecret is local. This should mean it gets cleaned up on cluster deletion.
  2. A test that a non-local namespace MedusaConfig actually works. This requires turning off the validation webhook for a group of tests. Given the above note that this functionality is currently broken anyway, I'm not sure that testing that this works is even a priority.

If we do want to implement item (2) here, is there anywhere else where we've disabled the webhook for a group of tests? I'll try to avoid creating more new stuff if possible.

@@ -254,12 +253,12 @@ func (r *K8ssandraClusterReconciler) mergeStorageProperties(
}
storageProperties := &medusaapi.MedusaConfiguration{}
// Deprecated: This code path can be removed at version 1.17, as MedusaConfigs should now always be namespace-local to the K8ssandraCluster referencing them.
configNamespace := utils.FirstNonEmptyString(medusaSpec.MedusaConfigurationRef.Namespace, namespace)
configNamespace := utils.FirstNonEmptyString(medusaSpec.MedusaConfigurationRef.Namespace, desiredKc.Namespace)
Copy link
Contributor

@rzvoncek rzvoncek Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 42 we use FirstNonEmptyString to pick from two places that can possibly carry the information. That pick would be effective all the way here because of the namespace variable. Now that you removed the variable, there might be a case where the desiredKc won't have the namespace set.

I'm also trying to remember why I put the FirstNonEmptyString in the first place. The strongest memory is that tests needed it (it's really sad that tests require you to alter the actual code). The other option I can think of is a multicluster setup where the cross-namespace reference actually happens.

Copy link
Contributor

@adejanovski adejanovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, it now behaves as I expect it to 👏
The naming of the new replicated secret needs fixing.

pkg/utils/hash.go Outdated Show resolved Hide resolved
@@ -42,7 +42,7 @@ EOF
}

registry_name='kind-registry'
registry_port='5000'
registry_port='5001'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: this will break other folks setup and we should rather fix our laptops to avoid the conflict.
Could we keep 5000 as default and override it based on an env variable instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it break the setups of others? My understanding is that the airplay service on Mac runs on port 5000 and this is consistent across all MacOS machines. I do not know if you can configure this port but I suspect it might take some finagling around relatively deep in the system.

Is there another commonly used service on port 5000 which others in the team are using and which will block this fix from working? I know we can't guarantee that nothing is on 5000 globally, but if there is a service on that port on the most common OS used in the team, trying to use it for the registry seems like a bad idea.

test/kustomize/kustomize.go Outdated Show resolved Hide resolved
@Miles-Garnsey
Copy link
Member Author

I've just extended the timeout on the TestOperator/CreateSingleMedusaJob test, since it is passing locally, failing often on CI. I'm pretty sure it is just slow.

Copy link
Contributor

@adejanovski adejanovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: the original secret (the one referenced by the MedusaConfiguration) gets deleted when a K8ssandraCluster which references the MedusaConfiguration gets deleted.
IIRC, you need to place the "orphan" label on the secret so that it doesn't get deleted.
What's funny is that indeed the replicas of this secret didn't get cleaned up due to I think known bugs in the ReplicatedSecrets API :D

@Miles-Garnsey
Copy link
Member Author

Miles-Garnsey commented Apr 11, 2024

Issue: the original secret (the one referenced by the MedusaConfiguration) gets deleted when a K8ssandraCluster which references the MedusaConfiguration gets deleted.

That should not be happening. It is not owned by the K8ssandraCluster. Under what circumstances are you seeing this behaviour, what namespaces are the various resources in?

There should now be cleanup of the ReplicatedSecret because it IS owned, any failure to clean up the replicas are going to be out of scope here as this is indeed a rRplicatedSecret issue.

EDIT: no that's not it either. This code goes and selects secrets according to their labels. And if you look at the labels on the secret that is getting deleted, it doesn't have the replication labels that should mark it for deletion...

@Miles-Garnsey
Copy link
Member Author

Miles-Garnsey commented Apr 12, 2024

Ignore this... I figured it out. The ReplicatedSecret's finalizer code must not be taking account of the prefixed secret name. So it is cleaning up the original (which is in the same namespace, which wasn't previously allowed because names would have clashed) and not the replica.

EDIT: no that's not it either. The secret deletion happens according to the labels, and this secret doesn't have labels that should mark it for deletion:

apiVersion: v1
data:
  credentials: W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gazhzc2FuZHJhCmF3c19zZWNyZXRfYWNjZXNzX2tleSA9IGs4c3NhbmRyYQ==
kind: Secret
metadata:
  annotations:
    k8ssandra.io/resource-hash: g7j7Kyr5Y5rToIlvnfAhBiVMPzbCys6TVWjW01jqG2A=
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Secret","metadata":{"annotations":{},"name":"global-bucket-key","namespace":"default"},"stringData":{"credentials":"[default]\naws_access_key_id = k8ssandra\naws_secret_access_key = k8ssandra"},"type":"Opaque"}
  creationTimestamp: "2024-04-12T01:25:24Z"
  labels:
    k8ssandra.io/medusa-storage-secret: f4148e8b
  name: global-bucket-key
  namespace: default
  resourceVersion: "1741"
  uid: eda84d8a-3b28-403b-b072-5a6abf776c11
type: Opaque

…Namespace (with a prefix) and also the DC1 namespace.
…d only be replicating secrets from their local namespace.
CHANGELOG/RELEASE-NOTES.md Outdated Show resolved Hide resolved
* [BUGFIX] [#1266](https://github.com/k8ssandra/k8ssandra-operator/issues/1266) MedusaConfigurations must now be namespace local to the K8ssandraCluster they are attached to. Additionally, ReplicatedSecrets should only pick up secrets from their local namespace to replicate.
* [BUGFIX] [#1217](https://github.com/k8ssandra/k8ssandra-operator/issues/1217) Medusa storage secrets now use a ReplicatedSecret for synchronization, fixing an issue where changes to the secrets were not propagating. Additionally, fix a number of issues with local testing on ARM Macs.
* [BUGFIX] [#1253](https://github.com/k8ssandra/k8ssandra-operator/issues/1253) Medusa storage secrets are now labelled with a unique label.
* [FEATURE] [#1260](https://github.com/k8ssandra/k8ssandra-operator/issues/1260) Update controller-gen to version 0.14.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why do we have new entries in both 1.14 and 1.15?

Miles-Garnsey and others added 2 commits April 16, 2024 16:12
Co-authored-by: Alexander Dejanovski <alex.dejanovski@datastax.com>
Copy link

sonarcloud bot commented Apr 16, 2024

Quality Gate Passed Quality Gate passed

Issues
11 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@Miles-Garnsey Miles-Garnsey merged commit 41ea670 into main Apr 16, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MedusaConfigs should be namespace local Updating a medusaConfiguration referenced secret should propagate
3 participants