Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vault and External Secrets Operator integration #98

Merged
merged 54 commits into from
Jan 26, 2023
Merged

Conversation

schnatterer
Copy link
Member

@schnatterer schnatterer commented Nov 10, 2022

  • Implements optional Vault and External Secrets Operator integration via a --vault=dev|prod parameter.
  • Initializes Vault with static secrets in dev mode via post-startup hook
  • Adds example stage-specific SecretStores to ArgoCD and ExternalSecrets to an example app (ArgoCD-only)
  • Mounts secret into example app, exposes via HTTP for demo purposes
  • Extends docs .Also creates a simpler diagram with all features at the top.
    Note that the diagrams in README won't show until merged to master (because we use the plantuml proxy for rendering)

Can be tested with preview image: ghcr.io/cloudogu/gitops-playground:159954b

In order to implment this feature properly, some major refactorings needed to be done:

  • 2eaee63
    • Simplify package structure of Grovvy code. There were more packages than classes. Well, almost. 2eaee63
    • modules were renamed to features to better reflect what they are to the GitOps playgrounds, e.g. monitoring and secrets managements are optional features
    • Disentangled MetricsModule. Mailhog is always on. ArgoCD Notifications is always on. Only Prometheus Stack is enabled/disabled.
  • ef07e1c Mailhog and Prometheus Stack can now be deployed for all GitOps operators
  • 49fe9bd Create a declarative default configuration in ApplicationConfigurator -> this holds a lot of constants that could be overwritten easily using a config file, no need to add dozens of CLI params.
  • b86fc65 CommandExecutor now fails on error, instead of silently continuing. A bit like set -o errexit. This lead to a couple of improvments for idempotence. e.g. where something failed in the past but was never recognized. Example: git push of control app.
  • b4559cd Make GitClient more generic: Remove control-app repo specifics, make explicitly stateful. Rename to ScmmRepo to better match what it is. A representation of a repostitory in SCMM, not a generic client
  • bfa2f87 Remove code for setting the SCMM default branch

Minor unrelated fixes

- Rename project "monitoring" to more generic "system"
- Move apps in "system" project to common folder in applications

Avoid a bigger mess before adding ESO and vault operators to project.
In control-app and in repo.
Prepares for more subfolders for ESO and vault, where there will be more sub folders/applications.

webconfig-cm.yaml seems not to be needed.
- Simplify package structure
- Rename module to feature
- Rename metrics to argocd (because without the metrics module, the control-app repo would be empty!)
- Make features tell if they are enabled or disabled.
- Provide option for disabling features
- Remove unnecessary classes:
  - Exception (we only need a simple error message here!)
  - ModuleRepository: What was it's purpose? Now part of Application
  - ApplicationTest: What did it test anyway? Log messages?!
Install via Helm directly (not ArgoCD), because

- Difficult to implement apply of secret-store.yaml -> separate repo necessary?
- This way ESO can be used with all operators, not only argoCD
E.g. when playground is applied a 2nd time, #idempotence
As part of the control app
Advantage:
* Can be used from all GitOps operators in playground
* Uniform way, sam method of deployment as for Jenkins, SCMM, Registry, External Secrets, Vault
Mainly to store e.g. helm chart version in single place.
But also this allows for reading configuration from a file in the future.

Introduce more generic terms monitoring and secrets in internal config.
Can't ue application.yml, because it will not be bundled into static image.
E.g. when playground is applied a 2nd time, #idempotence

Also extends command executor to also return stderr and exit code. Allow for suppressing failure.

This was done because a different implementation for areChangesStagedForCommit() would be:
"git update-index --refresh && git diff-index --exit-code HEAD --"
which return exit code 1.

A requirement such as ignoring failing commands might come back in the future so let's keep it.
- Install chart
- Create Token secrets for secretStores
- Create external secrets
- Integrate secret into example
- Move example app repo creation from apply.sh to groovy
- Make repo code init testable
- Add missing ArgoCD.groovy tests

-> Fail "GitClient" is stateful. Needs refactoring
4.2.9 is no longer working (even with default values) because of plugin dependency

"Plugin workflow-aggregator ... depends on configuration-as-code:1559.v38a_b_2e3b_6b_b_7, but there is an older version defined on the top level"
The constructor now tells us, that it's bound to a repo and we need to use multiple instances for multiple repos.
This reflects more clearly what it is: It is not a client, it is a single repo (including the client) and has SCMM-specifics.
Newer versions of SCMM use "main" by default.
We might want to bring this back when we migrate the SCMM API calls from bash to groovy. Or we use the SCMM client, or native Groovy HTTP methods.
Never finished syncing in argocd, because service remained on type LB, where no external IP can be received on local cluster.
Timing problem: Hook is executed before Vault HTTP is up. This leads to the hook failing, which leads to the pod restarting. Eventually we have a crash loop backoff.
Solution: Wait for HTTP to become ready.
Staging app was missing, was pushed to wrong path in gitops repo.
External Secrets Operator seems to fail on properties containing minus, such as:
property: nginx-secret
Group by repo. Easier to maintain, because there are a number of other repos waiting to be migrated from apply.sh
Avoid errors such as:
Error: Service "vault-ui" is invalid: spec.ports[0].nodePort: Invalid value: 8200: provided port is not in the valid range. The range of valid ports is 30000-32767
So we hopefully can keep better track of what needs to be kept uptodate
Latest versions finally fix Jenkins builds on k8s nodes no longer work
🤷‍♂️
Exception in thread "main" com.oracle.svm.core.jdk.UnsupportedFeatureError: Runtime reflection is not supported for public abstract java.util.Iterator java.util.List.iterator()
        at com.oracle.svm.core.util.VMError.unsupportedFeature(VMError.java:89)
        at java.lang.reflect.Method.acquireMethodAccessor(Method.java:77)
        at java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1017)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1008)
        at org.codehaus.groovy.runtime.InvokerHelper.asIterator(InvokerHelper.java:609)
        at org.codehaus.groovy.runtime.DefaultGroovyMethods.removeAll(DefaultGroovyMethods.java:5171)
        at com.cloudogu.gitops.features.argocd.ArgoCD.removeObjectFromList(ArgoCD.groovy:117)
Because SCMM and Jenkins won't work without the fixes from this branch.
Bitnami removes all versions older than 6 months from their helm chart index.
bitnami/charts#10833

To make sure we're not forced to update again, switch to full index.
# Conflicts:
#	scripts/jenkins/init-jenkins.sh
@schnatterer
Copy link
Member Author

  • Merged Branch feature/argocd-2.5 into vault, Otherwise Jenkins and SCMM would no longer have worked. Cherry picking of 5 commits would have been to much for my taste. Also the argocd-2.5 branch has been battle-tested in a training.
  • Had to update NGINX Helm Charts, because Bitnami removes all versions older than 6 months from their helm chart index. To make sure we're not forced to update again, switch to full index.
  • Switched from using the root token to using k8s service accounts and a user account.
    • For one this allows arbitrary --passwords. As the root token is also used as an ID, having special character in --password resulted in Error initializing Dev mode: failed to create root token with ID "<PW with special chars>".
    • Not using the root token is much more realistic!
    • If necessary, the root token can be found on the log

@schnatterer schnatterer force-pushed the feature/vault branch 2 times, most recently from 1a3dbf8 to 46fabee Compare January 16, 2023 15:33
*  For one this allows arbitrary `--password`s.  As the `root` token is also used as an ID, having special character in `--password` resulted in `Error initializing Dev mode: failed to create root token with ID "<PW with special chars>"`.
* Not using the root token is much closer to production!
* If necessary, the root token can be found on the vault-0 pod log
And increased timeout.
Hopefully builds will be less flaky in the future
@schnatterer schnatterer force-pushed the feature/vault branch 2 times, most recently from a4afdbc to 06ef921 Compare January 18, 2023 08:14
transfer vault values.yaml into groovy class

Signed-off-by: pmarkiewka <philipp.markiewka@cloudogu.com>
Copy link
Member Author

@schnatterer schnatterer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it just work without namespaceOverride?

Signed-off-by: pmarkiewka <philipp.markiewka@cloudogu.com>
Signed-off-by: pmarkiewka <philipp.markiewka@cloudogu.com>
@pmarkiewka pmarkiewka merged commit 2cbd994 into main Jan 26, 2023
@pmarkiewka pmarkiewka deleted the feature/vault branch January 26, 2023 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants