[DCA][Flare] Adds Helm user values, cluster Agent manifest, node Agent manifest v2 #16313

Merged: tbavelier merged 18 commits into main from tbavelier/dca_additional_info on Mar 29, 2023

Conversation

tbavelier
Member

Important note

What does this PR do?

  • Adds the Helm user values, the cluster Agent deployment manifest, and the node Agent daemonset manifest to the cluster Agent flare. They are retrieved from the API server and redacted with the flare redaction system.

Motivation

  • https://datadoghq.atlassian.net/browse/CONT-3734 :
    • As the containers support team, I would like to get all the information I need directly from the flare. We usually have to ask the customer for the Helm values files and the daemonset and deployment manifests.
    • As a customer, I would like my support ticket to require as few round trips as possible, to speed up resolution.

Additional Notes

  • This requires additional environment variables on the cluster Agent deployment:
    • The chart release name
    • The cluster Agent deployment name
    • The node Agent daemonset name
    • These env vars will be added in a Helm chart update
  • Since Helm v3, release information is stored in secrets by default, so the Cluster Agent needs the associated RBAC to read secrets in its namespace. Even when the storage driver is configmap, the DCA does not necessarily have the required RBAC unless the Helm check is enabled. A Helm chart change is therefore needed to add this RBAC, with an option to disable it if not desired (on top of the required env vars).
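
To illustrate the storage format mentioned above, here is a minimal Python sketch of how a Helm v3 release payload can be decoded to recover the user-supplied values. It is not the code from this PR (the Agent is written in Go). It assumes the usual Helm v3 encoding: a JSON document, gzip-compressed, then base64-encoded, with user values under the `config` key. The function names are hypothetical.

```python
import base64
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"


def decode_helm_release(encoded: str) -> dict:
    """Decode a Helm v3 release payload (base64 -> optional gzip -> JSON)."""
    raw = base64.b64decode(encoded)
    # Helm gzips the JSON document; payloads without the gzip magic
    # header are treated as plain JSON.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw)


def user_values(release: dict) -> dict:
    """Return the user-supplied values, stored under the "config" key."""
    return release.get("config") or {}


def encode_helm_release(release: dict) -> str:
    """Inverse of decode_helm_release, used here only to build sample data."""
    compressed = gzip.compress(json.dumps(release).encode())
    return base64.b64encode(compressed).decode()
```

With the secret driver, the `release` field read through the Kubernetes API is base64-encoded once more by Kubernetes, so one extra `base64.b64decode` is needed before calling `decode_helm_release`. With the configmap driver, the field can be passed in directly.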

Possible Drawbacks / Trade-offs

  • Requires the cluster Agent to access secrets/configmaps in its own namespace, which will require a Helm chart change to enable it by default (and add an option to disable it)
  • Requires the flare redaction utility to handle these objects adequately
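
As a rough illustration of what "handling these objects" means for redaction, the Python sketch below masks anything that looks like a 32-character hexadecimal API key and keeps only the last five characters. This is a simplified stand-in, not the Agent's actual scrubber or its rule set. The pattern, the mask length, and the function name are assumptions made for the example.

```python
import re

# Assumption: an API key is a standalone run of exactly 32 hex characters.
API_KEY_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def redact_api_keys(text: str) -> str:
    """Mask API-key-like strings, keeping the last 5 characters visible."""
    return API_KEY_RE.sub(lambda m: "*" * 27 + m.group(0)[-5:], text)
```

A real scrubber also has to cover other secret shapes (application keys, tokens, passwords in URLs), which is why the manifests and Helm values go through the existing flare redaction system rather than an ad hoc filter like this one.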

Describe how to test/QA your changes

Three scenarios to test:

  • Default Helm v3 configuration (using HELM_DRIVER=secret)
  • "Legacy" Helm configuration (using HELM_DRIVER=configmap)
  • Not using Helm to deploy

  1. Use Datadog Helm chart 3.23+ with clusterAgent.rbac.flareAdditionalPermissions set to true (default)
  2. Deploy the cluster Agent and run the agent flare command.
  3. Check that the generated flare contains:
    • agent-daemonset.yaml
    • cluster-agent-deployment.yaml
    • helm-values.yaml

  4. Ensure potentially sensitive values, such as API keys, are redacted.
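
The presence check in step 3 can be scripted. The Python sketch below assumes the flare is a zip archive and matches the three files by base name, because this description does not say where they sit inside the archive. The helper names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

EXPECTED_FILES = (
    "agent-daemonset.yaml",
    "cluster-agent-deployment.yaml",
    "helm-values.yaml",
)


def missing_files(archive_names, expected=EXPECTED_FILES):
    """Return the expected file names absent from a list of archive paths."""
    present = {PurePosixPath(name).name for name in archive_names}
    return [name for name in expected if name not in present]


def missing_flare_files(flare_path):
    """Open a flare zip archive and report which expected files are missing."""
    with zipfile.ZipFile(flare_path) as archive:
        return missing_files(archive.namelist())
```

An empty list from `missing_flare_files` means all three files are in the archive. In the scenario without Helm, `helm-values.yaml` would presumably be reported as missing.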

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@tbavelier tbavelier added this to the 7.45.0 milestone Mar 28, 2023
@tbavelier tbavelier marked this pull request as ready for review March 29, 2023 08:40
@tbavelier tbavelier requested review from a team as code owners March 29, 2023 08:40
@tbavelier tbavelier merged commit afc8d3f into main Mar 29, 2023
@tbavelier tbavelier deleted the tbavelier/dca_additional_info branch March 29, 2023 15:31