Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create Pod Disruption Budget for DCA and CCR deployments #1454

Merged
merged 14 commits into from
Nov 8, 2024

Conversation

swang392
Copy link
Contributor

@swang392 swang392 commented Oct 9, 2024

What does this PR do?

Create Pod Disruption budgets for cluster agent and cluster checks runners.

Motivation

#864 We had PDB in version 0.8.x but it was not included in later versions of the operator.

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  • create a kind cluster
  • in datadog-agent.yaml:
    • enable PDB for DCA and CLC -- override.ClusterAgent.createPodDisruptionBudget = true
    • add replicas: override.ClusterAgent.replicas = 3
  • deploy operator and agent to your kind cluster
  • run kubectl get poddisruptionbudgets and you should see the DCA and CLC PDBs created. CLC has a maxUnavailable of 1 pod, and DCA has a minAvailable of 1 pod.
  • run kubectl get pods -o wide to see which node the DCA and CLC pods are on.
  • run k drain <node> --ignore-daemonsets --delete-emptydir-data. You should get a message like this: Cannot evict pod as it would violate the pod's disruption budget.
  • run kubectl get pods -o wide to see the status of the nodes now. There should only be one running pod for DCA, and only one pending pod for CLC.
  • run kubectl uncordone <node>
  • run kubectl get pods -o wide-- everything should be back to running.
  • remove the flags for PDB and reinstall the agent -- there are no more PDBs created as it is not enabled by default. If you drain a node without PDB, you shouldn't see any error statements and all the pods on that node will have a status of Pending until the node is uncordoned.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@swang392 swang392 added this to the v1.10.0 milestone Oct 9, 2024
@swang392 swang392 added the enhancement New feature or request label Oct 9, 2024
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request does not contain a valid label. Please add one of the following labels: bug, enhancement, refactoring, documentation, tooling, dependencies

@swang392 swang392 modified the milestones: v1.10.0, v1.11.0 Oct 9, 2024
@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 89.65517% with 6 lines in your changes missing coverage. Please review.

Project coverage is 48.79%. Comparing base (58f766a) to head (b10635b).
Report is 36 commits behind head on main.

Files with missing lines Patch % Lines
...l/controller/datadogagent/override/dependencies.go 64.70% 4 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1454      +/-   ##
==========================================
+ Coverage   48.70%   48.79%   +0.09%     
==========================================
  Files         224      225       +1     
  Lines       19845    20025     +180     
==========================================
+ Hits         9666     9772     +106     
- Misses       9670     9742      +72     
- Partials      509      511       +2     
Flag Coverage Δ
unittests 48.79% <89.65%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
api/datadoghq/v2alpha1/datadogagent_types.go 100.00% <ø> (ø)
...ler/datadogagent/component/clusteragent/default.go 96.89% <100.00%> (+0.03%) ⬆️
...oller/datadogagent/component/clusteragent/utils.go 28.35% <100.00%> (+28.35%) ⬆️
...adogagent/component/clusterchecksrunner/default.go 10.67% <100.00%> (+9.60%) ⬆️
...controller/datadogagent/controller_reconcile_v2.go 51.89% <ø> (-0.41%) ⬇️
...ller/datadogagent/feature/enabledefault/feature.go 41.21% <ø> (+6.64%) ⬆️
...l/controller/datadogagent/override/dependencies.go 56.04% <64.70%> (+1.98%) ⬆️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58f766a...b10635b. Read the comment docs.

@swang392 swang392 marked this pull request as ready for review October 10, 2024 19:24
@swang392 swang392 requested review from a team as code owners October 10, 2024 19:24
Copy link
Contributor

@levan-m levan-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments/questions. Would be nice if you could add a test here.
Otherwise looks good to me!

// Set CreatePodDisruptionBudget to true to create a PodDisruptionBudget for this component.
// Not applicable for the Node Agent. A Cluster Agent PDB is set with 1 min available pod, and a Cluster Checks Runner PDB is set with 1 max unavailable pod.
// +optional
CreatePodDisruptionBudget *bool `json:"createPodDisruptionBudget,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we ever get a request for making PDB configurable?! I guess it's unlikely since we haven't made it configurable in helm.

api/datadoghq/v2alpha1/datadogagent_types.go Outdated Show resolved Hide resolved
@swang392 swang392 merged commit 54f43d1 into main Nov 8, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants