Datadog Integration (#3407) #3619
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation, deployment override failsafes
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation | final initial-push
* changelog entry update
* datadog-integration: updated consul-server agent server.config (enable_debug) and telemetry.config update | enable_debug to server.config
* curt pr review changes (minus extraConfig templating verification changes)
* global.metrics.AgentMetrics -> global.metrics.enableAgentMetrics
* dogstatsd and otlp mutually exclusive verification checks
* breaking changes now incorporated into consul.validateExtraConfig helper template function as precheck
* extraConfig hash updates post merge conflict update
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* update changelog .txt to match new PR number
* updated server-statefulset.yaml to correct ad.datadoghq.com/consul.logs annotation to valid single quote string
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets
* update UDP dogstatsdPort behavior to exclude including a port value if using a kube service address (as determined by user overrides)
* update _helpers.tpl consul.ValidateDatadogConfiguration func to account for using 'https' as protocol => should fail
* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openemtrics scrape on consul
* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openemtrics scrape on consul
* correct otlp protocol helpers.tpl check to lower-case the protocol to match the open-telemetry-deployment.yaml behavior
* fix server-acl-init command_test.go for datadog token policy - datacenter should have been dc1
* add in server-statefulset bats test for extraConfig validation testing
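Two of the pre-checks named in the commit list (DogStatsD and OTLP being mutually exclusive, and 'https' being rejected after lower-casing the protocol) can be illustrated with a small shell sketch. This is not the chart's code: the real checks are Helm template helpers, the function and argument names here are hypothetical, and the sketch assumes the 'https' check applies to the OTLP protocol value.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the Helm-side validation described above.
#   $1: "true" if DogStatsD metrics are enabled
#   $2: "true" if OTLP metrics are enabled
#   $3: OTLP protocol as typed by the user (any letter case)
validate_datadog_config() {
  local dogstatsd_enabled="$1" otlp_enabled="$2" protocol
  # Lower-case first so "HTTPS" and "https" are treated alike.
  protocol="$(printf '%s' "${3:-}" | tr '[:upper:]' '[:lower:]')"

  if [ "$dogstatsd_enabled" = "true" ] && [ "$otlp_enabled" = "true" ]; then
    echo "dogstatsd and otlp are mutually exclusive" >&2
    return 1
  fi
  # Assumption: the 'https' rejection applies to the OTLP protocol value.
  if [ "$otlp_enabled" = "true" ] && [ "$protocol" = "https" ]; then
    echo "otlp protocol 'https' is not accepted" >&2
    return 1
  fi
  return 0
}
```

For example, `validate_datadog_config true true http` returns non-zero, while `validate_datadog_config false true http` succeeds.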
Thanks Nate! It looks like changes from #3000 have leaked into your PR. Those should be removed as we thought they would be too disruptive for upgrades.
@@ -56,7 +57,12 @@ data:
  "enabled": true
  },
  {{- end }}
- "server": true
+ "server": true,
+ "leave_on_terminate": true,
suggestion: These extra 'leave_on_terminate' and 'autopilot' settings should be removed as they were deemed destructive.
We need to check the other backports, as anything from #3000 should not be in release/1.3.x, release/1.2.x, or release/1.1.x (1.4.x is fine).
Corrected as recommended by reverting back to the release/1.3.x branch version of the affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-config-configmap.yaml
Re-applied datadog-integration changes into the following files:
- charts/consul/templates/server-config-configmap.yaml
  - Reincorporated enable_debug into server.json (updates server-statefulset.yaml config-checksum)
  - Reapplied all datadog and agent metric-related entries into the telemetry-config.json
- charts/consul/test/unit/server-statefulset.bats
  - Updated config-configmap tests to reflect enable_debug update to server.json config
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is blank"
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is provided"
    - "server/StatefulSet: adds config-checksum annotation when extraConfig is updated"
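For orientation, the telemetry entries mentioned in this reply could look roughly like the stanza the following shell sketch prints. The option names are Consul telemetry settings named in the PR description; the function name and every value (socket path, tag, filter prefix) are illustrative assumptions, not the chart's rendered telemetry-config.json.

```shell
#!/usr/bin/env bash
# Print an illustrative telemetry stanza for a Consul server agent.
#   $1: DogStatsD address (assumed default: a Datadog agent unix socket)
#   $2: a single DogStatsD tag (assumed default: source:consul)
telemetry_stanza() {
  local addr="${1:-unix:///var/run/datadog/dsd.socket}"
  local tag="${2:-source:consul}"
  # The prefix_filter entry below is a placeholder value for illustration.
  cat <<EOF
{
  "telemetry": {
    "disable_hostname": true,
    "enable_host_metrics": true,
    "prefix_filter": ["+consul.rpc.server.call"],
    "dogstatsd_addr": "${addr}",
    "dogstatsd_tags": ["${tag}"]
  }
}
EOF
}
```

Calling `telemetry_stanza "127.0.0.1:8125" "env:test"` prints the same stanza with a UDP-style address and a different tag.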
@@ -17,7 +17,7 @@ metadata:
  release: {{ .Release.Name }}
  component: server
  spec:
- maxUnavailable: {{ template "consul.pdb.maxUnavailable" . }}
+ maxUnavailable: {{ template "consul.server.pdb.maxUnavailable" . }}
suggestion: This is also from #3000 and should be dropped.
Corrected as recommended by reverting back to the release/1.3.x branch version of the affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl.
Re-ran the entire bats test suite using the Makefile:
- make bats-tests (all passed)
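One way to double-check that files meant to be fully reverted no longer differ from the release branch is to diff each path against that ref. The helper below is a minimal sketch; its name is hypothetical and it relies only on `git diff --quiet <ref> -- <path>`, which exits non-zero when the working tree differs from the ref.

```shell
#!/usr/bin/env bash
# Print every given path whose working-tree content differs from the base ref.
#   $1: base ref (for example release/1.3.x)
#   $2...: paths to compare
changed_vs_base() {
  local base="$1"
  shift
  local path
  for path in "$@"; do
    # --quiet suppresses output and exits 1 when differences exist.
    if ! git diff --quiet "$base" -- "$path"; then
      printf '%s\n' "$path"
    fi
  done
}
```

Run from the repository root, `changed_vs_base release/1.3.x charts/consul/templates/server-disruptionbudget.yaml` would print nothing once that file matches the branch. Files that had datadog-integration changes re-applied, such as _helpers.tpl, are expected to be listed.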
charts/consul/templates/_helpers.tpl
Outdated
  */}}
- {{- define "consul.pdb.maxUnavailable" -}}
+ {{- define "consul.server.pdb.maxUnavailable" -}}
suggestion: A bunch of changes from #3000 in here
Corrected as recommended by reverting back to the release/1.3.x branch version of the affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl.
Re-ran the entire bats test suite using the Makefile:
- make bats-tests (all passed)
charts/consul/test/unit/helpers.bats
Outdated
@@ -348,7 +348,7 @@ load _helpers
  [[ "$output" =~ "When the value global.experiments.resourceAPIs is set, global.peering.enabled is currently unsupported." ]]
  }

- @test "connectInject/Deployment: fails if resource-apis is set and admin partitions are enabled" {
+ @test "connectInject/Deployment: fails if resource-apis is set, v2tenancy is unset, and admin partitions are enabled" {
suggestion: Looks like extra stuff got picked up. Running git checkout 'release/1.3.x' helpers.bats will reset the file to the version on the branch it came from.
Corrected as recommended by reverting back to the release/1.3.x branch version of the affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl.
Re-ran the entire bats test suite using the Makefile:
- make bats-tests (all passed)
@@ -97,7 +97,7 @@ load _helpers
  --set 'server.replicas=6' \
  . | tee /dev/stderr |
  yq '.spec.maxUnavailable' | tee /dev/stderr)
- [ "${actual}" = "2" ]
+ [ "${actual}" = "1" ]
suggestion: This file too
Corrected as recommended by reverting back to the release/1.3.x branch version of the affected files.
$ git checkout 'release/1.3.x' -- charts/consul/templates/server-disruptionbudget.yaml charts/consul/test/unit/server-disruptionbudget.bats charts/consul/templates/_helpers.tpl
Applied datadog-integration changes back into _helpers.tpl.
Re-ran the entire bats test suite using the Makefile:
- make bats-tests (all passed)
… re-apply datadog-integration branch changes
Thanks!
Backport
This cherry-picked PR has been manually generated from #3407 to be assessed for backporting as automatic cherry-picking using the label failed.
The below text is copied from the body of the original PR.
Changes proposed in this PR
enable_debug
telemetry.disable_hostname
telemetry.enable_host_metrics
telemetry.prefix_filter
telemetry.dogstatsd_addr
telemetry.dogstatsd_tags
/v1/agent/metrics?format=prometheus endpoint
/v1/agent/metrics?format=prometheus
/v1/agent/self
/v1/status/leader
/v1/status/peers
/v1/catalog/services
/v1/health/service
/v1/health/state/any
/v1/coordinate/datacenters
/v1/coordinate/nodes
server-acl-init token creation for OpenMetrics and Datadog Consul Integration check methods, allowing default minimal ACL token permission generation for Datadog agent usage as necessary.
How I've tested this PR
CONTRIBUTING.md steps. consul-dev (main) and consul-k8s-control-plane-dev (datadog-integration branch) images on k3d test cluster for each scenario. Test repository here.
CONTRIBUTING.md steps. bats ./charts/consul/test/unit --jobs 8 - ran successfully for all tests.
How I expect reviewers to test this PR
Checklist
Overview of commits