This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Unable to fetch metrics from external metrics API: Internal error occurred: DatadogMetric is invalid #4415

Closed
lejlapri1 opened this issue Mar 30, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@lejlapri1

lejlapri1 commented Mar 30, 2023

Report

When deploying KEDA using a Datadog Scaler, the HPA created by KEDA reports the following condition, and the target value is always <unknown>:

ScalingActive False FailedGetExternalMetric the HPA was unable to compute the replica count: unable to get external metric infra-integ/s0-datadog-avg-nginx-net-request_per_s/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: dummy-app-flagger,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: Internal error occurred: DatadogMetric is invalid, err: Global error (all queries) from backend

kubectl describe hpa keda-hpa-dummy-app-flagger
(screenshot: kubectl describe hpa output)

kubectl describe datadogmetric dcaautogen-56cd4bb9dd5640062b98461f22b0da37904cf7
(screenshot: kubectl describe datadogmetric output)

The ScaledObject and its supporting resources:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-datadog-secret
  namespace: infra-integ
spec:
  secretTargetRef:
    # Required: API key for your Datadog account
  - parameter: apiKey
    name: keda-datadog-vault-secret
    key: apiKey
    # Required: APP key for your Datadog account
  - parameter: appKey
    name: keda-datadog-vault-secret
    key: appKey
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: keda-datadog-vault-secret
  namespace: infra-integ
spec:
  refreshInterval: 30s
  secretStoreRef:
    name: vault-infra
    kind: ClusterSecretStore
  target:
    name: keda-datadog-vault-secret
  dataFrom:
  - extract:
      key: keda/datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dummy-app-flagger
  namespace: infra-integ
spec:
  scaleTargetRef:
    name: dummy-app-flagger
  minReplicaCount: 1
  maxReplicaCount: 3
  pollingInterval: 5
  triggers:
  - type: datadog
    metricType: "AverageValue"
    metadata:
      age: "120"
      query: "avg:nginx.net.request_per_s{*}"
      queryValue: "1"
      metricUnavailableValue: "0"
    authenticationRef:
      name: keda-trigger-auth-datadog-secret

The Datadog cluster agent is configured to use the Datadog metrics provider:

clusterAgent:
  admissionController:
    enabled: false
  tokenExistingSecret: datadog-cluster-token
  createPodDisruptionBudget: true
  metricsProvider:
    enabled: true
    useDatadogMetrics: true

We have tried using KEDA v2.7.0, 2.6.1, and 2.6.0 and see the same error with each version.

When following the Datadog docs to create an HPA, we are able to get metrics and see scaling.

Expected Behavior

Be able to get metrics returned from Datadog and scale appropriately

Actual Behavior

The metric name fails to parse, so no metric value is returned to the HPA.

Steps to Reproduce the Problem

  1. Install KEDA v2.10 with the default values.yaml
  2. Create the ScaledObject with the Datadog trigger and the TriggerAuthentication, and deploy
  3. Run kubectl describe hpa keda-hpa-dummy-app-flagger and kubectl describe datadogmetric dcaautogen-56cd4bb9dd5640062b98461f22b0da37904cf7

Logs from KEDA operator

Log from cluster agent

KEDA 2.10.0:

clusterAgent:
  metricsProvider:
    useDatadogMetrics: true
2023-03-28 22:14:41 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/provider.go:116 in GetExternalMetric) | ExternalMetric query failed with error: DatadogMetric is invalid, err: Global error (all queries) from backend
2023-03-28 22:14:56 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/provider.go:116 in GetExternalMetric) | ExternalMetric query failed with error: DatadogMetric is invalid, err: Global error (all queries) from backend
2023-03-28 22:14:59 UTC | CLUSTER | ERROR | (pkg/util/kubernetes/autoscalers/datadogexternal.go:74 in queryDatadogExternal) | Error while executing metric query avg:s0-datadog-avg-nginx-net-request_per_s{scaledobject.keda.sh/name:dummy-app-flagger}.rollup(30): API returned error: Error parsing query:
unable to parse avg:s0-datadog-avg-nginx-net-request_per_s{scaledobject.keda.sh/name:dummy-app-flagger}.rollup(30): Rule 'scope_expr' didn't match at '-datadog-avg-nginx-n' (line 1, column 7).
2023-03-28 22:14:59 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/metrics_retriever.go:81 in retrieveMetricsValues) | Unable to fetch external metrics: Error while executing metric query avg:s0-datadog-avg-nginx-net-request_per_s{scaledobject.keda.sh/name:dummy-app-flagger}.rollup(30): API returned error: Error parsing query:
unable to parse avg:s0-datadog-avg-nginx-net-request_per_s{scaledobject.keda.sh/name:dummy-app-flagger}.rollup(30): Rule 'scope_expr' didn't match at '-datadog-avg-nginx-n' (line 1, column 7).
2023-03-28 22:15:11 UTC | CLUSTER | ERROR | (pkg/clusteragent/externalmetrics/provider.go:116 in GetExternalMetric) | ExternalMetric query failed with error: DatadogMetric is invalid, err: Global error (all queries) from backend

When trying KEDA 2.7 without a DatadogMetric object, by setting:

clusterAgent:
  metricsProvider:
    useDatadogMetrics: false
2023-03-30 17:48:25 UTC | CLUSTER | ERROR | (pkg/util/kubernetes/autoscalers/autoscalers.go:108 in inspectHPAv2) | cannot build external metric value for HPA infra-integ/keda-hpa-dummy-app-flagger, skipping: metric name "s0-datadog-avg-nginx-net-request_per_s" is invalid
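For what it's worth, the parse failures above are consistent with Datadog's query grammar: the cluster agent prepends avg: to KEDA's generated external metric name (s0-datadog-avg-nginx-net-request_per_s) and submits it as if it were a real Datadog metric, but hyphens are not valid in Datadog metric names. A minimal sketch of that character-level check (the regex is only an approximation of the grammar, not Datadog's actual parser):

```python
import re

# Approximation of Datadog's metric-name grammar: a letter followed by
# letters, digits, underscores, or dots. Hyphens are rejected, which is
# why the KEDA-generated external metric name fails to parse.
METRIC_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*$")

def is_valid_datadog_metric_name(name: str) -> bool:
    return METRIC_NAME.fullmatch(name) is not None

# The real Datadog metric queried by the ScaledObject parses fine:
print(is_valid_datadog_metric_name("nginx.net.request_per_s"))  # True
# KEDA's generated external metric name contains hyphens and does not:
print(is_valid_datadog_metric_name("s0-datadog-avg-nginx-net-request_per_s"))  # False
```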

There are no errors in the operator:

1.6801999269360688e+09	INFO	controller.scaledobject	Creating a new HPA	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "dummy-app-flagger", "namespace": "infra-integ", "HPA.Namespace": "infra-integ", "HPA.Name": "keda-hpa-dummy-app-flagger"}
1.680199927214943e+09	INFO	controller.scaledobject	Initializing Scaling logic according to ScaledObject Specification	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "dummy-app-flagger", "namespace": "infra-integ"}
1.6801999272298207e+09	INFO	controller.scaledobject	Reconciling ScaledObject	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "dummy-app-flagger", "namespace": "infra-integ"}
1.6801999422434523e+09	INFO	controller.scaledobject	Reconciling ScaledObject	{"reconciler group": "keda.sh", "reconciler kind": "ScaledObject", "name": "dummy-app-flagger", "namespace": "infra-integ"}

KEDA Version

2.10.0

Kubernetes Version

1.24

Platform

Amazon Web Services

Scaler Details

Datadog

Anything else?

We've also tried adjusting the age and the metricUnavailableValue, with no luck.

@lejlapri1 lejlapri1 added the bug Something isn't working label Mar 30, 2023
@JorTurFer
Member

Hi,
Could you share the result of this command? kubectl get apiservice | grep v1beta1.external.metrics.k8s.io
Datadog's metrics server exposes its metrics through external.metrics.k8s.io (the same endpoint that KEDA uses).

This means that both systems can't work together and this could be the reason behind the behavior you are describing

@zroubalik
Member

Yeah, most likely it is this problem.

@lejlapri1
Author

The output of that command is
v1beta1.external.metrics.k8s.io infra-prod/datadog-cluster-agent-metrics-api True 410d

When we were setting up KEDA, we did get an error about the metrics server so we added:

$patch: delete
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io

to remove the duplicate metrics server registration.
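For comparison, the APIService that a default KEDA Helm install registers for this same endpoint looks roughly like the sketch below (the service name and namespace are the chart defaults and may differ in your install):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    # Assumed defaults for a Helm install of KEDA; verify in your cluster
    name: keda-operator-metrics-apiserver
    namespace: keda
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```

Only one APIService object can own v1beta1.external.metrics.k8s.io at a time, which is the root of the conflict discussed below.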

@JorTurFer
Member

The problem is that even though the KEDA operator is setting up the HPA correctly, the HPA controller can't get the value for scaling because the endpoint is mapped to infra-prod/datadog-cluster-agent-metrics-api.
You need to replace it with KEDA's APIService if you want to use KEDA, but in that case datadog-cluster-agent-metrics-api will stop working.

You cannot use KEDA and the Datadog metrics server at the same time.

@lejlapri1
Author

Oh interesting. Is there a recommended path forward? Does it make more sense to use just autoscaling with Datadog queries instead? What is the use case for the Datadog scaler with KEDA if you can't use the metrics server?

We were following this blog post to implement the Datadog Scaler for KEDA, which didn't mention any issues with the metrics server, and we couldn't find any other documentation saying the Datadog metrics server can't be used alongside KEDA.

@JorTurFer
Member

JorTurFer commented Apr 3, 2023

Is there a recommended path forward?

Do you mean a migration path? We don't have any migration path, sorry.

What is the use case for the Datadog scaler with KEDA if you can't use the metrics server?

The Datadog scaler uses the Datadog SDK to get metrics directly from the Datadog API; the Datadog metrics server isn't necessary because KEDA itself does the job of requesting metrics from the Datadog API.

You can find more information about the limitation in the upstream issue: kubernetes-sigs/custom-metrics-apiserver#70
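That direct-API path can be sketched roughly as follows (KEDA is written in Go and uses the Go client; the datadog-api-client Python package is used here purely for illustration, and the aggregation step is schematic):

```python
import time

def query_window(age_seconds: int) -> tuple[int, int]:
    """Unix-timestamp window covering the last age_seconds
    (the ScaledObject's age field plays this role)."""
    now = int(time.time())
    return now - age_seconds, now

def run_query(query: str, age_seconds: int):
    """Run a Datadog timeseries query over the trailing window.
    Requires DD_API_KEY / DD_APP_KEY in the environment."""
    from datadog_api_client import ApiClient, Configuration
    from datadog_api_client.v1.api.metrics_api import MetricsApi

    frm, to = query_window(age_seconds)
    with ApiClient(Configuration()) as client:
        # Each returned series carries a pointlist of [timestamp, value]
        # pairs; the scaler reduces those to a single metric value for the HPA.
        return MetricsApi(client).query_metrics(_from=frm, to=to, query=query)

if __name__ == "__main__":
    # Same query and age as in the ScaledObject above
    print(run_query("avg:nginx.net.request_per_s{*}", 120))
```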

@tomkerkhove
Member

Converting to discussion

@kedacore kedacore locked and limited conversation to collaborators Apr 26, 2023
@tomkerkhove tomkerkhove converted this issue into discussion #4477 Apr 26, 2023
