
[prometheus-to-sd] Allow adding a prefix/suffix to the metrics names #198

Open
yanana opened this issue Aug 23, 2018 · 12 comments

@yanana

yanana commented Aug 23, 2018

This is a feature request. I'd like to be able to add a prefix or suffix to the scraped metrics' names. When collecting metrics from components whose metric names are difficult to change, I'd like to distinguish those metrics on Stackdriver, especially by the environment they are collected from (e.g., staging or production).

@kawych
Member

kawych commented Aug 23, 2018

Have you tried to:

  • Extend the specified Stackdriver prefix, for example: --stackdriver-prefix=custom.googleapis.com/prod
  • Extend the component name, for example: --source=prod/my-app:http://localhost:8080/metrics

Either way you should be able to achieve your goal (see the sketch below). Alternatively, you can add a label to your metric (e.g. "env"). I think it will be pushed to Stackdriver, @loburm can confirm.
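
For reference, here is a minimal sketch of how those flags could sit on a prometheus-to-sd sidecar container. The container name, image tag, and scrape endpoint are assumptions for illustration; only the flag values come from the suggestions above.

# Hypothetical prometheus-to-sd sidecar in the pod spec; image tag and endpoint are assumed.
- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.3.1  # assumed tag
  command:
  - /monitor
  # Option 1: extend the Stackdriver prefix with an environment segment.
  - --stackdriver-prefix=custom.googleapis.com/prod
  - --source=my-app:http://localhost:8080/metrics
  # Option 2 (instead of option 1): keep the default prefix and extend the component name.
  # - --source=prod/my-app:http://localhost:8080/metrics
  - --pod-id=$(POD_ID)
  - --namespace-id=$(POD_NAMESPACE)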

@loburm
Member

loburm commented Aug 23, 2018

Hi Shun,
As @kawych wrote above, there are two ways you can influence the metrics prefix, but that happens per component. Custom suffixes are not supported.

@yanana
Author

yanana commented Aug 23, 2018

Thanks for your responses, @kawych and @loburm.

I tried both. For additional context, I'm working with the Horizontal Pod Autoscaler using custom metrics on GKE.

With both approaches I couldn't accomplish my goal, as Kubernetes doesn't accept slashes in a metric name.

  • In the former example, I would have to set prod/my-metric-name as the metric name. It's not accepted.
  • In the latter example, I would have to set prod/my-app/my-metric-name as the metric name. Same result.

Am I missing something?

@kawych
Member

kawych commented Aug 23, 2018

TL;DR: the correct metric name to use with the HPA in your case is custom.googleapis.com|prod|my-metric-name (for the first example).

This is explained here for autoscaling with External Metrics: https://cloud.google.com/kubernetes-engine/docs/tutorials/external-metrics-autoscaling#writing_external_metric_specification
For Custom Metrics, this is also supported but the documentation may be lacking.
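
For illustration, a pods-type metric in an autoscaling/v2beta1 HorizontalPodAutoscaler would then reference the escaped name roughly as follows; the target value is a placeholder, and the field names match the HPA definition posted later in this thread:

metrics:
- type: Pods
  pods:
    metricName: custom.googleapis.com|prod|my-metric-name
    targetAverageValue: "1"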

@yanana
Author

yanana commented Aug 23, 2018

That's surprising. Thanks! I'll try that.

@yanana
Author

yanana commented Aug 23, 2018

I've tried --stackdriver-prefix=custom.googleapis.com/staging together with a pods metric using metricName: 'custom.googleapis.com|staging|my_metric_name'. But the autoscaler claims that the metric does not exist: unable to fetch metrics from custom metrics API: googleapi: Error 400: The metric referenced by the provided filter is unknown. Check the metric name and labels., badRequest. However, the metric does exist (I confirmed it in Stackdriver's Metrics Explorer and via projects.timeSeries.list of the Stackdriver Monitoring API v3).

Are there more restrictions?

@kawych
Member

kawych commented Aug 23, 2018

Please make sure that you're using the newest image of the Custom Metrics - Stackdriver Adapter. There used to be a limitation where you were not allowed to use '/' in the metric name. Also, please share:

  • your cluster version
  • the full command for prometheus-to-stackdriver
  • your request to the Custom Metrics API

@yanana
Author

yanana commented Aug 24, 2018

As you pointed out, I was using an older image of the adapter: v0.4.0. Once I upgraded it to the current latest version, v0.8.0, the metrics began working! Thanks @kawych!

For context, the details of my environment are:

  • cluster version: 1.10.6-gke.2 (both master and node pools)
  • full command for prometheus-to-stackdriver (as in YAML):
    - /monitor
    - --source=:http://localhost:24231
    - --stackdriver-prefix=custom.googleapis.com/development
    - --pod-id=$(POD_ID)
    - --namespace-id=$(POD_NAMESPACE)
  • HPA definition:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      labels:
        version: "1535093756"
      name: my-autoscaler
      namespace: default
    spec:
      maxReplicas: 5
      metrics:
      - pods:
          metricName: custom.googleapis.com|development|my_metric
          targetAverageValue: "1"
        type: Pods
      - resource:
          name: cpu
          targetAverageUtilization: 70
        type: Resource
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-deployment
    

@yanana
Author

yanana commented Aug 24, 2018

Custom metrics-based autoscaling now works fine. But I'm not sure why the following error occurred:

clusterroles.rbac.authorization.k8s.io "external-metrics-reader" is forbidden: attempt to grant extra privileges: [PolicyRule{APIGroups:["external.metrics.k8s.io"], Resources:["*"], Verbs:["list"]} PolicyRule{APIGroups:["external.metrics.k8s.io"], Resources:["*"], Verbs:["get"]} PolicyRule{APIGroups:["external.metrics.k8s.io"], Resources:["*"], Verbs:["watch"]}] user=&{client  [system:authenticated] map[]} ownerrules=[PolicyRule{APIGroups:["authorization.k8s.io"], Resources:["selfsubjectaccessreviews"], Verbs:["create"]} PolicyRule{APIGroups:["authorization.k8s.io"], Resources:["selfsubjectrulesreviews"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/swaggerapi" "/swaggerapi/*" "/version"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/swagger-2.0.0.pb-v1"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/swagger.json"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/openapi"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/openapi/*"], Verbs:["get"]} PolicyRule{NonResourceURLs:["/version/"], Verbs:["get"]}] ruleResolutionErrors=[]

It occurred while applying the following YAML snippet (part of https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml):

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader
rules:
- apiGroups:
  - "external.metrics.k8s.io"
  resources:
  - "*"
  verbs:
  - list
  - get
  - watch

@kawych
Member

kawych commented Aug 24, 2018

You may be missing a prerequisite of the Custom Metrics - Stackdriver Adapter installation, as described here: https://cloud.google.com/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling#step1, i.e.:

kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "$(gcloud config get-value account)"

@yanana
Author

yanana commented Aug 24, 2018

I had already completed that step. The output of kubectl get clusterrolebinding cluster-admin-binding -o yaml is the following (some metadata omitted):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: 2018-08-23T06:46:49Z
  name: cluster-admin-binding
  resourceVersion: "67281612"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cluster-admin-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: my-email-address

@andrewteall

andrewteall commented Nov 21, 2018

I'm seeing the same issue as well.
I've already run the kubectl get clusterrolebinding cluster-admin-binding -o yaml command with success.

The one difference is that if I use my own account, everything works; if I use a service account with the exact same permissions, it does not.

kubectl version -o yaml

clientVersion:
  buildDate: 2018-08-20T10:09:03Z
  compiler: gc
  gitCommit: 0c38c362511b20a098d7cd855f1314dad92c2780
  gitTreeState: clean
  gitVersion: v1.10.7
  goVersion: go1.9.3
  major: "1"
  minor: "10"
  platform: darwin/amd64
serverVersion:
  buildDate: 2018-11-08T20:33:00Z
  compiler: gc
  gitCommit: d776b4deeb3655fa4b8f4e8e7e4651d00c5f4a98
  gitTreeState: clean
  gitVersion: v1.10.9-gke.5
  goVersion: go1.9.3b4
  major: "1"
  minor: 10+
  platform: linux/amd64

Edit:
So I figured out my issue. When running from a GCE instance that does not have the default Compute Engine service account, I would get this error, even if I had another service account with owner permissions. Adding the default service account back to the instance resolved the issue.
