Ignore metrics by type #238

Merged
merged 1 commit into from
Nov 29, 2021


smcavallo
Contributor

@smcavallo smcavallo commented Nov 4, 2021

Allow metric filtering by metric type.
Example:

transformations:
  - description: "General processing rules"
    ignore_metrics:
      - metric_types:
        - "histogram"

The use case is the ability to ignore histograms which generate a large number of (unused) metrics.

# HELP argocd_redis_request_duration Redis requests duration.
# TYPE argocd_redis_request_duration histogram
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="0.01"} 2.791635e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="0.05"} 3.347797e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="0.1"} 3.582787e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="0.25"} 3.609236e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="0.5"} 3.609581e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="1"} 3.609594e+06
argocd_redis_request_duration_bucket{hostname="argocd-application-controller-1",initiator="argocd-application-controller",le="+Inf"} 3.609602e+06
argocd_redis_request_duration_sum{hostname="argocd-application-controller-1",initiator="argocd-application-controller"} 41456.31533212083
argocd_redis_request_duration_count{hostname="argocd-application-controller-1",initiator="argocd-application-controller"} 3.609602e+06

In the case above, because # TYPE is histogram, all of these metrics will be filtered out.
An except list can be used for histograms that should be kept.
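
For readers following along, the type-based rule can be pictured roughly as below. This is only a sketch in Python, not the code in internal/integration/rules.go: the `IgnoreRule` class, the `should_ignore` function, and the choice to match except entries as name prefixes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IgnoreRule:
    """Hypothetical model of one ignore_metrics entry (not the real Go struct)."""
    metric_types: List[str] = field(default_factory=list)
    except_names: List[str] = field(default_factory=list)


def should_ignore(rule: IgnoreRule, name: str, metric_type: str) -> bool:
    """Return True when the metric family should be dropped."""
    # Assumption: an except entry rescues any family whose name starts with it.
    if any(name.startswith(prefix) for prefix in rule.except_names):
        return False
    # The type comes from the '# TYPE' line, so it applies to the whole family.
    return metric_type in rule.metric_types


drop_all = IgnoreRule(metric_types=["histogram"])
keep_redis = IgnoreRule(
    metric_types=["histogram"],
    except_names=["argocd_redis_request_duration"],
)

print(should_ignore(drop_all, "argocd_redis_request_duration", "histogram"))    # True
print(should_ignore(keep_redis, "argocd_redis_request_duration", "histogram"))  # False
print(should_ignore(drop_all, "go_threads", "gauge"))                           # False
```

The point of the sketch is that the decision is made once per metric family, using the declared type, so the _bucket, _sum and _count series are kept or dropped together.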

Signed-off-by: smcavallo smcavallo@hotmail.com

@davidgit davidgit added the triage/pending Issue or PR is pending for triage and prioritization. label Nov 5, 2021
@carlossscastro carlossscastro added feature request Categorizes issue or PR as related to a new feature or enhancement. priority/short-term Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed triage/pending Issue or PR is pending for triage and prioritization. labels Nov 5, 2021
@paologallinaharbur
Member

Supersedes #235 (just for tracking purposes)

@kang-makes kang-makes self-assigned this Nov 29, 2021
Contributor

@kang-makes kang-makes left a comment

Hello @smcavallo,

Let me walk through my train of thought so I can explain this properly. I created several config files targeting a Prometheus endpoint (an empty CoreDNS):

A file called filter-none:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

A file called filter-prefix:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

transformations:
- ignore_metrics:
  - prefixes:
    - go_th

A file called filter-suffix-info:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

transformations:
- ignore_metrics:
  - suffixes:
    - _info

A file called filter-suffix-bucket:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

transformations:
- ignore_metrics:
  - suffixes:
    - _bucket

A file called filter-type-count:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

transformations:
- ignore_metrics:
  - metric_types:
    - count

A file called filter-type-summary:

standalone: false
emitters: infra-sdk
cluster_name: "non-empty"
targets:
  - urls: ["http://localhost:9253"]
verbose: true

transformations:
- ignore_metrics:
  - metric_types:
    - summary

Then I ran this shell one-liner (from your origin) to see if it filters correctly:
for i in filter-*; do go run ./cmd/nri-prometheus -configfile "$i" | jq '.data[].metrics|=sort_by(.name) | [ .data[].metrics[] | { "name": .name, "type": .type, "value": .value } ]' > "out-$i.json"; done

Then I ran diff out-filter-none.json out-filter-prefix.json and diff out-filter-none.json out-filter-suffix-info.json. Here is the first diff:

# More values changing...
190c190
<     "value": 53286
---
>     "value": 53780
236,240d235
<   },
<   {
<     "name": "go_threads",
<     "type": "gauge",
<     "value": 18

Here is the second diff:

# More values changing...
110,115c105
<     "value": 16
<   },
<   {
<     "name": "go_info",
<     "type": "gauge",
<     "value": 1
---
>     "value": 17
# More values changing...

They look cool! Congrats.

But if I diff out-filter-none.json out-filter-suffix-bucket.json, the diff is empty (aside from metric values changing).

% diff out-filter-none.json out-filter-suffix-bucket.json
% diff out-filter-none.json out-filter-suffix-bucket.json | wc -l
0
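
Since the metric values change between runs and add noise to a plain diff, a comparison on names and types alone can make the result easier to read. The sketch below assumes each out-*.json file is the flat JSON array produced by the jq expression above; the helper names `name_type_pairs` and `removed_metrics` are made up for this example.

```python
import json


def name_type_pairs(path):
    """Load one out-*.json file and keep only (name, type), dropping values."""
    with open(path) as handle:
        return {(metric["name"], metric["type"]) for metric in json.load(handle)}


def removed_metrics(baseline_path, filtered_path):
    """Return the (name, type) pairs present in the baseline but missing after filtering."""
    return sorted(name_type_pairs(baseline_path) - name_type_pairs(filtered_path))
```

Calling `removed_metrics("out-filter-none.json", "out-filter-suffix-bucket.json")` would return an empty list when the filter had no effect, regardless of how the values moved between the two runs.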

The _bucket part is added by Prometheus and is not part of the metric name, so it cannot be filtered there. That is the reason we do not support suffixes: they lead to errors like this one. Is this what you intended?
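
To illustrate why a _bucket suffix never matches, here is a rough Python sketch of how sample names collapse into the family name declared on the # TYPE line. It is not the integration's parser; `declared_types` and `base_name` are hypothetical helpers, and the exposition text is trimmed from the PR description.

```python
SAMPLE_SUFFIXES = ("_bucket", "_sum", "_count")


def declared_types(exposition):
    """Map each family name to the type declared on its '# TYPE' line."""
    types = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["#", "TYPE"]:
            types[parts[2]] = parts[3]
    return types


def base_name(sample_name, types):
    """Return the family name that a sample line belongs to."""
    for suffix in SAMPLE_SUFFIXES:
        if sample_name.endswith(suffix):
            candidate = sample_name[: -len(suffix)]
            # Only strip the suffix when a histogram or summary family owns it.
            if types.get(candidate) in ("histogram", "summary"):
                return candidate
    return sample_name


EXPOSITION = """\
# HELP argocd_redis_request_duration Redis requests duration.
# TYPE argocd_redis_request_duration histogram
"""

types = declared_types(EXPOSITION)
name = base_name("argocd_redis_request_duration_bucket", types)
print(name)                      # argocd_redis_request_duration
print(name.endswith("_bucket"))  # False
```

Under this model a suffix rule is compared against argocd_redis_request_duration, so _bucket can never match, while a suffix such as _request_duration would.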

On the other hand the second commit of this PR to filter by metric type is working perfectly:

diff out-filter-none.json out-filter-type-count.json

 7,14d6
 <     "name": "coredns_forward_healthcheck_broken_total",
 <     "type": "cumulative-count"
 <   },
 <   {
 <     "name": "coredns_forward_max_concurrent_rejects_total",
 <     "type": "cumulative-count"
 <   },
 <   {
# More metrics being correctly filtered...

diff out-filter-none.json out-filter-type-summary.json

 43,46d42
 <     "name": "go_gc_duration_seconds",
 <     "type": "prometheus-summary"
 <   },
 <   {

We can accept this PR if it only filters by metric_type; that would be really useful.

Have a nice day!

Signed-off-by: smcavallo <smcavallo@hotmail.com>
@smcavallo smcavallo reopened this Nov 29, 2021
@smcavallo smcavallo changed the title Ignore metrics by type and suffix Ignore metrics by type Nov 29, 2021
@smcavallo
Contributor Author

@kang-makes - I have made the requested updates for this PR.
This PR only filters by metric_type now. Thanks!!!!

Contributor

@kang-makes kang-makes left a comment

LGTM.

Thank you very much for your patience on this contribution.

internal/integration/fetcher.go
internal/integration/rules.go
internal/integration/rules_test.go
@kang-makes kang-makes merged commit bf5011d into newrelic:main Nov 29, 2021
@smcavallo
Contributor Author

@kang-makes - thank you so much for reviewing and merging!
Here is what is causing the confusion around metric filtering
image

It is most apparent when attempting to use suffix filtering, because Prometheus appends suffixes to these metric types.

But it is also an issue for prefixes, just in a case that rarely gets hit.
For argocd_redis_request_duration it is impossible to run ANY filter rules on argocd_redis_request_duration_bucket, argocd_redis_request_duration_sum, or argocd_redis_request_duration_count;
what is sent to the filtering code is ONLY argocd_redis_request_duration.

this prefix rule will NOT work:

transformations:
- ignore_metrics:
  - prefixes:
    - argocd_redis_request_duration_bucket

this exclude rule will NOT work:

transformations:
- ignore_metrics:
  - prefixes:
    - argocd_redis_request_duration
  - except:
    - argocd_redis_request_duration_count

Both of the above feel like bugs, but they are working as designed (WAD). The documentation could be made clearer there.

It is confusing, but it is the same issue as with suffixes.

The ability to use this suffix filter would be useful for us:

transformations:
- ignore_metrics:
  - suffixes:
    - _request_duration

It is just extremely confusing to use suffix filters, and it makes metric filtering feel broken.

@smcavallo smcavallo deleted the ignore_metrics_suffix branch November 29, 2021 18:34
@kang-makes
Contributor

kang-makes commented Nov 30, 2021

Hi @smcavallo,

When we rejected your first PR, @gsanchezgavier added a commit to the docs explaining that we filter by the base_name and not by the whole metric: newrelic/docs-website@231018c, which you can read here: https://docs.newrelic.com/docs/infrastructure/prometheus-integrations/install-configure-openmetrics/ignore-or-include-prometheus-metrics/#ignore-metrics

We are struggling with how to create a better transformation configuration and how to document it.

You are not the only one to hit this issue and put the except clause in a different index of the list. Here is an example of a proper except clause:

transformations:
- ignore_metrics:
  - prefixes:
      - coredns_
    except:
      - coredns_build_info

  - prefixes:
      - go_
    except:
      - go_threads

Same command as above: go run ./cmd/nri-prometheus -configfile filter-fixed | jq '.data[].metrics|=sort_by(.name) | [ .data[].metrics[] | { "name": .name, "type": .type } ]'

[
  {
    "name": "coredns_build_info",
    "type": "gauge"
  },
  {
    "name": "go_threads",
    "type": "gauge"
  }
]

But even with the except clause inside the proper struct, you are right: you cannot filter by _bucket, _sum, or _count. As you saw, it is not trivial to filter by the aggregated data from a metric.
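
As a rough picture of the behavior described above, the Python sketch below evaluates rules against base names, with each except list applying only to the prefixes in the same list entry. It is a simplified model, not the integration's code: `is_ignored` is a hypothetical helper, except entries are assumed to match as name prefixes, and entries without prefixes are simply treated as matching nothing (their real behavior is not covered here).

```python
def is_ignored(base_name, rules):
    """Return True when some rule's prefixes match and none of its excepts do."""
    for rule in rules:
        prefixes = rule.get("prefixes", [])
        excepts = rule.get("except", [])
        matches = any(base_name.startswith(p) for p in prefixes)
        rescued = any(base_name.startswith(e) for e in excepts)
        if matches and not rescued:
            return True
    return False


# Mirrors the YAML above, plus the histogram case discussed earlier in the thread.
RULES = [
    {"prefixes": ["coredns_"], "except": ["coredns_build_info"]},
    {"prefixes": ["go_"], "except": ["go_threads"]},
    {
        "prefixes": ["argocd_redis_request_duration"],
        "except": ["argocd_redis_request_duration_count"],
    },
]

print(is_ignored("go_threads", RULES))                     # False
print(is_ignored("go_info", RULES))                        # True
print(is_ignored("argocd_redis_request_duration", RULES))  # True
```

The last line shows the limitation: the except entry names argocd_redis_request_duration_count, but only the base name argocd_redis_request_duration reaches the filter, so the whole histogram is dropped.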

Nevertheless, I will keep this issue in mind.

Labels
feature request Categorizes issue or PR as related to a new feature or enhancement. priority/short-term Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

5 participants