Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to query Metrics (Counters, Histogram, Gauge) with ClickHouse exporter? #26494

Closed
StarpTech opened this issue Sep 6, 2023 · 12 comments
Closed
Labels

Comments

@StarpTech
Copy link
Contributor

StarpTech commented Sep 6, 2023

Component(s)

exporter/clickhouse

Describe the issue you're reporting

Hello, we would like to utilize the ClickHouse exporter for metrics. Regrettably, we have been unable to locate any examples demonstrating how metrics are queried within the intended schema. If you conduct some research on how ClickHouse handles time-series data, you will discover only a limited number of resources available. Furthermore, the documentation on functions is quite minimal. There is no article or function that covers counter resets. It would be immensely helpful if the author of the exporter could offer some insights into the querying of data. For instance, a single example for counters (rate, latest), gauges (avg_over_time), and histograms (p95) would suffice. Thank you in advance.

Maybe @hanjm you can provide help here.

@StarpTech StarpTech added the needs triage New item requiring triage label Sep 6, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@emreyalvac
Copy link
Member

🚶🏻

@hanjm
Copy link
Member

hanjm commented Sep 7, 2023

@Frapschen is the author of metrics exporter. Could you help such as add more example of query metrics sql?

@StarpTech
Copy link
Contributor Author

@Frapschen friendly ping. Short feedback would be enough to understand its limitations.

@Frapschen
Copy link
Contributor

Sorry, I also lack relevant experience, but https://grafana.com/grafana/plugins/vertamedia-clickhouse-datasource/ provides some functions to help query time series data, hope that can help you.

@StarpTech
Copy link
Contributor Author

Hi @Frapschen unfortunately not. The examples are very minimal and don't explain much. Gauge and Histogramm are missing. Thank you anyway. We will continue to use the Clickhouse exporter for logs and traces only.

@Frapschen
Copy link
Contributor

Frapschen commented Sep 27, 2023

@StarpTech I try to write a Histogram metrics query demo
refer: Prometheus's code: https://github.com/prometheus/prometheus/blob/main/promql/quantile.go#L120:

bucketStart + (bucketEnd-bucketStart)*float64(rank/count)

table schema && mocked data

CREATE TABLE IF NOT EXISTS histogram (
TimeUnix DateTime64(9) CODEC(Delta, ZSTD(1)),
ServiceName String,
SpanName String,
Count UInt64 CODEC(Delta, ZSTD(1)),
Sum Float64 CODEC(ZSTD(1)),
BucketCounts Array(UInt64) CODEC(ZSTD(1)),
ExplicitBounds Array(Float64) CODEC(ZSTD(1)),
) ENGINE MergeTree()
PARTITION BY toDate(TimeUnix)
ORDER BY (toUnixTimestamp64Nano(TimeUnix))
SETTINGS index_granularity=8192, ttl_only_drop_parts = 1;

INSERT INTO histogram (*) VALUES ('2023-09-19 10:52:55.605392606','adservice','oteldemo.AdService/GetAds',3054,5829.722671000004,[2721,151,182,0,0,0,0,0,0,0,0],[4,10,50,100,500,800,1000,5000,10000,60000]);

It's looks lile:

SELECT
    TimeUnix,
    ServiceName,
    Count,
    Sum,
    BucketCounts,
    ExplicitBounds
FROM histogram

┌──────────────────────TimeUnix─┬─ServiceName─┬─Count─┬───────────────Sum─┬─BucketCounts───────────────────┬─ExplicitBounds──────────────────────────────┐
│ 2023-09-19 10:52:55.605392606 │ adservice   │  3054 │ 5829.722671000004 │ [2721,151,182,0,0,0,0,0,0,0,0] │ [4,10,50,100,500,800,1000,5000,10000,60000] │
└───────────────────────────────┴─────────────┴───────┴───────────────────┴────────────────────────────────┴─────────────────────────────────────────────┘

create help functions:

CREATE FUNCTION func_rank as (q,buckets) -> toUInt64(q*arraySum(buckets));
CREATE FUNCTION func_rank_bucket_index as (rank,buckets) -> arrayFirstIndex(x -> if(x > rank, 1, 0),arrayCumSum(buckets));
CREATE FUNCTION func_rank_bound as (rank,buckets,bounds) -> arraySlice(bounds,arrayFirstIndex(x -> if(x > rank, 1, 0),arrayCumSum(buckets)),2);
CREATE FUNCTION func_rank_in_bucket_postion as (rank,buckets) -> minus(rank,arrayElement(arrayCumSum(buckets),minus(func_rank_bucket_index(rank,buckets),1)))
CREATE FUNCTION func_bouket_total as (x,buckets) -> arrayElement(buckets,x)

#  histogram v2
CREATE FUNCTION func_histogram_v2 as (rank,rank_bound,buckets,bounds) ->
rank_bound[1] + (rank_bound[2]-rank_bound[1])*
(func_rank_in_bucket_postion(rank,buckets)/func_bouket_total(func_rank_bucket_index(rank,buckets),buckets))

P95

SELECT
    TimeUnix,
    ServiceName,
    Count,
    Sum,
    BucketCounts,
    ExplicitBounds,
    func_rank(0.95,BucketCounts) as rank,
    func_histogram_v2(rank,func_rank_bound(rank,BucketCounts,ExplicitBounds),BucketCounts,ExplicitBounds) as P95
FROM histogram;


┌──────────────────────TimeUnix─┬─ServiceName─┬─Count─┬───────────────Sum─┬─BucketCounts───────────────────┬─ExplicitBounds──────────────────────────────┬─rank─┬────────────────P95─┐
│ 2023-09-19 10:52:55.605392606 │ adservice   │  3054 │ 5829.722671000004 │ [2721,151,182,0,0,0,0,0,0,0,0] │ [4,10,50,100,500,800,1000,5000,10000,60000] │ 2901 │ 57.967032967032964 │
└───────────────────────────────┴─────────────┴───────┴───────────────────┴────────────────────────────────┴─────────────────────────────────────────────┴──────┴────────────────────┘

@StarpTech
Copy link
Contributor Author

StarpTech commented Sep 27, 2023

Hi @Frapschen thanks for the additional effort. I took another look. We will start to collect metrics with delta temporality. In that way, we can deal with metrics much simpler. We don't have to handle counter resets. We can sum it up or use quantileInterpolatedWeighted on the histograms.

@StarpTech
Copy link
Contributor Author

StarpTech commented Oct 1, 2023

@Frapschen how would you use your approach in a materialized view where you need to aggregate to get the correct result? It would be great if we could somehow use clickhouse -State function for that.

I found quantilesExactWeighted is a reasonable approach although data is not interpolated. It also works with -State and -Merge functions. The query uses the table schema from the exporter and your sample dataset. Metrics are collected with delta temporality.

SELECT quantilesExactWeighted(0.5, 0.95, 0.99)(Bin, DCount) as DurationQuantiles FROM otel_metrics_histogram
    ARRAY JOIN BucketCounts AS DCount, arrayMap(x -> if(x = length(ExplicitBounds)-1, ExplicitBounds[length(ExplicitBounds)-1], ExplicitBounds[x]), arrayEnumerate(BucketCounts)) AS Bin
WHERE DCount > 0 AND MetricName = 'router.http.request.duration_milliseconds'

Result:

[4, 50, 50]

@StarpTech
Copy link
Contributor Author

StarpTech commented Oct 2, 2023

@Frapschen your percentile calculation pre-work is great. I modified it a bit to make it work like https://github.com/prometheus/prometheus/blob/main/promql/quantile.go#L111

CREATE FUNCTION func_rank as (q,buckets) -> q*arraySum(buckets);
CREATE FUNCTION func_rank_bucket_lower_index as (rank,buckets) -> arrayFirstIndex(x -> if(x >= rank, 1, 0),arrayCumSum(buckets));

CREATE FUNCTION func_histogram_v2 as (rank,b,buckets,bounds) ->
    -- When +inf is matched, we return the last bucket's upper bound
    if(b = length(buckets), bounds[length(bounds)],
       if(b > 1,
          -- if the bucketLowerIndex is greater than 1, we interpolate between the lower and upper bounds of the bucket
          bounds[b] + (bounds[b+1] - bounds[b]) *
                          (minus(rank,arrayElement(arrayCumSum(buckets),minus(b,1))) /
                           minus(arrayElement(arrayCumSum(buckets),b), arrayElement(arrayCumSum(buckets), minus(b,1)))),
           -- else
          bounds[b+1] * (rank / arrayElement(arrayCumSum(buckets),b))
       )
    );

Usage

select
    toStartOfFiveMinute(TimeUnix) as Timestamp,
    -- Begin P95
    func_rank(0.95, BucketCounts) as rank95,
    func_rank_bucket_lower_index(rank95, BucketCounts) as b,
    func_histogram_v2(
            rank95,
            b,
            BucketCounts,
            ExplicitBounds
        ) as P95,
    -- P95 end

    -- Histogram aggregations
    sumForEach(BucketCounts) as BucketCounts,
    sum(Sum) AS Sum,
    sum(Count) AS Count,
    min(Min) AS Min,
    max(Max) AS Max
from otel_metrics_histogram
GROUP BY Timestamp, ExplicitBounds
ORDER BY Timestamp

Copy link
Contributor

github-actions bot commented Dec 4, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 4, 2023
Copy link
Contributor

github-actions bot commented Feb 2, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants