
ClickHouse as a storage backend #1438

Closed
sboisson opened this issue Mar 21, 2019 · 67 comments

Comments

@sboisson

ClickHouse, an open-source column-oriented DBMS designed initially for real-time analytics and mostly write-once/read-many big-data use cases, can be used as very efficient log and trace storage.

Meta issue: #638 Additional storage backends

@bzon

bzon commented Oct 23, 2019

Is anyone working on this? My team or I could take a shot at it.

@Slach

Slach commented Oct 23, 2019

As far as I know, no one is working on this.

@Slach

Slach commented Oct 23, 2019

@bzon I can join you as a tester

@bzon

bzon commented Oct 23, 2019

@bzon I can join you as a tester

Sure!

@sboisson
Author

Happy to see someone working on this :)
Might be able to join as a tester

@yurishkuro
Member

@bzon would be good if you post an architecture here, specifically how you would lay out the data, ingestion, etc. To my knowledge, clickhouse requires batched writes, and it may even be up to you to decide which node to send the writes to, so there are many questions. It may require some benchmarking to find the optimal design.

@bzon

bzon commented Oct 23, 2019

@yurishkuro at the moment we have zero knowledge of Jaeger's internals. I think benchmarking should be the first step to see if this integration is feasible. With that said, the first requirement should be creating the right table schema.

@sboisson
Author

I think the architecture of the project https://github.com/flant/loghouse could be a source of inspiration…

@sboisson
Author

This webinar could be interesting: A Practical Introduction to Handling Log Data in ClickHouse

@bobrik
Contributor

bobrik commented May 10, 2020

I took a stab at it (very early WIP):

This is the schema I used:

  • Index table for fast searches (haven't measured if indices are useful yet)
CREATE TABLE jaeger_index (
  timestamp DateTime64(6),
  traceID FixedString(16),
  service LowCardinality(String),
  operation LowCardinality(String),
  durationUs UInt64,
  tags Nested(
    key LowCardinality(String),
    valueString LowCardinality(String),
    valueBool UInt8,
    valueInt Int64,
    valueFloat Float64
  ),
  INDEX tags_strings (tags.key, tags.valueString) TYPE set(0) GRANULARITY 64,
  INDEX tags_ints (tags.key, tags.valueInt) TYPE set(0) GRANULARITY 64
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (timestamp, service, operation);
  • Data table for span storage and quick retrieval by traceID
CREATE TABLE jaeger_spans (
  timestamp DateTime64(6),
  traceID FixedString(16),
  model String
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY traceID;

You probably need Clickhouse 20.x for DateTime64; I used 20.1.11.73.
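
For reference, here's a hedged sketch of what writing a span into these two tables could look like; nested tags columns are inserted as parallel arrays, one element per tag, and all values below are illustrative:

INSERT INTO jaeger_spans (timestamp, traceID, model) VALUES
  ('2020-05-10 20:43:23.248314', '212e5c616f4b9c2f', '<serialized span, JSON or protobuf>');

INSERT INTO jaeger_index
  (timestamp, traceID, service, operation, durationUs,
   `tags.key`, `tags.valueString`, `tags.valueBool`, `tags.valueInt`, `tags.valueFloat`)
VALUES
  ('2020-05-10 20:43:23.248314', '212e5c616f4b9c2f', 'jaeger-query', 'getTraces', 131605,
   ['internal.span.format', 'num_trace_ids'], ['proto', ''], [0, 0], [0, 13], [0, 0]);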

Index table looks like this:

SELECT *
FROM jaeger_index
ARRAY JOIN tags
ORDER BY timestamp DESC
LIMIT 20
FORMAT PrettyCompactMonoBlock
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation────┬─durationUs─┬─tags.key─────────────┬─tags.valueString─┬─tags.valueBool─┬─tags.valueInt─┬─tags.valueFloat─┐
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ num_trace_ids        │                  │              0 │            13 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ weird                │                  │              1 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ π                    │                  │              0 │             0 │            3.14 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065535 │ 212e5c616f4b9c2f │ jaeger-query │ /api/traces  │     315554 │ sampler.type         │ const            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065535 │ 212e5c616f4b9c2f │ jaeger-query │ /api/traces  │     315554 │ sampler.param        │                  │              1 │             0 │               0 │
└────────────────────────────┴──────────────────┴──────────────┴──────────────┴────────────┴──────────────────────┴──────────────────┴────────────────┴───────────────┴─────────────────┘

Tags are stored in their original types, so with enough SQL-fu you can find all spans with response size between X and Y bytes, for example.
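
For instance, a sketch of such a range search (the tag name http.response_size is hypothetical, just to illustrate the typed lookup):

SELECT DISTINCT traceID
FROM jaeger_index
ARRAY JOIN tags
WHERE tags.key = 'http.response_size'
  AND tags.valueInt BETWEEN 1024 AND 1048576
  AND timestamp >= now() - INTERVAL 1 HOUR;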

The layout of the query is different from Elasticsearch, since now you have all tags for the trace laid out in a single view. This means that if you search for all operations of some service, you will get a cross-span result where one tag can match one span and another tag can match another span. Consider the following trace:

+ upstream (tags: {"host": "foo.bar"})
++ upstream_ttfb (tags: {"status": 200})
++ upstream_download (tags: {"error": true})

You can search for host=foo.bar status=200 across all operations and this trace will be found, even though no single span has both tags. This seems like a really nice upside.
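
One way to express that cross-span search against this schema (a sketch, not necessarily how the plugin builds its queries) is to group the flattened tag rows back by trace:

SELECT traceID
FROM jaeger_index
ARRAY JOIN tags
WHERE (tags.key = 'host' AND tags.valueString = 'foo.bar')
   OR (tags.key = 'status' AND tags.valueInt = 200)
GROUP BY traceID
HAVING uniqExact(tags.key) = 2;  -- both tags seen somewhere in the trace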

There's support for both JSON and Protobuf storage. The former allows out-of-band queries, since Clickhouse supports JSON functions. The latter is much more compact.
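
As an example of such an out-of-band query (assuming JSON storage and Jaeger's JSON field name operationName), something along these lines should work:

SELECT JSONExtractString(model, 'operationName') AS operation, count() AS spans
FROM jaeger_spans
GROUP BY operation
ORDER BY spans DESC
LIMIT 10;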

I pushed 100K spans from tracegen through this with a local Clickhouse in a Docker container with stock settings, and here's what storage looks like:

SELECT
    table,
    sum(marks) AS marks,
    sum(rows) AS rows,
    sum(bytes_on_disk) AS bytes_on_disk,
    sum(data_compressed_bytes) AS data_compressed_bytes,
    sum(data_uncompressed_bytes) AS data_uncompressed_bytes,
    toDecimal64(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio,
    toDecimal64(data_compressed_bytes / rows, 2) AS compressed_bytes_per_row
FROM system.parts
WHERE table LIKE 'jaeger_%'
GROUP BY table
ORDER BY table ASC

┌─table────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index │    16 │ 106667 │       2121539 │               2110986 │                22678493 │             10.74 │                    19.79 │
│ jaeger_spans │    20 │ 106667 │       5634663 │               5632817 │                37112272 │              6.58 │                    52.80 │
└──────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘
SELECT
    table,
    column,
    type,
    sum(column_data_compressed_bytes) AS compressed,
    sum(column_data_uncompressed_bytes) AS uncompressed,
    toDecimal64(uncompressed / compressed, 2) AS compression_ratio,
    sum(rows) AS rows,
    toDecimal64(compressed / rows, 2) AS bytes_per_row
FROM system.parts_columns
WHERE (table LIKE 'jaeger_%') AND active
GROUP BY
    table,
    column,
    type
ORDER BY
    table ASC,
    column ASC
┌─table────────┬─column───────────┬─type──────────────────────────┬─compressed─┬─uncompressed─┬─compression_ratio─┬───rows─┬─bytes_per_row─┐
│ jaeger_index │ durationUs       │ UInt64                        │     248303 │       853336 │              3.43 │ 106667 │          2.32 │
│ jaeger_index │ operation        │ LowCardinality(String)        │       5893 │       107267 │             18.20 │ 106667 │          0.05 │
│ jaeger_index │ service          │ LowCardinality(String)        │        977 │       107086 │            109.60 │ 106667 │          0.00 │
│ jaeger_index │ tags.key         │ Array(LowCardinality(String)) │      29727 │      1811980 │             60.95 │ 106667 │          0.27 │
│ jaeger_index │ tags.valueBool   │ Array(UInt8)                  │      29063 │      1810904 │             62.30 │ 106667 │          0.27 │
│ jaeger_index │ tags.valueFloat  │ Array(Float64)                │      44762 │      8513880 │            190.20 │ 106667 │          0.41 │
│ jaeger_index │ tags.valueInt    │ Array(Int64)                  │     284393 │      8513880 │             29.93 │ 106667 │          2.66 │
│ jaeger_index │ tags.valueString │ Array(LowCardinality(String)) │      31695 │      1814416 │             57.24 │ 106667 │          0.29 │
│ jaeger_index │ timestamp        │ DateTime64(6)                 │     431835 │       853336 │              1.97 │ 106667 │          4.04 │
│ jaeger_index │ traceID          │ FixedString(16)               │    1063375 │      1706672 │              1.60 │ 106667 │          9.96 │
│ jaeger_spans │ model            │ String                        │    4264180 │     34552264 │              8.10 │ 106667 │         39.97 │
│ jaeger_spans │ timestamp        │ DateTime64(6)                 │     463444 │       853336 │              1.84 │ 106667 │          4.34 │
│ jaeger_spans │ traceID          │ FixedString(16)               │     905193 │      1706672 │              1.88 │ 106667 │          8.48 │
└──────────────┴──────────────────┴───────────────────────────────┴────────────┴──────────────┴───────────────────┴────────┴───────────────┘

We have around 74B daily docs in our production Elasticsearch storage. My plan is to switch that to fields-as-tags, remove indexing of non-queried fields (logs, nested tags, references), then switch to a sorted index and then see how Clickhouse compares to that for the same spans.

@yurishkuro
Member

@bobrik very interesting, thanks for sharing. I am curious what the performance for retrieving by trace ID would be like.

Q: why do you use LowCardinality(String) for tags? Some tags can be very high cardinality, e.g. URLs.

You can search for host=foo.bar status=200 across all operations and this trace will be found, even though no single span has both tags. This seems like a really nice upside.

I'm confused why this would be the case. Doesn't CH evaluate the query in full against each row (i.e. each span)? Or is this because of how your plugin interacts with CH?

@Slach

Slach commented May 11, 2020

@yurishkuro even at 100k distinct values per block, LowCardinality(String) with dictionary-based encoding will do better than plain String

@bobrik
Contributor

bobrik commented May 11, 2020

@yurishkuro retrieving by trace ID is very fast, since you're doing a primary key lookup.
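
For example, fetching a whole trace is just a lookup on the spans table's ORDER BY key (trace ID taken from the sample output below):

SELECT model
FROM jaeger_spans
WHERE traceID = '3adb641936b21d98';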

ClickHouse/ClickHouse#4074 (comment) says this about LowCardinality:

Rule of thumb: it should be beneficial if the number of distinct values is less than a few million.

That said, the schema is in no way final.

I'm confused why this would be the case. Doesn't CH evaluate the query in full against each row (i.e. each span)? Or is this because of how your plugin interacts with CH?

A row is not a span, it's a span/key/value combination. That's the key.

Take a look at the output of this query, which is equivalent to what I do:

SELECT *
FROM jaeger_index
ARRAY JOIN tags
ORDER BY timestamp DESC
LIMIT 20
FORMAT PrettyCompactMonoBlock
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation─────┬─durationUs─┬─tags.key─────────────┬─tags.valueString─┬─tags.valueBool─┬─tags.valueInt─┬─tags.valueFloat─┐
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ num_trace_ids        │                  │              0 │            20 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ weird                │                  │              1 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ π                    │                  │              0 │             0 │            3.14 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ client-uuid          │ 3f9574079594605c │              0 │             0 │               0 │
│ 2020-05-11 04:53:45.723921 │ 700a1bff0bdf3141 │ jaeger-query │ GetOperations │    2268055 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-11 04:53:45.723921 │ 700a1bff0bdf3141 │ jaeger-query │ GetOperations │    2268055 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
└────────────────────────────┴──────────────────┴──────────────┴───────────────┴────────────┴──────────────────────┴──────────────────┴────────────────┴───────────────┴─────────────────┘

Here we join traceID with the nested tags, repeating every tag key/value with the corresponding trace.

Operation getTraces does not happen multiple times in trace 3adb641936b21d98, but our query makes a separate row for each of its tags, so cross-span tag searches naturally work out of the box.

Compare this to how it looks at the span level (this is what Elasticsearch works with):

SELECT *
FROM jaeger_index
WHERE (traceID = '3adb641936b21d98') AND (service = 'jaeger-query') AND (operation = 'getTraces')
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation─┬─durationUs─┬─tags.key────────────────────────────────────────────────────────────────────────────────────────────┬─tags.valueString────────────────────────────────────────────────────────────────┬─tags.valueBool────┬─tags.valueInt──────┬─tags.valueFloat──────┐
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces │    3414327 │ ['num_trace_ids','weird','π','internal.span.format','jaeger.version','hostname','ip','client-uuid'] │ ['','','','proto','Go-2.22.1','C02TV431HV2Q','192.168.1.43','3f9574079594605c'] │ [0,1,0,0,0,0,0,0] │ [20,0,0,0,0,0,0,0] │ [0,0,3.14,0,0,0,0,0] │
└────────────────────────────┴──────────────────┴──────────────┴───────────┴────────────┴───────────────────────────────────────────────

Clickhouse doesn't allow direct lookups against nested fields like tags.ip=192.168.1.43; instead you have to do an ARRAY JOIN, which results in this new property.

@bobrik
Contributor

bobrik commented May 11, 2020

I think I might be more confused about tag queries than I initially realized, so please take my explanation with a big grain of salt.

@bobrik
Contributor

bobrik commented May 13, 2020

Turns out that having sparse tags with nested fields is not great for searches when you have a lot of spans, so I went back to an array of key=value strings with a bloom filter index on top.
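
The actual schema is in the linked branch; purely as an illustration of that approach (names and index parameters here are guesses, not the real definition), it would look roughly like this:

CREATE TABLE jaeger_index_v2 (
  timestamp DateTime64(6),
  traceID FixedString(16),
  service LowCardinality(String),
  operation LowCardinality(String),
  durationUs UInt64,
  tags Array(String),  -- flattened 'key=value' strings, e.g. ['hostname=C02TV431HV2Q', 'ip=192.168.1.43']
  INDEX tags_bloom tags TYPE bloom_filter(0.01) GRANULARITY 64
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (timestamp, service, operation);

-- Tag searches then become membership checks that the bloom filter index can skip granules for:
-- SELECT traceID FROM jaeger_index_v2 WHERE service = 'jaeger-query' AND has(tags, 'hostname=C02TV431HV2Q');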

I've tried combining span storage and indexing into one table, but that proved to be very detrimental to the performance of lookups by trace ID. It compressed somewhat better, though.

If anyone wants to give it a spin, please be my guest:

It feels a lot snappier than Elasticsearch, and it's at least 2.5x more compact (4.4x if you compare to Elasticsearch behaviour out of the box).

@levonet
Member

levonet commented Jun 26, 2020

@bobrik What is holding up a pull request?
Are you planning to do more optimization?

@bobrik
Contributor

bobrik commented Jun 28, 2020

There were some issues with ClickHouse that prevented me from getting reasonable latency for the amount of data we have.

This is the main one:

See also:

There are also quite a few local changes that I haven't pushed yet.

I'll see if I can carve out some more time for this to push the latest changes, but no promises on ETA.

@levonet
Member

levonet commented Jul 1, 2020

Another question.

Maybe it makes sense to split the settings of the jaeger_index_v2 and jaeger_spans_v2 tables into separate read and write paths?
This would create tables on the cluster as follows:

  • insert -> jaeger_*_v2_buffer -> jaeger_*_v2_distributed -> jaeger_*_v2
  • select -> jaeger_*_v2_distributed -> jaeger_*_v2

This will take the load off Jaeger when it collects batches (no need to make large batches under high load); a sketch of such a layout is below.
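
A hedged sketch of that layout for the spans table (cluster name and Buffer parameters are placeholders, not tested values):

-- Local MergeTree table jaeger_spans_v2 exists on every shard, as before.

-- Cluster-wide fan-out for reads and distributed writes:
CREATE TABLE jaeger_spans_v2_distributed AS jaeger_spans_v2
ENGINE = Distributed(my_cluster, jaeger, jaeger_spans_v2, rand());

-- Write-side buffer that absorbs small inserts and flushes them in batches:
CREATE TABLE jaeger_spans_v2_buffer AS jaeger_spans_v2
ENGINE = Buffer(jaeger, jaeger_spans_v2_distributed,
                16,                    -- num_layers
                10, 100,               -- min_time, max_time (seconds)
                10000, 1000000,        -- min_rows, max_rows
                10000000, 100000000);  -- min_bytes, max_bytes

-- Collectors insert into jaeger_spans_v2_buffer; jaeger-query selects from jaeger_spans_v2_distributed.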

@levonet
Member

levonet commented Aug 26, 2020

It seems that all the necessary changes have been made in the latest versions of ClickHouse.
@bobrik Are you still working on the plugin?

@bobrik
Contributor

bobrik commented Aug 27, 2020

Yes and no. The code is pretty solid and it's been running on a single-node Clickhouse for the last month, but I'm waiting for internal cluster provisioning to finish so I can test it more broadly with real humans making queries. My plan is to add more tests and make a PR upstream after that.

The latest code is here:

@bzon

bzon commented Aug 27, 2020

@bobrik Impressive. I can test this on my clickhouse cluster. A few questions:

  • How is the performance compared to previous backend solutions you tried?
  • How to configure the jaeger collector to use clickhouse?
  • Will jaeger collector create the tables?

@bobrik
Contributor

bobrik commented Aug 29, 2020

How is the performance compared to previous backend solutions you tried?

First, here's how the stock Elasticsearch backend compares to our modified one (2x replication in both cases):

$ curl -s 'https://foo.baz/_cat/indices/jaeger-span-2020-05-20?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri  docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-05-20  38 81604373583      8.9tb          17.1gb                         38gb                    0b       17.3gb

$ curl 'https://foo.bar/_cat/indices/jaeger-span-2020-05-20?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-05-20  38  8406585575      4.8tb         192.1mb                           0b                    0b      192.1mb

We don't use nested docs and sort the index, which:

  • Removes an issue when shards run out of docs (2147483519 is the limit)
  • Disk space usage is 2x more efficient, 312 bytes per span
  • Index memory usage is down from almost 17.3GiB to just 0.2GiB
  • Bitset memory usage is down from 38GiB to just 0B

As you can see, this was back in May. Now we can compare improved Elasticsearch to Clickhouse:

$ curl 'https://foo.bar/_cat/indices/jaeger-span-2020-08-28?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri  docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-08-28  38 13723317165      8.1tb         331.2mb                           0b                    0b      916.1mb
SELECT
       table,
       partition,
       sum(marks) AS marks,
       sum(rows) AS rows,
       formatReadableSize(sum(data_compressed_bytes)) AS compressed_bytes,
       formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_bytes,
       toDecimal64(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
       formatReadableSize(sum(data_compressed_bytes) / rows) AS bytes_per_row,
       formatReadableSize(sum(primary_key_bytes_in_memory)) AS pk_in_memory
  FROM system.parts
 WHERE (table IN ('jaeger_index_v2', 'jaeger_spans_v2', 'jaeger_archive_spans_v2', '.inner.jaeger_operations_v2'))
   AND active
   AND partition = '2020-08-28'
 GROUP BY table, partition
 ORDER BY table ASC, partition ASC
┌─table───────────┬─partition──┬────marks─┬────────rows─┬─compressed_bytes─┬─uncompressed_bytes─┬─compression_ratio─┬─bytes_per_row─┬─pk_in_memory─┐
│ jaeger_index_v2 │ 2020-08-28 │ 13401691 │ 13723301235 │ 294.31 GiB       │ 2.75 TiB           │              9.56 │ 23.03 B       │ 115.04 MiB   │
│ jaeger_spans_v2 │ 2020-08-28 │ 13401693 │ 13723301235 │ 757.75 GiB       │ 4.55 TiB           │              6.14 │ 59.29 B       │ 358.88 MiB   │
└─────────────────┴────────────┴──────────┴─────────────┴──────────────────┴────────────────────┴───────────────────┴───────────────┴──────────────┘
  • Disk usage is down from 4.0TiB to 1.0TiB, 4x (bytes per span is down from 324 to 82)
  • Memory usage is roughly the same
  • Search performance is roughly the same, but I'm comparing a single Clickhouse node to 48 Elasticsearch nodes
  • Trace lookup is instantaneous, since it's a primary key lookup

How to configure the jaeger collector to use clickhouse?

Set the SPAN_STORAGE_TYPE=clickhouse env variable and point --clickhouse.datasource to your Clickhouse database URL, for example: tcp://localhost:9000?database=jaeger.

Will jaeger collector create the tables?

No; there are many ways to organize tables and you may want to tweak table settings yourself, which is why I only provide an example starting schema. This is also part of the reason I want to test on our production cluster rather than on a single node before I make a PR.

Please consult the docs: https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse#schema

Keep in mind that you need at least Clickhouse v20.7.1.4189 to have reasonable performance.

@mcarbonneaux

mcarbonneaux commented Oct 6, 2020

It's very promising!
Have you tried using chproxy for caching requests?
Or tricksterproxy, which is optimized for requests by time range…

@levonet
Member

levonet commented Oct 6, 2020

Neither proxy is suitable, because both use the HTTP protocol.
I'm now thinking about starting a collector in front of each Clickhouse node, and defining on each agent a list of all collectors at deployment time.
Otherwise, we would have to do TCP balancing.

@mcarbonneaux

mcarbonneaux commented Oct 8, 2020

Neither proxy is suitable, because both use the HTTP protocol.

Ah, your backend uses the native TCP protocol of Clickhouse!

Yes, you use https://github.com/ClickHouse/clickhouse-go, which uses the native protocol!

@Slach

Slach commented Feb 3, 2021

@jkowall why do you think ClickHouse doesn't have scale-out? Near-linear scalability is available in ClickHouse itself by design.

Just add shards with hosts to <remote_servers> in /etc/clickhouse-server/config.d/remote_servers.xml
and run CREATE TABLE ... ON CLUSTER ... ENGINE = Distributed(cluster_name, db, table, sharding_key_expression);

After that you can insert into any host and the data will be spread via the sharding key, or you can use chproxy / haproxy to ingest data directly into the MergeTree tables.

You can also read from all servers, with smarter aggregation, through the Distributed table.
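
As a hedged illustration of that setup (cluster name and sharding key are placeholders), the Distributed wrapper over the spans table could be created like this:

CREATE TABLE jaeger.jaeger_spans_distributed ON CLUSTER my_cluster
AS jaeger.jaeger_spans
ENGINE = Distributed(my_cluster, jaeger, jaeger_spans, cityHash64(traceID));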

@jkowall
Contributor

jkowall commented Feb 3, 2021

@Slach yes I saw that, but if you have hundreds of nodes it can be an issue I would assume. We run thousands of nodes of ES. Bigger issue would be supporting ClickHouse in Kibana, but that's for another project :)

@bobrik
Contributor

bobrik commented Feb 3, 2021

@mcspring I'm not actively working on it at the moment, since it doesn't require any maintenance from me. Feel free to submit a PR and I'll get to it eventually.

@Slach

Slach commented Feb 4, 2021

@jkowall try to use https://github.com/Altinity/clickhouse-operator/

@otisg

otisg commented Mar 26, 2021

It is unlikely that we'll add any other storage mechanism to the core repository in the near future. What will probably happen is that we'll change the architecture of how this works for Jaeger v2, making it easier to plug your own storage plugin.

@jpkrohling is this still the case?
#638 is still open. Are you saying this should really be closed because a completely different approach is being pursued?

making it easier to plug your own storage plugin.

I tried finding an issue for this, but could not find it. Is there an issue for this? Thanks!

@yurishkuro
Member

@otisg I don't know about the issue, but the approach is that we're rebuilding the Jaeger backend on top of the OpenTelemetry collector, which has a mechanism for combining various packages (even from different repos) at build time without explicit code dependencies in the core. That means storage implementations can directly implement the storage interface and don't have to go through the much less efficient grpc-plugin interface.

@jpkrohling
Contributor

I don't think we have an issue tracking this specific feature, but we know it will be there for v2, which is being tracked here: https://github.com/jaegertracing/jaeger/milestone/15

@chhetripradeep

chhetripradeep commented Jun 3, 2021

Based on personal experience, I would like to recommend adding Clickhouse as one of the core storage options for Jaeger (like Elasticsearch and Cassandra); if not, we should probably port Ivan's work to a gRPC plugin.

There are a few benefits I can think of:

  • Clickhouse is a lot better at disk I/O compared to ES. Clickhouse scales very well with magnetic disks, while scaling ES requires SSDs.
  • Clickhouse has very good data compression compared to ES.

Uber's migration of their logging pipeline from ES to clickhouse[0] is a very good example of clickhouse performance.

[0] - https://eng.uber.com/logging/

@levonet
Member

levonet commented Jun 7, 2021

I also like the idea of adding the plugin as one of the core storage options for Jaeger.
This would allow organically adding Clickhouse to the existing jaeger-operator and helm charts.

We have been using Ivan's plugin for half a year. Using Clickhouse as storage is almost invisible in terms of infrastructure resources. Docker images with an updated version of Jaeger and a startup example can be found at
https://github.com/levonet/docker-jaeger if anyone is interested.

@pavolloffay
Member

pavolloffay commented Jul 14, 2021

I have migrated https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse to a storage plugin. The source code is available here: https://github.com/pavolloffay/jaeger-clickhouse.

For now I am not planning on maintaining it; I built it for our internal purposes. However, if there is interest and somebody would like to help maintain it, I am happy to chat about it.

The repository contains instructions on how to run it, and the code was tested locally (see the readme). When I run the code on more data I get Too many simultaneous queries. Maximum: 100, see https://github.com/pavolloffay/jaeger-clickhouse/issues/2. It might just be a DB configuration issue.

@Slach

Slach commented Jul 15, 2021

@pavolloffay thanks a lot for your great efforts ;) I hope you will make ClickHouse a first-class-citizen storage in Jaeger

@pavolloffay
Member

Folks, did you have to increase the maximum number of queries in ClickHouse when running Jaeger?

<max_concurrent_queries>100</max_concurrent_queries>

I am getting DB::Exception: Too much simultaneous queries. Maximum: 100 when running the default ClickHouse image.

@bobrik / @Slach what does your Jaeger deployment look like? Are you using Kafka as well?

@bobrik
Contributor

bobrik commented Jul 16, 2021

All ingress in our deployment comes from Kafka with large batches (10k to 100k). We have 512 as the concurrent query limit, but the metrics show under 20 concurrent queries per node in a 3 node cluster.

@EinKrebs
Member

EinKrebs commented Jul 22, 2021

I pushed 100K spans from tracegen through this with a local Clickhouse in a Docker container with stock settings, and here's what storage looks like:

SELECT
    table,
    sum(marks) AS marks,
    sum(rows) AS rows,
    sum(bytes_on_disk) AS bytes_on_disk,
    sum(data_compressed_bytes) AS data_compressed_bytes,
    sum(data_uncompressed_bytes) AS data_uncompressed_bytes,
    toDecimal64(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio,
    toDecimal64(data_compressed_bytes / rows, 2) AS compressed_bytes_per_row
FROM system.parts
WHERE table LIKE 'jaeger_%'
GROUP BY table
ORDER BY table ASC

┌─table────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index │    16 │ 106667 │       2121539 │               2110986 │                22678493 │             10.74 │                    19.79 │
│ jaeger_spans │    20 │ 106667 │       5634663 │               5632817 │                37112272 │              6.58 │                    52.80 │
└──────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

@bobrik Could you please tell me how you used tracegen for this? I can't figure out how to push the generated spans either to Jaeger or to Clickhouse.

@jpkrohling
Contributor

Are you having trouble running tracegen itself? It should send data via UDP to a local agent, which would be connected to a remote collector (unless using a local all-in-one).

https://www.jaegertracing.io/docs/1.24/tools/#tracegen

@EinKrebs
Member

I got it working now, thank you!

@levonet
Member

levonet commented Jul 23, 2021

@pavolloffay Here is an example of tables with buffers.
https://github.com/levonet/docker-jaeger/tree/master/examples/cluster/sql
There is also an example deployment of a Clickhouse cluster and a Jaeger deployment.
This scheme can withstand ~100K spans per second; we have about 20 agents and 2 collectors in one environment.

@pavolloffay
Member

Folks, I have moved the clickhouse plugin to the jaegertracing org: https://github.com/jaegertracing/jaeger-clickhouse

Note that support depends only on the community; it's not officially supported. However, the project is in good shape thanks to @EinKrebs's excellent work.

Please move any further discussion to that repository.

@pavolloffay
Member

Also note that the latest version of the Jaeger operator supports the storage grpc-plugin.

@wangpu666

The code I copied also sets SPAN_STORAGE_TYPE=clickhouse, but it still reports an error.

Use "jaeger-ingester [command] --help" for more information about a command.
unknown flag: --clickhouse.datasource

@EinKrebs
Member

The code I copied also sets SPAN_STORAGE_TYPE=clickhouse, but it still reports an error.

Use "jaeger-ingester [command] --help" for more information about a command.
unknown flag: --clickhouse.datasource

Can you tell us exactly which code?

@nkz-soft

@EinKrebs @pavolloffay
Is this project still being maintained? The last release was over a year ago and there are many important PRs.

@EinKrebs
Member

@EinKrebs @pavolloffay Is this project still being maintained? The last release was over a year ago and there are many important PRs.

I don't know about jaeger-clickhouse; I personally have not maintained it for 2 years now.

@selfing12

@pavolloffay Is this project still being maintained?

@selfing12

@yurishkuro Is there any replacement for the grpc-plugin now for using Clickhouse as a backend, or is it possible to somehow configure jaeger-query to work with Clickhouse?

@chenlujjj

@selfing12 maybe this discussion can help you
https://github.com/orgs/jaegertracing/discussions/5851
I used the plugin with jaeger-query and jaeger-collector v1.57 successfully

@selfing12

@selfing12 maybe this discussion can help you https://github.com/orgs/jaegertracing/discussions/5851 I used the plugin with jaeger-query and jaeger-collector v1.57 successfully

Yes, as I understand it, this is the last version in which this combination works. What should we do with 1.60+?
