
ClickHouse as a storage backend #1438

Closed
sboisson opened this issue Mar 21, 2019 · 67 comments

Comments

@sboisson

ClickHouse, an open-source column-oriented DBMS designed initially for real-time analytics and mostly write-once/read-many big-data use cases, can be used as very efficient log and trace storage.

Meta issue: #638 Additional storage backends

@bzon

bzon commented Oct 23, 2019

Is anyone working on this? My team or I could take a shot at it.

@Slach

Slach commented Oct 23, 2019

As far as I know, no one is working on this.

@Slach

Slach commented Oct 23, 2019

@bzon I can join you as a tester

@bzon

bzon commented Oct 23, 2019

@bzon I can join you as a tester

Sure!

@sboisson
Author

Happy to see someone working on this :)
Might be able to join as a tester

@yurishkuro
Member

@bzon would be good if you post an architecture here, specifically how you would lay out the data, ingestion, etc. To my knowledge, clickhouse requires batched writes, and it may even be up to you to decide which node to send the writes to, so there are many questions. It may require some benchmarking to find the optimal design.

@bzon

bzon commented Oct 23, 2019

@yurishkuro at the moment we have zero knowledge of Jaeger's internals. I think benchmarking should be the first step to see if this integration is feasible. With that said, the first requirement should be creating the right table schema.

@sboisson
Author

I think the architecture of the project https://github.com/flant/loghouse could be a source of inspiration…

@sboisson
Author

This webinar could be interesting: A Practical Introduction to Handling Log Data in ClickHouse

@bobrik
Contributor

bobrik commented May 10, 2020

I took a stab at it (very early WIP):

This is the schema I used:

  • Index table for fast searches (haven't measured if indices are useful yet)
CREATE TABLE jaeger_index (
  timestamp DateTime64(6),
  traceID FixedString(16),
  service LowCardinality(String),
  operation LowCardinality(String),
  durationUs UInt64,
  tags Nested(
    key LowCardinality(String),
    valueString LowCardinality(String),
    valueBool UInt8,
    valueInt Int64,
    valueFloat Float64
  ),
  INDEX tags_strings (tags.key, tags.valueString) TYPE set(0) GRANULARITY 64,
  INDEX tags_ints (tags.key, tags.valueInt) TYPE set(0) GRANULARITY 64
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (timestamp, service, operation);
  • Data table for span storage and quick retrieval by traceID
CREATE TABLE jaeger_spans (
  timestamp DateTime64(6),
  traceID FixedString(16),
  model String
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY traceID;

You probably need Clickhouse 20.x for DateTime64; I used 20.1.11.73.
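
For reference, here's a hedged sketch of what writing a span into these two tables could look like; nested tags columns are inserted as parallel arrays, one element per tag, and all values below are illustrative:

INSERT INTO jaeger_spans (timestamp, traceID, model) VALUES
  ('2020-05-10 20:43:23.248314', '212e5c616f4b9c2f', '<serialized span, JSON or protobuf>');

INSERT INTO jaeger_index
  (timestamp, traceID, service, operation, durationUs,
   `tags.key`, `tags.valueString`, `tags.valueBool`, `tags.valueInt`, `tags.valueFloat`)
VALUES
  ('2020-05-10 20:43:23.248314', '212e5c616f4b9c2f', 'jaeger-query', 'getTraces', 131605,
   ['internal.span.format', 'num_trace_ids'], ['proto', ''], [0, 0], [0, 13], [0, 0]);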

Index table looks like this:

SELECT *
FROM jaeger_index
ARRAY JOIN tags
ORDER BY timestamp DESC
LIMIT 20
FORMAT PrettyCompactMonoBlock
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation────┬─durationUs─┬─tags.key─────────────┬─tags.valueString─┬─tags.valueBool─┬─tags.valueInt─┬─tags.valueFloat─┐
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ num_trace_ids        │                  │              0 │            13 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ weird                │                  │              1 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ π                    │                  │              0 │             0 │            3.14 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.248314 │ 212e5c616f4b9c2f │ jaeger-query │ getTraces    │     131605 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065577 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraceIDs │     182728 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065574 │ 212e5c616f4b9c2f │ jaeger-query │ FindTraces   │     314349 │ client-uuid          │ 7fc8f98ddbcd358c │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065535 │ 212e5c616f4b9c2f │ jaeger-query │ /api/traces  │     315554 │ sampler.type         │ const            │              0 │             0 │               0 │
│ 2020-05-10 20:43:23.065535 │ 212e5c616f4b9c2f │ jaeger-query │ /api/traces  │     315554 │ sampler.param        │                  │              1 │             0 │               0 │
└────────────────────────────┴──────────────────┴──────────────┴──────────────┴────────────┴──────────────────────┴──────────────────┴────────────────┴───────────────┴─────────────────┘

Tags are stored in their original types, so with enough SQL-fu you can find all spans with response size between X and Y bytes, for example.
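
For instance, a sketch of such a range search (the tag name http.response_size is hypothetical, just to illustrate the typed lookup):

SELECT DISTINCT traceID
FROM jaeger_index
ARRAY JOIN tags
WHERE tags.key = 'http.response_size'
  AND tags.valueInt BETWEEN 1024 AND 1048576
  AND timestamp >= now() - INTERVAL 1 HOUR;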

The layout of the query is different from Elasticsearch, since now you have all tags for the trace laid out in a single view. This means that if you search for all operations of some service, you will get a cross-span result where one tag can match one span and another tag can match another span. Consider the following trace:

+ upstream (tags: {"host": "foo.bar"})
++ upstream_ttfb (tags: {"status": 200})
++ upstream_download (tags: {"error": true})

You can search for host=foo.bar status=200 across all operations and this trace will be found, even though no single span has both tags. This seems like a really nice upside.
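
One way to express that cross-span search against this schema (a sketch, not necessarily how the plugin builds its queries) is to group the flattened tag rows back by trace:

SELECT traceID
FROM jaeger_index
ARRAY JOIN tags
WHERE (tags.key = 'host' AND tags.valueString = 'foo.bar')
   OR (tags.key = 'status' AND tags.valueInt = 200)
GROUP BY traceID
HAVING uniqExact(tags.key) = 2;  -- both tags seen somewhere in the trace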

There's support for both JSON and Protobuf storage. The former allows out-of-band queries, since Clickhouse supports JSON functions. The latter is much more compact.
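
As an example of such an out-of-band query (assuming JSON storage and Jaeger's JSON field name operationName), something along these lines should work:

SELECT JSONExtractString(model, 'operationName') AS operation, count() AS spans
FROM jaeger_spans
GROUP BY operation
ORDER BY spans DESC
LIMIT 10;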

I pushed 100K spans from tracegen through this with a local Clickhouse in a Docker container with stock settings, and here's what storage looks like:

SELECT
    table,
    sum(marks) AS marks,
    sum(rows) AS rows,
    sum(bytes_on_disk) AS bytes_on_disk,
    sum(data_compressed_bytes) AS data_compressed_bytes,
    sum(data_uncompressed_bytes) AS data_uncompressed_bytes,
    toDecimal64(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio,
    toDecimal64(data_compressed_bytes / rows, 2) AS compressed_bytes_per_row
FROM system.parts
WHERE table LIKE 'jaeger_%'
GROUP BY table
ORDER BY table ASC

┌─table────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index │    16 │ 106667 │       2121539 │               2110986 │                22678493 │             10.74 │                    19.79 │
│ jaeger_spans │    20 │ 106667 │       5634663 │               5632817 │                37112272 │              6.58 │                    52.80 │
└──────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘
SELECT
    table,
    column,
    type,
    sum(column_data_compressed_bytes) AS compressed,
    sum(column_data_uncompressed_bytes) AS uncompressed,
    toDecimal64(uncompressed / compressed, 2) AS compression_ratio,
    sum(rows) AS rows,
    toDecimal64(compressed / rows, 2) AS bytes_per_row
FROM system.parts_columns
WHERE (table LIKE 'jaeger_%') AND active
GROUP BY
    table,
    column,
    type
ORDER BY
    table ASC,
    column ASC
┌─table────────┬─column───────────┬─type──────────────────────────┬─compressed─┬─uncompressed─┬─compression_ratio─┬───rows─┬─bytes_per_row─┐
│ jaeger_index │ durationUs       │ UInt64                        │     248303 │       853336 │              3.43 │ 106667 │          2.32 │
│ jaeger_index │ operation        │ LowCardinality(String)        │       5893 │       107267 │             18.20 │ 106667 │          0.05 │
│ jaeger_index │ service          │ LowCardinality(String)        │        977 │       107086 │            109.60 │ 106667 │          0.00 │
│ jaeger_index │ tags.key         │ Array(LowCardinality(String)) │      29727 │      1811980 │             60.95 │ 106667 │          0.27 │
│ jaeger_index │ tags.valueBool   │ Array(UInt8)                  │      29063 │      1810904 │             62.30 │ 106667 │          0.27 │
│ jaeger_index │ tags.valueFloat  │ Array(Float64)                │      44762 │      8513880 │            190.20 │ 106667 │          0.41 │
│ jaeger_index │ tags.valueInt    │ Array(Int64)                  │     284393 │      8513880 │             29.93 │ 106667 │          2.66 │
│ jaeger_index │ tags.valueString │ Array(LowCardinality(String)) │      31695 │      1814416 │             57.24 │ 106667 │          0.29 │
│ jaeger_index │ timestamp        │ DateTime64(6)                 │     431835 │       853336 │              1.97 │ 106667 │          4.04 │
│ jaeger_index │ traceID          │ FixedString(16)               │    1063375 │      1706672 │              1.60 │ 106667 │          9.96 │
│ jaeger_spans │ model            │ String                        │    4264180 │     34552264 │              8.10 │ 106667 │         39.97 │
│ jaeger_spans │ timestamp        │ DateTime64(6)                 │     463444 │       853336 │              1.84 │ 106667 │          4.34 │
│ jaeger_spans │ traceID          │ FixedString(16)               │     905193 │      1706672 │              1.88 │ 106667 │          8.48 │
└──────────────┴──────────────────┴───────────────────────────────┴────────────┴──────────────┴───────────────────┴────────┴───────────────┘

We have around 74B daily docs in our production Elasticsearch storage. My plan is to switch that to fields-as-tags, remove indexing of non-queried fields (logs, nested tags, references), then switch to a sorted index and then see how Clickhouse compares to that for the same spans.

@yurishkuro
Member

@bobrik very interesting, thanks for sharing. I am curious what the performance for retrieving by trace ID would be like.

Q: why do you use LowCardinality(String) for tags? Some tags can be very high cardinality, e.g. URLs.

You can search for host=foo.bar status=200 across all operations and this trace will be found, even though no single span has both tags. This seems like a really nice upside.

I'm confused why this would be the case. Doesn't CH evaluate the query in full against each row (i.e. each span)? Or is this because of how your plugin interacts with CH?

@Slach

Slach commented May 11, 2020

@yurishkuro even at 100k distinct values per block, LowCardinality(String) with dictionary-based encoding will do better than plain String

@bobrik
Contributor

bobrik commented May 11, 2020

@yurishkuro retrieving by trace ID is very fast, since you're doing a primary key lookup.
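
For example, fetching a whole trace is just a lookup on the spans table's ORDER BY key (trace ID taken from the sample output below):

SELECT model
FROM jaeger_spans
WHERE traceID = '3adb641936b21d98';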

ClickHouse/ClickHouse#4074 (comment) says this about LowCardinality:

Rule of thumb: it should be beneficial if the number of distinct values is less than a few million.

That said, the schema is in no way final.

I'm confused why this would be the case. Doesn't CH evaluate the query in full against each row (i.e. each span)? Or is this because of how your plugin interacts with CH?

A row is not a span, it's a span/key/value combination. That's the key.

Take a look at the output of this query, which is equivalent to what I do:

SELECT *
FROM jaeger_index
ARRAY JOIN tags
ORDER BY timestamp DESC
LIMIT 20
FORMAT PrettyCompactMonoBlock
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation─────┬─durationUs─┬─tags.key─────────────┬─tags.valueString─┬─tags.valueBool─┬─tags.valueInt─┬─tags.valueFloat─┐
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ num_trace_ids        │                  │              0 │            20 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ weird                │                  │              1 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ π                    │                  │              0 │             0 │            3.14 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ hostname             │ C02TV431HV2Q     │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ ip                   │ 192.168.1.43     │              0 │             0 │               0 │
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces     │    3414327 │ client-uuid          │ 3f9574079594605c │              0 │             0 │               0 │
│ 2020-05-11 04:53:45.723921 │ 700a1bff0bdf3141 │ jaeger-query │ GetOperations │    2268055 │ internal.span.format │ proto            │              0 │             0 │               0 │
│ 2020-05-11 04:53:45.723921 │ 700a1bff0bdf3141 │ jaeger-query │ GetOperations │    2268055 │ jaeger.version       │ Go-2.22.1        │              0 │             0 │               0 │
└────────────────────────────┴──────────────────┴──────────────┴───────────────┴────────────┴──────────────────────┴──────────────────┴────────────────┴───────────────┴─────────────────┘

Here we join traceID with the nested tags, repeating every tag key/value with the corresponding trace.

Operation getTraces does not happen multiple times in trace 3adb641936b21d98, but our query makes a separate row for each of its tags, so cross-span tag searches naturally work out of the box.

Compare this to how it looks at the span level (this is what Elasticsearch works with):

SELECT *
FROM jaeger_index
WHERE (traceID = '3adb641936b21d98') AND (service = 'jaeger-query') AND (operation = 'getTraces')
┌──────────────────timestamp─┬─traceID──────────┬─service──────┬─operation─┬─durationUs─┬─tags.key────────────────────────────────────────────────────────────────────────────────────────────┬─tags.valueString────────────────────────────────────────────────────────────────┬─tags.valueBool────┬─tags.valueInt──────┬─tags.valueFloat──────┐
│ 2020-05-11 04:53:54.299067 │ 3adb641936b21d98 │ jaeger-query │ getTraces │    3414327 │ ['num_trace_ids','weird','π','internal.span.format','jaeger.version','hostname','ip','client-uuid'] │ ['','','','proto','Go-2.22.1','C02TV431HV2Q','192.168.1.43','3f9574079594605c'] │ [0,1,0,0,0,0,0,0] │ [20,0,0,0,0,0,0,0] │ [0,0,3.14,0,0,0,0,0] │
└────────────────────────────┴──────────────────┴──────────────┴───────────┴────────────┴───────────────────────────────────────────────

Clickhouse doesn't allow direct lookups against nested fields like tags.ip=192.168.1.43; instead you have to do an ARRAY JOIN, which results in this new property.

@bobrik
Contributor

bobrik commented May 11, 2020

I think I might be more confused about tag queries than I initially realized, so please take my explanation with a big grain of salt.

@bobrik
Contributor

bobrik commented May 13, 2020

Turns out that having sparse tags with nested fields is not great for searches when you have a lot of spans, so I went back to an array of key=value strings with a bloom filter index on top.
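
The actual schema is in the linked branch; purely as an illustration of that approach (names and index parameters here are guesses, not the real definition), it would look roughly like this:

CREATE TABLE jaeger_index_v2 (
  timestamp DateTime64(6),
  traceID FixedString(16),
  service LowCardinality(String),
  operation LowCardinality(String),
  durationUs UInt64,
  tags Array(String),  -- flattened 'key=value' strings, e.g. ['hostname=C02TV431HV2Q', 'ip=192.168.1.43']
  INDEX tags_bloom tags TYPE bloom_filter(0.01) GRANULARITY 64
) ENGINE MergeTree() PARTITION BY toDate(timestamp) ORDER BY (timestamp, service, operation);

-- Tag searches then become membership checks that the bloom filter index can skip granules for:
-- SELECT traceID FROM jaeger_index_v2 WHERE service = 'jaeger-query' AND has(tags, 'hostname=C02TV431HV2Q');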

I've tried combining span storage and indexing into one table, but that proved to be very detrimental to the performance of lookups by trace ID. It compressed somewhat better, though.

If anyone wants to give it a spin, please be my guest:

It feels a lot snappier than Elasticsearch, and it's at least 2.5x more compact (4.4x if you compare to Elasticsearch behaviour out of the box).

@levonet
Member

levonet commented Jun 26, 2020

@bobrik What is holding up a pull request?
Are you planning to do more optimization?

@bobrik
Contributor

bobrik commented Jun 28, 2020

There were some issues with ClickHouse that prevented me from getting reasonable latency for the amount of data we have.

This is the main one:

See also:

There are also quite a few local changes that I haven't pushed yet.

I'll see if I can carve out some more time for this to push the latest changes, but no promises on ETA.

@levonet
Member

levonet commented Jul 1, 2020

Another question.

Maybe it makes sense to split the settings of the jaeger_index_v2 and jaeger_spans_v2 tables into separate read and write paths?
This would create tables on the cluster as follows:

  • insert -> jaeger_*_v2_buffer -> jaeger_*_v2_distributed -> jaeger_*_v2
  • select -> jaeger_*_v2_distributed -> jaeger_*_v2

This will take the load off Jaeger when it collects batches (no need to make large batches under high load); a sketch of such a layout is below.
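
A hedged sketch of that layout for the spans table (cluster name and Buffer parameters are placeholders, not tested values):

-- Local MergeTree table jaeger_spans_v2 exists on every shard, as before.

-- Cluster-wide fan-out for reads and distributed writes:
CREATE TABLE jaeger_spans_v2_distributed AS jaeger_spans_v2
ENGINE = Distributed(my_cluster, jaeger, jaeger_spans_v2, rand());

-- Write-side buffer that absorbs small inserts and flushes them in batches:
CREATE TABLE jaeger_spans_v2_buffer AS jaeger_spans_v2
ENGINE = Buffer(jaeger, jaeger_spans_v2_distributed,
                16,                    -- num_layers
                10, 100,               -- min_time, max_time (seconds)
                10000, 1000000,        -- min_rows, max_rows
                10000000, 100000000);  -- min_bytes, max_bytes

-- Collectors insert into jaeger_spans_v2_buffer; jaeger-query selects from jaeger_spans_v2_distributed.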

@levonet
Member

levonet commented Aug 26, 2020

It seems that all the necessary changes have been made in the latest versions of ClickHouse.
@bobrik Are you still working on the plugin?

@bobrik
Contributor

bobrik commented Aug 27, 2020

Yes and no. The code is pretty solid and it's been running on a single-node Clickhouse for the last month, but I'm waiting for internal cluster provisioning to finish so I can test it more broadly with real humans making queries. My plan is to add more tests and make a PR upstream after that.

The latest code is here:

@bzon

bzon commented Aug 27, 2020

@bobrik Impressive. I can test this on my clickhouse cluster. A few questions:

  • How is the performance compared to previous backend solutions you tried?
  • How to configure the jaeger collector to use clickhouse?
  • Will jaeger collector create the tables?

@bobrik
Contributor

bobrik commented Aug 29, 2020

How is the performance compared to previous backend solutions you tried?

First, here's how the stock Elasticsearch backend compares to our modified one (2x replication in both cases):

$ curl -s 'https://foo.baz/_cat/indices/jaeger-span-2020-05-20?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri  docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-05-20  38 81604373583      8.9tb          17.1gb                         38gb                    0b       17.3gb

$ curl 'https://foo.bar/_cat/indices/jaeger-span-2020-05-20?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-05-20  38  8406585575      4.8tb         192.1mb                           0b                    0b      192.1mb

We don't use nested docs and sort the index, which:

  • Removes an issue when shards run out of docs (2147483519 is the limit)
  • Disk space usage is 2x more efficient, 312 bytes per span
  • Index memory usage is down from almost 17.3GiB to just 0.2GiB
  • Bitset memory usage is down from 38GiB to just 0B

As you can see, this was back in May. Now we can compare improved Elasticsearch to Clickhouse:

$ curl 'https://foo.bar/_cat/indices/jaeger-span-2020-08-28?s=index&v&h=index,pri,docs.count,store.size,segments.memory,segments.fixed_bitset_memory,fielddata.memory_size,memory.total'
index                  pri  docs.count store.size segments.memory segments.fixed_bitset_memory fielddata.memory_size memory.total
jaeger-span-2020-08-28  38 13723317165      8.1tb         331.2mb                           0b                    0b      916.1mb
SELECT
       table,
       partition,
       sum(marks) AS marks,
       sum(rows) AS rows,
       formatReadableSize(sum(data_compressed_bytes)) AS compressed_bytes,
       formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_bytes,
       toDecimal64(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
       formatReadableSize(sum(data_compressed_bytes) / rows) AS bytes_per_row,
       formatReadableSize(sum(primary_key_bytes_in_memory)) AS pk_in_memory
  FROM system.parts
 WHERE (table IN ('jaeger_index_v2', 'jaeger_spans_v2', 'jaeger_archive_spans_v2', '.inner.jaeger_operations_v2'))
   AND active
   AND partition = '2020-08-28'
 GROUP BY table, partition
 ORDER BY table ASC, partition ASC
┌─table───────────┬─partition──┬────marks─┬────────rows─┬─compressed_bytes─┬─uncompressed_bytes─┬─compression_ratio─┬─bytes_per_row─┬─pk_in_memory─┐
│ jaeger_index_v2 │ 2020-08-28 │ 13401691 │ 13723301235 │ 294.31 GiB       │ 2.75 TiB           │              9.56 │ 23.03 B       │ 115.04 MiB   │
│ jaeger_spans_v2 │ 2020-08-28 │ 13401693 │ 13723301235 │ 757.75 GiB       │ 4.55 TiB           │              6.14 │ 59.29 B       │ 358.88 MiB   │
└─────────────────┴────────────┴──────────┴─────────────┴──────────────────┴────────────────────┴───────────────────┴───────────────┴──────────────┘
  • Disk usage is down from 4.0TiB to 1.0TiB, 4x (bytes per span is down from 324 to 82)
  • Memory usage is roughly the same
  • Search performance is roughly the same, but I'm comparing a single Clickhouse node to 48 Elasticsearch nodes
  • Trace lookup is instantaneous, since it's a primary key lookup

How to configure the jaeger collector to use clickhouse?

Set the SPAN_STORAGE_TYPE=clickhouse env variable and point --clickhouse.datasource to your Clickhouse database URL, for example: tcp://localhost:9000?database=jaeger.

Will jaeger collector create the tables?

No; there are many ways to organize tables and you may want to tweak table settings yourself, which is why I only provide an example starting schema. This is also part of the reason I want to test on our production cluster rather than on a single node before I make a PR.

Please consult the docs: https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse#schema

Keep in mind that you need at least Clickhouse v20.7.1.4189 to have reasonable performance.

@mcarbonneaux

mcarbonneaux commented Oct 6, 2020

It's very promising!
Have you tried using chproxy for caching requests?
Or tricksterproxy, which is optimized for requests by time range…

@levonet
Member

levonet commented Oct 6, 2020

Neither proxy is suitable, because both use the HTTP protocol.
I'm now thinking about starting a collector in front of each Clickhouse node, and defining on each agent a list of all collectors at deployment time.
Otherwise, we would have to do TCP balancing.

@mcarbonneaux

mcarbonneaux commented Oct 8, 2020

Neither proxy is suitable, because both use the HTTP protocol.

Ah, your backend uses the native TCP protocol of Clickhouse!

Yes, you use https://github.com/ClickHouse/clickhouse-go, which uses the native protocol!

@Slach

Slach commented Feb 3, 2021

@jkowall why do you think ClickHouse doesn't have scale-out? Near-linear scalability is available in ClickHouse itself by design.

Just add shards with hosts to <remote_servers> in /etc/clickhouse-server/config.d/remote_servers.xml
and run CREATE TABLE ... ON CLUSTER ... ENGINE = Distributed(cluster_name, db, table, sharding_key_expression);

After that you can insert into any host and the data will be spread via the sharding key, or you can use chproxy / haproxy to ingest data directly into the MergeTree tables.

You can also read from all servers, with smarter aggregation, through the Distributed table.
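
As a hedged illustration of that setup (cluster name and sharding key are placeholders), the Distributed wrapper over the spans table could be created like this:

CREATE TABLE jaeger.jaeger_spans_distributed ON CLUSTER my_cluster
AS jaeger.jaeger_spans
ENGINE = Distributed(my_cluster, jaeger, jaeger_spans, cityHash64(traceID));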

@jkowall
Contributor

jkowall commented Feb 3, 2021

@Slach yes I saw that, but if you have hundreds of nodes it can be an issue I would assume. We run thousands of nodes of ES. Bigger issue would be supporting ClickHouse in Kibana, but that's for another project :)

@bobrik
Contributor

bobrik commented Feb 3, 2021

@mcspring I'm not actively working on it at the moment, since it doesn't require any maintenance from me. Feel free to submit a PR and I'll get to it eventually.

@Slach

Slach commented Feb 4, 2021

@jkowall try to use https://github.com/Altinity/clickhouse-operator/

@otisg

otisg commented Mar 26, 2021

It is unlikely that we'll add any other storage mechanism to the core repository in the near future. What will probably happen is that we'll change the architecture of how this works for Jaeger v2, making it easier to plug your own storage plugin.

@jpkrohling is this still the case?
#638 is still open. Are you saying this should really be closed because a completely different approach is being pursued?

making it easier to plug your own storage plugin.

I tried finding an issue for this, but could not find it. Is there an issue for this? Thanks!

@yurishkuro
Member

@otisg I don't know about the issue, but the approach is that we're rebuilding the Jaeger backend on top of the OpenTelemetry collector, which has a mechanism for combining various packages (even from different repos) at build time without explicit code dependencies in the core. That means storage implementations can directly implement the storage interface and don't have to go through the much less efficient grpc-plugin interface.

@jpkrohling
Contributor

I don't think we have an issue tracking this specific feature, but we know it will be there for v2, which is being tracked here: https://github.com/jaegertracing/jaeger/milestone/15

@chhetripradeep

chhetripradeep commented Jun 3, 2021

Based on personal experience, I would like to recommend adding Clickhouse as one of the core storage options for Jaeger (like Elasticsearch and Cassandra); if not, we should probably port Ivan's work to a gRPC plugin.

There are a few benefits I can think of:

  • Clickhouse is a lot better at disk I/O compared to ES. Clickhouse scales very well with magnetic disks, while scaling ES requires SSDs.
  • Clickhouse has very good data compression compared to ES.

Uber's migration of their logging pipeline from ES to clickhouse[0] is a very good example of clickhouse performance.

[0] - https://eng.uber.com/logging/

@levonet
Member

levonet commented Jun 7, 2021

I also like the idea of adding the plugin as one of the core storage options for Jaeger.
This would allow organically adding Clickhouse to the existing jaeger-operator and helm charts.

We have been using Ivan's plugin for half a year. Using Clickhouse as storage is almost invisible in terms of infrastructure resources. Docker images with an updated version of Jaeger and a startup example can be found at
https://github.com/levonet/docker-jaeger if anyone is interested.

@pavolloffay
Member

pavolloffay commented Jul 14, 2021

I have migrated https://github.com/bobrik/jaeger/tree/ivan/clickhouse/plugin/storage/clickhouse to a storage plugin. The source code is available here: https://github.com/pavolloffay/jaeger-clickhouse.

For now I am not planning on maintaining it; I built it for our internal purposes. However, if there is interest and somebody would like to help maintain it, I am happy to chat about it.

The repository contains instructions on how to run it, and the code was tested locally (see the readme). When I run the code on more data I get Too many simultaneous queries. Maximum: 100, see https://github.com/pavolloffay/jaeger-clickhouse/issues/2. It might just be a DB configuration issue.

@Slach

Slach commented Jul 15, 2021

@pavolloffay thanks a lot for your great efforts ;) I hope you will make ClickHouse a first-class-citizen storage in Jaeger

@pavolloffay
Member

Folks, did you have to increase the maximum number of queries in ClickHouse when running Jaeger?

<max_concurrent_queries>100</max_concurrent_queries>

I am getting DB::Exception: Too much simultaneous queries. Maximum: 100 when running the default ClickHouse image.

@bobrik / @Slach what does your Jaeger deployment look like? Are you using Kafka as well?

@bobrik
Contributor

bobrik commented Jul 16, 2021

All ingress in our deployment comes from Kafka with large batches (10k to 100k). We have 512 as the concurrent query limit, but the metrics show under 20 concurrent queries per node in a 3 node cluster.

@EinKrebs
Member

EinKrebs commented Jul 22, 2021

I pushed 100K spans from tracegen through this with a local Clickhouse in a Docker container with stock settings, and here's what storage looks like:

SELECT
    table,
    sum(marks) AS marks,
    sum(rows) AS rows,
    sum(bytes_on_disk) AS bytes_on_disk,
    sum(data_compressed_bytes) AS data_compressed_bytes,
    sum(data_uncompressed_bytes) AS data_uncompressed_bytes,
    toDecimal64(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio,
    toDecimal64(data_compressed_bytes / rows, 2) AS compressed_bytes_per_row
FROM system.parts
WHERE table LIKE 'jaeger_%'
GROUP BY table
ORDER BY table ASC

┌─table────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index │    16 │ 106667 │       2121539 │               2110986 │                22678493 │             10.74 │                    19.79 │
│ jaeger_spans │    20 │ 106667 │       5634663 │               5632817 │                37112272 │              6.58 │                    52.80 │
└──────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

@bobrik Could you please tell me how you used tracegen for this? I can't figure out how to push the generated spans either to Jaeger or to Clickhouse.

@jpkrohling
Contributor

Are you having trouble running tracegen itself? It should send data via UDP to a local agent, which would be connected to a remote collector (unless using a local all-in-one).

https://www.jaegertracing.io/docs/1.24/tools/#tracegen

@EinKrebs
Member

I got it working now, thank you!

@levonet
Member

levonet commented Jul 23, 2021

@pavolloffay Here is an example of tables with buffers.
https://github.com/levonet/docker-jaeger/tree/master/examples/cluster/sql
There is also an example deployment of a Clickhouse cluster and a Jaeger deployment.
This scheme can withstand ~100K spans per second; we have about 20 agents and 2 collectors in one environment.

@pavolloffay
Member

Folks, I have moved the clickhouse plugin to the jaegertracing org: https://github.com/jaegertracing/jaeger-clickhouse

Note that support depends only on the community; it's not officially supported. However, the project is in good shape thanks to @EinKrebs's excellent work.

Please move any further discussion to that repository.

@pavolloffay
Member

Also note that the latest version of the Jaeger operator supports the storage grpc-plugin.

@wangpu666

The code I copied also sets SPAN_STORAGE_TYPE=clickhouse, but it still reports an error.

Use "jaeger-ingester [command] --help" for more information about a command.
unknown flag: --clickhouse.datasource

@EinKrebs
Member

The code I copied also sets SPAN_STORAGE_TYPE=clickhouse, but it still reports an error.

Use "jaeger-ingester [command] --help" for more information about a command.
unknown flag: --clickhouse.datasource

Can you tell us exactly which code?

@nkz-soft

@EinKrebs @pavolloffay
Is this project still being maintained? The last release was over a year ago and there are many important PRs.

@EinKrebs
Member

@EinKrebs @pavolloffay Is this project still being maintained? The last release was over a year ago and there are many important PRs.

I don't know about jaeger-clickhouse; I personally have not maintained it for 2 years now.

@selfing12

@pavolloffay Is this project still being maintained?

@selfing12

@yurishkuro Is there any replacement for the grpc-plugin now for using Clickhouse as a backend, or is it possible to somehow configure jaeger-query to work with Clickhouse?

@chenlujjj

@selfing12 maybe this discussion can help you
https://github.com/orgs/jaegertracing/discussions/5851
I used the plugin with jaeger-query and jaeger-collector v1.57 successfully

@selfing12

@selfing12 maybe this discussion can help you https://github.com/orgs/jaegertracing/discussions/5851 I used the plugin with jaeger-query and jaeger-collector v1.57 successfully

Yes, as I understand it, this is the last version in which this combination works. What should we do with 1.60+?
