Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supports reading multiple spans per Kafka message #995

Merged
merged 1 commit into from
Mar 2, 2016

Conversation

codefromthecrypt
Copy link
Member

Kafka messages have binary payloads and no key. The binary contents are
serialized TBinaryProtocol thrift messages. This change peeks at thei
first bytes to see if it is a List of structs or not, reading
accordingly.

This approach would need a revision if we ever add a Struct field to
Span. However, that is unlikely. At the point we change the structure of
Span, we'd likely change other aspects which would make it a different
struct completely (see #939). In such case, we'd add a key to the kafka
message of the span version, and not hit the code affected in this
change.

Fixes #979

Kafka messages have binary payloads and no key. The binary contents are
serialized TBinaryProtocol thrift messages. This change peeks at thei
first bytes to see if it is a List of structs or not, reading
accordingly.

This approach would need a revision if we ever add a Struct field to
Span. However, that is unlikely. At the point we change the structure of
Span, we'd likely change other aspects which would make it a different
struct completely (see #939). In such case, we'd add a key to the kafka
message of the span version, and not hit the code affected in this
change.

Fixes #979
@codefromthecrypt
Copy link
Member Author

cc @clehene @kristofa @prat0318

so after this.. instrumentation can choose to either TBinaryProtocol encode a single span, or a list of spans.

@eirslett
Copy link
Contributor

I must admit this feels very hacky and brittle to me... but it could work as a temporary migration path. In the future, I suggest we should only support collecting a list of spans. For Zipkin v2, we could provide a deprecation warning - add warning log messages whenever the collector receives single spans, and not a list of spans?

@yurishkuro
Copy link
Contributor

Did anyone do any benchmarks to show that writing/reading Kafka messages with a single span is slowed than messages with batch of spans?

I do agree with @eirslett that APIs that accept just a single span instead of a collector are suboptimal. Although logging a warning would just spam the collector's log, it can't do much about what clients submit.

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Feb 24, 2016 via email

@eirslett
Copy link
Contributor

Don't get me wrong, I'm +1 for this change. I don't like the solution, but I still think it's the best solution - there's no smooth migration path. One possible alternative would be to consume span collections on a separate kafka topic, but then we get lots of additional complexity from handling two topics.

@prat0318
Copy link

I will try running the collector with this patch to check the perf

On Wednesday, February 24, 2016, Eirik Sletteberg notifications@github.com
wrote:

Don't get me wrong, I'm +1 for this change. I don't like the solution, but
I still think it's the best solution - there's no smooth migration path.
One possible alternative would be to consume span collections on a separate
kafka topic, but then we get lots of additional complexity from handling
two topics.


Reply to this email directly or view it on GitHub
#995 (comment).

Prateek Agarwal

@yurishkuro
Copy link
Contributor

is there indeed a lot of setup overhead to use a different topic for different message format?

@prat0318
Copy link

From usability perspective, maintaining multiple topics would be hard imo. For testing perf and maintaining backwards compat, it will be helpful though

@prat0318
Copy link

prat0318 commented Mar 1, 2016

In my experiment, each Trace has 9 spans, which means without patch 9 kafka messages per trace.

- Traces/s/partition Spans/s/partition Messages/s/partition
Production 174.5 1570 1570
Consumption 40.5 364 364

With bundling, all 9 spans are batched up in 2 kafka messages.

- Traces/s/partition Spans/s/partition Messages/s/partition
Production 200 1800 400
Consumption 175 1580 351

As we can see, we straight away get a perf improvement of ~4.5 times. This should
be ~9 times if i batch up all spans in a single kafka message (it just needed some
more code changes at my end, so i just went ahead with 2 messages).

So, what i see is Message consumption rate is around constant to ~350 but we can increase
our Trace consumption rate by bundling together. So, i am super excited to have this
patch merged to master. 👍

cc @adriancole @yurishkuro

@codefromthecrypt
Copy link
Member Author

I've not heard any feedback against from a kafka transport user, and this clearly will help @prat0318 and move us in the right direction of moving towards lists as the defacto unit-of-transport.

merging

codefromthecrypt pushed a commit that referenced this pull request Mar 2, 2016
Supports reading multiple spans per Kafka message
@codefromthecrypt codefromthecrypt merged commit fe6cf88 into master Mar 2, 2016
@codefromthecrypt codefromthecrypt deleted the multi-kafka branch March 2, 2016 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants