This repository has been archived by the owner on Jul 1, 2022. It is now read-only.

Fix ThriftSender max span size check #670

Merged · 4 commits · Jan 17, 2020

Conversation

@yhpark (Contributor) commented on Nov 30, 2019

Signed-off-by: Yeonghoon Park me@yhpark.io

Which problem is this PR solving?

Fixes #660. Currently the span size check only accounts for the byte size of the span, when it should also take the Process size into account.

Consider a span whose size is less than getMaxSpanBytes() but greater than getMaxSpanBytes() - processSize. We can never build a batch that fits within the UDP limit, because every batch must also carry the Process. The span nevertheless passes the size check, and when flushed it corrupts the write buffer in ThriftUdpTransport (a separate problem, not addressed in this PR). The write buffer then stays permanently full at 65000 bytes and is never reinitialized, preventing any further spans from being sent.
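To make the boundary concrete, here is an illustrative calculation. Only the 65000-byte packet limit, the ~100-byte Process estimate (see the next paragraph), and the shape of the check come from this PR; the Batch overhead value is an assumption for the example:

```java
// Illustrative numbers only; batchOverhead is an assumed value.
int maxPacketSize = 65000;                          // UDP limit enforced by ThriftUdpTransport
int batchOverhead = 200;                            // assumed Batch/emit overhead
int maxSpanBytes  = maxPacketSize - batchOverhead;  // what getMaxSpanBytes() would return
int processSize   = 100;                            // Process without custom tags
int spanSize      = 64750;                          // fits alone, but not alongside the Process

boolean passesOldCheck = spanSize <= maxSpanBytes;                // true: the span is accepted
boolean canEverBeSent  = processSize + spanSize <= maxSpanBytes;  // false: no batch can hold it
```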

I think this is a very critical issue because it causes the client to stop sending any spans altogether; we encountered this problem frequently in production. For now, a workaround is to set the ThriftSenderBase.maxPacketSize value to something like 64000 by injecting a custom ThriftSenderFactory via ServiceLoader (see SenderResolver.resolve()); a sketch of this follows below. A Process without custom tags takes around 100 bytes, so any value below 64800 should be okay.
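For reference, here is a minimal sketch of that workaround, assuming the jaeger-client-java 1.x SPI (SenderFactory, SenderResolver) and the UdpSender(host, port, maxPacketSize) constructor; the package, class name, type string, and fallback host/port are hypothetical:

```java
package com.example.tracing; // hypothetical package

import io.jaegertracing.Configuration;
import io.jaegertracing.spi.Sender;
import io.jaegertracing.spi.SenderFactory;
import io.jaegertracing.thrift.internal.senders.UdpSender;

// Picked up by SenderResolver.resolve() through ServiceLoader once it is
// registered in META-INF/services/io.jaegertracing.spi.SenderFactory.
public class CappedUdpSenderFactory implements SenderFactory {

  @Override
  public Sender getSender(Configuration.SenderConfiguration conf) {
    String host = conf.getAgentHost() != null ? conf.getAgentHost() : "localhost";
    int port = conf.getAgentPort() != null ? conf.getAgentPort() : 6831;
    // 64000 leaves headroom for the ~100-byte Process plus Batch overhead.
    return new UdpSender(host, port, 64000);
  }

  @Override
  public String getType() {
    return "capped-udp"; // hypothetical factory type
  }
}
```

To activate it, add a provider-configuration file META-INF/services/io.jaegertracing.spi.SenderFactory containing the fully qualified class name; if more than one factory is on the classpath, select this one via the JAEGER_SENDER_FACTORY setting.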

Short description of the changes

  • Before: span size + Batch thrift overhead < 65000
    (check: spanSize > getMaxSpanBytes())
  • After: process size + span size + Batch thrift overhead < 65000
    (check: (processSize + spanSize) > getMaxSpanBytes(); see the sketch below)
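A minimal sketch of the revised guard inside ThriftSender.append(), assuming the names used in this description (processSize for the cached serialized size of the Process, calculateSpanSize and getMaxSpanBytes as the existing helpers); the exception message is illustrative:

```java
// Inside ThriftSender.append(JaegerSpan span), after converting to thriftSpan:
int spanSize = calculateSpanSize(thriftSpan); // serialized byte size of the Span struct

// The Process struct always travels with the batch, so a span only fits if
// processSize + spanSize stays within the batch budget.
if (processSize + spanSize > getMaxSpanBytes()) {
  throw new SenderException("ThriftSender unable to append span: too large, size = "
      + spanSize + ", process size = " + processSize, null, 1);
}
```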

@codecov bot commented on Nov 30, 2019

Codecov Report

Merging #670 into master will decrease coverage by 0.46%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master     #670      +/-   ##
============================================
- Coverage     89.93%   89.46%   -0.47%     
+ Complexity      569      567       -2     
============================================
  Files            69       69              
  Lines          2086     2089       +3     
  Branches        266      267       +1     
============================================
- Hits           1876     1869       -7     
- Misses          129      137       +8     
- Partials         81       83       +2
Impacted Files                                          Coverage Δ              Complexity Δ
...rtracing/internal/reporters/CompositeReporter.java   71.42% <0%> (-28.58%)   6% <0%> (-1%)
...gertracing/internal/reporters/LoggingReporter.java   81.81% <0%> (-9.1%)     4% <0%> (-1%)
...egertracing/internal/reporters/RemoteReporter.java   87.35% <0%> (-2.3%)     7% <0%> (ø)
...n/java/io/jaegertracing/internal/JaegerTracer.java   88.01% <0%> (-0.92%)    26% <0%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 46798bd...6929ad9.

@yurishkuro (Member) left a comment

Thank you very much for your PR and patience!

result = sender.append(jaegerSpan);

assertEquals(1, result);
}
@yurishkuro (Member) commented on the excerpt above:

nice!

@yurishkuro merged commit 74adce4 into jaegertracing:master on Jan 17, 2020
foxliu20 pushed a commit to foxliu20/java-opentracing-jaeger-bundle that referenced this pull request on Jun 18, 2021:

Jaeger < 1.2.0 has a critical bug which causes the client to stop working. The problem was fixed in 1.2.0.

jaegertracing/jaeger-client-java#670
Linked issue (may be closed by this merge): client stops working after failing to send messages that are too large (#660)