-
Notifications
You must be signed in to change notification settings - Fork 231
Conversation
Codecov Report
@@ Coverage Diff @@
## master #142 +/- ##
===========================================
- Coverage 79.94% 79.84% -0.1%
+ Complexity 455 448 -7
===========================================
Files 76 76
Lines 1715 1722 +7
Branches 188 192 +4
===========================================
+ Hits 1371 1375 +4
- Misses 256 261 +5
+ Partials 88 86 -2
Continue to review full report at Codecov.
|
metrics.tracesJoinedSampled.inc(1); | ||
} else { | ||
metrics.tracesJoinedNotSampled.inc(1); | ||
} | ||
return parent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be parentSpan reused when using jaeger.thrift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this UI does not show ->
for client-server RPC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
->
is only shown when RPC is collapsed- it requires span.kind to be set on both sides
what kind of demo app did you try? maybe it's instrumentation issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
App is instrumented correctly, the problem was that I was trying this PR when SPAN_KIND
tags were not added due to zipkin sr,cs.. annotations. Now it is working fine.
Do you want to keep |
Yes, we need to keep ZipkinSender, some people are using jaeger-client with Zipkin backend. I don't think sender interface should be generic, the reporter only has a Span object, and currently reported invokes the transformation from Span to zipkin.thrift. Instead that transformation should be encapsulated inside the Sender, since reporter shouldn't care about the wire format used by the sender. It looks like the Span class already has public visibility on getters so that the senders can transform it without being in the same package. Another possible change here is to use the approach lightstep used in their Go client, where the Span contains another class RawSpan, which is unsynchronized, and the interface between reporter and senders, or even between tracer and reporters, is the RawSpan - unsynchronized & immutable. Tracer can create it just before calling Reporter.report(), to reduce the number of synchronized blocks to one (instead of one per every getter). |
Thanks, I will convert What about my question about shared span model in RPC? |
Jaeger model does not reuse span ID between RPC parent / child, each span should get a unique ID. |
40dc159
to
a7ac35a
Compare
It's ready for an initial review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm
Reference preferredParent = references.isEmpty() ? null : references.get(0); | ||
for (Reference reference: references) { | ||
if (References.CHILD_OF.equals(reference.getType()) && | ||
References.FOLLOWS_FROM.equals(preferredParent.getType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to say !References.CHILD_OF.equals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine alternative, anyways is is currently allowing only child and follows.
context = createNewContext(null); | ||
} else if (parent.isDebugIDContainerOnly()) { | ||
context = createNewContext(parent.getDebugID()); | ||
} else if (preferredParent.referencedContext().isDebugIDContainerOnly()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, I guess we never implemented the ability to set baggage externally, as in Go client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... could you explain what is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's fine, just a note to myself. We want to be able to parse baggage from the request even when there's no trace context. We do it in Go client, but not in Java. I will do a separate pr for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offtopic: What is the use case for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #146
if (parent == null || isRPCServer()) { | ||
// add tracer tags only to first span in the process | ||
// add tracer tags to zipkin first span in process | ||
if (zipkinSharedRPCSpan && (preferredParent == null || isRPCServer())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like an overloading of this boolean flag. I think it's better to add tracer tags in the Sender, since it's specific to Zipkin. But if it creates too much complexity, I'm ok to leave it here, maybe add a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I had it there, but I had to tweak zipkin thritf convertor and it did not look right, this was cleaner.
@@ -21,6 +21,11 @@ | |||
*/ | |||
package com.uber.jaeger.reporters.protocols; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this move to jaeger-zipkin
module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I wanted to propose the same.
@@ -21,11 +21,13 @@ | |||
*/ | |||
package com.uber.jaeger.senders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, no idea why we need this guy. Can only imagine needing it to test RemoteReporter, but then it should be in tests-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to tests
import lombok.ToString; | ||
|
||
@ToString(exclude = {"spanBuffer", "udpClient", "memoryTransport"}) | ||
public class UDPSender implements Sender { | ||
final static int emitZipkinBatchOverhead = 22; | ||
final static int emitBatchOverhead = 22; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be a different number, because it includes the name of the struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that is 56 but it should be verified by someone.
@@ -85,14 +92,20 @@ int getSizeOfSerializedSpan(Span span) throws SenderException { | |||
*/ | |||
@Override | |||
public int append(Span span) throws SenderException { | |||
int spanSize = getSizeOfSerializedSpan(span); | |||
if (process == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: as far as I can tell, it's not used until the call to emitBatch, so I would (a) move this to a function and (b) inline it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emitBatch()
is in Sender.flush()
. Problem is that flush operates on thrift spans and I need access to tracer via Jaeger span.
Tracer tags should not change over time, therefore null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, there's another issue here. The previous Zipkin based code was able to compute the cumulative size of the message incrementally by sizing each span and adding to the overhead constant. But in jaeger.thrift model we have another variable, the process tags. Good news is we do not allow process tags to be changed post-construction, so we can pre-calculate their thrift size. But the overall calculation is getting more complex because Compact encoding uses varint encoding for things like the size of collections. So we need to be careful with the max-overhead constant. Ideally there should be a unit test to ensure that the final serialized buffer size does not exceed 65000, or it may be dropped by the agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see 5c65b9b
@@ -85,7 +87,7 @@ public void tearDown() throws Exception { | |||
reporter.close(); | |||
} | |||
|
|||
@Test(expected = SenderException.class) | |||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why doesn't this throw exception anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to move jaegerSpan.log(msg, new Object());
into for cycle to thow it.
assertEquals(context.getParentID(), actualSpan.getParent_id()); | ||
assertEquals(expectedSpan.getOperationName(), actualSpan.getName()); | ||
com.uber.jaeger.thriftjava.Span actualSpan = spans.get(0); | ||
assertEquals(expectedSpan.context().getTraceID(), actualSpan.getTraceIdLow()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a check for TraceIdHigh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added assertEquals(0, actualSpan.getTraceIdHigh());
@@ -97,8 +101,8 @@ public void filter() throws Exception { | |||
Span span = spans.get(0); | |||
Map<String, Object> tags = span.getTags(); | |||
|
|||
assertTrue(span.isRPC() && !span.isRPCClient()); | |||
assertEquals(source, span.getPeer().getService_name()); | |||
assertTrue(ThriftSpanConverter.isRPC(span) && !ThriftSpanConverter.isRPCClient(span)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's worth leaving these helper methods on the Span, they are kind of used all over the place.
(sigh, I miss Scala)
Can always write tests in any language.. even kotlin! :D
|
This should be ready for the final review. |
*/ | ||
public class Reference { | ||
|
||
private final SpanContext spanContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we use lombok annotations to cut out boiler plate on new classes.
See https://projectlombok.org/features/Value.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I have used @Getter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use @Value
and remove the constructor as well (we also gain a toString/Hashcode/Equals by doing that) .
@@ -21,6 +21,17 @@ | |||
*/ | |||
package com.uber.jaeger; | |||
|
|||
import java.io.InputStream; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that you don't reorder diffs in this commit. If it's difficult to undo, we can let it be and address later with #147
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vprithvi I have imported https://raw.githubusercontent.com/google/styleguide/gh-pages/intellij-java-google-style.xml and run optimize imports on the whole project. It produces a massive diff https://paste.fedoraproject.org/paste/ZSJs8G3jE~VExhSMKHwNuF5M1UNdIGYhyRLivL9gydE=.
It is inconvenient to go through 22 files and change that manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #148
assertTrue(span.isRPC() && !span.isRPCClient()); | ||
assertEquals(source, span.getPeer().getService_name()); | ||
assertEquals(method, span.getOperationName()); | ||
assertEquals(Tags.SPAN_KIND_SERVER, tags.get(Tags.SPAN_KIND.getKey())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add an assertion for the service name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to change InMemoryReporter
to hold tracer tags. Better is to test in in TracerTagsTest
componentName = span.getLocalComponent().getBytes(UTF_8); | ||
Object componentTag = tags.get(Tags.COMPONENT.getKey()); | ||
if (componentTag instanceof String) { | ||
componentName = componentTag.toString().getBytes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add the encoding for getBytes()
*/ | ||
public static Endpoint extractPeerEndpoint(Map<String, Object> tags) { | ||
Object tagValue = tags.get(Tags.PEER_HOST_IPV4.getKey()); | ||
Endpoint peerEndpoint = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use zipkin.Endpoint#builder
to make this neater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have refactored, now it looks much cleaner.
ignore, deleted. |
@yurishkuro @vprithvi what is the status of this? Can we merge this? |
Apart from that one nit, this looks good to me. |
@vprithvi removed About checkstyle: It would be easier for me to get this in and then apply checkstyle globally then do a rebase etc. |
@black-adder could you please review? The CI run looks odd to me, there's no matrix, I don't see a crossdock run, so not clear if end-to-end test has been executed. |
I'm gonna do a quick PR to add some endtoend tests for the java client. Once I land that, I'll rebase this to make sure endtoend works and then we can land it. Gimme an hour. |
4b1dbd6
to
04a1a13
Compare
rebased CI passed |
@@ -305,7 +327,8 @@ boolean isRPCServer() { | |||
startTimeMicroseconds, | |||
startTimeNanoTicks, | |||
computeDurationViaNanoTicks, | |||
tags); | |||
tags, | |||
references); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one concern I have here is that we're duplicating the chid-of reference by keeping it explicitly as well we recording it implicitly in the parent_id. Strictly speaking in OpenTracing the parent_id is redundant, but we kept it (a) for efficiency since 99% of spans have a simple child-of relationship to the parent that we can represent with a single number instead of a Reference struct, and (b) to main our legacy zipkin-style wire format that includes parent ID (even though with zipkinSharedRPCSpan=false
we completely ignore the parent_id on the wire).
On the other hand, in the UI data model we abolished the parent_id and represent everything with References, to keep it normalized. So when we keep this reference here, the code in the backend will create two identical references for the UI model (https://github.com/uber/jaeger/blob/master/model/converter/json/from_domain.go#L87).
We can either fix the backend code to dedupe those, but we could also exclude child-of reference here since it's redundant and only increases the payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting topic, so parentId in thrift model should be set only if reference is childOf
https://github.com/uber/jaeger/blob/master/model/converter/json/from_domain.go#L82
What if there are multiple childOf
references? Which one to choose? First one/second...? Inject operation is also affected.
What if there is only one followsFrom
/whatEverFrom
. Should it be this set in thrift as parent. Should it be injected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bit of a grey area, given that Jaeger UI doesn't support arbitrary references today. My thoughts are:
- if you have child-of within the same trace, then remove that reference and populate parent_id instead
- if you have more than one child-of, use the first for parent_id, keep others are references
- if you don't have child-of, but have follows-from in the same trace, then still populate parent_id, but keep the reference (this is a bit weird)
- if you have reference to a different trace ID, do not populate parent_id, keep it as a reference
This sounds fairly convoluted, I am open to suggestions. We can implement just the first point for now and punt on the rest, which is probably what we've done in other clients so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the parent_id
offer that much of a benefit, to warrant this processing in the client? Seems like it would be easier just to use the references.
If the parent_id
is required for the UI currently, then couldn't this logic be implemented in the server, so that once the UI no longer requires it, it is simply a case of changing the logic in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we didn't have to maintain parent_id on the clients in the first place. It's only needed for compatibility with Zipkin's single span per RPC model that we still optionally support.
I think I'm ok with keeping both parent_id & child_of reference and just adding a sanitizer to the backend to normalize the representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thrift Span model would be cleaner without a direct parentId field. If the parentId
has been designed to save a little bit of payload then it should be used.
I think it should be only used to represent childOf
relationship. Otherwise, it's ambiguous.
What if we establish a convention that parentId
(in thrift span) should be used only when there is one parent as childOf
. Otherwise, use only list of references. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What traceId should be used in a newly created span when there are references from different traces?
Create new traceId as a "join" or make an assumption that childOf
is "stronger" and newly created span should inherit traceId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If parent_id
only used to maintain compatibility with Zipkin's model (as mentioned by @yurishkuro) - then the other alternative is Zipkin's parent id is simply converted to a ChildOf
reference - then the new model would be "pure" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @yurishkuro meant only parentId
from SpanContext class which also affects context...
@@ -163,19 +172,17 @@ public void close() { | |||
|
|||
private String operationName = null; | |||
private long startTimeMicroseconds; | |||
private SpanContext parent; | |||
private List<Reference> references = new ArrayList<Reference>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
private final Map<String, Object> tags = new HashMap<String, Object>(); | ||
private final Map<String, String> baggage = new HashMap<String, String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this introduces a couple of inefficiencies, we're always allocating two collections even though the list of references could be empty (see my comment below about child-of vs. parent_id representation), and when there is only one parent its baggage can be used as is, without copying (baggage is supposed to be immutable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS - if you prefer, we can address these in the follow-up PRs, since this one is pretty large.
udpClient = new Agent.Client(new TCompactProtocol(udpTransport)); | ||
maxSpanBytes = maxPacketSize - emitZipkinBatchOverhead; | ||
spanBuffer = new ArrayList<Span>(); | ||
udpClient = new Agent.Client(new TBinaryProtocol(udpTransport)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should continue using compact protocol. The agent has a port that accepts jaeger-compact and jaeger-binary. The Binary is only used by the Node.js client, since thriftrw we're using there does not support Compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I chose binary because it was used in Node, going to switch to compact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pavolloffay could you rebase off master?
memoryTransport.reset(); | ||
try { | ||
span.write(new TCompactProtocol(memoryTransport)); | ||
span.write(new TCompactProtocol((memoryTransport))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extraneous params
static List<SpanRef> buildReferences(List<Reference> references) { | ||
List<SpanRef> thriftReferences = new ArrayList<SpanRef>(references.size()); | ||
for (Reference reference: references) { | ||
SpanRefType thriftRefType = References.CHILD_OF.equals(reference.getType()) ? SpanRefType.CHILD_OF : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pass a spanContext and add this check, to reduce the side of the span message:
if (References.CHILD_OF.equals(reference.getType()) &&
spanContext.getTraceID() == reference.getTraceID() &&
spanContext.getParentId() == reference.getSpanID()) {
continue; // skip child_of reference that's already represented by parentID
}
@@ -85,14 +92,20 @@ int getSizeOfSerializedSpan(Span span) throws SenderException { | |||
*/ | |||
@Override | |||
public int append(Span span) throws SenderException { | |||
int spanSize = getSizeOfSerializedSpan(span); | |||
if (process == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, there's another issue here. The previous Zipkin based code was able to compute the cumulative size of the message incrementally by sizing each span and adding to the overhead constant. But in jaeger.thrift model we have another variable, the process tags. Good news is we do not allow process tags to be changed post-construction, so we can pre-calculate their thrift size. But the overall calculation is getting more complex because Compact encoding uses varint encoding for things like the size of collections. So we need to be careful with the max-overhead constant. Ideally there should be a unit test to ensure that the final serialized buffer size does not exceed 65000, or it may be dropped by the agent.
…g, zipkin endpoint creation refactoring
11b83a8
to
5c65b9b
Compare
@black-adder is doing a similar change in Go and we want to develop a test that guarantees that the sender never overflows 65,000bytes buffer size. We will port that test to Java afterwards. |
private static final String defaultUDPSpanServerHost = "localhost"; | ||
private static final int defaultUDPSpanServerPort = 5775; | ||
private static final int defaultUDPSpanServerPort = 6832; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting... https://github.com/uber/jaeger/blob/master/cmd/agent/app/config.go#L127 shows that port 6832 is for the jaeger-binary protocol but we're using compactProtocol here. Not sure how the endtoend tests are passing.
Resolves #140