
Fix fill Datadog trace meta from Trace baggage #8

Open
wants to merge 1 commit into master

Conversation

@dnlserrano (Author)

I noticed that the trace baggage wasn't making it into the Datadog trace meta while using it to send some metadata to Datadog! 🙊

Am I doing something wrong? Thanks in advance!


describe "ApiServer.format/3" do
test "it fills in Datadog metadata using the trace baggage", %{trace: trace} do
span = trace.spans |> Enum.at(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a function call when a pipeline is only one function long

@sourcelevel-bot

Ebert has finished reviewing this Pull Request and has found:

  • 1 possible new issue (including those that may have been commented here).

You can see more details about this review at https://ebertapp.io/github/spandex-project/spandex_datadog/pulls/8.

@GregMefford (Member) left a comment


Good find, thanks! I actually included the baggage field recently so we could begin to think about how to support it as a part of the OpenTracing spec, but as you found, it doesn’t do anything yet. Have you tried using this “for real” in an application yet? I’m curious what the use-case looks like at the application level. Like how are you putting things into the baggage and propagating it from one service to another?

@dnlserrano (Author)

Hey @GregMefford! I tried using it only in the context of a single app, i.e., reporting to Datadog from an app, but not fully tracing it across services (still waiting on some internal work on other services to make sense of the x-datadog-* headers 😅). It looks something like the following:

[Screenshot: span metadata as shown on the Datadog Trace page, 2018-09-28]

So the data actually ends up being tied to a specific span, not all of the spans (which is understandable, given that multiple traces with the same trace_id can be created and sent separately). I guess we could eventually propagate it across services using something like x-datadog-baggage, but it would be very brittle, since other libs would have to use these headers as well...

I tested it in a real system, locally but using a real Datadog account! The data showed up there. 😄

I guess you can test it out if you've got a Datadog account, or, if you want to be sure (understandable), you can share one with me over Slack and I'll send an arbitrary key/value pair.

The way I'm doing it: I grab the trace, change the baggage, and update the trace to use this new (updated) one (as per usual with the spandex core lib). I then finish the trace, and bam! Got my metadata in Datadog! ✨ 🐕
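
Roughly, it looks like the sketch below (just an illustration on my side; the Pdict strategy calls and the baggage shape are my assumptions from reading the spandex core lib, not code from this PR):

# Sketch only: assumes the default Spandex.Strategy.Pdict storage and that
# get_trace/1 and put_trace/2 return {:ok, trace}; treating baggage as a
# keyword list is also an assumption.
{:ok, trace} = Spandex.Strategy.Pdict.get_trace(MyApp.Tracer)

updated_trace = %{trace | baggage: [customer_id: "1234", plan: "enterprise"]}
{:ok, _} = Spandex.Strategy.Pdict.put_trace(MyApp.Tracer, updated_trace)

# Finishing the trace is what sends the spans (and, with this change, the baggage as meta).
MyApp.Tracer.finish_trace()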

I think we should definitely add this in even if for now it only works for Spandex-emitted traces (i.e., traces sent from Elixir apps using spandex and spandex_datadog).

WDYT? ❤️

@GregMefford (Member)

I definitely agree that we should support this feature, I just want to be sure that it's clear what it's for and the tradeoffs that come with it. My understanding is that it should be used pretty sparingly because you don't often need to pass a lot of metadata between services (just directly to the back-end trace collector) and it adds that much more overhead to the HTTP call.

Here are some docs about it: http://opentracing.io/documentation/pages/instrumentation/common-use-cases.html#using-baggage--distributed-context-propagation

@zachdaniel (Member)

If you are only trying to support sending arbitrary data, I think that is already supported via the 'meta' key.

@GregMefford (Member)

Oh right, I meant to mention that but got side-tracked and forgot. I think you'd set something in the tags option on the span if you just want to include some metadata:

https://github.com/spandex-project/spandex/blob/master/lib/span.ex#L43

Tracer.span("some_span_name", tags: [key: "value"]) do
  # ...
end

@dnlserrano (Author) commented Sep 29, 2018

Hey guys! In hindsight, I agree with you both. I should probably have used the concept of Span tags here, indeed. 🙈

We should probably look into other (official) Datadog client libraries to understand how they’re propagating the baggage kv pairs and only then think about adding this feature end-to-end.

Shall I close this PR? 😅

@GregMefford (Member)

No, I think it's worth leaving it open and getting it figured out properly, but perhaps not as urgent as you initially thought.

@dnlserrano (Author)

Is there a standard way of propagating baggage across services in OpenTracing? I can't find information about the "correct way" of doing it in the OT specification.

I've seen it implemented in:

  • the Node Jaeger client: comma-separated values sent in a single header (jaeger-baggage)
  • the Python Datadog client: multiple headers (one per key/value pair in the baggage) with the same header prefix (ot-baggage-)

I honestly prefer the way Jaeger does it, but since we're talking Datadog here, maybe we should just go ahead and follow their practice of using multiple headers, as they do in the Python lib.
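
To make the multi-header option concrete, something along these lines (purely illustrative; the module and function names are made up, and only the ot-baggage- prefix comes from the Python client):

defmodule BaggageHeaders do
  # Illustrative sketch: one header per baggage key/value pair, using the
  # ot-baggage- prefix the Python Datadog client uses.
  def inject(headers, baggage) do
    Enum.reduce(baggage, headers, fn {key, value}, acc ->
      [{"ot-baggage-#{key}", to_string(value)} | acc]
    end)
  end

  # Extraction is the inverse: keep only ot-baggage-* headers and strip the prefix.
  def extract(headers) do
    for {"ot-baggage-" <> key, value} <- headers, do: {key, value}
  end
end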

@zachdaniel (Member)

I don't know of any established method; I'd have to do some more research. It might be easier to just make it a configuration where they provide an MFA: we can ship MFAs that do either of those, and they could write their own if they wanted.
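
Something in this direction, maybe (the option name here is hypothetical, just to show the shape):

# config/config.exs (hypothetical option name, sketching the pluggable-MFA idea)
use Mix.Config

config :my_app, MyApp.Tracer,
  baggage_propagation: {MyApp.BaggageHeaders, :inject, []}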

@dnlserrano (Author)

Sounds good. I’m a strong advocate of sane defaults though.

My rationale is twofold:

  1. Datadog has defined a way of doing it in one of their official clients (Python)

  2. Distributed tracing across different services means we’ll have different programming languages talking the same open tracing language; we could contribute to consistency (following the principle of least astonishment)

    • e.g., the Python lib must be extracting baggage from HTTP headers in the inverse of the way it encodes it; having to cater to every possible client implementation would be very cumbersome (undoable, I'd even argue)

But again, I might not be seeing the full picture!

@zachdaniel (Member)

I agree completely. I would personally be fine with either, and we should definitely provide a default, but knowing that implementations differ, we should make sure it's easy to plug and play, and make that clear in the documentation. If we have to choose, we may want to pick the more cumbersome one as the default, which would be the CSV version (specifically because commas have to be escaped). But honestly, I don't mind either way, as long as it's pluggable and that is visible in the docs.

@dnlserrano (Author)

Totally agree with allowing that sort of flexibility.

I don’t mean to sound pushy, but I’d only vote csv if we were having this discussion in spandex_jaeger.

Datadog clients for Python and Java (just confirmed) both use the multiple-headers strategy. I think this goes to show that the default in spandex_datadog should probably match what other Datadog clients already do.

Thoughts?

Also hoping to hear from @GregMefford on this.

@GregMefford (Member)

I think that spandex_datadog should encode it the way the Datadog-provided libraries for other languages do it, so that if you're "a Datadog shop," your tooling will all work seamlessly. When someone gets around to supporting Jaeger, they'd write the CSV version in their adapter. That's the beauty of the way Spandex has an internal representation and a vendor-specific back-end. Spandex itself just maintains a key-value data structure and the back-end library is responsible for injecting/extracting it the way that vendor wants to do it.

As far as vendor-interoperability, that's just a sad fact right now in the industry that Spandex can't solve, other than supporting multiple back-ends and perhaps having a per-endpoint configuration for outgoing requests (i.e. so that you could receive a trace using one format and propagate it out using a different one). I think other tracing libraries would similarly "just break" if you try to plug them into each other today.
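
In other words, something shaped roughly like this (purely illustrative, not an existing Spandex behaviour):

# Illustrative only: a callback module an adapter could implement, so Spandex
# core keeps a plain key/value baggage structure and each back-end decides its
# own wire format (ot-baggage-* headers for Datadog, a jaeger-baggage CSV
# header for Jaeger, and so on).
defmodule Spandex.BaggagePropagation do
  @callback inject(headers :: [{String.t(), String.t()}], baggage :: keyword()) ::
              [{String.t(), String.t()}]
  @callback extract(headers :: [{String.t(), String.t()}]) :: keyword()
end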

FYI, there is some ongoing work in the industry (linked below) to solve this problem because even the vendors can see that this is the wrong place to differentiate - it's better to interoperate seamlessly on the collection side and differentiate on the visualization side.

https://w3c.github.io/distributed-tracing/report-trace-context.html
https://w3c.github.io/distributed-tracing/report-correlation-context.html
