Lack of clarity on how the trace fields are supposed to be used #998
Comments
Thanks @nikclayton-dfinity for the feedback. The detailed issue write-up is greatly appreciated. @felixbarny @axw would either of you be able to help answer the APM-related questions concerning the |
Yes, the fields in each field set are sorted alphabetically, but I agree we can improve some of the supporting documentation to make usage clearer.
@nikclayton-dfinity apologies for the lack of clarity. As you have discovered, we have only added a very small subset of tracing fields to ECS. We added these primarily to document how correlation across traces and logs should work. Excuses over, I'll try to clarify :)
You understand correctly.
Although spans are logically scoped to transactions, span IDs must be unique within a trace. To uniquely identify any transaction or span, you must consider both
No, all events related to the same original client request will share the same trace ID. The span ID corresponding to the Load Balancer's outgoing request to the Authorization Server will be used as the parent ID for the transaction recorded by the Authorization Server. I'll illustrate what the events would look like with your example:
Following your description, it sounds like
So we end up with the following:
So I'll try to summarise:
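As a rough sketch of the correlation described above (not the illustration originally posted in this comment), the load balancer and authorization server events might be modelled as below. The ID values and the own_id / is_child_of helpers are made up for illustration; only the field names trace.id, transaction.id, span.id and parent.id come from this thread. That the load balancer's span points at its own transaction via parent.id is an assumption about how the events nest.

```python
# Hypothetical IDs; real ones would be random hex strings.
TRACE_ID = "trace-1"

# Transaction recorded by the load balancer for the incoming client request.
lb_transaction = {"trace.id": TRACE_ID, "transaction.id": "lb-tx-1"}

# Span recorded by the load balancer for its outgoing request to the
# authorization server (assumed here to be parented by the transaction).
lb_span = {"trace.id": TRACE_ID, "span.id": "lb-span-1", "parent.id": "lb-tx-1"}

# Transaction recorded by the authorization server: same trace ID, and the
# load balancer's span ID is used as the parent ID.
auth_transaction = {
    "trace.id": TRACE_ID,
    "transaction.id": "auth-tx-1",
    "parent.id": "lb-span-1",
}


def own_id(event):
    """Return the ID that identifies this event (span ID or transaction ID)."""
    return event.get("span.id") or event.get("transaction.id")


def is_child_of(child, parent):
    """True if both events share a trace and child.parent.id names the parent."""
    return (
        child["trace.id"] == parent["trace.id"]
        and child.get("parent.id") == own_id(parent)
    )
```

With these events, is_child_of(auth_transaction, lb_span) holds, while is_child_of(auth_transaction, lb_transaction) does not: the authorization server's transaction hangs off the outgoing span, not the load balancer's transaction.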
Thanks @axw for jumping in and helping provide clarity!
@axw - |
Initially at least the goal of the tracing section in ECS was to explain correlation across types of data (namely traces and logs). For correlating traces and logs you would be using For a somewhat more comprehensive explanation of the fields, https://www.elastic.co/guide/en/apm/get-started/current/transaction-spans.html is the source of truth. I'm not against adding more of the tracing fields to ECS, but I'd want to know what problem we're solving with that. |
Thanks for the detailed answer, @axw!
A good question. Actually I'd like to understand how users can leverage these fields in custom data sources. Since @felixbarny contributed the first two fields here, perhaps he can chime in as well. Were these fields added purely for documentation purposes of important APM fields? Or is it possible e.g. for someone to carry on with tagging events with these identifiers in custom sources for them to bubble up in APM to help complete a trace? If it's possible for users to do so, are there APM docs we could link to from here, to help explain the process and the possibilities?
The fields we have here were added to explain how to enable trace/log correlation. i.e. if you want to correlate logs with traces, then you should include
We document all fields in https://www.elastic.co/guide/en/apm/server/current/exported-fields.html, however the documentation isn't likely to be very helpful for implementing an agent. For non-agent custom data sources which produce Elasticsearch docs directly, we don't have a good reference guide; but these are also exceedingly rare. For developing an agent for Elastic APM the fields aren't particularly relevant. What matters most is the protocol between the agent and APM Server. For developing agents we have https://github.com/elastic/apm/tree/master/specs/agents, which references https://www.elastic.co/guide/en/apm/server/current/intake-api.html for the protocol.
If there's collective agreement, I'd like to adjust the current description of the |
Thanks for the details @axw.
This is very helpful context. If it's not a first class workflow, we can say so.
@ebeahan sounds good. Do you have something in mind already? Maybe some kind of disclaimer at the top that this documentation is intended for log correlation, with pointers to the other docs for other use cases?
We actually have some flexibility to document this in ECS, if needed. Since #988, a given field set can have a whole free form documentation page that accompanies it. So we can do more than just a quick warning. We could go into some details in free form asciidoc, and yes of course defer and link to APM docs as well (we don't want to repeat everything). If you want to see an example, this draft PR #1066 is the first to add such a docs page, in this case to the "user" fields. You can see the "usage" subsection in the sidebar, and there's also a call out to it at the top of the normal "user" page.
@axw Just went through this issue again, and I'd like to bring this to something actionable. The main purpose of having these fields in ECS is to help folks correlate logs around an APM-instrumented app with the events generated by APM. The simple case here is tagging raw logs of the main app (e.g. customizing Rails logs) with these 3 fields.
If users are looking to build an APM agent, that's a whole different endeavour, and we will 100% defer to the documentation you linked to above. Should users feel free to use these fields when doing distributed tracing ad hoc, without Elastic APM?
You could, but the more common thing to do here would be to use distributed tracing to continue a trace in the downstream microservice/subsystem. Traces go between services, and then logs are correlated with the traces of the same service.
Yes.
Not currently. In the future we intend to have an embedded Logs viewer right in the APM UI: elastic/kibana#79995
How it works depends on the agent. Each APM agent has its own "Log correlation" page. The Java agent provides a config variable to inject the IDs, and then if you use an ECS logger they'll get added to your log records automatically: https://www.elastic.co/guide/en/apm/agent/java/current/log-correlation.html#log-correlation-enable The Go agent provides integrations with several popular logging libraries to grab trace IDs from the current trace context: https://www.elastic.co/guide/en/apm/agent/go/current/supported-tech.html#supported-tech-logging
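For services without an agent, the same idea can be sketched by hand with Python's standard logging module. This is an illustrative sketch only, not how any Elastic APM agent or ECS logger is implemented: TraceContextFilter, JsonFormatter and the get_ids callable are hypothetical names, and the ID values are placeholders.

```python
import json
import logging


class TraceContextFilter(logging.Filter):
    """Attach the current trace identifiers to every log record."""

    def __init__(self, get_ids):
        super().__init__()
        # get_ids is a caller-supplied callable returning a dict such as
        # {"trace.id": "...", "transaction.id": "..."} for the current request.
        self._get_ids = get_ids

    def filter(self, record):
        record.trace_ids = dict(self._get_ids())
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON document with the IDs as top-level keys."""

    def format(self, record):
        doc = {
            "message": record.getMessage(),
            "log.level": record.levelname.lower(),
        }
        doc.update(getattr(record, "trace_ids", {}))
        return json.dumps(doc)


def make_handler(get_ids):
    """Build a stream handler that emits correlated JSON log lines."""
    handler = logging.StreamHandler()
    handler.addFilter(TraceContextFilter(get_ids))
    handler.setFormatter(JsonFormatter())
    return handler
```

Attaching make_handler(...) to a logger means every log line carries the same trace.id as the trace events for that request, which is the piece that log correlation relies on.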
There's no harm in that, but no great benefit either. It wouldn't be enough to have the traces show up in the APM app in Kibana.
I'd like to see parent.id (or perhaps parentSpan.id, which could be a clearer name) added to ECS. The values for trace.id, span.id, and parent.id allow, in decreasing order of importance, correlation of all log messages within one logical operation (trace.id), correlation of the log messages within one subsection of the operation (span.id), and reconstruction of the hierarchical parent-child relationship between the subsections (parent.id). Where you have another source of information recording the parent-child relationship, the data contained in parent.id becomes redundant... in fact if you log multiple messages the values are redundant (a particular span.id always has the same parent.id). However, it then means you need to merge in that other data set in order to have the needed information. Having the parent.id in each record means you can determine parent-child relationships without any other data, without full APM, with only a subset of records, etc. It allows logging to be stand alone. The same is true of many other fields within ECS; for example, the information about operating system, server, user, etc. is all repeated, even though it will contain the same repeated information within a session.
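To make the "stand alone" point concrete, here is a small sketch that rebuilds the parent-child relationships from log records alone. It assumes every record carries span.id and, where one exists, parent.id, and that all records belong to a single trace; build_span_tree and the sample records are hypothetical.

```python
from collections import defaultdict


def build_span_tree(records):
    """Map each parent.id to the sorted, de-duplicated span.ids logged under it.

    Records without a parent.id are grouped under the key None (root spans).
    """
    children = defaultdict(set)
    for rec in records:
        span_id = rec.get("span.id")
        if span_id is None:
            continue  # not a traced log line; nothing to place in the tree
        children[rec.get("parent.id")].add(span_id)
    return {parent: sorted(ids) for parent, ids in children.items()}


# Several messages per span repeat the same parent.id, as noted above.
records = [
    {"trace.id": "t1", "span.id": "a", "message": "request received"},
    {"trace.id": "t1", "span.id": "b", "parent.id": "a", "message": "querying db"},
    {"trace.id": "t1", "span.id": "b", "parent.id": "a", "message": "db done"},
    {"trace.id": "t1", "span.id": "c", "parent.id": "a", "message": "calling auth"},
]
tree = build_span_tree(records)
```

Here tree is {None: ["a"], "a": ["b", "c"]}: the hierarchy is recovered without any APM data, at the cost of repeating parent.id on every record.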
Let's revive this conversation. 😄 @sgryphon has kindly opened #1128 to work on expanding the @axw @felixbarny any feedback on @sgryphon's latest thoughts around adding a |
I have not changed my opinion since #998 (comment). The tracing fields in ECS were never intended to be complete for describing or reconstructing a distributed trace -- they were intended for trace/log correlation. We can change that of course, but I don't see a compelling reason to do so. I would prefer to cover them in the Elastic APM docs.
@axw one scenario would be the use of Elasticsearch for logging without using Elastic APM. That includes supporting just W3C Trace Context, which has slightly different semantics (no separate transaction). Working on one of the clients/implementations (dotnet), it seems strange to map the trace and span parts, yet not map the parent. This is not using APM, just using the Trace Context support in .NET. There is also clearly some support, with at least one other user, @alankis, commenting on the PR. It is not the end of the world; in most cases all you need is traceid, and you can infer spanid/parentid from the source systems, topology knowledge, and temporal links, i.e. system B is only called from A, so we know A is the parent of B within a trace. The additional information from parent is only relevant in highly complex systems. What a decision would determine is whether I add it to the client as "parent.id", for forward compatibility, or "Parent.id" as a custom field.
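For reference, a minimal sketch of the mapping being discussed: splitting a W3C Trace Context traceparent header (version-traceid-parentid-flags) into the field names used in this thread. The function name fields_from_traceparent is hypothetical, the mapping of the header's parent-id to a "parent.id" field is exactly the open question here rather than anything ECS defines, and the validation is deliberately partial (it does not reject all-zero IDs, for example).

```python
import re
import secrets

# traceparent: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def fields_from_traceparent(header, new_span_id=None):
    """Return trace.id / parent.id / span.id for work started by this request.

    The incoming parent-id names the caller's span, so it becomes parent.id;
    the receiving service generates a fresh span.id of its own.
    """
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent header: {header!r}")
    _version, trace_id, parent_id, _flags = match.groups()
    return {
        "trace.id": trace_id,
        "parent.id": parent_id,
        "span.id": new_span_id or secrets.token_hex(8),  # 16 hex characters
    }


fields = fields_from_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
```

Note there is no transaction in this picture: the header only carries a trace ID and the caller's span ID, which is the semantic difference mentioned above.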
@sgryphon Thanks for the clarification. Although I don't see
i.e. I'm not a big fan of the current name. I'd suggest adding this as a custom field which just happens to match what Elastic APM uses.
Description of the issue:
The description of the tracing fields at https://www.elastic.co/guide/en/ecs/current/ecs-tracing.html is unclear on how they are supposed to be used.
Any additional context or examples:
The ordering of the fields on the page is alphabetical, but this presents them out of order with how they are supposed to be used.
I think the hierarchy is:
A trace contains one or more transactions, a transaction contains zero or more spans (cite for "zero or more" is https://www.elastic.co/guide/en/apm/get-started/current/transactions.html).
If this is correct then the documentation for a span, which starts "Unique identifier of the span within the scope of its trace" should probably be changed to "Unique identifier of the span within the scope of its transaction".
Linking to https://www.elastic.co/guide/en/apm/get-started/current/distributed-tracing.html from the descriptions would also be helpful.
As would an explicit description of the hierarchy in the introductory information in the page.
Uniqueness constraint is unclear
Does the "unique" in that sentence mean that span IDs should aim to be universally unique, or does it mean that a span ID only has to be unique within a single transaction, so:
and
would be OK (identical span IDs, but the transaction IDs are different) ?
How these work together in a tiered architecture is unclear
For example, suppose you have:
Client makes request which terminates at the Load Balancer.
The Load Balancer will perform a lookup request to the authorization server, then forward the request to the application server (this is a slightly contrived example), so the load balancer has to:
a. Create a new trace id (since this is "a user request handled by multiple inter-connected services")
b. Contact the authorization server
What does it need to receive in order to effectively log an ECS event?
In particular, how do we correlate the event log the Authorization server is going to generate with the span for this request that the Load Balancer generated?
Does the span id that the Load Balancer generated become the trace id that the authorization server uses?
https://www.elastic.co/guide/en/apm/get-started/current/transaction-spans.html suggests that there are supposed to be transaction.id and parent.id attributes, but they're not present in the schema.
[I'm doing this in Rust, for which there isn't an APM agent yet. I want my app to emit logs that can be seamlessly ingested into Elastic with tracing support.]