-
Notifications
You must be signed in to change notification settings - Fork 182
Conversation
semantic_conventions.md
Outdated
| `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. | | ||
| `service` | string | The name of the current service. E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"`. Should correspond with values set in `peer.service`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have a description of the use case, such as a service mesh creating spans on behalf of different services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a single use-case that would work well in this description. For example, with Datadog, (in addition to the use case you described) someone could set the service name via this tag that would override the default service name for a specific tag with the goal of separating out or correcting the name. Do you have a better description in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your use case is less clear to me. Why would a service override its name? In our internal wrappers around oss jaeger libs we do the opposite, ie prevent services from even configuring their name in the tracer, instead we read it from env vars set by infra, to avoid names that do not correspond to discovery names.
I think it would be good to go through an RFC for this feature and describe the use cases it is meant for and why it's needed. Otherwise people will just unintentionally abuse it, I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are two use cases for wanting the ability to override from the globally configured name:
First: you can think of like the service mesh use case (creating a span on behalf of another service). For example when making a database call, we rename that span to correspond to the database instead of the current service.
Second: same issue applies when dealing with a java application server with multiple war files deployed onto it. Many people think of those war files as logically separate things and want them to be named differently. If the ENV configured name takes priority, there isn't a nice way to make that distinction.
What about the name and use case is unclear? Why the need for an RFC? @tedsuo thoughts?
I understand your suggestion for configuring separate tracer instances to cover some of these purposes, but that doesn't work well in auto-instrumentation land where things need to be much more dynamic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for RFC is that it serves as a formal description of the use cases that can be considered canonical. The summary table modified in this PR can refer to the RFC for more details. Otherwise defining this new tag without explaining how to use it leads to confusion of which I think your database use case is a perfect example: it's misleading to attribute the client-side span inside the application to the database server, and unnecessary since the database can and should be referenced in the peer.service
tag instead, per already existing semantic guidelines.
The multi-war file example could be a use case, but none of the existing instrumentation in opentracing/contrib behaves like that today, i.e. if I use java-grpc or java-spring-cloud or probably any other java-xxx modules from contrib, they won't know how to set the service
tag because they expect the tracer to do it. So for that multi-war example to be practical not only do we need to introduce the service
tag, but we also need to change all instrumentations in contrib to be able to take service name as an argument so that they can set this tag. I think that's much larger implications than just adding an optional tag. Personally I don't think we should be recommending this as a valid use case, since it can be much easier solved with tracer per application, even if they are deployed in a single app server (any decent app server would offer a separate class loader for each app).
So that leaves us with service mesh or similar use cases where an application is constructing spans on behalf of other applications, which is indeed a marginal use case well served by an "optional" tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want this in order for service meshes and collection systems to trace "on behalf" of another process. There is a need to differentiate between their own tracing and the tracing they are doing on behalf of another service. I would recommend that be listed as a primary use case.
I don't think you would ever have RPC instrumentation setting this. In the case of running multiple processes/JARs/etc, it would be an application-level span that would receive this tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest focusing just on the service mesh /external reporting. The guidance should be, IMHO:
- when describing a span which is associated with an adjacent service the current service is calling, use the
peer.service
tag. - when recording a span out-of-process, on behalf of another service, use the
service
tag.
Separately, there is a larger issue around tracing stateful "things", be they called services, processes, or objects. Right now tracing is only about transactions. We consistently see code that persists and interacts with multiple traces/transactions, and it is not possible to model them effectively right now. For that issue, I think we need an RFC. And WARs, to me, would fall under that scenario.
I would recommend we separate that larger case out from the smaller (yet also necessary) case of "external tracing" such as service meshes, syslog-to-trace, etc. The service
tag is sufficient for that narrower use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro @tedsuo if you want to give me the text you want in the description column that you'd like to see, I can put it in there. I'm not clear on how to word the description so it would be useful for the service mesh use case. The description ted gave above seems too long for that compared to the other descriptions already there.
We received this requirement from our users as well: they want to override on a per span level the calling/called service name. |
@tedsuo @yurishkuro @CodingFabian I've made another pass at the description to try to incorporate the feedback. If you don't like it, please propose a better description. |
semantic_conventions.md
Outdated
@@ -31,8 +31,9 @@ Span tags apply to **the entire Span**; as such, they apply to the entire timera | |||
| `peer.ipv4` | string | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` | | |||
| `peer.ipv6` | string | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` | | |||
| `peer.port` | integer | Remote port. E.g., `80` | | |||
| `peer.service` | string | Remote service name (for some unspecified definition of `"service"`). E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"` | | |||
| `peer.service` | string | Remote service name (for some unspecified definition of `"service"`) being called by a client. E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"`. Meaning should correspond with values set in `service`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"being called by a client" is incorrect: peer.service
tag is symmetric, used by both client and server spans to identify the remote service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair in theory, but I don't understand how that would be useful in practice. Do people write instrumentation where the server creates a span on behalf of a client like is common for the reverse?
Should I remove the being called by a client
part then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the server creates a span with span.kind=server
and peer.service={name}
if it knows the upstream caller by {name}
(it depends on the RPC protocol, or conventions within the organization, e.g. a special http header - example).
yes, "being called by a client" should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro done.
new description lgtm. |
semantic_conventions.md
Outdated
| `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. | | ||
| `service` | string | Service name for a span which should override any default defined in tracer config for current and child spans. E.g., `"account-service"`, `"a_custom_microservice"`. Meaning should correspond with values set in `peer.service`. Note: intended use is for things like a service mesh which might have many services it is creating traces for. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"for current and child spans" - this implies that if a root span of a trace sets the service
tag, then all spans in that trace will be associated with that service. Need to find some wording that restricts the scope of this to spans created in the same process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@objectiser I understand the concern. What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"current and child spans (reasonably considered to be within the scope of that service)" ?
Not sure - was wondering whether something based on the tracer, or within the current process - but couldn't find suitable phrase. So the above is vague enough so could mean anything but at the same time make it clear it does not mean all child spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@objectiser any other suggestions?
semantic_conventions.md
Outdated
| `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. | | ||
| `service` | string | Service name for a span which should override any default defined in tracer config for current and child spans (reasonably considered to be within the scope of that service). E.g., `"account-service"`, `"a_custom_microservice"`. Meaning should correspond with values set in `peer.service`. Note: intended use is for things like a service mesh which might have many services it is creating traces for. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasonably considered to be within the scope of that service
Is it really necessary? If say a service mesh process is creating spans on behalf of a service, why can't it set the service
tag on all spans it creates?
Another example is a normal service that creates most of the spans for itself, but the very first span is created on behalf of the caller (for whatever reason, the caller couldn't do it). With the above wording that first span will cause all spans in that service to be attributed to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is changed so the tracer does not take responsibility for setting service on child spans, that would also be fine - although I'm guessing that was added to overcome the issue where all instrumentations would need to know the service and added it explicitly.
Current services definitionService name for a span which should override any default defined in tracer config for current and child spans (reasonably considered to be within the scope of that service). E.g., "account-service", "a_custom_microservice". Meaning should correspond with values set in peer.service. Note: intended use is for things like a service mesh which might have many services it is creating traces for. CommentsVery happy to see agreement that we should have this, and I'd like to get it merged soon. I only have one real piece of feedback: I don't think we should specify that child spans automatically have this tag applied. In practice, I have found that external reporters will be applying this tag to every span they are reporting. Requiring tracers to track parentage internally would require implementation changes, which would be expensive and unnecessary for the stated use case. It also leads to odd edge cases: if you try to define Suggested rewordingThe service name for a span, which overrides any default "service name" property defined in a tracer's config. The meaning of |
@tedsuo +1, this is the same concern as mine in #119 (comment) |
0aba35e
to
1b3c89e
Compare
@tedsuo @yurishkuro in the interest of getting this in soon, I've adopted the changes suggested by Ted. I expect Datadog will continue supporting the cascading behavior, but I don't think that is incompatible with the spec or the way I expect users to utilize it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerbenson thanks! I think, in practice, there's nothing "incompatible" or wrong with having that cascading behavior, it just shouldn't be a guarantee made by the spec at this time.
I believe we've reached consensus on this initial set of usecases and requirements, so I'm going to merge. In the future, if we would like to extend the set of use-cases to include something that requires cascading behavior - such as "multi-service processes" - we can revisit this tag in a new issue. Thanks, y'all! |
@tedsuo is the next step to add this tag to the java repo? |
This reverts commit f7ca62c.
This reverts commit f7ca62c.
This reverts commit f7ca62c.
Per discussion in #77, service name is deemed a widely enough used concept to warrant adoption by the community.
(Closes #77)