Add tag for service #119

tylerbenson · 2018-05-30T03:58:13Z

Per discussion in #77, service name is deemed a widely enough used concept to warrant adoption by the community.

(Closes #77)

Per discussion in #77, service name is deemed a widely enough used concept to warrant adoption by the community. (Closes #77)

yurishkuro · 2018-05-31T03:12:11Z

semantic_conventions.md

 | `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. |
+| `service` | string  | The name of the current service. E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"`. Should correspond with values set in `peer.service`. |


I would prefer to have a description of the use case, such as a service mesh creating spans on behalf of different services.

I don't think there is a single use-case that would work well in this description. For example, with Datadog, (in addition to the use case you described) someone could set the service name via this tag that would override the default service name for a specific tag with the goal of separating out or correcting the name. Do you have a better description in mind?

Your use case is less clear to me. Why would a service override its name? In our internal wrappers around oss jaeger libs we do the opposite, ie prevent services from even configuring their name in the tracer, instead we read it from env vars set by infra, to avoid names that do not correspond to discovery names.

I think it would be good to go through an RFC for this feature and describe the use cases it is meant for and why it's needed. Otherwise people will just unintentionally abuse it, I'm afraid.

Here are two use cases for wanting the ability to override from the globally configured name:

First: you can think of like the service mesh use case (creating a span on behalf of another service). For example when making a database call, we rename that span to correspond to the database instead of the current service.

Second: same issue applies when dealing with a java application server with multiple war files deployed onto it. Many people think of those war files as logically separate things and want them to be named differently. If the ENV configured name takes priority, there isn't a nice way to make that distinction.

What about the name and use case is unclear? Why the need for an RFC? @tedsuo thoughts?

I understand your suggestion for configuring separate tracer instances to cover some of these purposes, but that doesn't work well in auto-instrumentation land where things need to be much more dynamic.

The reason for RFC is that it serves as a formal description of the use cases that can be considered canonical. The summary table modified in this PR can refer to the RFC for more details. Otherwise defining this new tag without explaining how to use it leads to confusion of which I think your database use case is a perfect example: it's misleading to attribute the client-side span inside the application to the database server, and unnecessary since the database can and should be referenced in the peer.service tag instead, per already existing semantic guidelines.

The multi-war file example could be a use case, but none of the existing instrumentation in opentracing/contrib behaves like that today, i.e. if I use java-grpc or java-spring-cloud or probably any other java-xxx modules from contrib, they won't know how to set the service tag because they expect the tracer to do it. So for that multi-war example to be practical not only do we need to introduce the service tag, but we also need to change all instrumentations in contrib to be able to take service name as an argument so that they can set this tag. I think that's much larger implications than just adding an optional tag. Personally I don't think we should be recommending this as a valid use case, since it can be much easier solved with tracer per application, even if they are deployed in a single app server (any decent app server would offer a separate class loader for each app).

So that leaves us with service mesh or similar use cases where an application is constructing spans on behalf of other applications, which is indeed a marginal use case well served by an "optional" tag.

We definitely want this in order for service meshes and collection systems to trace "on behalf" of another process. There is a need to differentiate between their own tracing and the tracing they are doing on behalf of another service. I would recommend that be listed as a primary use case.

I don't think you would ever have RPC instrumentation setting this. In the case of running multiple processes/JARs/etc, it would be an application-level span that would receive this tag.

I would suggest focusing just on the service mesh /external reporting. The guidance should be, IMHO:

when describing a span which is associated with an adjacent service the current service is calling, use the peer.service tag.

when recording a span out-of-process, on behalf of another service, use the service tag.

Separately, there is a larger issue around tracing stateful "things", be they called services, processes, or objects. Right now tracing is only about transactions. We consistently see code that persists and interacts with multiple traces/transactions, and it is not possible to model them effectively right now. For that issue, I think we need an RFC. And WARs, to me, would fall under that scenario.

I would recommend we separate that larger case out from the smaller (yet also necessary) case of "external tracing" such as service meshes, syslog-to-trace, etc. The service tag is sufficient for that narrower use case.

@yurishkuro @tedsuo if you want to give me the text you want in the description column that you'd like to see, I can put it in there. I'm not clear on how to word the description so it would be useful for the service mesh use case. The description ted gave above seems too long for that compared to the other descriptions already there.

CodingFabian · 2018-06-19T13:56:00Z

We received this requirement from our users as well: they want to override on a per span level the calling/called service name.
While I agree that there are other ways to do it, I can see that configuring all metadata on a span level makes sense.
Nitpicking on the description: It should be made clear that peer.service of the parent should correspont to service of the child. Currently it reads as if peer.service and service of the same span should match, which would make one redundant.

tylerbenson · 2018-06-25T05:13:01Z

@tedsuo @yurishkuro @CodingFabian I've made another pass at the description to try to incorporate the feedback. If you don't like it, please propose a better description.

yurishkuro · 2018-06-25T19:02:18Z

semantic_conventions.md

@@ -31,8 +31,9 @@ Span tags apply to **the entire Span**; as such, they apply to the entire timera
 | `peer.ipv4` | string | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` |
 | `peer.ipv6` | string | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` |
 | `peer.port` | integer | Remote port. E.g., `80` |
-| `peer.service` | string | Remote service name (for some unspecified definition of `"service"`). E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"` |
+| `peer.service` | string | Remote service name (for some unspecified definition of `"service"`) being called by a client. E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"`. Meaning should correspond with values set in `service`. |


"being called by a client" is incorrect: peer.service tag is symmetric, used by both client and server spans to identify the remote service.

That's fair in theory, but I don't understand how that would be useful in practice. Do people write instrumentation where the server creates a span on behalf of a client like is common for the reverse?

Should I remove the being called by a client part then?

the server creates a span with span.kind=server and peer.service={name} if it knows the upstream caller by {name} (it depends on the RPC protocol, or conventions within the organization, e.g. a special http header - example).

yes, "being called by a client" should be removed.

@yurishkuro done.

yurishkuro · 2018-06-25T19:02:50Z

new description lgtm.

objectiser · 2018-07-06T08:30:21Z

semantic_conventions.md

 | `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. |
+| `service` | string  | Service name for a span which should override any default defined in tracer config for current and child spans. E.g., `"account-service"`, `"a_custom_microservice"`. Meaning should correspond with values set in `peer.service`. Note: intended use is for things like a service mesh which might have many services it is creating traces for. |


"for current and child spans" - this implies that if a root span of a trace sets the service tag, then all spans in that trace will be associated with that service. Need to find some wording that restricts the scope of this to spans created in the same process?

@objectiser I understand the concern. What do you suggest?

"current and child spans (reasonably considered to be within the scope of that service)" ?

Not sure - was wondering whether something based on the tracer, or within the current process - but couldn't find suitable phrase. So the above is vague enough so could mean anything but at the same time make it clear it does not mean all child spans.

@objectiser any other suggestions?

yurishkuro · 2018-07-06T15:24:32Z

semantic_conventions.md

 | `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. |
+| `service` | string  | Service name for a span which should override any default defined in tracer config for current and child spans (reasonably considered to be within the scope of that service). E.g., `"account-service"`, `"a_custom_microservice"`. Meaning should correspond with values set in `peer.service`. Note: intended use is for things like a service mesh which might have many services it is creating traces for. |


reasonably considered to be within the scope of that service

Is it really necessary? If say a service mesh process is creating spans on behalf of a service, why can't it set the service tag on all spans it creates?

Another example is a normal service that creates most of the spans for itself, but the very first span is created on behalf of the caller (for whatever reason, the caller couldn't do it). With the above wording that first span will cause all spans in that service to be attributed to the caller.

If it is changed so the tracer does not take responsibility for setting service on child spans, that would also be fine - although I'm guessing that was added to overcome the issue where all instrumentations would need to know the service and added it explicitly.

tedsuo · 2018-07-11T23:03:30Z

Current services definition

Service name for a span which should override any default defined in tracer config for current and child spans (reasonably considered to be within the scope of that service). E.g., "account-service", "a_custom_microservice". Meaning should correspond with values set in peer.service. Note: intended use is for things like a service mesh which might have many services it is creating traces for.

Comments

Very happy to see agreement that we should have this, and I'd like to get it merged soon. I only have one real piece of feedback: I don't think we should specify that child spans automatically have this tag applied. In practice, I have found that external reporters will be applying this tag to every span they are reporting. Requiring tracers to track parentage internally would require implementation changes, which would be expensive and unnecessary for the stated use case. It also leads to odd edge cases: if you try to define reasonably considered as stated above, you'll find that it may be tricky. It would be better to simply say that the tagged span will be treated as belonging to the service in the tag.

Suggested rewording

The service name for a span, which overrides any default "service name" property defined in a tracer's config. The meaning of service should correspond to the value set in peer.service, except it is applied to the current span. This tag is meant to only be used when a tracer is reporting spans on behalf of another service (for example, a service mesh reporting on behalf of the services it is proxying, or an out-of-band reporter which reads in log files). This tag does not need to be used when reporting spans for the service the tracer is running in.

yurishkuro · 2018-07-12T00:44:52Z

@tedsuo +1, this is the same concern as mine in #119 (comment)

tylerbenson · 2018-07-12T18:02:37Z

@tedsuo @yurishkuro in the interest of getting this in soon, I've adopted the changes suggested by Ted. I expect Datadog will continue supporting the cascading behavior, but I don't think that is incompatible with the spec or the way I expect users to utilize it.

tedsuo

@tylerbenson thanks! I think, in practice, there's nothing "incompatible" or wrong with having that cascading behavior, it just shouldn't be a guarantee made by the spec at this time.

tedsuo · 2018-07-12T20:18:46Z

I believe we've reached consensus on this initial set of usecases and requirements, so I'm going to merge.

In the future, if we would like to extend the set of use-cases to include something that requires cascading behavior - such as "multi-service processes" - we can revisit this tag in a new issue.

Thanks, y'all!

tylerbenson · 2018-07-12T21:08:13Z

@tedsuo is the next step to add this tag to the java repo?

This reverts commit f7ca62c.

Add tag for service

e7d3740

Per discussion in #77, service name is deemed a widely enough used concept to warrant adoption by the community. (Closes #77)

tylerbenson requested review from tedsuo and yurishkuro May 30, 2018 03:58

yurishkuro reviewed May 31, 2018

View reviewed changes

Try to incorporate feedback.

1b0aef8

yurishkuro reviewed Jun 25, 2018

View reviewed changes

Removed “being called by a client”

c18157d

yurishkuro approved these changes Jul 5, 2018

View reviewed changes

objectiser reviewed Jul 6, 2018

View reviewed changes

limit scope of service cascading

307542f

yurishkuro reviewed Jul 6, 2018

View reviewed changes

SergeyKanzhelev mentioned this pull request Jul 11, 2018

Enhance OpenCensus’ span model to include Zipkin style endpoints census-instrumentation/opencensus-specs#135

Open

Adopt description changes suggested by @tedsuo.

1b3c89e

tylerbenson force-pushed the tyler/service-tag branch from 0aba35e to 1b3c89e Compare July 12, 2018 18:01

tedsuo approved these changes Jul 12, 2018

View reviewed changes

tedsuo merged commit f7ca62c into master Jul 12, 2018

tylerbenson deleted the tyler/service-tag branch July 12, 2018 21:04

pavolloffay mentioned this pull request Jul 13, 2018

Add service tag opentracing/opentracing-java#287

Merged

codefromthecrypt mentioned this pull request Oct 10, 2018

[Feature Request] Improve detection of parent-child relationships ExpediaDotCom/haystack#537

Open

tylerbenson mentioned this pull request Oct 10, 2018

Remove service tag and replace with a similar concept on Scope. #131

Open

tylerbenson added a commit that referenced this pull request Dec 6, 2018

Revert "Add tag for service (#119)"

4d8db95

This reverts commit f7ca62c.

tylerbenson mentioned this pull request Dec 6, 2018

Revert "Add tag for service" #135

Merged

tylerbenson added a commit that referenced this pull request Dec 6, 2018

Revert "Add tag for service (#119)"

d7896c8

This reverts commit f7ca62c.

tylerbenson added a commit that referenced this pull request Dec 6, 2018

Revert "Add tag for service (#119)"

a1c3aa7

This reverts commit f7ca62c.

tylerbenson added a commit that referenced this pull request Jan 3, 2019

Revert "Add tag for service (#119)" (#135)

b9fdcdd

This reverts commit f7ca62c.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add tag for service #119

Add tag for service #119

tylerbenson commented May 30, 2018

yurishkuro May 31, 2018

tylerbenson Jun 5, 2018

yurishkuro Jun 5, 2018

tylerbenson Jun 6, 2018

yurishkuro Jun 6, 2018

tedsuo Jun 14, 2018

tedsuo Jun 14, 2018

tylerbenson Jun 14, 2018 •

edited

Loading

CodingFabian commented Jun 19, 2018

tylerbenson commented Jun 25, 2018

yurishkuro Jun 25, 2018 •

edited

Loading

tylerbenson Jun 26, 2018

yurishkuro Jun 26, 2018

tylerbenson Jul 5, 2018

yurishkuro commented Jun 25, 2018

objectiser Jul 6, 2018

tylerbenson Jul 6, 2018

objectiser Jul 6, 2018

tylerbenson Jul 6, 2018

yurishkuro Jul 6, 2018

objectiser Jul 6, 2018

tedsuo commented Jul 11, 2018

yurishkuro commented Jul 12, 2018

tylerbenson commented Jul 12, 2018

tedsuo left a comment

tedsuo commented Jul 12, 2018

tylerbenson commented Jul 12, 2018

		\| `sampling.priority` \| integer \| If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. \|
		\| `service` \| string \| The name of the current service. E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"`. Should correspond with values set in `peer.service`. \|

		\| `sampling.priority` \| integer \| If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism. \|
		\| `service` \| string \| Service name for a span which should override any default defined in tracer config for current and child spans. E.g., `"account-service"`, `"a_custom_microservice"`. Meaning should correspond with values set in `peer.service`. Note: intended use is for things like a service mesh which might have many services it is creating traces for. \|

Add tag for service #119

Add tag for service #119

Conversation

tylerbenson commented May 30, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tylerbenson Jun 14, 2018 • edited Loading

Choose a reason for hiding this comment

CodingFabian commented Jun 19, 2018

tylerbenson commented Jun 25, 2018

yurishkuro Jun 25, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yurishkuro commented Jun 25, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tedsuo commented Jul 11, 2018

Current services definition

Comments

Suggested rewording

yurishkuro commented Jul 12, 2018

tylerbenson commented Jul 12, 2018

tedsuo left a comment

Choose a reason for hiding this comment

tedsuo commented Jul 12, 2018

tylerbenson commented Jul 12, 2018

tylerbenson Jun 14, 2018 •

edited

Loading

yurishkuro Jun 25, 2018 •

edited

Loading