-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration context spec #335
base: main
Are you sure you want to change the base?
Conversation
specification/tracestate.md
Outdated
|
||
## Outgoing State | ||
|
||
When `cisco.tracestate.enabled` is `true`, Splunk implementations MUST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AppD instrumented App --- passing tracestate
---> Splunk/OTel instrumented App ----- Should we also pass received tracestate
from the parent AppD app?-----> AppD instrumented App
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we've drifted away from tracestate
now for these, but your question stands....do we need some of the incoming state propagated to the outgoing state? I think the answer to this likely depends on how sophisticated the backend integrations will be and how deep the links may go. We should find out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the BT ID, which could/should be added to baggage in a clean way. Everything else (as far as I know) is meant to be one-hop and interpreted as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnbley, do you suggest to put BT.ID in baggage and all other stuff in the header way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make the most sense to me unless there's something about bt.id I don't understand,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting some parts over here and some parts over there makes it slightly more difficult to wrangle. So far, it feels fairly simple/straightforward to me. I don't feel super strongly about it.
Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>
Going to continue working on this now, without using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you considered W3C baggage?
I didn't have time to full analysis, but it is something we should consider.
specification/integration_context.md
Outdated
When `cisco.ctx.enabled` is `true`, Splunk implementations MUST | ||
extract fields from the `cisco-ctx-*` headers (above) and add extra | ||
attributes to any Spans created as part of the incoming request context. | ||
Null or missing values MUST be handled gracefully by simply | ||
omitting the span attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merging, we should create POC for this.
I do no like the idea of adding new attributes to every span. Why not just to the first span in the application? Other calculation could be done by the standard trace-id, span-id and the hierarchy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it needs to be on every span for that request. It is more concise that way.
You also have to allow for a scenario where a call enters the app, leaves the app, and comes back in from a different tier. Multiple tiers possibly for the same trace from different tiers.
You said that you don't like adding attributes, but you didn't say why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just duplicating them, I think that they can be inherited/detected by backends. No strong opinion here.
Yes. And am confident that it would work, especially since mutations are allowed. I'm not sure how easy/sophisticated the baggage handling capabilities are in the various languages though. I would be happy to convert these headers to baggage keys. What do other folks think? |
Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>
Unless I misunderstand something, I'm against using baggage for this because baggage is meant to be trace-wide and we'd have to write logic to counteract that. Yes, it is possible, but it's much easier (less code, less compexity, fewer chances for errors or misattribution) when it is a plain non-propagated header. |
Right, we'd also need to make sure that we remove some fields at each hop, because the default behavior is to propagate, which is not necessarily what we want/need here. Given that the intent of baggage is to propagate, I think it suggests that our use case is slightly different. Thanks @johnbley . |
Prototype here signalfx/splunk-otel-java#2198 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. I have not managed to review everything, but looks good in general.
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service) | ||
resource value from an OpenTelemetry based component. | ||
|
||
HTTP headers are capable of being multivalued. As such, implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also refer to https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2:
A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).
generate them. These headers SHOULD be treated as opaque values of type | ||
string. | ||
|
||
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe AppD
prefix as it feels more unique than cisco
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I like about cisco
is that we are able to use the same prefix for data generated in AppD components and data generated in splunk-otel components. With this approach, the intention is to be able to say "Oh, I see this cisco-ctx-*
header, I know exactly why it exists -- to facilitate integrated product experience.
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account. | ||
* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application. | ||
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier. | ||
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT). | ||
* `cisco-ctx-env` - Contains | ||
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/) | ||
resource value from an OpenTelemetry based component. | ||
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service) | ||
resource value from an OpenTelemetry based component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use the "standardized/canonical" way of defining headers
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account. | |
* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application. | |
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier. | |
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT). | |
* `cisco-ctx-env` - Contains | |
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/) | |
resource value from an OpenTelemetry based component. | |
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service) | |
resource value from an OpenTelemetry based component. | |
* `Cisco-Ctx-Acc-ID` - Contains the ID of the AppDynamics account. | |
* `Cisco-Ctx-App-ID` - Contains the ID of the AppDynamics application. | |
* `Cisco-Ctx-Tier-ID` - Contains the ID of the AppDynamics tier. | |
* `Cisco-Ctx-BT-ID` - Contains the ID of the AppDynamics business transaction (BT). | |
* `Cisco-Ctx-Env` - Contains | |
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/) | |
resource value from an OpenTelemetry based component. | |
* `Cisco-Ctx-Service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service) | |
resource value from an OpenTelemetry based component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matter much -- but my purely aesthetic preference is to use lowercase. Most libraries/frameworks standardize on lowercase anyway I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most libraries/frameworks standardize on lowercase anyway I believe.
I am not sure. At least, this is not true for Go
I don't think it matter much
It does not, but this is how most RFC and docs like https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers follow. But maybe it just me who is not used to lowercase header field names. I won't fight for it, but sometime the difference in "casing" makes it easier for me to "pattern-match" the "type". E.g. when I see SPLUNK_IS_COOL
I see an env var 😆
Feel free to do whatever you want. I do not want to end up with an academical discussion 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnbley , @breedx-splk, whats about #335 (comment)?
specification/integration_context.md
Outdated
When `cisco.ctx.enabled` is `true`, Splunk implementations MUST | ||
extract fields from the `cisco-ctx-*` headers (above) and add extra | ||
attributes to any Spans created as part of the incoming request context. | ||
Null or missing values MUST be handled gracefully by simply | ||
omitting the span attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just duplicating them, I think that they can be inherited/detected by backends. No strong opinion here.
generate them. These headers SHOULD be treated as opaque values of type | ||
string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What means opaque here?
Should internal.public API implementing it treat it as an additional class with string value field, liek the Baggage + metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence should be indeed clarified. There is too much left for interpretation.
I think the intention is to have a dedicated opaque type for each header.
Thanks to it in future you can add funtionality to use cisco-ctx-app-id
as int
and leave other headers intact.
Metadata
[..] This should be an opaque wrapper for a string with no semantic meaning. Left opaque to allow for future functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned
@laurit is exactly right -- the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intention is to have a dedicated opaque type for each header.
Thanks to it in future you can add funtionality to use cisco-ctx-app-id as int and leave other headers intact.
No, I don't want any of this spec'd as anything other than a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.
I strongly disagree. Opaque is more like encapsulation and information hiding.
I just found https://en.wikipedia.org/wiki/Opaque_data_type to back up my understanding.
Also from Oxford Dictionary:
opaque - (adj.) not able to be seen through; not transparent.
No, I don't want any of this spec'd as anything other than a string.
Then let's spec it as "of string type" or "underlying type of string".
"underlying type" is like a "super-set" of a type in some languages. E.g. an enum
in .NET has int
as underlying type by default. In Go you can have underlying types for any type e.g. type CiscoAppID string
.
HTTP headers are capable of being multivalued. As such, implementations | ||
SHOULD use the _last_ value when the above headers contain multiple values. | ||
|
||
## Splunk OpenTelemetry distributions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be helpful to describe how this can be implemented using OTel components (via propagator and processor) based on signalfx/splunk-otel-java#2198.
Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?
How would one tell that context is being propagated to an app that is using appd? There is an open spec issue for controlling context propagation boundary. I hope that none of these headers are considered sensitive as limiting where they are propagated will be complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@laurit raises a good point that a customer might want the control of this behavior to be (optionally) more granular - e.g. cisco.ctx.enabled
would be the preferred way, but also cisco.ctx.enabled.inbound=true/false
and cisco.ctx.enabled.outbound
as more specific toggles. (If so, further spec issue to settle - what if the "combination" enabled
is set and a more specific one is also set - which takes precedence?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For outbound injection, instrumentation is obviously not going to be able to tell what it's talking to. For security/leakage concerns I've suggested more granular behavior controls, but the fallback could also be telling the user to configure a network proxy/firewall/etc. for their needs.
This feature (appd<->splunk context sharing) needs to be off by default in splunk-otel
per PM, who I'm sure would be happy to share reasoning in separate channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?
No, this was not the intention. What phrasing leads you to believe this?
How would one tell that context is being propagated to an app that is using appd?
You can't and shouldn't need to. The headers shouldn't contain sensitive information. Users who choose to put sensitive information in any of these fields (service name, environment, appd IDs, whatever) shouldn't use this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What phrasing leads you to believe this?
This:
See integration_context.md for specifics about
exchanging additional context between AppD and splunk-otel based agents.
Threat 1: Information disclosure
If I understand correctly the current design sends splunk-otel context to everyone (via outgoing requests).
Thread 2: Spoofing, Tampering?
The other problem is that the incoming requests are also not verifying who send the data (if this is actually coming from AppD). A malicious actor can send cisco-ctx-acct-id
headers which would be stored in spans.
Are we planning to do a Thread Modeling and store somewhere its outcomes?
Given it is an opt-in feature I think that documenting the side-effects for the users might be good enough...
Co-authored-by: Robert Pająk <pellared@hotmail.com>
In order to facilitate better interoperability between AppDynamics and Splunk O11y Cloud, we need to pass some additional context. This is the first step in specifying which fields the splunk-otel side of things will need to consume and to generate.
The AppD side is intended to receive a follow-up PR (although one could make the case that the AppD side shouldn't be spec'd here....I'm open to ideas).