Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration context spec #335

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

breedx-splk
Copy link
Contributor

In order to facilitate better interoperability between AppDynamics and Splunk O11y Cloud, we need to pass some additional context. This is the first step in specifying which fields the splunk-otel side of things will need to consume and to generate.

The AppD side is intended to receive a follow-up PR (although one could make the case that the AppD side shouldn't be spec'd here....I'm open to ideas).

@breedx-splk breedx-splk requested review from a team as code owners February 11, 2025 00:56

## Outgoing State

When `cisco.tracestate.enabled` is `true`, Splunk implementations MUST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppD instrumented App --- passing tracestate ---> Splunk/OTel instrumented App ----- Should we also pass received tracestate from the parent AppD app?-----> AppD instrumented App

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we've drifted away from tracestate now for these, but your question stands....do we need some of the incoming state propagated to the outgoing state? I think the answer to this likely depends on how sophisticated the backend integrations will be and how deep the links may go. We should find out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the BT ID, which could/should be added to baggage in a clean way. Everything else (as far as I know) is meant to be one-hop and interpreted as such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbley, do you suggest to put BT.ID in baggage and all other stuff in the header way?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make the most sense to me unless there's something about bt.id I don't understand,.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting some parts over here and some parts over there makes it slightly more difficult to wrangle. So far, it feels fairly simple/straightforward to me. I don't feel super strongly about it.

Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>
@github-actions github-actions bot locked and limited conversation to collaborators Feb 11, 2025
@breedx-splk
Copy link
Contributor Author

Going to continue working on this now, without using tracestate or w3c, but instead some custom headers...

@breedx-splk breedx-splk reopened this Feb 17, 2025
@breedx-splk breedx-splk marked this pull request as draft February 17, 2025 22:53
@breedx-splk breedx-splk marked this pull request as ready for review February 18, 2025 00:18
@breedx-splk breedx-splk changed the title Add tracestate spec Add integration context spec Feb 18, 2025
Copy link
Contributor

@Kielek Kielek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you considered W3C baggage?

I didn't have time to full analysis, but it is something we should consider.

Comment on lines 57 to 61
When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
extract fields from the `cisco-ctx-*` headers (above) and add extra
attributes to any Spans created as part of the incoming request context.
Null or missing values MUST be handled gracefully by simply
omitting the span attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging, we should create POC for this.
I do no like the idea of adding new attributes to every span. Why not just to the first span in the application? Other calculation could be done by the standard trace-id, span-id and the hierarchy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be on every span for that request. It is more concise that way.

You also have to allow for a scenario where a call enters the app, leaves the app, and comes back in from a different tier. Multiple tiers possibly for the same trace from different tiers.

You said that you don't like adding attributes, but you didn't say why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just duplicating them, I think that they can be inherited/detected by backends. No strong opinion here.

@breedx-splk
Copy link
Contributor Author

Did you considered W3C baggage?

Yes. And am confident that it would work, especially since mutations are allowed. I'm not sure how easy/sophisticated the baggage handling capabilities are in the various languages though.

I would be happy to convert these headers to baggage keys. What do other folks think?

@johnbley
Copy link

Did you considered W3C baggage?

Yes. And am confident that it would work, especially since mutations are allowed. I'm not sure how easy/sophisticated the baggage handling capabilities are in the various languages though.

I would be happy to convert these headers to baggage keys. What do other folks think?

Unless I misunderstand something, I'm against using baggage for this because baggage is meant to be trace-wide and we'd have to write logic to counteract that. Yes, it is possible, but it's much easier (less code, less compexity, fewer chances for errors or misattribution) when it is a plain non-propagated header.

@breedx-splk
Copy link
Contributor Author

Did you considered W3C baggage?
baggage is meant to be trace-wide and we'd have to write logic to counteract that. Yes, it is possible, but it's much easier (less code, less compexity, fewer chances for errors or misattribution) when it is a plain non-propagated header.

Right, we'd also need to make sure that we remove some fields at each hop, because the default behavior is to propagate, which is not necessarily what we want/need here. Given that the intent of baggage is to propagate, I think it suggests that our use case is slightly different.

Thanks @johnbley .

@breedx-splk
Copy link
Contributor Author

Prototype here signalfx/splunk-otel-java#2198

Copy link
Contributor

@pellared pellared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I have not managed to review everything, but looks good in general.

* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.

HTTP headers are capable of being multivalued. As such, implementations
Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also refer to https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).

generate them. These headers SHOULD be treated as opaque values of type
string.

* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe AppD prefix as it feels more unique than cisco?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about cisco is that we are able to use the same prefix for data generated in AppD components and data generated in splunk-otel components. With this approach, the intention is to be able to say "Oh, I see this cisco-ctx-* header, I know exactly why it exists -- to facilitate integrated product experience.

Comment on lines +26 to +34
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
* `cisco-ctx-env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use the "standardized/canonical" way of defining headers

Suggested change
* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
* `cisco-ctx-env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.
* `Cisco-Ctx-Acc-ID` - Contains the ID of the AppDynamics account.
* `Cisco-Ctx-App-ID` - Contains the ID of the AppDynamics application.
* `Cisco-Ctx-Tier-ID` - Contains the ID of the AppDynamics tier.
* `Cisco-Ctx-BT-ID` - Contains the ID of the AppDynamics business transaction (BT).
* `Cisco-Ctx-Env` - Contains
the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
resource value from an OpenTelemetry based component.
* `Cisco-Ctx-Service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
resource value from an OpenTelemetry based component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matter much -- but my purely aesthetic preference is to use lowercase. Most libraries/frameworks standardize on lowercase anyway I believe.

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most libraries/frameworks standardize on lowercase anyway I believe.

I am not sure. At least, this is not true for Go

I don't think it matter much

It does not, but this is how most RFC and docs like https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers follow. But maybe it just me who is not used to lowercase header field names. I won't fight for it, but sometime the difference in "casing" makes it easier for me to "pattern-match" the "type". E.g. when I see SPLUNK_IS_COOL I see an env var 😆

Feel free to do whatever you want. I do not want to end up with an academical discussion 😉

Copy link
Contributor

@Kielek Kielek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 57 to 61
When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
extract fields from the `cisco-ctx-*` headers (above) and add extra
attributes to any Spans created as part of the incoming request context.
Null or missing values MUST be handled gracefully by simply
omitting the span attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just duplicating them, I think that they can be inherited/detected by backends. No strong opinion here.

Comment on lines +23 to +24
generate them. These headers SHOULD be treated as opaque values of type
string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What means opaque here?
Should internal.public API implementing it treat it as an additional class with string value field, liek the Baggage + metadata?

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence should be indeed clarified. There is too much left for interpretation.

I think the intention is to have a dedicated opaque type for each header.
Thanks to it in future you can add funtionality to use cisco-ctx-app-id as int and leave other headers intact.

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md#set-value

Metadata [..] This should be an opaque wrapper for a string with no semantic meaning. Left opaque to allow for future functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned

@laurit is exactly right -- the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intention is to have a dedicated opaque type for each header.
Thanks to it in future you can add funtionality to use cisco-ctx-app-id as int and leave other headers intact.

No, I don't want any of this spec'd as anything other than a string.

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.

I strongly disagree. Opaque is more like encapsulation and information hiding.

I just found https://en.wikipedia.org/wiki/Opaque_data_type to back up my understanding.

Also from Oxford Dictionary:

opaque - (adj.) not able to be seen through; not transparent.

No, I don't want any of this spec'd as anything other than a string.

Then let's spec it as "of string type" or "underlying type of string".

"underlying type" is like a "super-set" of a type in some languages. E.g. an enum in .NET has int as underlying type by default. In Go you can have underlying types for any type e.g. type CiscoAppID string.

HTTP headers are capable of being multivalued. As such, implementations
SHOULD use the _last_ value when the above headers contain multiple values.

## Splunk OpenTelemetry distributions
Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to describe how this can be implemented using OTel components (via propagator and processor) based on signalfx/splunk-otel-java#2198.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

How would one tell that context is being propagated to an app that is using appd? There is an open spec issue for controlling context propagation boundary. I hope that none of these headers are considered sensitive as limiting where they are propagated will be complicated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurit raises a good point that a customer might want the control of this behavior to be (optionally) more granular - e.g. cisco.ctx.enabledwould be the preferred way, but also cisco.ctx.enabled.inbound=true/false and cisco.ctx.enabled.outbound as more specific toggles. (If so, further spec issue to settle - what if the "combination" enabled is set and a more specific one is also set - which takes precedence?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For outbound injection, instrumentation is obviously not going to be able to tell what it's talking to. For security/leakage concerns I've suggested more granular behavior controls, but the fallback could also be telling the user to configure a network proxy/firewall/etc. for their needs.

This feature (appd<->splunk context sharing) needs to be off by default in splunk-otel per PM, who I'm sure would be happy to share reasoning in separate channels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

No, this was not the intention. What phrasing leads you to believe this?

How would one tell that context is being propagated to an app that is using appd?

You can't and shouldn't need to. The headers shouldn't contain sensitive information. Users who choose to put sensitive information in any of these fields (service name, environment, appd IDs, whatever) shouldn't use this feature.

Copy link
Contributor

@pellared pellared Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What phrasing leads you to believe this?

This:

See integration_context.md for specifics about
exchanging additional context between AppD and splunk-otel based agents.

Threat 1: Information disclosure
If I understand correctly the current design sends splunk-otel context to everyone (via outgoing requests).

Thread 2: Spoofing, Tampering?
The other problem is that the incoming requests are also not verifying who send the data (if this is actually coming from AppD). A malicious actor can send cisco-ctx-acct-id headers which would be stored in spans.

Are we planning to do a Thread Modeling and store somewhere its outcomes?

Given it is an opt-in feature I think that documenting the side-effects for the users might be good enough...

@signalfx signalfx unlocked this conversation Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants