Add integration context spec #335

breedx-splk · 2025-02-11T00:56:17Z

In order to facilitate better interoperability between AppDynamics and Splunk O11y Cloud, we need to pass some additional context. This is the first step in specifying which fields the splunk-otel side of things will need to consume and to generate.

The AppD side is intended to receive a follow-up PR (although one could make the case that the AppD side shouldn't be spec'd here....I'm open to ideas).

specification/tracestate.md

Kielek · 2025-02-11T06:01:40Z

specification/tracestate.md

+
+## Outgoing State
+
+When `cisco.tracestate.enabled` is `true`, Splunk implementations MUST


AppD instrumented App --- passing tracestate ---> Splunk/OTel instrumented App ----- Should we also pass received tracestate from the parent AppD app?-----> AppD instrumented App

Well we've drifted away from tracestate now for these, but your question stands....do we need some of the incoming state propagated to the outgoing state? I think the answer to this likely depends on how sophisticated the backend integrations will be and how deep the links may go. We should find out.

Just the BT ID, which could/should be added to baggage in a clean way. Everything else (as far as I know) is meant to be one-hop and interpreted as such.

@johnbley, do you suggest putting BT.ID in baggage and everything else in headers?

That would make the most sense to me unless there's something about bt.id I don't understand.

Putting some parts over here and some parts over there makes it slightly more difficult to wrangle. So far, it feels fairly simple/straightforward to me. I don't feel super strongly about it.

specification/behaviors.md

Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>

breedx-splk · 2025-02-17T22:47:23Z

Going to continue working on this now, without using tracestate or w3c, but instead some custom headers...

Kielek

Did you consider W3C baggage?

I didn't have time to do a full analysis, but it is something we should consider.

specification/integration_context.md

Kielek · 2025-02-18T06:19:54Z

specification/integration_context.md

+When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
+extract fields from the `cisco-ctx-*` headers (above) and add extra
+attributes to any Spans created as part of the incoming request context.
+Null or missing values MUST be handled gracefully by simply
+omitting the span attributes.


Before merging, we should create a POC for this.
I do not like the idea of adding new attributes to every span. Why not add them just to the first span in the application? Other calculations could be done using the standard trace-id, span-id, and the hierarchy.

I think it needs to be on every span for that request. It is more concise that way.

You also have to allow for a scenario where a call enters the app, leaves the app, and comes back in from a different tier. Multiple tiers possibly for the same trace from different tiers.

You said that you don't like adding attributes, but you didn't say why.

Just the duplication. I think they can be inherited/detected by backends. No strong opinion here.
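
A minimal sketch of the extraction rule quoted in this thread, assuming the incoming headers are available as a plain mapping. The helper name `ctx_attributes` and the attribute keys (`cisco.ctx.acct.id` and so on) are hypothetical: the spec text quoted here does not name the span attributes. Only the four AppD ID headers are mapped, for brevity.

```python
from typing import Dict, Mapping, Optional

# Header names come from the spec under review; the attribute keys on the
# right are placeholders invented for this sketch.
HEADER_TO_ATTRIBUTE = {
    "cisco-ctx-acct-id": "cisco.ctx.acct.id",
    "cisco-ctx-app-id": "cisco.ctx.app.id",
    "cisco-ctx-tier-id": "cisco.ctx.tier.id",
    "cisco-ctx-bt-id": "cisco.ctx.bt.id",
}


def ctx_attributes(headers: Mapping[str, Optional[str]], enabled: bool) -> Dict[str, str]:
    """Return span attributes for the cisco-ctx-* headers present in a request."""
    if not enabled:  # mirrors `cisco.ctx.enabled`
        return {}
    # Header field names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    attributes = {}
    for header, attribute in HEADER_TO_ATTRIBUTE.items():
        value = lowered.get(header)
        # Null or missing values are omitted; treating an empty string the
        # same way is an assumption of this sketch.
        if value:
            attributes[attribute] = value
    return attributes
```

Whether the returned attributes go on every span of the request or only on the first one is the open question in this thread; the helper is neutral on that point.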

specification/integration_context.md

breedx-splk · 2025-02-18T21:15:57Z

Did you consider W3C baggage?

Yes. And am confident that it would work, especially since mutations are allowed. I'm not sure how easy/sophisticated the baggage handling capabilities are in the various languages though.

I would be happy to convert these headers to baggage keys. What do other folks think?

Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>

johnbley · 2025-02-19T00:19:49Z

Did you consider W3C baggage?

Yes. And am confident that it would work, especially since mutations are allowed. I'm not sure how easy/sophisticated the baggage handling capabilities are in the various languages though.

I would be happy to convert these headers to baggage keys. What do other folks think?

Unless I misunderstand something, I'm against using baggage for this because baggage is meant to be trace-wide and we'd have to write logic to counteract that. Yes, it is possible, but it's much easier (less code, less complexity, fewer chances for errors or misattribution) when it is a plain non-propagated header.

breedx-splk · 2025-02-19T00:27:03Z

Did you consider W3C baggage?
baggage is meant to be trace-wide and we'd have to write logic to counteract that. Yes, it is possible, but it's much easier (less code, less complexity, fewer chances for errors or misattribution) when it is a plain non-propagated header.

Right, we'd also need to make sure that we remove some fields at each hop, because the default behavior is to propagate, which is not necessarily what we want/need here. Given that the intent of baggage is to propagate, I think it suggests that our use case is slightly different.

Thanks @johnbley .
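
To make the objection concrete, here is a rough sketch of the extra step a baggage-based design would need. It uses plain dicts rather than a real baggage API, and it reuses the header names as hypothetical baggage keys; `outgoing_baggage` is an illustrative name, not part of any library.

```python
from typing import Dict

CTX_PREFIX = "cisco-ctx-"


def outgoing_baggage(incoming_baggage: Dict[str, str], local_ctx: Dict[str, str]) -> Dict[str, str]:
    """Build the baggage for an outgoing request when ctx fields are one-hop only."""
    # Baggage entries propagate by default, so ctx entries received from
    # upstream have to be removed explicitly at every hop.
    forwarded = {
        key: value
        for key, value in incoming_baggage.items()
        if not key.startswith(CTX_PREFIX)
    }
    # Only this component's own ctx values travel to the next hop.
    forwarded.update(local_ctx)
    return forwarded
```

With plain, non-propagated headers this filtering step is unnecessary, which is the simplification argued for above.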

breedx-splk · 2025-02-20T22:24:42Z

Prototype here signalfx/splunk-otel-java#2198

pellared

Some comments. I have not managed to review everything, but looks good in general.

specification/integration_context.md

pellared · 2025-02-21T08:30:54Z

specification/integration_context.md

+* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
+  resource value from an OpenTelemetry based component.
+
+HTTP headers are capable of being multivalued. As such, implementations


I think we should also refer to https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.2:

A sender MUST NOT generate multiple header fields with the same field name in a message unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception (as noted below).
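
Even if senders follow that rule, a receiver still needs a deterministic choice when a field shows up more than once. A small sketch, assuming the raw request headers are available as a list of `(name, value)` pairs; `pick_header_value` is a hypothetical helper. The draft text quoted in this PR says to use the last value, while a later commit in this PR is titled "last -> first", so the choice is left as a parameter here.

```python
from typing import List, Optional, Tuple


def pick_header_value(
    raw_headers: List[Tuple[str, str]], name: str, use_last: bool = True
) -> Optional[str]:
    """Pick a single value when a header field appears more than once."""
    wanted = name.lower()  # field names are case-insensitive
    values = [value.strip() for field, value in raw_headers if field.lower() == wanted]
    if not values:
        return None  # missing header: the caller omits the attribute
    return values[-1] if use_last else values[0]
```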

pellared · 2025-02-21T08:37:42Z

specification/integration_context.md

+generate them. These headers SHOULD be treated as opaque values of type
+string.
+
+* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.


Maybe AppD prefix as it feels more unique than cisco?

What I like about cisco is that we are able to use the same prefix for data generated in AppD components and data generated in splunk-otel components. With this approach, the intention is to be able to say "Oh, I see this cisco-ctx-* header, I know exactly why it exists -- to facilitate integrated product experience."

pellared · 2025-02-21T08:42:51Z

specification/integration_context.md

+* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
+* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
+* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
+* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
+* `cisco-ctx-env` - Contains
+  the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
+  resource value from an OpenTelemetry based component.
+* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
+  resource value from an OpenTelemetry based component.


nit: can we use the "standardized/canonical" way of defining headers

Suggested change

-* `cisco-ctx-acct-id` - Contains the ID of the AppDynamics account.
-* `cisco-ctx-app-id` - Contains the ID of the AppDynamics application.
-* `cisco-ctx-tier-id` - Contains the ID of the AppDynamics tier.
-* `cisco-ctx-bt-id` - Contains the ID of the AppDynamics business transaction (BT).
-* `cisco-ctx-env` - Contains
-  the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
-  resource value from an OpenTelemetry based component.
-* `cisco-ctx-service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
-  resource value from an OpenTelemetry based component.
+* `Cisco-Ctx-Acc-ID` - Contains the ID of the AppDynamics account.
+* `Cisco-Ctx-App-ID` - Contains the ID of the AppDynamics application.
+* `Cisco-Ctx-Tier-ID` - Contains the ID of the AppDynamics tier.
+* `Cisco-Ctx-BT-ID` - Contains the ID of the AppDynamics business transaction (BT).
+* `Cisco-Ctx-Env` - Contains
+  the [deployment.environment.name](https://opentelemetry.io/docs/specs/semconv/attributes-registry/deployment/)
+  resource value from an OpenTelemetry based component.
+* `Cisco-Ctx-Service` - Contains the [service.name](https://opentelemetry.io/docs/specs/semconv/resource/#service)
+  resource value from an OpenTelemetry based component.

I don't think it matters much -- but my purely aesthetic preference is to use lowercase. Most libraries/frameworks standardize on lowercase anyway I believe.

Most libraries/frameworks standardize on lowercase anyway I believe.

I am not sure. At least, this is not true for Go

https://pkg.go.dev/net/http#Header

https://pkg.go.dev/net/http#CanonicalHeaderKey

I don't think it matters much

It does not, but this is the convention that most RFCs and docs like https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers follow. But maybe it's just me who is not used to lowercase header field names. I won't fight for it, but sometimes the difference in "casing" makes it easier for me to "pattern-match" the "type". E.g. when I see SPLUNK_IS_COOL I see an env var 😆

Feel free to do whatever you want. I do not want to end up with an academic discussion 😉
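
For what it's worth, HTTP field names are case-insensitive, so the casing chosen in the spec text is presentational. A short illustration using the standard library's `wsgiref.headers.Headers`, which performs case-insensitive lookups; the value `"7"` is made up for the example.

```python
from wsgiref.headers import Headers

# The same field written in the two casings discussed above.
lower = Headers([("cisco-ctx-app-id", "7")])
canonical = Headers([("Cisco-Ctx-App-ID", "7")])

# Lookups ignore case, so either spelling resolves to the same value.
lower_value = lower["Cisco-Ctx-App-ID"]
canonical_value = canonical["cisco-ctx-app-id"]
```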

Kielek

@johnbley, @breedx-splk, what about #335 (comment)?

Kielek · 2025-02-21T06:40:31Z

specification/integration_context.md

+When `cisco.ctx.enabled` is `true`, Splunk implementations MUST
+extract fields from the `cisco-ctx-*` headers (above) and add extra
+attributes to any Spans created as part of the incoming request context.
+Null or missing values MUST be handled gracefully by simply
+omitting the span attributes.


Just the duplication. I think they can be inherited/detected by backends. No strong opinion here.

Kielek · 2025-02-21T09:07:30Z

specification/integration_context.md

+generate them. These headers SHOULD be treated as opaque values of type
+string.


What does opaque mean here?
Should the internal/public API implementing it treat it as an additional class with a string value field, like the Baggage metadata?

This sentence should be indeed clarified. There is too much left for interpretation.

I think the intention is to have a dedicated opaque type for each header.
That way, in the future you can add functionality to use cisco-ctx-app-id as an int and leave the other headers intact.

From https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/baggage/api.md#set-value

Metadata [..] This should be an opaque wrapper for a string with no semantic meaning. Left opaque to allow for future functionality.

to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned

to me opaque here means that agents should not attempt to parse or otherwise derive a meaning for these headers, they are just values as far as agent is concerned

@laurit is exactly right -- the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.

I think the intention is to have a dedicated opaque type for each header.
That way, in the future you can add functionality to use cisco-ctx-app-id as an int and leave the other headers intact.

No, I don't want any of this spec'd as anything other than a string.

the intent is that it's of type string and that it shouldn't need to be parsed. I think the use of "opaque" to mean this is common.

I strongly disagree. Opaque is more like encapsulation and information hiding.

I just found https://en.wikipedia.org/wiki/Opaque_data_type to back up my understanding.

Also from Oxford Dictionary:

opaque - (adj.) not able to be seen through; not transparent.

No, I don't want any of this spec'd as anything other than a string.

Then let's spec it as "of string type" or "underlying type of string".

"underlying type" is like a "super-set" of a type in some languages. E.g. an enum in .NET has int as underlying type by default. In Go you can have underlying types for any type e.g. type CiscoAppID string.
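
A rough Python analogue of the "underlying type of string" idea, using `typing.NewType`. The name `CiscoAppId` is hypothetical. Unlike the Go declaration mentioned above, `NewType` is distinct only for static type checkers and does not create a separate runtime type.

```python
from typing import NewType

# A distinct name for static type checkers whose underlying type is str.
CiscoAppId = NewType("CiscoAppId", str)

app_id = CiscoAppId("12345")

# At runtime the value is an ordinary str: no parsing, no wrapper object.
is_plain_str = type(app_id) is str
```

Either wording in the spec ("of string type" or "underlying type of string") is compatible with this: the agent carries the value through without interpreting it.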

pellared · 2025-02-21T09:29:27Z

specification/integration_context.md

+HTTP headers are capable of being multivalued. As such, implementations
+SHOULD use the _last_ value when the above headers contain multiple values.
+
+## Splunk OpenTelemetry distributions


I think it would be helpful to describe how this can be implemented using OTel components (via propagator and processor) based on signalfx/splunk-otel-java#2198.

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

How would one tell that context is being propagated to an app that is using appd? There is an open spec issue for controlling context propagation boundary. I hope that none of these headers are considered sensitive as limiting where they are propagated will be complicated.

@laurit raises a good point that a customer might want the control of this behavior to be (optionally) more granular - e.g. `cisco.ctx.enabled` would be the preferred way, but also `cisco.ctx.enabled.inbound=true/false` and `cisco.ctx.enabled.outbound` as more specific toggles. (If so, there is a further spec issue to settle: what if the "combination" enabled is set and a more specific one is also set - which takes precedence?)

For outbound injection, instrumentation is obviously not going to be able to tell what it's talking to. For security/leakage concerns I've suggested more granular behavior controls, but the fallback could also be telling the user to configure a network proxy/firewall/etc. for their needs.

This feature (appd<->splunk context sharing) needs to be off by default in splunk-otel per PM, who I'm sure would be happy to share reasoning in separate channels.
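
One possible answer to the precedence question, sketched with a plain string-to-string config mapping. The precedence rule here (a direction-specific key, when present, overrides the combined key) is an assumption made for illustration and is not settled in this thread; `resolve_ctx_toggles` is a hypothetical helper.

```python
from typing import Mapping, Tuple


def resolve_ctx_toggles(config: Mapping[str, str]) -> Tuple[bool, bool]:
    """Return (inbound_enabled, outbound_enabled) for the integration context feature."""

    def as_bool(key: str, default: bool) -> bool:
        raw = config.get(key)
        return default if raw is None else raw.strip().lower() == "true"

    # Off by default, as required for splunk-otel in this thread.
    combined = as_bool("cisco.ctx.enabled", False)
    # ASSUMPTION: a direction-specific key, when present, overrides the combined one.
    inbound = as_bool("cisco.ctx.enabled.inbound", combined)
    outbound = as_bool("cisco.ctx.enabled.outbound", combined)
    return inbound, outbound
```

Under this rule, `cisco.ctx.enabled=true` with `cisco.ctx.enabled.outbound=false` yields inbound extraction without outbound injection, which covers the leakage concern raised above.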

Do I understand correctly that the propagator should be applied only in communication between Splunk and AppD?

No, this was not the intention. What phrasing leads you to believe this?

How would one tell that context is being propagated to an app that is using appd?

You can't and shouldn't need to. The headers shouldn't contain sensitive information. Users who choose to put sensitive information in any of these fields (service name, environment, appd IDs, whatever) shouldn't use this feature.

What phrasing leads you to believe this?

This:

See integration_context.md for specifics about
exchanging additional context between AppD and splunk-otel based agents.

Threat 1: Information disclosure
If I understand correctly the current design sends splunk-otel context to everyone (via outgoing requests).

Threat 2: Spoofing, Tampering?
The other problem is that the incoming requests are also not verifying who sent the data (whether it is actually coming from AppD). A malicious actor can send cisco-ctx-acct-id headers which would be stored in spans.

Are we planning to do threat modeling and store its outcomes somewhere?

Given it is an opt-in feature I think that documenting the side-effects for the users might be good enough...

specification/integration_context.md


add tracestate

66f3710

breedx-splk requested review from a team as code owners February 11, 2025 00:56

breedx-splk added 7 commits February 10, 2025 16:58

format

6663300

let tables be wider

9956cc5

format

5b5abe6

format

359fa75

format

6bd1cb9

format

2387f84

format

858cae2

Kielek reviewed Feb 11, 2025

johnbley requested changes Feb 11, 2025

specification/behaviors.md (resolved)

Update specification/tracestate.md

233787c

Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>

breedx-splk closed this Feb 11, 2025

github-actions bot locked and limited conversation to collaborators Feb 11, 2025

breedx-splk reopened this Feb 17, 2025

breedx-splk marked this pull request as draft February 17, 2025 22:53

breedx-splk added 6 commits February 17, 2025 15:40

update to use separate headers

d2b5f59

format

2bcc8aa

fragment and format

14be856

fix link

6e3d419

format

2d844cc

more format yay

4d19a01

breedx-splk marked this pull request as ready for review February 18, 2025 00:18

breedx-splk changed the title ~~Add tracestate spec~~ Add integration context spec Feb 18, 2025

change ipe -> ctx

986bf48

Kielek reviewed Feb 18, 2025

johnbley reviewed Feb 18, 2025

specification/integration_context.md (outdated, resolved)

johnbley reviewed Feb 18, 2025

specification/integration_context.md (resolved)

breedx-splk and others added 4 commits February 18, 2025 13:19

Update specification/integration_context.md

429853c

Co-authored-by: Piotr Kiełkowicz <pkiekowicz@splunk.com>

add clarifying statement about upstream

2fb7504

add statement about string type

037fdd2

format

d7f23db

breedx-splk mentioned this pull request Feb 20, 2025

Add ability to add additional context for appd integrations. signalfx/splunk-otel-java#2198

Open

johnbley approved these changes Feb 21, 2025

pellared reviewed Feb 21, 2025

Kielek reviewed Feb 21, 2025

pellared reviewed Feb 21, 2025

signalfx unlocked this conversation Feb 21, 2025

pellared reviewed Feb 21, 2025

specification/integration_context.md (outdated, resolved)

specification/integration_context.md (resolved)

breedx-splk and others added 3 commits February 21, 2025 08:12

last -> first

cd65b5b

clarify introduction wording, removing suggestion of being bidirectional.

125c3dc

Update specification/integration_context.md

440c8ac

Co-authored-by: Robert Pająk <pellared@hotmail.com>

