Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inject OTel tracestate when power-of-two probability sampling #7962

Closed
jmacd opened this issue Feb 17, 2022 · 8 comments
Closed

Inject OTel tracestate when power-of-two probability sampling #7962

jmacd opened this issue Feb 17, 2022 · 8 comments
Labels
processor/probabilisticsampler Probabilistic Sampler processor

Comments

@jmacd
Copy link
Contributor

jmacd commented Feb 17, 2022

Proposal

I propose to modify the probabilistic sampling processor to emit an OTel tracestate corresponding with the sampling probability in use, which is well defined in the current specification when the sampling probability is a power of two.

Specifically, if the sampling probability is a power of two such that Log2(probability) == -X where X is an integer, then the corresponding Span's tracestate should be extended with an entry ot=p:X. Here, "p" is the base-2 logarithm of "adjusted count".

When the Span already has a p-value, probabilities multiply. If the span does not have a p-value, which formally means "unknown" adjusted count, we will assume the count is 1 span corresponding with probability=1 (i.e., ot=p:0).

If the span already has an ot=p:Y property, the correct output is ot=p:Z for Z=X+Y (multiplying probabilities == addition inside Log2).

For example, if performing 50% sampling then we are multiplying adjusted counts by 2. Spans with no adjusted count on arrival will depart with p-value 1, and Spans with an adjusted count on arrival will depart with p-value greater by 1.

Other ot= properties than p, such as r, SHOULD pass through unmodified.

Additional context

The tracestate fields used here are defined in
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md

This specification defines support only for power-of-two adjusted counts. Support for non-power-of-two adjusted counts would require more specification work, but is certainly possible. The proposal here is to leave the processor logic here unmodified, and only to extend p-value when the probability is a power of two. Some documentation will be required to warn users that arbitrary composition of these processors having mixed power-of-two and non-power-of-two probabilities will not behave correctly. Users who want this should user powers-of-two everywhere.

The type of sampling being performed here is considered "after-the-fact", since we are interpreting and mutating the Span data of finished spans. Trace context uses an r-value to describe a randomness value for contexts in flight to use making consistent decisions, whereas this processor needs only to interpret and set p-value.

@oertl
Copy link

oertl commented Feb 18, 2022

I think it would be better not to mix the information of consistent and inconsistent sampling decisions. If the sampling decision is not based on the r-value, it should not touch the p-value. Maybe, it is better to add some extra field (e.g. t for trace) for the sampling probability (or corresponding adjusted count) that was applied to the entire trace independently of r.

@PeterF778
Copy link

Continuing with the idea of an extra field, why don't we reconsider already discussed explicit adjusted count c, but now taking precedence over p. The effective adjusted count would be

  • the value of c, if c is present
  • 2**p, if c is not present

Thus, a probabilistic sampling processor could generate the c-values from p-values and its own sampling probability. Furthermore, if we allow non-integer values for c, the processor does not have to be constrained to probabilities being powers of 2.

@oertl
Copy link

oertl commented Feb 22, 2022

When there are multiple sampling stages, such as when consistent probability samplers are combined with this probabilistic sampling processor, each making sampling decisions independently (based on different random/hash values), it is generally important to know which sampling probability (or corresponding adjusted count) was used at which stage in order to have the chance to get the statistics right. If this sampling processor uses the same randomness as consistent probability samplers (the r-value), it would be correct to use/adapt the p field to propagate the adjusted count.

@jpkrohling
Copy link
Member

Could you please expand on the use-cases for this? Is this so that users can analyze this data later and extrapolate based on the probability? And just to confirm: we are talking only about root spans here, right?

Would you also be open to providing concrete examples of input and output?

@PeterF778
Copy link

Probabilistic sampling processor would typically live in a collector. Its role would be to further reduce the volume of data (spans), if for whatever reason it decides that it is too high.
On input, it receives spans which have been already sampled by a tracer, more specifically, here we assume that consistent probability samplers were used (see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) and that the spans have their r-values and p-values. If sampling processor is not present (i.e. all spans are passed through), the p-values are sufficient to evaluate adjusted counts. However, if there's additional sampling processor(s) involved, the adjusted counts would be incorrect, if they were based on the original p-values.
This applies to all spans, not just root spans.

@codeboten codeboten added the needs triage New item requiring triage label Sep 16, 2022
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 16, 2022
@jmacd
Copy link
Contributor Author

jmacd commented Nov 17, 2022

Discussed in the 11/17/22 Sampling SIG.

This would be addressed by the proposal dicsussed in open-telemetry/opentelemetry-specification#1413, which would take advantage of 7 definite bytes of randomness per TraceID.

@fatsheep9146 fatsheep9146 added processor/probabilisticsampler Probabilistic Sampler processor and removed Stale needs triage New item requiring triage labels Nov 18, 2022
@github-actions
Copy link
Contributor

Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jmacd jmacd closed this as completed Nov 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/probabilisticsampler Probabilistic Sampler processor
Projects
None yet
Development

No branches or pull requests

6 participants