Inject OTel tracestate when power-of-two probability sampling #7962

jmacd · 2022-02-17T20:16:43Z

Proposal

I propose to modify the probabilistic sampling processor to emit an OTel tracestate corresponding with the sampling probability in use, which is well defined in the current specification when the sampling probability is a power of two.

Specifically, if the sampling probability is a power of two such that Log2(probability) == -X where X is an integer, then the corresponding Span's tracestate should be extended with an entry ot=p:X. Here, "p" is the base-2 logarithm of "adjusted count".

When the Span already has a p-value, probabilities multiply. If the span does not have a p-value, which formally means "unknown" adjusted count, we will assume the count is 1 span corresponding with probability=1 (i.e., ot=p:0).

If the span already has an ot=p:Y property, the correct output is ot=p:Z for Z=X+Y (multiplying probabilities == addition inside Log2).

For example, if performing 50% sampling then we are multiplying adjusted counts by 2. Spans with no adjusted count on arrival will depart with p-value 1, and Spans with an adjusted count on arrival will depart with p-value greater by 1.

ot= properties other than p, such as r, SHOULD pass through unmodified.
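
The update rule described above can be sketched in Python. This is a minimal illustration, not the processor's actual code: the function names (`parse_ot`, `format_ot`, `update_ot`) are hypothetical, and it assumes the `ot` value is a list of `key:value` pairs separated by `;` (for example `p:2;r:5`). It also ignores any upper bound or reserved p-values the specification may define.

```python
import math


def parse_ot(value):
    """Parse an assumed 'ot' value such as 'p:2;r:5' into an ordered dict."""
    fields = {}
    for item in value.split(";"):
        if item:
            key, _, val = item.partition(":")
            fields[key] = val
    return fields


def format_ot(fields):
    """Serialize the fields back into the assumed 'key:value;key:value' form."""
    return ";".join(f"{key}:{val}" for key, val in fields.items())


def update_ot(value, probability):
    """Return the 'ot' value after sampling at `probability`.

    The value is returned unchanged when the probability is not a power of two.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # frexp returns (m, e) with probability == m * 2**e and 0.5 <= m < 1,
    # so an exact power of two always has m == 0.5.
    mantissa, exponent = math.frexp(probability)
    if mantissa != 0.5:
        return value
    x = 1 - exponent  # Log2(probability) == -x
    fields = parse_ot(value)
    y = int(fields.get("p", "0"))  # a missing p-value is treated as p:0
    fields["p"] = str(x + y)  # multiplying probabilities adds the exponents
    return format_ot(fields)
```

Under these assumptions, `update_ot("p:2;r:5", 0.25)` yields `p:4;r:5`, and `update_ot("r:5", 0.5)` yields `r:5;p:1`, leaving the r entry untouched in both cases.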

Additional context

The tracestate fields used here are defined in
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md

This specification defines support only for power-of-two adjusted counts. Support for non-power-of-two adjusted counts would require more specification work, but is certainly possible. The proposal is to leave the processor logic unmodified, and only to extend the p-value when the probability is a power of two. Some documentation will be required to warn users that arbitrary composition of these processors with mixed power-of-two and non-power-of-two probabilities will not behave correctly. Users who want correct composition should use powers of two everywhere.

The type of sampling being performed here is considered "after-the-fact", since we are interpreting and mutating the Span data of finished spans. Trace context uses an r-value to describe a randomness value that in-flight contexts use to make consistent decisions, whereas this processor needs only to interpret and set the p-value.

oertl · 2022-02-18T06:52:43Z

I think it would be better not to mix the information of consistent and inconsistent sampling decisions. If the sampling decision is not based on the r-value, it should not touch the p-value. Maybe it is better to add an extra field (e.g. t for trace) for the sampling probability (or corresponding adjusted count) that was applied to the entire trace independently of r.

PeterF778 · 2022-02-22T22:21:29Z

Continuing with the idea of an extra field, why don't we reconsider the previously discussed explicit adjusted count c, but now taking precedence over p? The effective adjusted count would be:

the value of c, if c is present
2**p, if c is not present

Thus, a probabilistic sampling processor could generate the c-values from p-values and its own sampling probability. Furthermore, if we allow non-integer values for c, the processor does not have to be constrained to probabilities being powers of 2.
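
A rough sketch of this precedence rule follows. The function names and the way p and c are passed around are hypothetical and not part of any specification. Treating a span with neither field as adjusted count 1 is an assumption borrowed from the original proposal.

```python
def effective_adjusted_count(p=None, c=None):
    """Return c if present, else 2**p, else None for an unknown count."""
    if c is not None:
        return float(c)
    if p is not None:
        return float(2 ** p)
    return None


def c_after_processor(probability, p=None, c=None):
    """Compute the c-value a processor sampling at `probability` would emit."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    incoming = effective_adjusted_count(p=p, c=c)
    if incoming is None:
        incoming = 1.0  # assumption: unknown adjusted count is treated as 1
    # Each surviving span now stands in for 1/probability incoming spans.
    return incoming / probability
```

With this sketch, a span arriving with p-value 2 and sampled at 10% would leave with a c-value of roughly 40, which cannot be expressed as a p-value.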

oertl · 2022-02-22T23:21:52Z

When there are multiple sampling stages, such as when consistent probability samplers are combined with this probabilistic sampling processor, each making sampling decisions independently (based on different random/hash values), it is generally important to know which sampling probability (or corresponding adjusted count) was used at which stage in order to have the chance to get the statistics right. If this sampling processor uses the same randomness as consistent probability samplers (the r-value), it would be correct to use/adapt the p field to propagate the adjusted count.

jpkrohling · 2022-03-02T11:43:18Z

Could you please expand on the use-cases for this? Is this so that users can analyze this data later and extrapolate based on the probability? And just to confirm: we are talking only about root spans here, right?

Would you also be open to providing concrete examples of input and output?

PeterF778 · 2022-03-02T23:48:09Z

A probabilistic sampling processor would typically live in a collector. Its role would be to further reduce the volume of data (spans) if, for whatever reason, it decides that the volume is too high.
On input, it receives spans that have already been sampled by a tracer. More specifically, here we assume that consistent probability samplers were used (see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md) and that the spans have their r-values and p-values. If no sampling processor is present (i.e., all spans are passed through), the p-values are sufficient to evaluate adjusted counts. However, if one or more additional sampling processors are involved, adjusted counts based on the original p-values would be incorrect.
This applies to all spans, not just root spans.
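
To make the problem concrete, here is a small arithmetic sketch using expected counts. The numbers are illustrative only and do not come from the specification or from this thread.

```python
original_spans = 1000
tracer_probability = 0.25     # tracer samples at 2**-2, so spans carry p-value 2
processor_probability = 0.5   # collector processor samples at 2**-1

# Expected number of spans surviving each stage.
after_tracer = original_spans * tracer_probability
after_processor = after_tracer * processor_probability

# Estimate of the original population if the p-value is left at 2.
estimate_with_original_p = after_processor * 2 ** 2

# Estimate if the processor extends the p-value to 2 + 1 = 3.
estimate_with_updated_p = after_processor * 2 ** 3

print(estimate_with_original_p, estimate_with_updated_p)
```

With the p-value left untouched, the estimate is half of the true span count. Extending the p-value by the processor's exponent recovers the original 1000.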

github-actions · 2022-11-16T03:43:31Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

jmacd · 2022-11-17T20:18:04Z

Discussed in the 11/17/22 Sampling SIG.

This would be addressed by the proposal discussed in open-telemetry/opentelemetry-specification#1413, which would take advantage of 7 definite bytes of randomness per TraceID.

github-actions · 2022-11-18T03:00:26Z

Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

codeboten added the "needs triage" label Sep 16, 2022

github-actions bot added the "Stale" label Nov 16, 2022

fatsheep9146 added the "processor/probabilisticsampler" label and removed the "Stale" and "needs triage" labels Nov 18, 2022

jmacd closed this as completed Nov 18, 2022
