Inject OTel tracestate when power-of-two probability sampling #7962
Comments
I think it would be better not to mix the information of consistent and inconsistent sampling decisions. If the sampling decision is not based on the
Continuing with the idea of an extra field, why don't we reconsider the explicit adjusted count c that was already discussed, but now taking precedence over p? The effective adjusted count would be
Thus, a probabilistic sampling processor could generate the c-values from p-values and its own sampling probability. Furthermore, if we allow non-integer values for c, the processor does not have to be constrained to probabilities that are powers of 2.
When there are multiple sampling stages, such as when consistent probability samplers are combined with this probabilistic sampling processor, each making sampling decisions independently (based on different random/hash values), it is generally important to know which sampling probability (or corresponding adjusted count) was used at which stage in order to have a chance of getting the statistics right. If this sampling processor uses the same randomness as consistent probability samplers (the
Could you please expand on the use-cases for this? Is this so that users can analyze this data later and extrapolate based on the probability? And just to confirm: we are talking only about root spans here, right? Would you also be open to providing concrete examples of input and output?
The probabilistic sampling processor would typically live in a collector. Its role would be to further reduce the volume of data (spans) if, for whatever reason, it decides that the volume is too high.
Discussed in the 11/17/22 Sampling SIG. This would be addressed by the proposal discussed in open-telemetry/opentelemetry-specification#1413, which would take advantage of 7 definite bytes of randomness per TraceID.
Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Proposal
I propose to modify the probabilistic sampling processor to emit an OTel tracestate corresponding with the sampling probability in use, which is well defined in the current specification when the sampling probability is a power of two.
Specifically, if the sampling probability is a power of two such that `Log2(probability) == -X`, where X is an integer, then the corresponding Span's tracestate should be extended with an entry `ot=p:X`. Here, "p" is the base-2 logarithm of "adjusted count".
When the Span already has a p-value, probabilities multiply. If the span does not have a p-value, which formally means "unknown" adjusted count, we will assume the count is 1 span corresponding with probability=1 (i.e., `ot=p:0`).
If the span already has an `ot=p:Y` property, the correct output is `ot=p:Z` for `Z=X+Y` (multiplying probabilities == addition inside Log2).
For example, if performing 50% sampling then we are multiplying adjusted counts by 2. Spans with no adjusted count on arrival will depart with p-value 1, and Spans with an adjusted count on arrival will depart with p-value greater by 1.
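The rule described above could be sketched roughly as follows. This is only an illustration in Python, not the processor's actual code: the function name `add_p_value` is hypothetical, the tracestate is handled as a plain string, and moving the updated `ot` entry to the front of the list is an assumption about how a modified entry should be ordered. The sketch also ignores any limits the specification places on valid p-values.

```python
import math


def add_p_value(tracestate: str, probability: float) -> str:
    """Hypothetical helper: raise the ot p-value by X = -log2(probability).

    The tracestate is returned unchanged when the probability is not a
    power of two in (0, 1], mirroring the proposal.
    """
    # frexp returns (m, e) with probability == m * 2**e and 0.5 <= m < 1,
    # so exact powers of two are exactly the values with m == 0.5.
    mantissa, exponent = math.frexp(probability)
    if not 0.0 < probability <= 1.0 or mantissa != 0.5:
        return tracestate
    x = 1 - exponent  # probability == 2**-x

    # Split the comma-separated tracestate into the "ot" entry and the rest.
    ot_fields = []
    others = []
    for member in (m.strip() for m in tracestate.split(",")):
        if not member:
            continue
        key, _, value = member.partition("=")
        if key == "ot":
            ot_fields = [f for f in value.split(";") if f]
        else:
            others.append(member)

    # A missing p-value is treated as p:0 (adjusted count 1).
    y = 0
    kept = []
    for field in ot_fields:
        name, _, val = field.partition(":")
        if name == "p":
            y = int(val)
        else:
            kept.append(field)  # e.g. r-value passes through unmodified

    new_ot = "ot=" + ";".join([f"p:{x + y}"] + kept)
    return ",".join([new_ot] + others)


print(add_p_value("", 0.5))             # ot=p:1
print(add_p_value("ot=p:2;r:5", 0.25))  # ot=p:4;r:5
```

With 50% sampling, a span arriving without a p-value departs with `ot=p:1`, and a span arriving with `ot=p:2` departs with `ot=p:3`, matching the example in the text.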
Other `ot=` properties than `p`, such as `r`, SHOULD pass through unmodified.
Additional context
The tracestate fields used here are defined in
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md
This specification defines support only for power-of-two adjusted counts. Support for non-power-of-two adjusted counts would require more specification work, but is certainly possible. The proposal is to leave the processor logic unmodified, and only to extend the p-value when the probability is a power of two. Some documentation will be required to warn users that arbitrary composition of these processors having mixed power-of-two and non-power-of-two probabilities will not behave correctly. Users who want this should use powers of two everywhere.
The type of sampling being performed here is considered "after-the-fact", since we are interpreting and mutating the Span data of finished spans. Trace context uses an r-value to describe a randomness value for contexts in flight to use in making consistent decisions, whereas this processor needs only to interpret and set the p-value.
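The warning about mixing power-of-two and non-power-of-two probabilities can be illustrated with a little arithmetic. The sketch below is a hypothetical model, not processor code: `estimated_total` is an invented name, it works with expected span counts rather than random decisions, and it assumes that a stage with a non-power-of-two probability drops spans but leaves the p-value untouched, as proposed.

```python
import math


def estimated_total(original_spans: int, probabilities: list[float]) -> float:
    """Hypothetical model: expected total a backend would estimate from
    the surviving spans and their p-values after a chain of stages."""
    surviving = float(original_spans)
    p_value = 0
    for prob in probabilities:
        surviving *= prob  # expected number of spans kept by this stage
        mantissa, exponent = math.frexp(prob)
        if mantissa == 0.5:  # power of two: p-value is extended
            p_value += 1 - exponent
        # otherwise the p-value is left as-is and the count drifts
    # Each surviving span represents 2**p_value original spans.
    return surviving * 2 ** p_value


print(estimated_total(1000, [0.5, 0.25]))  # powers of two only
print(estimated_total(1000, [0.5, 0.3]))   # mixed probabilities
```

Under these assumptions, the chain of 50% and 25% stages recovers the original 1000 spans, while the chain of 50% and 30% stages yields an estimate of about 300, because the 30% stage removes spans without recording it in the p-value.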