
Refactor the probabilistic sampler processor; add FailClosed configuration, prepare for OTEP 235 support #31946

Merged: 129 commits, May 15, 2024

Conversation

@jmacd (Contributor) commented Mar 25, 2024

Description:

Refactors the probabilistic sampling processor to prepare it for more OTEP 235 support.

This clarifies existing inconsistencies between tracing and logging samplers, see the updated README. The tracing priority mechanism applies a 0% or 100% sampling override (e.g., "1" implies 100% sampling), whereas the logging sampling priority mechanism supports variable-probability override (e.g., "1" implies 1% sampling).
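The difference between the two override semantics can be sketched as follows. This is an illustrative snippet, not the processor's actual code; the function names are hypothetical:

```go
package main

import "fmt"

// tracesEffectivePercent: in the tracing priority mechanism, any nonzero
// priority forces 100% sampling and zero forces 0% (an all-or-nothing
// override). Illustrative only.
func tracesEffectivePercent(configured float64, priority int64) float64 {
	if priority > 0 {
		return 100
	}
	return 0
}

// logsEffectivePercent: in the logs priority mechanism, the priority value
// itself becomes the effective percentage, clamped to [0, 100] (a
// variable-probability override). Illustrative only.
func logsEffectivePercent(configured float64, priority float64) float64 {
	switch {
	case priority <= 0:
		return 0
	case priority >= 100:
		return 100
	default:
		return priority
	}
}

func main() {
	fmt.Println(tracesEffectivePercent(25, 1)) // priority "1" => 100% for traces
	fmt.Println(logsEffectivePercent(25, 1))   // priority "1" => 1% for logs
}
```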

This pins down cases where no randomness is available and organizes the code to improve readability. A new type called randomnessNamer carries the randomness information (from the sampling package) together with the name of the policy that derived it. When sampling priority causes the effective sampling probability to change, the value "sampling.priority" replaces the source of randomness, which is otherwise limited to "trace_id_hash" or, for logs, the name of the randomness-source attribute.

While working on #31894, I discovered that some inputs fall through to the hash function with zero bytes of input randomness. The hash function, computed on an empty input (for logs) or on 16 bytes of zeros (which OTel calls an invalid trace ID), produces a fixed "random" value. So, for example, when logs are sampled and there is neither a TraceID nor a randomness attribute value, the item will be sampled only when the configured percentage is approximately 82.9% or above.
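The underlying problem can be demonstrated with a sketch. This uses the stdlib FNV hash rather than the processor's exact hash function, but the principle is the same: a deterministic hash of a constant input is itself a constant, so the "Randomness &lt; AcceptThreshold" test degenerates into a fixed cutoff:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Illustration (not the processor's exact hash function): hashing a
// constant input always yields the same 14-bit "randomness" value.
func hashRandomness(input []byte) uint32 {
	h := fnv.New32a()
	h.Write(input)
	return h.Sum32() & 0x3FFF // keep 14 bits, as the original scheme does
}

func main() {
	empty := hashRandomness(nil)               // no TraceID, no randomness attribute
	zeroID := hashRandomness(make([]byte, 16)) // all-zero (invalid) trace ID
	// Each value is fixed, so every such item lands on the same side of
	// the accept threshold: sampling is all-or-nothing at one percentage.
	fmt.Println(empty, zeroID)
	fmt.Println(hashRandomness(nil) == empty) // always true: no real randomness
}
```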

In the refactored code, an error is returned when there is no input randomness. A new boolean configuration field determines the outcome when there is an error extracting randomness from an item of telemetry. By default, items of telemetry with errors will not pass through the sampler. When FailClosed is set to false, items of telemetry with errors will pass through the sampler.

The original hash function, which uses 14 bits of information, is structured as an "acceptance threshold": the sampling test reduces to a positive decision when Randomness < AcceptThreshold. In the OTEP 235 scheme, thresholds are rejection thresholds; this PR converts the original 14-bit accept threshold into a 56-bit reject threshold, using the Threshold and Randomness types from the sampling package. Reframed this way, the subsequent PR (i.e., #31894) can seamlessly convey the effective sampling probability using OTEP 235 semantic conventions.
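One way to see the reframing is with plain integers (the real code uses the sampling package's Threshold and Randomness types). Complement the 14-bit randomness and scale both sides into 56 bits, and "sampled iff r < A" becomes "sampled iff randomness >= threshold":

```go
package main

import "fmt"

const bits14 = 1 << 14 // 16384

// rejectThreshold56 converts a 14-bit accept threshold A into an
// equivalent 56-bit reject threshold: (2^14 - A) scaled by 2^42.
func rejectThreshold56(accept14 uint64) uint64 {
	return (bits14 - accept14) << 42
}

// randomness56 complements the 14-bit randomness and scales it to 56
// bits, so the comparison direction flips consistently.
func randomness56(r14 uint64) uint64 {
	return (bits14 - 1 - r14) << 42
}

func main() {
	const accept = 8192 // 50% in the 14-bit accept scheme
	t := rejectThreshold56(accept)
	for _, r := range []uint64{0, 8191, 8192, 16383} {
		oldDecision := r < accept            // 14-bit accept test
		newDecision := randomness56(r) >= t  // 56-bit reject test
		fmt.Println(r, oldDecision, newDecision) // decisions always agree
	}
}
```

For every 14-bit randomness value r, `randomness56(r) >= rejectThreshold56(A)` holds exactly when `r < A`, so the reframing preserves all decisions.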

Note, both traces and logs processors are now reduced to a function like this:

				return commonSamplingLogic(
					ctx,
					l,
					lsp.sampler,
					lsp.failClosed,
					lsp.sampler.randomnessFromLogRecord,
					lsp.priorityFunc,
					"logs sampler",
					lsp.logger,
				)

which is a generic function that handles the common logic on a per-item basis and ends in a single metric event. This structure makes it clear how traces and logs are currently processed differently, with different prioritization schemes. It also makes it easy to introduce new sampler modes, as shown in #31894. After this and #31940 merge, the changes in #31894 will be relatively simple to review as the third part in a series.

Link to tracking Issue:

Depends on #31940.
Part of #31918.

Testing: Added. Existing tests already cover the exact random behavior of the current hashing mechanism. Even more testing will be introduced with the last step of this series. Note that #32360 is added ahead of this test to ensure refactoring does not change results.

Documentation: Added.

@jpkrohling (Member) left a comment:

I like this a lot, and I'm eager to see it merged. I would like to request one change first, though: could you add a benchmark that exercises the top-level consumer functions? I suspect this change will bring performance improvements, but there are a few checks in the hot path that might need some attention.

@kentquirk (Member) left a comment:

This is so much better now; easier to read and understand. I had a couple of notes but no blockers.

One question that I'd like to be sure of because I've recently been bitten by this problem -- with FailClosed being a boolean that defaults to true, can the config system tell the difference between a config file that doesn't specify FailClosed, and one where FailClosed is set to false?
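One common Go answer to this question is to apply the default before decoding, so an absent key leaves the default intact while an explicit `false` overwrites it. A sketch using stdlib encoding/json (the collector uses confmap, but the pattern is the same; the type and field here are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config sketches a processor config with a bool defaulting to true.
type Config struct {
	FailClosed bool `json:"fail_closed"`
}

// load pre-populates the default, then decodes on top of it: an absent
// key keeps the default, an explicit false overrides it.
func load(raw string) Config {
	cfg := Config{FailClosed: true} // default applied first
	json.Unmarshal([]byte(raw), &cfg)
	return cfg
}

func main() {
	fmt.Println(load(`{}`).FailClosed)                     // true: key absent
	fmt.Println(load(`{"fail_closed": false}`).FailClosed) // false: explicit
}
```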

In logs pipelines, when the priority attribute has value 0, the
configured probability will be modified to 0%, and the item will not
pass the sampler. Otherwise, the logs sampling priority attribute is
interpreted as a percentage, with values >= 100 equal to 100%.

We could unify the solutions by varying behavior according to the type of the attribute. If numeric, it's the priority, and if a string, it's the name of the numeric attribute containing the priority.

Personally, I'd prefer to choose one for the long term, something like:

  • State that the current traces behavior for sampling.priority is deprecated and that the logs behavior is desired.
  • For some period of time (probably a long time) state that when traces are configured with a numeric value for sampling.priority, it is interpreted as the configured probability, but that a string value will be treated as the name of the attribute to be used for priority.

It would be just as valid to do it the other way, if we preferred less configuration.
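The type-based dispatch described above could look roughly like this. A plain-Go sketch with a hypothetical helper; the real processor would access attributes through the pdata pcommon.Value API:

```go
package main

import "fmt"

// resolvePriority sketches the "vary behavior by attribute type" idea:
// a numeric sampling.priority is the priority itself; a string names
// another attribute that holds the numeric priority. Illustrative only.
func resolvePriority(attr any, lookup func(name string) (float64, bool)) (float64, bool) {
	switch v := attr.(type) {
	case float64:
		return v, true // numeric: use directly
	case int64:
		return float64(v), true
	case string:
		return lookup(v) // string: indirect through the named attribute
	default:
		return 0, false
	}
}

func main() {
	attrs := map[string]float64{"my.priority": 25}
	lookup := func(name string) (float64, bool) { p, ok := attrs[name]; return p, ok }
	fmt.Println(resolvePriority(int64(1), lookup))      // numeric priority
	fmt.Println(resolvePriority("my.priority", lookup)) // named attribute
}
```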

processor/probabilisticsamplerprocessor/logsprocessor.go
Co-authored-by: Kent Quirk <kentquirk@gmail.com>
In logs pipelines, when the priority attribute has value 0, the
configured probability will be modified to 0%, and the item will not
pass the sampler. Otherwise, the logs sampling priority attribute is
interpreted as a percentage, with values >= 100 equal to 100%.

I think this might be the perfect opportunity for this change, but not necessarily part of this PR.

## Hashing
- `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
- `hash_seed` (32-bit unsigned integer, optional, default = 0): A seed used in computing the hash. Note that all collectors in a given tier (e.g., behind the same load balancer) should use the same hash_seed.
- `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.
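Putting the fields above together, a configuration might look like this (values are illustrative):

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15
    hash_seed: 22
    fail_closed: false
```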
Sounds good!

@jpkrohling jpkrohling merged commit 4fa4603 into open-telemetry:main May 15, 2024
162 checks passed
@github-actions github-actions bot added this to the next release milestone May 15, 2024
jpkrohling added a commit that referenced this pull request Jun 13, 2024
…rt OTEP 235) (#31894)

**Description:** Creates new sampler modes named "equalizing" and
"proportional". Preserves existing functionality under the mode named
"hash_seed".

Fixes #31918

This is the final step in a sequence, the whole of this work was
factored into 3+ PRs, including the new `pkg/sampling` and the previous
step, #31946. The two new Sampler modes enable mixing OTel sampling SDKs
with Collectors in a consistent way.

The existing hash_seed mode is also a consistent sampling mode, which
makes it possible to have a 1:1 mapping between its decisions and the
OTEP 235 randomness and threshold values. Specifically, the 14-bit hash
value and sampling probability are mapped into 56-bit R-value and
T-value encodings, so that all sampling decisions in all modes include
threshold information.

This implements the semantic conventions of
open-telemetry/semantic-conventions#793, namely
the `sampling.randomness` and `sampling.threshold` attributes used for
logs where there is no tracestate.

The default sampling mode remains HashSeed. We consider a future change
of default to Proportional to be desirable, because:

1. Sampling probability is the same, only the hashing algorithm changes
2. Proportional respects and preserves information about earlier
sampling decisions, which HashSeed can't do, so it has greater
interoperability with OTel SDKs which may also adopt OTEP 235 samplers.

**Link to tracking Issue:** 

Draft for
open-telemetry/opentelemetry-specification#3602.
Previously
#24811,
see also open-telemetry/oteps#235
Part of #29738

**Testing:** New testing has been added.

**Documentation:** ✅

---------

Co-authored-by: Juraci Paixão Kröhling <juraci.github@kroehling.de>
Labels: cmd/configschema, cmd/otelcontribcol, connector/datadog, exporter/datadog, processor/probabilisticsampler