Sample ratio does not align with configured SampleRate during Stress Relief #1391
I've been analyzing this data and experimenting with different hash functions, and the problem turns out to be the often-surprising nature of sampling statistics. The short version is: the larger the sample rate you're trying to achieve, the more samples it takes to get reliably close to it. The other thing people forget to account for is that the number of spans per trace usually varies.

I ran a test with several different hash algorithms (wyhash, murmur3, and sha1). There was a slight but consistent difference between them, with wyhash (the algorithm we use) yielding the best results, but the difference was minor. The test generated random traceIDs, hashed them, and then decided to "keep" or "drop" them based on the value of the hash, the same way Refinery does, using a target sample rate of 100. It then calculated the actual achieved sample rate. It did this for different numbers of samples, and repeated each test 100 times, keeping track of the minimum, maximum, average, and standard deviation of the actual sampleRate achieved in each test. The table below shows the results.
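To make that first effect concrete, here is a minimal sketch of that kind of experiment in Go, assuming the `github.com/dgryski/go-wyhash` package for the hash and an arbitrary seed; the actual test harness in this comment may differ:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"

	wyhash "github.com/dgryski/go-wyhash"
)

const (
	targetRate = 100   // keep roughly 1 in 100 traces
	seed       = 0x123 // arbitrary hash seed (illustrative, not Refinery's)
)

// achievedRate hashes n random trace IDs, keeps those whose hash falls
// below MaxUint64/targetRate, and returns the actual sample rate
// (total/kept) those decisions imply.
func achievedRate(n int, rng *rand.Rand) float64 {
	upperBound := math.MaxUint64 / uint64(targetRate)
	kept := 0
	buf := make([]byte, 16)
	for i := 0; i < n; i++ {
		rng.Read(buf) // random 16-byte trace ID
		if wyhash.Hash(buf, seed) < upperBound {
			kept++
		}
	}
	if kept == 0 {
		// Nothing kept: the achieved rate is effectively unbounded.
		return math.Inf(1)
	}
	return float64(n) / float64(kept)
}

func main() {
	rng := rand.New(rand.NewSource(1))
	const trials = 100
	for _, n := range []int{100, 1000, 10000, 100000} {
		min, max, sum := math.Inf(1), 0.0, 0.0
		for t := 0; t < trials; t++ {
			r := achievedRate(n, rng)
			min, max = math.Min(min, r), math.Max(max, r)
			sum += r
		}
		// Note: at n=100 some trials keep zero traces, so min/avg/max
		// can be +Inf -- itself a sign of how unstable small batches are.
		fmt.Printf("n=%6d  min=%7.1f  avg=%7.1f  max=%7.1f\n", n, min, sum/trials, max)
	}
}
```

At a target rate of 100, a batch of 100 traces keeps only about one trace on average, so the achieved rate swings between "everything dropped" and a small multiple of the target; it only settles near 100 once n is in the thousands.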
If you think of the sampleCount column as the number of samples in one granularity bucket of a Honeycomb query, you need something like 10000 samples per bucket before the graph looks even close to "smooth" when your sampleRate is 100.

Now we need to take the second factor into account. This is subtle, but: in any collection of traces containing a distribution of span counts, different trace IDs will carry different weights when comparing the number of spans kept vs dropped. A given number of spans also corresponds to fewer distinct trace IDs, so if you randomly select a subset of trace IDs, you're going to get skewed results, particularly at small numbers of traces. The results below were from the same test as above, but now each trace represented from 1-20 spans in a bell curve around 11. Note how much less stable these results are (lower min, higher max, farther from the target):
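Extending the sketch above (same imports, `targetRate`, and `seed`), the span-weighting effect can be approximated by weighting each keep/drop decision with a per-trace span count. The bell curve here is a rough stand-in (sum of two uniform draws), not necessarily the distribution used in the original test:

```go
// achievedSpanRate is like achievedRate, but each trace carries a span
// count drawn from a roughly bell-shaped distribution on [1,21]
// centered near 11, and the rate is computed over spans, not traces.
func achievedSpanRate(n int, rng *rand.Rand) float64 {
	upperBound := math.MaxUint64 / uint64(targetRate)
	var totalSpans, keptSpans int
	buf := make([]byte, 16)
	for i := 0; i < n; i++ {
		spans := 1 + rng.Intn(11) + rng.Intn(11) // crude bell curve
		rng.Read(buf)
		totalSpans += spans
		if wyhash.Hash(buf, seed) < upperBound {
			keptSpans += spans
		}
	}
	if keptSpans == 0 {
		return math.Inf(1)
	}
	return float64(totalSpans) / float64(keptSpans)
}
```

Because one lucky or unlucky trace now moves the numerator by up to 21 spans instead of 1, the achieved rate is noisier than in the unweighted test at every n.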
One last thing -- the current implementation of stress relief is not binary -- it sends a fraction of traces through the deterministic sampler, so the effective sample rate will be a blend of the normal and stress rates. In short, I think we're seeing "expected behavior" -- it's just that we didn't actually expect it until we did the math. Because sampling is hard, yo.
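To spell out the blend: if a fraction f of traffic goes through the stress (deterministic) sampler at rate Rs while the remainder is sampled at the normal rate Rn, the expected keep probability is the weighted mix f/Rs + (1-f)/Rn, and the effective rate is its inverse. A small illustrative helper (the names are mine, not Refinery's):

```go
// effectiveRate returns the expected blended sample rate when a
// fraction f of traffic uses stressRate and the rest uses normalRate.
func effectiveRate(f, stressRate, normalRate float64) float64 {
	keepProb := f/stressRate + (1-f)/normalRate
	return 1 / keepProb
}
```

For example, with f = 0.5, a stress rate of 100, and a normal rate of 10, the keep probability is 0.5/100 + 0.5/10 = 0.055, so the effective rate is about 18.2 -- neither configured value.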
Description:
When Stress Relief mode is activated, a fraction of traffic is sampled through a deterministic sampler based on the hash of the trace ID, calculated using `wyhash`. When observing the `kept_from_stress` and `dropped_from_stress` metrics, the ratio between the two does not always align with the SampleRate configured for Stress Relief.
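For reference, a minimal sketch of the kind of wyhash-based deterministic keep/drop decision described above; the seed, the package choice (`github.com/dgryski/go-wyhash`), and the threshold comparison are illustrative assumptions, not Refinery's exact code:

```go
package main

import (
	"fmt"
	"math"

	wyhash "github.com/dgryski/go-wyhash"
)

// hashSeed is an arbitrary value for illustration; Refinery's actual
// seed and comparison details may differ.
const hashSeed = 0x1234

// shouldKeep deterministically keeps roughly 1-in-sampleRate trace IDs:
// the same trace ID always yields the same decision.
func shouldKeep(traceID string, sampleRate uint64) bool {
	upperBound := math.MaxUint64 / sampleRate
	return wyhash.Hash([]byte(traceID), hashSeed) < upperBound
}

func main() {
	fmt.Println(shouldKeep("4bf92f3577b34da6a3ce929d0e0e4736", 100))
}
```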
Potential Cause:
The test below showed that with a smaller iteration count `n`, the `wyhash` results can sometimes be less uniformly distributed. This may be why more traces are kept than the configured SampleRate would suggest.
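The original test snippet is not reproduced here; as a hypothetical stand-in (building on the sketch above, plus a `math/rand` import), one way to see the small-`n` nonuniformity is to split the 64-bit hash space into 100 equal slices and look at the spread of per-slice counts:

```go
// bucketSpread hashes n random trace IDs and counts how many fall into
// each of 100 equal slices of the 64-bit hash space, returning the
// smallest and largest bucket counts. At small n the spread is wide,
// and the keep decision for rate 100 depends on exactly one slice.
func bucketSpread(n int, rng *rand.Rand) (min, max int) {
	var counts [100]int
	bucketWidth := uint64(math.MaxUint64/100 + 1)
	buf := make([]byte, 16)
	for i := 0; i < n; i++ {
		rng.Read(buf)
		counts[wyhash.Hash(buf, hashSeed)/bucketWidth]++
	}
	min, max = counts[0], counts[0]
	for _, c := range counts[1:] {
		if c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	return min, max
}
```

With n = 100 the bucket counts routinely range from 0 to 4 or more even though the mean is 1, so a small batch can easily keep noticeably more (or fewer) than 1-in-100 traces without anything being wrong with the hash.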