Skip to content
This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

proposal: Better default sampler #156

Open
semistrict opened this issue Aug 9, 2018 · 8 comments
Open

proposal: Better default sampler #156

semistrict opened this issue Aug 9, 2018 · 8 comments

Comments

@semistrict
Copy link
Contributor

I think the default sampler of 1/10k is causing a number of problems.

First, users who want to initially start doing tracing want to be able to see traces without rolling it out to production. In this case, they usually set the sampler to AlwaysSample.

Then, sometimes they commit this and forget to change it leading to unintentionally starting traces for all requests. In a high-traffic system, this can be very costly.

My initial proposal for discussion would be that we change the default sampler so that it is suitable for both development and production. For example, we could rate limit sampling up to 1 per second and thereafter, sample 1 in 1000 (say) up to 5 per second.

This should allow people to see traces during development and testing (all requests will be traces if the service sees less than 1qps) and should still be fine in production since we limit ourselves to 5 per second.

@codefromthecrypt
Copy link

codefromthecrypt commented Aug 10, 2018 via email

@semistrict
Copy link
Contributor Author

/cc @bogdandrutu

@bogdandrutu
Copy link
Contributor

Changing the default is a breaking change which we may not be able to do that easily. I understand the problem and I think the current best solution is to add a qps based sampler and suggest that in our examples instead of always sample.

What do you think? @adriancole @Ramonza

@semistrict
Copy link
Contributor Author

Well we have a well-defined mechanism for making breaking API changes in Go. It seems like we should also have a way to make breaking non-API changes like this.

I definitely agree that it needs to be done slowly over time and can't just be changed outright and that the examples are the first thing we should update.

@shahprit
Copy link

I like the idea but also think it should be done carefully (no surprises to existing users).

Is there an explicit flag/mode that can enable this behavior for people who're trying to demo or debug? that way we don't surprise the existing users

@SergeyKanzhelev
Copy link
Member

We are using adaptive sampling as a default in Application Insights. It adjusts sampling percentage so the rate of produced telemetry will be limited to 5 (default) per second. Since pricing is per volume of telemetry - this approach allows to estimate the cost of monitoring and keep it (virtually) constant on any high load including spikes. It also works great in dev environment.

More dev notes here.

Will this one be a better alternative?

@semistrict
Copy link
Contributor Author

Is this the same as RateLimit sampling described here: https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/Sampling.md

I agree that something like this is a better default.

@SergeyKanzhelev
Copy link
Member

@Ramonza idea of adaptive sampling is easier. Adaptive sampling is basically a wrapper on probability sampled that adjusts probability sampler config based on previous time interval. All decisions of adaptive sampler are delayed, but it make the spans easier to analyze. Each time frame sampling probability was constant and you can gain up numbers to estimate the real number of spans were seen in that interval.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants