proposal: Better default sampler #156

semistrict · 2018-08-09T20:36:36Z

I think the default sampler of 1/10k is causing a number of problems.

First, users who are just getting started with tracing want to see traces without rolling anything out to production. In this case, they usually set the sampler to AlwaysSample.

Then, sometimes they commit this and forget to change it, which unintentionally starts traces for all requests. In a high-traffic system, this can be very costly.

My initial proposal for discussion would be that we change the default sampler so that it is suitable for both development and production. For example, we could always sample up to 1 trace per second and, beyond that, sample 1 in 1000 (say), capped at 5 traces per second.

This should allow people to see traces during development and testing (all requests will be traced if the service sees less than 1 QPS) and should still be fine in production, since we limit ourselves to 5 traces per second.

codefromthecrypt · 2018-08-10T08:04:31Z

agree

semistrict · 2018-08-10T17:45:04Z

/cc @bogdandrutu

bogdandrutu · 2018-08-10T18:54:51Z

Changing the default is a breaking change, which we may not be able to do that easily. I understand the problem, and I think the best solution for now is to add a QPS-based sampler and suggest it in our examples instead of AlwaysSample.

What do you think? @adriancole @Ramonza

semistrict · 2018-08-10T20:14:01Z

Well, we have a well-defined mechanism for making breaking API changes in Go. It seems like we should also have a way to make breaking non-API changes like this.

I definitely agree that it needs to be done slowly over time and can't just be changed outright, and that the examples are the first thing we should update.

shahprit · 2018-08-10T21:30:43Z

I like the idea but also think it should be done carefully (no surprises to existing users).

Is there an explicit flag/mode that can enable this behavior for people who are trying to demo or debug? That way we don't surprise existing users.

SergeyKanzhelev · 2018-09-25T05:52:56Z

We use adaptive sampling as the default in Application Insights. It adjusts the sampling percentage so that the rate of produced telemetry is limited to 5 items per second (the default). Since pricing is per volume of telemetry, this approach makes it possible to estimate the cost of monitoring and keep it (virtually) constant under any load, including spikes. It also works great in a dev environment.

More dev notes here.

Will this one be a better alternative?

semistrict · 2018-09-25T21:17:47Z

Is this the same as the RateLimit sampling described here? https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/Sampling.md

I agree that something like this is a better default.

SergeyKanzhelev · 2018-09-25T22:20:54Z

@Ramonza, the idea of adaptive sampling is simpler. Adaptive sampling is basically a wrapper around a probability sampler that adjusts the probability sampler's configuration based on the previous time interval. All of the adaptive sampler's decisions are delayed, but this makes the spans easier to analyze: within each time frame the sampling probability is constant, so you can scale the counts up to estimate the real number of spans seen in that interval.
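A rough sketch of that idea in Go, for illustration only. `AdaptiveSampler`, `ShouldSample`, and `EndInterval` are hypothetical names, and this is neither the Application Insights implementation nor an OpenCensus API. It assumes the caller closes each interval (for example from a ticker), and it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"math/rand"
)

// AdaptiveSampler is a hypothetical wrapper around a plain probability
// sampler. The probability is constant within an interval and is only
// re-tuned when the interval is closed.
type AdaptiveSampler struct {
	probability float64 // sampling probability for the current interval
	target      float64 // desired number of sampled spans per interval
	seen        int     // spans offered in the current interval
	sampled     int     // spans sampled in the current interval

	rnd func() float64 // injectable random source in [0, 1)
}

// NewAdaptiveSampler starts by sampling everything, since nothing is
// known yet about the traffic volume.
func NewAdaptiveSampler(target float64) *AdaptiveSampler {
	return &AdaptiveSampler{probability: 1, target: target, rnd: rand.Float64}
}

// ShouldSample makes an ordinary probabilistic decision using the
// probability fixed for the current interval.
func (a *AdaptiveSampler) ShouldSample() bool {
	a.seen++
	if a.rnd() < a.probability {
		a.sampled++
		return true
	}
	return false
}

// EndInterval closes the current interval. It returns an estimate of how
// many spans were really seen (sampled count scaled up by the constant
// probability) and picks the probability for the next interval from the
// volume just observed, so adjustments always lag by one interval.
func (a *AdaptiveSampler) EndInterval() (estimate float64) {
	estimate = float64(a.sampled) / a.probability

	a.probability = 1
	if float64(a.seen) > a.target {
		a.probability = a.target / float64(a.seen)
	}
	a.seen, a.sampled = 0, 0
	return estimate
}

func main() {
	a := NewAdaptiveSampler(5) // aim for about 5 sampled spans per interval

	for interval := 1; interval <= 3; interval++ {
		kept := 0
		for i := 0; i < 1000; i++ {
			if a.ShouldSample() {
				kept++
			}
		}
		est := a.EndInterval()
		fmt.Printf("interval %d: kept %d, estimated total %.0f\n", interval, kept, est)
	}
}
```

The first interval is sampled in full because no traffic has been observed yet, which is the delay mentioned above. Later intervals converge toward the target, and the scaled-up estimate stays meaningful because the probability never changes mid-interval.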

semistrict added trace proposal labels Aug 9, 2018

semistrict mentioned this issue Aug 14, 2018

Don't trace health endpoints #151

Open
