-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
question: what is rate_by_service? #3031
Comments
Hi Sam! I'd be curious to hear what you are working on and why you aren't using one of our own tracers. We've not really spent time documenting our API endpoints given that they are still in the The map is used to determine the behavior of the priority sampler, which uses these rates to set the sampling priority. Clients are expected to keep these rates and use them after the agent sends them. Using the mapSome simple rules to using the rates map:
Using the rateThe formula used to sample by rate in all Datadog systems is consistent and is expected to be the same everywhere. The sampling formula looks like this: const knuthFactor = uint64(1111111111111111111)
// sampledByRate verifies if the trace with the given traceID should be sampled.
func sampledByRate(traceID uint64, rate float64) bool {
if rate < 1 {
return traceID*knuthFactor < uint64(rate*math.MaxUint64)
}
return true
} Applying the sampling decisionBased on the value returned by the rate sampler, sampling priority tags are applied:
These decisions also need to be propagated in HTTP requests as You may check the implementation in our Go tracer as an example. |
Thanks @gbbr "Clients are expected to keep these rates and use them after the agent sends them" is the bit I am most interested in. Could you please provide more actionable information for a client author? e.g. what happens if the client ignores it? We are commercial customers of datadog and we had a discussion with one of your "Customer Success Manager" where it was decided that we needed to build our own client for Haskell, because datadog does not provide one. |
Nothing will happen if you ignore it. But setting the sampling priority metric on traces (and propagating it cross-wire) is important. If you don't, you'll end up seeing only pieces of distributed traces because the agent (and backend) will sample them differently. The rates map provides a good guide as to how much interest should be put into a specific service and is backed by algorithms that aim to guarantee a good variety of traces will end up in the application. I don't see why you'd want to ignore it. If you chose to not use the rate sampling to apply priority sampling, you can chose your own logic and that's no problem (to my knowledge), but definitely don't set priority sampling of Does this help? |
thanks @gbbr ... I think there is still an assumption that the words "sample" and "rate" are known quantities and if this is reporting actuals or desired values. Let's get to some specifics:
As I understand it, from reading the source code of this repository, the agent is enforcing that the value is |
Sorry about that. That is many times the case when talking to someone who's been in a domain for too long, even though I try to avoid it.
It's the parameter of
I've created a confusion here. The only possible sampling rate can be The sampling rate: determines the rate at which a trace is sampled. Feel free to reach out to me on our public Slack, it might be easier. |
thanks again @gbbr . To be clear, I'm not talking about sampling priorities at all, I'm only interested in the With that in mind, let's return to the specifics:
|
I think I'm just going to take this away
and ignore the field. Thanks anyways. |
thanks @gbbr I really don't think I need to read the other examples, the To implement the v0.4 API, all that I need to know is a technical definition of what "rate" means and how to act upon it. Given that "no action" is fine, understanding what it actually means seems to be irrelevant. I should point out that I still don't know if "rate" means:
There is an implicit assumption that this is obvious. If the meaning is in this list, I'd very much appreciate it if you could point it out. From a client implementer point of view, I can assure you that it is not obvious. |
It is none of that. Rate is just a number that clients are meant to use with the formula I've described above to mark a trace with a sampling priority in order to consistently get full distributed traces. Clients are meant to decide whether a trace will be kept or not (in other words "sampled or not") by using that formula and the rate recommended by the agent. If you look at how the formula works, you'll soon understand that a rate of 1 means 100% and a rate of 0.5 means 50% chance of being sampled. I'm sorry but I'm not able to tell where the misunderstanding is and which part it is that you don't understand. |
My very first comment is a detailed step-by-step guide on how to use the endpoint. I'm lost as to which part is confusing. |
I'll jump on your slack as you suggested @gbbr ... perhaps this is better in realtime, although I was hoping to keep an audit of the conversation for our records. |
Conclusion from slack: "rate" is a number between Clients should act upon |
The
/v0.4/traces
endpoint returns a JSON object containing arate_by_services
map.Is there further documentation about what this is? It is unclear what "rate" means, if the client should act upon it, or what the unit is.
A line comment says "the recommended sampling rates for all services". Could you please provide more actionable information for a client author? e.g. what happens if the client ignores it?
(Some ambiguous interpretations I have are: if this drops below 1.0, the client should begin to sample/drop their traces before sending, or if the rate drops the client should adjust the frequency of submissions, or perhaps this is simply informational to let clients know how many traces are being rejected due to invalid content)
The text was updated successfully, but these errors were encountered: