mean_90 and upper_90 unclear #157
Comments
It is the mean and upper for the bottom 90th percentile, that is, for the lowest 90% of the values. You can change the actual percentile value in the config; the code that handles it is at https://github.com/etsy/statsd/blob/master/backends/graphite.js#L101 |
Closing this, if something is still unclear about this, please reopen. |
Sorry to reopen this old thread, but I'm a bit confused here (even with your explanation). Perhaps I can ask this concretely... In this dataset of 20 numbers: 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 What would the mean_90 and upper_90 be? Based on your description above, I could see either: (1) mean_90 = 92.5 (mean of 90 and 95), upper_90 = 95 (upper of those) Given that (1) would imply that upper_x = upper for all x (and given another reading of your description after going through this example), (2) seems like it's right. But I wanted to confirm. |
+1, if anyone ever figures it out, please post it here or somewhere. |
Never mind, I think I got it; @dyross your (2) is right. I believe it's meant to filter out spiky outliers. |
Yes I get it now too. Thanks for confirming 👍 |
Yeah, it's doing the second thing. The lower n-th percentile is taken to filter out outliers at the high end. |
It seems like mean_90 should be avg(5-90) in the above example, no? Analogously, a hypothetical (but not very useful in general) lower_90 would be 10-95. |
The 90th percentile function removes the 10% highest measurements. It is meant to ignore short spikes. So for example, if the sorted dataset is 1,2,3,...,10,101,102,...,110, there are 20 values, so it will remove the 2 highest values (2 is 10% of 20), in this case 109 and 110. So lower_90 is 1, upper_90 is 108, sum_90 is sum(1,2,...,10,101,102,...,108) = 891, and mean_90 is 891/(20-2) = 49.5. |
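For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the procedure as described in this thread. It is not the statsd code itself (that is JavaScript, in the backend linked earlier). The function name `timer_percentile_stats` and the returned keys are hypothetical, and the round-to-nearest rule for how many values are kept is an assumption.

```python
def timer_percentile_stats(values, pct=90):
    """Return lower/upper/sum/mean over the lowest `pct` percent of values.

    Hypothetical helper for illustration; not part of statsd.
    """
    ordered = sorted(values)
    # Assumption: the number of values kept is rounded to the nearest integer.
    keep = int(len(ordered) * pct / 100 + 0.5)
    if keep == 0:
        return None
    # Keep the lowest values; the highest (100 - pct)% are discarded.
    kept = ordered[:keep]
    total = sum(kept)
    return {
        "lower": kept[0],
        "upper": kept[-1],
        "sum": total,
        "mean": total / keep,
    }


# The 20-value dataset from the comment above: 1..10 and 101..110.
spiky = list(range(1, 11)) + list(range(101, 111))
print(timer_percentile_stats(spiky, 90))

# The 20-value dataset asked about earlier in the thread: 0, 5, ..., 95.
steps = list(range(0, 100, 5))
print(timer_percentile_stats(steps, 90))
```

Under those assumptions the first call keeps 18 values and gives upper 108, sum 891, mean 49.5, matching the numbers above. The second call drops 90 and 95, giving upper 85 and mean 42.5. |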
This makes sense. I had assumed outliers would be discarded at both ends, but a percentile is explicitly about values below a threshold. |
Consider the sorted data below: upper_90 is simply the maximum value within the 90th percentile. The 90th percentile is 889, and the maximum value within the 90th percentile according to the data is 844. So upper_90 is 844. http://blog.pkhamre.com/2012/07/24/understanding-statsd-and-graphite/ |
@Guanpeng520 |
It's unclear from the documentation what the two metrics mean_90 and upper_90 mean on timers.
Is this the mean of the upper 90th percentile? What is upper_90, and how does it differ from plain old upper? Where is the mean metric, then?
Looking through the code does not clarify this either.