mean_90 and upper_90 unclear #157

joshdevins · 2012-09-27T12:18:42Z

It's unclear from the documentation what the two timer metrics mean_90 and upper_90 mean.

The percentile threshold can be a single value, or a list of values, and will generate the following list of stats for each threshold:

stats.timers.$KEY.mean_$PCT
stats.timers.$KEY.upper_$PCT

Is this the mean of the upper 90th percentile? What is upper_90 and how does it differ from plain old upper? Where is the mean metric then?

Looking through the code also does not elucidate.
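To make the naming concrete, here is a small sketch that lists the per-threshold stat names for one timer key. It assumes the thresholds come from a single value or a list (statsd's `percentThreshold` setting, as far as I know), and that a decimal threshold such as 99.9 is written with an underscore (`99_9`) in the metric name. Both points are assumptions about statsd rather than something stated in this thread, and `threshold_stat_names` is a hypothetical helper.

```python
def threshold_stat_names(key, thresholds):
    """Return the per-threshold timer stat names for one timer key."""
    if not isinstance(thresholds, (list, tuple)):
        # A single threshold value is treated like a one-element list.
        thresholds = [thresholds]
    names = []
    for pct in thresholds:
        # Assumption: "." in a decimal threshold becomes "_" so it does not
        # add an extra level to the dotted Graphite path.
        clean_pct = str(pct).replace(".", "_")
        names.append(f"stats.timers.{key}.mean_{clean_pct}")
        names.append(f"stats.timers.{key}.upper_{clean_pct}")
    return names


print(threshold_stat_names("api.login", [90, 99.9]))
```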


mrtazz · 2012-09-28T19:44:47Z

It is the mean and upper for the bottom 90th percentile. You can change the actual percentile value in the config and the code to handle it is at https://github.com/etsy/statsd/blob/master/backends/graphite.js#L101
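A minimal Python sketch of what is described above: sort the timer values, keep the lowest pct percent of them, and report the maximum, sum, and mean of the kept values. The rounding of the kept count (round half up, like JavaScript's `Math.round`) and the `percentile_stats` name are assumptions for illustration; the linked graphite.js is the authority on the exact behavior.

```python
import math


def percentile_stats(values, pct):
    """Stats over the lowest `pct` percent of `values` (the highest values are dropped)."""
    ordered = sorted(values)
    count = len(ordered)
    # Assumption: the number of values kept is pct% of the count, rounded half up.
    keep = math.floor(pct / 100 * count + 0.5)
    if keep == 0:
        return None  # too few values to report anything at this threshold
    kept = ordered[:keep]
    total = sum(kept)
    return {
        f"upper_{pct}": kept[-1],     # largest value that survived the cut
        f"sum_{pct}": total,
        f"mean_{pct}": total / keep,  # mean of the kept values only
    }


print(percentile_stats([3, 1, 2, 100], 75))
```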

mrtazz · 2012-10-20T03:35:11Z

Closing this. If something is still unclear, please reopen.

dyross · 2012-12-12T04:41:49Z

Sorry to reopen this old thread, but I'm a bit confused here (even with your explanation). Perhaps I can ask this concretely...

In this dataset of 20 numbers:

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95

What would the mean_90 and upper_90 be? Based on your description above, I could see either:

(1) mean_90 = 92.5 (mean of 90 and 95), upper_90 = 95 (upper of those)
(2) mean_90 = 42.5 (mean of 0 - 85), upper_90 = 85

Given that (1) would imply that upper_x = upper for all x (and given another reading of your description after going through this example), (2) seems like it's right. But I wanted to confirm.
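The two readings can be checked directly on this dataset. This sketch computes both; reading (2), which the later replies confirm is what statsd does, keeps the lowest 18 of the 20 values. Using integer arithmetic for the kept count is a simplification that happens to be exact for 20 values.

```python
data = list(range(0, 100, 5))     # 0, 5, ..., 95 (20 values)
n_kept = len(data) * 90 // 100    # 18 values make up the bottom 90%

top = data[n_kept:]               # reading (1): only the highest 10%
bottom = data[:n_kept]            # reading (2): everything except the highest 10%

reading_1 = (sum(top) / len(top), max(top))           # (mean_90, upper_90)
reading_2 = (sum(bottom) / len(bottom), max(bottom))  # (mean_90, upper_90)
print("reading (1):", reading_1)
print("reading (2):", reading_2)
```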

seantalts · 2012-12-18T18:16:15Z

+1, if anyone ever figures it out, please post it here or somewhere.

seantalts · 2012-12-18T18:38:39Z

Never mind, I think I got it: @dyross, your (2) is right. I believe it's meant to filter out spiky outliers.

dyross · 2012-12-18T22:01:08Z

Yes I get it now too. Thanks for confirming 👍

mrtazz · 2012-12-18T22:03:55Z

Yeah, it's doing the second thing. The lower n-th percentile is taken to filter out outliers at the high end.
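To see why cutting off the top of the distribution helps with spikes, here is a small illustrative sketch comparing the plain mean with mean_90 when one request is pathologically slow. The timings are made up for the example, not real measurements.

```python
# Hypothetical timings in milliseconds: nine normal requests and one slow outlier.
timings_ms = [12, 14, 11, 13, 15, 12, 14, 13, 12, 2000]

ordered = sorted(timings_ms)
keep = len(ordered) * 90 // 100   # lowest 90% -> 9 of 10 values
kept = ordered[:keep]

mean = sum(ordered) / len(ordered)  # pulled far upward by the single spike
mean_90 = sum(kept) / keep          # the spike has been dropped
upper_90 = kept[-1]
print(f"mean={mean:.1f} mean_90={mean_90:.1f} upper={ordered[-1]} upper_90={upper_90}")
```

Note that this only helps while spikes make up less than 10% of the samples in a flush interval; more frequent slow requests would show up in mean_90 as well.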

parkan · 2013-02-11T23:33:19Z

It seems like mean_90 should be avg(5-90) in the above example, no? Analogously, a hypothetical (but not very useful in general) lower_90 would be 10-95.

ageron · 2013-03-21T11:42:31Z

The 90th percentile function removes the 10% highest measurements. It is meant to ignore short spikes. For example, if the sorted dataset is 1,2,3,4,...,10,101,102,...,110, there are 20 values, so it will remove the 2 highest values (2 is 10% of 20), in this case 109 and 110. So lower_90 is 1, upper_90 is 108, sum_90 is sum(1,2,...,10,101,102,...,108) = 891, and mean_90 is 891/(20-2) = 49.5.
Note that "lower_90" will always be equal to "lower"; it's redundant.
This is what @dyross said (his 2nd option), but I thought it would be useful to spell it out in more detail.
Here's a python program to demonstrate this: https://gist.github.com/ageron/5212412
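These numbers can be reproduced in a few lines. This is an independent sketch, not the code in the linked gist. It also shows why lower_90 is redundant: dropping values from the top never changes the minimum.

```python
# The example dataset: 1..10 plus 101..110, 20 values in total.
data = list(range(1, 11)) + list(range(101, 111))

ordered = sorted(data)
n_drop = len(ordered) * 10 // 100        # 10% of 20 -> drop the 2 highest values
kept = ordered[:len(ordered) - n_drop]   # 109 and 110 are removed

lower_90 = kept[0]          # same as the overall minimum
upper_90 = kept[-1]
sum_90 = sum(kept)
mean_90 = sum_90 / len(kept)
print(lower_90, upper_90, sum_90, mean_90)
```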

parkan · 2013-03-22T20:54:14Z

This makes sense. I had assumed outliers would be discarded at both ends, but a percentile is explicitly about the values below a threshold, so only the top is cut.

rohitjalan · 2013-09-22T10:34:09Z

Consider the sorted data below:

120 334 450 496 553 675 844 994

upper_90 is simply the maximum value within the 90th percentile. The 90th percentile of this data is 889, and the largest value at or below it is 844, so upper_90 is 844.

http://blog.pkhamre.com/2012/07/24/understanding-statsd-and-graphite/
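The 889 figure is the 90th percentile computed with linear interpolation between neighboring ranks (the convention used by, for example, NumPy's default `percentile` method), while upper_90 is the largest observed value that survives the cut. This sketch computes both for the data above; the half-up rounding of the kept count and both function names are assumptions for illustration.

```python
import math

data = [120, 334, 450, 496, 553, 675, 844, 994]


def interpolated_percentile(values, pct):
    """Linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = pct / 100 * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def upper_at_threshold(values, pct):
    """Largest value among the lowest pct% of values (half-up rounding assumed)."""
    ordered = sorted(values)
    keep = math.floor(pct / 100 * len(ordered) + 0.5)
    return ordered[keep - 1]


print(round(interpolated_percentile(data, 90), 6))  # interpolated 90th percentile
print(upper_at_threshold(data, 90))                 # upper_90: an actual data point
```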


Guanpeng520 · 2022-02-07T07:15:05Z

Excuse me, what does std stand for?

joshdevins · 2022-02-07T08:51:59Z

@Guanpeng520 std refers to the standard deviation
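For reference, the standard deviation of a timer's values can be computed as below. Whether statsd divides by n (population) or n - 1 (sample) is not stated in this thread; this sketch assumes the population form, which is what Python's `statistics.pstdev` computes, and `timer_std` is a hypothetical helper name.

```python
import math
import statistics


def timer_std(values):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


samples = [2, 4, 4, 4, 5, 5, 7, 9]
print(timer_std(samples), statistics.pstdev(samples))
```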

mrtazz closed this as completed Oct 20, 2012

matschaffer added a commit to matschaffer/statsd that referenced this issue Nov 24, 2013

Added additional detail for timing percentiles

8b5538a

Fixes statsd#157

matschaffer mentioned this issue Nov 24, 2013

Added additional detail for timing percentiles #367

Merged

bitglue mentioned this issue Mar 26, 2015

Use "percentile" as it is defined in statistics #499

Closed
