
Weighted var #55

Merged · 8 commits merged into rust-ndarray:master on Oct 26, 2019
Conversation

nilgoyette
Contributor

Here's the var/std version.

  • I moved all "summary statistic" tests. Sadly, GitHub shows them as a deleted and a new file; sorry, I should have used git mv. Ctrl-F weighted_var and weighted_std to see what I actually changed.
  • I'm not too sure about ddof. I know that 0.0 and 1.0 work, but I don't actually know what other ddof values do. ndarray's definition of ddof is different, so I'm probably wrong :)
  • How can A::from_usize(0) fail? An unwrap would be safe, no?
  • I picked the first 1-pass+weights algorithm that I found (see the sketch after this list). I don't know if there are better or faster algorithms around, or if it accepts ddof values other than 0.0 and 1.0. All I know is that it gives the same results as the 2-pass algorithm.
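
A minimal sketch of the kind of one-pass algorithm in question (the incremental update matches the `mean`/`s` snippets quoted later in this thread); the function name, the slice-based signature, and the `weight_sum - ddof` denominator are illustrative assumptions, not the actual ndarray-stats API:

```rust
/// One-pass weighted variance sketch (West-style incremental update).
/// Hypothetical helper, not the ndarray-stats implementation.
/// Assumes at least one strictly positive weight.
fn weighted_var_sketch(data: &[f64], weights: &[f64], ddof: f64) -> f64 {
    assert_eq!(data.len(), weights.len());
    let mut weight_sum = 0.0;
    let mut mean = 0.0;
    let mut s = 0.0;
    for (&x, &w) in data.iter().zip(weights) {
        weight_sum += w;
        let x_minus_mean = x - mean;
        // Incrementally update the running mean and the weighted sum of squares.
        mean += (w / weight_sum) * x_minus_mean;
        s += w * x_minus_mean * (x - mean);
    }
    // Assumption: ddof is simply subtracted from the weight sum, which
    // reproduces the usual formulas for ddof = 0.0 and 1.0 (frequency weights).
    s / (weight_sum - ddof)
}
```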

@LukeMathWalker
Member

Thanks for working on this!
I am a bit busy at the moment; I'll try to get back to this during the weekend (hopefully with some answers to the points you have raised).

@LukeMathWalker
Member

  • I moved all "summary statistic" tests. Sadly, GitHub shows them as a deleted and a new file; sorry, I should have used git mv. Ctrl-F weighted_var and weighted_std to see what I actually changed.

No worries, not a big deal.

  • I'm not too sure about ddof. I know that 0.0 and 1.0 work, but I don't actually know what other ddof values do. ndarray's definition of ddof is different, so I'm probably wrong :)

I have yet to find a use for ddof outside of 0 and 1. In ndarray we have been slightly more permissive (letting it range strictly between 0 and n), but unless we have a concrete example I am actually OK with constraining it to [0, 1].
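
For reference, under a frequency-weight reading of ddof (an assumption on my side; reliability weights would need a different correction), the parameter simply shifts the denominator:

$$\operatorname{Var}_w(x) = \frac{\sum_i w_i\,(x_i - \bar{x}_w)^2}{\big(\sum_i w_i\big) - \mathrm{ddof}}, \qquad \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i},$$

so ddof = 0 gives the population variance and ddof = 1 the Bessel-style correction.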

  • How can A::from_usize(0) fail? An unwrap would be safe, no?

Well, that's exactly what we are doing 😁
We are just using expect instead of unwrap to provide a better error message in the unlikely (but still possible) event of a panic. We should also use expect when doing A::from_usize(1), and we should add this panic condition to the documentation of the various new methods.
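
A sketch of the pattern being described, assuming `A: FromPrimitive` from num-traits (the helper name and the exact message wording are illustrative):

```rust
use num_traits::FromPrimitive;

// `expect` instead of `unwrap`: same behaviour, but the panic message
// tells the user *which* conversion failed. Hypothetical helper.
fn zero_and_one<A: FromPrimitive>() -> (A, A) {
    let zero = A::from_usize(0).expect("Converting 0 to `A` must not fail.");
    let one = A::from_usize(1).expect("Converting 1 to `A` must not fail.");
    (zero, one)
}
```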

  • I picked the first 1-pass+weights algorithm that I found. I don't know if there are better or faster algorithms around, or if it accepts ddof values other than 0.0 and 1.0. All I know is that it gives the same results as the 2-pass algorithm.

To notice a difference you need to evaluate it on arrays with a lot of elements, even better if they lead to somewhat badly conditioned operations. The first issue there is the summing algorithm we are using, the naive one. Using pairwise summation would greatly improve numerical stability, but there are pros and cons to it at the moment, given that we do not have specialization.
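
For context, pairwise summation is the standard trick of splitting the sum in half and recursing, which keeps the rounding-error growth at O(log n) instead of O(n). A minimal sketch (the threshold and name are arbitrary, not the ndarray implementation):

```rust
/// Minimal pairwise-summation sketch.
fn pairwise_sum(xs: &[f64]) -> f64 {
    // Below this (arbitrary) threshold, naive summation is fine and fast.
    const THRESHOLD: usize = 64;
    if xs.len() <= THRESHOLD {
        xs.iter().sum()
    } else {
        let (lo, hi) = xs.split_at(xs.len() / 2);
        pairwise_sum(lo) + pairwise_sum(hi)
    }
}
```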

Overall it looks good to me 👍
I have left a couple of comments. Could you add some benchmarks for these new methods?
I am curious to see whether we can make them faster using azip! instead of zipping iterators together with .iter(). There are also some opportunities to use mul_add that we should probably leverage to make the code faster and more precise.
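
A rough sketch of what the azip! variant could look like (assuming the parenthesized azip! syntax introduced around ndarray 0.13; the name and signature are illustrative, not the PR's code):

```rust
use ndarray::{azip, Array1};

// Hypothetical azip! version of the same one-pass update,
// shown only for comparison with the iterator-based loop.
fn weighted_var_azip(data: &Array1<f64>, weights: &Array1<f64>, ddof: f64) -> f64 {
    let (mut weight_sum, mut mean, mut s) = (0.0_f64, 0.0_f64, 0.0_f64);
    azip!((&x in data, &w in weights) {
        weight_sum += w;
        let x_minus_mean = x - mean;
        mean += (w / weight_sum) * x_minus_mean;
        s += w * x_minus_mean * (x - mean);
    });
    s / (weight_sum - ddof)
}
```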

@nilgoyette
Contributor Author

I added a benchmark (azip! was slower), but I'm not sure about using mul_add because criterion tells me that it's slower. It is more precise, though (from 1e-10 to 1e-11). I'll add it if you want.
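
For reproducibility, a criterion skeleton along these lines (the array sizes mirror the results quoted below; weighted_var_sketch is the hypothetical helper sketched earlier in this thread, and the bench names are illustrative):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Benchmarks the weighted variance helper at several input sizes.
fn bench_weighted_var(c: &mut Criterion) {
    for &n in &[10, 100, 1_000, 10_000] {
        let data: Vec<f64> = (0..n).map(|i| i as f64).collect();
        let weights = vec![1.0; n];
        c.bench_function(&format!("weighted_var/{}", n), |b| {
            b.iter(|| weighted_var_sketch(black_box(&data), black_box(&weights), 0.0))
        });
    }
}

criterion_group!(benches, bench_weighted_var);
criterion_main!(benches);
```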

@LukeMathWalker
Member

LukeMathWalker commented Oct 7, 2019

That's interesting! Could you post criterion's plot? I'll try to run them later on my machine as well.

@nilgoyette
Contributor Author

I replaced

```rust
mean += (w / weight_sum) * x_minus_mean;
s += w * x_minus_mean * (x - mean);
```

with

```rust
mean = (w / weight_sum).mul_add(x_minus_mean, mean);
s = (w * x_minus_mean).mul_add(x - mean, s);
```

cargo bench gives something like:

```text
   /10    Change within noise threshold.
  /100    +2.6839%
 /1000    +3.0680%
/10000    +1.9234%
```

@LukeMathWalker
Member

The performance impact looks even more severe on my machine:
[criterion benchmark plot]

I'd say we can go without mul_add for now and then revise if we find out we need to squeeze more precision.

@LukeMathWalker
Member

Same story for Zip (quite interesting actually):
[criterion benchmark plot]

I'd say we are good to go 💪
Thanks for working on this!

LukeMathWalker merged commit d9483ea into rust-ndarray:master on Oct 26, 2019