Features/514 var std numpylike #515
Conversation
Codecov Report
@@ Coverage Diff @@
## master #515 +/- ##
==========================================
+ Coverage 96.60% 96.61% +0.01%
==========================================
Files 65 65
Lines 13991 14008 +17
==========================================
+ Hits 13516 13534 +18
+ Misses 475 474 -1
Continue to review full report at Codecov.
I disagree with making a biased estimator the default of any statistical measure, regardless of how numpy does it.
I have a lot of understanding for this, and in my mind I've been calling it the "doof" argument all along. Not sure it's worth jeopardizing our ...
To further stir the pot: PyTorch's default is an unbiased estimator.
And to extend that, this would need to be changed in the var function as well to calculate the correct estimator.
I think I've addressed all of this, but let me know once you've had a look.
I do not think that we are jeopardizing this, as the code will run clean with the Bessel correction as default. I think that we should be using unbiased estimators whenever possible. There are already differences which make it so that ...
The code doesn't run clean with the Bessel correction in edge cases: for example, ht.var() of a 1-element tensor returns nan instead of 0. I'm not here to make a statement about best practices in statistics. I think deviating from numpy and setting ...
Let's agree to disagree as far as I'm concerned, and maybe the others should chip in as well at this point.
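For concreteness, a minimal numpy sketch of the single-element edge case mentioned above (this is numpy's documented behaviour, which the new heat default mirrors):

```python
import numpy as np

x = np.array([5.0])           # a single-element sample

print(np.var(x))              # 0.0 -> ddof=0 (biased / ML estimator), divisor N = 1
print(np.var(x, ddof=1))      # nan -> divisor N - ddof = 0, numpy emits a RuntimeWarning
```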
This does a better job of making the case as to why it should be 0 and not 1; they didn't choose it arbitrarily. Can you add something like this to the docs? This is pulled directly from numpy: "The mean is normally calculated as ..."
Well, I never thought that they did it arbitrarily! By the way, thanks for giving me the chance to refresh Statistics 101...
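For reference, the two estimators under discussion, in standard notation (this is a paraphrase of the textbook definitions, not the verbatim numpy docstring quoted above):

$$
\hat\sigma^2_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar x\right)^2
\qquad\text{vs.}\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar x\right)^2
$$

With a `ddof` parameter the divisor becomes $N - \mathrm{ddof}$: `ddof=0` yields the maximum-likelihood (biased) estimator, `ddof=1` the Bessel-corrected (unbiased) one. For $N = 1$ and `ddof=1` the divisor is zero, which is exactly the nan case discussed above.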
CHANGELOG.md
@@ -6,6 +6,7 @@
- [#483](https://github.com/helmholtz-analytics/heat/pull/483) Bugfix: DNDarray.cpu() changes heat device to cpu
- Update documentation theme to "Read the Docs"
- [#499](https://github.com/helmholtz-analytics/heat/pull/499) Bugfix: MPI datatype mapping: `torch.int16` now maps to `MPI.SHORT` instead of `MPI.SHORT_INT`
- [#515](https://github.com/helmholtz-analytics/heat/pull/515) ht.var(), ht.std() numpy-compliant (ddof=0)
Can you be more specific? Default is now maximum likelihood ... instead of unbiased estimator.
OK, done; both here and in the docs.
I was not sure why they chose it before. I still don't agree with it, but if it has a logical reason behind it which I agree with, then I cannot continue to oppose it simply because I have an opinion which leans the other way. (Also, this is not a major change; it was mostly a principle thing for me.)
Description
Working on #490 led me to open several cans of worms and resulted in a rather messy PR (#508). I'm going to withdraw #508 and submit the main changes separately.
With this PR I'm making ht.var() and ht.std() numpy-compliant.
Issue(s) resolved: #514, and partly #490 (because the numpy default is the biased variance, i.e. no Bessel correction, ht.var() of a 1-element tensor now returns 0 instead of nan).
Changes proposed:
- `bessel` has been replaced by `ddof`, which can be 0 or 1; the default is `ddof=0`, like for numpy (equivalent to `unbiased=False` for torch and `bessel=False` for the current heat implementation).
- `bessel` (bool) can still be passed, so existing scripts shouldn't break, but remember the default is now no Bessel correction (see the usage sketch below).
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
Functions that call var() or std() relying on unbiased variance will yield the wrong result (the default is now biased variance, ddof=0); a sketch of the impact follows.
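A hedged sketch of that impact: any caller that relied on the old unbiased default now needs an explicit `ddof=1` to reproduce its previous results (the downstream caller below is hypothetical, for illustration only):

```python
import heat as ht

x = ht.array([1.0, 2.0, 3.0, 4.0])

# The old default (unbiased, Bessel-corrected) now requires an explicit ddof=1:
unbiased = ht.var(x, ddof=1)   # sum((x - mean)**2) / (N - 1)

# The new default (biased / maximum-likelihood) matches numpy:
biased = ht.var(x)             # sum((x - mean)**2) / N

# A hypothetical downstream caller must opt back in explicitly:
def standard_error(sample):
    # Unbiased sample standard deviation, as such a caller presumably intends.
    return ht.std(sample, ddof=1) / sample.shape[0] ** 0.5
```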