ENH: Added a min_count keyword to stat funcs #18876
Conversation
pandas/core/nanops.py
Outdated
 if len(self.kwargs) > 0:
     for k, v in compat.iteritems(self.kwargs):
         if k not in kwds:
             kwds[k] = v
 try:
-    if values.size == 0:
+    if values.size < min_count:
         # we either return np.nan or pd.NaT
Whoops, I moved this logic to _na_for_min_count but forgot to call it here.
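For context, a rough sketch of what such a helper could look like (the name comes from the PR; the exact signature and dtype handling below are assumptions, not the PR's code):

import numpy as np

def _na_for_min_count(values, axis):
    # NA result for a reduction that saw fewer than min_count valid
    # values: a scalar NaN for 1d input, a NaN-filled array for 2d
    # input (datetime-like dtypes would use pd.NaT instead; omitted here)
    if values.ndim == 1 or axis is None:
        return np.nan
    result_shape = values.shape[:axis] + values.shape[axis + 1:]
    result = np.empty(result_shape, dtype='float64')
    result.fill(np.nan)
    return result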
5630d9d passed all the CI tests, so that's good. I just pushed another commit that fixes a few things, but is still backwards compatible with 0.21.x. I'll push my breaking changes once I get things passing locally.
I think we should for now only add min_count to sum and prod. What would be the benefit for the other reductions?
Not much. I mainly did it for all of them since it was easier, and would possibly be useful for other reductions (haven't thought it through). I'll push a commit that does it just for sum and prod.
For other reductions like min, max, std, ... we already return NaN for empty or all-NA series, so not sure what min_count would add there.
Pushed another backwards compatible commit, including docstrings, written from the point of view of a hypothetical 0.21.2 release. I haven't thought through whether it'd be worth backporting the non-breaking changes (adding min_count) to 0.21.x.

Signature: pd.Series.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=1, **kwargs)
Docstring:
Return the sum of the values for the requested axis
Parameters
----------
axis : {index (0)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA or empty, the result
will be NA
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a scalar
numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. Not implemented for Series.
min_count : int, default 1
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.
.. versionadded:: 0.21.2
Added with the default being 1. This means the sum or product
of an all-NA or empty series is ``NaN``.
Returns
-------
sum : scalar or Series (if level specified)
Examples
--------
By default, the sum of an empty series is ``NaN``.
>>> pd.Series([]).sum() # min_count=1 is the default
nan
This can be controlled with the ``min_count`` parameter. For example, if
you'd like the sum of an empty series to be 0, pass ``min_count=0``.
>>> pd.Series([]).sum(min_count=0)
0.0
Thanks to the ``skipna`` parameter, ``min_count`` handles all-NA and
empty series identically.
>>> pd.Series([np.nan]).sum()
nan
>>> pd.Series([np.nan]).sum(min_count=0)
0.0
For reviewing, do people have a preference for a.) 2 PRs: the first adding a backwards compatible min_count=1, followed by one making the breaking change to the default, or b.) a single PR with both?
pandas/core/nanops.py
Outdated
    result = np.empty(result_shape, dtype=values.dtype)
    result.fill(fill_value)
    return result

if values.size < min_count:
values could be 2d here, so in theory should handle this on a per-column basis (but not 100% sure if it is ever practically 2d).
pandas/core/nanops.py
Outdated
@@ -304,15 +308,18 @@ def nanall(values, axis=None, skipna=True):

 @disallow('M8')
 @bottleneck_switch()
-def nansum(values, axis=None, skipna=True):
+def nansum(values, axis=None, skipna=True, min_count=1):
+    if len(values) < min_count:
can have an ndim > 1 here
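One way to make the check axis-aware for 2d input (a sketch, assuming a boolean missing-value mask is available; not the PR's exact code):

import numpy as np

def below_min_count(values, mask, axis, min_count):
    # count the valid (non-NA) entries along the reduced axis and
    # compare against min_count; returns a scalar for 1d input and a
    # boolean array (one entry per reduced row/column) for 2d input
    if values.ndim == 1 or axis is None:
        valid = values.size - (mask.sum() if mask is not None else 0)
        return valid < min_count
    valid = values.shape[axis] - (mask.sum(axis) if mask is not None else 0)
    return valid < min_count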
@@ -1759,6 +1759,48 @@ def test_value_counts_categorical_not_ordered(self):
    tm.assert_series_equal(idx.value_counts(normalize=True), exp)


class TestMinCount():
    @pytest.mark.parametrize("use_bottleneck", [True, False])
    @pytest.mark.parametrize("method", [("sum", 0), ("prod", 1)])
So for sum we never use bottleneck; I think for prod we DO use bottleneck. So maybe we should just fix that here as well.
In _bn_ok_dtype:

    # GH 9422
    # further we also want to preserve NaN when all elements
    # are NaN, unlike bottleneck/numpy which consider this
    # to be 0
    if name in ['nansum', 'nanprod']:
        return False
so we don't use bottleneck. And apparently bottleneck doesn't have a nanprod, at least in my version installed locally.
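To illustrate the convention being avoided (this session assumes bottleneck is installed):

In [1]: import numpy as np, bottleneck as bn

In [2]: np.nansum(np.array([np.nan, np.nan]))  # numpy: all-NaN sums to 0
Out[2]: 0.0

In [3]: bn.nansum(np.array([np.nan, np.nan]))  # bottleneck agrees
Out[3]: 0.0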
    @pytest.mark.parametrize("method", [("sum", 0), ("prod", 1)])
    def test_min_count_empty(self, method, use_bottleneck):
        method, unit = method
        s = pd.Series()
all of these tests should be at the very top, right after (or maybe replacing)
class TestSeriesAnalytics(TestData):
@pytest.mark.parametrize("use_bottleneck", [True, False])
@pytest.mark.parametrize("method", ["sum", "prod"])
def test_empty(self, method, use_bottleneck):
with pd.option_context("use_bottleneck", use_bottleneck):
# GH 9422
# treat all missing as NaN
those tests need to be changed, no?
a) maybe that is better, then we can see both passing
I think this is ready for review (cc @shoyer). I haven't written a whatsnew since I'll be doing that in the followup PR. Just to verify, the current plan is to release 0.22.0 as 0.21.1 + this PR and my followup PR with the breaking changes?
@@ -32,36 +32,64 @@ class TestSeriesAnalytics(TestData):
    @pytest.mark.parametrize("use_bottleneck", [True, False])
    @pytest.mark.parametrize("method", ["sum", "prod"])
    def test_empty(self, method, use_bottleneck):
        if method == "sum":
you should just pass in unit with method
@pytest.mark.parametrize("method, unit", [("sum", 0.0), ("prod", 1.0)]))
        with pd.option_context("use_bottleneck", use_bottleneck):
            # GH 9422
            # treat all missing as NaN
            s = Series([])
            result = getattr(s, method)()
            assert isna(result)
            result = getattr(s, method)(min_count=0)
I would explicitly pass min_count=1 in the tests as well, and then add another entry w/o passing min_count (i.e. the default)
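Combining both suggestions, the test could read something like this (a sketch, not necessarily the committed version):

@pytest.mark.parametrize("use_bottleneck", [True, False])
@pytest.mark.parametrize("method, unit", [("sum", 0.0), ("prod", 1.0)])
def test_empty(self, method, unit, use_bottleneck):
    with pd.option_context("use_bottleneck", use_bottleneck):
        # GH 9422: treat all missing as NaN
        s = Series([])
        # default (min_count=1): the result is NA
        assert isna(getattr(s, method)())
        # explicitly passing the default
        assert isna(getattr(s, method)(min_count=1))
        # min_count=0: fall back to the operation's identity
        assert getattr(s, method)(min_count=0) == unit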
FYI, I have a working min_count for groupby as well:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": pd.Categorical(['a', 'a'], categories=['a', 'b']), "B": [1, 1]})
In [3]: df.groupby("A").B.sum()
Out[3]:
A
a 2
b 0
Name: B, dtype: int64
In [4]: df.groupby("A").B.sum(min_count=1)
Out[4]:
A
a 2.0
b NaN
Name: B, dtype: float64

I think I'll include that in this PR as well (with the default being 1).
@@ -88,7 +89,7 @@ def group_add_{{name}}(ndarray[{{dest_type2}}, ndim=2] out,

     for i in range(ncounts):
         for j in range(K):
-            if nobs[i, j] == 0:
+            if nobs[i, j] < min_count:
does this work for min_count==0?
Yeah, sumx starts out as zeros, so we just have to avoid setting it to NaN. Same for prod, but with ones.
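In plain Python, the template's post-processing loop amounts to the following (a sketch of the cython code above; the array values are illustrative):

import numpy as np

ngroups, K, min_count = 2, 1, 1
sumx = np.zeros((ngroups, K))    # accumulated sums, start at 0
nobs = np.array([[2], [0]])      # number of valid observations per group
out = np.empty((ngroups, K))

for i in range(ngroups):
    for j in range(K):
        if nobs[i, j] < min_count:
            out[i, j] = np.nan   # too few valid values: NA
        else:
            out[i, j] = sumx[i, j]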
pandas/core/groupby.py
Outdated
@@ -2322,9 +2326,16 @@ def _aggregate(self, result, counts, values, comp_ids, agg_func,
     for i, chunk in enumerate(values.transpose(2, 0, 1)):

         chunk = chunk.squeeze()
-        agg_func(result[:, :, i], counts, chunk, comp_ids)
+        if min_count is None:
I would always pass it to cython as an int; it can be -1 if it's None somehow.
I mainly did this to avoid having to add an unused min_count keyword to all the group_* methods. If I added min_count to each of those, then I'd also need to raise if a non-default value is passed. I'll add a comment here that it's only expected for count and prod though.
I would just add it and assert == -1; these are all internal routines anyhow. An if is very ugly and error prone here; better to make them have the same uniform signature.
Ah, ok that sounds easy enough.
pandas/core/groupby.py
Outdated
-        agg_func(result, counts, values, comp_ids)
+        if min_count is None:
+            agg_func(result, counts, values, comp_ids)
+        else:
same
pandas/_libs/groupby_helper.pxi.in
Outdated
@@ -332,7 +343,8 @@ def get_dispatch(dtypes):
 def group_last_{{name}}(ndarray[{{dest_type2}}, ndim=2] out,
                         ndarray[int64_t] counts,
                         ndarray[{{c_type}}, ndim=2] values,
-                        ndarray[int64_t] labels):
+                        ndarray[int64_t] labels,
+                        Py_ssize_t min_count):
you can set == -1 here (and actually in all routines) I think
OK, will do (had to for OHLC). I couldn't remember if cython took a perf hit for checking default values.
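So the templated signatures would become something like (a sketch in the template's style; the assert message is illustrative):

def group_last_{{name}}(ndarray[{{dest_type2}}, ndim=2] out,
                        ndarray[int64_t] counts,
                        ndarray[{{c_type}}, ndim=2] values,
                        ndarray[int64_t] labels,
                        Py_ssize_t min_count=-1):
    # routines that don't support min_count just assert the sentinel
    assert min_count == -1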
@@ -822,7 +822,14 @@ def test_cython_agg_empty_buckets(self):
     grps = range(0, 55, 5)

     for op, targop in ops:
-        result = df.groupby(pd.cut(df[0], grps))._cython_agg_general(op)
+        # calling _cython_agg_general directly, instead of via the user API
This bit was unfortunate, but I think necessary since we're calling _cython_agg_general directly. I'll need to change min_count to 0 in the followup PR.
Added min_count to resample as well:

In [4]: s.resample('30T').sum() # default min_count=1 for now, will change to 0
Out[4]:
2017-01-01 00:00:00 1.0
2017-01-01 00:30:00 NaN
2017-01-01 01:00:00 1.0
Freq: 30T, dtype: float64
In [5]: s.resample('30T').sum(min_count=0)
Out[5]:
2017-01-01 00:00:00 1
2017-01-01 00:30:00 0
2017-01-01 01:00:00 1
Freq: 30T, dtype: int64

I'm going to see how difficult rolling will be. I believe that min_periods means something different there: given a window at least min_periods long, the statistic is computed.
pandas/core/groupby.py
Outdated
    @Appender(_local_template)
    def f(self, **kwargs):
        return f_(self, **kwargs)
else:
? Not sure you need this. Shouldn't the template just work for this?
It's not required, but with it I get min_count in the signature:

In [5]: gr.sum?
Signature: gr.sum(min_count=1, **kwargs)
Docstring:
Compute sum of group values

Without the if / else, things work, but then the sig is just **kwargs. I have a slight preference for including it in the signature, but will revert it if you want.
I would revert it for now. We don't include this for anything else (IOW any parameters now).
IIRC there is an issue about this. It needs a comprehensive solution.
Yeah, I forgot about numeric_only, etc. OK.
@@ -822,7 +822,14 @@ def test_cython_agg_empty_buckets(self):
     grps = range(0, 55, 5)

     for op, targop in ops:
-        result = df.groupby(pd.cut(df[0], grps))._cython_agg_general(op)
should parametrize this
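For example (the op/targop pairs below are illustrative, not the module's full list):

@pytest.mark.parametrize("op, targop", [
    ("mean", np.mean),
    ("add", np.sum),
    ("prod", np.prod),
])
def test_cython_agg_empty_buckets(self, op, targop):
    df = pd.DataFrame([11, 12, 13])
    grps = range(0, 55, 5)
    # calling _cython_agg_general directly, instead of via the user API
    result = df.groupby(pd.cut(df[0], grps))._cython_agg_general(op)
    expected = df.groupby(pd.cut(df[0], grps)).agg(lambda x: targop(x))
    tm.assert_frame_equal(result, expected)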
Ah, is min_periods the length of the window, or the number of non-NA values in the window? Comparing:

In [2]: s = pd.Series(1, index=pd.date_range("2017", periods=6, freq="D"))
...: t = s.copy()
...: t.iloc[[3, 4]] = np.nan
...:
In [3]: s
Out[3]:
2017-01-01 1
2017-01-02 1
2017-01-03 1
2017-01-04 1
2017-01-05 1
2017-01-06 1
Freq: D, dtype: int64
In [4]: t
Out[4]:
2017-01-01 1.0
2017-01-02 1.0
2017-01-03 1.0
2017-01-04 NaN
2017-01-05 NaN
2017-01-06 1.0
Freq: D, dtype: float64
In [5]: s.rolling(2).sum()
Out[5]:
2017-01-01 NaN
2017-01-02 2.0
2017-01-03 2.0
2017-01-04 2.0
2017-01-05 2.0
2017-01-06 2.0
Freq: D, dtype: float64
In [6]: t.rolling(2).sum()
Out[6]:
2017-01-01 NaN # (1,)
2017-01-02 2.0 # (1, 1)
2017-01-03 2.0 # (1, 1)
2017-01-04 NaN # (1, nan)
2017-01-05 NaN # (nan, nan)
2017-01-06 NaN # (nan, 1)
Freq: D, dtype: float64

So will min_periods cover this case?
Great work!
Do we need to add some tests with higher values for min_count? This 'works', so we should either test it or disallow it.
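For example (illustrating a min_count larger than the number of valid values):

In [1]: pd.Series([1.0, 2.0]).sum(min_count=3)
Out[1]: nan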
As also commented inline somewhere, the sum(level=) does not seem to be implemented yet:
In [15]: s = pd.Series([1, np.nan, np.nan, np.nan], pd.MultiIndex.from_product([[1, 2], ['a', 'b']]))
In [16]: s
Out[16]:
1 a 1.0
b NaN
2 a NaN
b NaN
dtype: float64
In [17]: s.sum(level=0)
Out[17]:
1 1.0
2 0.0
dtype: float64
In [18]: s.sum(level=0, min_count=0)
Out[18]:
1 1.0
2 0.0
dtype: float64
In [19]: s.sum(level=0, min_count=1)
Out[19]:
1 1.0
2 0.0
dtype: float64
(and it also changed compared to 0.21.0, where it gave 1.0, NaN in this case).
Groupby seems to have a wrong default (which for now should be backwards compatible), but I can't directly spot the error in the code:
In [15]: s = pd.Series([1, np.nan, np.nan, np.nan], pd.MultiIndex.from_product([[1, 2], ['a', 'b']]))
In [21]: s.groupby(level=0).sum()
Out[21]:
1 1.0
2 0.0
dtype: float64
In [22]: s.groupby(level=0).sum(min_count=0)
Out[22]:
1 1.0
2 0.0
dtype: float64
In [23]: s.groupby(level=0).sum(min_count=1)
Out[23]:
1 1.0
2 NaN
dtype: float64
(I get the same when grouping by array of values or grouping by column in dataframe)
The required number of valid values to perform the operation. If fewer than
``min_count`` non-NA values are present the result will be NA.

.. versionadded:: 0.21.2
probably it will become 0.22? (but can change later)
pandas/core/generic.py
Outdated
axis = self._stat_axis_number
if level is not None:
    return self._agg_by_level(name, axis=axis, level=level,
                              skipna=skipna)
Does this one also need to handle min_count?
 if axis is not None and getattr(result, 'ndim', False):
-    null_mask = (mask.shape[axis] - mask.sum(axis)) == 0
+    null_mask = (mask.shape[axis] - mask.sum(axis) - min_count) < 0
Can you just subtract something here if null_mask is later on used for indexing into result?
I'm not sure what you mean here. null_mask will be a boolean array of which values in the output should be null because they have too few valid values.
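Concretely, a small numpy illustration of the line being discussed:

import numpy as np

mask = np.array([[False, True],
                 [False, True]])   # True marks missing; column 1 is all-NA
min_count, axis = 1, 0

valid = mask.shape[axis] - mask.sum(axis)   # -> array([2, 0])
null_mask = (valid - min_count) < 0         # -> array([False,  True])
# result entries where null_mask is True are then set to NA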
I just pushed a commit fixing that.

In [4]: s.groupby(level=0).sum()
Out[4]:
1 1.0
2 NaN
dtype: float64
This seems to be fixed (presumably by the same change):

In [10]: s.sum(level=0)
Out[10]:
1 1.0
2 NaN
dtype: float64
Groupby is indeed fixed!
The default is indeed fixed, but
Codecov Report
@@ Coverage Diff @@
## master #18876 +/- ##
==========================================
+ Coverage 91.57% 91.65% +0.07%
==========================================
Files 150 154 +4
Lines 48913 51394 +2481
==========================================
+ Hits 44794 47107 +2313
- Misses 4119 4287 +168
Continue to review full report at Codecov.
Ok, we should be good on series/frame, groupby, and resample. Does anyone have thoughts on expanding / window? Does min_periods mean 1.) the number of periods in the window, or 2.) the number of non-NA values in the window?

If it's 1, then the current behavior (on master) is buggy:

In [6]: s = pd.Series(1, index=pd.date_range("2017", periods=6, freq="D"))
...: t = s.copy()
...: t.iloc[[3, 4]] = np.nan
...:
In [7]: t.rolling(2, min_periods=2).apply(lambda x: np.nansum(x))
Out[7]:
2017-01-01 NaN # (1,)
2017-01-02 2.0 # (1, 1)
2017-01-03 2.0 # (1, 1)
2017-01-04 NaN # (1, nan)
2017-01-05 NaN # (nan, nan)
2017-01-06 NaN # (nan, 1)
Freq: D, dtype: float64
Why would you think 1)? It has always meant non-NA.
I agree it currently means non-NA rows (although that is not explicit in the docstring). I am thinking if there are possible reasons we would want to change this, and to have both min_periods and min_count. I personally wouldn't rename min_periods.
Agreed with @jorisvandenbossche. There may be a slight issue with the current implementation of rolling sum vs. apply, though. On my other branch:

In [9]: x = pd.Series([np.nan])
In [10]: x.rolling(1, min_periods=0).sum()
Out[10]:
0 NaN
dtype: float64
In [11]: x.rolling(1, min_periods=0).apply(lambda v: pd.Series(v).sum())
Out[11]:
0 0.0
dtype: float64

I assume we want those to be consistent?
Here's master:

In [1]: import pandas as pd; import numpy as np
In [2]: x = pd.Series([np.nan])
In [3]: x.rolling(1, min_periods=0).sum()
Out[3]:
0 NaN
dtype: float64
In [4]: x.rolling(1, min_periods=1).sum()
Out[4]:
0 NaN
dtype: float64

Here's my branch:

In [1]: import pandas as pd; import numpy as np
In [2]: x = pd.Series([np.nan])
In [3]: x.rolling(1, min_periods=0).sum()
Out[3]:
0 0.0
dtype: float64
In [4]: x.rolling(1, min_periods=1).sum()
Out[4]:
0 NaN
dtype: float64
In [5]: x.rolling(1).sum()
Out[5]:
0 NaN
dtype: float64

The fix in window.pyx is small. I don't think it's worth including here, since the behavior on master is correct for the 0.21 semantics of "sum of all-missing is missing".
To be clear: so for now no change, which is keep min_periods as it was, and for now no min_count added to rolling sum? (which is fine for me)
If we interpret min_periods as the minimum number of non-na periods, then yes. There won’t be a need for rolling.sum to take min_count.
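For example, illustrating the non-NA reading of min_periods:

In [1]: t = pd.Series([1.0, np.nan, 1.0])

In [2]: t.rolling(2, min_periods=1).sum()  # one non-NA value per window suffices
Out[2]:
0    1.0
1    1.0
2    1.0
dtype: float64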
Force-pushed from 0ed1a13 to b9decb6.
@TomAugspurger why don't you rebase on master? Can merge this on green (as I think you are taking care of any remaining comments in the other PR). (and then backport, maybe create the new branch as well)?
The current default is 1, reproducing the behavior of pandas 0.21. The current test suite should pass. Currently, only nansum and nanprod actually do anything with `min_count`. It will not be hard to adjust other nan* methods to use it if we want. This was just simplest for now. Additional tests for the new behavior have been added.
Force-pushed from b9decb6 to 0e954f8.
Yeah, sounds good, this one should be OK. I'll create a 0.22.x branch that starts at 0.21.x and backport this + the other PR once that's finished.
@TomAugspurger I also think we need to fix the version number? (or I suppose once you tag 0.22, then we can fix that)?
Yeah, how should we do that? I suppose we could tag 0.23.dev0 on master any time, right? So then it stays ahead of 0.22.x.
Yeah, I think we can change the tag on master. Merge away when ready.
thanks @TomAugspurger
The current default is 1, reproducing the behavior of pandas 0.21. The current test suite should pass. Currently, only nansum and nanprod actually do anything with `min_count`. It will not be hard to adjust other nan* methods to use it if we want. This was just simplest for now. Additional tests for the new behavior have been added. (cherry picked from commit dbec3c9)
Version 0.22.0

* tag 'v0.22.0': (777 commits)
  RLS: v0.22.0
  DOC: Fix min_count docstring (pandas-dev#19005)
  DOC: More 0.22.0 updates (pandas-dev#19002)
  TST: Remove pow test in expressions
  COMPAT: Avoid td.skip decorator
  DOC: 0.22.0 release docs (pandas-dev#18983)
  DOC: Include 0.22.0 whatsnew
  Breaking changes for sum / prod of empty / all-NA (pandas-dev#18921)
  ENH: Added a min_count keyword to stat funcs (pandas-dev#18876)
  RLS: v0.21.1
  DOC: Add date to whatsnew (pandas-dev#18740)
  DOC: Include 0.21.1 whatsnew
  DOC: Update release notes (pandas-dev#18739)
  CFG: Ignore W503
  DOC: fix options table (pandas-dev#18730)
  ENH: support non default indexes in writing to Parquet (pandas-dev#18629)
  BUG: Fix to_latex with longtable (pandas-dev#17959) (pandas-dev#17960)
  Parquet: Add error message for no engine found (pandas-dev#18717)
  BUG: Categorical data fails to load from hdf when all columns are NaN (pandas-dev#18652)
  DOC: clean-up whatsnew file for 0.21.1 (pandas-dev#18690)
  ...
The current default is 1, reproducing the behavior of pandas 0.21. The current test suite should pass. I'll add additional commits here changing the default to be 0.

Currently, only nansum and nanprod actually do anything with min_count. It will not be hard to adjust other nan* methods to use it if we want. This was just simplest for now.

Additional tests for the new behavior have been added.
closes #18678