DOC: update the pandas.DataFrame.cummax docstring #20336
@@ -8055,17 +8055,17 @@ def compound(self, axis=None, skipna=None, level=None):
         cls.cummin = _make_cum_function(
             cls, 'cummin', name, name2, axis_descr, "cumulative minimum",
             lambda y, axis: np.minimum.accumulate(y, axis), "min",
-            np.inf, np.nan)
+            np.inf, np.nan, '')
         cls.cumsum = _make_cum_function(
             cls, 'cumsum', name, name2, axis_descr, "cumulative sum",
-            lambda y, axis: y.cumsum(axis), "sum", 0., np.nan)
+            lambda y, axis: y.cumsum(axis), "sum", 0., np.nan, '')
         cls.cumprod = _make_cum_function(
             cls, 'cumprod', name, name2, axis_descr, "cumulative product",
-            lambda y, axis: y.cumprod(axis), "prod", 1., np.nan)
+            lambda y, axis: y.cumprod(axis), "prod", 1., np.nan, '')
         cls.cummax = _make_cum_function(
-            cls, 'cummax', name, name2, axis_descr, "cumulative max",
+            cls, 'cummax', name, name2, axis_descr, "cumulative maximum",
             lambda y, axis: np.maximum.accumulate(y, axis), "max",
-            -np.inf, np.nan)
+            -np.inf, np.nan, _cummax_examples)

         cls.sum = _make_min_count_stat_function(
             cls, 'sum', name, name2, axis_descr,
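In the hunk above, each cumulative method receives a pair of mask values. The first (np.inf, 0., 1., -np.inf) is the identity element of its operation, and the second is np.nan. The NumPy sketch below shows how such a pair can make a cumulative maximum skip NaNs: NaNs are replaced by the identity before accumulating, and NaN is put back at those positions afterwards. It is a simplified illustration under that assumption, not the pandas implementation. The helper name cummax_skipna is hypothetical, and the sketch converts everything to float for simplicity.

```python
import numpy as np


def cummax_skipna(values, axis=0):
    """Hypothetical helper: NaN-skipping cumulative maximum.

    Sketch of the masking idea behind mask_a/mask_b; not pandas code.
    """
    y = np.array(values, dtype=float)  # float copy; the input stays untouched
    mask = np.isnan(y)
    np.putmask(y, mask, -np.inf)       # mask_a: -inf can never win a max
    result = np.maximum.accumulate(y, axis)
    np.putmask(result, mask, np.nan)   # mask_b: NaN inputs stay NaN
    return result


# The NaN does not reset the running maximum: values 2, nan, 5, 5
print(cummax_skipna([2.0, np.nan, 5.0, -1.0]))

# axis=1 accumulates along each row of a 2-D array
print(cummax_skipna([[1.0, np.nan, 3.0], [4.0, 0.0, np.nan]], axis=1))
```

Swapping -np.inf for np.inf, 0. or 1. (and the accumulator for np.minimum.accumulate, cumsum or cumprod) gives the same pattern for the other three methods.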
@@ -8327,24 +8327,95 @@ def _doc_parms(cls):
"""

_cnum_doc = """
Return %(desc)s over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the %(desc)s.

Parameters
----------
axis : %(axis_descr)s
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args : any, default None
**kwargs : any, default None
Review comment: Can you have
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns | ||
------- | ||
%(outname)s : %(name1)s\n | ||
|
||
|
||
%(examples)s | ||
See also | ||
-------- | ||
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality | ||
but ignores ``NaN`` values. | ||
pandas.Series.%(outname)s : Return %(desc)s over Series axis. | ||
pandas.DataFrame.cummax : Return cumulative maximum over DataFrame axis. | ||
pandas.DataFrame.cummin : Return cumulative minimum over DataFrame axis. | ||
pandas.DataFrame.cumsum : Return cumulative sum over DataFrame axis. | ||
pandas.DataFrame.cumprod : Return cumulative product over DataFrame axis. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you get rid of the |
||
""" | ||

_cummax_examples = """\
Examples
--------
**DataFrame**

Create a DataFrame:
Review comment: We can probably get rid of this, I'm sure users will find out :)

>>> df = pd.DataFrame([[9, 7, 9, 7],
...                    [7, 5, 2, 7],
...                    [3, 5, 2, 2],
...                    [8, 0, 9, 0]],
...                   columns=list('ABCD'))
Review comment: I'd use a much smaller dataset, probably 2 columns and 3 or 4 rows. Smaller numbers would make it easier for users to see what's being added, especially if we reuse the same dataframe for Also, you can add a
>>> df
   A  B  C  D
0  9  7  9  7
1  7  5  2  7
2  3  5  2  2
3  8  0  9  0

axis=None (the default, treated as axis=0): iterates down the rows and keeps
a running maximum for each column, updating it whenever a larger value
appears:

>>> df.cummax(axis=None)
   A  B  C  D
0  9  7  9  7
1  9  7  9  7
2  9  7  9  7
3  9  7  9  7
Review comment: If this docstring is reused, and we want to keep it this way, I think we should add examples for all methods.

axis=1: iterates across the columns and keeps a running maximum for each
row, updating it whenever a larger value appears:

>>> df.cummax(axis=1)
   A  B  C  D
0  9  9  9  9
1  7  7  7  7
2  3  5  5  5
3  8  8  9  9

**Series** | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As with all the functions this is getting very long, I'd probably avoid having examples for If you keep them, personally I'd have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we decide to have separate string examples for each method, we can keep the examples for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
+ 1 Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the suggestion. I agree, it is definitely easier to see what is going on with NaNs if we use a Series example instead of DataFrame. I will change it this way. |
||
Create a Series: | ||
|
||
>>> s = pd.Series([5,0,-5,10,-10]) | ||
>>> s | ||
0 5 | ||
1 0 | ||
2 -5 | ||
3 10 | ||
4 -10 | ||
dtype: int64 | ||
|
||
>>> s.cummax() | ||
0 5 | ||
1 5 | ||
2 5 | ||
3 10 | ||
4 10 | ||
dtype: int64 | ||
""" | ||
|
||
_any_see_also = """\ | ||
|
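Following the review suggestion to show the effect of NaNs with a Series, here is a small sketch (not part of this PR) of how Series.cummax treats a missing value with the default skipna=True compared with skipna=False. The data are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 5.0, -1.0, 0.0])

# skipna=True (default): the NaN stays NaN in the output but does not
# interrupt the running maximum -> 2.0, NaN, 5.0, 5.0, 5.0
print(s.cummax())

# skipna=False: NaN propagates, so every later value is NaN
# -> 2.0, NaN, NaN, NaN, NaN
print(s.cummax(skipna=False))
```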
@@ -8541,11 +8612,11 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1,


 def _make_cum_function(cls, name, name1, name2, axis_descr, desc,
-                       accum_func, accum_func_name, mask_a, mask_b):
+                       accum_func, accum_func_name, mask_a, mask_b, examples):
     @Substitution(outname=name, desc=desc, name1=name1, name2=name2,
-                  axis_descr=axis_descr, accum_func_name=accum_func_name)
-    @Appender("Return {0} over requested axis.".format(desc) +
-              _cnum_doc)
+                  axis_descr=axis_descr, accum_func_name=accum_func_name,
+                  examples=examples)
+    @Appender(_cnum_doc)
Review comment: It's all right like this, but maybe it'd be simpler to leave this as it was, and have the examples in Another option would be to have a different string for each method example; in that case, something similar to this would make more sense.
Review comment: I think having separate string examples for each method makes everything clearer, especially when showing examples for use of The disadvantage is that users will only see examples for the method they're checking, but I think this is OK because we are referencing all methods in the 'See also' section, which comes before 'Examples'. In these PRs #20216 and #20217 examples for
Review comment: Yes, I am also in favor of splitting up the examples.
     def cum_func(self, axis=None, skipna=True, *args, **kwargs):
         skipna = nv.validate_cum_func_with_skipna(skipna, args, kwargs, name)
         if axis is None:
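Passing examples='' for cummin, cumsum and cumprod works because the examples text is just another %-placeholder in the shared template. The sketch below illustrates that mechanism. The decorators substitution and appender are hypothetical, simplified stand-ins that only roughly mirror the roles of pandas' Substitution and Appender: the inner decorator attaches the template, then the outer one fills the %(...)s placeholders. TEMPLATE and CUMMAX_EXAMPLES are likewise made up for illustration.

```python
# Hypothetical, simplified stand-ins for the decorators in the hunk above;
# pandas' real Substitution/Appender classes do more than this.
def substitution(**params):
    """Fill %(name)s placeholders in the decorated function's docstring."""
    def decorate(func):
        if func.__doc__:
            func.__doc__ = func.__doc__ % params
        return func
    return decorate


def appender(addendum):
    """Append text to the decorated function's (possibly empty) docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or '') + addendum
        return func
    return decorate


TEMPLATE = """
Return %(desc)s over a DataFrame or Series axis.
%(examples)s
"""

CUMMAX_EXAMPLES = """\
Examples
--------
>>> pd.Series([5, 0, -5, 10, -10]).cummax().tolist()
[5, 5, 5, 10, 10]
"""


def make_doc_demo(desc, examples):
    # Decorators apply bottom-up: appender attaches the raw template first,
    # then substitution fills in the placeholders.
    @substitution(desc=desc, examples=examples)
    @appender(TEMPLATE)
    def func():
        pass
    return func


print(make_doc_demo("cumulative maximum", CUMMAX_EXAMPLES).__doc__)
print(make_doc_demo("cumulative sum", '').__doc__)  # no Examples section
```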
Review comment: Not sure where axis_descr is defined, but the format is {0 or 'index', 1 or 'columns'} if I'm not wrong. You can check recently merged PRs to be sure.
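The axis format mentioned in the last comment can be checked directly. This quick sketch reuses the DataFrame from the _cummax_examples docstring. It confirms that the integer and string axis names select the same direction, and that the default (axis=None) behaves like axis 0.

```python
import pandas as pd

# Same data as the DataFrame example in _cummax_examples.
df = pd.DataFrame([[9, 7, 9, 7],
                   [7, 5, 2, 7],
                   [3, 5, 2, 2],
                   [8, 0, 9, 0]],
                  columns=list('ABCD'))

# 0, 'index' and the default (axis=None) all accumulate down each column.
print(df.cummax().equals(df.cummax(axis='index')))          # True
# 1 and 'columns' both accumulate across each row.
print(df.cummax(axis=1).equals(df.cummax(axis='columns')))  # True
```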