DOC: update the pandas.core.resample.Resampler.fillna docstring #20379

prcastro · 2018-03-16T14:38:10Z

PR title is "DOC: update the pandas.core.resample.Resampler.fillna docstring"
The validation script passes: scripts/validate_docstrings.py pandas.core.resample.Resampler.fillna
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single pandas.core.resample.Resampler.fillna
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
############## Docstring (pandas.core.resample.Resampler.fillna)  ##############
################################################################################

Fill the new missing values in the resampled data using different
methods.

In statistics, imputation is the process of replacing missing data with
substituted values [1]_. When resampling data, missing values may
appear (e.g., when the resampling frequency is higher than the original
frequency).

The backward fill ('bfill') will replace NaN values that appeared in
the resampled data with the next value in the original sequence. The
forward fill ('ffill'), on the other hand, will replace NaN values
that appeared in the resampled data with the previous value in the
original sequence. Missing values that existed in the original data will
not be modified.

Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
    Method to use for filling holes in resampled data
        * ffill: use previous valid observation to fill gap (forward
          fill).
        * bfill: use next valid observation to fill gap (backward
          fill).
limit : integer, optional
    Limit of how many values to fill.

Returns
-------
Series, DataFrame
    An upsampled Series or DataFrame with backward or forwards filled
    NaN values.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
...               index=pd.date_range('20180101', periods=3, freq='h'))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('30min').fillna("bfill")
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

>>> s.resample('15min').fillna("bfill", limit=2)
2018-01-01 00:00:00    1.0
2018-01-01 00:15:00    NaN
2018-01-01 00:30:00    2.0
2018-01-01 00:45:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:15:00    NaN
2018-01-01 01:30:00    3.0
2018-01-01 01:45:00    3.0
2018-01-01 02:00:00    3.0
Freq: 15T, dtype: float64

>>> s.resample('30min').fillna("ffill")
2018-01-01 00:00:00    1
2018-01-01 00:30:00    1
2018-01-01 01:00:00    2
2018-01-01 01:30:00    2
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5

>>> df.resample('30min').fillna("bfill")
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  6.0  5
2018-01-01 02:00:00  6.0  5

>>> df.resample('15min').fillna("bfill", limit=2)
                       a    b
2018-01-01 00:00:00  2.0  1.0
2018-01-01 00:15:00  NaN  NaN
2018-01-01 00:30:00  NaN  3.0
2018-01-01 00:45:00  NaN  3.0
2018-01-01 01:00:00  NaN  3.0
2018-01-01 01:15:00  NaN  NaN
2018-01-01 01:30:00  6.0  5.0
2018-01-01 01:45:00  6.0  5.0
2018-01-01 02:00:00  6.0  5.0

>>> df.resample('30min').fillna("ffill")
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  NaN  3
2018-01-01 02:00:00  6.0  5

See Also
--------
backfill : Backward fill NaN values in the resampled data.
pad : Forward fill NaN values in the resampled data.
bfill : Alias of backfill.
ffill : Alias of pad.
nearest : Fill NaN values in the resampled data
    with nearest neighbor starting from center.
pandas.Series.fillna : Fill NaN values in the Series using the
    specified method, which can be 'bfill' and 'ffill'.
pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the
    specified method, which can be 'bfill' and 'ffill'.

References
----------
.. [1] https://en.wikipedia.org/wiki/Imputation_(statistics)

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.resample.Resampler.fillna" correct. :)

jorisvandenbossche

Thanks for the PR!

Added some inline comments. Further:

could you add some more explanation between the examples?

jorisvandenbossche · 2018-03-16T15:47:31Z

pandas/core/resample.py

+        forward fill ('ffill'), on the other hand, will replace NaN values
+        that appeared in the resampled data with the previous value in the
+        original sequence. Missing values that existed in the original data will
+        not be modified.

        Parameters
        ----------
        method : str, method of resampling ('ffill', 'bfill')


Can you change the type description to method : {'ffill', 'bfill'} ? ("method of resampling" belongs on the next line, and is already there)

jorisvandenbossche · 2018-03-16T15:48:09Z

pandas/core/resample.py


        Parameters
        ----------
        method : str, method of resampling ('ffill', 'bfill')
+            Method to use for filling holes in resampled data
+                * ffill: use previous valid observation to fill gap (forward


No indentation is needed here (compared to the "Method ..." on the line above), but Sphinx needs a blank line between those two lines.
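As a sketch of the layout described in this comment, the fragment below keeps the bullet list at the same indentation as the description and separates the two with a blank line. The function name `fillna_layout_example` is hypothetical and the body is a stub; only the docstring layout matters here.

```python
def fillna_layout_example(method, limit=None):
    """
    Fill missing values introduced by upsampling.

    Parameters
    ----------
    method : {'ffill', 'bfill'}
        Method to use for filling holes in resampled data

        * 'ffill': use previous valid observation to fill gap (forward
          fill).
        * 'bfill': use next valid observation to fill gap (backward
          fill).
    limit : integer, optional
        Limit of how many values to fill.
    """
    # Hypothetical stub: the real method lives on pandas' Resampler.
    raise NotImplementedError
```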

jorisvandenbossche · 2018-03-16T15:49:20Z

pandas/core/resample.py

+        2018-01-01 02:00:00    3
+        Freq: H, dtype: int64
+
+        >>> s.resample('30min').fillna("bfill")


I would maybe first show what it does without filling (which is s.resample().asfreq()). Of course this is another method, but it will then be easier to see which values have actually been filled by fillna().
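A minimal sketch of that comparison, reusing the Series from the docstring. It assumes the installed pandas provides `Resampler.asfreq()` and `Resampler.bfill()`; `bfill()` is called directly here as the backward-fill method listed under See Also, and the variable names are illustrative only.

```python
import pandas as pd

# Same hourly Series as in the docstring examples.
s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample to 30 minutes without filling: the new slots are NaN.
upsampled = s.resample('30min').asfreq()

# Upsample and backward fill: each new slot takes the next original value.
filled = s.resample('30min').bfill()

# Side-by-side view makes it easy to see which rows were filled.
comparison = pd.DataFrame({'asfreq': upsampled, 'bfill': filled})
print(comparison)
```

Rows where the `asfreq` column is NaN are exactly the ones introduced by upsampling, so they are the only rows where the two columns differ.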

TomAugspurger · 2018-03-16T15:46:21Z

pandas/core/resample.py

@@ -624,18 +624,134 @@ def backfill(self, limit=None):

    def fillna(self, method, limit=None):
        """
-        Fill missing values
+        Fill the new missing values in the resampled data using different


This should be a single line. Does

Fill missing values introduced by upsampling.

sound good?

TomAugspurger · 2018-03-16T15:47:14Z

pandas/core/resample.py

+        appear (e.g., when the resampling frequency is higher than the original
+        frequency).
+
+        The backward fill ('bfill') will replace NaN values that appeared in


Aside from the last sentence, this can be folded into the Parameters section.

TomAugspurger · 2018-03-16T15:49:50Z

pandas/core/resample.py

+        forward fill ('ffill'), on the other hand, will replace NaN values
+        that appeared in the resampled data with the previous value in the
+        original sequence. Missing values that existed in the original data will
+        not be modified.

        Parameters
        ----------
        method : str, method of resampling ('ffill', 'bfill')


method : {'ffill', 'pad', 'bfill', 'backfill', 'nearest'}

Note that ffill is an alias for pad and bfill is an alias for backfill.

Can you check that 'nearest' works as expected?

And move the descriptions from above here.
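One way to check the 'nearest' behaviour is a sketch like the following. It calls `Resampler.nearest()` directly rather than going through `fillna`, and it uses a 20-minute target frequency (an assumption made for this sketch) so that no new timestamp is equidistant from two original observations.

```python
import pandas as pd

# Hourly Series as in the docstring examples.
s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample to 20 minutes; each new slot takes the closest original value.
# 00:20 is closer to 00:00, while 00:40 is closer to 01:00, and so on.
nearest_filled = s.resample('20min').nearest()
print(nearest_filled)
```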

TomAugspurger · 2018-03-16T15:50:53Z

pandas/core/resample.py


        Parameters
        ----------
        method : str, method of resampling ('ffill', 'bfill')
+            Method to use for filling holes in resampled data
+                * ffill: use previous valid observation to fill gap (forward


Quote these, since they're strings.

TomAugspurger · 2018-03-16T15:51:14Z

pandas/core/resample.py

        limit : integer, optional
-            limit of how many values to fill
+            Limit of how many values to fill.


Say consecutive values to fill
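To illustrate why "consecutive" is the right word, the sketch below upsamples the docstring's Series to 15 minutes, which opens three new slots per hour, and backward fills with `limit=2`. It assumes `Resampler.bfill()` accepts a `limit` argument, as the `backfill(self, limit=None)` signature in the diff suggests.

```python
import pandas as pd

# Hourly Series as in the docstring examples.
s = pd.Series([1, 2, 3],
              index=pd.date_range('20180101', periods=3, freq='h'))

# Each hour gains three new 15-minute slots. With limit=2 only the two
# slots nearest the next observation are filled; the third stays NaN.
limited = s.resample('15min').bfill(limit=2)

# Timestamps that remain unfilled.
unfilled = limited.index[limited.isna()]
print(limited)
print(unfilled)
```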

TomAugspurger · 2018-03-16T15:51:28Z

pandas/core/resample.py

+
+        Returns
+        -------
+        Series, DataFrame


I think Series or DataFrame, can't recall.

TomAugspurger · 2018-03-16T15:52:36Z

pandas/core/resample.py

+        backfill : Backward fill NaN values in the resampled data.
+        pad : Forward fill NaN values in the resampled data.
+        bfill : Alias of backfill.
+        ffill: Alias of pad.


Can remove these aliases I think, since they go to the same page.

jorisvandenbossche · 2018-03-16T15:52:06Z

pandas/core/resample.py

@@ -624,18 +624,134 @@ def backfill(self, limit=None):

    def fillna(self, method, limit=None):
        """
-        Fill missing values
+        Fill the new missing values in the resampled data using different
+        methods.


Can you try to get this on a single line?

jorisvandenbossche · 2018-03-16T15:53:50Z

pandas/core/resample.py

+        >>> df
+                               a  b
+        2018-01-01 00:00:00  2.0  1
+        2018-01-01 01:00:00  NaN  3


I would add such an example with a missing value above for Series as well (or instead of this example).
I think using a Series will make it easier to understand and easier to focus on that specific behaviour.

In the end, we can limit the number of examples for DataFrame and basically say that for a DataFrame everything works similar as for Series column-by-column
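A Series version of the missing-value example could look like the sketch below. The Series `s_gap` is made up for illustration, and `Resampler.ffill()` is called directly as the forward-fill method.

```python
import numpy as np
import pandas as pd

# Hourly Series whose middle observation is already missing.
s_gap = pd.Series([1, np.nan, 3],
                  index=pd.date_range('20180101', periods=3, freq='h'))

# Forward fill after upsampling to 30 minutes. The new 00:30 slot takes
# the value from 00:00. The NaN at 01:00 was in the original data, so it
# is left alone, and the new 01:30 slot inherits that NaN.
ffilled = s_gap.resample('30min').ffill()
print(ffilled)
```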

TomAugspurger · 2018-03-16T16:22:03Z

The nice thing about the DataFrame example for `limit` is that you can show side-by-side how only newly-introduced missing values are filled.
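The side-by-side effect can be sketched with the DataFrame from the docstring. Column `a` has a missing value in the original data and column `b` does not, so the two columns show the difference directly. `Resampler.bfill(limit=2)` is used here as the direct backward-fill call.

```python
import numpy as np
import pandas as pd

# DataFrame from the docstring: 'a' has an original NaN, 'b' is complete.
df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
                  index=pd.date_range('20180101', periods=3, freq='h'))

# Upsample to 15 minutes and backward fill at most two consecutive slots.
limited_df = df.resample('15min').bfill(limit=2)

# Count the NaN values left in each column after filling.
remaining = limited_df.isna().sum()
print(limited_df)
print(remaining)
```

In column `b` the only NaN values left are the slots beyond the limit, while column `a` also keeps the NaN that came from the original 01:00 observation.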


codecov · 2018-03-16T17:02:15Z

Codecov Report

Merging #20379 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20379      +/-   ##
==========================================
- Coverage   91.79%   91.77%   -0.03%     
==========================================
  Files         152      152              
  Lines       49184    49184              
==========================================
- Hits        45150    45138      -12     
- Misses       4034     4046      +12

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.15% <ø> (-0.03%)` | ⬇️ |
| #single | `41.83% <ø> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/resample.py | `96.43% <ø> (ø)` | ⬆️ |
| pandas/plotting/_converter.py | `65.07% <0%> (-1.74%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a58303...39e69ba. Read the comment docs.

prcastro · 2018-03-16T17:03:17Z

Made the requested changes, also adding a little more info between examples.

[ci skip]

TomAugspurger · 2018-03-17T19:49:33Z

Moved the See Also up. Thanks @prcastro .

DOC: update the pandas.core.resample.Resampler.fillna docstring

162873d

jorisvandenbossche added the Docs label Mar 16, 2018

DOC: make suggested corrections and added more useful examples

7160e0d

Updates [ci skip]

39e69ba

[ci skip]

TomAugspurger merged commit 670c2e4 into pandas-dev:master Mar 17, 2018

TomAugspurger added this to the 0.23.0 milestone Mar 17, 2018
