DOC: update the pandas.core.resample.Resampler.fillna docstring #20379

Merged
merged 3 commits into from
Mar 17, 2018

prcastro
Contributor

@prcastro prcastro commented Mar 16, 2018

  • PR title is "DOC: update the pandas.core.resample.Resampler.fillna docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.core.resample.Resampler.fillna
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single pandas.core.resample.Resampler.fillna
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
############## Docstring (pandas.core.resample.Resampler.fillna)  ##############
################################################################################

Fill the new missing values in the resampled data using different
methods.

In statistics, imputation is the process of replacing missing data with
substituted values [1]_. When resampling data, missing values may
appear (e.g., when the resampling frequency is higher than the original
frequency).

The backward fill ('bfill') will replace NaN values that appeared in
the resampled data with the next value in the original sequence. The
forward fill ('ffill'), on the other hand, will replace NaN values
that appeared in the resampled data with the previous value in the
original sequence. Missing values that existed in the original data will
not be modified.

Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
    Method to use for filling holes in resampled data
        * ffill: use previous valid observation to fill gap (forward
          fill).
        * bfill: use next valid observation to fill gap (backward
          fill).
limit : integer, optional
    Limit of how many values to fill.

Returns
-------
Series, DataFrame
    An upsampled Series or DataFrame with backward or forward filled
    NaN values.

Examples
--------

Resampling a Series:

>>> s = pd.Series([1, 2, 3],
...               index=pd.date_range('20180101', periods=3, freq='h'))
>>> s
2018-01-01 00:00:00    1
2018-01-01 01:00:00    2
2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('30min').fillna("bfill")
2018-01-01 00:00:00    1
2018-01-01 00:30:00    2
2018-01-01 01:00:00    2
2018-01-01 01:30:00    3
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

>>> s.resample('15min').fillna("bfill", limit=2)
2018-01-01 00:00:00    1.0
2018-01-01 00:15:00    NaN
2018-01-01 00:30:00    2.0
2018-01-01 00:45:00    2.0
2018-01-01 01:00:00    2.0
2018-01-01 01:15:00    NaN
2018-01-01 01:30:00    3.0
2018-01-01 01:45:00    3.0
2018-01-01 02:00:00    3.0
Freq: 15T, dtype: float64

>>> s.resample('30min').fillna("ffill")
2018-01-01 00:00:00    1
2018-01-01 00:30:00    1
2018-01-01 01:00:00    2
2018-01-01 01:30:00    2
2018-01-01 02:00:00    3
Freq: 30T, dtype: int64

Resampling a DataFrame that has missing values:

>>> df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]},
...                   index=pd.date_range('20180101', periods=3,
...                                       freq='h'))
>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 02:00:00  6.0  5

>>> df.resample('30min').fillna("bfill")
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  NaN  3
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  6.0  5
2018-01-01 02:00:00  6.0  5

>>> df.resample('15min').fillna("bfill", limit=2)
                       a    b
2018-01-01 00:00:00  2.0  1.0
2018-01-01 00:15:00  NaN  NaN
2018-01-01 00:30:00  NaN  3.0
2018-01-01 00:45:00  NaN  3.0
2018-01-01 01:00:00  NaN  3.0
2018-01-01 01:15:00  NaN  NaN
2018-01-01 01:30:00  6.0  5.0
2018-01-01 01:45:00  6.0  5.0
2018-01-01 02:00:00  6.0  5.0

>>> df.resample('30min').fillna("ffill")
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 00:30:00  2.0  1
2018-01-01 01:00:00  NaN  3
2018-01-01 01:30:00  NaN  3
2018-01-01 02:00:00  6.0  5

See Also
--------
backfill : Backward fill NaN values in the resampled data.
pad : Forward fill NaN values in the resampled data.
bfill : Alias of backfill.
ffill : Alias of pad.
nearest : Fill NaN values in the resampled data
    with nearest neighbor starting from center.
pandas.Series.fillna : Fill NaN values in the Series using the
    specified method, which can be 'bfill' or 'ffill'.
pandas.DataFrame.fillna : Fill NaN values in the DataFrame using the
    specified method, which can be 'bfill' or 'ffill'.

References
----------
.. [1] https://en.wikipedia.org/wiki/Imputation_(statistics)

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.resample.Resampler.fillna" correct. :)
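A quick way to double-check the Series examples in the docstring above is to reproduce them. This is a sketch, assuming a reasonably recent pandas install; it calls the dedicated `bfill()`/`ffill()` resampler methods (listed under See Also) instead of `fillna("bfill")`/`fillna("ffill")`, on the assumption that they behave the same, and uses the lowercase aliases `'h'`, `'30min'` and `'15min'`.

```python
import pandas as pd

# Hourly Series from the docstring examples.
s = pd.Series([1, 2, 3],
              index=pd.date_range('2018-01-01', periods=3, freq='h'))

# Upsampling to 30 minutes adds one new slot between each pair of points.
backward = s.resample('30min').bfill()  # new slot takes the next original value
forward = s.resample('30min').ffill()   # new slot takes the previous original value

# With 15-minute slots each gap has three new slots; limit=2 fills only the
# two closest to the next observation, so one slot per gap stays NaN.
limited = s.resample('15min').bfill(limit=2)

print(backward.tolist())          # [1, 2, 2, 3, 3]
print(forward.tolist())           # [1, 1, 2, 2, 3]
print(int(limited.isna().sum()))  # 2
```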

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for the PR!

Added some inline comments. Further:

  • could you add some more explanation between the examples?

forward fill ('ffill'), on the other hand, will replace NaN values
that appeared in the resampled data with the previous value in the
original sequence. Missing values that existed in the original data will
not be modified.

Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
Member

Can you change the type description to method : {'ffill', 'bfill'} ? ("method of resampling" belongs on the next line, and is already there)


Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
Method to use for filling holes in resampled data
* ffill: use previous valid observation to fill gap (forward
Member

No indentation is needed here (compared to the "Method ..." on the line above), but, sphinx needs a blank line between those two lines

Contributor Author

Done
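For reference, a Parameters block that applies the two comments above on the `method` entry (a set-style type description and a blank line before the bullet list) could look like the sketch below. The function name `fillna_sketch` is hypothetical and the wording is illustrative, not the merged docstring; the script only checks that the blank line survives `inspect.getdoc`, which strips the common indentation.

```python
import inspect


def fillna_sketch(method, limit=None):
    """
    Fill missing values introduced by upsampling.

    Parameters
    ----------
    method : {'ffill', 'bfill'}
        Method to use for filling holes in resampled data.

        * 'ffill': use previous valid observation to fill gap (forward
          fill).
        * 'bfill': use next valid observation to fill gap (backward
          fill).
    limit : int, optional
        Limit of how many consecutive values to fill.
    """


# Dedented docstring lines, as returned by inspect.getdoc.
lines = inspect.getdoc(fillna_sketch).splitlines()
i = next(n for n, line in enumerate(lines) if 'Method to use' in line)
print(repr(lines[i + 1]))  # '' (the blank line before the bullet list)
```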

2018-01-01 02:00:00    3
Freq: H, dtype: int64

>>> s.resample('30min').fillna("bfill")
Member

I would maybe first show what it does without filling (which is s.resample().asfreq()), of course this is another method, but it will then be easier to see which values have actually been filled by fillna()
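The suggestion above can be sketched as follows, assuming the same hourly Series `s` as in the docstring: `asfreq()` shows the upsampled index with the new slots still empty, which makes it easy to see which values a fill method actually changes. The sketch calls `bfill()` directly rather than `fillna("bfill")`, assuming the two are equivalent.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('2018-01-01', periods=3, freq='h'))

# asfreq() only reindexes: the new 30-minute slots are left as NaN.
raw = s.resample('30min').asfreq()
# bfill() fills exactly those new slots with the next original value.
filled = s.resample('30min').bfill()

new_slots = raw.isna()
print(raw[new_slots].index.strftime('%H:%M').tolist())  # ['00:30', '01:30']
print(filled[new_slots].tolist())                       # [2, 3]
```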

@@ -624,18 +624,134 @@ def backfill(self, limit=None):

def fillna(self, method, limit=None):
"""
Fill missing values
Fill the new missing values in the resampled data using different
Contributor

This should be a single line. Does

Fill missing values introduced by upsampling.

sound good?

appear (e.g., when the resampling frequency is higher than the original
frequency).

The backward fill ('bfill') will replace NaN values that appeared in
Contributor

Aside from the last sentence, this can be folded into the Parameters section.

forward fill ('ffill'), on the other hand, will replace NaN values
that appeared in the resampled data with the previous value in the
original sequence. Missing values that existed in the original data will
not be modified.

Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
Contributor

method : {'ffill', 'pad', 'bfill', 'backfill', 'nearest'}

Note that ffill is an alias for pad and bfill is an alias for backfill.

Can you check that 'nearest' works as expected?

And move the descriptions from above here.
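One way to check that 'nearest' works as expected is to upsample to a frequency where no new slot is equidistant from two observations, so the expected result is unambiguous. A sketch, assuming the same hourly Series as in the docstring; it calls the `nearest()` resampler method (listed under See Also) and uses 20-minute slots specifically to avoid ties.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('2018-01-01', periods=3, freq='h'))

# 20-minute slots: 00:20 is closer to 00:00, 00:40 is closer to 01:00, etc.
nearest = s.resample('20min').nearest()

print(nearest.index.strftime('%H:%M').tolist())
# ['00:00', '00:20', '00:40', '01:00', '01:20', '01:40', '02:00']
print(nearest.tolist())  # [1, 1, 2, 2, 2, 3, 3]
```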


Parameters
----------
method : str, method of resampling ('ffill', 'bfill')
Method to use for filling holes in resampled data
* ffill: use previous valid observation to fill gap (forward
Contributor

Quote these, since they're strings.

limit : integer, optional
limit of how many values to fill
Limit of how many values to fill.
Contributor

Say "consecutive values to fill".
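To illustrate why "consecutive" is the right word: `limit` caps how many new slots are filled within each gap, not the total across the result. A sketch with forward fill and `limit=1`, assuming the same hourly Series as in the docstring; `ffill(limit=...)` is used in place of `fillna('ffill', limit=...)`, assuming they are equivalent.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.date_range('2018-01-01', periods=3, freq='h'))

# Each hourly gap gets three new 15-minute slots; limit=1 fills only the
# first slot after each observation, and it does so in every gap.
limited = s.resample('15min').ffill(limit=1)

filled_times = limited.dropna().index.strftime('%H:%M').tolist()
print(filled_times)               # ['00:00', '00:15', '01:00', '01:15', '02:00']
print(int(limited.isna().sum()))  # 4
```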


Returns
-------
Series, DataFrame
Contributor

I think Series or DataFrame, can't recall.
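A quick check of the return type, as a sketch: the fill returns the same kind of object it was called on, a Series for a Series and a DataFrame for a DataFrame. It rebuilds the docstring's `s` and `df` and uses `ffill()` in place of `fillna('ffill')`, assuming the two are equivalent.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-01-01', periods=3, freq='h')
s = pd.Series([1, 2, 3], index=idx)
df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]}, index=idx)

series_result = s.resample('30min').ffill()
frame_result = df.resample('30min').ffill()

print(type(series_result).__name__)  # Series
print(type(frame_result).__name__)   # DataFrame
```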

backfill : Backward fill NaN values in the resampled data.
pad : Forward fill NaN values in the resampled data.
bfill : Alias of backfill.
ffill : Alias of pad.
Contributor

Can remove these aliases I think, since they go to the same page.

@@ -624,18 +624,134 @@ def backfill(self, limit=None):

def fillna(self, method, limit=None):
"""
Fill missing values
Fill the new missing values in the resampled data using different
methods.
Member

Can you try to get this on a single line?

>>> df
                       a  b
2018-01-01 00:00:00  2.0  1
2018-01-01 01:00:00  NaN  3
Member

I would add such an example with a missing value above for Series as well (or instead of this example).
I think using a Series will make it easier to understand and easier to focus on that specific behaviour.

In the end, we can limit the number of examples for DataFrame and basically say that for a DataFrame everything works the same as for a Series, column by column.

Contributor Author

Done
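Following the suggestion to show the missing-value case on a Series first, here is a sketch of that behaviour and of the column-by-column claim. It assumes column `a` of the docstring's DataFrame as the Series and uses `bfill()` in place of `fillna('bfill')`, assuming the two are equivalent.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-01-01', periods=3, freq='h')
s = pd.Series([2, np.nan, 6], index=idx, name='a')
df = pd.DataFrame({'a': [2, np.nan, 6], 'b': [1, 3, 5]}, index=idx)

filled = s.resample('30min').bfill()
# The NaN at 01:00 existed in the original data, so it is kept; the new
# 00:30 slot copies that NaN because it is the next original value.
print(filled.tolist())  # [2.0, nan, nan, 6.0, 6.0]

# For a DataFrame, each column is filled exactly as its own Series would be.
frame = df.resample('30min').bfill()
print(frame['a'].equals(filled))  # True
```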

@TomAugspurger
Contributor

TomAugspurger commented Mar 16, 2018 via email

@codecov

codecov bot commented Mar 16, 2018

Codecov Report

Merging #20379 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20379      +/-   ##
==========================================
- Coverage   91.79%   91.77%   -0.03%     
==========================================
  Files         152      152              
  Lines       49184    49184              
==========================================
- Hits        45150    45138      -12     
- Misses       4034     4046      +12
Flag Coverage Δ
#multiple 90.15% <ø> (-0.03%) ⬇️
#single 41.83% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/resample.py 96.43% <ø> (ø) ⬆️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a58303...39e69ba.

@prcastro
Contributor Author

Made the requested changes, also adding a little more info between examples.

@TomAugspurger
Contributor

TomAugspurger commented Mar 17, 2018

Moved the See Also up. Thanks @prcastro .

@TomAugspurger TomAugspurger merged commit 670c2e4 into pandas-dev:master Mar 17, 2018
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 17, 2018
nehiljain added a commit to nehiljain/pandas that referenced this pull request Mar 21, 2018
…ame_describe

* upstream/master: (158 commits)
  Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431)
  BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399)
  BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412)
  DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264)
  DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155)
  DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402)
  Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408)
  DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392)
  MAINT: Remove weird pd file
  DOC: update the Index.isin docstring (pandas-dev#20249)
  BUG: Handle all-NA blocks in concat (pandas-dev#20382)
  DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379)
  BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067)
  DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336)
  DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265)
  DOC: update the api.types.is_number docstring (pandas-dev#20196)
  Fix linter (pandas-dev#20389)
  DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142)
  DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181)
  DOC: update the window.Rolling.min docstring (pandas-dev#20263)
  ...

3 participants