DOC: update the Index.isin docstring #20249

Merged


noemielteto
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
######################## Docstring (pandas.Index.isin)  ########################
################################################################################

Boolean array on existence of index values in `values`.

Compute boolean array of whether each index value is found in the
passed set of values. Length of the returned boolean array matches
the length of the index.

Parameters
----------
values : set or list-like
    Sought values.

    .. versionadded:: 0.18.1

    Support for values as a set.

level : str or int, optional in the case of Index, compulsory on
    MultiIndex
    Name or position of the index level to use (if the index is a
    MultiIndex).

Returns
-------
is_contained : ndarray (boolean dtype)

See also
--------
DatetimeIndex.isin : an Index of :class:`Datetime` s
TimedeltaIndex : an Index of :class:`Timedelta` s
PeriodIndex : an Index of :class:`Period` s
MultiIndex.isin : Same for `MultiIndex`
NumericIndex.isin : Same for `Int64Index`, `UInt64Index`,
                    `Float64Index`

Notes
-----
If `level` is specified:

- if it is the name of one *and only one* index level, use that level;
- otherwise it should be a number indicating level position.

Examples
--------
>>> idx = pd.Index([1,2,3])
>>> idx
Int64Index([1, 2, 3], dtype='int64')

Check whether a value is in the Index:
>>> idx.isin([1])
array([ True, False, False])

>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
...                                    ['red','blue','green']],
...                                    names=('number', 'color'))
>>> midx
MultiIndex(levels=[[1, 2, 3], ['blue', 'green', 'red']], labels=[[0, 1, 2], [2, 0, 1]], names=['number', 'color'])

Check whether a string index value is in the 'color' level of the
MultiIndex:

>>> midx.isin(['red'],'color')
array([ True, False, False])

>>> dates = ['3/11/2000', '3/12/2000', '3/13/2000']
>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'], dtype='datetime64[ns]', freq=None)

Check whether a datetime index value is in the DatetimeIndex:

>>> dti.isin(['3/11/2000'])
array([ True, False, False])

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Index.isin" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
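The DatetimeIndex example in the docstring above passes a date string ('3/11/2000') to `isin` and relies on pandas converting it. Below is a minimal sketch of the same check with the lookup values built as `Timestamp` objects. The comparison is then datetime-to-datetime and does not depend on how `isin` coerces strings, which may vary between pandas versions. The ISO date strings are an assumption chosen to avoid month/day ambiguity.

```python
import pandas as pd

# ISO-format strings avoid any month/day ambiguity when parsing.
dti = pd.to_datetime(["2000-03-11", "2000-03-12", "2000-03-13"])

# Build the lookup values as Timestamps so isin compares datetimes directly
# instead of relying on the strings being converted for us.
mask = dti.isin([pd.Timestamp("2000-03-11")])

# Another DatetimeIndex works as the collection of values, too.
last_day = dti.isin(pd.to_datetime(["2000-03-13"]))
```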

Contributor

@TomAugspurger TomAugspurger left a comment


Thanks! A few comments.

@@ -3112,8 +3112,11 @@ def map(self, mapper, na_action=None):

def isin(self, values, level=None):
"""
Boolean array on existence of index values in `values`.
Contributor

Does

Return a boolean array where the index values are in `values`.

fit on a line?

Compute boolean array of whether each index value is found in the
- passed set of values.
+ passed set of values. Length of the returned boolean array matches
Contributor

@TomAugspurger TomAugspurger Mar 10, 2018

"Length" -> "The length"


- level : str or int, optional
+ level : str or int, optional in the case of Index, compulsory on
Contributor

Not quite: it's always optional, but we don't handle it properly; see #20252.
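A minimal sketch of that point, assuming a recent pandas release (the 0.23-era behaviour tracked in #20252 may differ). On a flat `Index`, `level` can be omitted. If it is given, as position 0 or as the index name, pandas only validates it, so the result is unchanged.

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name="number")

# level omitted: every label is checked against the given values.
plain = idx.isin([1, 3])

# On a flat Index, level may still be passed as 0 or as the index name.
# pandas only validates it; the membership test itself is the same.
by_position = idx.isin([1, 3], level=0)
by_name = idx.isin([1, 3], level="number")

# A level that doesn't exist on a single-level Index is rejected.
try:
    idx.isin([1, 3], level=1)
except IndexError as err:
    print("rejected:", err)
```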

array([ True, False, False])

>>> midx = pd.MultiIndex.from_arrays([[1,2,3],
... ['red','blue','green']],
Contributor

PEP8: align these.

... ['red','blue','green']],
... names=('number', 'color'))
>>> midx
MultiIndex(levels=[[1, 2, 3], ['blue', 'green', 'red']],\
Contributor

Don't worry about the trailing backslashes here. Just line things up.

Check whether a string index value is in the 'color' level of the
MultiIndex:

>>> midx.isin(['red'],'color')
Contributor

PEP8: space after comma

Use an explicit keyword argument, i.e. level='color'.

Contributor

Could you add an example with no level, e.g. midx.isin([(1, 'red'), (3, 'red')])?
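The two MultiIndex forms discussed in this review can be sketched as follows, rebuilding the `midx` from the docstring and assuming a recent pandas release. Without `level`, `values` holds full tuples (one element per level). With an explicit `level=` keyword, `values` is matched against that single level only.

```python
import pandas as pd

midx = pd.MultiIndex.from_arrays(
    [[1, 2, 3], ["red", "blue", "green"]], names=("number", "color")
)

# No level: each entry in values is a complete (number, color) tuple.
# Only (1, "red") exists in midx; (3, "red") does not.
by_tuple = midx.isin([(1, "red"), (3, "red")])

# Explicit level keyword: values are compared against one level only.
by_color = midx.isin(["red"], level="color")
by_number = midx.isin([1, 3], level="number")
```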

>>> dti = pd.to_datetime(dates)
>>> dti
DatetimeIndex(['2000-03-11', '2000-03-12', '2000-03-13'],\
dtype='datetime64[ns]', freq=None)
Contributor

formatting

See also
--------
DatetimeIndex.isin : an Index of :class:`Datetime` s
TimedeltaIndex : an Index of :class:`Timedelta` s
Contributor

You don't need any of these. However, please add Series.isin.
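As context for the `Series.isin` cross-reference, here is a short sketch of how the two methods relate, assuming a recent pandas release. Both perform the same membership test, but `Index.isin` checks index labels and returns a NumPy array, while `Series.isin` checks the Series values and returns a boolean Series aligned to the original index.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3])
s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Index.isin: a plain NumPy boolean array, one entry per label.
idx_mask = idx.isin([1, 3])

# Series.isin: a boolean Series that keeps s.index,
# so it can be used directly to filter the Series.
s_mask = s.isin([1, 3])
selected = s[s_mask]
```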

@jreback jreback added Docs Indexing Related to indexing on series/frames, not to indexes themselves labels Mar 10, 2018
@jorisvandenbossche
Member

@noemielteto Do you have time to update the PR based on the feedback?

@noemielteto
Contributor Author

@jorisvandenbossche Yes, I am going to update it today.

@noemielteto noemielteto force-pushed the updating_Index.isin_docstring branch from bf6307b to b18d848 Compare March 18, 2018 01:18
@codecov

codecov bot commented Mar 18, 2018

Codecov Report

Merging #20249 into master will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #20249      +/-   ##
==========================================
- Coverage   91.79%   91.77%   -0.03%     
==========================================
  Files         152      152              
  Lines       49205    49205              
==========================================
- Hits        45169    45159      -10     
- Misses       4036     4046      +10
Flag         Coverage Δ
#multiple    90.16% <ø> (-0.03%) ⬇️
#single      41.85% <ø> (ø) ⬆️

Impacted Files                   Coverage Δ
pandas/core/indexes/base.py      96.68% <ø> (ø) ⬆️
pandas/plotting/_converter.py    65.07% <0%> (-1.74%) ⬇️
pandas/util/testing.py           83.95% <0%> (+0.2%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 670c2e4...25d97ec.

@TomAugspurger
Contributor

Rephrased the examples slightly. Instead of saying "check whether a string is in the index", we say "check whether each value in the index is in a list of strings." Subtle difference, but "string in index" is more like foo in Index or Index.contains(foo).

Thanks @noemielteto!
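The distinction drawn in this comment can be sketched as follows, assuming a recent pandas release. `in` asks whether one value is among the index labels and gives a single bool. `isin` tests every label against a collection and gives one bool per label. The sketch uses only the `in` operator because `Index.contains` was deprecated and later removed in newer pandas versions.

```python
import pandas as pd

idx = pd.Index(["red", "blue", "green"])

# Scalar membership: is "red" one of the index labels? One bool.
has_red = "red" in idx

# Elementwise: is each label in the given list? One bool per label.
mask = idx.isin(["red", "purple"])
```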

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 18, 2018
@TomAugspurger TomAugspurger merged commit 4a43815 into pandas-dev:master Mar 18, 2018
@noemielteto noemielteto deleted the updating_Index.isin_docstring branch March 18, 2018 21:44
@noemielteto
Contributor Author

@TomAugspurger Totally agree, thanks!

@emesterhazy

@TomAugspurger Just wondering: what is this 15 MB file that got merged in with the pull request?

https://github.com/pandas-dev/pandas/blob/master/pd

gfyoung added a commit to forking-repos/pandas that referenced this pull request Mar 19, 2018
@gfyoung
Member

gfyoung commented Mar 19, 2018

@emesterhazy : That's a file that no longer exists 😄 (it should never have been there, not sure why it wasn't caught before merging).

@TomAugspurger
Contributor

Hmm.

@gfyoung did 7d5e653 remove it from the git history?

@TomAugspurger
Contributor

I believe it's still present.

cc @jreback @jorisvandenbossche.

We have

* 81e0f6908 - (HEAD -> master, origin/master, origin/HEAD) Bug: Allow np.timedelta64 objects to index TimedeltaIndex (#20408) (53 minutes ago) <Matthew Roeschke>
* 0722af8b1 - DOC: add disallowing of Series construction of len-1 list with index to whatsnew (#20392) (59 minutes ago) <Joris Van den Bossche>
* 7d5e65343 - MAINT: Remove weird pd file (6 hours ago) <gfyoung>
* 4a43815d9 - DOC: update the Index.isin docstring (#20249) (13 hours ago) <Noémi Éltető>

Merge end (newest on top)

The size of a fresh clone:

$ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 145618
packs: 1
size-pack: 120.35 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

Cleanup with bfg

$ bfg --delete-files pd

Using repo : /private/tmp/pandas/.git

Found 1212 objects to protect
Found 85 tag-pointing refs : refs/tags/debian/0.4.0-1, refs/tags/debian/0.4.1-1, refs/tags/debian/0.4.3-1, ...
Found 16 commit-pointing refs : HEAD, refs/heads/master, refs/remotes/origin/0.19.x, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 81e0f690 (protected by 'HEAD')

Cleaning
--------

Found 17369 commits
Cleaning commits:       100% (17369/17369)
Cleaning commits completed in 1,628 ms.

Updating 2 Refs
---------------

	Ref                          Before     After
	------------------------------------------------
	refs/heads/master          | 81e0f690 | f4343b31
	refs/remotes/origin/master | 81e0f690 | f4343b31

Updating references:    100% (2/2)
...Ref update completed in 18 ms.

Commit Tree-Dirt History
------------------------

	Earliest                                              Latest
	|                                                          |
	...........................................................D

	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)

	                        Before     After
	-------------------------------------------
	First modified commit | 4a43815d | ea22c025
	Last dirty commit     | 4a43815d | ea22c025

Deleted files
-------------

	Filename   Git id
	-----------------------------
	pd       | 8a8a16ad (15.2 MB)


In total, 5 object ids were changed. Full details are logged here:

	/private/tmp/pandas.bfg-report/2018-03-19/06-03-42

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive


--
You can rewrite history in Git - don't let Trump do it for real!
Trump's administration has lied consistently, to make people give up on ever
being told the truth. Don't give up: https://github.com/bkeepers/stop-trump
--
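Whether a file like `pd` is still "present" after a later commit deletes it from the tree comes down to whether its blob is still reachable from some ref, since that is what a fresh clone downloads. Below is a minimal sketch of that check from Python, assuming `git` is on `PATH`. The helper names `largest_blobs` and `parse_batch_check` are hypothetical. The underlying commands are `git rev-list --objects --all` piped into `git cat-file --batch-check` with a custom format string.

```python
import subprocess


def parse_batch_check(output, top=5):
    """Return up to `top` (size_in_bytes, sha, path) tuples, largest first.

    Expects lines in the "<type> <sha> <size> <path>" format requested
    by largest_blobs() below.
    """
    blobs = []
    for line in output.splitlines():
        # Split at most 3 times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        # Skip commits and trees; only blobs (file contents) matter here.
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(reverse=True)
    return blobs[:top]


def largest_blobs(repo=".", top=5):
    """List the largest blobs reachable from any ref in `repo`."""
    # Every object reachable from all refs, one "<sha> [<path>]" per line.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Look up each object's type and size; %(rest) echoes the path back.
    details = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(details, top)
```

Calling `largest_blobs("/path/to/pandas")` on a clone should list the deleted blob for as long as any ref still points at history containing it. After a rewrite like the bfg run above, followed by the suggested `git reflog expire` and `git gc`, it should no longer appear.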

@TomAugspurger
Contributor

TomAugspurger commented Mar 19, 2018

Here's the log after

* f4343b312 - (HEAD -> master, origin/master, origin/HEAD) Bug: Allow np.timedelta64 objects to index TimedeltaIndex (#20408) (61 minutes ago) <Matthew Roeschke>
* 9dbb89cc5 - DOC: add disallowing of Series construction of len-1 list with index to whatsnew (#20392) (67 minutes ago) <Joris Van den Bossche>
* 3e14b8d7c - MAINT: Remove weird pd file (6 hours ago) <gfyoung>
* ea22c025d - DOC: update the Index.isin docstring (#20249) (13 hours ago) <Noémi Éltető>

Do we want to force push that to master? That'll cause some short term pain probably, but I think it's for the best.

I'll wait for a +1, and I suspect @jreback will have to temporarily disable branch protection on master.

@jorisvandenbossche
Member

@TomAugspurger do you want to do a force push to master?

@TomAugspurger
Contributor

I'm trying to read a bit more about how that messes up downstream. It seems to be a pretty bad idea.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 19, 2018

FYI, those outputs I posted are out of date due to #20402 being merged. I re-ran bfg with latest master including that change.

Using repo : /private/tmp/pandas/.git

Found 1212 objects to protect
Found 85 tag-pointing refs : refs/tags/debian/0.4.0-1, refs/tags/debian/0.4.1-1, refs/tags/debian/0.4.3-1, ...
Found 16 commit-pointing refs : HEAD, refs/heads/master, refs/remotes/origin/0.19.x, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit ee45e05d (protected by 'HEAD')

Cleaning
--------

Found 17370 commits
Cleaning commits:       100% (17370/17370)
Cleaning commits completed in 2,007 ms.

Updating 2 Refs
---------------

	Ref                          Before     After
	------------------------------------------------
	refs/heads/master          | ee45e05d | 7273ea07
	refs/remotes/origin/master | ee45e05d | 7273ea07

Updating references:    100% (2/2)
...Ref update completed in 20 ms.

Commit Tree-Dirt History
------------------------

	Earliest                                              Latest
	|                                                          |
	...........................................................D

	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)

	                        Before     After
	-------------------------------------------
	First modified commit | 4a43815d | ea22c025
	Last dirty commit     | 4a43815d | ea22c025

Deleted files
-------------

	Filename   Git id
	-----------------------------
	pd       | 8a8a16ad (15.2 MB)


In total, 6 object ids were changed. Full details are logged here:

	/private/tmp/pandas.bfg-report/2018-03-19/07-51-40

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive



And here's the new log

* 7273ea070 - (HEAD -> master, origin/master, origin/HEAD) DOC: Only use ~ in class links to hide prefixes. (#20402) (2 hours ago) <Israel Saeta Pérez>
* f4343b312 - Bug: Allow np.timedelta64 objects to index TimedeltaIndex (#20408) (3 hours ago) <Matthew Roeschke>
* 9dbb89cc5 - DOC: add disallowing of Series construction of len-1 list with index to whatsnew (#20392) (3 hours ago) <Joris Van den Bossche>
* 3e14b8d7c - MAINT: Remove weird pd file (8 hours ago) <gfyoung>
* ea22c025d - DOC: update the Index.isin docstring (#20249) (15 hours ago) <Noémi Éltető>

I haven't force pushed yet.

@jreback
Contributor

jreback commented Mar 19, 2018

@TomAugspurger OK to force push. I did this once before, a long time ago. Just be careful!

@jreback
Contributor

jreback commented Mar 19, 2018

You may have to toggle the no-force-push flag (in settings), do the push, then revert the flag.

@TomAugspurger
Contributor

TomAugspurger commented Mar 19, 2018

Done.

Re-enabled branch protection on master. Could someone check the settings? I think that's how they were before.

TomAugspurger pushed a commit that referenced this pull request Mar 19, 2018
@jorisvandenbossche
Member

Everything looks ok

@gfyoung
Member

gfyoung commented Mar 19, 2018

@TomAugspurger : Yes, I did remove that file in that commit.

gfyoung pushed a commit to forking-repos/pandas that referenced this pull request Mar 19, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Mar 19, 2018
@TomAugspurger
Contributor

Yeah, it was still around in the git history though. So I used bfg to purge it from the history (rewriting master) and force pushed.

We should probably have some kind of policy around that. OK to do if you notice it right away and master hasn't changed? Wait for a +1 otherwise?

@gfyoung
Member

gfyoung commented Mar 19, 2018

> We should probably have some kind of policy around that. OK to do if you notice it right away and master hasn't changed? Wait for a +1 otherwise?

That makes sense.

GGordonGordon added a commit to GGordonGordon/pandas that referenced this pull request Mar 21, 2018
Revert "DOC: update the Index.isin docstring (pandas-dev#20249)"
This reverts commit 4a43815.
GGordonGordon pushed a commit to GGordonGordon/pandas that referenced this pull request Mar 21, 2018
GGordonGordon added a commit to GGordonGordon/pandas that referenced this pull request Mar 21, 2018
Revert "DOC: update the Index.isin docstring (pandas-dev#20249)"
This reverts commit 4a43815.
nehiljain added a commit to nehiljain/pandas that referenced this pull request Mar 21, 2018
…ame_describe

* upstream/master: (158 commits)
  Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431)
  BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399)
  BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412)
  DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264)
  DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155)
  DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402)
  Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408)
  DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392)
  MAINT: Remove weird pd file
  DOC: update the Index.isin docstring (pandas-dev#20249)
  BUG: Handle all-NA blocks in concat (pandas-dev#20382)
  DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379)
  BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067)
  DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336)
  DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265)
  DOC: update the api.types.is_number docstring (pandas-dev#20196)
  Fix linter (pandas-dev#20389)
  DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142)
  DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181)
  DOC: update the window.Rolling.min docstring (pandas-dev#20263)
  ...
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Mar 26, 2018
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Mar 26, 2018
Labels
Docs Indexing Related to indexing on series/frames, not to indexes themselves
6 participants