DOC: Distinguish between different types of boolean indexing #10492

Closed
jcjf opened this issue Jul 2, 2015 · 4 comments · Fixed by #36869

Comments

@jcjf
Contributor

jcjf commented Jul 2, 2015

It's not clear from the docs that indexing with a boolean ndarray isn't the same as indexing with a boolean Series. It took me a while to realise that:

import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], list('abc'), ['one', 'two'])
sr = (df['one'] > 2)
df.loc[sr, 'two']      # This is ok
df.iloc[sr, 1]         # But this is not
df.iloc[sr.values, 1]  # The right way to use iloc

I've since read through #3631 and it makes sense to me. However, even though the docs emphasise that .iloc is purely integer-based, it didn't click at the time that indexability of Series was the problem I was facing.
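A minimal sketch of the three cases above, assuming a recent pandas version. The variable names `by_label`, `by_position`, and `iloc_error` are illustrative only, and the exception is caught broadly because the exact type raised by `.iloc` may depend on the index and the pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=list("abc"), columns=["one", "two"])
sr = df["one"] > 2  # boolean Series carrying the labels 'a', 'b', 'c'

# Label-based: .loc accepts a boolean Series and matches it on the index.
by_label = df.loc[sr, "two"]

# Position-based: .iloc refuses a mask that carries its own index.
try:
    df.iloc[sr, 1]
    iloc_error = None
except (ValueError, NotImplementedError) as exc:
    iloc_error = exc

# Stripping the index leaves a plain boolean ndarray, which .iloc accepts.
by_position = df.iloc[sr.values, 1]

print(type(iloc_error).__name__)
print(by_label.equals(by_position))
```

Both successful selections return the same rows here because `sr` is in the same order as `df`.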

@jreback
Contributor

jreback commented Jul 2, 2015

The solution in this case would be to use

df.ix[sr,1]

which would interpret the 1 as a positional indexer; of course, the usual caveats of .ix apply.
Personally I would do

df.loc[sr, df.columns[1]] to be very explicit. More a matter of taste.

However, I would appreciate it if you could add basically this example to the docs section.
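The explicit form suggested above can be sketched as follows, assuming the same frame as in the original report. The positional column `1` is first turned into a label with `df.columns[1]`, so `.loc` can take the boolean Series directly; the names `label` and `explicit` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=list("abc"), columns=["one", "two"])
sr = df["one"] > 2

# Translate the column position into its label, then stay purely label-based.
label = df.columns[1]
explicit = df.loc[sr, label]

# The purely positional equivalent needs the mask without its index.
positional = df.iloc[sr.values, 1]

print(label)
print(explicit.equals(positional))
```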

@jreback jreback added Docs Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 2, 2015
@jreback jreback added this to the 0.17.0 milestone Jul 2, 2015
@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 26, 2015
@jreback
Contributor

jreback commented Aug 26, 2015

@jcjf pull-request?

@junjunjunk
Contributor

junjunjunk commented Oct 4, 2020

If this issue is still open, I'd like to take it.
I'll add the following example and prose to https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing.
If anyone has further improvements or comments, please let me know.

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], list('abc'), ['column_1', 'column_2'])
sr = (df['column_1'] > 2)
sr

df.loc[sr, df.columns[1]] 

df.iloc[sr.values, 1]

prose:
iloc distinguishes between two kinds of boolean indexer. If the indexer is a boolean Series, an error is raised. In the example above, df.iloc[sr.values, 1] is fine because the indexer is a plain boolean array, but df.iloc[sr, 1] would raise ValueError.
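One way to see why the distinction matters is to reorder the mask. This is a rough sketch, assuming current pandas behaviour in which `.loc` matches a boolean Series to the frame by label while `.values` discards the labels; `shuffled`, `aligned`, and `positional` are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=list("abc"), columns=["column_1", "column_2"])
sr = df["column_1"] > 2

# Same mask, but with the rows in reverse order: index is now 'c', 'b', 'a'.
shuffled = sr[::-1]

# .loc matches the mask to the frame by label, so the order does not matter.
aligned = df.loc[shuffled, "column_2"]

# .values drops the labels, so the mask is applied purely by position.
positional = df.iloc[shuffled.values, 1]

print(aligned.tolist())
print(positional.tolist())
```

With a reordered mask the two selections differ, so `sr.values` is only a safe substitute when the Series is in the same order as the frame.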

@junjunjunk
Contributor

take
