-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: add 'level' kwarg to 'Index.isin' method #7892
API: add 'level' kwarg to 'Index.isin' method #7892
Conversation
Also, probably related to #3268 |
@@ -687,13 +687,29 @@ def _engine(self): | |||
# property, for now, slow to look up | |||
return self._engine_type(lambda: self.values, len(self)) | |||
|
|||
def _validate_index_level(self, level): | |||
""" | |||
Validate index level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
his already exists as _get_level_number (oh I see you are extending it), ok then
Performance patch is in, here's a synthetic benchmark: In [7]: idx = pd.MultiIndex.from_product([np.arange(10000), ['a', 'b', 'c']])
In [8]: timeit idx.isin(['a'], level=1)
10000 loops, best of 3: 104 µs per loop
In [9]: timeit idx.get_level_values(1).isin(['a'])
1000 loops, best of 3: 1.37 ms per loop
In [10]: timeit idx.isin([1], level=0)
1000 loops, best of 3: 1.02 ms per loop
In [11]: timeit idx.get_level_values(0).isin([1])
100 loops, best of 3: 2.84 ms per loop
|
if level is None: | ||
return lib.ismember(self._array_values(), set(values)) | ||
else: | ||
num = self._get_level_number(level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this currently breaks if level is a list, you want to support that? (and pass tuples for the isin values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't go for that in this PR, I tend to be conservative about API and would wait for an actual use case where
idx.isin(vals1, lev1) & idx.isin(vals2, lev2)
is clearly inferior to
idx.isin([vals1, vals2], [lev1, lev2])
looks good....doc and then can merge FYI, if you want to revise/update isin section gr8! (I would also make it a top-level, rather than below boolean indexing) |
re: doc other than that I've already added? |
I think it makes sense to combine those 2 sections (isin of Index objects) and isin of PandasObject as they are really the same (and confusing to look in 2 sections). http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-with-isin I would simply change the isin section (that I am pointing), make it a top-level (e.g. same as Boolean Indexing), them put the examples there |
I'm a bit concerned about introducing functionality of classes that themselves have not yet been described, because Although, I see some of that happening for docs on |
I agree with all of that index/MultiIndex should have an intro at the top of indexing (side issue) but I think splitting isin up (as it is now) is a bigger problem so I would consolidate and out as a big bullet after boolean indexing feel free to do a short intro if u would like as well (for index/mi) |
@@ -2157,12 +2178,17 @@ def isin(self, values): | |||
Parameters | |||
---------- | |||
values : set or sequence of values | |||
level : {0, None} | |||
|
|||
`level` argument is provided for compatibility with MultiIndex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two remarks:
- can you follow the numpy docstring format?
- I think we should write the docstring for the 'general' case (so it is also applicable for MultiIndex), as this is the docstring that will be rendered in the html docs.
So roughly something like:
level : int, string, default None
Level number or name to specify the index level to use if the index is a MultiIndex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you follow the numpy docstring format?
Sure, although I thought I was close. What do you think stands out?
I think we should write the docstring for the 'general' case
I was trying to state explicitly that this argument means nothing in particular for simple Index
classes while adding a meaningful docstring for MultiIndex.isin
, but I do understand that some might like to have it generic and in one place.
Also, is it stated somewhere that level names must be (or rather, are expected to be) strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
level can be a name or int (level number)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you follow the numpy docstring format?
Sure, although I thought I was close. What do you think stands out?
It is the exact formatting that is important for the correct rendering (see https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt). It should be formatted as:
arg : type
Explanation
(so no blank line, and the explanation should be indented)
I think we should write the docstring for the 'general' case
I was trying to state explicitly that this argument means nothing in particular for simple Index classes while adding a meaningful docstring for MultiIndex.isin, but I do understand that some might like to have it generic and in one place.
The docstring in MultiIndex.isin should also be meaningful. But it is just that in the online documentation, there are references to Index.isin, so I think the all relevant information should also be there (to not have to link to both Index.isin and MultiIndex.isin)
Also, is it stated somewhere that level names must be (or rather, are expected to be) strings?
Ah, no, it can be any object I think, but int and string will be the normal ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(so no blank line, and the explanation should be indented)
hah, ok, that particular line was meant as a note, hence non-indented and separated, but ok, let's make it a parameter description :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but 'notes' like this don't work in the 'Parameter' section (it is then interpreted as another parameter, the format is quit strict), therefore your have the 'Notes' section, as you did now :-)
PERF: optimize MultiIndex.isin when level != None DOC: update doc bits on Index.isin and MultiIndex.isin DOC: merge isin sections for container & index classes DOC: make isin a separate section It's no longer a subsection of boolean indexing.
looks good... @jorisvandenbossche anything else? |
no, looking good! |
API: add 'level' kwarg to 'Index.isin' method
thanks let's check docs after it's built and review for correctness |
closes #7890
This PR adds
level
kwarg forIndex
objects as discussed (briefly in #7890):Summary
MultiIndex
classes:level
values areNone
,0
,-1
andself.name
level
doesn't match the name, it's aKeyError
(wasAssertionError
before)MultiIndex
classes:level=None
, elements ofself.values
(tuples) are usedlevel
values are the same as inMultiIndex.get_level_values
method,-self.nlevels..(self.nlevels - 1)
plus all unique index namesTODO