API: allow scalar setting/getting via float indexer on integer indexes #12370

Closed
wants to merge 1 commit into from

jreback
Contributor

@jreback jreback commented Feb 17, 2016

closes #12333

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Feb 17, 2016
@jreback jreback added this to the 0.18.0 milestone Feb 17, 2016
@jreback
Contributor Author

jreback commented Feb 17, 2016

not a lot of code changes, mostly making the tests robust & w/o repeating things.

haven't updated docs yet

self.assertEqual(result4, result6)
for result in [s[5.0], s[5],
               s.loc[5.0], s.loc[5],
               s.ix[5.0], s[5]]:
Member

the last one here should probably be s.ix[5]

Contributor Author

fixed

for idxr in [lambda x: x.loc,
             lambda x: x.iloc,
             lambda x: x]:
    self.assertRaises(TypeError, lambda: idxr(s)[l])
Member

I am probably missing something (the diff in the tests is a bit hard to follow), but shouldn't there be a difference between loc and iloc here? As loc is doing label-based slicing, so float is allowed?

Contributor Author

these are about slicing:

In [1]: s = Series(range(3))

In [2]: s[1.0:2]
TypeError: cannot do slice start value indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

In [3]: s.loc[1.0:2]
TypeError: cannot do slice indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

In [4]: s.ix[1.0:2]
Out[4]: 
1    1
2    2
dtype: int64

In [5]: s.iloc[1.0:2]
TypeError: cannot do slice start value indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

Member

Yes, but why would slicing be an exception?

In [107]: s.loc[1.0]
Out[107]: 1

In [108]: s.loc[2.0]
Out[108]: 2

In [109]: s.loc[1.0:2.0]
TypeError: cannot do slice indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [1.0] of <type 'float'>

In s[1.0:2], the 1.0 is an integer location, so that should indeed raise, but in s.loc[1.0:2], the 1.0 is a label, and this label is found when using scalar indexing (s.loc[1.0])
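
To make the label argument concrete: on a sorted integer index, a label-based slice with float bounds has a well-defined meaning, because each bound only has to be compared against the labels. The following is a minimal sketch in plain Python, not pandas internals; `label_slice` is a hypothetical helper introduced only for illustration.

```python
from bisect import bisect_left, bisect_right


def label_slice(index, values, start, stop):
    """Label-based, endpoint-inclusive slice over a sorted index.

    Hypothetical helper for illustration only; bounds may be ints or floats.
    """
    lo = bisect_left(index, start)   # first position whose label >= start
    hi = bisect_right(index, stop)   # one past the last position whose label <= stop
    return list(zip(index[lo:hi], values[lo:hi]))


index = [0, 1, 2]
values = [0, 1, 2]

print(label_slice(index, values, 1.0, 2))   # [(1, 1), (2, 2)]
print(label_slice(index, values, 1, 2.5))   # [(1, 1), (2, 2)]
```

Under this reading, a float bound such as 1.0 or 2.5 is just a value to compare labels against, which is why it need not be treated differently from the scalar lookup `s.loc[1.0]`.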

Contributor Author

#12333 (comment)

this is the same as your comment on the issue. We don't allow float indexers in slicing at ALL, except for Float64Index. It is possible to relax this, but that introduces another layer of complexity here. (this doesn't apply to .ix, just [] and .loc)

Member

We do allow floats in slices, don't we?

In [1]: s = Series(range(3))

In [2]: s.loc[1.0:2]
Out[2]:
1    1
2    2
dtype: int64

In [3]: s.ix[1.0:2]
Out[3]:
1    1
2    2
dtype: int64

In [4]: pd.__version__
Out[4]: u'0.17.1'

@jreback
Contributor Author

jreback commented Feb 22, 2016

ok, this is updated

  • reorganized the indexing tests (mainly moved the float indexers out of the monolithic test_indexing.py and dropped some duplicate tests)
  • built comprehensive float scalar/slice testing, separated out by index
  • removed some net code! yeh
  • cleaned up some of the duplicated / awkward indexing code (only some, but it does flow a bit better now). the main issue was duplicated code w.r.t. slice indexing and checking. hopefully this will allow future changes to be made in a single place rather than all over the place.

net-net-net: I don't think much has actually changed from 0.17.1 (with the one exception that I note in the docs for setting with .ix; IOW, using a float indexer on an object index now sets the label rather than setting positionally).

We have dropped the FutureWarning entirely on float indexers, but now only raise for .iloc (previously we were also raising on non-float indexes for [], .ix, .loc).
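
As a rough sketch of the rule described here (a hypothetical function, not the code in this PR): positional indexing via .iloc rejects any float, while the label-based indexers on an integer index accept a float that is equal to an integer label. The `kind` names and the handling of non-integral floats are assumptions made for the example.

```python
def convert_scalar_key(kind, key):
    """Toy model of scalar key handling on an integer index.

    `kind` is one of 'iloc', 'loc', 'ix', 'getitem' (names assumed for
    illustration; this is not the pandas implementation).
    """
    if isinstance(key, float):
        if kind == 'iloc':
            # positional indexing is strictly integer-based
            raise TypeError(
                "cannot do positional indexing with a float: %r" % key)
        if key.is_integer():
            # label-based: 5.0 refers to the same label as 5
            return int(key)
        # assumption: a non-integral float is passed through unchanged,
        # so a later label lookup would simply not find it
    return key


print(convert_scalar_key('loc', 5.0))       # 5
print(convert_scalar_key('getitem', 5.0))   # 5

try:
    convert_scalar_key('iloc', 5.0)
except TypeError as exc:
    print("TypeError:", exc)
```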

So give it a whirl. If you find missing / unclear tests, let's fix em. The more comprehensive the better.

@jreback
Contributor Author

jreback commented Feb 22, 2016

@jorisvandenbossche
Member

Great!

Quick note: I think we should also raise on [] when this does positional indexing (eg in slicing) and using floats (just as iloc). Eg s[1:2.5] raised in 0.17.1, but now works with this PR.

I will try to take a more detailed look later this week.

@jreback
Contributor Author

jreback commented Feb 23, 2016

In [1]: s = Series([1,2,3])

In [2]: s[1:2.5]
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [2.5] of <type 'float'>

In [3]: s.loc[1:2.5]
Out[3]: 
1    2
2    3
dtype: int64

why are you calling this out as a special case? This is the same path as .loc.

@jorisvandenbossche
Member

I get

In [48]: s = Series([1,2,3])

In [49]: s[1:2.5]
Out[49]: array([2], dtype=int64)

using your branch (so something different from what you show in your last comment)

@jorisvandenbossche
Member

Which, I now notice, gives an array instead of a series, which is also odd (wait a moment, maybe I did something wrong when checking out your branch, as this was a series when I first tested it)

@jreback
Contributor Author

jreback commented Feb 23, 2016

@jorisvandenbossche yeh something odd going on. will update soon. In any event I don't see a reason to make this any different; it is very consistent with the other indexers.

@jorisvandenbossche
Member

Very strange, it seems to depend on the values in the index:

In [14]: s = Series([1,2,3])

In [15]: s[1:2.5]
Out[15]: array([2], dtype=int64)

In [16]: s = Series([1,2,3], index=[1,2,3])

In [17]: s[1:2.5]
Out[17]:
2    2
dtype: int64

@jreback
Contributor Author

jreback commented Feb 23, 2016

@jorisvandenbossche ok, adding tests for these off integers.

@jorisvandenbossche
Member

But to come back to what I originally wanted to say: raising in [].
You gave above the example (#12370 (comment)):

In [1]: s = Series([1,2,3])

In [2]: s[1:2.5]
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [2.5] of <type 'float'>

In [3]: s.loc[1:2.5]
Out[3]: 
1    2
2    3
dtype: int64

This raises with floats when slicing in []. And this is what I meant that I think should happen, but it is not what currently happens in the PR.
What was your purpose in showing that example? That you think that should be the behaviour? If so, we are in full agreement :-) If not, it's not clear from that comment what you mean.

@jreback
Contributor Author

jreback commented Feb 23, 2016

oh, I just generalized things. This in particular was not exactly tested :<
oh ok, so we are in agreement that this is NOT a special case then? gr8 (that example was from 0.17.1 btw)

@jorisvandenbossche
Member

@jreback yep, ok, in agreement it is not a special case and should stay as it was in 0.17.1 (your example). The confusion was that it actually behaved differently in the PR, which I tried to point out :-)

@jreback
Contributor Author

jreback commented Feb 23, 2016

@jorisvandenbossche no, my example is from 0.17.1. But I am NOT treating it as a special case, which means this WILL work the same as .loc (aside from the np.array(..) issue, which I have to track down).

why should this raise?

@jorisvandenbossche
Member

why should this raise?

Because [] is not like loc in this case, but like iloc (which raises)

@jreback
Contributor Author

jreback commented Feb 23, 2016

@jorisvandenbossche no i disagree, this is label based. Again this is introducing special cases where none should exist.

@jorisvandenbossche
Member

It is for slicing (not scalars):

In [13]: pd.__version__
Out[13]: '0.18.0rc1+79.gff90e0a'

In [14]: s = pd.Series(range(2,6), index=range(2,6))

In [17]: s.loc[2:4]
Out[17]:
2    2
3    3
4    4
dtype: int64

In [18]: s[2:4]
Out[18]:
4    4
5    5
dtype: int64

In [19]: s[2:4.0]   <---------- this is positional indexing with floats -> so should raise?
Out[19]:
4    4
5    5
dtype: int64

In [20]: s.iloc[2:4.0]
TypeError: cannot do slice indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [4.0] of <type 'float'>

vs

In [6]: pd.__version__
Out[6]: u'0.17.1'

In [7]:  s = pd.Series(range(2,6), index=range(2,6))

In [8]: s[2:4.0]          <------------ does raise in 0.17.1
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [4.0] of <type 'float'>
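
The behaviour argued for here can be modelled in a few lines of plain Python. This is only a sketch under the assumption that slicing with [] on an integer index is positional and end-exclusive; `getitem_slice` is a hypothetical helper, not pandas code.

```python
def getitem_slice(index, values, start, stop):
    """Toy model of s[start:stop] on an integer index.

    The bounds are treated as positions, so floats are rejected
    in the same way .iloc rejects them.
    """
    for bound in (start, stop):
        if isinstance(bound, float):
            raise TypeError(
                "cannot do slice indexing with float %r" % bound)
    return list(zip(index[start:stop], values[start:stop]))


index = list(range(2, 6))
values = list(range(2, 6))

# positions 2 and 3 hold the labels 4 and 5
print(getitem_slice(index, values, 2, 4))   # [(4, 4), (5, 5)]

try:
    getitem_slice(index, values, 2, 4.0)
except TypeError as exc:
    print("TypeError:", exc)
```

The first call reproduces the positional result of `s[2:4]` shown above, and the second mirrors the 0.17.1 behaviour of raising for `s[2:4.0]`.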

@jreback
Contributor Author

jreback commented Mar 6, 2016

ok let me adjust that

@jreback jreback force-pushed the float branch 2 times, most recently from 4dae30e to 980a97c Compare March 7, 2016 14:21
@jreback
Contributor Author

jreback commented Mar 7, 2016

@jorisvandenbossche ok updated. I think that we are now consistent with 0.17.1 (and now fully tested).

@jorisvandenbossche
Member

This looks good! Thanks a lot

@jorisvandenbossche
Member

An edge case: if you have mixed dtype index, you get some inconsistencies:

In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [49]: s3 = pd.Series([1,2,3], index=['a', 'b', 1.5])

In [50]: s2[1.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.base.Index'> with these indexers [1.0] of <type 'float'>

In [51]: s3[1.0]
Out[51]: 2

So the second case (s3[1.0]) appears to do positional indexing with a float (which should raise?)

@jorisvandenbossche
Member

Actually, the fact that s2[1.0] (and also s2.ix[1.0]) in the above example raises, is an API change (previously it was coerced to integer and positional indexing was done), so we should mention this in the whatsnew docs I think

@jreback
Contributor Author

jreback commented Mar 8, 2016

hmm, I'll put in a fix for that. was not the intention to change the API (at all)

@jorisvandenbossche
Member

No, I think it is a good change (although maybe we should have raised a FutureWarning for that ..). With a full string index, s2[1.0] is either a KeyError (since 1.0 is not a label in the index), or it tries to do positional indexing, but then it should raise because the key is a float (and positional indexing is strictly with integers)

@jreback
Contributor Author

jreback commented Mar 8, 2016

neh, this is really odd; .ix fails here as well

This is in 0.17.1.

In [1]: In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [2]: s2.ix[1.0]
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:1215: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  self._convert_scalar_indexer(key, axis)
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:81: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  return self.obj[label]
Out[2]: 2

In [3]: s2[1.0]
/Users/jreback/miniconda/bin/ipython:1: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  #!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[3]: 2

These are consistent (and positional).

@jorisvandenbossche
Member

And they raised a warning, so I think it is certainly OK that they now both raise (as is currently the case with this PR)

@jreback
Contributor Author

jreback commented Mar 8, 2016

ok, will push a fix in a minute

@jorisvandenbossche
Member

So the only issue here , then, is for the mixed dtype index, where it should also raise.

@jreback
Contributor Author

jreback commented Mar 8, 2016

actually, I could make a case for s3.ix[1.0] just being a KeyError, as it's label based (as the index is mixed)

@jreback
Contributor Author

jreback commented Mar 8, 2016

I added tests, but I didn't actually change any code. lmk what you think.

@jorisvandenbossche
Member

actually, I could make a case for s3.ix[1.0] just being a KeyError, as it's label based (as the index is mixed)

+1

@jreback
Contributor Author

jreback commented Mar 8, 2016

@jorisvandenbossche

on something like this. (from 0.17.1)

In [3]: In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [4]: s2.loc[1.0]
KeyError: 'the label [1] is not in the [index]'

but we are a bit more exact in that this is REALLY a TypeError, or is this just confusing? e.g. we KNOW by the dtype that some values are simply an error (floats in a DatetimeIndex for example). For a mixed index, we are not sure and HAVE to raise a KeyError. But what about a pure-string index (which we currently raise a KeyError), that should actually be a TypeError?
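
The distinction being weighed here can be written down as a small decision function. This is a toy model only: the dtype tags are assumptions for the example, and the 'object' branch reflects the KeyError that a plain Index currently raises.

```python
def error_for_float_key(index_dtype):
    """Return the exception class a float key would trigger (toy model).

    'datetime64' stands for a typed index whose dtype rules out floats;
    'object' stands for a plain Index (all strings or mixed values).
    Both tags are hypothetical names used only in this sketch.
    """
    if index_dtype == 'datetime64':
        # the dtype alone proves a float can never be a valid label
        return TypeError
    if index_dtype == 'object':
        # the values could be anything, so a miss can only be
        # reported as a missing label
        return KeyError
    raise ValueError("dtype not covered by this sketch: %r" % index_dtype)


print(error_for_float_key('datetime64').__name__)   # TypeError
print(error_for_float_key('object').__name__)       # KeyError
```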

@jorisvandenbossche
Member

If we had something like a StringIndex, it would be clear that a float is not possible. But for a plain Index, although it has only string values, .. Hmm, it's a bit of a dubious case of course. If it was a KeyError previously, I would maybe keep it that way? (unless that makes it more complicated)

@jreback
Contributor Author

jreback commented Mar 8, 2016

yeah just going to keep it. pushing in a moment.

@jreback
Contributor Author

jreback commented Mar 8, 2016

ok all fixed up. simplified the logic a bit so it's pretty clear.

if self.is_integer() or is_index_slice:
return key
return self._convert_slice_indexer(key)
if kind in ['getitem', 'ix'] and is_float(key):
Contributor Author

This is where it raises a TypeError if it's label based but not the right type.

Labels
Compat pandas objects compatability with Numpy or Python functions
Development

Successfully merging this pull request may close these issues.

API: behaviour of label indexing with floats on integer index
2 participants