API: allow scalar setting/getting via float indexer on integer indexes #12370

jreback · 2016-02-17T19:31:24Z

jreback · 2016-02-17T19:31:50Z

not a lot of code changes, mostly making the tests robust & w/o repeating things.

haven't updated docs yet

jorisvandenbossche · 2016-02-17T22:12:43Z

pandas/tests/test_indexing.py

-        self.assertEqual(result4, result6)
+        for result in [s[5.0], s[5],
+                       s.loc[5.0], s.loc[5],
+                       s.ix[5.0], s[5]]:


the last one here should probably be s.ix[5]

jorisvandenbossche · 2016-02-17T22:25:06Z

pandas/tests/test_indexing.py

+                    for idxr in [lambda x: x.loc,
+                                 lambda x: x.iloc,
+                                 lambda x: x]:
+                        self.assertRaises(TypeError, lambda: idxr(s)[l])


I am probably missing something (the diff in the tests is a bit hard to follow), but shouldn't there be a difference between loc and iloc here? As loc is doing label-based slicing, so float is allowed?

these are about slicing:

In [1]: s = Series(range(3))

In [2]: s[1.0:2]
TypeError: cannot do slice start value indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

In [3]: s.loc[1.0:2]
TypeError: cannot do slice indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

In [4]: s.ix[1.0:2]
Out[4]:
1    1
2    2
dtype: int64

In [5]: s.iloc[1.0:2]
TypeError: cannot do slice start value indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [1.0] of <type 'float'>

Yes, but why would slicing be an exception?

In [107]: s.loc[1.0]
Out[107]: 1

In [108]: s.loc[2.0]
Out[108]: 2

In [109]: s.loc[1.0:2.0]
TypeError: cannot do slice indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [1.0] of <type 'float'>

In s[1.0:2], the 1.0 is an integer location, so that should indeed raise, but in s.loc[1.0:2], the 1.0 is a label, and this label is found when using scalar indexing (s.loc[1.0])
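
The distinction drawn here can be illustrated without pandas at all. The following is only a sketch in plain Python: `label_get` and `positional_get` are hypothetical helpers, not pandas functions, and they model the two meanings a key can have rather than how pandas implements them.

```python
def label_get(labels, values, key):
    """Label-based scalar lookup: the key is compared against the labels."""
    for label, value in zip(labels, values):
        # 1.0 == 1 in Python, so a float key can match an integer label
        if label == key:
            return value
    raise KeyError(key)


def positional_get(values, key):
    """Positional scalar lookup: the key is an offset, so it must be an int."""
    if isinstance(key, bool) or not isinstance(key, int):
        raise TypeError("positional indexers must be integers, got %r" % (key,))
    return values[key]


labels = [0, 1, 2]
values = [10, 11, 12]

print(label_get(labels, values, 1.0))   # the label 1 is found via the float key
print(positional_get(values, 1))        # offset 1 is a valid position
```

Under this toy model a float such as 1.0 is an acceptable label but never a valid position, which is the argument for treating s.loc[1.0:2] differently from s[1.0:2].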

#12333 (comment)

this is the same as your comment issue. We don't allow float indexers in slicing at ALL, except for Float64Index. This is possible to relax, but introduces another layer of complexity here. (this doesn't apply for .ix, just [] and .loc)

We perfectly allow floats in slices?

In [1]: s = Series(range(3))

In [2]: s.loc[1.0:2]
Out[2]:
1    1
2    2
dtype: int64

In [3]: s.ix[1.0:2]
Out[3]:
1    1
2    2
dtype: int64

In [4]: pd.__version__
Out[4]: u'0.17.1'

jreback · 2016-02-22T01:59:24Z

ok, this is updated

reorganized the indexing tests (mainly moved the float indexers out of the monolithic test_indexing.py and dropped some duplicate tests)
built comprehensive float scalar/slice testing, separated out by index
removed some net code! yeh
cleaned up some of the duplicated / awkward indexing code (only some, but it does flow a bit better now). The main issue was duplicated code in the slice indexing and checking; hopefully this will allow future changes to be made in a single place rather than all over the place.

net-net-net, I don't think much has actually changed from 0.17.1 (with the exception, which I note in the docs, of setting with .ix: using a float indexer on an object index now sets the label rather than setting positionally).

We have dropped the FutureWarning entirely on float indexers, but now only raise for .iloc (previously we were also raising on non-float indexes for [], .ix, .loc).

So give it a whirl. If you find missing / unclear tests, let's fix em. The more comprehensive the better.

jreback · 2016-02-22T01:59:34Z

cc @shoyer @jorisvandenbossche @TomAugspurger @wesm

jorisvandenbossche · 2016-02-23T12:49:42Z

Great!

Quick note: I think we should also raise on [] when this does positional indexing (eg in slicing) and using floats (just as iloc). Eg s[1:2.5] raised in 0.17.1, but now works with this PR.

I will try to take a more detailed look later this week.

jreback · 2016-02-23T14:46:44Z

In [1]: s = Series([1,2,3])

In [2]: s[1:2.5]
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [2.5] of <type 'float'>

In [3]: s.loc[1:2.5]
Out[3]: 
1    2
2    3
dtype: int64

why are you calling this out as a special case? This is the same path as .loc.

jorisvandenbossche · 2016-02-23T14:56:04Z

I get

In [48]: s = Series([1,2,3])

In [49]: s[1:2.5]
Out[49]: array([2], dtype=int64)

using your branch (so something different from what you show in your last comment)

jorisvandenbossche · 2016-02-23T14:57:58Z

Which, I now note, gives an array instead of a Series, which is also odd (wait a moment, maybe I did something wrong when checking out your branch, as this was a Series when I first tested it)

jreback · 2016-02-23T15:02:59Z

@jorisvandenbossche yeh something odd going on. will update soon. In any event I don't see a reason to make this any different; it is very consistent with the other indexers.

jorisvandenbossche · 2016-02-23T15:04:58Z

Very strange, it seems to depend on the values in the index:

In [14]: s = Series([1,2,3])

In [15]: s[1:2.5]
Out[15]: array([2], dtype=int64)

In [16]: s = Series([1,2,3], index=[1,2,3])

In [17]: s[1:2.5]
Out[17]:
2    2
dtype: int64

jreback · 2016-02-23T15:07:33Z

@jorisvandenbossche ok, adding tests for these off integers.

jorisvandenbossche · 2016-02-23T15:07:49Z

But to come back to what I originally wanted to say: raising in [].
You gave above the example (#12370 (comment)):

In [1]: s = Series([1,2,3])

In [2]: s[1:2.5]
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [2.5] of <type 'float'>

In [3]: s.loc[1:2.5]
Out[3]: 
1    2
2    3
dtype: int64

This raises with floats when slicing in []. And this is what I meant that I think should happen, but it is not what currently happens in the PR.
What was your purpose in showing that example? That you think that should be the behaviour? If so, we are in full agreement :-) If not, it's not clear from that comment what you mean.

jreback · 2016-02-23T15:17:27Z

oh, I just generalized things. This in particular was not exactly tested :< oh
ok, so we are in agreement that this is NOT a special case then? gr8 (that example was from 0.17.1 btw)

jorisvandenbossche · 2016-02-23T15:31:18Z

@jreback yep, ok, in agreement that it is not a special case and should stay as it was in 0.17.1 (your example). The confusion was that it actually behaved differently in the PR, which I tried to point out :-)

jreback · 2016-02-23T15:47:02Z

@jorisvandenbossche no, my example is from 0.17.1. But I am NOT treating it as a special case, which means this WILL work the same as .loc (aside from the np.array(..) issue which I have to track).

why should this raise?

jorisvandenbossche · 2016-02-23T15:48:20Z

why should this raise?

Because [] is not like loc in this case, but like iloc (which raises)

jreback · 2016-02-23T15:51:14Z

@jorisvandenbossche no i disagree, this is label based. Again this is introducing special cases where none should exist.

jorisvandenbossche · 2016-03-06T23:31:41Z

It is for slicing (not scalars):

In [13]: pd.__version__
Out[13]: '0.18.0rc1+79.gff90e0a'

In [14]: s = pd.Series(range(2,6), index=range(2,6))

In [17]: s.loc[2:4]
Out[17]:
2    2
3    3
4    4
dtype: int64

In [18]: s[2:4]
Out[18]:
4    4
5    5
dtype: int64

In [19]: s[2:4.0]   <---------- this is positional indexing with floats -> so should raise?
Out[19]:
4    4
5    5
dtype: int64

In [20]: s.iloc[2:4.0]
TypeError: cannot do slice indexing on <class 'pandas.indexes.numeric.Int64Index'> with these indexers [4.0] of <type 'float'>

vs

In [6]: pd.__version__
Out[6]: u'0.17.1'

In [7]:  s = pd.Series(range(2,6), index=range(2,6))

In [8]: s[2:4.0]          <------------ does raise in 0.17.1
TypeError: cannot do slice stop value indexing on <class 'pandas.core.index.Int64Index'> with these indexers [4.0] of <type 'float'>
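
The 0.17.1 behaviour shown here (float bounds rejected when a slice is interpreted positionally) amounts to a simple check on the slice bounds. A minimal sketch in plain Python follows; `validate_positional_slice` is a hypothetical helper written for illustration, not a pandas function, and its error message only imitates the ones quoted in this thread.

```python
def validate_positional_slice(key):
    """Reject positional slices whose bounds are not integers (or None)."""
    for name in ("start", "stop", "step"):
        bound = getattr(key, name)
        if bound is None:
            continue
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(bound, bool) or not isinstance(bound, int):
            raise TypeError(
                "cannot do slice %s value indexing with %r of %s"
                % (name, bound, type(bound))
            )
    return key


values = [2, 3, 4, 5]

# integer bounds pass through and select positions 2 and 3
print(values[validate_positional_slice(slice(2, 4))])

# a float bound is rejected even though 4.0 == 4
try:
    validate_positional_slice(slice(2, 4.0))
except TypeError as exc:
    print("raised:", exc)
```

The point of the sketch is that the check depends only on the type of the bound, not on whether the float happens to be integer-valued.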

jreback · 2016-03-06T23:33:02Z

ok let me adjust that

jreback · 2016-03-07T14:22:36Z

@jorisvandenbossche ok updated. I think that we are now consistent with 0.17.1 (and now fully tested).

jorisvandenbossche · 2016-03-08T20:22:15Z

This looks good! Thanks a lot

jorisvandenbossche · 2016-03-08T20:31:33Z

An edge case: if you have mixed dtype index, you get some inconsistencies:

In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [49]: s3 = pd.Series([1,2,3], index=['a', 'b', 1.5])

In [50]: s2[1.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.base.Index'> with
these indexers [1.0] of <type 'float'>

In [51]: s3[1.0]
Out[51]: 2

So the second case (s3[1.0]) appears to do positional indexing with a float (which should raise?)

jorisvandenbossche · 2016-03-08T20:32:59Z

Actually, the fact that s2[1.0] (and also s2.ix[1.0]) in the above example raises, is an API change (previously it was coerced to integer and positional indexing was done), so we should mention this in the whatsnew docs I think

jreback · 2016-03-08T20:35:04Z

hmm, I'll put in a fix for that. was not the intention to change the API (at all)

jorisvandenbossche · 2016-03-08T20:38:00Z

No, I think it is a good change (although maybe we should have raised a FutureWarning for that ..). With a full string index, s2[1.0] is either a KeyError (since 1.0 is not a label in the index), or it tries to do positional indexing, but then it should raise because the key is a float (and positional indexing is strictly with integers)

jreback · 2016-03-08T20:40:05Z

neh, this is really odd .ix fails here as well

This is in 0.17.1.

In [1]: In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [2]: s2.ix[1.0]
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:1215: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  self._convert_scalar_indexer(key, axis)
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:81: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  return self.obj[label]
Out[2]: 2

In [3]: s2[1.0]
/Users/jreback/miniconda/bin/ipython:1: FutureWarning: scalar indexers for index type Index should be integers and not floating point
  #!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[3]: 2

These are consistent (and positional).

jorisvandenbossche · 2016-03-08T20:45:38Z

And they raised a warning, so I think it is certainly OK that they now both raise (as is currently the case with this PR)

jreback · 2016-03-08T20:46:59Z

ok, will push a fix in a minute

jorisvandenbossche · 2016-03-08T20:47:22Z

So the only issue here , then, is for the mixed dtype index, where it should also raise.

jreback · 2016-03-08T21:01:51Z

actually, I could make a case for s3.ix[1.0] just being a KeyError as its label based (as the index is mixed)

jreback · 2016-03-08T21:04:04Z

I added tests, but I didn't actually change any code. lmk what you think.

jorisvandenbossche · 2016-03-08T21:07:33Z

actually, I could make a case for s3.ix[1.0] just being a KeyError as its label based (as the index is mixed)

+1

jreback · 2016-03-08T21:26:56Z

@jorisvandenbossche

on something like this. (from 0.17.1)

In [3]: In [48]: s2 = pd.Series([1,2,3], index=['a', 'b', 'c'])

In [4]: s2.loc[1.0]
KeyError: 'the label [1] is not in the [index]'

but we are a bit more exact in that this is REALLY a TypeError, or is this just confusing? e.g. we KNOW by the dtype that some values are simply an error (floats in a DatetimeIndex for example). For a mixed index, we are not sure and HAVE to raise a KeyError. But what about a pure-string index (which we currently raise a KeyError), that should actually be a TypeError?

jorisvandenbossche · 2016-03-08T21:33:02Z

If we had something like a StringIndex, it would be clear that a float is not possible. But for a plain Index, although it has only string values, .. Hmm, it's a bit a dubious case of course. If it was a KeyError previously, I would maybe keep it that way? (unless that makes it more complicated)

jreback · 2016-03-08T21:39:17Z

yeah just going to keep it. pushing in a moment.

jreback · 2016-03-08T21:47:05Z

ok all fixed up. simplified the logic a bit so its pretty clear.

jreback · 2016-03-08T21:48:28Z

pandas/indexes/base.py

-        if self.is_integer() or is_index_slice:
-            return key
-        return self._convert_slice_indexer(key)
+            if kind in ['getitem', 'ix'] and is_float(key):


This is where it raises a TypeError if it's label based but not the right type.
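
The scalar rules the thread converges on can be summarised as a small decision function. This is a toy model only: `convert_scalar_indexer` below is a hypothetical stand-in, not the code in pandas/indexes/base.py, and the `index_kind` tags ('integer', 'string', 'mixed') are assumptions introduced for illustration.

```python
def convert_scalar_indexer(key, kind, index_kind):
    """Toy model of the scalar float-indexer rules discussed in this thread.

    kind:       'getitem' (i.e. []), 'ix', 'loc' or 'iloc'
    index_kind: 'integer', 'string' or 'mixed' (hypothetical tags)
    """
    if kind == "iloc":
        # positional access is strictly integer-based
        if isinstance(key, bool) or not isinstance(key, int):
            raise TypeError("cannot do positional indexing with %r" % (key,))
        return key

    if isinstance(key, float):
        if index_kind == "integer" and key.is_integer():
            # 5.0 is accepted as the integer label 5
            return int(key)
        if index_kind == "string" and kind in ("getitem", "ix"):
            # label based, but a float can never be a label here
            raise TypeError("cannot do label indexing with %r" % (key,))

    # everything else is left to the label lookup, where a miss is a KeyError
    return key


print(convert_scalar_indexer(5.0, "loc", "integer"))   # coerced to the label 5
print(convert_scalar_indexer(1.0, "loc", "string"))    # passed through unchanged
```

In this model a float on a pure-string index raises TypeError for [] and .ix, while .loc and mixed indexes fall through to an ordinary lookup that would end in a KeyError, matching the decision above to keep the existing KeyError.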

closes pandas-dev#12333

jreback added the Compat (pandas objects compatibility with NumPy or Python functions) label Feb 17, 2016

jreback added this to the 0.18.0 milestone Feb 17, 2016

jreback force-pushed the float branch from cee1f73 to f17402a Compare February 17, 2016 22:01

jorisvandenbossche reviewed Feb 17, 2016

jreback force-pushed the float branch from f17402a to 0b36ad0 Compare February 17, 2016 22:24

jorisvandenbossche reviewed Feb 17, 2016

jreback mentioned this pull request Feb 17, 2016

CLN/PERF: .get_loc on float with integer coercion #12371

Closed

jreback force-pushed the float branch 6 times, most recently from 2f05c51 to b194c22 Compare February 22, 2016 01:51

jorisvandenbossche mentioned this pull request Mar 6, 2016

DOC: fix doc build warnings #12545

Closed

jreback force-pushed the float branch 2 times, most recently from 4dae30e to 980a97c Compare March 7, 2016 14:21

jreback force-pushed the float branch from 980a97c to a981825 Compare March 7, 2016 14:45

jreback force-pushed the float branch from a981825 to 3641e65 Compare March 8, 2016 21:03

jreback force-pushed the float branch from 3641e65 to e28ccd0 Compare March 8, 2016 21:07

jreback force-pushed the float branch from b4ab13d to 5d4913b Compare March 8, 2016 21:46

jreback reviewed Mar 8, 2016

API: allow scalar setting/getting via float indexer on integer indexes

3a148a7

closes pandas-dev#12333

jreback force-pushed the float branch from 5d4913b to 3a148a7 Compare March 8, 2016 22:09

jreback closed this in 5f7e290 Mar 8, 2016
