Revert set_index inspection/error handling for 0.24.1 #25085

Merged 34 commits on Feb 3, 2019
Changes from 31 commits
31dcbb7  DOC: Minor what's new fix (#24933)  (rth, Jan 26, 2019)
84056c5  Backport PR #24916: BUG-24212 fix regression in #24897 (#24951)  (meeseeksmachine, Jan 26, 2019)
e22a6c8  Revert "Backport PR #24916: BUG-24212 fix regression in #24897 (#24951)"  (jorisvandenbossche, Jan 28, 2019)
638ac19  Backport PR #24965: Fixed itertuples usage in to_dict (#24978)  (meeseeksmachine, Jan 28, 2019)
72dc33f  Backport PR #24989: DOC: Document breaking change to read_csv (#24996)  (meeseeksmachine, Jan 29, 2019)
fd1c66c  Backport PR #24964: DEPR: Fixed warning for implicit registration (#2…  (meeseeksmachine, Jan 29, 2019)
d54c3a5  Backport PR #24973: fix for BUG: grouping with tz-aware: Values falls…  (TomAugspurger, Jan 29, 2019)
e3cc0b1  Backport PR #24967: REGR: Preserve order by default in Index.differen…  (meeseeksmachine, Jan 30, 2019)
c228597  Backport PR #24961: fix+test to_timedelta('NaT', box=False) (#25025)  (meeseeksmachine, Jan 30, 2019)
7956533  Backport PR #25033: BUG: Fixed merging on tz-aware (#25041)  (meeseeksmachine, Jan 30, 2019)
722bb79  Backport PR #24993: Test nested PandasArray (#25042)  (meeseeksmachine, Jan 30, 2019)
e3634b1  Backport PR #25039: BUG: avoid usage in_qtconsole for recent IPython …  (meeseeksmachine, Jan 31, 2019)
4f865c5  Backport PR #25024: REGR: fix read_sql delegation for queries on MySQ…  (meeseeksmachine, Jan 31, 2019)
c21d32f  Backport PR #25069: REGR: rename_axis with None should remove axis na…  (meeseeksmachine, Feb 1, 2019)
5cb622a  DOC: 0.24.1 whatsnew (#25027)  (TomAugspurger, Feb 1, 2019)
c397839  Revert "DOC: update DF.set_index (#24762)"  (h-vetinari, Feb 1, 2019)
4a211e9  Revert "API: better error-handling for df.set_index (#22486)"  (h-vetinari, Feb 1, 2019)
103a092  Replace deprecated assert_raises_regex  (h-vetinari, Feb 1, 2019)
8086f39  Re-migrate 0.24.0 extension (.txt -> .rst)  (h-vetinari, Feb 1, 2019)
999295e  Re-add docstring clarifications  (h-vetinari, Feb 1, 2019)
c24df00  Backport PR #25063: API: change Index set ops sort=True -> sort=None …  (meeseeksmachine, Feb 1, 2019)
627b17a  trigger azure  (TomAugspurger, Feb 1, 2019)
bc405ce  Backport PR #25084: DOC: Cleanup 0.24.1 whatsnew (#25086)  (meeseeksmachine, Feb 2, 2019)
02db6ec  Backport PR #25026: DOC: Start 0.24.2.rst (#25073)  (meeseeksmachine, Feb 2, 2019)
ff34d2e  trigger azure  (TomAugspurger, Feb 2, 2019)
2aa800c  Merge remote-tracking branch 'upstream/0.24.x' into revert_set_index  (h-vetinari, Feb 3, 2019)
330b343  Keep all tests from #24984; xfail where necessary  (h-vetinari, Feb 3, 2019)
24a4df4  Merge remote-tracking branch 'origin/revert_set_index' into revert_se…  (h-vetinari, Feb 3, 2019)
963a813  Remove stray debugging line  (h-vetinari, Feb 3, 2019)
4db4849  Add whatsnew  (h-vetinari, Feb 3, 2019)
8c913c2  Merge remote-tracking branch 'upstream/master' into h-vetinari-revert…  (jorisvandenbossche, Feb 3, 2019)
5a6cc73  Merge remote-tracking branch 'upstream/master' into revert_set_index  (h-vetinari, Feb 3, 2019)
ff62753  Re-add reverted 0.24.0 whatsnew  (h-vetinari, Feb 3, 2019)
65c7880  Re-add handling for duplicate drops  (h-vetinari, Feb 3, 2019)
2 changes: 0 additions & 2 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1195,8 +1195,6 @@ Other API Changes
 - :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
 - :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
 - :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
-- :meth:`DataFrame.set_index` now gives a better (and less frequent) KeyError, raises a ``ValueError`` for incorrect types,
-  and will not fail on duplicate column names with ``drop=True``. (:issue:`22484`)
 - Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
 - :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
 - :meth:`Series.searchsorted`, when supplied a scalar value to search for, now returns a scalar instead of an array (:issue:`23801`).
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.1.rst
@@ -58,6 +58,7 @@ Fixed Regressions
 - Fixed regression in :func:`merge` when merging an empty ``DataFrame`` with multiple timezone-aware columns on one of the timezone-aware columns (:issue:`25014`).
 - Fixed regression in :meth:`Series.rename_axis` and :meth:`DataFrame.rename_axis` where passing ``None`` failed to remove the axis name (:issue:`25034`)
 - Fixed regression in :func:`to_timedelta` with `box=False` incorrectly returning a ``datetime64`` object instead of a ``timedelta64`` object (:issue:`24961`)
+- Fixed regression where custom hashable types could not be used as column keys in :meth:`DataFrame.set_index` (:issue:`24969`)

 .. _whatsnew_0241.bug_fixes:

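For illustration, the regression named in the 0.24.1 entry concerns column labels that are arbitrary hashable objects. The following is a minimal sketch, assuming pandas 0.24.1 or later; `Color` is a hypothetical label class invented for this example and is not part of pandas.

```python
import pandas as pd


class Color:
    """Hypothetical custom label type; hashable via the default object hash."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<Color {!r}>".format(self.name)


red, blue = Color("red"), Color("blue")
df = pd.DataFrame({red: [0, 1], blue: [2, 3]})

# The custom object is treated as a column label, directly or inside a list.
direct = df.set_index(blue)
in_list = df.set_index([blue])

print(direct.index.name, list(direct.columns))
```

In both calls the `blue` column becomes the index and is dropped from the columns, since `drop=True` is the default.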
58 changes: 19 additions & 39 deletions pandas/core/frame.py
@@ -71,7 +71,7 @@
     is_iterator,
     is_sequence,
     is_named_tuple)
-from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass, ABCMultiIndex
+from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass
 from pandas.core.dtypes.missing import isna, notna

 from pandas.core import algorithms
@@ -4138,33 +4138,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
         4 16 10 2014 31
         """
         inplace = validate_bool_kwarg(inplace, 'inplace')
-
-        err_msg = ('The parameter "keys" may be a column key, one-dimensional '
-                   'array, or a list containing only valid column keys and '
-                   'one-dimensional arrays.')
-
-        if (is_scalar(keys) or isinstance(keys, tuple)
-                or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))):
-            # make sure we have a container of keys/arrays we can iterate over
-            # tuples can appear as valid column keys!
+        if not isinstance(keys, list):
             keys = [keys]
-        elif not isinstance(keys, list):
-            raise ValueError(err_msg)
-
-        missing = []
-        for col in keys:
-            if (is_scalar(col) or isinstance(col, tuple)):
-                # if col is a valid column key, everything is fine
-                # tuples are always considered keys, never as list-likes
-                if col not in self:
-                    missing.append(col)
-            elif (not isinstance(col, (ABCIndexClass, ABCSeries,
-                                       np.ndarray, list))
-                  or getattr(col, 'ndim', 1) > 1):
-                raise ValueError(err_msg)
-
-        if missing:
-            raise KeyError('{}'.format(missing))

         if inplace:
             frame = self
@@ -4175,31 +4150,37 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
         names = []
         if append:
             names = [x for x in self.index.names]
-            if isinstance(self.index, ABCMultiIndex):
+            if isinstance(self.index, MultiIndex):
                 for i in range(self.index.nlevels):
                     arrays.append(self.index._get_level_values(i))
             else:
                 arrays.append(self.index)

         to_remove = []
         for col in keys:
-            if isinstance(col, ABCMultiIndex):
-                for n in range(col.nlevels):
+            if isinstance(col, MultiIndex):
+                # append all but the last column so we don't have to modify
+                # the end of this loop
+                for n in range(col.nlevels - 1):
                     arrays.append(col._get_level_values(n))
+
+                level = col._get_level_values(col.nlevels - 1)
                 names.extend(col.names)
-            elif isinstance(col, (ABCIndexClass, ABCSeries)):
-                # if Index then not MultiIndex (treated above)
-                arrays.append(col)
+            elif isinstance(col, Series):
+                level = col._values
                 names.append(col.name)
+            elif isinstance(col, Index):
+                level = col
+                names.append(col.name)
-            elif isinstance(col, (list, np.ndarray)):
-                arrays.append(col)
+            elif isinstance(col, (list, np.ndarray, Index)):
+                level = col
                 names.append(None)
-            # from here, col can only be a column label
             else:
-                arrays.append(frame[col]._values)
+                level = frame[col]._values
                 names.append(col)
                 if drop:
                     to_remove.append(col)
+            arrays.append(level)

         index = ensure_index_from_sequences(arrays, names)
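
The branches in this hunk decide which name each key contributes to the new index: a column label contributes itself, a `Series` or `Index` contributes its `name`, and a list or array contributes `None`. Below is a small sketch of that observable behaviour, assuming a pandas version that follows this dispatch; the frame and key values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["x", "y"], "B": [1, 2]})

# One key of each kind: column label, named Series, bare array.
keys = ["A", pd.Series([10, 20], name="s"), np.array([0.5, 1.5])]
result = df.set_index(keys)

print(list(result.index.names))  # names contributed by each key
print(list(result.columns))      # only the column label is dropped
```

Only `"A"` is removed from the columns, because only column labels are collected in `to_remove`.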

@@ -4208,8 +4189,7 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
             raise ValueError('Index has duplicate keys: {dup}'.format(
                 dup=duplicates))

-        # use set to handle duplicate column names gracefully in case of drop
-        for c in set(to_remove):
+        for c in to_remove:
             del frame[c]

         # clear up memory usage
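This hunk removes the `set(to_remove)` de-duplication, so a label passed twice with `drop=True` is deleted twice. The commit list includes a later "Re-add handling for duplicate drops" commit, which is outside the 31 commits shown in this diff. A rough sketch of the failure mode follows, using a plain dict as a stand-in for the frame; `drop_columns` is a hypothetical helper, and the error pandas itself raises may differ.

```python
def drop_columns(frame, to_remove, dedupe):
    """Delete each label in to_remove from a dict standing in for a DataFrame."""
    labels = set(to_remove) if dedupe else to_remove
    for label in labels:
        del frame[label]
    return frame


# With de-duplication the label is deleted exactly once.
ok = drop_columns({"A": [1, 2], "B": [3, 4]}, ["A", "A"], dedupe=True)

# Without it, the second delete targets a label that is already gone.
try:
    drop_columns({"A": [1, 2], "B": [3, 4]}, ["A", "A"], dedupe=False)
    failed = False
except KeyError:
    failed = True

print(ok, failed)
```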
121 changes: 116 additions & 5 deletions pandas/tests/frame/test_alter_axes.py
@@ -193,6 +193,11 @@ def test_set_index_pass_arrays_duplicate(self, frame_of_index_cols, drop,
         df.index.name = index_name

         keys = [box1(df['A']), box2(df['A'])]
+
+        # == gives ambiguous Boolean for Series
+        if drop and keys[0] is 'A' and keys[1] is 'A':
+            pytest.xfail(reason='broken due to reversion, see GH 25085')
+
         result = df.set_index(keys, drop=drop, append=append)

         # need to adapt first drop for case that both keys are 'A' --
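
The added guard compares with `is 'A'` because a key may be a Series, for which `==` returns an element-wise result whose truth value is ambiguous. Identity comparison against a string literal relies on string interning, and Python 3.8 and later emit a `SyntaxWarning` for it. Here is a sketch of the problem and one possible alternative; `is_label_a` is a hypothetical helper and not part of this PR.

```python
import pandas as pd

key = pd.Series(["A", "B"])

try:
    if key == "A":  # element-wise comparison yields a boolean Series
        pass
    ambiguous = False
except ValueError:
    # "The truth value of a Series is ambiguous ..."
    ambiguous = True


def is_label_a(key):
    """Check the type first, so == is only evaluated on plain strings."""
    return isinstance(key, str) and key == "A"


print(ambiguous, is_label_a("A"), is_label_a(key))
```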
@@ -253,23 +258,129 @@ def test_set_index_raise_keys(self, frame_of_index_cols, drop, append):
             df.set_index(['A', df['A'], tuple(df['A'])],
                          drop=drop, append=append)

+    @pytest.mark.xfail(reason='broken due to revert, see GH 25085')
     @pytest.mark.parametrize('append', [True, False])
     @pytest.mark.parametrize('drop', [True, False])
-    @pytest.mark.parametrize('box', [set, iter])
+    @pytest.mark.parametrize('box', [set, iter, lambda x: (y for y in x)],
+                             ids=['set', 'iter', 'generator'])
     def test_set_index_raise_on_type(self, frame_of_index_cols, box,
                                      drop, append):
[Review comment (Contributor): could leave these tests and just xfail them.]
         df = frame_of_index_cols

         msg = 'The parameter "keys" may be a column key, .*'
-        # forbidden type, e.g. set/tuple/iter
-        with pytest.raises(ValueError, match=msg):
+        # forbidden type, e.g. set/iter/generator
+        with pytest.raises(TypeError, match=msg):
             df.set_index(box(df['A']), drop=drop, append=append)

-        # forbidden type in list, e.g. set/tuple/iter
-        with pytest.raises(ValueError, match=msg):
+        # forbidden type in list, e.g. set/iter/generator
+        with pytest.raises(TypeError, match=msg):
             df.set_index(['A', df['A'], box(df['A'])],
                          drop=drop, append=append)

+    def test_set_index_custom_label_type(self):
+        # GH 24969
+
+        class Thing(object):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+            # necessary for pretty KeyError
+            __repr__ = __str__
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing('Three', 'pink')
+        msg = "<Thing 'Three'>"
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_hashable_iterable(self):
+        # GH 24969
+
+        # actual example discussed in GH 24984 was e.g. for shapely.geometry
+        # objects (e.g. a collection of Points) that can be both hashable and
+        # iterable; using frozenset as a stand-in for testing here
+
+        class Thing(frozenset):
+            # need to stabilize repr for KeyError (due to random order in sets)
+            def __repr__(self):
+                tmp = sorted(list(self))
+                # double curly brace prints one brace in format string
+                return "frozenset({{{}}})".format(', '.join(map(repr, tmp)))
+
+        thing1 = Thing(['One', 'red'])
+        thing2 = Thing(['Two', 'blue'])
+        df = DataFrame({thing1: [0, 1], thing2: [2, 3]})
+        expected = DataFrame({thing1: [0, 1]},
+                             index=Index([2, 3], name=thing2))
+
+        # use custom label directly
+        result = df.set_index(thing2)
+        tm.assert_frame_equal(result, expected)
+
+        # custom label wrapped in list
+        result = df.set_index([thing2])
+        tm.assert_frame_equal(result, expected)
+
+        # missing key
+        thing3 = Thing(['Three', 'pink'])
+        msg = '.*'  # due to revert, see GH 25085
+        with pytest.raises(KeyError, match=msg):
+            # missing label directly
+            df.set_index(thing3)
+
+        with pytest.raises(KeyError, match=msg):
+            # missing label in list
+            df.set_index([thing3])
+
+    def test_set_index_custom_label_type_raises(self):
+        # GH 24969
+
+        # purposefully inherit from something unhashable
+        class Thing(set):
+            def __init__(self, name, color):
+                self.name = name
+                self.color = color
+
+            def __str__(self):
+                return "<Thing %r>" % (self.name,)
+
+        thing1 = Thing('One', 'red')
+        thing2 = Thing('Two', 'blue')
+        df = DataFrame([[0, 2], [1, 3]], columns=[thing1, thing2])
+
+        msg = 'unhashable type.*'
+
+        with pytest.raises(TypeError, match=msg):
+            # use custom label directly
+            df.set_index(thing2)
+
+        with pytest.raises(TypeError, match=msg):
+            # custom label wrapped in list
+            df.set_index([thing2])
+
     def test_construction_with_categorical_index(self):
         ci = tm.makeCategoricalIndex(10)
         ci.name = 'B'
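The hashable-and-iterable test case is the crux of the regression: an object can be a valid dictionary or column key and an iterable at the same time, so type inspection alone cannot tell a label from a list-like. The standard-library sketch below mirrors the `frozenset` stand-in used in the tests and involves no pandas; the `columns` dict is only an illustrative substitute for column lookup.

```python
from collections.abc import Hashable, Iterable


class Thing(frozenset):
    """Stand-in for a label that is both hashable and iterable."""

    def __repr__(self):
        # Sort members so the repr does not depend on set iteration order.
        return "frozenset({{{}}})".format(", ".join(map(repr, sorted(self))))


label = Thing(["Two", "blue"])
columns = {label: [2, 3]}  # usable as a dict key because it is hashable

print(isinstance(label, Hashable), isinstance(label, Iterable))
print(repr(label))
```

Both `isinstance` checks succeed, which is why a rule such as "iterables are never labels" would reject keys like this one.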