API: more consistent error message for MultiIndex.from_arrays #25189

Merged 7 commits on Feb 20, 2019
10 changes: 8 additions & 2 deletions pandas/core/indexes/multi.py
@@ -324,20 +324,26 @@ def from_arrays(cls, arrays, sortorder=None, names=None):
                    codes=[[0, 0, 1, 1], [1, 0, 1, 0]],
                    names=['number', 'color'])
         """
+        error_msg = "Input must be a list / sequence of array-likes."
Member

We could also think about improving the message itself. On first read I interpreted it as "Input must be [a list] or [a sequence of array-likes]", when of course it means "[list or sequence] of array-likes", which confused me.

To be true to the code, what it actually needs to be is a "list-like of list-likes", which is also not that nice to write.
I am wondering whether a stricter error message (stricter than what we actually allow), something like "Input must be a list of arrays", would be easier for users to understand.

Member Author

I was also thinking that the messages should be changed from "Input must be..." to something along the lines of "'arrays' parameter of MultiIndex.from_arrays must be..." and then regurgitate whatever is in the docstring.

Member Author

> I am wondering if a more strict error message (stricter than what we allow), something like "Input must be a list of arrays" is not actually easier to understand for users.

IIUC, a sequence is accepted to provide backward compatibility with zip, so "sequence" does not necessarily need to be mentioned in the docstring.
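
The zip point can be illustrated with a minimal sketch, assuming Python 3, where zip returns a one-shot iterator rather than a list. It uses only the standard library, and the `rows` and `columns` names are illustrative, not taken from the PR. It mirrors the `elif is_iterator(arrays): arrays = list(arrays)` branch in the diff.

```python
from collections.abc import Iterator, Sequence

# Illustrative data: one (number, color) tuple per row.
rows = [(1, 'red'), (1, 'blue'), (2, 'red')]

# zip(*rows) transposes rows into per-level columns.
columns = zip(*rows)

# In Python 3 the result is an iterator, not a sequence, so it has no len().
print(isinstance(columns, Iterator))   # True
print(isinstance(columns, Sequence))   # False

# Materialising it, as from_arrays does for iterators, gives a list of arrays.
arrays = list(columns)
print(arrays)  # [(1, 1, 2), ('red', 'blue', 'red')]
```

After `list(columns)` the iterator is exhausted, which is why the conversion has to happen once, before the length checks.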

         if not is_list_like(arrays):
-            raise TypeError("Input must be a list / sequence of array-likes.")
+            raise TypeError(error_msg)
         elif is_iterator(arrays):
             arrays = list(arrays)

         # Check if lengths of all arrays are equal or not,
         # raise ValueError, if not
         for i in range(1, len(arrays)):
+            if not is_list_like(arrays[i]):
+                raise TypeError(error_msg)
Member

Hmm, what was this doing before, if it was not already raising this message?

Member Author

Before this change, it raised "TypeError: object of type 'int' has no len()" if the element is an int.
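
A rough sketch of the difference, using plain Python rather than pandas. `is_list_like_sketch` is a hypothetical, simplified stand-in for pandas' `is_list_like`, and `check_old`, `check_new`, and `message_of` are illustrative names that copy only the loop from the diff.

```python
ERROR_MSG = "Input must be a list / sequence of array-likes."


def is_list_like_sketch(obj):
    """Hypothetical stand-in: iterable, but not a str or bytes."""
    try:
        iter(obj)
    except TypeError:
        return False
    return not isinstance(obj, (str, bytes))


def check_old(arrays):
    # Before the PR: len() is called on each element without a type check.
    for i in range(1, len(arrays)):
        if len(arrays[i]) != len(arrays[i - 1]):
            raise ValueError('all arrays must be same length')


def check_new(arrays):
    # After the PR: non-list-like elements are rejected with the shared message.
    for i in range(1, len(arrays)):
        if not is_list_like_sketch(arrays[i]):
            raise TypeError(ERROR_MSG)
        if len(arrays[i]) != len(arrays[i - 1]):
            raise ValueError('all arrays must be same length')


def message_of(check, arrays):
    """Return the error message raised by check(arrays), or None."""
    try:
        check(arrays)
    except (TypeError, ValueError) as err:
        return str(err)
    return None


print(message_of(check_old, [[1], 2]))  # object of type 'int' has no len()
print(message_of(check_new, [[1], 2]))  # Input must be a list / sequence of array-likes.
```

Note that the loop starts at index 1, so the first element is never checked here. In the diff that case is covered by the `try`/`except` around `_factorize_from_iterables`.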

             if len(arrays[i]) != len(arrays[i - 1]):
                 raise ValueError('all arrays must be same length')

         from pandas.core.arrays.categorical import _factorize_from_iterables

-        codes, levels = _factorize_from_iterables(arrays)
+        try:
+            codes, levels = _factorize_from_iterables(arrays)
+        except TypeError:
+            raise TypeError(error_msg)
         if names is None:
             names = [getattr(arr, "name", None) for arr in arrays]

4 changes: 1 addition & 3 deletions pandas/tests/indexes/multi/test_constructor.py
@@ -256,9 +256,7 @@ def test_from_arrays_empty():
 @pytest.mark.parametrize('invalid_sequence_of_arrays', [
     1, [1], [1, 2], [[1], 2], 'a', ['a'], ['a', 'b'], [['a'], 'b']])
 def test_from_arrays_invalid_input(invalid_sequence_of_arrays):
-    msg = (r"Input must be a list / sequence of array-likes|"
-           r"Input must be list-like|"
-           r"object of type 'int' has no len\(\)")
+    msg = "Input must be a list / sequence of array-likes"
     with pytest.raises(TypeError, match=msg):
         MultiIndex.from_arrays(arrays=invalid_sequence_of_arrays)
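
The effect of the test change can be sketched without pytest, on the understanding that `pytest.raises(..., match=msg)` applies `re.search` to the string form of the exception. The `raised` list below is illustrative and simply collects the three message texts named in the old pattern.

```python
import re

# Old pattern from the test: any of three messages was accepted.
old_msg = (r"Input must be a list / sequence of array-likes|"
           r"Input must be list-like|"
           r"object of type 'int' has no len\(\)")
# New pattern: only the single, consistent message is accepted.
new_msg = "Input must be a list / sequence of array-likes"

# Illustrative message texts corresponding to the three alternatives.
raised = [
    "Input must be a list / sequence of array-likes.",
    "Input must be list-like",
    "object of type 'int' has no len()",
]

for text in raised:
    old_ok = bool(re.search(old_msg, text))
    new_ok = bool(re.search(new_msg, text))
    print(old_ok, new_ok, text)
```

With the single pattern, the test fails if any invalid input still surfaces one of the two older messages, which is what makes the stricter assertion useful.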
