Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: set should not be considered list_like #23061

Closed
h-vetinari opened this issue Oct 9, 2018 · 7 comments
Closed

API: set should not be considered list_like #23061

h-vetinari opened this issue Oct 9, 2018 · 7 comments

Comments

@h-vetinari
Copy link
Contributor

The name of the function is_list_like from pandas.core.dtypes.common suggests that whatever gets True returned will be "like a list". As such, it's used in a handful of places - e.g. I got asked to use it in #20347, and now again in #22486. What I found out in the latter one is that it's true for sets:

>>> from pandas.core.dtypes.common import is_list_like
>>> is_list_like({1, 2, 3})
True

This has some uncomfortable consequences - for str.cat it's a bug (#23009), and for df.set_index it would be too.

@jreback asked me to try out removing set from is_list_like (#22486 (comment)), so this issue is following up on that.

@h-vetinari why don't you try (separate PR) excluding set from is_list_like and see what the implications of that are.

There are some rare cases like .isin, where sets should also be included. I'd say to have a looser definition is_set_like that's basically is_list_like or set. Alternatively, one could think of is_list_like(strict=False) to include sets.

Another question is if this needs a deprecation cycle and how?

@TomAugspurger
Copy link
Contributor

FWIW, changing these is_* in the past have led to us breaking user-code. I don't think it's possible to have a high-level is_list_like that does the right thing in for every use case.

@jschendel
Copy link
Member

I imagine this issue extends beyond just set, e.g. dict is considered list-like too:

In [2]: d = dict(zip('xyz', [10, 20, 30]))

In [3]: d
Out[3]: {'x': 10, 'y': 20, 'z': 30}

In [4]: is_list_like(d)
Out[4]: True

In [5]: s = pd.Series(list('abc'))

In [6]: s.str.cat(d)
Out[6]:
0    ax
1    by
2    cz
dtype: object

Quickly reading over the associated issues, it looks like what you want is to basically place an additional restriction on is_list_like so unordered objects return False? Given that is_list_like relies on collections.abc.Iterable, it seems like such a change would cut out a significant number of things that were previously considered list-like, and to @TomAugspurger's point, would lead to breaking user code. I wouldn't be opposed to adding a function along the lines of is_ordered_list_like though.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Oct 9, 2018

@jschendel

dict is ordered from 3.6 onwards, but in any case, it could be included in the path I've now chosen to pursue for a first PR:

  • add a strict-kwarg to is_list_like, which defaults to False (raising FutureWarning if not passed), in which case the behaviour is unchanged. For strict=True, sets are excluded.
  • this has the advantage that it can't break things.

@h-vetinari
Copy link
Contributor Author

Added a first draft of a PR in #23065. Happy to add dict (with PY35) to the list of excluded types for strict=True.

@TomAugspurger
Copy link
Contributor

I don't think that's a good idea. In the past changing these have led to surprising breaks in user code. These checks are used all over pandas, so it's hard to predict the ramifications of changing one.

If a user has a set-like object, then what would they do when they see that warning cropping up from their object?

I think if the caller needs to know that the collection is ordered, then they should assert that independently.

@h-vetinari
Copy link
Contributor Author

@TomAugspurger

  • It's not a breaking change
  • I've hunted down all the places where it's used internally
  • No tests break

If a user has a set-like object, then what would they do when they see that warning cropping up from their object?

They're not gonna see it from pandas, unless they from pandas.core.dtypes.common import is_list_like. And then, the warning tells them that sets will not be considered list-like in the future unless strict=False is specified. Can't see how this is ambiguous...?

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Oct 23, 2018

@jreback @gfyoung @WillAyd this issue has been closed by #23065.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants