Inconsistent behaviours due to #6497 #6594
Comments
I think we must set the @alexander-beedie I chime you in on this one. |
Hmm... tricky one; I feel that the intent is stronger when explicitly assigning a multi-output expression to a keyword arg (vs an alias), so I could be argued one way or the other here, though there are obviously clear merits to consistency. Irrespective of that, I agree that if we don't auto-structify multi-output expressions then we should definitely raise an error; I can't think of many (any?) cases where you would really want to silently eat an expression by masking it with a duplicative name. Would suggest we do the following:
|
Sorry ;-)
I do not really share this feeling, since
Perhaps in the context of an expression that has side-effect, but I don't know whether such a use case exists, and I agree with you that it's probably safer to raise in such a case and to require users to explicitly structify the output ("Explicit is better than implicit.") :-)
That looks good. And what about warning users about, or preventing them from, combining **kwargs, alias, suffix and prefix? (I personally prefer a warning to an exception) |
It's not a strongly-held opinion, and it's hard to argue with consistency ;) @ritchie46: I'll assign myself this one and start moving |
I think that's a good idea @alexander-beedie. This behavior needs some landing. I like the idea of it I must say. It really fits nicely in our expression expansion magic. But I want to add this behavior on the rust side and I am not yet ready for the potential bugs adding this brings. First want to finish other stuff. |
Yup, shouldn't be Python-only. I'll make the |
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Hello. First of all, thanks for this really nice library!! ;)
This "issue" is due to the recent change introduced in #6497 that leads to inconsistent outputs.
Reproducible example
Expected behavior
The current behaviour is:
Code A has a single new column "n" whose result is based exclusively on "b";
Code B has a single new column "n" whose result combines "a" and "b" in a struct;
Code C results in a `DuplicateError`;
Code D has a single column "n" whose result combines "a" and "b" in a struct.
B and D are consistent, but A and C aren't.
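The consistency at stake can be illustrated with a small pure-Python toy model. This is only a sketch and not Polars code: `resolve_output_names`, `select_alias` and `select_kwargs` are hypothetical helpers, and `DuplicateError` is a local stand-in class. It assumes the stricter of the two options discussed in this issue, where a keyword name is handled exactly like an alias and a multi-output expression collapsing onto one name raises instead of being silently overwritten or auto-structified.

```python
class DuplicateError(Exception):
    """Local stand-in for the error raised on duplicate output names."""


def resolve_output_names(input_columns, name=None):
    """Return the output column names for an expression over `input_columns`.

    `name` models either `.alias(name)` or a keyword argument `name=expr`.
    Assumption of this sketch: both spellings go through the same rule.
    """
    if name is None:
        # No renaming: each input column keeps its own name.
        return list(input_columns)
    if len(input_columns) > 1:
        # Several outputs would all be called `name`: refuse instead of
        # keeping only the last one or wrapping them in a struct.
        raise DuplicateError(
            f"{len(input_columns)} columns would all be named {name!r}"
        )
    return [name]


def select_alias(input_columns, alias):
    """Toy equivalent of select(expr.alias(alias))."""
    return resolve_output_names(input_columns, name=alias)


def select_kwargs(**named_exprs):
    """Toy equivalent of select(name=expr, ...)."""
    out = []
    for name, input_columns in named_exprs.items():
        out.extend(resolve_output_names(input_columns, name=name))
    return out
```

Under this rule `select_alias(["a"], "n")` and `select_kwargs(n=["a"])` both yield `["n"]`, while passing `["a", "b"]` raises the same error in both spellings, so a struct has to be requested explicitly.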
A, B, C and D are not consistent if I assume that using a keyword-based column is roughly equivalent to using `.alias`, i.e., that `df.select(n=expr)` would be equivalent to `df.select(expr.alias('n'))`. Said differently: I expected A and B to have the same output, and C and D to have the same output.
Suggestions:
Consistent behaviour between `df.select(n=expr)` and `df.select(expr.alias('n'))` (and its `with_columns` counterpart). This could be either a `DuplicateError` or a single column with a `struct` (my preference goes for the former, reducing the risk of silently having a potentially unexpected behaviour, since one can explicitly create a `struct` through the expression if needed);
`DuplicateError` both for A and C for consistency.
Prevent the use of `.prefix` and `.suffix` when `.alias` or a keyword-named column is used (or raise a warning?). Similarly, prevent the use (or raise a warning) when `.alias` is used with a keyword-named column.
To summarize:
`.alias` and `**kwargs`;
`select` and `with_columns` (there are perhaps other candidate methods?);
Installed versions