feat(python): automagically upconvert `with_columns` kwarg expressions with multiple output names to struct; extend `**named_kwargs` support to `select`
#6497
@ritchie46: this is actually really nice behaviour... :) |
Does this work with I think we need to add a function on the rust side of py-polars that will be called if We should also add some extra tests here that use lazy + projection / struct field expansion. Lazy must know the schema at all times. |
Works like a charm for DATA_TYPE ...
df = pl.DataFrame({"x1": [1, 2, 6], "x2": [1, 2, 3]}).with_columns(ints=pl.col(pl.Int64))
# ┌─────┬─────┬───────────┐
# │ x1 ┆ x2 ┆ ints │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ struct[2] │
# ╞═════╪═════╪═══════════╡
# │ 1 ┆ 1 ┆ {1,1} │
# │ 2 ┆ 2 ┆ {2,2} │
# │ 6 ┆ 3 ┆ {6,3} │
# └─────┴─────┴───────────┘
... and for "expanding" calls (with
df = pl.DataFrame({"x1": [1, 2, 6], "x2": [1, 2, 3]}).with_columns(mins=pl.all().min().suffix("_min"))
# ┌─────┬─────┬───────────┐
# │ x1 ┆ x2 ┆ mins │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ struct[2] │
# ╞═════╪═════╪═══════════╡
# │ 1 ┆ 1 ┆ {1,1} │
# │ 2 ┆ 2 ┆ {1,1} │
# │ 6 ┆ 3 ┆ {1,1} │
# └─────┴─────┴───────────┘
df.unnest('mins')
# ┌─────┬─────┬────────┬────────┐
# │ x1 ┆ x2 ┆ x1_min ┆ x2_min │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪════════╪════════╡
# │ 1 ┆ 1 ┆ 1 ┆ 1 │
# │ 2 ┆ 2 ┆ 1 ┆ 1 │
# │ 6 ┆ 3 ┆ 1 ┆ 1 │
# └─────┴─────┴────────┴────────┘
... but doesn't for REGEX (as the expr doesn't know if it will match one or more cols until evaluated). |
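The behaviour described above can be modelled in plain Python to make the rule explicit: a keyword expression is resolved against the schema, and the result becomes a struct only when it expands to more than one output name. This is a minimal sketch, not the polars implementation; `expand`, `with_named`, and the tuple-based selectors are hypothetical names invented for illustration. It also shows why the regex case is harder: the number of matches is unknown until the pattern is evaluated against a concrete schema.

```python
import re

# Hypothetical plain-Python model; none of these names are polars APIs.
def expand(selector, schema):
    """Return the column names a (kind, arg) selector resolves to."""
    kind, arg = selector
    if kind == "all":
        return list(schema)
    if kind == "dtype":
        return [name for name, dtype in schema.items() if dtype == arg]
    if kind == "regex":
        # The match count is only known once the schema is available.
        return [name for name in schema if re.fullmatch(arg, name)]
    raise ValueError(f"unknown selector kind: {kind!r}")

def with_named(columns, schema, **named):
    """Add one output column per keyword; multi-name results become structs."""
    out = dict(columns)
    n_rows = len(next(iter(columns.values()), []))
    for alias, selector in named.items():
        names = expand(selector, schema)
        if not names:
            raise ValueError(f"{alias!r} matched no columns")
        if len(names) == 1:
            # Single output name: behaves like a plain alias.
            out[alias] = list(columns[names[0]])
        else:
            # Multiple output names: one dict ("struct") per row.
            out[alias] = [{n: columns[n][i] for n in names} for i in range(n_rows)]
    return out

columns = {"x1": [1, 2, 6], "x2": [1, 2, 3]}
schema = {"x1": "i64", "x2": "i64"}

ints = with_named(columns, schema, ints=("dtype", "i64"))["ints"]
one = with_named(columns, schema, one=("regex", "^x1$"))["one"]
many = with_named(columns, schema, many=("regex", "^x.$"))["many"]
```

Here `ints` and `many` come out as per-row dicts, while `one` stays a flat list, which mirrors the struct versus plain-column distinction discussed in the thread.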
You read my mind ;) I don't like relying on the |
Then I think we must follow up with that checking function and then we can make it default, I think. |
Side-note: how does one select column names that look like regexes? 🤣
df = pl.DataFrame({"^x.z$": [1, 2, 3]})
# shape: (3, 1)
# ┌───────┐
# │ ^x.z$ │
# │ --- │
# │ i64 │
# ╞═══════╡
# │ 1 │
# │ 2 │
# │ 3 │
# └───────┘
df.select("^x.z$")
# shape: (0, 0)
# ┌┐
# ╞╡
# └┘ |
Those column names are not allowed in polars. ;) Or you must use a regex escape as fallback. The meta utilities are coming up btw. |
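The "regex escape" fallback mentioned above can be sketched with Python's standard `re` module: escape the literal name, then re-add the anchors so the string is still treated as a pattern but only matches that exact name. `literal_pattern` is a hypothetical helper, and whether the regex engine used by polars accepts exactly the same escape sequences is an assumption; the sketch only demonstrates the idea against Python's own engine.

```python
import re

def literal_pattern(name: str) -> str:
    """Build an anchored pattern matching `name` literally (hypothetical helper)."""
    return "^" + re.escape(name) + "$"

awkward = "^x.z$"
pattern = literal_pattern(awkward)

# Unescaped, the name behaves as a regex and matches other strings such as "xyz".
unescaped_hits = [s for s in ["xyz", "^x.z$"] if re.search(awkward, s)]

# Escaped, the pattern matches only the literal column name.
escaped_hits = [s for s in ["xyz", "^x.z$"] if re.search(pattern, s)]
```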
Shall we actively prevent them? Could easily detect/raise inside
Fantastic :) |
Yeah, I could add an assert there. |
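A check of the kind discussed here might look like the following. It is a minimal sketch under stated assumptions: `looks_like_regex` and `check_column_names` are hypothetical names, and the rule "starts with `^` and ends with `$`" is assumed to be what marks a string as a regex, based on the `"^x.z$"` example above. The thread does not say where such a check would live or what it would raise.

```python
def looks_like_regex(name: str) -> bool:
    """Assumed rule: an anchored string is interpreted as a regex."""
    return len(name) >= 2 and name.startswith("^") and name.endswith("$")

def check_column_names(names):
    """Raise if any column name would be ambiguous with a regex selector."""
    bad = [name for name in names if looks_like_regex(name)]
    if bad:
        raise ValueError(f"column names look like regexes: {bad!r}")
    return list(names)

accepted = check_column_names(["x1", "x2", "a^b$c"])
```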
Why not automatically convert an expression with multiple names into a struct when an alias is used, so that:
would work as well? I think this is a nice feature. It could be extended to other functions like |
Hmm... It doesn't feel as intentional?
Update: Done |
One small remark.
Can we also add a structifying example to `with_columns`. This behavior deserves some documentation. :)
A very good point :) |
Done; added an extra docstring example/explanation, squashed, rebased...👌 |
Almost there @alexander-beedie. Maybe I missed it, but I believe we haven't yet added this functionality to |
Ahh, you didn't miss it - apparently I missed that we were extending it there (though it makes perfect sense to do so!). I'll be at the airport shortly - will see if there's time to do it before boarding - and, if not, I'm still hoping it might be possible to do a commit from 10km up 😂 |
Yeah, I want to keep those two consistent. There is no hurry. The 0.16 release will still take a few days. Have a good flight! |
Alright. Here goes! 💯 |
…s with multiple output names to struct; extend `**named_kwargs` support to `select` (pola-rs#6497) Co-authored-by: Ritchie Vink <ritchie46@gmail.com>
Closes #6486, removes the "experimental" status from `with_columns` kwargs, and adds the same capability to `select`.
Example: