Optional axis parameter for pl.any, pl.Expr.any, pl.all, pl.Expr.all #8503

Closed
lucazanna opened this issue Apr 25, 2023 · 5 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@lucazanna

lucazanna commented Apr 25, 2023

Problem description

I just read a Stack Overflow answer which shows a concise way to verify equality in Polars:
https://stackoverflow.com/questions/76092263/column-and-row-wise-logical-operations-on-polars-dataframe

I like that the code is short and easy to write:

# Verify which columns contain a specific value
df.select((pl.all() == "?").any())

# Verify which rows contain a specific value
df.select(pl.any(pl.all() == "?"))

While I like the compactness, I feel that on first read it is not very clear that pl.Expr.any computes column-wise while pl.any computes row-wise.
I believe the clarity of Polars expressions is one of the strengths of the library.

My proposal for this:

  • Is it possible to add an optional 'axis' parameter to pl.any, pl.Expr.any, pl.all, and pl.Expr.all that specifies whether the computation should be column-wise or row-wise?
  • In addition, what about aligning these functions to run column-wise by default?
    I believe having pl.any work the same way as pl.Expr.any would be good for the predictability of the language.

What are your thoughts?
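
To make the proposed semantics concrete, here is a plain-Python sketch of what an 'axis' switch could mean. This is not Polars API: the helper `any_equal`, its `axis` argument, and the sample data are all hypothetical and only illustrate the column-wise vs row-wise distinction.

```python
from typing import Dict, List, Union


def any_equal(
    columns: Dict[str, List[str]], value: str, axis: int = 0
) -> Union[Dict[str, bool], List[bool]]:
    """Hypothetical helper: does `value` occur along the given axis?

    axis=0 -> one answer per column (column-wise)
    axis=1 -> one answer per row (row-wise)
    """
    if axis == 0:
        # Reduce each column to a single boolean.
        return {name: any(v == value for v in vals) for name, vals in columns.items()}
    if axis == 1:
        # Walk the columns in lockstep and reduce each row to a boolean.
        return [any(v == value for v in row) for row in zip(*columns.values())]
    raise ValueError("axis must be 0 or 1")


data = {"a": ["x", "?", "y"], "b": ["z", "z", "z"]}
by_column = any_equal(data, "?", axis=0)  # one result per column
by_row = any_equal(data, "?", axis=1)     # one result per row
```

With column-wise as the default, both spellings would behave the same unless the caller explicitly asks for rows.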

@lucazanna lucazanna added the enhancement New feature or an improvement of an existing feature label Apr 25, 2023
@gab23r
Contributor

gab23r commented Apr 25, 2023

If you do that, I am afraid you would need to do the same for every method: min, max, sum, mean...

@cmdlineluser
Contributor

Perhaps there should be an .arr.all() so you could write it similar to the other "row-wise" computations?

It would be slightly more verbose:

df.select(pl.concat_list(pl.all() == "?").arr.all())

@lucazanna
Author

lucazanna commented Apr 25, 2023

Fair point @gab23r

I like @cmdlineluser's idea of concat_list with .arr.any or .arr.all for row-wise computations.

It would be more verbose than what exists today, but it would follow Polars logic and be easy to understand.

The existing pl.any, pl.Expr.any, etc. could then work column-wise.

This also fits the Polars logic that fast code should be easy to write, so column-wise code can be the easier one to write.

If someone is doing lots of row-wise computations, maybe the dataframe should be reworked / transposed / melted.

@cmdlineluser
Contributor

Just browsing old issues, I think .all_horizontal / .any_horizontal #9752 resolves this.

@lucazanna
Author

@cmdlineluser yes, it does. Closing this.


3 participants