[DOC] execution order in .when() chain #7725

Closed
stevenlis opened this issue Mar 23, 2023 · 5 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@stevenlis

Problem description

I ran into a situation that I think should be mentioned in the docs:

import polars as pl
df = pl.DataFrame(
    {'a': ['T', 'T', 'T', 'T', None],
     'b': ['D', 'D', 'D', 'F', None]}
)
shape: (5, 2)
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ T    ┆ D    │
│ T    ┆ D    │
│ T    ┆ D    │
│ T    ┆ F    │
│ null ┆ null │
└──────┴──────┘

I want the new column to be T when a is T, and F when b is F. There is a conflict in the fourth row, where a is T and b is F at the same time:

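df.with_columns(
    pl
    # pl.lit marks the values as string literals; recent Polars versions
    # parse a bare string passed to then()/otherwise() as a column name.
    .when(pl.col('b') == 'F').then(pl.lit('F'))
    .when(pl.col('a') == 'T').then(pl.lit('T'))
    .otherwise(pl.lit('C')).alias('c')
)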
shape: (5, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ str │
╞══════╪══════╪═════╡
│ T    ┆ D    ┆ T   │
│ T    ┆ D    ┆ T   │
│ T    ┆ D    ┆ T   │
│ T    ┆ F    ┆ F   │
│ null ┆ null ┆ C   │
└──────┴──────┴─────┘

But if I change the order of the when() chain, the fourth row gets a different value:

df.with_columns(
    pl
    .when(pl.col('a') == 'T').then(pl.lit('T'))
    .when(pl.col('b') == 'F').then(pl.lit('F'))
    .otherwise(pl.lit('C')).alias('c')
)
shape: (5, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ str  ┆ str  ┆ str │
╞══════╪══════╪═════╡
│ T    ┆ D    ┆ T   │
│ T    ┆ D    ┆ T   │
│ T    ┆ D    ┆ T   │
│ T    ┆ F    ┆ T   │
│ null ┆ null ┆ C   │
└──────┴──────┴─────┘

If this is the expected behavior, and the result is meant to depend on the order of the chain, I think it should be mentioned in the docs. It is also somewhat counterintuitive: a later expression does not override the result of an earlier one; instead, the earliest matching branch determines the final value.

stevenlis added the enhancement label on Mar 23, 2023
@avimallu
Contributor

avimallu commented Mar 23, 2023

I think it feels counterintuitive because you are reading when, then, otherwise as equivalent to this (pseudo-code):

IF column A = "T":
  C = "T"
IF column B = "F":
  C = "F"
ELSE
  C = "C"

when it is actually more like:

IF column A = "T":
  C = "T"
ELSE IF column B = "F":
  C = "F"
ELSE
  C = "C"

In the first case, the second IF statement is checked regardless of whether the first one matched. In the second case, the second IF statement is checked only if the first one was FALSE. The when, then, otherwise syntax is inspired by SQL's CASE WHEN ... THEN ... ELSE ... END.
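To make the difference concrete, here is a minimal plain-Python sketch of the two readings, applied to the five rows from the issue. It only models the semantics and is not how Polars evaluates the expression; the function names `independent_ifs` and `first_match` are made up for illustration, and `independent_ifs` models the intended "later match overwrites" reading (with "C" as the default) rather than the pseudo-code literally.

```python
# Plain-Python model of the two readings; this is not Polars code.
# Each tuple is (a, b) for one row of the DataFrame in the issue.
rows = [("T", "D"), ("T", "D"), ("T", "D"), ("T", "F"), (None, None)]


def independent_ifs(a, b):
    # Reading 1: every IF is checked, so a later match overwrites an earlier one.
    c = "C"
    if a == "T":
        c = "T"
    if b == "F":
        c = "F"
    return c


def first_match(a, b):
    # Reading 2 (if / elif / else): the first true condition wins.
    if a == "T":
        return "T"
    elif b == "F":
        return "F"
    else:
        return "C"


overwrite_result = [independent_ifs(a, b) for a, b in rows]
chain_result = [first_match(a, b) for a, b in rows]

print(overwrite_result)  # ['T', 'T', 'T', 'F', 'C']
print(chain_result)      # ['T', 'T', 'T', 'T', 'C']
```

The second list matches column c in the output of the chain that tests a first, which is the if / elif behavior described above.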

@stevenlis
Author

@avimallu I see. Thanks for the explanation. So it works like if...elif; that's easy to understand.

@zundertj
Collaborator

The documentation is indeed lacking here. I have made an attempt to clarify this (and the fact that otherwise is optional) in #7793. Feedback welcome.
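As a rough illustration of both points (branch order and the optional otherwise), the chain can be modelled in plain Python as an ordered list of (predicate, value) pairs with an optional default. The helper `eval_chain` and its parameters are hypothetical and not part of Polars; the sketch assumes that a row matching no branch becomes null (modelled here as `None`) when no otherwise is given.

```python
def eval_chain(row, branches, otherwise=None):
    """Return the value of the first branch whose predicate is true for `row`.

    `branches` is an ordered list of (predicate, value) pairs; `otherwise`
    is the fallback, defaulting to None to model a null result.
    """
    for predicate, value in branches:
        if predicate(row):
            return value
    return otherwise


# Same conditions as in the issue, with the test on column a first.
branches = [
    (lambda r: r["a"] == "T", "T"),
    (lambda r: r["b"] == "F", "F"),
]
rows = [{"a": "T", "b": "F"}, {"a": None, "b": None}]

with_default = [eval_chain(r, branches, otherwise="C") for r in rows]
without_default = [eval_chain(r, branches) for r in rows]

print(with_default)     # ['T', 'C']
print(without_default)  # ['T', None]
```

Reversing `branches` would make the first row evaluate to "F", which mirrors the two orderings shown in the issue description.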

@stevenlis
Author

@zundertj could you also take a look at this one: #7794?

@stevenlis
Author

#7793 (comment)
