Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add ignore_nulls for pl.concat_str #13877

Merged
merged 1 commit into from
Jan 21, 2024

Conversation

reswqa
Copy link
Collaborator

@reswqa reswqa commented Jan 20, 2024

I set the default value of ignore_nulls to False so that we can keep the same behavior as before. We could change it to True in the breaking release then.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jan 20, 2024
@@ -839,14 +839,14 @@ impl SQLFunctionVisitor<'_> {
Concat => if function.args.is_empty() {
polars_bail!(InvalidOperation: "Invalid number of arguments for Concat: 0");
} else {
self.visit_variadic(|exprs: &[Expr]| concat_str(exprs, ""))
self.visit_variadic(|exprs: &[Expr]| concat_str(exprs, "", true))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By offline discuss with Alex. This SQL part change the behavior of Concat and ConcatWS, but they didn't match the SQL standard before indeed. So this changes should be treated as a bug-fix.

options,
},
_,
) => {
if sep.is_empty() {
if sep.is_empty() && !ignore_nulls {
Copy link
Collaborator Author

@reswqa reswqa Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For concat + str and str + concat, we can only simplify it if concat propagate null values, otherwise it will change the behavior of None + str = None.

For example, the following result should be a Series with None:

pl.select(
      pl.lit(None, dtype=pl.String) 
     + pl.concat_str(pl.lit("a"), pl.lit("b"), ignore_nulls=True)
)

Comment on lines +66 to +69
"c2": ["a-d", "e", "c-f"],
"c3": ["aad", "e", "ccf"],
"c4": ["ad2", "e4", "cf6"],
"c5": ["a:d:1", "e:2", "c:f:3"],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexander-beedie, I think that's the kind of behavior we want, right?

@mcrumiller
Copy link
Contributor

Is there an issue with a discusson related to this? I never realized that having a single null value wipes your entire concat_str string:

>>> pl.select(pl.concat_str(pl.lit("a"), None, pl.lit("b")))
shape: (1, 1)
┌─────────┐
│ literal │
│ ---     │
│ str     │
╞═════════╡
│ null    │
└─────────┘

I would prefer a null_str='' argument, which allows for cases like:

>>> pl.select(pl.concat_str(col("last_name"), col("first_name"), null_str="<missing>", separator=", "))
shape: (1, 1)
┌──────────────────┐
│ literal          │
│ ---              │
│ str              │
╞══════════════════╡
│ Smith, John| Price, <missing> |
└──────────────────┘

I suppose in the meantime one could do col(x).fill_null("<missing>"). Maybe this has to do with aligning with SQL behavior, which I'm unsure how exactly that works.

@reswqa
Copy link
Collaborator Author

reswqa commented Jan 20, 2024

I never realized that having a single null value wipes your entire concat_str string.

Unfortunately, this is the default behavior right now, but I want to change it in the next breaking release.

Is there an issue with a discusson related to this?

Yes, we have #13633.

I would prefer a null_str='' argument, which allows for cases like:

I tend to explicit use col(x).fill_null(), not by an extra keyword argument. ignore/propagate nulls are more basic behaviors that the engine should do.

@mcrumiller
Copy link
Contributor

@reswqa makes sense, thanks!

@ritchie46 ritchie46 merged commit 4ec24a9 into pola-rs:main Jan 21, 2024
27 checks passed
r-brink pushed a commit to r-brink/polars that referenced this pull request Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants