feat: Add `ignore_nulls` for `pl.concat_str` #13877
@@ -839,14 +839,14 @@ impl SQLFunctionVisitor<'_> {
 Concat => if function.args.is_empty() {
 polars_bail!(InvalidOperation: "Invalid number of arguments for Concat: 0");
 } else {
-self.visit_variadic(|exprs: &[Expr]| concat_str(exprs, ""))
+self.visit_variadic(|exprs: &[Expr]| concat_str(exprs, "", true))
After an offline discussion with Alex: this SQL part changes the behavior of `Concat` and `ConcatWS`, but they did not match the SQL standard before. So this change should be treated as a bug fix.
 options,
 },
 _,
 ) => {
-if sep.is_empty() {
+if sep.is_empty() && !ignore_nulls {
For `concat + str` and `str + concat`, we can only simplify the expression if `concat` propagates null values; otherwise the simplification would change the behavior of `None + str = None`.
For example, the following result should be a Series containing `None`:
pl.select(
pl.lit(None, dtype=pl.String)
+ pl.concat_str(pl.lit("a"), pl.lit("b"), ignore_nulls=True)
)
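The reasoning above can be sketched in plain Python. This is only a model of the null semantics described in this comment, not Polars code: `concat_str_model` and `add_model` are hypothetical helpers written for illustration.

```python
from typing import Optional, Sequence


def concat_str_model(
    values: Sequence[Optional[str]],
    separator: str = "",
    ignore_nulls: bool = False,
) -> Optional[str]:
    """Hypothetical model of concat_str for a single row."""
    if ignore_nulls:
        # Nulls are skipped instead of propagated.
        return separator.join(v for v in values if v is not None)
    if any(v is None for v in values):
        # A single null makes the whole result null.
        return None
    return separator.join(values)


def add_model(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Hypothetical model of string `+`: null on either side gives null."""
    if left is None or right is None:
        return None
    return left + right


# The expression as written: None + concat_str("a", "b", ignore_nulls=True)
unsimplified = add_model(None, concat_str_model(["a", "b"], ignore_nulls=True))

# What a naive simplification would compute by folding the left operand
# into the concat: concat_str(None, "a", "b", ignore_nulls=True)
flattened = concat_str_model([None, "a", "b"], ignore_nulls=True)

print(unsimplified)  # None
print(flattened)     # ab
```

With `ignore_nulls=False` both forms yield `None`, which is why the simplification is only safe in that case.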
"c2": ["a-d", "e", "c-f"],
"c3": ["aad", "e", "ccf"],
"c4": ["ad2", "e4", "cf6"],
"c5": ["a:d:1", "e:2", "c:f:3"],
@alexander-beedie, I think that's the kind of behavior we want, right?
Is there an issue with a discussion related to this? I never realized that having a single null value wipes your entire
>>> pl.select(pl.concat_str(pl.lit("a"), None, pl.lit("b")))
shape: (1, 1)
┌─────────┐
│ literal │
│ --- │
│ str │
╞═════════╡
│ null │
└─────────┘
I would prefer a
>>> pl.select(pl.concat_str(col("last_name"), col("first_name"), null_str="<missing>", separator=", "))
shape: (1, 1)
┌──────────────────┐
│ literal │
│ --- │
│ str │
╞══════════════════╡
│ Smith, John │
| Price, <missing> |
└──────────────────┘
I suppose in the meantime one could do
Unfortunately, this is the default behavior right now, but I want to change it in the next breaking release.
Yes, we have #13633.
I tend to explicitly use
@reswqa makes sense, thanks!
I set the default value of `ignore_nulls` to `False` so that we can keep the same behavior as before. We could change it to `True` in the breaking release then.
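To make the effect of the default concrete, here is a small plain-Python sketch of the two modes applied column-wise. It is a model only, not the Polars implementation: `concat_columns` is a hypothetical helper, and the input columns are assumed for illustration (they are not taken from the PR's test file).

```python
from typing import List, Optional, Sequence


def concat_columns(
    columns: Sequence[Sequence[Optional[str]]],
    separator: str = "",
    ignore_nulls: bool = False,  # mirrors the default chosen in this PR
) -> List[Optional[str]]:
    """Hypothetical row-wise model of concatenating string columns."""
    out: List[Optional[str]] = []
    for row in zip(*columns):
        if ignore_nulls:
            # Skip nulls and join whatever is left in the row.
            out.append(separator.join(v for v in row if v is not None))
        elif any(v is None for v in row):
            # Default: one null in the row nulls the whole result.
            out.append(None)
        else:
            out.append(separator.join(row))
    return out


# Assumed example input: one null in the second row of the first column.
a = ["a", None, "c"]
b = ["d", "e", "f"]
c = ["1", "2", "3"]

default_result = concat_columns([a, b, c], separator=":")
skip_result = concat_columns([a, b, c], separator=":", ignore_nulls=True)

print(default_result)  # ['a:d:1', None, 'c:f:3']
print(skip_result)     # ['a:d:1', 'e:2', 'c:f:3']
```

Keeping `False` as the default preserves the first result for existing callers; switching the default to `True` later would produce the second.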