feat(rust, python): expressify `offset` and `length` parameters for `str.slice` #12071
let offset = offset.cast(&DataType::Int64)?;
let offset = offset.i64()?;

let length = match s2.len() {
I don't want to add broadcasting by extending memory. See the discussion in #11900.
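To illustrate the concern, here is a minimal sketch of broadcasting a unit-length input without materializing a repeated copy of it. It uses plain slices rather than Polars types, and `substring`, `get_broadcast`, and `slice_all` are hypothetical names invented for this example. The offset semantics (negative offsets counting from the end, measured in characters) are an assumption, not something this thread confirms.

```rust
/// Hypothetical character-based substring helper.
/// Assumption: a negative `offset` counts from the end of the string.
fn substring(val: &str, offset: i64, length: u64) -> String {
    let n = val.chars().count() as i64;
    // Clamp the start position into the range 0..=n.
    let start = if offset < 0 { (n + offset).max(0) } else { offset.min(n) };
    val.chars().skip(start as usize).take(length as usize).collect()
}

/// Reads element `i`, treating a length-1 slice as a scalar that applies
/// to every row. No repeated buffer is allocated.
fn get_broadcast<T: Copy>(xs: &[T], i: usize) -> T {
    if xs.len() == 1 { xs[0] } else { xs[i] }
}

/// Applies `substring` row by row; `offsets` and `lengths` must each hold
/// either one value or one value per row.
fn slice_all(vals: &[&str], offsets: &[i64], lengths: &[u64]) -> Result<Vec<String>, String> {
    let n = vals.len();
    for (name, len) in [("offset", offsets.len()), ("length", lengths.len())] {
        if len != 1 && len != n {
            return Err(format!("`{}` has {} values, expected 1 or {}", name, len, n));
        }
    }
    Ok((0..n)
        .map(|i| substring(vals[i], get_broadcast(offsets, i), get_broadcast(lengths, i)))
        .collect())
}
```

Calling `slice_all(&["polars", "rust"], &[1], &[3, 2])` reuses the single offset for both rows instead of first expanding it to a two-element buffer.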
Ah! Excellent, thanks @ritchie46 - that explains the need for the added length checks / fallback logic.
I'll switch it around.
binary_elementwise(ca, length, |opt_str_val, opt_length| utf8_substring_ternary(opt_str_val, offset.get(0), opt_length))

Does anybody know what I'm doing wrong here?

error: implementation of `FnOnce` is not general enough
  --> /Users/user/git/polars/crates/polars-ops/src/chunked_array/strings/substring.rs:63:5
   |
63 | binary_elementwise(ca, length, |opt_str_val, opt_length| utf8_substring_ternary(opt_str_val, offset.get(0), opt_length))
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `FnOnce` is not general enough
   |
   = note: closure with signature `fn(std::option::Option<&'2 str>, std::option::Option<u64>) -> std::option::Option<&str>` must implement `FnOnce<(std::option::Option<&'1 str>, std::option::Option<u64>)>`, for any lifetime `'1`...
   = note: ...but it actually implements `FnOnce<(std::option::Option<&'2 str>, std::option::Option<u64>)>`, for some specific lifetime `'2`
@cmdlineluser You're not doing anything wrong, the compiler is just inferring stuff wrong here. Let me check if I can rework this...
@cmdlineluser Unfortunately this is one place where we must help the compiler a bit. This works:

match (offset.len(), length.len()) {
    (1, 1) => ca.apply_generic(|opt_str_val| {
        utf8_substring_ternary(opt_str_val, offset.get(0), length.get(0))
    }),
    (1, _) => {
        fn infer<F: for<'a> FnMut(Option<&'a str>, Option<u64>) -> Option<&'a str>>(f: F) -> F {
            f
        }
        let off = offset.get(0);
        binary_elementwise(
            ca,
            length,
            infer(|val, len| utf8_substring_ternary(val, off, len)),
        )
    },
    (_, 1) => {
        fn infer<F: for<'a> FnMut(Option<&'a str>, Option<i64>) -> Option<&'a str>>(f: F) -> F {
            f
        }
        let len = length.get(0);
        binary_elementwise(
            ca,
            offset,
            infer(|val, off| utf8_substring_ternary(val, off, len)),
        )
    },
    _ => ternary_elementwise(ca, offset, length, utf8_substring_ternary),
}
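The `infer` helper above is an identity function whose only job is to give the closure an explicit higher-ranked signature (`for<'a>`), so the returned `&str` is tied to the lifetime of the input `&str`. A standalone sketch of the same mechanics follows. `take_prefix` and `apply_pairs` are hypothetical stand-ins for `utf8_substring_ternary` and `binary_elementwise`, not the real Polars functions, and in this simplified setting the closure would likely compile without `infer` as well, because `apply_pairs` spells out its bound directly.

```rust
// Hypothetical stand-in for the kernel: returns at most `len` leading bytes
// of `val`, borrowed from `val` (yields None if the cut is not on a char boundary).
fn take_prefix<'a>(val: Option<&'a str>, len: Option<u64>) -> Option<&'a str> {
    let s = val?;
    let n = (len? as usize).min(s.len());
    s.get(..n)
}

// Hypothetical stand-in for an elementwise driver: it requires a closure
// that works for every input lifetime 'a, not just one specific lifetime.
fn apply_pairs<F>(vals: &[Option<&str>], lens: &[Option<u64>], mut f: F) -> Vec<Option<String>>
where
    F: for<'a> FnMut(Option<&'a str>, Option<u64>) -> Option<&'a str>,
{
    vals.iter()
        .zip(lens)
        .map(|(v, l)| f(*v, *l).map(str::to_owned))
        .collect()
}

// Identity function: passing a closure through it makes the compiler check
// the closure against the higher-ranked signature in the bound.
fn infer<F>(f: F) -> F
where
    F: for<'a> FnMut(Option<&'a str>, Option<u64>) -> Option<&'a str>,
{
    f
}

fn demo() -> Vec<Option<String>> {
    let vals = [Some("polars"), None, Some("rust")];
    let lens: [Option<u64>; 3] = [Some(3), Some(2), None];
    apply_pairs(&vals, &lens, infer(|val, len| take_prefix(val, len)))
}
```

Here `demo()` yields `Some("pol")` for the first row and `None` for the other two, since a missing value or a missing length propagates as `None`.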
Ah, thank you @orlp. So that's what the code at polars/crates/polars-ops/src/chunked_array/strings/namespace.rs (lines 17 to 24 in a366bc9) is for.
@cmdlineluser Yes, that's why. Perhaps we should be using the https://crates.io/crates/higher-order-closure crate for this.
It's a real pain fighting the compiler here, it would be awesome if this crate could save us. 😞
Hi @cmdlineluser. Thank you for this nice work! If you don't have free time, can I take over and keep pushing it? We have some future work that depends on this one. Of course, I will preserve your authorship. :D
@reswqa Please feel free to take over - thank you.
Superseded by #13747
Attempts to resolve #10890
I see it had been previously self-assigned @orlp - hopefully I'm not intruding here.
(I imagine you had more important things to work on.)
I used the `strip_prefix` impl as a guide/template: https://github.com/pola-rs/polars/blob/main/crates/polars-ops/src/chunked_array/strings/strip.rs

Some things I'm not sure about if anybody wants to give feedback:
- `offset` and `length` using `.clear().extend_constant()` - is this an acceptable approach or is it a bad idea + performance concern?
- `.cast()`?
- `UInt64` the correct type for `length`?
- `offset` but internally `start` is used, I changed everything to `offset`, is this the right approach?
- `polars-sql` as it used `.slice()` (incidentally, that uses `start` as the param name, should it be `start` everywhere instead of `offset`?)
- `!matches!`?
- `length` is a param name, I had to re-jig the error format a bit to avoid ending up with `length value length`.