
perf: Speed up dt.offset_by 2x for constant durations #16728

Merged: 5 commits from MarcoGorelli:faster-offset-by into pola-rs:main on Jun 6, 2024

Conversation

MarcoGorelli (Collaborator) commented on Jun 4, 2024

closes #16722 (timing becomes in line with the alternative method presented there; I couldn't reproduce the reported 10x difference)

Timing results are available here: https://www.kaggle.com/code/marcogorelli/polars-timing?scriptVersionId=181508652


I think this is fine, but I didn't sleep too well, so I'll check over it again tomorrow before marking it as 'ready for review'.


The gist of the change: in the simple case, keep things simple and use apply_values.
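For illustration, a minimal plain-Rust sketch (no Polars; parse_duration_ms is a made-up stand-in for duration parsing) of the distinction the change draws: with a constant duration the offset can be resolved once up front, while per-row duration strings still have to be parsed element by element.

// Illustrative only; parse_duration_ms is a hypothetical stand-in for Duration parsing.
fn parse_duration_ms(s: &str) -> i64 {
    match s {
        "1d" => 86_400_000,
        "2d" => 172_800_000,
        _ => 0,
    }
}

// Fastpath: the offset is a single constant, resolved once.
fn offset_constant(ts_ms: &[i64], offset: &str) -> Vec<i64> {
    let offset_ms = parse_duration_ms(offset);
    ts_ms.iter().map(|&t| t + offset_ms).collect()
}

// General path: one duration string per row, parsed per element.
fn offset_per_row(ts_ms: &[i64], offsets: &[&str]) -> Vec<i64> {
    ts_ms.iter().zip(offsets).map(|(&t, &s)| t + parse_duration_ms(s)).collect()
}

fn main() {
    let ts = vec![0_i64, 1_000];
    assert_eq!(offset_constant(&ts, "1d"), vec![86_400_000, 86_401_000]);
    assert_eq!(offset_per_row(&ts, &["1d", "2d"]), vec![86_400_000, 172_801_000]);
}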

github-actions bot added the performance (Performance issues or improvements), python (Related to Python Polars), and rust (Related to Rust Polars) labels on Jun 4, 2024
// fastpath!
match datetime.time_unit() {
    TimeUnit::Milliseconds => {
        Ok(datetime.0.apply_values(|v| v + offset.duration_ms()))
mcrumiller (Contributor) commented:

I think this makes the same duration_*s() call for every element, right? You can pull the offset out front and then apply it, as in:

let offset = match datetime.time_unit() {
    TimeUnit::Milliseconds => offset.duration_ms(),
    TimeUnit::Microseconds => offset.duration_us(),
    TimeUnit::Nanoseconds => offset.duration_ns(),
};
Ok(datetime.0.apply_values(|v| v + offset))

MarcoGorelli (Collaborator, author) replied:

thanks @mcrumiller!
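As an aside on the design choice: with the hoist, the time-unit match runs once per column rather than once per element, and the closure handed to apply_values reduces to a bare addition. A toy before/after in plain Rust (one_day is a made-up helper; this is not the Polars code):

// Toy illustration of hoisting loop-invariant work out of the per-element closure.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum TimeUnit { Milliseconds, Microseconds, Nanoseconds }

// stand-in for the offset.duration_ms()/_us()/_ns() calls
fn one_day(unit: TimeUnit) -> i64 {
    match unit {
        TimeUnit::Milliseconds => 86_400_000,
        TimeUnit::Microseconds => 86_400_000_000,
        TimeUnit::Nanoseconds => 86_400_000_000_000,
    }
}

// before: the unit match is re-evaluated inside the closure for every element
fn add_inside(values: &[i64], unit: TimeUnit) -> Vec<i64> {
    values.iter().map(|&v| v + one_day(unit)).collect()
}

// after: resolve the offset once; the closure is just an addition
fn add_hoisted(values: &[i64], unit: TimeUnit) -> Vec<i64> {
    let offset = one_day(unit);
    values.iter().map(|&v| v + offset).collect()
}

fn main() {
    let vals = [0_i64, 1, 2];
    assert_eq!(add_inside(&vals, TimeUnit::Nanoseconds), add_hoisted(&vals, TimeUnit::Nanoseconds));
}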


codspeed-hq bot commented Jun 4, 2024

CodSpeed Performance Report

Merging #16728 will not alter performance

Comparing MarcoGorelli:faster-offset-by (42777b5) with main (5d0e339)

Summary

✅ 37 untouched benchmarks


codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.45%. Comparing base (79afe75) to head (42777b5).
Report is 11 commits behind head on main.

File                                                     Patch %   Missing lines
...ates/polars-plan/src/dsl/function_expr/temporal.rs    96.66%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16728   +/-   ##
=======================================
  Coverage   81.45%   81.45%           
=======================================
  Files        1413     1413           
  Lines      186096   186057   -39     
  Branches     2776     2756   -20     
=======================================
- Hits       151585   151557   -28     
- Misses      33991    33997    +6     
+ Partials      520      503   -17     


MarcoGorelli changed the title from "perf: speed up offset_by for constant durations" to "perf: speed up offset_by 2x for constant durations" on Jun 4, 2024
MarcoGorelli marked this pull request as ready for review on Jun 4, 2024 at 20:46
MarcoGorelli marked this pull request as draft on Jun 4, 2024 at 20:46
MarcoGorelli marked this pull request as ready for review on Jun 5, 2024 at 06:29
@@ -189,33 +189,53 @@ pub(super) fn datetime(
fn apply_offsets_to_datetime(
    datetime: &Logical<DatetimeType, Int64Type>,
    offsets: &StringChunked,
    offset_fn: fn(&Duration, i64, Option<&Tz>) -> PolarsResult<i64>,
    time_zone: Option<&Tz>,
) -> PolarsResult<Int64Chunked> {
A Member commented:

Just noticed this now, but ideally the implementation should live in polars-ops or polars-time, and then we only dispatch to those functions here. I'd like polars-plan to only be about query planning/resolving.

MarcoGorelli (Collaborator, author) replied:

yup - ok if I do that separately?
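For reference, a rough sketch (hypothetical module names, not the actual polars-plan / polars-ops code) of the layering being asked for: the kernel lives in an ops-style module, and the plan layer only resolves arguments and dispatches to it.

// Hypothetical layering sketch; module names are made up.
mod ops {
    // compute layer: owns the actual kernel
    pub fn offset_datetimes(values: &[i64], offset: i64) -> Vec<i64> {
        values.iter().map(|&v| v + offset).collect()
    }
}

mod plan {
    // planning layer: resolves arguments, then only dispatches to ops
    pub fn offset_by(values: &[i64], offset: i64) -> Vec<i64> {
        super::ops::offset_datetimes(values, offset)
    }
}

fn main() {
    println!("{:?}", plan::offset_by(&[0_i64, 1, 2], 86_400_000));
}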

    TimeUnit::Microseconds => offset.duration_us(),
    TimeUnit::Nanoseconds => offset.duration_ns(),
};
if offset.negative() {
ritchie46 (Member) commented on Jun 5, 2024:

To reduce binary bloat, I think we can make this a single code branch that does the apply_values:

if offset.negative() {
    duration = -duration;
}

Ok(datetime.0.apply_values(|v| v + duration))

I think we can also use our already-compiled kernels now, which might also have more SIMD acceleration. If not, it at least saves compiler bloat:

// apply to inner and then you have to reconstruct the datetime.
datetime.0.clone().wrapping_add_scalar(duration);
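For illustration, a plain-Rust sketch (made-up names, not the Polars kernel) of what a wrapping add-scalar amounts to, using std's i64::wrapping_add: the sign is folded into the scalar once, so there is a single apply branch, and overflow wraps instead of panicking.

// Illustrative only; not the Polars compiled kernel.
fn wrapping_add_scalar(values: &[i64], mut duration: i64, negative: bool) -> Vec<i64> {
    // single code branch: fold the sign into the scalar once
    if negative {
        duration = -duration;
    }
    values.iter().map(|&v| v.wrapping_add(duration)).collect()
}

fn main() {
    let vals = [0_i64, 1_000, i64::MAX];
    // the last element wraps around to i64::MIN instead of panicking
    println!("{:?}", wrapping_add_scalar(&vals, 1, false));
}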

ritchie46 (Member) commented:

Great hammering on that temporal performance! 💪 Left some comments.

ritchie46 merged commit c6ab549 into pola-rs:main on Jun 6, 2024
26 checks passed
stinodego changed the title from "perf: speed up offset_by 2x for constant durations" to "perf: Speed up dt.offset_by 2x for constant durations" on Jun 7, 2024
Labels: performance (Performance issues or improvements), python (Related to Python Polars), rust (Related to Rust Polars)
Linked issue that merging this pull request may close: Alternative method 10x faster than dt.offset_by()
3 participants