Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Better GC and push_view for binviews #17627

Merged
merged 6 commits into from
Jul 17, 2024
Merged

Conversation

ruihe774
Copy link
Contributor

This PR add the logic to consider reference count when deciding whether to perform GC or not in BinaryViewArrayGeneric::maybe_gc(). Shared buffers that have multiple references won't be freed after GC, so including them when calculating memory savings, in the previous logic, does not make sense. This PR can avoid unnecessary GC in this case.

@github-actions github-actions bot added performance Performance issues or improvements rust Related to Rust Polars labels Jul 14, 2024
Copy link

codecov bot commented Jul 14, 2024

Codecov Report

Attention: Patch coverage is 61.76471% with 78 lines in your changes missing coverage. Please review.

Project coverage is 80.64%. Comparing base (b85654e) to head (405d7d2).
Report is 10 commits behind head on main.

Files Patch % Lines
crates/polars-arrow/src/array/binview/mutable.rs 42.15% 59 Missing ⚠️
crates/polars-arrow/src/array/binview/mod.rs 14.28% 18 Missing ⚠️
crates/polars-arrow/src/array/growable/binview.rs 97.95% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17627      +/-   ##
==========================================
- Coverage   80.69%   80.64%   -0.05%     
==========================================
  Files        1484     1485       +1     
  Lines      195409   195582     +173     
  Branches     2779     2782       +3     
==========================================
+ Hits       157679   157728      +49     
- Misses      37220    37342     +122     
- Partials      510      512       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46
Copy link
Member

I am a bit conservative here because we had multiple problems with quadratic increasing buffers. Can you check if these issue doesn't occur again? #16989 and #15615

@ruihe774 ruihe774 marked this pull request as draft July 15, 2024 10:46
@ruihe774 ruihe774 force-pushed the better-gc branch 6 times, most recently from ced6f93 to 20e4883 Compare July 16, 2024 00:35
@ruihe774 ruihe774 changed the title perf(rust): Consider ref count when deciding GC perf(rust): Better GC and push_view for binviews Jul 16, 2024
@ruihe774
Copy link
Contributor Author

I am a bit conservative here because we had multiple problems with quadratic increasing buffers. Can you check if these issue doesn't occur again? #16989 and #15615

These problems were caused by not deduplicating the data buffers. I make a refactor that moves buffer deduplication from GrowableBinaryViewArray to MutableBinaryViewArray so that it can be performed when appending individual views as well. IfThenElseKernel now uses the newly introduced push_view_* and extend_views_* methods that perform deduplication transparently.

@ruihe774 ruihe774 marked this pull request as ready for review July 16, 2024 00:50
@ritchie46
Copy link
Member

Thank you @ruihe774. It looks good. I did a run of the mentioned regressions just to be sure and all is good.

@ritchie46 ritchie46 merged commit 28d8196 into pola-rs:main Jul 17, 2024
21 checks passed
@ritchie46 ritchie46 changed the title perf(rust): Better GC and push_view for binviews perf: Better GC and push_view for binviews Jul 17, 2024
@github-actions github-actions bot added the python Related to Python Polars label Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants