perf: Ensure better chunk sizes #16071

Merged: ritchie46 merged 3 commits into main from parquet_groups on May 6, 2024.

ritchie46
Member

If a chunk in a DataFrame had 100k rows and we split at 100k - 1 rows, we wrote row groups of 1 row, which is terrible. This PR ensures we keep reasonably sized chunks.
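
To make the failure mode concrete, here is a minimal sketch in Python. It is not the code in this PR: `naive_split` and `balanced_split` are hypothetical names, and the balancing rule (same number of pieces, rows spread evenly) is only one plausible way to avoid the tiny tail piece.

```python
def naive_split(n_rows: int, target: int) -> list[int]:
    """Cut fixed-size pieces; the last piece takes whatever is left."""
    if target <= 0:
        raise ValueError("target must be positive")
    sizes = []
    remaining = n_rows
    while remaining > 0:
        take = min(target, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def balanced_split(n_rows: int, target: int) -> list[int]:
    """Keep the same number of pieces, but spread the rows evenly.

    Hypothetical strategy for illustration; the real implementation
    may pick its split points differently.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if n_rows == 0:
        return []
    n_pieces = -(-n_rows // target)  # ceiling division
    base, extra = divmod(n_rows, n_pieces)
    # The first `extra` pieces each get one additional row.
    return [base + 1 if i < extra else base for i in range(n_pieces)]


if __name__ == "__main__":
    print(naive_split(100_000, 99_999))     # [99999, 1]
    print(balanced_split(100_000, 99_999))  # [50000, 50000]
```

With a 100k-row chunk and a split size of 100k - 1, the naive version leaves a trailing piece of 1 row, while the balanced version produces two pieces of 50k rows each.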

github-actions bot added the performance, python, and rust labels on May 6, 2024

codecov bot commented May 6, 2024

Codecov Report

Attention: Patch coverage is 83.92857%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.92%. Comparing base (cb9db1c) to head (504701d).
Report is 2 commits behind head on main.

❗ Current head 504701d differs from pull request most recent head c690aa3. Consider uploading reports for the commit c690aa3 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| crates/polars-core/src/utils/mod.rs | 82.35% | 6 Missing ⚠️ |
| crates/polars-core/src/frame/mod.rs | 50.00% | 1 Missing ⚠️ |
| crates/polars-parquet/src/parquet/write/stream.rs | 0.00% | 1 Missing ⚠️ |
| ...es/polars-pipe/src/executors/sinks/group_by/ooc.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16071      +/-   ##
==========================================
- Coverage   80.97%   80.92%   -0.06%     
==========================================
  Files        1386     1386              
  Lines      178380   178445      +65     
  Branches     3059     3075      +16     
==========================================
- Hits       144448   144410      -38     
- Misses      33442    33545     +103     
  Partials      490      490              
| Flag | Coverage Δ |
|---|---|
| python | 74.47% <82.14%> (-0.06%) ⬇️ |
| rust | 78.12% <83.92%> (-0.03%) ⬇️ |

ritchie46 merged commit 4b63818 into main on May 6, 2024
19 of 20 checks passed
ritchie46 deleted the parquet_groups branch on May 6, 2024 at 09:13
@ritchie46
Member Author

@alexander-beedie the benchmarks fail because they cannot find torch. :/

@alexander-beedie
Collaborator

alexander-beedie commented May 6, 2024

> @alexander-beedie the benchmarks fail because they cannot find torch. :/

@ritchie46: This should fix it; I added the new requirements to the benchmark and coverage jobs (#16072).
