perf: Batch Plain Parquet UTF-8 verification #18397

Merged: 1 commit merged into pola-rs:main on Aug 27, 2024

coastalwhite
Collaborator

```
$ POLARS_MAX_THREADS=1 poop './plparbench-before 10 plain.parquet auto' './plparbench-after 10 plain.parquet auto'

Benchmark 1 (3 runs): ./plparbench-before 10 plain.parquet auto
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          13.0s  ±  701ms    12.5s  … 13.8s           0 ( 0%)        0%
  peak_rss           1.46GB ±  407KB    1.46GB … 1.46GB          0 ( 0%)        0%
  cpu_cycles         37.9G  ± 84.2M     37.8G  … 38.0G           0 ( 0%)        0%
  instructions        113G  ± 46.7M      113G  …  113G           0 ( 0%)        0%
  cache_references    426M  ±  397K      425M  …  426M           0 ( 0%)        0%
  cache_misses        111M  ±  777K      110M  …  112M           0 ( 0%)        0%
  branch_misses       634K  ± 53.6K      597K  …  695K           0 ( 0%)        0%
Benchmark 2 (3 runs): ./plparbench-after 10 plain.parquet auto
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          8.55s  ± 36.1ms    8.51s  … 8.58s           0 ( 0%)        ⚡- 34.2% ±  8.7%
  peak_rss           1.46GB ±  179KB    1.46GB … 1.46GB          0 ( 0%)          -  0.0% ±  0.0%
  cpu_cycles         20.6G  ± 3.93M     20.6G  … 20.6G           0 ( 0%)        ⚡- 45.6% ±  0.4%
  instructions       50.0G  ± 11.1M     50.0G  … 50.0G           0 ( 0%)        ⚡- 55.6% ±  0.1%
  cache_references    377M  ±  299K      377M  …  378M           0 ( 0%)        ⚡- 11.4% ±  0.2%
  cache_misses        102M  ± 94.0K      102M  …  102M           0 ( 0%)        ⚡-  8.1% ±  1.1%
  branch_misses       237K  ± 8.80K      229K  …  247K           0 ( 0%)        ⚡
```

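For `PLAIN` string page decoding, this can significantly improve performance: in the single-threaded benchmark above, wall time drops from 13.0s to 8.55s (about 34%) and the instruction count drops from 113G to 50.0G (about 56%).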
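
For readers unfamiliar with the idea, here is a minimal sketch of what batching UTF-8 verification can look like. It is not the code from this PR: the function names `encode_plain` and `decode_plain_utf8_batched` are hypothetical, and the sketch only assumes the Parquet `PLAIN` layout for `BYTE_ARRAY` values (a 4-byte little-endian length followed by the bytes). Instead of validating every string on its own, it gathers all payload bytes into one buffer, validates that buffer once, and then checks that each value starts on a character boundary.

```rust
/// Hypothetical helper: encode values as a PLAIN BYTE_ARRAY page body,
/// i.e. a 4-byte little-endian length prefix followed by the raw bytes.
fn encode_plain(values: &[&[u8]]) -> Vec<u8> {
    let mut page = Vec::new();
    for v in values {
        page.extend_from_slice(&(v.len() as u32).to_le_bytes());
        page.extend_from_slice(v);
    }
    page
}

/// Hypothetical sketch: decode a PLAIN string page and verify UTF-8 in one batch.
fn decode_plain_utf8_batched(page: &[u8]) -> Result<Vec<String>, String> {
    // Gather all payload bytes contiguously and remember where each value ends.
    let mut data: Vec<u8> = Vec::with_capacity(page.len());
    let mut offsets: Vec<usize> = vec![0];
    let mut rest = page;
    while !rest.is_empty() {
        if rest.len() < 4 {
            return Err("truncated length prefix".to_string());
        }
        let (len_bytes, tail) = rest.split_at(4);
        let len =
            u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        if tail.len() < len {
            return Err("truncated value".to_string());
        }
        let (value, tail) = tail.split_at(len);
        data.extend_from_slice(value);
        offsets.push(data.len());
        rest = tail;
    }

    // A single validation pass over the whole batch instead of one call per value.
    let text = std::str::from_utf8(&data).map_err(|e| e.to_string())?;

    // The concatenation being valid is not sufficient: a multi-byte character
    // could straddle two values, so every offset must be a character boundary.
    for &offset in &offsets {
        if !text.is_char_boundary(offset) {
            return Err(format!("value boundary at byte {} splits a character", offset));
        }
    }

    // Slicing is safe here because all offsets are verified character boundaries.
    Ok(offsets
        .windows(2)
        .map(|w| text[w[0]..w[1]].to_owned())
        .collect())
}
```

The boundary check is what keeps the batched version equivalent to per-value validation: two values holding `0xC3` and `0xA9` concatenate to a valid `é`, but neither is valid on its own, and the sketch rejects them.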

github-actions bot added the `performance`, `python`, and `rust` labels on Aug 27, 2024
codecov bot commented Aug 27, 2024

Codecov Report

Attention: Patch coverage is 76.47059% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.87%. Comparing base (884b2ac) to head (32e7ff7).
Report is 57 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...lars-parquet/src/arrow/read/deserialize/binview.rs | 76.47% | 4 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18397      +/-   ##
==========================================
- Coverage   79.87%   79.87%   -0.01%     
==========================================
  Files        1495     1495              
  Lines      200199   200210      +11     
  Branches     2867     2867              
==========================================
+ Hits       159905   159911       +6     
- Misses      39748    39753       +5     
  Partials      546      546              

ritchie46 merged commit fb77bad into pola-rs:main on Aug 27, 2024 (22 checks passed)
coastalwhite deleted the perf-parquet-batch-plain-utf8 branch on August 27, 2024 12:09
c-peters added the `accepted` label on Sep 3, 2024
Labels: `accepted`, `performance`, `python`, `rust`
4 participants