Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Ipc exec multiple paths #15040

Merged
merged 17 commits into from
Mar 21, 2024

Conversation

mickvangelderen
Copy link
Contributor

No description provided.

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 96.24060% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 81.22%. Comparing base (103bb66) to head (4c116c3).
Report is 2 commits behind head on main.

❗ Current head 4c116c3 differs from pull request most recent head a0194c9. Consider uploading reports for the commit a0194c9 to get more accurate results

Files Patch % Lines
...olars-lazy/src/physical_plan/executors/scan/ipc.rs 95.49% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15040      +/-   ##
==========================================
- Coverage   81.25%   81.22%   -0.04%     
==========================================
  Files        1354     1348       -6     
  Lines      175390   175534     +144     
  Branches     2518     2508      -10     
==========================================
+ Hits       142505   142569      +64     
- Misses      32404    32485      +81     
+ Partials      481      480       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritchie46 ritchie46 changed the title Ipc exec multiple paths perf: Ipc exec multiple paths Mar 14, 2024
@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Mar 14, 2024
@mickvangelderen
Copy link
Contributor Author

This PR seems to pass now but won't after merging #15065

@mickvangelderen mickvangelderen force-pushed the ipc-exec-multiple-paths branch 3 times, most recently from 82a5e8f to 7b8bb3c Compare March 19, 2024 10:42
@@ -101,7 +227,7 @@ impl Executor for IpcExec {
};

let profile_name = if state.has_node_timer() {
let mut ids = vec![self.path.to_string_lossy().into()];
let mut ids = vec![self.paths[0].to_string_lossy().into()];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken from the parquet implementation. Should we just map each path instead?

@mickvangelderen mickvangelderen force-pushed the ipc-exec-multiple-paths branch 5 times, most recently from cf9b13a to 4c116c3 Compare March 20, 2024 10:51
@mickvangelderen mickvangelderen marked this pull request as ready for review March 20, 2024 11:00
crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/scan/ipc.rs Outdated Show resolved Hide resolved
Comment on lines +191 to +204
def test_scan(
capfd: Any, monkeypatch: pytest.MonkeyPatch, data_file: _DataFile, force_async: bool
) -> None:
if data_file.path.suffix == ".csv" and force_async:
pytest.skip(reason="async reading of .csv not yet implemented")

if force_async:
_enable_force_async(monkeypatch)

df = _scan(data_file.path, data_file.df.schema).collect()

if force_async:
_assert_force_async(capfd)

Copy link
Contributor Author

@mickvangelderen mickvangelderen Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each test fn now has a lot of fluff which distracts from what the test is actually doing. I have chosen to leave it as is now so we can get this feature out. If anyone sees a way to simplify and deduplicate this please share your ideas.

What I was thinking of is having fixtures for (ipc|csv|parquet, sync|async) * (single|glob), and then having a separate test file cache which can memoize requested test files (ipc|csv|parquet, single|glob).

@ritchie46
Copy link
Member

Hmm.. Something goes wrong with the windows tests.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Left a few comments.

crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/scan/ipc.rs Outdated Show resolved Hide resolved
crates/polars-lazy/src/physical_plan/executors/scan/ipc.rs Outdated Show resolved Hide resolved
@mickvangelderen
Copy link
Contributor Author

mickvangelderen commented Mar 20, 2024

Hmm.. Something goes wrong with the windows tests.

@ritchie46 Testing this idea to really fix the windows tests #15191

@ritchie46 ritchie46 merged commit 645ce62 into pola-rs:main Mar 21, 2024
23 checks passed
@mickvangelderen mickvangelderen deleted the ipc-exec-multiple-paths branch March 21, 2024 13:24
@c-peters c-peters added the accepted Ready for implementation label Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants