Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Remove daft-scan dependency from planning crates #3250

Merged
merged 6 commits into from
Nov 12, 2024

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Nov 8, 2024

This PR creates a better abstraction over scan tasks and uses that abstraction in the planning crates (daft-logical-plan, daft-physical-plan, daft-local-plan). This is done by creating a new common-scan-info crate which holds a ScanTaskLike trait which planning crates can use to access scan task info without needing knowledge of implementation.

Now, in our planning stages, we solely use our ScanTaskLike abstraction, and then in execution, we downcast to ScanTask to actually materialize them. That way, we can actually remove the daft-scan dependency from planning crates.

Copy link

codspeed-hq bot commented Nov 8, 2024

CodSpeed Performance Report

Merging #3250 will degrade performances by 51.94%

Comparing kevin/swap-plan-scan-dependency (0429eea) with main (16f5a8c)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main kevin/swap-plan-scan-dependency Change
test_show[100 Small Files] 23.7 ms 49.3 ms -51.94%

Copy link

codecov bot commented Nov 8, 2024

Codecov Report

Attention: Patch coverage is 52.88303% with 286 lines in your changes missing coverage. Please review.

Project coverage is 77.62%. Comparing base (f290f40) to head (0429eea).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-scan/src/builder.rs 17.54% 94 Missing ⚠️
src/common/scan-info/src/test/mod.rs 39.32% 54 Missing ⚠️
src/daft-scan/src/lib.rs 20.93% 34 Missing ⚠️
src/common/scan-info/src/partitioning.rs 40.81% 29 Missing ⚠️
src/common/scan-info/src/pushdowns.rs 69.89% 28 Missing ⚠️
src/daft-scan/src/anonymous.rs 0.00% 18 Missing ⚠️
src/common/scan-info/src/python.rs 82.14% 15 Missing ⚠️
src/common/scan-info/src/scan_operator.rs 50.00% 6 Missing ⚠️
src/common/scan-info/src/scan_task.rs 0.00% 4 Missing ⚠️
src/daft-physical-plan/src/ops/scan.rs 25.00% 3 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3250      +/-   ##
==========================================
+ Coverage   76.07%   77.62%   +1.55%     
==========================================
  Files         644      658      +14     
  Lines       81581    80527    -1054     
==========================================
+ Hits        62065    62512     +447     
+ Misses      19516    18015    -1501     
Files with missing lines Coverage Δ
daft/logical/builder.py 89.87% <100.00%> (ø)
src/common/scan-info/src/expr_rewriter.rs 47.87% <ø> (ø)
src/common/scan-info/src/lib.rs 100.00% <100.00%> (ø)
src/daft-local-execution/src/pipeline.rs 95.30% <100.00%> (+8.34%) ⬆️
src/daft-local-execution/src/sources/scan_task.rs 75.55% <ø> (ø)
src/daft-local-plan/src/plan.rs 96.38% <100.00%> (+13.21%) ⬆️
src/daft-local-plan/src/translate.rs 93.58% <100.00%> (-0.05%) ⬇️
src/daft-logical-plan/src/builder.rs 92.11% <ø> (+11.46%) ⬆️
src/daft-logical-plan/src/lib.rs 100.00% <ø> (ø)
src/daft-logical-plan/src/ops/source.rs 55.81% <ø> (ø)
... and 25 more

... and 25 files with indirect coverage changes

Comment on lines 13 to 16
pub trait ScanTaskLike {
fn as_any(&self) -> &dyn Any;
fn as_any_arc(self: Arc<Self>) -> Arc<dyn Any + Send + Sync>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we'd have a bit more methods on here. I was thinking we'd define the entire interface of what a ScanTask is here, then impl it elsewhere.

There are only a handful of methods on the scan task, so i feel like it'd be pretty easy to put that all into the trait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to make the actual API changes in this PR minimal, but I think it does make sense to define this trait a little better. Will work on that!

@kevinzwang kevinzwang changed the title [CHORE] Invert dependency between daft-scan and daft-logical-plan [CHORE] Remove daft-scan dependency from planning crates Nov 11, 2024
@universalmind303
Copy link
Contributor

@kevinzwang I took another look at this PR, and maybe it can be done in a separate PR, but we also need to invert the dependency between daft-table and daft-logical-plan.

@kevinzwang
Copy link
Member Author

@kevinzwang I took another look at this PR, and maybe it can be done in a separate PR, but we also need to invert the dependency between daft-table and daft-logical-plan.

I'll take a look at it, if it's a complex task i'll merge this one in first

@kevinzwang kevinzwang enabled auto-merge (squash) November 12, 2024 01:25
@kevinzwang kevinzwang merged commit 7e89850 into main Nov 12, 2024
42 of 44 checks passed
@kevinzwang kevinzwang deleted the kevin/swap-plan-scan-dependency branch November 12, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants