-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't optimize AnalyzeExec (#6379) (try 2) #6494
Conversation
))); | ||
// Gather futures that will run each input partition using a | ||
// JoinSet to cancel outstanding futures on drop | ||
let mut set = JoinSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses the cool JoinSet
I learned about from @nvartolomei and @Darksonn on #6449 ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic was just extracted into its own function
#[cfg_attr(tarpaulin, ignore)] | ||
async fn csv_explain_analyze_order_by() { | ||
let ctx = SessionContext::new(); | ||
register_aggregate_csv_by_sql(&ctx).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the new test
// Turn the tasks in the JoinSet into a stream of | ||
// Result<usize> representing the counts of each output | ||
// partition. | ||
let counts_stream = futures::stream::unfold(set, |mut set| async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think looking at https://github.com/apache/arrow-datafusion/pull/6494/files?w=1 makes it clearer what I did -- which was to change the plumbing to use future
s and stream
fu rather than channels
@@ -387,7 +387,7 @@ mod sql_tests { | |||
}; | |||
let test2 = UnaryTestCase { | |||
source_type: SourceType::Unbounded, | |||
expect_fail: true, | |||
expect_fail: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it works now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
// Turn the tasks in the JoinSet into a stream of | ||
// Result<usize> representing the counts of each output | ||
// partition. | ||
let counts_stream = futures::stream::unfold(set, |mut set| async { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed that the tokio stream adapaters don't appear to have a JoinSet impl - https://docs.rs/tokio-stream/latest/tokio_stream/wrappers/index.html?search=
let end = Instant::now(); | ||
// future that gathers the input counts into an overall output | ||
// count, and makes an output batch | ||
let output = counts_stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW you could just use a regular async move here, instead of needing the futures adapters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is an excellent point -- I got so carried away with being clever I lost sight of that.
I rewrote this logic to use a single future and also moved the output record batch creation into a function to separate the business logic from the async orchestration: 5521a70
Test failures do not appear to be related |
Test failures I think are due to #6495 |
Which issue does this PR close?
Closes #6380
Closes #6379
Rationale for this change
See #6379
What changes are included in this PR?
This is based off #6380 from @tustvold but it was getting a little big to just push to his branch so I decided to make a new PR
AnalyzeExec
AnalyzeExec
to handle multiple input streams usingfutures
-fuAre these changes tested?
Yes
Are there any user-facing changes?
Less confusing
EXPLAIN ANALYZE
results