
Fix panic propagation in CoalescePartitions, consolidates panic propagation into RecordBatchReceiverStream #6507

Merged 18 commits into apache:main on Jun 6, 2023

Conversation

alamb (Contributor) commented May 31, 2023

Which issue does this PR close?

Based on #6449 from @nvartolomei

Closes #3104
Closes #6449

Rationale for this change

I wanted to centralize the logic for propagating panics from tasks.

What changes are included in this PR?

  1. Add a RecordBatchReceiverStreamBuilder that handles the abort-on-drop dance using tokio::task::JoinSet, as shown by @nvartolomei (see the sketch after this list)
  2. Port CoalescePartitionsExec and AnalyzeExec to use this builder
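
For context, here is a minimal sketch of the builder shape; the names and signatures are simplified for illustration and are not the actual DataFusion API:

use tokio::sync::mpsc;
use tokio::task::JoinSet;

/// Illustrative sketch: producers are spawned on a JoinSet, so dropping
/// the builder (or the stream built from it) aborts all of them
/// automatically, which is the "abort-on-drop dance".
struct ReceiverStreamBuilder<T> {
    tx: mpsc::Sender<T>,
    rx: mpsc::Receiver<T>,
    join_set: JoinSet<()>,
}

impl<T: Send + 'static> ReceiverStreamBuilder<T> {
    fn new(capacity: usize) -> Self {
        let (tx, rx) = mpsc::channel(capacity);
        Self { tx, rx, join_set: JoinSet::new() }
    }

    /// Clone a sender to hand to each producer task.
    fn tx(&self) -> mpsc::Sender<T> {
        self.tx.clone()
    }

    /// Spawn a producer; its handle lives in the JoinSet, not detached.
    fn spawn(&mut self, fut: impl std::future::Future<Output = ()> + Send + 'static) {
        self.join_set.spawn(fut);
    }

    // A build() method would consume self.rx (and the JoinSet) to
    // produce the output stream; omitted here for brevity.
}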

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, panics are no longer silently ignored

Another try for fixing apache#3104.

RepartitionExec might need a similar fix.
The github-actions bot added the core (Core DataFusion crate) label May 31, 2023
alamb changed the title from "Alamb/propagate error" to "Consolidate panic propagation into RecordBatchReceiverStream" May 31, 2023
@@ -183,32 +168,6 @@ impl ExecutionPlan for CoalescePartitionsExec {
}
}

struct MergeStream {
alamb (Contributor, Author):

I basically taught RecordBatchReceiverStream how to propagate panics and then updated CoalescePartitionsExec to use it
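
Roughly, the propagation works by draining the JoinSet and re-raising any panic on the consumer side; a simplified sketch (not the exact DataFusion code):

use tokio::task::JoinSet;

// Sketch: surface producer failures instead of discarding JoinErrors.
async fn check_for_panics(mut join_set: JoinSet<()>) -> Result<(), String> {
    while let Some(result) = join_set.join_next().await {
        match result {
            Ok(()) => {} // producer finished normally
            Err(e) if e.is_panic() => {
                // Re-raise the producer's panic on the consumer side
                // so it is visible rather than silently swallowed.
                std::panic::resume_unwind(e.into_panic());
            }
            // The task was cancelled; report that as an error.
            Err(e) => return Err(format!("producer task was cancelled: {e}")),
        }
    }
    Ok(())
}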

@@ -270,4 +231,19 @@ mod tests {

Ok(())
}

#[tokio::test]
alamb (Contributor, Author):

This is the new test from @nvartolomei

alamb force-pushed the alamb/propagate_error branch from d887201 to fe8b82d May 31, 2023 20:19
alamb force-pushed the alamb/propagate_error branch from b9535e7 to e1c827a May 31, 2023 21:52

/// Stream wrapper that records `BaselineMetrics` for a particular
/// partition
pub(crate) struct ObservedStream {
alamb (Contributor, Author):

Moved from Union so it can be reused

let num_partitions = 2;
let input = PanicingExec::new(schema.clone(), num_partitions)
.with_partition_panic(0, 10)
.with_partition_panic(1, 3); // partition 1 should panic first (after 3)
alamb (Contributor, Author):

Here is a test showing that when the second partition panics, the panic is properly reported
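
As a standalone illustration of the mechanism under test (a hypothetical toy, not the actual PanicingExec test):

use tokio::task::JoinSet;

#[tokio::test]
#[should_panic(expected = "boom")]
async fn panic_in_producer_reaches_consumer() {
    let mut join_set: JoinSet<()> = JoinSet::new();
    join_set.spawn(async { panic!("boom") });

    // Draining the JoinSet resurfaces the panic in the test task,
    // which #[should_panic] then observes.
    while let Some(result) = join_set.join_next().await {
        if let Err(e) = result {
            if e.is_panic() {
                std::panic::resume_unwind(e.into_panic());
            }
        }
    }
}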

@@ -560,40 +561,6 @@ impl Stream for CombinedRecordBatchStream {
}
}

/// Stream wrapper that records `BaselineMetrics` for a particular
alamb (Contributor, Author):

moved

@@ -137,27 +131,17 @@ impl ExecutionPlan for CoalescePartitionsExec {
// use a stream that allows each sender to put in at
// least one result in an attempt to maximize
// parallelism.
let (sender, receiver) =
mpsc::channel::<Result<RecordBatch>>(input_partitions);
let mut builder =
alamb (Contributor, Author):

I am quite pleased that this is now all encapsulated into RecordBatchReceiverStream

alamb mentioned this pull request May 31, 2023
alamb marked this pull request as ready for review June 1, 2023 10:32
alamb changed the title from "Consolidate panic propagation into RecordBatchReceiverStream" to "Fix panic propagation in CoalescePartitions, consolidates panic propagation into RecordBatchReceiverStream" Jun 1, 2023
// unwrap Option / only return the error
.filter_map(|item| async move { item });

let inner = ReceiverStream::new(rx).chain(check_stream).boxed();
nvartolomei (Contributor):

This change looks much better than mine. Wondering if it can be improved further and if it makes sense at all.

With this implementation, the panics will be propagated only after all input (from other partitions) is consumed. Probably fine, as this shouldn't happen during normal operation and is more of a correctness check. Also, the check future will not make any progress until all the inputs are exhausted. That shouldn't be much work, so it is fine for it to be sequential.

As an alternative, what if we build a "supervisor" task (tokio::spawn) which is launched to do all that work, and then in the check_stream we just check the JoinHandle of the "supervisor" task? This way the supervisor task will be able to make progress concurrently and panic/report errors early.

Thought about this after looking at RepartitionStream which would need something similar (supervisor) to get task failures and then multiplex them to all the output partitions. Then, all "output streams" would only have to ensure that the supervisor didn't die. Currently, in RepartitionStream there are |output partitions| "supervisors" (wait_for_task) which aren't checked for success either. Wondering if it could fail at all though (tokio-rs/tokio#5744).
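
For illustration, the supervisor idea might look something like this (illustrative names; Result<(), String> stands in for the real error type):

use tokio::sync::mpsc;
use tokio::task::JoinSet;

// Illustrative supervisor: one task owns the JoinSet and forwards
// failures to the output channel as soon as they occur, so errors
// surface concurrently with normal output rather than at the end.
fn spawn_supervisor(
    mut join_set: JoinSet<()>,
    tx: mpsc::Sender<Result<(), String>>,
) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        while let Some(result) = join_set.join_next().await {
            if let Err(e) = result {
                // If send fails, the receiver is gone and the plan
                // is shutting down; nowhere to report the failure.
                let _ = tx.send(Err(format!("task failed: {e}"))).await;
            }
        }
    })
}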

crepererum (Contributor):

With this implementation, the panics will be propagated only after all input

This is my worry as well. I think you could move the check future into another task (that holds the join set and is also aborted on drop, like a two-level join set) and that sends the error to tx.

alamb (Contributor, Author):

I am working on tests for this behavior

alamb (Contributor, Author):

I believe I fixed this in b1a817c

After trying several other approaches, I found https://docs.rs/tokio-stream/latest/tokio_stream/trait.StreamExt.html#method.merge which did exactly what I wanted 💯

It is tested in record_batch_receiver_stream_propagates_panics_early_shutdown
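
For reference, tokio_stream's merge polls both streams and yields from whichever side is ready first (a toy example, not the DataFusion code):

use tokio_stream::StreamExt; // provides `merge` and `next`

#[tokio::main]
async fn main() {
    let data = tokio_stream::iter([Ok(1), Ok(2), Ok(3)]);
    let check = tokio_stream::iter([Err("simulated task failure")]);

    // Because both sides are polled, an item from `check` can surface
    // before `data` is exhausted, enabling early panic propagation.
    let mut merged = data.merge(check);
    while let Some(item) = merged.next().await {
        println!("{item:?}");
    }
}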


while let Some(item) = stream.next().await {
// If send fails, plan being torn down,
// there is no place to send the error.
tustvold (Contributor) commented Jun 1, 2023:

I think this should also short-circuit if item is an error; currently it will drive execution to completion.

alamb (Contributor, Author):

Fixed in 56a26eb

Tested in record_batch_receiver_stream_error_does_not_drive_completion
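
A sketch of the short-circuiting forwarding loop (a simplification; the stream and channel types here are stand-ins, not the exact code from 56a26eb):

use futures::{Stream, StreamExt};
use tokio::sync::mpsc::Sender;

async fn forward<T>(
    mut stream: impl Stream<Item = Result<T, String>> + Unpin,
    tx: Sender<Result<T, String>>,
) {
    while let Some(item) = stream.next().await {
        let is_err = item.is_err();

        // If send fails, the plan is being torn down and there is
        // no place to send the error.
        if tx.send(item).await.is_err() {
            return;
        }

        // Short-circuit: an error terminates the output stream, so
        // there is no point driving the input to completion.
        if is_err {
            return;
        }
    }
}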

tustvold (Contributor) left a comment:

I like where this is headed, left some comments to potentially improve it further

crepererum (Contributor) left a comment:

Would be nice to be a bit more eager w/ error reporting, but this is at least better than the status quo.


alamb (Contributor, Author) commented Jun 2, 2023:

I plan to work on improving the panic checks to be more eager later today

alamb (Contributor, Author) commented Jun 6, 2023:

I believe I have resolved all outstanding comments in this PR. Please take another look if you have time

tustvold (Contributor) left a comment:

Looks good to me, left some minor comments you can take or leave


// Merge the streams together so whichever is ready first
// produces the batch (since futures::stream::StreamExt is
// already in scope, need to call it explicitly)
Contributor:

FWIW https://docs.rs/futures/latest/futures/stream/fn.select.html is the futures crate version of this, not sure if there is a material difference between the two impls
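
A toy example of futures::stream::select for comparison (my illustration, not from the PR):

use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    let a = stream::iter([1, 3, 5]);
    let b = stream::iter([2, 4, 6]);

    // Like `merge`, `select` polls both streams and yields items
    // from whichever is ready, interleaving them fairly.
    let mut both = stream::select(a, b);
    while let Some(v) = both.next().await {
        println!("{v}");
    }
}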

alamb (Contributor, Author):

Changed in fb17af8 -- I didn't see select. TIL!

alamb (Contributor, Author):

(it also inspired me to do #6565)

// the JoinSet were aborted, which in turn
// would imply that the receiver has been
// dropped and this code is not running
return Some(Err(DataFusionError::Internal(format!(
Contributor:

If this is unreachable (which I'm fairly certain it is) I'm not sure why we don't just panic here, making this future infallible and therefore an ideal candidate for https://docs.rs/futures/latest/futures/stream/trait.StreamExt.html#method.take_until
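
For reference, take_until ends a stream as soon as the supplied future resolves; a toy example of that behavior (the oneshot channel here is my stand-in for a completion signal):

use futures::stream::{self, StreamExt};
use tokio::sync::oneshot;

#[tokio::main]
async fn main() {
    let (stop_tx, stop_rx) = oneshot::channel::<()>();
    let items = stream::iter(1..=5).take_until(stop_rx);

    // The stop signal fires before any item is polled, so the
    // stream ends immediately.
    let _ = stop_tx.send(());
    let seen: Vec<i32> = items.collect().await;
    assert!(seen.is_empty());
}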

alamb (Contributor, Author):

I haven't studied the tokio JoinHandle code, so I don't know under what conditions it currently (or in the future) might return an error (for example, will it error if the task is canceled?).

Given that the API returns an error, I think handling and propagating the error is the most future-proof thing to do.

alamb merged commit 39ee59a into apache:main Jun 6, 2023
nvartolomei (Contributor) commented Jun 6, 2023:

great work @alamb! 👏

alamb (Contributor, Author) commented Jun 6, 2023:

great work @alamb! 👏

Thank you @nvartolomei for starting the process (and providing the tests!)

Labels
core Core DataFusion crate

Successfully merging this pull request may close these issues:

Panics are silently ignored in parallel execution (spawn_execution)

4 participants