[CHORE] Swordfish refactors #3256
Conversation
CodSpeed Performance Report: Merging #3256 will improve performance by 45.45%.

Benchmarks breakdown
/// A dispatcher that distributes morsels to workers in an unordered fashion.
/// Used if the operator does not require maintaining the order of the input.
pub(crate) struct UnorderedDispatcher {
Script that tests for uneven runtimes across tasks. With the `UnorderedDispatcher`, performance is considerably increased.
import time

import daft
from daft import col

daft.context.set_execution_config(default_morsel_size=1)

@daft.udf(return_dtype=daft.DataType.int64())
def do_work(x):
    x = x.to_pylist()
    # My laptop has 12 cores, so 12 workers will be spawned to run this UDF.
    if x[0] % 12 == 0:
        print("doing a lot of work on", x)
        time.sleep(1)
    else:
        print("doing a little work on", x)
        time.sleep(0.1)
    return x

df = daft.from_pydict({"a": list(range(48))}).with_column("b", do_work(col("a"))).agg(col("b").sum())
df.collect()
src/daft-local-execution/src/lib.rs
/// The `OperatorOutput` enum represents the output of an operator.
/// It can be either `Ready` or `Pending`.
/// If the output is `Ready`, the value is immediately available.
/// If the output is `Pending`, the value is not yet available and a `RuntimeTask` is returned.
pub(crate) enum OperatorOutput<T> {
    Ready(T),
    Pending(RuntimeTask<T>),
}
This `OperatorOutput` allows operators to spawn their own tasks on the compute runtime if needed, or, if no compute is necessary, to return whatever result they have directly. It also allows hash join probes to await the probe tables within the `execute` method.
@@ -46,6 +47,13 @@ impl LocalPartitionIterator {
#[cfg_attr(feature = "python", pyclass(module = "daft.daft"))]
pub struct NativeExecutor {
    local_physical_plan: Arc<LocalPhysicalPlan>,
    cancel: CancellationToken,
The tasks spawned on the execution runtime handle currently only abort if they try a `send` and see that the channel is closed (this is something we did in the native runner PR). However, they should also abort if the native executor gets dropped.
pub(crate) struct ProbeStateBridge {
    inner: OnceLock<Arc<ProbeState>>,
    notify: tokio::sync::Notify,
}
https://docs.rs/tokio/latest/tokio/sync/struct.OnceCell.html doesn't work because there's no async `get` method, so I opted for a manual implementation.
src/daft-local-execution/src/intermediate_ops/actor_pool_project.rs
///
/// It returns a vector of receivers (one per worker) that will receive the morsels.
///
/// Implementations must spawn a task on the runtime handle that reads from the
I think it would make more sense to name this:

trait DispatcherBuilder {
    fn spawn_dispatcher(...) -> SpawnedDispatcherResult;
}

struct SpawnedDispatcherResult {
    receivers: Vec<Receiver<Arc<MicroPartition>>>,
    dispatch_handle: SpawnedTask,
}

I think my key confusion was that `dispatch` sounds like an action that would happen on every tick rather than something that happens once.
Gotcha, made it more explicit that it should spawn a dispatching task. Also implemented the `SpawnedDispatcherResult`, so the caller can await it themselves.
🔥
src/daft-local-execution/src/lib.rs
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    let this = self.get_mut();
    Pin::new(&mut this.0).poll(cx).map(|r| r.context(JoinSnafu))
Actually, this looks a bit sketchy! Why do we have to `Pin::new` this?
The `poll` method on a `JoinHandle` accepts `Pin<&mut Self>`, but `self.0` is not pinned even though `self` is pinned, so I pinned it.

I did more reading and found `pin_project`, which lets you project the pin on a struct into its fields, so you don't need `Pin::new`. Technically I think it was fine to `Pin::new` it, since the `JoinHandle` is `Unpin`? Note to self: need to read up more on `Pin` and `Unpin` though.
There are a couple of outstanding issues / inefficiencies / ugly spots in the swordfish code. I originally intended to break these up into smaller PRs, but during the process of trying to split them up, I realized that all the changes are quite intertwined, and it may be easier on the reviewer to just see all of them in one. That being said, I'll try my best to explain all the changes and rationale in detail.
Problems

- The `PipelineChannel` abstraction doesn't play well with streaming sinks; in fact, it only really works for intermediate ops. This is because the `PipelineChannel` assumes sending/receiving is round robin, but streaming sinks can send data from the workers as well as after the finalize.
- … `execute` or `finalize` methods. These don't need to be spawned on the compute runtime.

Proposed Changes
- … `Receiver<Arc<MicroPartition>>`.
- … `ProbeStateBridge`.
- … `loole` channels, which are multi-producer multi-consumer. For consistency, use `loole` channels across all of swordfish. In the future we can think about implementing our own custom channels. https://users.rust-lang.org/t/loole-a-safe-async-sync-multi-producer-multi-consumer-channel/113785