Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the DAG to have separate nodes for operations and arrays #337

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

tomwhite
Copy link
Member

Currently there is a one-to-one correspondence between operations and arrays: one operation produces one array, and is represented by one node in the DAG.

However, this won't be true when we support multiple outputs (#69) where one operation can produce multiple arrays.

Similarly, it also won't support optimizations like sibling fusion (in Apache Beam, and from FlumeJava originally), where operations that have the same inputs and produce different outputs can be fused into one operation producing multiple outputs. (Wukong calls this "task clustering".)

So we need to change the DAG representation so that one operation can produce multiple arrays, and this means breaking nodes into two types: operations and arrays.

This PR does exactly that. An example DAG (from test_add_with_broadcast) in the current representation

cubed

now looks like this:

cubed-new-dag

Note:

  • Operations are shown as boxes with rounded corners, arrays are rectangles.
  • Colours indicate the following: pink for a blockwise operation, green for a rechunk operation, orange for a materialized array, white for an operation that is not run in its own pipeline or for an array that is not materialized to disk as a Zarr array ("virtual arrays")

I wondered about having an option to visualize DAGs as we currently do, but that won't work when there are multiple outputs, so it may be best just to have the more general representation.

@tomwhite tomwhite added the core label Dec 17, 2023
@tomwhite tomwhite mentioned this pull request Dec 19, 2023
20 tasks
@tomwhite
Copy link
Member Author

tomwhite commented Jan 2, 2024

@TomNicholas any thoughts on this one before merging?

@TomNicholas
Copy link
Member

This looks great! I also asked my colorblind friend if he could distinguish all the colors and he gave it a thumbs up 👍

@tomwhite tomwhite merged commit f83e556 into main Jan 3, 2024
7 checks passed
@tomwhite tomwhite deleted the new-dag branch January 3, 2024 08:39
@tomwhite
Copy link
Member Author

tomwhite commented Jan 3, 2024

I also asked my colorblind friend if he could distinguish all the colors and he gave it a thumbs up 👍

Thank you!

tomwhite added a commit that referenced this pull request Feb 12, 2024
tomwhite added a commit that referenced this pull request Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants