Feature
Improved DAG visualization
#512
Looking good -- a few comments:
- Let's add some comments in the code to explain what we're doing -- not always clear
- Walrus operator is not supported in 3.7 (maybe `__future__`?). Happy to use it but that makes this dependent on killing 3.7 (which is coming soon, just waiting on pyarrow support for 3.12)
- Unit tests seem to be failing -- we can probably update them to test this or remove the ones that were too specific in the first place.
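For reference on the walrus point: assignment expressions are new syntax in Python 3.8 (PEP 572), and there is no `__future__` import that enables them on 3.7. A minimal sketch of the two equivalent spellings follows; the regex and function names are hypothetical and not taken from this PR.

```python
import re
from typing import Optional

# Hypothetical pattern, for illustration only: pull a hex fill color out of a DOT line.
PATTERN = re.compile(r'fillcolor="?(#[0-9A-Fa-f]{6})"?')


def fillcolor_with_walrus(line: str) -> Optional[str]:
    # Python 3.8+ only: bind and test the match in a single expression.
    if (match := PATTERN.search(line)) is not None:
        return match.group(1)
    return None


def fillcolor_py37(line: str) -> Optional[str]:
    # Python 3.7-compatible: assign first, then test.
    match = PATTERN.search(line)
    if match is not None:
        return match.group(1)
    return None
```

Both functions behave the same; the second form costs one extra line and keeps 3.7 support until it is dropped.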
def _get_function_modifier_style(modifier: str):
    if modifier == "output":
        modifier_style = dict(fillcolor="#FFC857")
This will overwrite the prior fillcolor in certain cases, no?
Yes, but the only type of node with a fillcolor is "function nodes" (the default).
I wonder how people design visualization software, but would it make sense to have a sort of lexicon of all the possible combinations for internal purposes?
Yeah, I think I'm likely overthinking it though? As in, we can see what feedback we get?
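To make the "lexicon" idea concrete, here is a rough sketch of how base styles and modifier styles could be combined and enumerated. Only the `#FFC857` fill color for `output` comes from the diff; every other name and value below is an assumption for illustration, not the code in this PR.

```python
from typing import Dict, Optional, Tuple

# Hypothetical base styles. Only "function" nodes carry a fillcolor,
# matching the reply above.
BASE_STYLES: Dict[str, Dict[str, str]] = {
    "function": {"shape": "rectangle", "style": "rounded,filled", "fillcolor": "#b4d8e4"},
    "input": {"shape": "rectangle", "style": "dashed"},
    "config": {"shape": "note", "style": "filled"},
}

# Modifier styles are layered on top. "#FFC857" is from the diff; "override" is made up.
MODIFIER_STYLES: Dict[str, Dict[str, str]] = {
    "output": {"fillcolor": "#FFC857"},
    "override": {"style": "filled,diagonals"},
}


def resolve_style(node_type: str, modifiers: Tuple[str, ...] = ()) -> Dict[str, str]:
    """Start from the base style and let each modifier overwrite matching keys."""
    style = dict(BASE_STYLES[node_type])
    for modifier in modifiers:
        style.update(MODIFIER_STYLES[modifier])
    return style


def style_lexicon() -> Dict[Tuple[str, Optional[str]], Dict[str, str]]:
    """Enumerate every (node type, single modifier) combination for inspection."""
    lexicon: Dict[Tuple[str, Optional[str]], Dict[str, str]] = {}
    for node_type in BASE_STYLES:
        lexicon[(node_type, None)] = resolve_style(node_type)
        for modifier in MODIFIER_STYLES:
            lexicon[(node_type, modifier)] = resolve_style(node_type, (modifier,))
    return lexicon
```

Printing `style_lexicon()` would show at a glance which combinations overwrite a prior `fillcolor`, which is the case raised in this thread.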
),
)

sorted_types = [
Nit -- maybe put these in order of commonality? So scan order is useful?
I moved materializer downwards.
It's a bespoke ordering, but I thought about having `config` and `input` first because they most often appear at the top of the graph near the legend. Then, all others are of `function` type with modifiers.
👍
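A small sketch of how a bespoke legend ordering like this could be applied. The exact contents of `sorted_types` are not visible in this excerpt, so the list and function name below are assumptions.

```python
from typing import Iterable, List

# Hypothetical ordering: config and input first (they sit near the legend at the
# top of the graph), then function-type entries, with materializer last.
LEGEND_ORDER: List[str] = ["config", "input", "function", "output", "materializer"]


def sort_legend_types(seen_types: Iterable[str]) -> List[str]:
    """Order the node types present in the graph by LEGEND_ORDER.

    Types missing from LEGEND_ORDER sort last, alphabetically, so a new
    node type never raises and never lands in an arbitrary position.
    """
    rank = {name: index for index, name in enumerate(LEGEND_ORDER)}
    return sorted(set(seen_types), key=lambda name: (rank.get(name, len(LEGEND_ORDER)), name))
```

Keeping the order in one list means changing the scan order later is a one-line edit.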
@@ -501,8 +502,9 @@ def test_function_graph_has_cycles_false():
     assert fg.has_cycles(nodes, user_nodes) is False


-def test_function_graph_display():
+def test_function_graph_display(tmp_path: pathlib.Path):
This test was initially making two assertions:
- is the content of the file valid
- does it not create a file when passed `output_file_path = None`
I split this into two tests: `test_function_graph_display()` and `test_function_graph_display_no_dot_output()`
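The shape of that split can be sketched as below. `render_dot` is a hypothetical stand-in for the real display call, and the test bodies are simplified; only the two test names and the `tmp_path` parameter come from this PR.

```python
import pathlib
from typing import Optional


def render_dot(output_file_path: Optional[pathlib.Path]) -> str:
    """Hypothetical stand-in: returns DOT source and writes it only if a path is given."""
    source = "digraph {\n\tA -> B\n}\n"
    if output_file_path is not None:
        output_file_path.write_text(source)
    return source


def test_function_graph_display(tmp_path: pathlib.Path):
    # First concern: the file is written and its content is valid.
    dot_file_path = tmp_path / "dag.dot"
    render_dot(dot_file_path)
    assert "A -> B" in dot_file_path.read_text()


def test_function_graph_display_no_dot_output(tmp_path: pathlib.Path):
    # Second concern: passing None must not create any file.
    render_dot(None)
    assert list(tmp_path.iterdir()) == []
```

Under pytest, `tmp_path` is supplied automatically as a fresh, empty directory for each test, so the two tests cannot interfere with each other.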
    dot = dot_file_path.open("r").readlines()
    dot_set = set(dot)

    assert dot_set.issuperset(expected_set) and len(dot_set.difference(expected_set)) == 1
Because of the sorting issues for DOT file lines and now the content of input nodes (order of rows in the table), I used sets instead. The DOT lines `expected_set` have the input node commented out. Therefore, the newly produced DOT file `dot_set` should be a superset of `expected_set` with exactly one more line for the input node.
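The set-based check described here can be illustrated on its own. The helper name and the sample DOT lines are made up for this sketch; the comparison itself mirrors the assertion in the diff.

```python
from typing import Iterable


def has_exactly_one_extra_line(dot_lines: Iterable[str], expected_lines: Iterable[str]) -> bool:
    """True if every expected line is present and exactly one additional line exists.

    Line order is ignored, which is the point: DOT output order is not stable.
    """
    dot_set, expected_set = set(dot_lines), set(expected_lines)
    return dot_set.issuperset(expected_set) and len(dot_set - expected_set) == 1


# Made-up sample lines; the input node line is left out of the expected set.
expected = ["digraph {\n", "\tA -> B\n", "}\n"]
produced = ["digraph {\n", "\tinput_node [label=<table>]\n", "\tA -> B\n", "}\n"]
print(has_exactly_one_extra_line(reversed(produced), expected))
```

One caveat of this approach: a set collapses duplicate lines, so the check cannot detect a line that is emitted twice.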
LGTM! Let's 🚢 🇮🇹
I tried to improve the DAG visualization using the graphviz library. The visualization should be less cluttered and visually represent the expressivity of the Hamilton DAG (inputs, config, functions, materializers, overrides, etc.). I added a legend directly in the viz for better readability.
Changes
All the changes are made to the function `create_graphviz_graph()`. The algorithm follows the same general structure: build nodes, then build edges. Several utility functions were created to centralize relevant definitions:
`create_graphviz_graph()`
these should be wired with other user-facing visualization functions
Removed:
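As a rough illustration of the "build nodes, then build edges" structure, the sketch below emits DOT text in two passes. It deliberately uses plain string formatting rather than the graphviz library, and `build_dot` is a hypothetical name, not the implementation of `create_graphviz_graph()`.

```python
from typing import Dict, List, Tuple


def build_dot(nodes: Dict[str, Dict[str, str]], edges: List[Tuple[str, str]]) -> str:
    """Emit DOT source in two passes: all node statements, then all edge statements."""
    lines = ["digraph {"]

    # Pass 1: nodes, each with its own attribute list (sorted for stable output).
    for name in sorted(nodes):
        attrs = ", ".join(f'{key}="{value}"' for key, value in sorted(nodes[name].items()))
        lines.append(f'\t"{name}" [{attrs}]' if attrs else f'\t"{name}"')

    # Pass 2: edges between nodes that were declared above.
    for source, target in sorted(edges):
        lines.append(f'\t"{source}" -> "{target}"')

    lines.append("}")
    return "\n".join(lines)
```

Declaring every node before any edge keeps styling decisions in one place, since an edge statement that mentions an undeclared node would otherwise create it with default attributes.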
How I tested this
Manually tested it with several DAGs. I included DAGs that used a maximum number of features.
Notes
`style` attribute, which is a comma-separated list with heterogeneous attributes. These attributes are hard to manage and are overridden as a whole when updating `style`
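One possible way around the whole-value overwrite is to merge the comma-separated flags instead of replacing the string. This is only a sketch under that assumption; `merge_style` is a hypothetical helper and is not part of this PR.

```python
def merge_style(existing: str, *additions: str) -> str:
    """Merge comma-separated Graphviz style flags, keeping order and dropping duplicates."""
    flags = [flag.strip() for flag in existing.split(",") if flag.strip()]
    for addition in additions:
        for flag in addition.split(","):
            flag = flag.strip()
            if flag and flag not in flags:
                flags.append(flag)
    return ",".join(flags)


# Updating a dict key with "dashed" alone would lose "rounded,filled"; merging keeps all three.
node_attrs = {"style": "rounded,filled"}
node_attrs["style"] = merge_style(node_attrs["style"], "dashed")
print(node_attrs["style"])
```

This only adds flags. Removing or replacing a conflicting flag (for example `solid` versus `dashed`) would need extra rules that this sketch does not attempt.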
Checklist