Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Improved DAG visualization #512

Merged
merged 10 commits into from
Nov 3, 2023
Merged

Feature Improved DAG visualization #512

merged 10 commits into from
Nov 3, 2023

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Nov 2, 2023

I tried to improve the DAG visualization using the graphviz library. The visualization should be less cluttered and visually represent the expressivity of the Hamilton DAG (inputs, config, functions, materializers, overrides, etc.). I added a legend directory in the viz for better readability.

Changes

All the changes are made to the function create_graphviz_graph(). The algorithm follows the same general structure: build nodes, then build edges. Several utility functions were created to centralized relevant definitions:

  • create node label (same for all nodes except input nodes)
  • get the node type (input, config, function)
  • get node modifiers (override, output, collect, expand, materializer)
  • create a legend based on the node types of the DAG
  • added arguments to the internal function create_graphviz_graph() these should be wired with other user-facing visualization functions

Removed:

  • no longer display "Input" or "Override" prefix to reduce space used
  • the base shape for functions are rectangles / rounded rectangles, which take significantly less space than ellipses.

How I tested this

Manually tested it with several DAGs. I included DAGs that used a maximum number of features.

Notes

  • Currently doesn't implement the Parallel/Collect special edges (but both types can now be distinguished from the node style)
  • Modifiers are applied sequentially and may override each other. Currently, it is not the case because each modifier acts on a different property (shape, fillcolor, periphery color, style)
  • Graph, nodes, and edges have a style attribute which is a comma-separated list with heterogeneous attributes. These attributes are hard to manage and overriden as a whole when updating style

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Copy link
Contributor

sweep-ai bot commented Nov 2, 2023

Apply Sweep Rules to your PR?

  • Apply: Leftover TODOs in the code should be handled.
  • Apply: All new business logic should have corresponding unit tests in the tests/ directory.
  • Apply: Any clearly inefficient or repeated code should be optimized or refactored.

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good -- a few comments:

  1. Let's add some comments in the code to explain what we're doing -- not always clear
  2. Walrus operator is not supported in 3.7 (maybe __future__?). Happy to use it but that makes this dependent on killing 3.7 which is coming soon, just waiting on pyarrow support for 3.12)
  3. Unit tests seem to be failing -- we can probably update them to test this or remove the ones that were too specific in the first place.

hamilton/graph.py Outdated Show resolved Hide resolved
hamilton/graph.py Outdated Show resolved Hide resolved

def _get_function_modifier_style(modifier: str):
if modifier == "output":
modifier_style = dict(fillcolor="#FFC857")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will overwrite the prior fillcolor in certain cases, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the only type of node with a fillcolor is "function nodes" (the default).

I wonder how people design visualization software, but would it make sense to have a sort of lexicon of all the possible combinations for internal purposes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I'm likely overthinking it though? As in, we can see what feedback we get?

hamilton/graph.py Outdated Show resolved Hide resolved
),
)

sorted_types = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit -- maybe put these in order of commonality? So scan order is useful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved materializer downwards

It's a bespoke ordering, but I thought about having config and input first because they most often appear at the top of the graph near the legend. Then, all others are of function type with modifiers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

hamilton/graph.py Show resolved Hide resolved
hamilton/graph.py Outdated Show resolved Hide resolved
hamilton/graph.py Outdated Show resolved Hide resolved
@@ -501,8 +502,9 @@ def test_function_graph_has_cycles_false():
assert fg.has_cycles(nodes, user_nodes) is False


def test_function_graph_display():
def test_function_graph_display(tmp_path: pathlib.Path):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was initially making two assertions:

  1. is the content of the file valid
  2. does it not create a file when passed output_file_path = None

I split this into two tests: test_function_graph_display() and test_function_graph_display_no_dot_output()

dot = dot_file_path.open("r").readlines()
dot_set = set(dot)

assert dot_set.issuperset(expected_set) and len(dot_set.difference(expected_set)) == 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the sorting issues for DOT file lines and now the content of input nodes (order of rows in the table), I used sets instead. The DOT lines expected_set have the input node commented out. Therefore, the newly produced DOT file dot_set should be a superset of expected_set with exactly one more line for the input node.

@elijahbenizzy elijahbenizzy self-requested a review November 3, 2023 17:58
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's 🚢 🇮🇹

@zilto zilto merged commit 1d9fc7d into main Nov 3, 2023
@zilto zilto deleted the example/better-viz branch November 3, 2023 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants