Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add show_graph to display a GraphViz plot for expressions #19365

Merged
merged 8 commits into from
Nov 2, 2024

Conversation

corwinjoy
Copy link
Contributor

@corwinjoy corwinjoy commented Oct 21, 2024

Rationale: We have found that the simple text output from polars.Expr.meta.tree_format can become difficult to read for large expressions. Therefore, we wanted to build on the logic in tree_format to also be able to produce a GraphViz output for more complex expressions. This PR follows the logic in polars.LazyFrame.show_graph to add polars.Expr.meta.show_graph

Example:

e = (pl.col("foo") * pl.col("bar")).sum().over(pl.col("ham")) / 2
e.meta.show_graph()

expr

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Oct 21, 2024
@corwinjoy
Copy link
Contributor Author

@adamreeve

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 72.83951% with 22 lines in your changes missing coverage. Please review.

Project coverage is 79.84%. Comparing base (00650b9) to head (3ca7bd9).
Report is 139 commits behind head on main.

Files with missing lines Patch % Lines
py-polars/polars/_utils/various.py 15.38% 21 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19365      +/-   ##
==========================================
- Coverage   80.02%   79.84%   -0.19%     
==========================================
  Files        1528     1536       +8     
  Lines      209871   211405    +1534     
  Branches     2419     2445      +26     
==========================================
+ Hits       167954   168800     +846     
- Misses      41366    42050     +684     
- Partials      551      555       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@adamreeve
Copy link
Contributor

adamreeve commented Oct 22, 2024

One thing I notice in the example is that the binary / expression has the operands in the reverse order to what I'd expect. Dot seems to have support for specifying that child nodes must be ordered from left to right (https://graphviz.org/docs/attrs/ordering/), is there a way to apply this attribute for non-commutative operators? (Or maybe applying this to all expressions would be fine)

@corwinjoy
Copy link
Contributor Author

One thing I notice in the example is that the binary / expression has the operands in the reverse order to what I'd expect. Dot seems to have support for specifying that child nodes must be ordered from left to right (https://graphviz.org/docs/attrs/ordering/), is there a way to apply this attribute for non-commutative operators? (Or maybe applying this to all expressions would be fine)

The order here matches what is done by tree_format. I don't see an option there for reordering.

@ritchie46
Copy link
Member

The order here matches what is done by tree_format. I don't see an option there for reordering.

Couldn't we traverse in an alternative order? I think it's important to show the order of operations.

@corwinjoy
Copy link
Contributor Author

The order here matches what is done by tree_format. I don't see an option there for reordering.

Couldn't we traverse in an alternative order? I think it's important to show the order of operations.

So, in the TreeWalker that builds the output shown here (and the output for tree_format) the iteration is deliberately done backwards in the code...

// reverse order so that left is popped first

So, I'm not really clear why the code does binary children backwards.
For this new GraphViz output I can probably just tweak the output, but the tree_format text output will still be backwards.
I could probably write a custom iterator but I'm not quite sure how to do it.
I could use some guidance / advice on how to do this if we really want to reverse the children for division.
@ritchie46

@adamreeve
Copy link
Contributor

adamreeve commented Oct 24, 2024

For this new GraphViz output I can probably just tweak the output

It seems reasonable to me to only un-reverse the order for the GraphViz output and keep the existing behaviour of tree_format unchanged. Eg. something like this change seems to work as it looks like the display order depends only on the order of the edges rather than the order the nodes are defined in:

--- a/crates/polars-plan/src/plans/ir/tree_format.rs
+++ b/crates/polars-plan/src/plans/ir/tree_format.rs
@@ -902,12 +902,13 @@ impl fmt::Binary for TreeFmtVisitor {
                 if !cell.text.is_empty() {
                     // Add node
                     let node_label = &cell.text.join("\n");
-                    let node_desc = format!("n{i}{j} [label=\"{node_label}\"]");
+                    let node_desc = format!("n{i}{j} [label=\"{node_label}\",ordering=\"out\"]");
                     relations.push(node_desc);
 
                     // Add child edges
                     if i < tree_view.rows.len() - 1 {
-                        for child_col in cell.children_columns.iter() {
+                        // Iter in reversed order to undo the reversed child order when iterating expressions
+                        for child_col in cell.children_columns.iter().rev() {
                             let next_row = i + 1;
                             let edge = format!("n{i}{j} -- n{next_row}{child_col}");
                             relations.push(edge);

@corwinjoy
Copy link
Contributor Author

corwinjoy commented Oct 24, 2024

let node_desc = format!("n{i}{j} [label=\"{node_label}\",ordering=\"out\"]");

Sounds good. My only hesitation is on how many operators are reversed. But, looking through sample output it seems that most of them are so I think we can just do this in general rather than for specific operators. This will make our output different from tree_format but I guess that can be fixed in a future PR if the polars team feels it is worthwhile.

@corwinjoy
Copy link
Contributor Author

Thanks @adamreeve , I've gone ahead and put in the changes as suggested and now this gives a cleaner output and more streamlined code. Graph below.

e = (pl.col("foo") * pl.col("bar")).sum().over(pl.col("ham")) / 2

expr

@ritchie46
Copy link
Member

So, I'm not really clear why the code does binary children backwards.

So that the iterator traverses left nodes first. The tree-walker should not be adapted.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, then it's good to go. Thanks @corwinjoy

@corwinjoy
Copy link
Contributor Author

Small nit, then it's good to go. Thanks @corwinjoy

Thanks @ritchie46 ! I have gone ahead and made the change.

@ritchie46 ritchie46 merged commit 1eb2fcc into pola-rs:main Nov 2, 2024
24 of 26 checks passed
coastalwhite pushed a commit to coastalwhite/polars that referenced this pull request Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants