-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to split off dask and generalize graph builder #12
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
I agree that there should be a class for Graph
. In particular since it currently is represented in terms of a relatively complicated tuple. I'd even have a separate graph class that only has the encoding and graph primitives in addition to a task graph class that has the higher level ops like compute
.
src/sciline/pipeline.py
Outdated
} | ||
graph: Graph = {tp: (provider, bound, args)} | ||
for arg in args.values(): | ||
graph.update(self.build(arg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential infinite loop here (or up to max stack depth). @YooSunYoung has a solution based on graphlib
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind just getting RecursionError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It would be nicer to report cycles in the graph more clearly. Maybe even during construction of the pipeline.
_format_type(ret), | ||
) | ||
for ret, (provider, bound, args) in graph.items() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not hugely relevant since graphs are small, but could yield tuples here instead of building a temporary dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small comments. Looks good.
from .pipeline import Graph | ||
|
||
|
||
def to_graphviz(graph: Graph) -> Digraph: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a good idea to also add a test for this.
In addition, do we want to make it explicit that it is using graphviz
(as it is currently), or is that an implementation detail that the user should not care about? Meaning we would call it something like show_graph
, as we have done for the transform_coords
and plopp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the main point of this PR is the other refactor.
This is essentially what a dask collection is (so original PR message). I am not sure this is the right interface for us, since we may frequently want to compute multiple intermediate results. Will need more thought before moving on. |
Not really. A dask collection is more like a node of the graph (with links to parents). So it defines a graph as a linked structure. I was talking about a graph class that has a useful encoding for the graph plus methods like |
Not really. A dask collection is a graph (given by a dict), and a set of keys.
My initial comment was on the task graph class, not the graph class |
Due you guys prefer merging this, or do you want to see two more days of follow-up changes in the same PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due you guys prefer merging this, or do you want to see two more days of follow-up changes in the same PR?
I'm pretty sure we'll go with something like this. And since no one uses sciline, you can merge it. Let's keep the diffs short.
This refactors
Pipeline
, addingbuild()
which returns a generic graph (currently as adict
). This can be turned into a task graph that works withdask
, or can be used to makegraphviz.Digraph
. The internal use ofdask.delayed
is removed completely.I think I am way more happy with this split. This opens the doors for, e.g., dropping the
dask
requirement completely, or implement a customdask
Collection on top of this.I am not fully happy with the interface yet. I think what we have here now can be suitable as a low-level interface that we can build on top of. I am considering, among other things, splitting the
graph
into a class that providers some helpers such ascompute()
, or facilities to build a dask graph. Not sure exactly where this is going, but it may be useful to review now before such changes.