-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix out of order execution on model select (#1354) #1355
Fix out of order execution on model select (#1354) #1355
Conversation
…ve dependencies while removing the skipped nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beckjake this logic looks good to me! Going to revisit with fresh eyes and try to think about any weird scenarios where this wouldn't work (thinking: ephemeral models, non-model selectors, etc).
Might be a silly question, but this works even if tag: or source: selectors are supplied, right? Should be essentially the same code path as before?
My only other thought here is that we could bypass the need for this logic by computing the transitive closure of the dag first, then just removing the nodes we don't need. Did you consider an approach like that? I'd need to do some reading, but I'd be curious if it's faster to computer the closure once vs. computing it dynamically for each node that we need to prune from the graph.
Don't maintain the links manually while removing nodes Just take the transitive closure and remove all nodes
Not a silly question! But no, the "include set" that this is based on is generated earlier up the stack by the node selector, which handles source vs tag vs model name, so the linker can ignore all that.
That's pretty neat, I didn't know networkx had a function for transitive closure, though maybe I should have guessed! I bet it's at least a bit faster as you get more space between kept nodes in the graph, because you save so much useless calculation (unless the underlying transitive_closure part is slow), but more importantly the code gets simpler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beckjake ok, this looks great! Let's ship it
Fixes #1354
When we're removing nodes from the new graph we've constructed for the graph queue, replace all the removed transitive dependencies with explicit ones. The idea here is like this:
Imagine you have a DAG of models that is really boring and linear like: (A -> B -> C -> D)
And you've done
dbt run --model A D
. The old code would see a graph with those two nodes and no dependencies:The new code works like this:
A will now correctly execute before D.
Note that if B or C are early-binding views as in postgres/redshift default, this code will still fail, because the
drop ... cascade
when building A will drop B and C as well, and then D has no existing refs.