Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-1570] Experiment with layer ordering and assignment client-side #147

Closed
wants to merge 4 commits into from

Conversation

limdauto
Copy link
Collaborator

@limdauto limdauto commented Apr 16, 2020

Description

This PR introduces logic to:

  • Calculate layer ordering server-side based on the layer's dependencies.
  • This layer ordering is then fed forward to a toposort client-side when calculating node ranks. I left this calculation untouched as I think it's good enough.
  • Remove the need to use a separate layer id and layer name. Instead, use name every where.
  • Specify layer: null for every Kedro node in the mock data.
  • Specify layer: null for some dataset.

The reason layer: null works is because a graph node is always assigned a rank, so it will get displayed based on its topological order in any case. If there is a layer assigned, it just means this layer changes the node's topological order and the node is displayed in the same rank as other nodes in the same layer.

An interesting property of the way we add fake edges between layers is that a Kedro node will be displayed on the same layer as its output dataset if its input dataset is in the layer below, which is in line with data engineer's convention.

Development notes

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Legal notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

}
for (const edge of data.edges) {
addEdge(state)(edge);
nodeDeps[edge.source].push(edge.target);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, we are not doing that artificial node insertion, which may bring us some problems.

Let's imagine the following graph:

mermaid-diagram-20200417104328

If I understand that correctly, Batching toposort will put Raw 0 and Int 0 into the same group (as they don't depend on each other). Then the order in which Raw and Int layers will be presented only depends on the order in which they appear in that toposort group (which is unreliable). So if I flip the branches on the graph, Int layer will be presented first, which is incorrect to me.

Gordon's suggestion was to add an artificial dependency node Raw-to-Int as in
mermaid-diagram-20200417105148
Which depends on all Raw layer nodes and is a dependency for all Int layer nodes. In that case toposort will put Raw 0 and Int 0 into different groups and Raw layer will go before Int layer.

I'm happy to discuss it over Zoom after the standup if necessary.

@limdauto
Copy link
Collaborator Author

Closing in favour of #148

@limdauto limdauto closed this Apr 18, 2020
@limdauto limdauto deleted the prototype/fe-layers branch April 18, 2020 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants