Improve performance of select_children() and select_parents() #11099
+23
−4
Problem
The select_children() and select_parents() functions have been identified as performance bottlenecks, particularly when they are called from semantic layer code.
We've received reports of projects spending over 15 minutes waiting for these operations.
Solution
These changes are behavior-neutral; they avoid the repeated application of networkx graph views, which are slow. Some unit tests already cover these functions, but I went further and checked that the new functions give the same results as the old ones on a large set of real-world graphs, choosing sets of "selected" nodes at random at a variety of sizes. I observed a reliable speedup of 60-100x for select_children() and 20-30x for select_parents().
This should reduce the wait time reported above from over 15 minutes to less than 15 seconds.
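
For readers who want a feel for the idea, here is a minimal, self-contained sketch. It is not the code in this PR. It uses a plain adjacency dict in place of the networkx graph (with a `networkx.DiGraph`, `graph.successors(node)` would play the role of `adjacency[node]`), and the function names `select_children_per_node`, `select_children_layered`, `random_dag`, and `check_equivalence` are hypothetical. It contrasts one traversal per selected node with a single layer-by-layer expansion from all selected nodes, then runs the kind of randomized equivalence check described above on small random DAGs, which is an assumption standing in for the real-world graphs.

```python
import random
from collections import deque


def select_children_per_node(adjacency, selected, max_depth=None):
    """Reference version: one depth-limited BFS per selected node, unioned."""
    found = set()
    for source in selected:
        depth_of = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if max_depth is not None and depth_of[node] >= max_depth:
                continue  # do not expand past the depth limit
            for child in adjacency.get(node, ()):
                if child not in depth_of:
                    depth_of[child] = depth_of[node] + 1
                    queue.append(child)
        found.update(n for n in depth_of if n != source)
    return found


def select_children_layered(adjacency, selected, max_depth=None):
    """Single pass: expand all selected nodes together, one layer at a time."""
    children = set()
    frontier = set(selected)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_layer = set()
        for node in frontier:
            next_layer.update(adjacency.get(node, ()))
        next_layer -= children  # each node is expanded at most once more
        children |= next_layer
        frontier = next_layer
        depth += 1
    return children


def random_dag(num_nodes, edge_prob, rng):
    """Random DAG: edges only go from a lower-numbered to a higher-numbered node."""
    return {
        i: [j for j in range(i + 1, num_nodes) if rng.random() < edge_prob]
        for i in range(num_nodes)
    }


def check_equivalence(trials=200, seed=0):
    """Compare both versions on random DAGs with random selections of varying size."""
    rng = random.Random(seed)
    for _ in range(trials):
        dag = random_dag(rng.randint(1, 30), 0.15, rng)
        selected = set(rng.sample(sorted(dag), rng.randint(0, len(dag))))
        for max_depth in (None, 0, 1, 2, 5):
            fast = select_children_layered(dag, selected, max_depth)
            slow = select_children_per_node(dag, selected, max_depth)
            assert fast == slow, (dag, selected, max_depth)
    return True


if __name__ == "__main__":
    print(check_equivalence())
```

The sketch assumes an acyclic graph, as the two versions can differ on cycles that lead back to a selected node. A parents variant would presumably be the mirror image, walking predecessor edges instead of successor edges.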
Checklist