Dataflow cluster analysis #977

Closed
Ellpeck opened this issue Sep 16, 2024 · 0 comments · Fixed by #985
Labels
dataflow (Related to dataflow extraction), enhancement (New feature or request)

Ellpeck commented Sep 16, 2024

We want to calculate clusters on the dataflow graph that identify which parts of the code are highly dependent on each other (i.e., "belong together"). Open question: when can we ignore directed edges that point against the direction of traversal, and when do we have to traverse them?

Output should be a set of flowR node IDs that form a cluster, to be evaluated further later on.

Step 1: Simple/naive cluster calculation using reachability analysis.
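
A minimal sketch of what Step 1 could look like, assuming every edge is traversed in both directions (which is exactly the simplification the open question above is about). `NodeId`, `Edge` and `findClusters` are hypothetical stand-ins for illustration, not flowR's actual types or API.

```typescript
// Hypothetical stand-ins; these are NOT flowR's actual graph types.
type NodeId = number;
interface Edge { from: NodeId; to: NodeId }

// Groups nodes into clusters by undirected reachability
// (i.e., weakly connected components of the graph).
// Assumption: edge direction is ignored entirely.
function findClusters(nodes: readonly NodeId[], edges: readonly Edge[]): Set<NodeId>[] {
  // Build an undirected adjacency list.
  const neighbors = new Map<NodeId, NodeId[]>();
  for (const n of nodes) {
    neighbors.set(n, []);
  }
  for (const { from, to } of edges) {
    neighbors.get(from)?.push(to);
    neighbors.get(to)?.push(from);
  }

  const visited = new Set<NodeId>();
  const clusters: Set<NodeId>[] = [];
  for (const start of nodes) {
    if (visited.has(start)) {
      continue;
    }
    // Depth-first traversal collecting everything reachable from `start`.
    const cluster = new Set<NodeId>();
    const stack: NodeId[] = [start];
    visited.add(start);
    while (stack.length > 0) {
      const current = stack.pop() as NodeId;
      cluster.add(current);
      for (const next of neighbors.get(current) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          stack.push(next);
        }
      }
    }
    clusters.push(cluster);
  }
  return clusters;
}
```

Each returned set is one cluster of node IDs, matching the output format described above. Respecting edge direction for some edge types would mean filtering which edges are added to the adjacency list in each direction.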

Step 2: The result will likely be one large cluster because there are shared dependencies on setup steps, reused functions, etc. We can implement a "bottleneck" node calculation that splits clusters on these sorts of nodes and creates separate clusters that all individually contain the "bottleneck" node. Open question: what constitutes a "bottleneck" node, i.e., when is it reused enough, and when is the cluster around it small enough, to be splittable?
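
A rough sketch of how the split in Step 2 might work. The bottleneck criterion used here (undirected degree of at least `minDegree`) is only a placeholder assumption, since what counts as a "bottleneck" is still an open question. `NodeId`, `Edge` and `splitOnBottlenecks` are hypothetical names, not flowR's API.

```typescript
// Hypothetical stand-ins; these are NOT flowR's actual graph types.
type NodeId = number;
interface Edge { from: NodeId; to: NodeId }

// Splits one cluster into smaller clusters at "bottleneck" nodes.
// Placeholder heuristic (an assumption): a node is a bottleneck if it has
// at least `minDegree` distinct neighbors inside the cluster.
function splitOnBottlenecks(
  cluster: ReadonlySet<NodeId>,
  edges: readonly Edge[],
  minDegree: number
): Set<NodeId>[] {
  const members = Array.from(cluster);

  // Undirected adjacency restricted to the cluster; sets avoid counting duplicates.
  const neighbors = new Map<NodeId, Set<NodeId>>();
  for (const n of members) {
    neighbors.set(n, new Set<NodeId>());
  }
  for (const { from, to } of edges) {
    if (from !== to && cluster.has(from) && cluster.has(to)) {
      neighbors.get(from)!.add(to);
      neighbors.get(to)!.add(from);
    }
  }

  const bottlenecks = new Set<NodeId>();
  for (const n of members) {
    if (neighbors.get(n)!.size >= minDegree) {
      bottlenecks.add(n);
    }
  }

  const visited = new Set<NodeId>();
  const parts: Set<NodeId>[] = [];
  for (const start of members) {
    if (bottlenecks.has(start) || visited.has(start)) {
      continue;
    }
    const part = new Set<NodeId>();
    const stack: NodeId[] = [start];
    visited.add(start);
    while (stack.length > 0) {
      const current = stack.pop() as NodeId;
      part.add(current);
      for (const next of Array.from(neighbors.get(current)!)) {
        if (bottlenecks.has(next)) {
          // Include the bottleneck in this part, but do not traverse through it.
          part.add(next);
        } else if (!visited.has(next)) {
          visited.add(next);
          stack.push(next);
        }
      }
    }
    parts.push(part);
  }
  // If every node was a bottleneck, keep the cluster unsplit.
  return parts.length > 0 ? parts : [new Set<NodeId>(members)];
}
```

Every resulting part contains the bottleneck nodes it touches, so a shared setup node ends up in several clusters. Bottleneck nodes that are connected only to other bottlenecks are dropped by this sketch, and the "is the surrounding cluster small enough" half of the open question is not addressed at all.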

Implementation as a separate "post analysis" module rather than a pipeline step.
