Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node and Edge Filtering #886

Merged
merged 15 commits into from
Jun 11, 2023
Merged

Node and Edge Filtering #886

merged 15 commits into from
Jun 11, 2023

Conversation

danielleodigie
Copy link
Contributor

Aims to resolve #800
Adds filter_nodes() and filter_edges() methods to the Graph and DiGraph classes
:)

@coveralls
Copy link

coveralls commented May 31, 2023

Pull Request Test Coverage Report for Build 5226533962

  • 48 of 48 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 96.46%

Totals Coverage Status
Change from base Build 5225584797: 0.01%
Covered Lines: 14878
Relevant Lines: 15424

💛 - Coveralls

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking like a good start, thanks for doing this! I left a small suggestion inline on how you can remove an unnecessary map() call. Besides that can you add some tests to validate it works as expected. Also make sure to add a test that raises an exception when filter_function is run to validate the exception gets raised back to python correctly. Besides that if you could add a release note documenting the addition of this new feature (you can see the details on this here: https://github.com/Qiskit/rustworkx/blob/main/CONTRIBUTING.md#release-notes )

src/digraph.rs Outdated Show resolved Hide resolved
src/digraph.rs Outdated Show resolved Hide resolved
src/graph.rs Outdated Show resolved Hide resolved
src/graph.rs Outdated Show resolved Hide resolved
@mtreinish mtreinish added this to the 0.14.0 milestone Jun 1, 2023
…ted tests for filter_edges and filter_nodes for both PyGraph and PyDiGraph. Created release notes for the functions.
Copy link
Contributor

@raynelfss raynelfss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Danielle, your additions seem to work well with rustworkx on python. I have personally tested it with both PyGraph and PyDiGraph. Everything LGTM!

However, there are some changes to be made to your release notes, as they're not passing the required tests. Because of the way the release notes are generated, the examples you provided will not run unless formatted properly. Consider the examples I have left below.

releasenotes/notes/add-filter-98d00f306b5689ee.yaml Outdated Show resolved Hide resolved
releasenotes/notes/add-filter-98d00f306b5689ee.yaml Outdated Show resolved Hide resolved
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great now, thanks for the fast updates adding the tests and release notes. I just had a few small comments inline mostly with some small documentation improvements we should make.

I also left a question inline about memory allocation, it's not a blocker as one way is not clearly better than the other.

releasenotes/notes/add-filter-98d00f306b5689ee.yaml Outdated Show resolved Hide resolved
releasenotes/notes/add-filter-98d00f306b5689ee.yaml Outdated Show resolved Hide resolved
src/digraph.rs Outdated
res.extract(py)
};

let mut n = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A potential performance improvement we can make here is to do:

Suggested change
let mut n = Vec::new();
let mut n = Vec::with_capacity(self.graph.node_count());

This basically allocates up front a Vec with enough space for the number of nodes in the graph. THis avoids having to do multiple memory allocations as the Vec grows (which are slow). The tradeoff here being that we are likely over allocating memory in the return object (unless filter_function returns True for every node. I'm thinking this tradeoff is ok in this case though because each element of the output array is just a 64bit integer so except for pathological edge cases where we have millions of nodes and the filter only has a few matches nobody will really notice if we've allocated and extra space for a few integers.

But, I'm curious what your thoughts are on this tradeoff and what you think we should do here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, this makes sense. I think over allocating is worth it in the long run, especially if reallocating is expensive. I'll make that change!

src/digraph.rs Outdated
res.extract(py)
};

let mut e = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above with filter_nodes we can do:

Suggested change
let mut e = Vec::new();
let mut e = Vec::with_capacity(self.graph.edge_count());

@@ -2889,6 +2889,48 @@ impl PyDiGraph {
}
}

/// Filters a graph's nodes by some criteria conditioned on a node's data payload and returns those nodes' indices
///
/// :param filter_function: Function with which to filter nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a little more detail on how this is used. I think we should include what the format is for the filter function (what it gets passed and the required return type). Maybe having an example in the docs here would be useful too since it might be unclear even with an English description of the behavior.

This comment applies to all the functions being added in both graph and digraph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the example goes a long way for clarity thanks for adding that. But I still think we should explicitly document the inputs and required output from filter_function so that it's clear to users that it will be passed a node weight and is required to return a boolean value (this applies to all the docstrings for the other 3 methods too).

@danielleodigie danielleodigie marked this pull request as ready for review June 8, 2023 16:16
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for updating the docstring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Node and Edge Filtering and Selection
4 participants