[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph` #4456

beckernick · 2024-05-31T14:13:26Z

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

When we use nx-cugraph, we currently need to create the NetworkX graph on the CPU regardless of whether every algorithm we intend to use is supported by the cuGraph backend. As a result, we pay a non-trivial performance penalty converting between CPU and GPU graphs.

The new caching mechanism configurable via CACHE_CONVERTED_GRAPH=True was designed to address this problem, making it possible to only pay this cost once per graph if you're going to run multiple algorithms.

But it would be great to avoid this cost in the first place by dispatching on the graph construction operators in addition to the algorithms. In the example below, we spend significant time in from_pandas_edgelist and _convert_graph (the latter of which is only a one-time cost if we use caching).

If I've already committed to using the cuGraph backend as the top priority backend, I'd ideally just create the graph on the GPU and only pay the CPU/GPU conversion cost if I need to fallback to the CPU.

# !wget https://data.rapids.ai/cugraph/datasets/cit-Patents.csv

%env NETWORKX_BACKEND_PRIORITY=cugraph

import pandas as pd
import networkx as nx

df = pd.read_csv("cit-Patents.csv", sep=" ", names=["src", "dst"], dtype="int32")

%%snakeviz

G = nx.from_pandas_edgelist(df.head(1000000), source="src", target="dst")
pr = nx.pagerank(G, alpha=0.9)

But cuGraph supports from_pandas_edgelist and it's much faster (100ms vs 8s in this case):

%timeit -n3 -r3 G_gpu = cugraph.from_pandas_edgelist(df.head(1000000), source="src", destination="dst")
71.2 ms ± 11.8 ms per loop (mean ± std. dev. of 3 runs, 3 loops each)

Describe your ideal solution

The following code should dispatch to the cuGraph backend for from_pandas_edgelist in addition to pagerank.

# !wget https://data.rapids.ai/cugraph/datasets/cit-Patents.csv

%env NETWORKX_BACKEND_PRIORITY=cugraph

import pandas as pd
import networkx as nx

df = pd.read_csv("cit-Patents.csv", sep=" ", names=["src", "dst"], dtype="int32")

G = nx.from_pandas_edgelist(df.head(1000000), source="src", target="dst")
pr = nx.pagerank(G, alpha=0.9)

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

I agree to follow cuGraph's Code of Conduct
I have searched the open feature requests and have found no duplicates for this feature request

The text was updated successfully, but these errors were encountered:

beckernick · 2024-05-31T14:25:26Z

Seems like this would be an expansion of networkx/networkx#6876

eriknw · 2024-06-03T17:20:05Z

Thanks for raising this issue! I think this behavior ought to be controllable via a different environment variable and/or config option... and I do think this is important.

Today, you can add backend="cugraph" to e.g. nx.from_pandas_edgelist, but this is not longer "zero code change", so we'd like to make this story better.

eriknw · 2024-06-17T19:59:44Z

I have begun to explore a solution to this in networkx/networkx#7502

rlratzel · 2024-08-23T14:59:08Z

Update: we've been discussing this issue quite a lot, and we're currently working through some solutions for it to include in 24.10.

We've also defined some criteria that we think needs to be supported by any solution in order to close this issue. We think these were implied in the issue, but we're explicitly listing them to be sure. @beckernick please don't hesitate to update this as needed:

Requirements:

Does not break existing work flows (i.e. truly zero-code-change)
Works for the "scipy example" (the workflow presented at the SciPy 2024 talk on Accelerated Pandas and NetworkX - TODO: add link)
Works for dynamic graph manipulation and associated algorithms

rlratzel · 2024-08-27T14:57:49Z

We're currently planning on having these two PRs close this issue:

beckernick added feature request New feature or request ? - Needs Triage Need team to review and classify labels May 31, 2024

alexbarghi-nv assigned eriknw Jun 3, 2024

BradReesWork removed the ? - Needs Triage Need team to review and classify label Jun 13, 2024

rlratzel self-assigned this Aug 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph` #4456

[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph` #4456

beckernick commented May 31, 2024 •

edited

Loading

beckernick commented May 31, 2024

eriknw commented Jun 3, 2024

eriknw commented Jun 17, 2024

rlratzel commented Aug 23, 2024

rlratzel commented Aug 27, 2024

[FEA]: nx-cugraph should intercept built-in constructors like from_pandas_edgelist if NETWORKX_BACKEND_PRIORITY=cugraph #4456

[FEA]: nx-cugraph should intercept built-in constructors like from_pandas_edgelist if NETWORKX_BACKEND_PRIORITY=cugraph #4456

Comments

beckernick commented May 31, 2024 • edited Loading

Is this a new feature, an improvement, or a change to existing functionality?

How would you describe the priority of this feature request

Please provide a clear description of problem this feature solves

Describe your ideal solution

Describe any alternatives you have considered

Additional context

Code of Conduct

beckernick commented May 31, 2024

eriknw commented Jun 3, 2024

eriknw commented Jun 17, 2024

rlratzel commented Aug 23, 2024

rlratzel commented Aug 27, 2024

[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph` #4456

[FEA]: nx-cugraph should intercept built-in constructors like `from_pandas_edgelist` if `NETWORKX_BACKEND_PRIORITY=cugraph` #4456

beckernick commented May 31, 2024 •

edited

Loading