[QST]: Benchmarking cugraph.leiden() #4488
What is your question?

Hello @afender, I want to benchmark the runtime of `cugraph.leiden()`. For a benchmark of the algorithm, one should only consider the runtime of the algorithm itself and exclude the runtime of validations and initial memory allocations. A direct measurement of runtime around the cugraph call includes all of the above. Is it possible to get an "algorithm runtime" from the call to `cugraph.leiden()`?

Comments
@rlratzel should have a better answer for your question. Alex Fender has moved on to our cuopt effort and doesn't work on this software anymore.

I'm fuzzy on the performance overheads of the Python API: where they exist and if/how you can avoid them. I know at one time we had (and perhaps still have) some lazy computations that occur on the first call to an algorithm. I believe there is a way to avoid those; @rlratzel should be able to clarify. Expensive validation steps are directly enabled in the C/C++ layer by passing a parameter called `do_expensive_check`.

As implemented, memory allocation for the result is done inside of Leiden. That allocation does not include initialization; we copy the result into uninitialized memory, so the performance overhead of allocating the result should be minimal. All other memory allocation done inside of Leiden is dynamic, based on the progress of the clustering algorithm. If you configure RMM to use the pool allocator, then memory allocations should be pretty fast. Perhaps @rlratzel can clarify how to do that from Python.
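A minimal sketch of one way to exclude that one-time lazy initialization from a measurement: make a throwaway warm-up call before timing. The `benchmark_leiden` helper and the repeat count are hypothetical, not part of the cugraph API; `G` is assumed to be an already-built `cugraph.Graph`.

```python
import time

import cugraph

def benchmark_leiden(G, repeats=5):
    # Warm-up call: triggers any lazy, first-call-only setup so it is
    # excluded from the timed runs below.
    cugraph.leiden(G)

    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        cugraph.leiden(G)
        times.append(time.perf_counter() - start)
    return min(times)
```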
Hi @wolfram77, I don't know if this is acceptable, but I think the best way to benchmark only the algorithm implementation, and eliminate any additional allocations/conversions/input checks done in the cugraph Python library, would be to benchmark Leiden from the C++ library in C++. Because the cugraph Python library calls the libcugraph C++ library implementation, you'd be benchmarking as close to the algorithm implementation as possible (without modifying C++ source code to isolate further beyond the API).

If C++ isn't an option, you could benchmark Leiden from our lower-level Python library (`pylibcugraph.leiden`). The cugraph Python library wraps `pylibcugraph`, which is a much thinner layer over the C++ implementation, so calling it directly avoids most of the cugraph-level overhead.

Finally, configuring RMM to use pool allocation might also be something to consider, as @ChuckHastings mentioned. You can read about how to do that from Python here.
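A rough sketch of what benchmarking at the pylibcugraph level could look like. The exact signatures of `SGGraph` and `pylibcugraph.leiden` have changed across releases, so the parameter names and order below are assumptions to verify against your installed version; the edge-list file is a placeholder.

```python
import cudf
import pylibcugraph

# Assumed constructor arguments -- verify against your pylibcugraph version.
handle = pylibcugraph.ResourceHandle()
props = pylibcugraph.GraphProperties(is_symmetric=True, is_multigraph=False)

edges = cudf.read_csv("edges.csv", names=["src", "dst"], dtype=["int32", "int32"])

G = pylibcugraph.SGGraph(
    handle,
    props,
    edges["src"],
    edges["dst"],
    store_transposed=False,
    renumber=True,
    do_expensive_check=False,  # skip the expensive validation pass
)

# Assumed argument order for leiden: (resource_handle, random_state, graph,
# max_level, resolution, theta, do_expensive_check).
vertices, clusters, modularity = pylibcugraph.leiden(
    handle, 42, G, 100, 1.0, 1.0, do_expensive_check=False
)
```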
Thanks @ChuckHastings and @rlratzel. As suggested, I configured RMM to use pool allocation (code below). This seems to help a lot.

```python
import rmm

pool = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource(), initial_pool_size=2**36)
rmm.mr.set_current_device_resource(pool)
```

I also discard the runtime of the first call to `cugraph.leiden()`. Below are the runtimes we observed for cuGraph Leiden (including other comparisons). cuGraph Leiden fails to run on the arabic-2005, uk-2005, webbase-2001, it-2004, and sk-2005 graphs due to out-of-memory issues. We use an NVIDIA A100 GPU.
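Putting the two suggestions together, a minimal end-to-end harness might look like the sketch below. The edge-list file name, column names, and repeat count are placeholders, not details from this thread.

```python
import time

import cudf
import cugraph
import rmm

# Configure the RMM pool before any other GPU allocations are made.
pool = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource(), initial_pool_size=2**36)
rmm.mr.set_current_device_resource(pool)

# Build the graph from a placeholder edge list.
edges = cudf.read_csv("graph.csv", names=["src", "dst"], dtype=["int32", "int32"])
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# Discard the first call (lazy initialization), then time subsequent runs.
cugraph.leiden(G)
runs = []
for _ in range(5):
    start = time.perf_counter()
    parts, modularity = cugraph.leiden(G)
    runs.append(time.perf_counter() - start)

print(f"best: {min(runs):.3f} s, modularity: {modularity:.4f}")
```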