Issues running C++ multi-GPU test #4601
Comments
First thing I would check is to make sure that you're getting what you asked for. Try adding an You might also set the following environment variable:
This will litter your output file with debugging messages, but it looks like there might be a comms issue (it's failing in an all-gather).
Attaching the error and output files with the debug info. CUGXX_6681940_4294967294.err.txt
Thanks, I will review.
As you may have found out, this issue is happening at the graph distribution phase, and it might be due to a GPU not having a partition since the test graph is relatively small (this is easy to fix, just throw an exception if a GPU has empty buffers). Running this on 2 GPUs works at my end, with this error:
This is probably because Another related question - what is the easiest way to read and distribute matrix market files (from SuiteSparse collection) to use in C++ MG graph codes? Is there a function in cugraph utilities that can be used?
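For reference, the empty-buffer check suggested in this comment (throw if a GPU ends up with no edges) could look roughly like the sketch below. The helper name `require_non_empty_partition` and its host-side `std::vector` arguments are hypothetical and not part of cuGraph; the real example keeps its edge list in device buffers, so the check would need to be adapted to those.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a cuGraph function): fail fast with a clear message
// when this rank's share of the edge list is empty or inconsistent, instead of
// letting a later collective call crash.
template <typename vertex_t>
void require_non_empty_partition(std::vector<vertex_t> const& srcs,
                                 std::vector<vertex_t> const& dsts,
                                 int rank)
{
  if (srcs.size() != dsts.size()) {
    throw std::invalid_argument("rank " + std::to_string(rank) +
                                ": source and destination buffers differ in length");
  }
  if (srcs.empty()) {
    throw std::runtime_error("rank " + std::to_string(rank) +
                             ": no edges assigned to this GPU; use a larger graph or fewer GPUs");
  }
}
```

Calling this on every rank right after the edge list is split, and before graph creation, would turn the failure into a readable exception.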
Sorry, I had not had a chance to look through your logs last week. These tests were written to get folks started in calling our C++ code directly. I think you have identified a couple of edge conditions that aren't being handled properly in these tests. About these specific issues:
Regarding reading matrix market files... we have functions within our test suite that can be used for this:
There's no fundamental difference (you can look in the code). If you want to tweak the edge list in some way before creating the graph you probably should use the first, otherwise the second is less code to manage. These are less than optimal (we only use them for testing). The biggest issue is that each GPU reads the entire MTX file and then filters out the subset it cares about. That means that you need sufficient GPU memory on each node to contain the entire edge list. I created a function somewhere (never merged it into the code base) that would read a different block of data on each GPU to do the parsing and then shuffle the parsed data to the proper GPU. You could adapt the code I linked to have each GPU read the file in blocks and filter the edges that are relevant there. That would let you manage the memory size but still would result in duplicate computations.
Thanks, even after invoking
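A minimal host-side sketch of the "read in blocks and filter" idea described above, in case it helps. Everything here is an assumption for illustration: `read_local_edges`, `owned_by`, and the `src % num_ranks` ownership rule are hypothetical and do not reflect cuGraph's actual partitioning or its test utilities. It only handles the Matrix Market coordinate format, ignores any weight column, and assumes `num_ranks > 0`.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using edge_t = std::pair<std::int64_t, std::int64_t>;

// Hypothetical ownership rule, for illustration only.
inline bool owned_by(std::int64_t src, int rank, int num_ranks)
{
  return src % num_ranks == rank;
}

// Reads a Matrix Market coordinate stream in blocks of `block_lines` lines and
// keeps only the edges owned by `rank`, so at most one block of raw text is
// held in memory at a time. Every rank still parses the whole file.
std::vector<edge_t> read_local_edges(std::istream& in,
                                     int rank,
                                     int num_ranks,
                                     std::size_t block_lines)
{
  std::vector<edge_t> local;
  std::vector<std::string> block;
  block.reserve(block_lines);
  bool size_line_seen = false;
  std::string line;

  // Parse the buffered block, keep the owned edges, then drop the raw text.
  auto flush = [&]() {
    for (auto const& l : block) {
      std::istringstream ss(l);
      std::int64_t row = 0;
      std::int64_t col = 0;
      if (!(ss >> row >> col)) { continue; }  // skip malformed lines
      std::int64_t src = row - 1;             // MTX indices are 1-based
      std::int64_t dst = col - 1;
      if (owned_by(src, rank, num_ranks)) { local.emplace_back(src, dst); }
    }
    block.clear();
  };

  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '%') { continue; }           // banner and comments
    if (!size_line_seen) { size_line_seen = true; continue; }   // "rows cols nnz" line
    block.push_back(line);
    if (block.size() >= block_lines) { flush(); }
  }
  flush();  // remaining partial block
  return local;
}
```

Each rank would open the file with `std::ifstream` and call `read_local_edges(file, rank, num_ranks, block_lines)`, then copy the result to its GPU. As noted above, this bounds memory use but duplicates the parsing work on every rank.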
What is your question?
This is the second part of: #4596
I am trying to run the multi-GPU test (https://github.com/rapidsai/cugraph/blob/branch-24.10/cpp/examples/users/multi_gpu_application/mg_graph_algorithms.cpp) on a single node, this is my job script:
Encountering segfault (from every process):
Please advise; the platform OpenMPI is CUDA-aware: