-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward-merge branch-24.12 into branch-25.02 #4782
Conversation
Enables telemetry during cugraph's build process. This parses github job metadata to obtain timing information. It should have very little impact on overall build time, and should not interfere with any build tools. This implements emitting OpenTelemetry traces and spans, as described in rapidsai/build-infra#139 Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - Bradley Dice (https://github.com/bdice) URL: #4740
FAILURE - Unable to forward-merge due to an error, manual merge is necessary. Do not use the IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the |
Increase max_iterations in MG HITS tests (with edge masking). With masking, we may end up with different graphs with different numbers of GPUs; this results in higher iteration counts for convergence for certain GPU counts. Increase the maximum iteration count to consider this. Authors: - Seunghwa Kang (https://github.com/seunghwak) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) URL: #4783
This PR includes multiple updates to cut peak memory usage in graph creation and improve performance of BFS on scale-free graphs. * Add a bitmap for non-zero local degree vertices in the hypersparse region; this information can be used to quickly filter out locally zero degree vertices which don't need to be processed in multiple instances. * Store (global-)degree offsets for vertices in the hypersparse region; this information can used to quickly identify the vertices with a certain global degree (e.g. for global degree 1 vertices, we can skip inter-GPU reduction as we know each vertex has only one neighbor). * Skip kernel invocations in computing edge counts if the vertex list is empty. * Add asynchronous functions to compute edge counts. This helps in preventing unnecessary serialization when we can process multiple such functions concurrently. * Replace rmm::exec_policy with rmm::exec_policy_nosync in multiple places; the former enforces stream synchronization at the end. The latter does not. * Enforce cache line alignment in NCCL communication in multiple places (NCCL communication performance is significantly affected by cache line alignment, often leading to 30-40% or more differences). * For primitives working on a subset of vertices, broadcast a vertex list using a bitmap if the vertex frontier size is large. If the vertex frontier size is small (in case vertex_t is 8B and the local vertex partition range can fit into 4B), use vertex offsets instead of vertices to cut communication volume. * Merge multiple host scalar communication function calls to a single one. * Increase multi-stream concurrency in detail::extract_transform_e & detail::per_v_transform_reduce_e * Multiple optimizations in template specialization (for update_major == true && reduce_op == any && key type is vertex && working on a subset of vertices) in detail::per_v_transform_reduce_e (this includes pre-processing vertices with non-zero local degrees; so we don't need to process such vertices using multiple GPUs, pre-filtering of zero local degree vertices, allreduce communication to reduce shuffle communication volumes, and special treatment of global degree 1 vertices, and so on). * Multiple optimizations & specializations in detail::fill_edge_minor_property that works on a subset of vertices (this includes kernel fusion, specialization for bitmap properties including direct broadcast to the property buffer and special treatments for vertex partition boundaries, and so on). * Added multiple optimizations & specializations in transform_reduce_v_frontier_outgoing_e (especially for reduce_op::any and to cut communication volumes and to filter out (key, value) pairs that won't contribute to the final results). * Multiple low-level optimizations in direction optimizing BFS (including approximations in determining between bottom -up and top-down). * Multiple optimizations to cut peak memory usage in graph creation. Authors: - Seunghwa Kang (https://github.com/seunghwak) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) URL: #4751
cugraph is no longer dependent on cugraph_ops and no longer pulls files from the cugrpahops repo. But we have two `assert` statements still assuming cugrpahops files are available. These assert statements are compiled only in the debug mode. This PR fixes build errors due to these assert statements in the debug build. Closes #4763 Authors: - Seunghwa Kang (https://github.com/seunghwak) - Chuck Hastings (https://github.com/ChuckHastings) Approvers: - Chuck Hastings (https://github.com/ChuckHastings) - Joseph Nke (https://github.com/jnke2016) URL: #4774
…e`, minor cleanup (#4776) * updates READMEs to remove outdated nx-cugraph text * updates `core_number` docs, APIs, tests to properly ignore `degree_type` due to `core_number` not supporting directed graphs which `degree_type` is intended for - `degree_type` settings will be honored when directed graphs are supported. * renames test helper function for clarity * fixes issue with datasets API to properly recreate the edgelist for MG (dask) if previously created for SG. Authors: - Rick Ratzel (https://github.com/rlratzel) Approvers: - Don Acosta (https://github.com/acostadon) - Alex Barghi (https://github.com/alexbarghi-nv) - Brad Rees (https://github.com/BradReesWork) URL: #4776
This updates docs. Code changes are here: rapidsai/nx-cugraph#27 See: #4760 Authors: - Erik Welch (https://github.com/eriknw) Approvers: - Rick Ratzel (https://github.com/rlratzel) URL: #4766
I noticed the `Traversals` table is not showing up in the [nx-cugraph docs](https://docs.rapids.ai/api/cugraph/stable/nx_cugraph/supported-algorithms/). `sphinx-lint` catches the underlying issue, and it also caught a couple other minor issues that this PR fixes. `sphinx-lint` is used by other notable repos such as `pandas` ([here](https://github.com/pandas-dev/pandas/blob/7fe270c8e7656c0c187260677b3b16a17a1281dc/.pre-commit-config.yaml#L92-L96)] and CPython ([here](https://github.com/python/cpython/blob/8fe1926164932f868e6e907ad72a74c2f2372b07/.pre-commit-config.yaml#L68-L73)) Authors: - Erik Welch (https://github.com/eriknw) Approvers: - Jake Awe (https://github.com/AyodeAwe) - Ralph Liu (https://github.com/nv-rliu) - Rick Ratzel (https://github.com/rlratzel) URL: #4771
Forward-merge triggered by push to branch-24.12 that creates a PR to keep branch-25.02 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.