This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

Forward-merge branch-23.12 to branch-24.02 #102

Merged
merged 6 commits into from
Nov 27, 2023

Conversation

bdice
Contributor

@bdice bdice commented Nov 27, 2023

Manual forward merge from 23.12 to 24.02. This PR should not be squashed.

Closes #99.

chuangz0 and others added 6 commits November 21, 2023 17:53
Replace `optparse` with `argparse` in `pylibwholegraph`
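
For illustration only, a migration of this kind typically looks like the sketch below. This is not the code from the commit: the `--epochs` option and the helper names `build_parser` and `parse_options` are hypothetical and were chosen just to show the pattern.

```python
import argparse


def build_parser():
    # argparse counterpart of optparse.OptionParser().
    # The option below is hypothetical; it only illustrates the pattern.
    parser = argparse.ArgumentParser(description="example options")
    # optparse form: parser.add_option("--epochs", type="int", dest="epochs", default=1)
    parser.add_argument("--epochs", type=int, dest="epochs", default=1,
                        help="number of training epochs")
    return parser


def parse_options(argv=None):
    # optparse's parse_args() returns an (options, args) tuple;
    # argparse's parse_args() returns only the namespace object.
    return build_parser().parse_args(argv)
```

Call sites that unpacked `(options, args)` from `optparse` need to be updated, since `argparse` returns a single namespace.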

Authors:
  - Chuang Zhu (https://github.com/chuangz0)

Approvers:
  - https://github.com/dongxuy04
  - Brad Rees (https://github.com/BradReesWork)

URL: rapidsai#61
The package currently has runtime dependencies on librmm and libraft-headers. However, these packages are header-only, so it's only sensible to require them at build time. This PR removes librmm and libraft-headers from the runtime dependencies of libwholegraph.

This PR is needed to fix the conda environments in [unified devcontainers](https://github.com/rapidsai/devcontainers).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#96
Replace the random number generator (RNG) implemented by wholegraph with the RNG provided by RAFT.
This change was proposed in issues rapidsai#7 and rapidsai#23.

Authors:
  - https://github.com/linhu-nv
  - Brad Rees (https://github.com/BradReesWork)

Approvers:
  - https://github.com/dongxuy04
  - Chuang Zhu (https://github.com/chuangz0)

URL: rapidsai#79
Support `gather` and `scatter` operations with NVSHMEM when using the `distributed` embedding.

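To use `nvshmem`, build with `--enable-nvshmem` and pass the backend type when creating the communicators:
```
# Build command: ./build.sh --enable-nvshmem ....

import pylibwholegraph.torch as wgth

# `options` holds the parsed command-line arguments.
global_comm, local_comm = wgth.init_torch_env_and_create_wm_comm(
    wgth.get_rank(),
    wgth.get_world_size(),
    wgth.get_local_rank(),
    wgth.get_local_size(),
    options.distributed_backend_type,
)
```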
If `distributed_backend_type` is `"nvshmem"`, the `gather` and `scatter` operations on distributed wholememory created with `global_comm` are implemented with `nvshmem`.
Distributed embedding with `cache` is not supported when using `nvshmem`.
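
A minimal sketch of how `options.distributed_backend_type` might be supplied on the command line, and how the cache restriction could be checked before any communicator is created. The flag names `--distributed-backend-type` and `--use-cache`, the `"nccl"` default, and the helper names are assumptions made for illustration; only the `"nvshmem"` value and the cache limitation come from the description above.

```python
import argparse

# Assumed backend names; only "nvshmem" is confirmed by the description above.
BACKENDS = ("nccl", "nvshmem")


def check_backend_options(backend, use_cache):
    # Reject the combination the description says is unsupported.
    if backend == "nvshmem" and use_cache:
        raise ValueError(
            "distributed embedding with cache is not supported when using nvshmem"
        )


def parse_backend_options(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--distributed-backend-type",
                        dest="distributed_backend_type",
                        choices=BACKENDS, default="nccl")
    # Hypothetical flag standing in for however the cache is enabled.
    parser.add_argument("--use-cache", action="store_true")
    options = parser.parse_args(argv)
    check_backend_options(options.distributed_backend_type, options.use_cache)
    return options
```

Failing early in argument handling keeps the unsupported combination from reaching the point where the embedding is created.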

Authors:
   - Chuang Zhu (https://github.com/chuangz0)

Approvers:
   - Seunghwa Kang (https://github.com/seunghwak)
   - Brad Rees (https://github.com/BradReesWork)
@bdice bdice requested a review from a team as a code owner November 27, 2023 20:17
@BradReesWork BradReesWork added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Nov 27, 2023
@bdice bdice force-pushed the branch-24.02-merge-23.12 branch 2 times, most recently from 94fd363 to 9853c62 Compare November 27, 2023 20:23
@BradReesWork
Member

/ok to test

@bdice
Contributor Author

bdice commented Nov 27, 2023

@BradReesWork Just FYI, we do not want to /merge this PR because that will squash merge it. Typically we admin-merge the forward-merge PRs without running CI at all. But @raydouglass said that his merge power is being blocked by a GitHub outage https://www.githubstatus.com/incidents/66vhjmd266r9.

@raydouglass raydouglass merged commit 0586438 into rapidsai:branch-24.02 Nov 27, 2023
44 checks passed