Documentation for new NCCL Net plugin API change #1472

Open: wants to merge 1 commit into base: develop
11 changes: 11 additions & 0 deletions docs/how-to/using-nccl.rst
@@ -204,6 +204,17 @@ Initialization
supports ``dmabuf``, it should set ``ptrSupport`` to ``NCCL_PTR_HOST|NCCL_PTR_CUDA|NCCL_PTR_DMABUF`` and
provide a ``regMrDmaBuf`` function.

* The ``regIsGlobal`` field allows NCCL to register buffers in advance, for example, using a loopback connection.
NCCL can then expect a later registration of a buffer contained within a previously registered buffer
to complete nearly immediately, because the buffer is already known to the network adapter. A typical
implementation maintains a registration cache: the call to ``ncclCommRegister`` creates the
initial entry in the cache using ``regMr()`` on a loopback connection. Any later NCCL operation
can call ``regMr()`` again on the real connection with the real buffer (which could, for example, start at a
different offset within the original buffer or have a smaller size) and then
call ``deregMr()`` immediately afterwards.
The ``ncclCommDeregister`` call should issue the final call to ``deregMr()`` and effectively remove the mapping
on the network adapter.

* The ``speed`` field indicates the speed of the network port in Mbps (10^6 bits per second).
An accurate value is important for NCCL to optimize flows within the node.
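
The registration cache described for ``regIsGlobal`` can be illustrated with a minimal, reference-counted sketch in C. This is not the NCCL plugin API: every name below (``sketch_entry``, ``sketch_reg_mr``, ``sketch_dereg_mr``, ``sketch_demo``) is hypothetical, the connection argument is omitted, and the adapter mapping is simulated with counters. It only shows the assumed behavior: the first registration creates the mapping, a later registration of a sub-range reuses it, and the mapping is removed when the last reference is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration cache; none of these names come from NCCL. */
#define SKETCH_MAX_ENTRIES 16

typedef struct {
  uintptr_t base; /* start of the mapped region */
  size_t size;    /* length of the mapped region in bytes */
  int refs;       /* outstanding registrations; 0 means the slot is free */
} sketch_entry;

static sketch_entry sketch_cache[SKETCH_MAX_ENTRIES];
static int sketch_pins;   /* simulated (expensive) adapter mappings */
static int sketch_unpins; /* simulated adapter unmappings */

/* Stand-in for a plugin's regMr(): reuse a cached mapping when the requested
 * range lies inside one, otherwise create a new mapping. Returns 0 on success. */
static int sketch_reg_mr(void *data, size_t size, sketch_entry **handle) {
  uintptr_t start = (uintptr_t)data;
  sketch_entry *free_slot = NULL;
  for (int i = 0; i < SKETCH_MAX_ENTRIES; i++) {
    sketch_entry *e = &sketch_cache[i];
    if (e->refs == 0) {
      if (free_slot == NULL) free_slot = e;
      continue;
    }
    /* Containment test written to avoid overflow in start + size. */
    if (start >= e->base && size <= e->size &&
        start - e->base <= e->size - size) {
      e->refs++; /* fast path: the adapter already knows this memory */
      *handle = e;
      return 0;
    }
  }
  if (free_slot == NULL) return -1; /* cache full */
  free_slot->base = start;
  free_slot->size = size;
  free_slot->refs = 1;
  sketch_pins++; /* slow path: a real plugin would map the memory here */
  *handle = free_slot;
  return 0;
}

/* Stand-in for a plugin's deregMr(): drop one reference and remove the
 * simulated mapping when the last reference goes away. */
static void sketch_dereg_mr(sketch_entry *handle) {
  if (--handle->refs == 0) sketch_unpins++;
}

/* Walks through the sequence described in the text. Returns 0 on success. */
static int sketch_demo(void) {
  static char buffer[4096];
  sketch_entry *global = NULL, *local = NULL;

  /* ncclCommRegister-like step: initial registration of the whole buffer. */
  if (sketch_reg_mr(buffer, sizeof buffer, &global) != 0) return -1;

  /* Later operation: smaller range at an offset, served from the cache. */
  if (sketch_reg_mr(buffer + 1024, 512, &local) != 0) return -1;
  assert(local == global && sketch_pins == 1);

  sketch_dereg_mr(local); /* immediate deregistration keeps the mapping */
  assert(sketch_unpins == 0);

  sketch_dereg_mr(global); /* ncclCommDeregister-like step: final release */
  return (sketch_pins == 1 && sketch_unpins == 1) ? 0 : -1;
}
```

Calling ``sketch_demo()`` runs the whole sequence once. A real plugin would likely key the cache per device so that the loopback and real connections share entries; that detail is left out here.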
