Documentation for GPU-aware BP4 and BP5 #3088

Merged · 4 commits · Mar 4, 2022
64 changes: 63 additions & 1 deletion docs/user_guide/source/advanced/gpu_aware.rst
@@ -2,4 +2,66 @@
GPU-aware I/O
#################

TODO
The ``Put`` and ``Get`` functions in the BP4 and BP5 engines can receive user buffers allocated on the host or the device in both Sync and Deferred modes.

.. note::
Currently only CUDA allocated buffers are supported for device data.

If ADIOS2 is built without CUDA support, only buffers allocated on the host are supported. If ADIOS2 is built with CUDA support, the library will, by default, automatically detect where the buffer memory physically resides.

Users can also state explicitly where a buffer was allocated by calling the ``SetMemorySpace`` function on the corresponding variable.

.. code-block:: c++

    enum class MemorySpace
    {
        Detect, ///< Detect the memory space automatically
        Host,   ///< Host memory space (default)
        CUDA    ///< CUDA memory spaces
    };
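
For example, automatic detection can be bypassed by declaring up front that a variable's buffers reside on the host. This is a minimal sketch; the variable name and the ``shape``/``start``/``count`` arguments are placeholders taken from the write example further below:

.. code-block:: c++

    auto var = io.DefineVariable<float>("data", shape, start, count);
    // Skip automatic detection: all buffers passed for this variable live on the host
    var.SetMemorySpace(adios2::MemorySpace::Host);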


Building with CUDA enabled
--------------------------

The default behavior of ``ADIOS2_USE_CUDA`` is to enable CUDA if CMake can find a CUDA toolkit on the system; if no CUDA toolkit is installed, CMake turns CUDA off automatically. If the system has a CUDA toolkit installed but ADIOS2 should be built without CUDA support, ``-DADIOS2_USE_CUDA=OFF`` must be passed explicitly.

When building ADIOS2 with CUDA enabled, the ``CMAKE_CUDA_ARCHITECTURES`` variable defaults to 70, matching the NVIDIA Volta V100. For any other architecture, the user is responsible for setting the correct ``CMAKE_CUDA_ARCHITECTURES``.
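
As an illustration, a configure step targeting an NVIDIA Ampere A100 (compute capability 80) might look like the following; the source path and the architecture value are placeholders to adapt to the actual system:

.. code-block:: bash

    cmake -DADIOS2_USE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=80 /path/to/ADIOS2/source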


Using CUDA buffers
------------------

The following is a simple example of writing data to storage directly from a GPU buffer allocated with CUDA.

.. code-block:: c++

    // Allocate and zero-initialize N floats on the device
    float *gpuSimData;
    cudaMalloc(&gpuSimData, N * sizeof(float));
    cudaMemset(gpuSimData, 0, N * sizeof(float));
    auto data = io.DefineVariable<float>("data", shape, start, count);

    io.SetEngine("BP5"); // or BPFile
    adios2::Engine bpWriter = io.Open(fname, adios2::Mode::Write);
    // Simulation steps
    for (size_t step = 0; step < nSteps; ++step)
    {
        bpWriter.BeginStep();
        // The device buffer is passed to Put exactly like a host buffer
        bpWriter.Put(data, gpuSimData, adios2::Mode::Deferred); // or Sync
        bpWriter.EndStep();
    }


The API is unchanged compared to using Host buffers for both the read and write logic.
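
For instance, a minimal reading sketch that mirrors the writer above and fills a CUDA-allocated buffer directly could look as follows; the variable name ``data``, the file name ``fname``, and the sizes ``N``/``nSteps`` are assumed to match the writer:

.. code-block:: c++

    float *gpuData;
    cudaMalloc(&gpuData, N * sizeof(float));

    io.SetEngine("BP5"); // or BPFile
    adios2::Engine bpReader = io.Open(fname, adios2::Mode::Read);
    for (size_t step = 0; step < nSteps; ++step)
    {
        bpReader.BeginStep();
        auto data = io.InquireVariable<float>("data");
        // Get fills the device buffer directly, in Deferred or Sync mode
        bpReader.Get(data, gpuData, adios2::Mode::Deferred); // or Sync
        bpReader.EndStep();
    }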

If the ``SetMemorySpace`` function is used, the ADIOS2 library will not automatically detect where the buffer was allocated and will instead use the information provided by the user for all subsequent ``Put`` or ``Get`` calls. Example:

.. code-block:: c++

    data.SetMemorySpace(adios2::MemorySpace::CUDA);
    for (size_t step = 0; step < nSteps; ++step)
    {
        bpWriter.BeginStep();
        bpWriter.Put(data, gpuSimData, adios2::Mode::Deferred); // or Sync
        bpWriter.EndStep();
    }
2 changes: 1 addition & 1 deletion docs/user_guide/source/components/engine.rst
@@ -306,7 +306,7 @@ The ``data`` fed to the ``Put`` function is assumed to be allocated on the Host
.. note::

Only CUDA allocated buffers are supported for device data.
Only the BP4 engine is capable of receiving device allocated buffers.
Only the BP4 and BP5 engines are capable of receiving device allocated buffers.


PerformPuts
4 changes: 2 additions & 2 deletions source/adios2/common/ADIOSTypes.h
@@ -32,12 +32,12 @@
namespace adios2
{

/** Memory space for the buffers received with Put */
/** Memory space for the user provided buffers */
enum class MemorySpace
{
Detect, ///< Detect the memory space automatically
Host, ///< Host memory space (default)
CUDA ///< GPU memory spaces
CUDA ///< CUDA memory spaces
};

/** Variable shape type identifier, assigned automatically from the signature of