
GPU: Asynchronous data transfers and kernels - Heterogeneous computation #132

cguzman95 opened this issue Nov 5, 2019 · 5 comments

I will work on this in the branch for #129. This issue is to document all ideas for asynchronous GPU execution, allowing simultaneous GPU and CPU computation.

@cguzman95 cguzman95 added this to the GPU chemistry solver milestone Nov 5, 2019
@cguzman95 cguzman95 self-assigned this Nov 5, 2019

cguzman95 commented Nov 5, 2019

Adding a model_data_id variable to the ModelData struct. This variable will identify each ModelData object created, at least when the GPU flag is ON.
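
A minimal sketch of that change (the real ModelData struct in camp_common.h has many more members; only the proposed field is shown here):

typedef struct {
  /* ... existing ModelData members ... */
  int model_data_id;  /* unique id for each ModelData object (GPU builds) */
} ModelData;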

Reason: for an asynchronous memcpy between the GPU and CPU, it's necessary to define a stream. Example:

cudaStream_t stream[nStreams];
for (int i = 0; i < nStreams; ++i) {
  checkCuda( cudaStreamCreate(&stream[i]) );
  /* asynchronous copy issued on this stream */
  cudaMemcpyAsync(d_a, a, N, cudaMemcpyHostToDevice, stream[i]);
  cudaStreamDestroy(stream[i]);
}
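
For reference, a self-contained sketch of the full pattern this is building toward (chunked host-to-device copies overlapping with kernel launches across streams). This is not CAMP code: the kernel, array names, and sizes are placeholders, and the host buffer must be pinned (cudaMallocHost) for the copies to actually be asynchronous:

#include <cuda_runtime.h>

__global__ void dummy_kernel(double *x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= 2.0;  /* placeholder work */
}

int main() {
  const int nStreams = 4, n = 1 << 20, chunk = n / nStreams;
  double *a, *d_a;
  cudaMallocHost((void **)&a, n * sizeof(double));  /* pinned host memory */
  cudaMalloc((void **)&d_a, n * sizeof(double));
  for (int i = 0; i < n; ++i) a[i] = 1.0;

  cudaStream_t stream[nStreams];
  for (int i = 0; i < nStreams; ++i) cudaStreamCreate(&stream[i]);

  for (int i = 0; i < nStreams; ++i) {
    int offset = i * chunk;
    /* copy in, compute, and copy out each chunk on its own stream,
       so transfers and kernels from different chunks can overlap */
    cudaMemcpyAsync(&d_a[offset], &a[offset], chunk * sizeof(double),
                    cudaMemcpyHostToDevice, stream[i]);
    dummy_kernel<<<(chunk + 255) / 256, 256, 0, stream[i]>>>(&d_a[offset], chunk);
    cudaMemcpyAsync(&a[offset], &d_a[offset], chunk * sizeof(double),
                    cudaMemcpyDeviceToHost, stream[i]);
  }

  /* CPU work could run here, concurrently with the GPU, before syncing */
  cudaDeviceSynchronize();

  for (int i = 0; i < nStreams; ++i) cudaStreamDestroy(stream[i]);
  cudaFree(d_a);
  cudaFreeHost(a);
  return 0;
}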

So, the stream needs to be created in new_solver and destroyed alongside the deallocation of the other structures. This means it needs to be declared at the start of the camp_gpu_solver.cu file (or declared inside ModelData, but for the moment it is global, since declaring a cudaStream_t in a .h file gives me an error).
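
In code, something like this (the function names are hypothetical stand-ins for the actual entry points in camp_gpu_solver.cu; this is the temporary global-variable version):

#include <cuda_runtime.h>

/* temporary "patch": one global stream for the whole file */
static cudaStream_t stream_gpu;

/* hypothetical hook called from new_solver */
void solver_new_gpu(void) {
  cudaStreamCreate(&stream_gpu);
}

/* hypothetical hook called when the solver structures are deallocated */
void solver_free_gpu(void) {
  cudaStreamDestroy(stream_gpu);
}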

But what happens when we have multiple ModelData objects in an execution (for example, the new unit tests, with a multi-cell and a one-cell solver)? Both will try to create streams with the same id, which means they will share the same stream.

This could seem like no problem, since in principle they won't call solve at the same time. But it's possible that in the future we (or the user) will want to divide multiple solvers among individual CPU threads (with MPI, for example). If all threads use the same GPU, the GPU execution will slow down, since all threads will try to use the same stream.

Not only that: if for some reason the user destroys a solver object, it will also destroy the global streams, so if another solver is still executing, it will crash.

In conclusion: a specific id for each ModelData object is necessary in order to assign a different stream to each one.
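
A rough sketch of that idea, with a fixed pool of streams indexed by the proposed model_data_id (the function names and the pool size are just illustrative):

#include <stddef.h>
#include <cuda_runtime.h>

#define MAX_MODEL_DATA 16  /* arbitrary cap for this sketch */
static cudaStream_t stream_gpu[MAX_MODEL_DATA];

/* each ModelData object gets its own stream, selected by its id */
void solver_new_gpu(int model_data_id) {
  cudaStreamCreate(&stream_gpu[model_data_id]);
}

void async_copy_to_gpu(void *d_dst, const void *src, size_t bytes, int model_data_id) {
  cudaMemcpyAsync(d_dst, src, bytes, cudaMemcpyHostToDevice,
                  stream_gpu[model_data_id]);
}

void solver_free_gpu(int model_data_id) {
  cudaStreamDestroy(stream_gpu[model_data_id]);
}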

mattldawson commented:

Hi @cguzman95 - I agree it's important to allow multiple instances of the CAMP core to run simultaneously. I've been talking to people at NCAR who are interested in possibly using CAMP once it's ready, and they will require multiple cores to run at the same time on different threads using OpenMP. So we have to make sure there are no global variables (I don't think there are currently) and no Fortran module variables (I think there are only constants right now). I think your original idea of including the cudaStream_t in ModelData is the best design; trying to internally manage externally generated instances of the CAMP core (and thus ModelData) using ids is going to get too complicated. What is the error you're getting when you try to include the cudaStream_t in ModelData?
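
A sketch of that alternative, with the stream stored directly in ModelData so each instance owns its own stream and no global state is needed (field and function names are illustrative, not the actual CAMP API):

#include <stddef.h>
#include <cuda_runtime.h>

typedef struct {
  /* ... existing ModelData members ... */
  cudaStream_t stream_gpu;  /* each CAMP core instance owns its own stream */
} ModelData;

void model_data_init_gpu(ModelData *md) { cudaStreamCreate(&md->stream_gpu); }
void model_data_free_gpu(ModelData *md) { cudaStreamDestroy(md->stream_gpu); }

/* async operations use the instance's own stream, so independent solvers
   (e.g. on different OpenMP threads) never share or destroy each other's streams */
void model_data_copy_async(ModelData *md, void *d_dst, const void *src, size_t bytes) {
  cudaMemcpyAsync(d_dst, src, bytes, cudaMemcpyHostToDevice, md->stream_gpu);
}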


cguzman95 commented Nov 5, 2019

Hi,

Yep, I agree with setting cudaStream_t in ModelData (I'm only using it as a global variable as a temporary "patch" to continue developing). Speaking of the error, just adding these two lines to any .h file:

#include <cuda.h>
cudaStream_t *stream_gpu;

raises the error:

/gpfs/scratch/bsc32/bsc32815/gpupartmc/partmc/src/camp_common.h:198:3: error: unknown type name ‘cudaStream_t’
  cudaStream_t *stream_gpu;

Not sure of the reason; maybe we are missing some configuration in the CMake?


mattldawson commented Nov 5, 2019

Do you also need #include <cuda_runtime.h>?

cguzman95 commented:

Yep, it's compiling fine now.
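
For reference, the header snippet from above now reads (assuming nothing else needed to change):

#include <cuda.h>
#include <cuda_runtime.h>  /* provides the cudaStream_t typedef to host-side code */
cudaStream_t *stream_gpu;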
