forked from pytorch/FBGEMM
Commit
Make the scratch pad tensor UVA (pytorch#2844)
Summary:
Pull Request resolved: pytorch#2844

Before this diff, the scratch pad in SSD TBE (see D55998215 for more detail) was a CPU tensor that was later transferred to the GPU so that the TBE kernels could access it. The scratch pad transfer was highly inefficient: TBE over-provisioned the scratch pad buffer allocation (because it did not know the exact number of cache-missed rows), which caused extra data transfer. This extra data transfer could be large, since the number of cache-missed rows was normally much smaller than the value that TBE over-provisioned.

There are two ways to avoid the extra data transfer:

(1) Let TBE have the exact number of cache-missed rows on the host. This requires a device-to-host data transfer, which causes a sync point between host and device (not desirable in most training jobs). However, it allows TBE to use `cudaMemcpy`, which uses the DMA engine and lets the memory copy overlap efficiently with other compute kernels.

(2) Make the scratch pad accessible by both CPU and GPU. In other words, make the scratch pad a UVA tensor. This does not require device and host synchronization. However, the memory copy has to be done through CUDA loads/stores, which requires a kernel to run on SMs. Thus, overlapping the memory copy with compute kernels requires careful SM management.

Based on the tradeoffs explained above, we chose to implement (2) to avoid the host and device sync point.

Differential Revision: D58631974
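To make the cost of over-provisioning concrete, the following is a minimal back-of-the-envelope sketch, not FBGEMM code. The `ScratchPad` class, the helper functions, and every number in it (row counts, row size) are hypothetical and chosen only for illustration. It compares the bytes moved when the whole over-provisioned buffer is copied with the bytes moved when a kernel loads only the cache-missed rows through a UVA mapping.

```python
from dataclasses import dataclass


@dataclass
class ScratchPad:
    """Hypothetical model of a scratch pad; not FBGEMM's actual data structure."""

    provisioned_rows: int  # rows allocated without knowing the miss count
    missed_rows: int       # rows that actually missed in the cache
    row_bytes: int         # bytes per embedding row


def bytes_moved_bulk_copy(pad: ScratchPad) -> int:
    # The host does not know missed_rows, so the whole buffer is copied.
    return pad.provisioned_rows * pad.row_bytes


def bytes_moved_uva(pad: ScratchPad) -> int:
    # A kernel loads only the rows it needs through the UVA mapping.
    return pad.missed_rows * pad.row_bytes


# Illustrative numbers only (assumptions, not measurements).
pad = ScratchPad(provisioned_rows=100_000, missed_rows=2_000, row_bytes=512)
extra = bytes_moved_bulk_copy(pad) - bytes_moved_uva(pad)
print(f"bulk copy: {bytes_moved_bulk_copy(pad)} B, UVA: {bytes_moved_uva(pad)} B")
print(f"extra bytes avoided: {extra} B")
```

The model ignores that UVA loads run on SMs rather than on the DMA engine, which is the tradeoff the summary describes for option (2).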
1 parent 27ef127, commit cb0be42
Showing 1 changed file with 87 additions and 51 deletions.