PrefixSum Re-enable tests. #420
Open
AdityaAtulTewari wants to merge 671 commits into IntelligentSoftwareSystems:master from utcs-scea:AdityaAtulTewari/prefixsum-reenable-test
Conversation
Initialization of weights on the CPU did not initialize weights on the GPU: this commit adds a call to copy the newly initialized CPU weights over to the GPU. It also moves initialization of GPU memory before the CPU weights are initialized to prevent a nullptr copy, and adds a debug function to print vectors on the GPU.
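A minimal sketch of what this commit describes, assuming the repo's `GNNFloat` float typedef and illustrative function names; the actual member names and call sites may differ:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

using GNNFloat = float;  // assumption: matches the repo's float typedef

// Copy freshly initialized CPU weights to an already-allocated GPU buffer.
// The device buffer must be allocated before this call, otherwise the copy
// would target a nullptr (the ordering issue this commit fixes).
void CopyWeightsToGPU(const std::vector<GNNFloat>& cpu_weights,
                      GNNFloat* layer_weights_gpu) {
  cudaMemcpy(layer_weights_gpu, cpu_weights.data(),
             cpu_weights.size() * sizeof(GNNFloat), cudaMemcpyHostToDevice);
}

// Debug helper: pull a device vector back to the host and print it.
void PrintDeviceVector(const GNNFloat* device_ptr, size_t num_elements) {
  std::vector<GNNFloat> host(num_elements);
  cudaMemcpy(host.data(), device_ptr, num_elements * sizeof(GNNFloat),
             cudaMemcpyDeviceToHost);
  for (size_t i = 0; i < num_elements; i++) {
    std::printf("%f ", host[i]);
  }
  std::printf("\n");
}
```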
Re-enable assertions for the GPU GCN layer forward pass, which now works. The next step is to get the backward pass test working, which involves adding a function for user code to copy data over to CUDA without needing to include the CUDA header, plus adding the appropriate GPU functions in the backend.
Adds a function that allocates GPU memory and copies a passed-in vector to the GPU. At the moment there is no way to free this memory, so it will leak. This function is added mostly for unit-test purposes and should not be used otherwise.
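A sketch of such a test-only helper; the name is hypothetical, but the mechanism is a plain `cudaMalloc` plus `cudaMemcpy` with no matching free:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Test-only helper (illustrative name): allocate device memory and copy a
// host vector into it. The returned pointer is never freed, so this leaks
// by design and should only be used from unit tests.
template <typename T>
T* CopyVectorToGPU(const std::vector<T>& host_vector) {
  T* device_ptr = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&device_ptr),
             host_vector.size() * sizeof(T));
  cudaMemcpy(device_ptr, host_vector.data(),
             host_vector.size() * sizeof(T), cudaMemcpyHostToDevice);
  return device_ptr;  // intentionally leaked
}
```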
Add ifdefs to separate CPU/GPU code in the backward step and change the structures being used to PointerWithSize (GPU code cannot operate on CPU std::vectors). Also added a few TODOs for better code organization later.
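A minimal sketch of the idea, assuming a PointerWithSize-style pointer/length view and a `GALOIS_ENABLE_GPU` build flag; the repo's actual class interface and flag name may differ:

```cuda
#include <cstddef>
#include <vector>

// Minimal pointer-plus-length view that can wrap either a host vector or a
// raw device allocation (sketch; the repo's PointerWithSize may differ).
template <typename T>
class PointerWithSize {
public:
  PointerWithSize(T* ptr, size_t size) : ptr_(ptr), size_(size) {}
  explicit PointerWithSize(std::vector<T>& v) : ptr_(v.data()), size_(v.size()) {}
  T* data() { return ptr_; }
  size_t size() const { return size_; }

private:
  T* ptr_;
  size_t size_;
};

// Backward step dispatch with the CPU and GPU paths split by ifdefs.
void BackwardStep(PointerWithSize<float> input_gradients,
                  PointerWithSize<float> output_gradients) {
#ifdef GALOIS_ENABLE_GPU
  // launch GPU kernels on the device pointers held by the views
#else
  // run the CPU implementation on the host pointers
#endif
  (void)input_gradients;
  (void)output_gradients;
}
```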
Adds functions for calculating the weight and layer gradients of the GCN layer. Untested: the tests will be added in a commit down the line.
Add functions to copy the backward output and weight gradients of a layer from GPU to CPU. Also moved some function definitions to the header since the definitions were quite small.
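A sketch of the copy-back helpers the tests rely on; the names are illustrative, but the mechanism is a device-to-host `cudaMemcpy`:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Copy a layer's backward output from device to host for test verification.
std::vector<float> CopyBackwardOutputFromGPU(const float* d_backward_output,
                                             size_t num_elements) {
  std::vector<float> host(num_elements);
  cudaMemcpy(host.data(), d_backward_output, num_elements * sizeof(float),
             cudaMemcpyDeviceToHost);
  return host;
}

// Same pattern for the weight gradients.
std::vector<float> CopyWeightGradientsFromGPU(const float* d_weight_gradients,
                                              size_t num_elements) {
  std::vector<float> host(num_elements);
  cudaMemcpy(host.data(), d_weight_gradients, num_elements * sizeof(float),
             cudaMemcpyDeviceToHost);
  return host;
}
```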
Since the aggregation on the GPU does not overwrite but accumulates, the entire output matrix needs to be zeroed out before anything is done on it; otherwise it will contain garbage values.
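In other words, a `cudaMemset` is needed before the aggregation kernel launch; a sketch with illustrative names:

```cuda
#include <cuda_runtime.h>

// Because the aggregation kernel accumulates (out[i] += ...) rather than
// overwriting, the output matrix must be zeroed first or stale values get
// folded into the result.
void AggregateAll(float* d_output, size_t num_nodes, size_t feature_length) {
  cudaMemset(d_output, 0, num_nodes * feature_length * sizeof(float));
  // ... then launch the aggregation kernel, which does out[dst] += contribution
}
```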
The forward and backward pass of a GCN layer without dropout/activation works fine now. All that is left for fully functioning GPU code is the output layer (softmax). Dropout and activation are nice to have but are not critical to "function" (though obviously they will be added).
Add ifdefs to calls in softmax layer in preparation for GPU calls.
The masks were stored as Labels for some reason, which is not needed since the masks are essentially bitsets: they have been changed to chars to save space.
Return dead objects and remove unused arguments in the Softmax layer files to allow the GPU build to compile.
Adds code to copy the masks for the train, val, and test sets over to the GPU. Removes the norm factor variable and adds the free calls for the masks to the destructor as well.
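A sketch of holding the char masks on the device, copied in once and freed in the destructor; the class and member names here are assumptions:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Device-side train/val/test masks: one byte per node, freed on destruction.
class GNNGraphGPUAllocations {
public:
  void SetMasks(const std::vector<char>& train, const std::vector<char>& val,
                const std::vector<char>& test) {
    d_train_mask_ = CopyMask(train);
    d_val_mask_   = CopyMask(val);
    d_test_mask_  = CopyMask(test);
  }

  ~GNNGraphGPUAllocations() {
    cudaFree(d_train_mask_);
    cudaFree(d_val_mask_);
    cudaFree(d_test_mask_);
  }

private:
  static char* CopyMask(const std::vector<char>& mask) {
    char* d_ptr = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_ptr), mask.size());
    cudaMemcpy(d_ptr, mask.data(), mask.size(), cudaMemcpyHostToDevice);
    return d_ptr;
  }

  char* d_train_mask_ = nullptr;
  char* d_val_mask_   = nullptr;
  char* d_test_mask_  = nullptr;
};
```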
Adds the file for the GPU object for the Softmax layer and adds the call to the Forward phase of the GPU code. The call itself is not yet defined.
Adds a softmax function on GPUs that can be called from GPU kernels.
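A sketch of a device-callable softmax over one row, using the usual max-subtraction for numerical stability; the actual signature in GNNMath.cu may differ:

```cuda
// Softmax over a single row, callable from other GPU kernels.
__device__ void GPUSoftmax(int length, const float* input, float* output) {
  // subtract the max so expf does not overflow
  float max_val = input[0];
  for (int i = 1; i < length; i++) {
    max_val = fmaxf(max_val, input[i]);
  }
  float denominator = 0.0f;
  for (int i = 0; i < length; i++) {
    output[i] = expf(input[i] - max_val);
    denominator += output[i];
  }
  for (int i = 0; i < length; i++) {
    output[i] /= denominator;
  }
}
```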
Added a few things from the old codebase's CUDA utils to the new one in preparation for using them to compute the softmax layer. Also noted the original source of the old code: Caffe.
This commit adds the softmax/cross entropy function to GNNMath.cu and uses it to define the GPU Softmax forward phase function. An additional argument was added to the forward phase GPU call to deal with the different phases: the phase argument determines which mask to use in the softmax. There are a few things left to do that will be done later, namely zeroing out the output matrix. Note that I have NOT defined cross entropy for the forward phase: it is only used to calculate loss, and I'm not using loss nor referring to it anywhere in my code or analysis at the moment.
Fixed some bugs exposed by the unit test for softmax forward, namely that the feature length size was incorrect and that the vector was not being zeroed out before the softmax occurred. The unit test in question has been ported over from the CPU softmax unit test as well. The next step is to finish up the backward pass for the softmax layer and reactivate the unit test calls to the backward phase. I also need to consider actually checking backward phase output to make sure it is sane.
Moved the code that selects the right mask pointer based on the current layer phase into a function, as it will be used in the backward phase as well.
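A sketch of that selection helper, with assumed phase-enum and member names:

```cuda
enum class GNNPhase { kTrain, kValidate, kTest };

// Pick the device mask pointer that corresponds to the current phase; shared
// by the forward and backward softmax passes.
const char* ChooseMask(GNNPhase phase, const char* d_train_mask,
                       const char* d_val_mask, const char* d_test_mask) {
  switch (phase) {
  case GNNPhase::kTrain:
    return d_train_mask;
  case GNNPhase::kValidate:
    return d_val_mask;
  case GNNPhase::kTest:
    return d_test_mask;
  }
  return nullptr;
}
```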
Ground truth is represented with GNNLabel, but I was using a GNNFloat. This caused the labels being read to be garbage when used on the GPU. This commit changes them to the correct type. It also includes the signature declaration of the backward phase: the implementation will be included in the next commit. (Split the commits up for modularity's sake.)
Adds the backward phase for the softmax layer on the GPU. The implementation is taken from the non-refactored old code: it copies a prediction to shared memory (presumably to improve locality) then computes the cross-entropy-to-softmax derivatives. It remains to be seen whether the shared memory copy is actually more efficient; some testing will be done down the line. Also adds prints to both the CPU and GPU softmax tests in order to verify that both are doing the same compute (which they are as of this commit).
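A rough sketch of the described kernel, assuming one thread per node, dynamic shared memory for the staged prediction, and the combined softmax/cross-entropy gradient p_i - 1{i == label}; the real kernel's tiling and names may differ:

```cuda
// Softmax backward: stage the node's forward output in shared memory, then
// write p_i - 1{i == label} for nodes selected by the phase mask.
__global__ void SoftmaxBackwardKernel(int num_nodes, int feature_length,
                                      const float* forward_output,
                                      const unsigned* ground_truth,
                                      const char* mask,
                                      float* backward_output) {
  extern __shared__ float local_prediction[];  // blockDim.x * feature_length floats
  int node = blockIdx.x * blockDim.x + threadIdx.x;
  if (node >= num_nodes || !mask[node]) {
    return;
  }
  float* my_row = &local_prediction[threadIdx.x * feature_length];
  for (int i = 0; i < feature_length; i++) {
    my_row[i] = forward_output[node * feature_length + i];
  }
  int label = static_cast<int>(ground_truth[node]);
  for (int i = 0; i < feature_length; i++) {
    float grad = my_row[i] - (i == label ? 1.0f : 0.0f);
    backward_output[node * feature_length + i] = grad;
  }
}
```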
This commit adds the declarations for the global accuracy getter for GPU GNNs as well as the orchestration of the call to the GPU version. The rest of the implementation will come in a later commit: for now this isn't priority as I can still compute accuracy on the CPU. Adds a new GNNGPU object to hold all GPU related things for the GNN class.
Adds a GPU Adam optimizer class that holds the allocations for the moments used in the Adam optimizer on the GPU. Adds a GPU version of the Adam test as well to make sure the build is sane in its current state. The CPU optimizer class is also now split into CPU/GPU paths depending on which build is being used. The next step is to implement the Adam optimizer on the GPU proper.
The gradient descent call in the optimizers now uses PointerWithSize rather than std::vectors. This is for compatibility with GPU pointers. Calls to the function have been changed throughout the code accordingly.
Implements Adam optimization on the GPU and makes sure it's sane via the GPU unit test. Also fixes an inconsistency in the CPU Adam optimizer where a sqrt wasn't being applied to epsilon like it is in the original non-refactored code.
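A sketch of a per-weight Adam update kernel with standard hyperparameter names; the sqrt on epsilon reflects the consistency fix mentioned above, and the exact parameterization in the repo may differ:

```cuda
// Adam update for one weight per thread; beta1_power/beta2_power are
// beta1^t / beta2^t, maintained on the host across steps.
__global__ void AdamUpdateKernel(int n, float* weights, const float* gradients,
                                 float* first_moment, float* second_moment,
                                 float alpha, float beta1, float beta2,
                                 float epsilon, float beta1_power,
                                 float beta2_power) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  float g = gradients[i];
  float m = beta1 * first_moment[i] + (1.0f - beta1) * g;
  float v = beta2 * second_moment[i] + (1.0f - beta2) * g * g;
  first_moment[i]  = m;
  second_moment[i] = v;
  float m_hat = m / (1.0f - beta1_power);
  float v_hat = v / (1.0f - beta2_power);
  // sqrt applied to epsilon as well, matching the original code's convention
  weights[i] -= alpha * m_hat / (sqrtf(v_hat) + sqrtf(epsilon));
}
```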
Adds a GPU version of the epoch test and fixes the pointers returned from a GNN layer (it was always returning CPU pointers even in the GPU build). Adds error checking to the cuSparse call too. gpu-epoch-test runs a GNN end to end (still missing some features that the CPU has), but it has to copy predictions over from the GPU (slow; this should be done on the GPU end), and there seem to be accuracy issues on reddit. These will be resolved in a later commit.
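The error checking amounts to a wrapper of the kind sketched below; the macro name is illustrative:

```cuda
#include <cusparse.h>
#include <cstdio>
#include <cstdlib>

// Abort with a location message if a cuSPARSE call does not succeed.
#define CUSPARSE_CHECK(call)                                                  \
  do {                                                                        \
    cusparseStatus_t status = (call);                                         \
    if (status != CUSPARSE_STATUS_SUCCESS) {                                  \
      std::fprintf(stderr, "cuSPARSE error %d at %s:%d\n", (int)status,       \
                   __FILE__, __LINE__);                                       \
      std::abort();                                                           \
    }                                                                         \
  } while (0)
```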
Norm factors are required during aggregation in order for the current computation on GPU to match CPU computation (earlier I was under the impression that norm factors were integrated into the data that was already copied, but this is incorrect). This commit adds the norm factor copy from CPU to GPU.
Aggregation on the GPU for GCN now uses norm factors to normalize the aggregations of neighbors. This change allows it to exactly match the computation done on a CPU if dropout is turned off. The next step is to add dropout support to the GPU.
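A sketch of norm-factor-aware aggregation, assuming a CSR layout and scaling each neighbor contribution by norm[src] * norm[dst]; array names and the exact normalization used in the repo are assumptions:

```cuda
// Each destination node accumulates its neighbors' features scaled by the
// product of the two nodes' norm factors. The output must be zeroed
// beforehand because this kernel only accumulates.
__global__ void AggregateWithNorms(int num_nodes, int feature_length,
                                   const int* row_start, const int* column_idx,
                                   const float* norm_factors,
                                   const float* node_features, float* output) {
  int dst = blockIdx.x * blockDim.x + threadIdx.x;
  if (dst >= num_nodes) {
    return;
  }
  for (int e = row_start[dst]; e < row_start[dst + 1]; e++) {
    int src = column_idx[e];
    float scale = norm_factors[src] * norm_factors[dst];
    for (int f = 0; f < feature_length; f++) {
      output[dst * feature_length + f] +=
          scale * node_features[src * feature_length + f];
    }
  }
}
```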
Efficient dropout support requires RNG on the GPU: this commit adds a function to init the CuRAND RNG so that the GPU can generate the random numbers required to choose things to drop for dropout.
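Initializing the cuRAND host-API generator looks roughly like the sketch below; the seed handling is illustrative:

```cuda
#include <curand.h>

// Create and seed the cuRAND generator used to fill dropout masks.
curandGenerator_t InitDropoutRNG(unsigned long long seed) {
  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, seed);
  return gen;
}
```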
Initializes a dropout mask for every GPU layer. This can be optimized (i.e., not allocated) when dropout is disabled, for both CPU and GPU; that will be handled later once a base implementation of everything is settled. The mask is a float because the float will be checked during dropout to see if it crosses some threshold for dropping.
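A sketch of how such a float mask would be used, assuming inverted dropout (survivors rescaled by 1/(1 - rate)); names and the exact thresholding in the repo are assumptions:

```cuda
#include <curand.h>

// Fill the per-layer mask with uniform randoms in (0, 1].
void FillDropoutMask(curandGenerator_t gen, float* d_dropout_mask, size_t n) {
  curandGenerateUniform(gen, d_dropout_mask, n);
}

// Keep an element only if its random value clears the dropout rate.
__global__ void DoDropoutKernel(int n, float dropout_rate,
                                const float* dropout_mask, const float* input,
                                float* output) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  bool keep = dropout_mask[i] > dropout_rate;
  output[i] = keep ? input[i] / (1.0f - dropout_rate) : 0.0f;
}
```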
chore: Remove unused submodules
* add info for compaction policy
* align atomics to cache line
* fix: WMD graph vertex schema and add phmap to part of importer
* chore: Remove instrumentation legacy code
* changes to switch to LC_LS_CSR graph
* fixing debug err
* fixing pre-commit issues
* changing api calls for wf4
* Added data.001.csv using lfs
* fixing getEdgeData api
* fix for getEdgeData()
* ci fix
* data.001.csv
* changing test dataset
* quickfix
* fixing precommit
* fixing graph deallocate()
* fixing test
* Update workflows to be realistic
* CPU set
* Try this again
* Try this again
* Slight refactor
Co-authored-by: AdityaAtulTewari <adityaatewari@gmail.com>
* dynamic edges support
* adding correct test
* fixing precommit
* moving static file to lfs
AdityaAtulTewari force-pushed the AdityaAtulTewari/prefixsum-reenable-test branch from 553e98d to 79a5e03 on April 15, 2024 19:11
No description provided.