PrefixSum Re-enable tests. #420
Open
AdityaAtulTewari wants to merge 671 commits into IntelligentSoftwareSystems:master from utcs-scea:AdityaAtulTewari/prefixsum-reenable-test
Conversation
Initialization of weights on the CPU did not initialize weights on the GPU: this commit adds a call to copy the newly initialized CPU weights over to the GPU. It also moves initialization of GPU memory before the CPU weights are initialized to prevent a nullptr copy, and adds a debug function to print vectors on the GPU.
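A minimal sketch of what this commit describes, assuming the repo's `GNNFloat` float typedef and illustrative function names; the actual member names and call sites may differ:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

using GNNFloat = float;  // assumption: matches the repo's float typedef

// Copy freshly initialized CPU weights to an already-allocated GPU buffer.
// The device buffer must be allocated before this call, otherwise the copy
// would target a nullptr (the ordering issue this commit fixes).
void CopyWeightsToGPU(const std::vector<GNNFloat>& cpu_weights,
                      GNNFloat* layer_weights_gpu) {
  cudaMemcpy(layer_weights_gpu, cpu_weights.data(),
             cpu_weights.size() * sizeof(GNNFloat), cudaMemcpyHostToDevice);
}

// Debug helper: pull a device vector back to the host and print it.
void PrintDeviceVector(const GNNFloat* device_ptr, size_t num_elements) {
  std::vector<GNNFloat> host(num_elements);
  cudaMemcpy(host.data(), device_ptr, num_elements * sizeof(GNNFloat),
             cudaMemcpyDeviceToHost);
  for (size_t i = 0; i < num_elements; i++) {
    std::printf("%f ", host[i]);
  }
  std::printf("\n");
}
```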
Re-enable assertions for the GPU GCN layer forward pass, which now works. The next step is to get the backward pass test working, which involves adding a function for user code to copy data over to CUDA without needing to include the CUDA header, plus adding the appropriate GPU functions in the backend.
Adds a function that allocates GPU memory and copies a passed-in vector to the GPU. At the moment there is no way to free this memory, so it will leak. This function is added mostly for unit-test purposes and should not be used otherwise.
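A sketch of such a test-only helper; the name is hypothetical, but the mechanism is a plain `cudaMalloc` plus `cudaMemcpy` with no matching free:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Test-only helper (illustrative name): allocate device memory and copy a
// host vector into it. The returned pointer is never freed, so this leaks
// by design and should only be used from unit tests.
template <typename T>
T* CopyVectorToGPU(const std::vector<T>& host_vector) {
  T* device_ptr = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&device_ptr),
             host_vector.size() * sizeof(T));
  cudaMemcpy(device_ptr, host_vector.data(),
             host_vector.size() * sizeof(T), cudaMemcpyHostToDevice);
  return device_ptr;  // intentionally leaked
}
```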
Add ifdefs to separate CPU/GPU code in the backward step and change the structures being used to PointerWithSize (GPU code cannot operate on CPU std::vectors). Also added a few TODOs for better code organization later.
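A minimal sketch of the idea, assuming a PointerWithSize-style pointer/length view and a `GALOIS_ENABLE_GPU` build flag; the repo's actual class interface and flag name may differ:

```cuda
#include <cstddef>
#include <vector>

// Minimal pointer-plus-length view that can wrap either a host vector or a
// raw device allocation (sketch; the repo's PointerWithSize may differ).
template <typename T>
class PointerWithSize {
public:
  PointerWithSize(T* ptr, size_t size) : ptr_(ptr), size_(size) {}
  explicit PointerWithSize(std::vector<T>& v) : ptr_(v.data()), size_(v.size()) {}
  T* data() { return ptr_; }
  size_t size() const { return size_; }

private:
  T* ptr_;
  size_t size_;
};

// Backward step dispatch with the CPU and GPU paths split by ifdefs.
void BackwardStep(PointerWithSize<float> input_gradients,
                  PointerWithSize<float> output_gradients) {
#ifdef GALOIS_ENABLE_GPU
  // launch GPU kernels on the device pointers held by the views
#else
  // run the CPU implementation on the host pointers
#endif
  (void)input_gradients;
  (void)output_gradients;
}
```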
Adds functions for calculating the weight and layer gradients of the GCN layer. Untested: the tests will be added in a commit down the line.
Add functions to copy the backward output and weight gradients of a layer from GPU to CPU. Also moved some function definitions to the header since the definitions were quite small.
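A sketch of the copy-back helpers the tests rely on; the names are illustrative, but the mechanism is a device-to-host `cudaMemcpy`:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Copy a layer's backward output from device to host for test verification.
std::vector<float> CopyBackwardOutputFromGPU(const float* d_backward_output,
                                             size_t num_elements) {
  std::vector<float> host(num_elements);
  cudaMemcpy(host.data(), d_backward_output, num_elements * sizeof(float),
             cudaMemcpyDeviceToHost);
  return host;
}

// Same pattern for the weight gradients.
std::vector<float> CopyWeightGradientsFromGPU(const float* d_weight_gradients,
                                              size_t num_elements) {
  std::vector<float> host(num_elements);
  cudaMemcpy(host.data(), d_weight_gradients, num_elements * sizeof(float),
             cudaMemcpyDeviceToHost);
  return host;
}
```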
Since the aggregation on the GPU does not overwrite but accumulates, the entire output matrix needs to be zeroed out before anything is done on it; otherwise it will contain garbage values.
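In other words, a `cudaMemset` is needed before the aggregation kernel launch; a sketch with illustrative names:

```cuda
#include <cuda_runtime.h>

// Because the aggregation kernel accumulates (out[i] += ...) rather than
// overwriting, the output matrix must be zeroed first or stale values get
// folded into the result.
void AggregateAll(float* d_output, size_t num_nodes, size_t feature_length) {
  cudaMemset(d_output, 0, num_nodes * feature_length * sizeof(float));
  // ... then launch the aggregation kernel, which does out[dst] += contribution
}
```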
The forward and backward pass of a GCN layer without dropout/activation works fine now. All that is left for fully functioning GPU code is the output layer (softmax). Dropout and activation are nice to have but are not critical to "function" (though obviously they will be added).
Add ifdefs to calls in softmax layer in preparation for GPU calls.
The masks were stored as Labels for some reason, which is not needed since the masks are essentially bitsets: they have been changed to chars to save space.
Return dead objects and remove unused arguments in the Softmax layer files to allow the GPU build to compile.
Adds code to copy the masks for the train, val, and test sets over to the GPU. Removes the norm factor variable and adds the free calls for the masks to the destructor as well.
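A sketch of holding the char masks on the device, copied in once and freed in the destructor; the class and member names here are assumptions:

```cuda
#include <vector>
#include <cuda_runtime.h>

// Device-side train/val/test masks: one byte per node, freed on destruction.
class GNNGraphGPUAllocations {
public:
  void SetMasks(const std::vector<char>& train, const std::vector<char>& val,
                const std::vector<char>& test) {
    d_train_mask_ = CopyMask(train);
    d_val_mask_   = CopyMask(val);
    d_test_mask_  = CopyMask(test);
  }

  ~GNNGraphGPUAllocations() {
    cudaFree(d_train_mask_);
    cudaFree(d_val_mask_);
    cudaFree(d_test_mask_);
  }

private:
  static char* CopyMask(const std::vector<char>& mask) {
    char* d_ptr = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_ptr), mask.size());
    cudaMemcpy(d_ptr, mask.data(), mask.size(), cudaMemcpyHostToDevice);
    return d_ptr;
  }

  char* d_train_mask_ = nullptr;
  char* d_val_mask_   = nullptr;
  char* d_test_mask_  = nullptr;
};
```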
Adds the file for the GPU object for the Softmax layer and adds the call to the Forward phase of the GPU code. The call itself is not yet defined.
Adds a softmax function on GPUs that can be called from GPU kernels.
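A sketch of a device-callable softmax over one row, using the usual max-subtraction for numerical stability; the actual signature in GNNMath.cu may differ:

```cuda
// Softmax over a single row, callable from other GPU kernels.
__device__ void GPUSoftmax(int length, const float* input, float* output) {
  // subtract the max so expf does not overflow
  float max_val = input[0];
  for (int i = 1; i < length; i++) {
    max_val = fmaxf(max_val, input[i]);
  }
  float denominator = 0.0f;
  for (int i = 0; i < length; i++) {
    output[i] = expf(input[i] - max_val);
    denominator += output[i];
  }
  for (int i = 0; i < length; i++) {
    output[i] /= denominator;
  }
}
```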
Added a few things from the old codebase's CUDA utils to the new one in preparation for using them to compute the softmax layer. Also noted the original source of the old code: Caffe.
This commit adds the softmax/cross entropy function to GNNMath.cu and uses it to define the GPU Softmax forward phase function. An additional argument was added to the forward phase GPU call to deal with the different phases: the phase argument determines which mask to use in the softmax. There are a few things left to do that will be done later, namely zeroing out the output matrix. Note that I have NOT defined cross entropy for the forward phase: it is only used to calculate loss, and I'm not using loss nor referring to it anywhere in my code or analysis at the moment.
Fixed some bugs exposed by the unit test for softmax forward, namely that the feature length size was incorrect and that the vector was not being zeroed out before the softmax occurred. The unit test in question has been ported over from the CPU softmax unit test as well. The next step is to finish up the backward pass for the softmax layer and reactivate the unit test calls to the backward phase. I also need to consider actually checking backward phase output to make sure it is sane.
Moved the code that selects the right mask pointer based on the current layer phase into a function, as it will be used in the backward phase as well.
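A sketch of that selection helper, with assumed phase-enum and member names:

```cuda
enum class GNNPhase { kTrain, kValidate, kTest };

// Pick the device mask pointer that corresponds to the current phase; shared
// by the forward and backward softmax passes.
const char* ChooseMask(GNNPhase phase, const char* d_train_mask,
                       const char* d_val_mask, const char* d_test_mask) {
  switch (phase) {
  case GNNPhase::kTrain:
    return d_train_mask;
  case GNNPhase::kValidate:
    return d_val_mask;
  case GNNPhase::kTest:
    return d_test_mask;
  }
  return nullptr;
}
```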
Ground truth is represented with GNNLabel, but I was using a GNNFloat. This caused the labels being read to be garbage when used on the GPU. This commit changes them to the correct type. It also includes the signature declaration of the backward phase: the implementation will be included in the next commit. (Split the commits up for modularity's sake.)
Adds the backward phase for the softmax layer on the GPU. The implementation is taken from the non-refactored old code: it copies a prediction to shared memory (presumably to improve locality) then computes the cross-entropy-to-softmax derivatives. It remains to be seen whether the shared memory copy is actually more efficient; some testing will be done down the line. Also adds prints to both the CPU and GPU softmax tests in order to verify that both are doing the same compute (which they are as of this commit).
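A rough sketch of the described kernel, assuming one thread per node, dynamic shared memory for the staged prediction, and the combined softmax/cross-entropy gradient p_i - 1{i == label}; the real kernel's tiling and names may differ:

```cuda
// Softmax backward: stage the node's forward output in shared memory, then
// write p_i - 1{i == label} for nodes selected by the phase mask.
__global__ void SoftmaxBackwardKernel(int num_nodes, int feature_length,
                                      const float* forward_output,
                                      const unsigned* ground_truth,
                                      const char* mask,
                                      float* backward_output) {
  extern __shared__ float local_prediction[];  // blockDim.x * feature_length floats
  int node = blockIdx.x * blockDim.x + threadIdx.x;
  if (node >= num_nodes || !mask[node]) {
    return;
  }
  float* my_row = &local_prediction[threadIdx.x * feature_length];
  for (int i = 0; i < feature_length; i++) {
    my_row[i] = forward_output[node * feature_length + i];
  }
  int label = static_cast<int>(ground_truth[node]);
  for (int i = 0; i < feature_length; i++) {
    float grad = my_row[i] - (i == label ? 1.0f : 0.0f);
    backward_output[node * feature_length + i] = grad;
  }
}
```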
This commit adds the declarations for the global accuracy getter for GPU GNNs as well as the orchestration of the call to the GPU version. The rest of the implementation will come in a later commit: for now this isn't priority as I can still compute accuracy on the CPU. Adds a new GNNGPU object to hold all GPU related things for the GNN class.
Adds a GPU Adam optimizer class that holds the allocations for the moments used in the Adam optimizer on the GPU. Adds a GPU version of the Adam test as well to make sure the build is sane in its current state. The CPU optimizer class is also now split into CPU/GPU paths depending on which build is being used. The next step is to implement the Adam optimizer on the GPU proper.
The gradient descent call in the optimizers now uses PointerWithSize rather than std::vectors. This is for compatibility with GPU pointers. Calls to the function have been changed throughout the code accordingly.
Implements Adam optimization on the GPU and makes sure it's sane via the GPU unit test. Also fixes an inconsistency in the CPU Adam optimizer where a sqrt wasn't being applied to epsilon like it is in the original non-refactored code.
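A sketch of a per-weight Adam update kernel with standard hyperparameter names; the sqrt on epsilon reflects the consistency fix mentioned above, and the exact parameterization in the repo may differ:

```cuda
// Adam update for one weight per thread; beta1_power/beta2_power are
// beta1^t / beta2^t, maintained on the host across steps.
__global__ void AdamUpdateKernel(int n, float* weights, const float* gradients,
                                 float* first_moment, float* second_moment,
                                 float alpha, float beta1, float beta2,
                                 float epsilon, float beta1_power,
                                 float beta2_power) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  float g = gradients[i];
  float m = beta1 * first_moment[i] + (1.0f - beta1) * g;
  float v = beta2 * second_moment[i] + (1.0f - beta2) * g * g;
  first_moment[i]  = m;
  second_moment[i] = v;
  float m_hat = m / (1.0f - beta1_power);
  float v_hat = v / (1.0f - beta2_power);
  // sqrt applied to epsilon as well, matching the original code's convention
  weights[i] -= alpha * m_hat / (sqrtf(v_hat) + sqrtf(epsilon));
}
```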
Adds a GPU version of the epoch test and fixes the pointers returned from a GNN layer (it was always returning CPU pointers even in the GPU build). Adds error checking to the cuSparse call too. gpu-epoch-test runs a GNN end to end (still missing some features that the CPU has), but it has to copy predictions over from the GPU (slow; this should be done on the GPU end), and there seem to be accuracy issues on reddit. These will be resolved in a later commit.
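The error checking amounts to a wrapper of the kind sketched below; the macro name is illustrative:

```cuda
#include <cusparse.h>
#include <cstdio>
#include <cstdlib>

// Abort with a location message if a cuSPARSE call does not succeed.
#define CUSPARSE_CHECK(call)                                                  \
  do {                                                                        \
    cusparseStatus_t status = (call);                                         \
    if (status != CUSPARSE_STATUS_SUCCESS) {                                  \
      std::fprintf(stderr, "cuSPARSE error %d at %s:%d\n", (int)status,       \
                   __FILE__, __LINE__);                                       \
      std::abort();                                                           \
    }                                                                         \
  } while (0)
```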
Norm factors are required during aggregation in order for the current computation on GPU to match CPU computation (earlier I was under the impression that norm factors were integrated into the data that was already copied, but this is incorrect). This commit adds the norm factor copy from CPU to GPU.
Aggregation on the GPU for GCN now uses norm factors to normalize the aggregations of neighbors. This change allows it to exactly match the computation done on a CPU if dropout is turned off. The next step is to add dropout support to the GPU.
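A sketch of norm-factor-aware aggregation, assuming a CSR layout and scaling each neighbor contribution by norm[src] * norm[dst]; array names and the exact normalization used in the repo are assumptions:

```cuda
// Each destination node accumulates its neighbors' features scaled by the
// product of the two nodes' norm factors. The output must be zeroed
// beforehand because this kernel only accumulates.
__global__ void AggregateWithNorms(int num_nodes, int feature_length,
                                   const int* row_start, const int* column_idx,
                                   const float* norm_factors,
                                   const float* node_features, float* output) {
  int dst = blockIdx.x * blockDim.x + threadIdx.x;
  if (dst >= num_nodes) {
    return;
  }
  for (int e = row_start[dst]; e < row_start[dst + 1]; e++) {
    int src = column_idx[e];
    float scale = norm_factors[src] * norm_factors[dst];
    for (int f = 0; f < feature_length; f++) {
      output[dst * feature_length + f] +=
          scale * node_features[src * feature_length + f];
    }
  }
}
```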
Efficient dropout support requires RNG on the GPU: this commit adds a function to init the CuRAND RNG so that the GPU can generate the random numbers required to choose things to drop for dropout.
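Initializing the cuRAND host-API generator looks roughly like the sketch below; the seed handling is illustrative:

```cuda
#include <curand.h>

// Create and seed the cuRAND generator used to fill dropout masks.
curandGenerator_t InitDropoutRNG(unsigned long long seed) {
  curandGenerator_t gen;
  curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen, seed);
  return gen;
}
```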
Initializes a dropout mask for every GPU layer. This can be optimized (i.e., not allocated) when dropout is disabled, for both CPU and GPU; that will be handled later once a base implementation of everything is settled. The mask is a float because the float will be checked during dropout to see if it crosses some threshold for dropping.
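A sketch of how such a float mask would be used, assuming inverted dropout (survivors rescaled by 1/(1 - rate)); names and the exact thresholding in the repo are assumptions:

```cuda
#include <curand.h>

// Fill the per-layer mask with uniform randoms in (0, 1].
void FillDropoutMask(curandGenerator_t gen, float* d_dropout_mask, size_t n) {
  curandGenerateUniform(gen, d_dropout_mask, n);
}

// Keep an element only if its random value clears the dropout rate.
__global__ void DoDropoutKernel(int n, float dropout_rate,
                                const float* dropout_mask, const float* input,
                                float* output) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) {
    return;
  }
  bool keep = dropout_mask[i] > dropout_rate;
  output[i] = keep ? input[i] / (1.0f - dropout_rate) : 0.0f;
}
```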
chore: Remove unused submodules
* add info for compaction policy
* align atomics to cache line
* fix: WMD graph vertex schema and add phmap to part of importer
* chore: Remove instrumentation legacy code
* changes to switch to LC_LS_CSR graph
* fixing debug err
* fixing pre-commit issues
* changing api calls for wf4
* Added data.001.csv using lfs
* fixing getEdgeData api
* fix for getEdgeData()
* ci fix
* data.001.csv
* changing test dataset
* quickfix
* fixing precommit
* fixing graph deallocate()
* fixing test
* Update workflows to be realistic
* CPU set
* Try this again
* Try this again
* Slight refactor
Co-authored-by: AdityaAtulTewari <adityaatewari@gmail.com>
* dynamic edges support
* adding correct test
* fixing precommit
* moving static file to lfs
AdityaAtulTewari force-pushed the AdityaAtulTewari/prefixsum-reenable-test branch from 553e98d to 79a5e03 on April 15, 2024 19:11
No description provided.