-
A parallel implementation of LU decomposition
-
Two versions: (1) using global memory only, and (2) using shared memory for the pivot row.
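Both versions parallelize the same elimination loop. The sequential sketch below is a possible CPU reference for checking GPU results. It assumes a row-major n x n float matrix, no row pivoting, and a Crout-style layout in which the pivot row is divided by the pivot so that U has a unit diagonal while the pivot itself stays in place as the diagonal of L. These conventions and the name `lu_reference` are illustrative assumptions, not taken from the repository.

```cuda
#include <cassert>

// Hypothetical sequential reference. Factors a (row-major, n x n) in place:
// on return the lower triangle, diagonal included, holds L, and the strict
// upper triangle holds U, whose diagonal is implicitly all ones.
void lu_reference(float *a, int n) {
    for (int k = 0; k < n; ++k) {
        float pivot = a[k * n + k];
        // This sketch does no row pivoting, so a zero pivot cannot be handled.
        assert(pivot != 0.0f);
        // Step 1: scale the pivot row (a single-thread kernel on the GPU).
        for (int j = k + 1; j < n; ++j)
            a[k * n + j] /= pivot;
        // Step 2: reduce every row below the pivot (parallel on the GPU).
        // a[i][k] is left untouched and becomes the entry L[i][k].
        for (int i = k + 1; i < n; ++i)
            for (int j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
}
```

For example, factoring the 3 x 3 matrix `{2, 4, 2, 4, 9, 7, 2, 7, 10}` leaves `{2, 2, 1, 4, 1, 3, 2, 3, -1}` in the array, that is, L = [[2,0,0],[4,1,0],[2,3,-1]] and U = [[1,2,1],[0,1,3],[0,0,1]].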
-
In both implementations, a kernel launched with a single thread scales the pivot row.
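A single-thread scaling kernel could look roughly like the sketch below. It assumes the same conventions as a Crout-style factorization: row-major n x n matrix in device memory, no pivoting, and only the entries to the right of the diagonal are divided, so the pivot survives as the diagonal of L. The kernel name `scale_pivot_row` and the host helper `scale_row_on_gpu` are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel, launched as <<<1, 1>>>: one thread walks the pivot
// row k and divides every entry right of the diagonal by the pivot.
__global__ void scale_pivot_row(float *a, int n, int k) {
    float pivot = a[k * n + k];
    for (int j = k + 1; j < n; ++j)
        a[k * n + j] /= pivot;
}

// Hypothetical host helper: copy a to the device, scale row k, copy back.
void scale_row_on_gpu(float *a, int n, int k) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *d_a = nullptr;
    cudaError_t err = cudaMalloc(&d_a, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    scale_pivot_row<<<1, 1>>>(d_a, n, k);  // one block containing one thread
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaMemcpy(a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}
```

Using one thread keeps this step trivially correct, since the row is short compared with the O(n^2) reduction that follows; the cost is one extra kernel launch per elimination step.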
-
Global memory: blocks containing one thread each are launched for the reduction step.
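A minimal sketch of the global-memory reduction, assuming one block per row below the pivot and the pivot row already scaled as above. Each block's single thread reads the pivot row straight from global memory. The names `reduce_global` and `reduce_step_global` are hypothetical, and the row-per-block mapping is an assumption about how the original distributes work.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel, launched as <<<n - k - 1, 1>>>: block b owns row
// i = k + 1 + b and updates it from the scaled pivot row in global memory.
__global__ void reduce_global(float *a, int n, int k) {
    int i = k + 1 + blockIdx.x;        // row handled by this block
    float factor = a[i * n + k];       // left in place as the entry L[i][k]
    for (int j = k + 1; j < n; ++j)
        a[i * n + j] -= factor * a[k * n + j];  // pivot row read from global memory
}

// Hypothetical host helper: run elimination step k on a matrix whose
// row k has already been scaled by its pivot.
void reduce_step_global(float *a, int n, int k) {
    int rows = n - k - 1;
    if (rows <= 0) return;  // last step: no rows below the pivot
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *d_a = nullptr;
    cudaError_t err = cudaMalloc(&d_a, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    reduce_global<<<rows, 1>>>(d_a, n, k);  // one single-thread block per row
    cudaMemcpy(a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}
```

No synchronization between blocks is needed: each block writes only columns k+1..n-1 of its own row, and reads only row k and column k, which no block modifies during this step.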
-
Shared memory: blocks of a static (compile-time) size are used. The thread with id == 0 copies the pivot row into shared memory, after which the remaining threads in the block start the reduction.
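The shared-memory variant might be sketched as follows. Assumptions: a fixed `BLOCK_SIZE` of 128 threads, a static shared array bounded by a hypothetical `MAX_N` of 1024, one matrix row per thread, and a pivot row already scaled. Thread 0 copies the pivot row, and `__syncthreads()` makes every thread wait until that copy is visible. In this sketch thread 0 also reduces a row after the barrier; the original may instead leave it idle. `reduce_shared` and `reduce_step_shared` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128  // assumed static block size
#define MAX_N 1024      // assumed upper bound on n, sizes the shared array

// Hypothetical kernel: thread 0 stages the pivot row in shared memory,
// then each thread reduces one row below the pivot.
__global__ void reduce_shared(float *a, int n, int k) {
    __shared__ float pivot_row[MAX_N];
    if (threadIdx.x == 0) {
        for (int j = k + 1; j < n; ++j)
            pivot_row[j] = a[k * n + j];
    }
    __syncthreads();  // outside any branch, so every thread reaches it

    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;  // this thread's row
    if (i < n) {
        float factor = a[i * n + k];  // left in place as L[i][k]
        for (int j = k + 1; j < n; ++j)
            a[i * n + j] -= factor * pivot_row[j];
    }
}

// Hypothetical host helper: one elimination step on a pre-scaled row k.
void reduce_step_shared(float *a, int n, int k) {
    assert(n <= MAX_N);
    int rows = n - k - 1;
    if (rows <= 0) return;
    int blocks = (rows + BLOCK_SIZE - 1) / BLOCK_SIZE;
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *d_a = nullptr;
    cudaError_t err = cudaMalloc(&d_a, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    reduce_shared<<<blocks, BLOCK_SIZE>>>(d_a, n, k);
    cudaMemcpy(a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}
```

Compared with the one-thread-per-block version, each block reads the pivot row from global memory once instead of once per row; the trade-off is that the static shared array caps the matrix size at compile time.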
-
Repository: ravikanthreddy89/cuda
About
LU Decomposition using CUDA