highlight subtleties of CUDA code #43

Open · 16 tasks
tkphd opened this issue Aug 18, 2017 · 0 comments

  • CUDA code looks weird due to the thread allocation model (contrasted in the first sketch after this list)
    • Traditional: many memory addresses per core -> looping constructs
    • Accelerator: one memory address per thread -> no loops
  • cached mask
  • input vs. output tiles, indexing
  • linear vs. dimensional array access (see the 2-D indexing sketch after this list)
  • block size and domain boundaries
  • domain-boundary cells (ghosts) vs. tile-boundary cells (halos), both of which appear in the tiled-convolution sketch after this list
  • tile size, block size, grid size; local, source, and destination arrays
    • tile sizes are static: each block is the same, ignorant of domain boundaries
    • meaning of the cuda_kernel<<<num_blocks, threads_per_block, shared_array_size>>> construct (see the launch sketch after this list)
    • block size, misfit, and GPU utilization
  • take care with int, ceil(), and floor() when computing grid dimensions (the launch sketch below rounds up with integer arithmetic)
  • troubleshooting
    • wrss==0.0029?
    • Segfault? printf array locations and sizes; recompile with -g then use cuda-memtest and backtrace
    • CUDA slower than OpenCL? Enable persistence mode.
  • Makefile flags and function locations: main() should be in a .c file, CUDA functions in .cu, and objects built with the -dc flag (see the wrapper sketch after this list)
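
The loop-versus-thread contrast in the first item can be made concrete with a minimal sketch (hypothetical function names, assuming a 1-D array of n doubles scaled in place):

```cuda
/* CPU version: one core visits many memory addresses, so a loop is natural. */
void scale_cpu(double* a, const double factor, const int n)
{
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}

/* CUDA version: one thread per memory address, so the loop disappears and
   each thread computes its own index from block and thread coordinates. */
__global__ void scale_gpu(double* a, const double factor, const int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)      /* guard: the last block may overhang the end of the array */
        a[i] *= factor;
}
```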
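
For linear vs. dimensional access, a short sketch of the usual row-major flattening on a hypothetical nx-by-ny field:

```cuda
/* Device buffers are flat, so a 2-D field is indexed by hand:
   field[y][x] on the host becomes field[y * nx + x] on the device. */
__global__ void copy_field(const double* src, double* dst, const int nx, const int ny)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < nx && y < ny) {
        const int idx = y * nx + x;    /* row-major linear index */
        dst[idx] = src[idx];
    }
}
```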
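
The cached mask, input vs. output tiles, static tile sizes, and halo cells can all be illustrated in one tiled-convolution sketch. Everything here (names, the 3x3 mask, zero-filled ghosts) is an assumption for illustration, not the repository's actual code:

```cuda
/* A 3x3 mask is assumed; its weights are cached in constant memory and
 * would be filled from the host with cudaMemcpyToSymbol(). */
#define MASK_W 3
__constant__ double d_mask[MASK_W * MASK_W];

/* One thread per INPUT-tile cell: the block covers the output tile plus a
 * halo on every side. Every thread loads one value into shared memory, but
 * only threads whose whole neighborhood lies inside the tile write an
 * output cell. Tile sizes are static, so the bounds checks handle misfit
 * where a tile overhangs the domain boundary. */
__global__ void convolve_tiled(const double* src, double* dst,
                               const int nx, const int ny)
{
    extern __shared__ double tile[];      /* blockDim.x * blockDim.y doubles */

    const int halo = MASK_W / 2;
    const int tx = threadIdx.x;           /* local (tile) coordinates */
    const int ty = threadIdx.y;
    const int bw = blockDim.x;            /* input-tile width  = block width  */
    const int bh = blockDim.y;            /* input-tile height = block height */

    /* Adjacent blocks overlap by 2*halo cells, so neighboring input tiles
     * share their halo cells while the output tiles abut exactly. */
    const int x = (int) blockIdx.x * (bw - 2 * halo) + tx - halo;
    const int y = (int) blockIdx.y * (bh - 2 * halo) + ty - halo;

    /* Load one input cell per thread; out-of-domain reads become zero here,
     * standing in for however the ghost (domain-boundary) cells are filled. */
    tile[ty * bw + tx] = (x >= 0 && x < nx && y >= 0 && y < ny)
                       ? src[y * nx + x] : 0.0;
    __syncthreads();

    /* Only the interior threads of the block produce output. */
    if (tx >= halo && tx < bw - halo && ty >= halo && ty < bh - halo
        && x < nx && y < ny) {
        double sum = 0.0;
        for (int j = -halo; j <= halo; j++)
            for (int i = -halo; i <= halo; i++)
                sum += d_mask[(j + halo) * MASK_W + (i + halo)]
                     * tile[(ty + j) * bw + (tx + i)];
        dst[y * nx + x] = sum;
    }
}
```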
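
The <<<num_blocks, threads_per_block, shared_array_size>>> construct and the int/ceil()/floor() caveat then show up on the host side. This launch sketch reuses MASK_W and convolve_tiled from the previous block; the 16x16 block shape is an arbitrary choice:

```cuda
void launch_convolution(const double* d_src, double* d_dst,
                        const int nx, const int ny)
{
    const int halo = MASK_W / 2;
    const dim3 threads_per_block(16, 16);              /* input-tile size */
    const int out_w = threads_per_block.x - 2 * halo;  /* output-tile size */
    const int out_h = threads_per_block.y - 2 * halo;

    /* Ceiling division in pure integer arithmetic: (a + b - 1) / b. Plain
     * integer division floors and drops the last partial tile; mixing int
     * with ceil()/floor() invites off-by-one errors, hence the caution. */
    const dim3 num_blocks((nx + out_w - 1) / out_w,
                          (ny + out_h - 1) / out_h);

    /* Third launch parameter: bytes of dynamic shared memory per block. */
    const size_t shared_array_size =
        threads_per_block.x * threads_per_block.y * sizeof(double);

    convolve_tiled<<<num_blocks, threads_per_block, shared_array_size>>>(
        d_src, d_dst, nx, ny);
}
```

Any misfit between the domain size and the output-tile size leaves some threads in the last row and column of blocks idle, which is the utilization cost mentioned above.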
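
Finally, the build split in the last item: device code goes in a .cu file compiled as relocatable device code (nvcc -dc), while main() stays in a plain .c file. A sketch with hypothetical names, reusing scale_gpu from the first block:

```cuda
/* kernels.cu -- built with: nvcc -dc kernels.cu -o kernels.o
 * main() stays in a .c file and calls this wrapper through a C prototype,
 * so no CUDA syntax leaks into the host-only translation unit. */
#include <cuda_runtime.h>

extern "C" void scale_on_gpu(double* a, const double factor, const int n)
{
    double* d_a;
    const size_t bytes = n * sizeof(double);

    cudaMalloc((void**) &d_a, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);

    /* scale_gpu is the kernel from the first sketch above */
    scale_gpu<<<(n + 255) / 256, 256>>>(d_a, factor, n);

    cudaMemcpy(a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
}
```

The final link also goes through nvcc so that the relocatable device objects are resolved.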