How can I tune parallelization with executors? #952

Answered by upsj
edocento asked this question in Q&A
You can control the number of threads used for OpenMP parallelization either with the OMP_NUM_THREADS environment variable or by calling the omp_set_num_threads function in your code. We rely on the OpenMP runtime or the user to set a sensible thread count. In most situations, that just means using as much parallelism as possible: except where synchronization overheads or NUMA effects have a significant impact, using more threads generally gives you better performance.
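As a rough illustration of the two OpenMP controls mentioned above, here is a minimal sketch in plain C++ that uses only standard OpenMP runtime calls. The helper names `request_threads` and `threads_in_region` are hypothetical and made up for this example; they are not part of any library. The `_OPENMP` guards are an assumption so that the snippet still compiles when OpenMP is not enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: request a thread count for subsequent parallel
// regions and return the upper bound the runtime reports afterwards.
// Without OpenMP support this simply returns 1.
int request_threads(int requested)
{
#ifdef _OPENMP
    omp_set_num_threads(requested);
    return omp_get_max_threads();
#else
    (void)requested;
    return 1;
#endif
}

// Hypothetical helper: count the threads that actually execute a
// parallel region. The runtime may provide fewer than were requested.
int threads_in_region()
{
    int count = 1;
#ifdef _OPENMP
#pragma omp parallel
    {
        // Only one thread writes the team size, avoiding a data race.
#pragma omp single
        count = omp_get_num_threads();
    }
#endif
    return count;
}
```

If you never call `omp_set_num_threads`, the runtime takes its default from OMP_NUM_THREADS, for example `OMP_NUM_THREADS=4 ./app`, where `./app` stands in for your own binary. With GCC or Clang, OpenMP has to be enabled at compile time with `-fopenmp`.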
In CUDA, we do this tuning ourselves, generally using a thread-to-row or (sub)warp-to-row mapping for the kernels, with heuristic oversubscription parameters to ensure all SMs are busy, or calling cuBLAS and…

Answer selected by upsj