\abschnitt{interaction with accelerators}
For many-core devices, several programming models, such as OpenACC, CUDA and
OpenCL, have been developed that target \bfs{host-directed} execution on an
attached or integrated accelerator. The CPU executes the main program and
controls the activity of the accelerator. Accelerator devices typically
provide capabilities for efficient vector processing\footnote{warp on CUDA
devices, wavefront on AMD GPUs, 512-bit SIMD on Intel Xeon Phi}. Host-directed
execution usually relies on \bfs{computation offloading}, which permits
executing computationally intensive work on a separate device
(accelerator)\cite{OpenAcc}.
For instance, CUDA devices use a \bfs{command buffer} for communication
between host and device: the host puts commands (op-codes) into the command
buffer and the device processes them \bfs{asynchronously}\cite{CUDA}.
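
The host/device hand-off described above can be sketched in portable C++.
This is only an illustration under stated assumptions: the class
\texttt{command\_buffer} and the function \texttt{offload\_squares} are
hypothetical names, the ``device'' is simulated by a \texttt{std::thread}, and
none of this is CUDA API. It merely shows that enqueueing returns immediately
while the commands are processed asynchronously.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device command buffer; not a CUDA API.
class command_buffer {
public:
    // Host side: append a command and return immediately.
    void enqueue(std::function<void()> cmd) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            cmds_.push(std::move(cmd));
        }
        cv_.notify_one();
    }

    // Host side: signal that no further commands will follow.
    void close() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            closed_ = true;
        }
        cv_.notify_one();
    }

    // Device side: process commands in FIFO order until closed and drained.
    void run() {
        for (;;) {
            std::function<void()> cmd;
            {
                std::unique_lock<std::mutex> lk(mtx_);
                cv_.wait(lk, [this] { return closed_ || !cmds_.empty(); });
                if (cmds_.empty()) return;  // closed and nothing left to do
                cmd = std::move(cmds_.front());
                cmds_.pop();
            }
            cmd();  // execute outside the lock
        }
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> cmds_;
    bool closed_ = false;
};

// Simulated offload: the "device" thread computes i*i for each index.
std::vector<int> offload_squares(int n) {
    std::vector<int> out(static_cast<std::size_t>(n), 0);
    command_buffer buf;
    std::thread device([&buf] { buf.run(); });  // simulated accelerator

    for (int i = 0; i < n; ++i) {
        // Each command writes a distinct element, so commands do not race.
        buf.enqueue([&out, i] { out[static_cast<std::size_t>(i)] = i * i; });
    }
    // The host is free to continue with other work at this point.

    buf.close();
    device.join();  // synchronize before the host reads the results
    return out;
}
```

In this sketch the host only reads \texttt{out} after \texttt{device.join()};
a real accelerator API would use its own synchronization primitive instead.
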
A fiber switch does \bfs{not} interact with
\bfs{host-directed device-offloading}: it works like a function call on the
host (see \nameref{callingconvention}) and leaves commands already submitted
to the device untouched.