Skip to content

Latest commit

 

History

History
88 lines (55 loc) · 3.2 KB

README.md

File metadata and controls

88 lines (55 loc) · 3.2 KB

Basic Example for OpenCL

This is a basic example for OpenCL programming model, serving a main purpose for being a good reference when new to OpenCL. For simplicity, most OpenCL API calls are wrapped in C++ wrapper functions. Reference the main function for the programming flow, and trace details in each wrapper function to understand how exactly the OpenCL host-side API works.

The repository contains two example programs, "Vector Add" and "Dot Product", each with independent host code (.cpp) and kernel code (.cl).

Vector Add

A very simple vector operation like C = A + B.

Since it is not a compute-intensive operation and OpenCL has some setup overhead, it is possible that the OpenCL kernel is slower in performance compared to C++ sequential code for small vector sizes and global work sizes.

Dot Prodcut

The program calculate the inner product of two vectors. This is also a BLAS1 operation but with good potential for optimization.

Refer to this program to see more techniques used in OpenCL programming such as utilizing local memory, choosing local work size (in other words, the work group size) and work group reduction.

Getting Started

You should be able to run OpenCL on most CPUs and/or GPUs.

A quick way to check if your machine is capable of running OpenCL is to use clinfo, simply run:

$ clinfo

If at least one OpenCL platform is shown, your have one or more hardwares that support OpenCL.

Prerequisites

To run OpenCL programs, you have to install one of the OpenCL runtime/sdk:

AMD SDK is recommended if you do not know which one to use.

Installing AMD APP SDK 3.0

$ wget http://debian.nullivex.com/amd/AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2
$ tar jxvf AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2
$ chmod +x ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh
$ ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh

Setting up OpenCL path for compilation

After installation, environment variables OCL_INC_DIR and OCL_LIB_DIR should be set to OpenCL include directory and OpenCL lib directory, respectively.

If you are using AMD APP SDK, run:

$ export OCL_IND_DIR=$AMDAPPSDKROOT/include
$ export OCL_LIB_IDR=$AMDAPPSDKROOT/lib/x86_64

Alternatively, you could just modify the compiling flags in Makefile.

Compilation

Compile:

make

Clean:

make clean

Run

Specify vector size and global work size when running the program.

For example, if you want to run the vector add operation while vector size = 100,000 and global work size = 1,000, run:

./build/vector_add 100000 1000

You could see whether the program is successful or not and the relative speed compared to C++ sequential version.

About why using clEnqueueMapBuffer over clEnqueueRead/WriteBuffer