Planning

Table of Contents

Phase 1
- A1.1 Identification of parts that are related to GPU (host and device)
- A1.2 Porting host code to OpenCL
- A1.3 Porting device code to OpenCL
- A1.4 Entire project setup for OpenCL
- A1.5 Testing implementation
Phase 2
- A2.1 Identification of the most common usage scenarios
- A2.2 Measurement of baseline execution times for CPU, Nvidia GPU (CUDA & OpenCL) and AMD
- A2.3 Generic optimizations
- A2.4 Nvidia-specific optimisation to bring the OpenCL performance up to the level of the CUDA implementation
- A2.5 AMD-specific optimisation (GCN)
Phase 3
Phase 4
Phase 5
- A5.1 Evaluate performance gain for intra-node multi-GPU execution
- A5.2 Identify possible optimizations for multi-node multi-GPU execution
- A5.3 Implement intra-node multi-GPU optimizations
- A5.4 Implement multi-node multi-GPU optimizations

Phase 1

Goal: To have a working OpenCL version

Status for Phase 1

List of activities for Phase 1:

A1.1 Identification of parts that are related to GPU (host and device)

For A1.1 log see A1.1 log

A1.2 Porting host code to OpenCL

For A1.2 log see A1.2 log

Task list:

T1.2.1 Initialisation: detection and setup of the available OpenCL platforms
T1.2.2 Choosing the target device
T1.2.3 Adding new structures, or modifying existing ones, to accommodate OpenCL execution
T1.2.4 Setup code for the execution of an OpenCL kernel
T1.2.5 Actual execution of OpenCL code
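
For T1.2.2, the candidate devices can be enumerated with the OpenCL calls `clGetPlatformIDs` and `clGetDeviceIDs`, and their properties read with `clGetDeviceInfo` (for example `CL_DEVICE_TYPE` and `CL_DEVICE_MAX_COMPUTE_UNITS`). The sketch below shows only a possible selection policy applied to that queried information; the `device_info` struct and the rule "prefer GPUs, then more compute units" are assumptions for illustration, not the policy the project adopted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device summary, filled from clGetDeviceInfo queries. */
typedef struct {
    const char *name;
    int is_gpu;              /* 1 if CL_DEVICE_TYPE includes CL_DEVICE_TYPE_GPU */
    unsigned compute_units;  /* value of CL_DEVICE_MAX_COMPUTE_UNITS */
} device_info;

/* Assumed policy: prefer GPUs over other device types, then the device
   with more compute units. Returns the chosen index, or -1 if n == 0. */
int choose_target_device(const device_info *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (best < 0) {
            best = (int)i;
            continue;
        }
        const device_info *b = &devs[best];
        const device_info *d = &devs[i];
        if (d->is_gpu > b->is_gpu ||
            (d->is_gpu == b->is_gpu && d->compute_units > b->compute_units)) {
            best = (int)i;
        }
    }
    return best;
}
```

Given a CPU with 64 compute units and two GPUs with 20 and 36, this policy picks the 36-unit GPU; a real implementation would likely also honour a user-specified device.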

A1.3 Porting device code to OpenCL

For A1.3 log see A1.3 log

Task list:

T1.3.1 Separation of CUDA host/device code
T1.3.2 Translation of device code from CUDA to OpenCL
- ST1.3.2.1 Achieving a basic compilable form of the actual kernel code
- ST1.3.2.2 Adapting the data exchange format between host and device
- ST1.3.2.3 Finalising implementation in a fully functional form
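
For ST1.3.2.1, a large part of reaching a compilable kernel is renaming CUDA built-ins and qualifiers to their OpenCL C counterparts. The lookup table below collects a few well-known one-dimensional correspondences as a translation checklist; it is a partial sketch, not a complete mapping (the `.y`/`.z` variants map to index 1 and 2 in the same way).

```c
#include <stddef.h>
#include <string.h>

/* A few CUDA -> OpenCL C correspondences for kernel code. */
typedef struct {
    const char *cuda;
    const char *opencl;
} keyword_map;

static const keyword_map cuda_to_opencl[] = {
    { "__global__",      "__kernel" },
    { "__shared__",      "__local" },
    { "__constant__",    "__constant" },
    { "threadIdx.x",     "get_local_id(0)" },
    { "blockIdx.x",      "get_group_id(0)" },
    { "blockDim.x",      "get_local_size(0)" },
    { "gridDim.x",       "get_num_groups(0)" },
    /* Only orders local memory; add CLK_GLOBAL_MEM_FENCE when the barrier
       must also order global-memory accesses within the work-group. */
    { "__syncthreads()", "barrier(CLK_LOCAL_MEM_FENCE)" },
};

/* Returns the OpenCL C equivalent of a CUDA token, or NULL if unknown. */
const char *opencl_equivalent(const char *cuda_token)
{
    size_t n = sizeof cuda_to_opencl / sizeof cuda_to_opencl[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(cuda_to_opencl[i].cuda, cuda_token) == 0) {
            return cuda_to_opencl[i].opencl;
        }
    }
    return NULL;
}
```

Tokens without a direct one-to-one equivalent (for example warp-level intrinsics) are deliberately absent and need case-by-case rewriting.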

A1.4 Entire project setup for OpenCL

For A1.4 log see A1.4 log

Task list:

T1.4.1 Changing build scripts to accommodate the new code and all OpenCL dependencies
T1.4.2 Adding necessary flags for switching between CUDA and OpenCL
T1.4.3 Functional OpenCL version on non-NVIDIA GPUs
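
One common way to implement the switch in T1.4.2 is a compile-time macro that the build scripts set for exactly one backend. The macro names `GPU_BACKEND_CUDA` and `GPU_BACKEND_OPENCL` below are hypothetical placeholders, not the project's actual flags; the sketch only shows the shape of such a switch.

```c
/* Hypothetical compile-time switch: the build system would pass at most one
   of -DGPU_BACKEND_CUDA or -DGPU_BACKEND_OPENCL (illustrative names only). */
#if defined(GPU_BACKEND_CUDA) && defined(GPU_BACKEND_OPENCL)
#error "Select only one GPU backend"
#endif

/* Reports which backend this translation unit was compiled for. */
const char *gpu_backend_name(void)
{
#if defined(GPU_BACKEND_CUDA)
    return "CUDA";
#elif defined(GPU_BACKEND_OPENCL)
    return "OpenCL";
#else
    return "none"; /* CPU-only build */
#endif
}
```

Compiling with `-DGPU_BACKEND_OPENCL` would make `gpu_backend_name()` return `"OpenCL"`, and defining both macros stops the build with an error.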

A1.5 Testing implementation

For A1.5 log see A1.5 log

For testing results go to Testing

Task list:

T1.5.1 Checking that the changes do not affect the functionality of the initial implementation
T1.5.2 Validating the consistency between OpenCL and CUDA code
T1.5.3 Testing on multiple platforms
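
T1.5.2 amounts to comparing the outputs of the CUDA and OpenCL code paths for the same input. Because the two compilers may reorder floating-point operations differently, bit-for-bit equality is not a reasonable expectation; a combined absolute/relative tolerance is a common alternative. The sketch assumes both results are available as host-side `float` arrays; the function name and the tolerance values are illustrative assumptions and would need to be chosen per quantity.

```c
#include <assert.h>
#include <stddef.h>

static float abs_f(float x) { return x < 0.0f ? -x : x; }
static float max_f(float x, float y) { return x > y ? x : y; }

/* Counts elements where |a-b| > abs_tol + rel_tol * max(|a|, |b|).
   A NaN in either array is always counted as a mismatch. */
size_t count_mismatches(const float *a, const float *b, size_t n,
                        float abs_tol, float rel_tol)
{
    assert((a != NULL && b != NULL) || n == 0);
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float diff = abs_f(a[i] - b[i]);
        float scale = max_f(abs_f(a[i]), abs_f(b[i]));
        if (!(diff <= abs_tol + rel_tol * scale)) { /* negated test catches NaN */
            ++bad;
        }
    }
    return bad;
}
```

Returning a count rather than a boolean makes it easy to report how widespread a discrepancy is when testing on multiple platforms (T1.5.3).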

Phase 2

Goal: To optimize the existing OpenCL kernels

Status for Phase 2

List of activities for Phase 2:

A2.1 Identification of the most common usage scenarios

For A2.1 log see A2.1 log

A2.2 Measurement of baseline execution times for CPU, Nvidia GPU (CUDA & OpenCL) and AMD

For A2.2 log see A2.2 log

A2.3 Generic optimizations

For A2.3 log see A2.3 log

A2.4 Nvidia-specific optimisation to bring the OpenCL performance up to the level of the CUDA implementation

For A2.4 log see A2.4 log

A2.5 AMD-specific optimisation (GCN)

For A2.5 log see A2.5 log

Phase 3

Goal: To finalize the internal architecture

Status for Phase 3

List of activities for Phase 3:

Phase 4

Goal: To identify slow computations that can benefit from GPU execution

Status for Phase 4

List of activities for Phase 4:

Phase 5

Goal: To enable multi-GPU execution

Status for Phase 5

List of activities for Phase 5:

A5.1 Evaluate performance gain for intra-node multi-GPU execution

For A5.1 log see A5.1 log

Task list:

T5.1.1 Evaluate performance gain for splitting the work currently done by the GPU between several GPUs
T5.1.2 Evaluate performance gain for offloading work identified in Phase 4 to a GPU other than the one used for the non-bonded force calculation
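
A natural baseline for the evaluation in T5.1.1 is a static, even split of the work items across the available GPUs. The sketch assumes the work can be divided into independent items of roughly equal cost, which may not hold for the non-bonded work in practice (a split based on measured cost would then be needed); the function `split_work` is a hypothetical helper, not part of the existing code.

```c
#include <assert.h>
#include <stddef.h>

/* Splits n_items work items into n_gpus contiguous ranges whose sizes differ
   by at most one. GPU g gets [begin[g], begin[g + 1]); begin must have room
   for n_gpus + 1 entries. */
void split_work(size_t n_items, size_t n_gpus, size_t *begin)
{
    assert(n_gpus > 0 && begin != NULL);
    size_t base = n_items / n_gpus;
    size_t extra = n_items % n_gpus; /* the first 'extra' GPUs get one more item */
    begin[0] = 0;
    for (size_t g = 0; g < n_gpus; ++g) {
        begin[g + 1] = begin[g] + base + (g < extra ? 1 : 0);
    }
}
```

For 10 items on 3 GPUs this yields the ranges [0, 4), [4, 7) and [7, 10).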

A5.2 Identify possible optimizations for multi-node multi-GPU execution

For A5.2 log see A5.2 log

A5.3 Implement intra-node multi-GPU optimizations

For A5.3 log see A5.3 log

A5.4 Implement multi-node multi-GPU optimizations

For A5.4 log see A5.4 log
