AncaSC edited this page Nov 12, 2014 · 6 revisions


Phase 1

Goal: To have a working OpenCL version
Status for Phase 1
List of activities for Phase 1:

A1.1 Identification of the GPU-related parts of the code (host and device)

For A1.1 log see A1.1 log

A1.2 Porting host code to OpenCL

For A1.2 log see A1.2 log
Task list:
  • T1.2.1 Initialisation - detection of available OpenCL platforms and their setup
  • T1.2.2 Choosing the target device
  • T1.2.3 Adding new structures or modifying existing ones to accommodate OpenCL execution
  • T1.2.4 Setup code for the execution of an OpenCL kernel
  • T1.2.5 Actual execution of OpenCL code

A1.3 Porting device code to OpenCL

For A1.3 log see A1.3 log
Task list:
  • T1.3.1 Separation of CUDA host/device code
  • T1.3.2 Translation of device code from CUDA to OpenCL
    • ST1.3.2.1 Achieving a basic compilable form of the actual kernel code
    • ST1.3.2.2 Adapting the data exchange format between host and device
    • ST1.3.2.3 Finalising implementation in a fully functional form

A1.4 Entire project setup for OpenCL

For A1.4 log see A1.4 log
Task list:
  • T1.4.1 Changing build scripts to accommodate the new code and all OpenCL dependencies
  • T1.4.2 Adding necessary flags for switching between CUDA and OpenCL
  • T1.4.3 Functional OpenCL version on non-NVIDIA GPUs

A1.5 Testing implementation

For A1.5 log see A1.5 log
For testing results go to Testing
Task list:
  • T1.5.1 Checking that the changes do not affect the functionality of the initial implementation
  • T1.5.2 Validating the consistency between OpenCL and CUDA code
  • T1.5.3 Testing on multiple platforms
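T1.5.2 compares the OpenCL results against the CUDA ones. Because the two back-ends may accumulate floating-point sums in a different order, a bitwise comparison is likely too strict, and a tolerance-based check is one option. The Python sketch below is illustrative only: the function names, the flattened-array input and the default relative tolerance of 1e-4 are assumptions, not values taken from the project.

```python
from typing import Sequence


def max_relative_error(reference: Sequence[float], candidate: Sequence[float],
                       abs_floor: float = 1e-6) -> float:
    """Largest per-element relative difference between two result arrays.

    `abs_floor` (an assumed value) stops reference entries close to zero
    from inflating the ratio.
    """
    if len(reference) != len(candidate):
        raise ValueError("result arrays differ in length")
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        denom = max(abs(ref), abs_floor)
        worst = max(worst, abs(cand - ref) / denom)
    return worst


def results_consistent(reference: Sequence[float], candidate: Sequence[float],
                       rel_tol: float = 1e-4) -> bool:
    """True if every element of `candidate` is within `rel_tol` of `reference`.

    The default tolerance is a hypothetical choice for illustration.
    """
    return max_relative_error(reference, candidate) <= rel_tol


# Usage with made-up, flattened force components from each back-end.
cuda_forces = [1.0, -2.5, 0.0]
opencl_forces = [1.00001, -2.50002, 0.0]
print(results_consistent(cuda_forces, opencl_forces))
```

A suitable tolerance would depend on the precision (single or double) the kernels use.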

Phase 2

Goal: To optimize the existing OpenCL kernels
Status for Phase 2
List of activities for Phase 2:

A2.1 Identification of the most common usage scenarios

For A2.1 log see A2.1 log

A2.2 Measurement of baseline execution times for CPU, Nvidia GPU (CUDA & OpenCL) and AMD GPU (OpenCL)

For A2.2 log see A2.2 log
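For A2.2, baseline numbers are only comparable if every configuration is timed the same way. The minimal Python harness below assumes the workload can be wrapped in a callable that blocks until the device has finished (for OpenCL, for example, by calling clFinish on the command queue before returning), since kernel launches are otherwise asynchronous. The helper name, the single warm-up run and the use of the median are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Dict


def measure_baseline(run: Callable[[], None], repeats: int = 5,
                     warmup: int = 1) -> Dict[str, float]:
    """Time `run` several times and summarise wall-clock durations in seconds.

    `run` must not return until all of its work has completed; otherwise
    only the launch overhead is measured.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up calls absorb one-time costs and are not recorded.
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Usage with a stand-in CPU workload.
print(measure_baseline(lambda: sum(range(100_000)), repeats=3))
```

The warm-up call keeps one-time costs, such as OpenCL programs being built at run time, out of the measured samples. The median is less sensitive to an occasional slow run than the mean.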

A2.3 Generic optimizations

For A2.3 log see A2.3 log

A2.4 Nvidia-specific optimisation to bring the OpenCL performance up to the level of the CUDA implementation

For A2.4 log see A2.4 log

A2.5 AMD-specific optimisation (GCN)

For A2.5 log see A2.5 log

Phase 3

Goal: To finalize the internal architecture
Status for Phase 3
List of activities for Phase 3:

Phase 4

Goal: To identify slow computations that can benefit from GPU execution
Status for Phase 4
List of activities for Phase 4:

Phase 5

Goal: To enable multi-GPU execution
Status for Phase 5
List of activities for Phase 5:

A5.1 Evaluate performance gain for intra-node multi-GPU execution

For A5.1 log see A5.1 log
Task list:
  • T5.1.1 Evaluate performance gain for splitting the work currently done by the GPU between several GPUs
  • T5.1.2 Evaluate performance gain for offloading the work identified in Phase 4 to a GPU other than the one used for the non-bonded force calculation
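T5.1.1 involves dividing the work currently done by one GPU among several. One simple scheme, shown here as a hypothetical Python sketch rather than the project's method, splits a range of work items into contiguous chunks whose sizes are proportional to a per-GPU weight, such as measured relative throughput. What counts as a "work item" (for example, a block of the non-bonded pair list) and the weighting itself are assumptions.

```python
from typing import List, Sequence, Tuple


def split_work(n_items: int, weights: Sequence[float]) -> List[Tuple[int, int]]:
    """Split items [0, n_items) into contiguous [start, end) ranges, one per GPU.

    Range sizes are proportional to `weights` (hypothetical per-GPU speeds).
    Boundaries are rounded from the cumulative share, so the ranges never
    overlap and together cover every item exactly once.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("need at least one positive weight")
    total = sum(weights)
    ranges = []
    start = 0
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        # The last GPU always ends at n_items so rounding cannot drop items.
        end = n_items if i == len(weights) - 1 else round(n_items * cumulative / total)
        ranges.append((start, end))
        start = end
    return ranges


# Usage: two equally fast GPUs, then one GPU assumed twice as fast as the other.
print(split_work(10, [1, 1]))
print(split_work(9, [2, 1]))
```

With equal weights this reduces to an even split. Unequal weights let a faster device take a larger share, which may matter when mixing GPU models in one node.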

A5.2 Identify possible optimizations for multi-node multi-GPU execution

For A5.2 log see A5.2 log

A5.3 Implement intra-node multi-GPU optimizations

For A5.3 log see A5.3 log

A5.4 Implement multi-node multi-GPU optimizations

For A5.4 log see A5.4 log