Commit

Merge pull request #387 from stfc/249_OCLTransKernels
Issue #249. OCLTrans() generates OpenCL kernels (also fixes fparser2 stmt_fns)
arporter authored Jun 12, 2019
2 parents 4035127 + 752e2f0 commit c4dd939
Showing 21 changed files with 288 additions and 93 deletions.
5 changes: 5 additions & 0 deletions changelog
@@ -154,6 +154,11 @@
48) PR #394 for #392. Fixes a bug in the way the test suite checks
for whether the graphviz package is available.

49) PR #387 for #249. Extends OCLTrans() so that all kernels within a
transformed Invoke are converted to OpenCL. Also includes a
work-around for array accesses incorrectly identified as Statement
Functions by fparser2.

release 1.7.0 20th December 2018

1) #172 and PR #173 Add support for logical declaration, the save
12 changes: 2 additions & 10 deletions doc/developer_guide/developers.rst
@@ -1893,7 +1893,8 @@ OpenCL
======

PSyclone is able to generate an OpenCL :cite:`opencl` version of
PSy-layer code for the GOcean 1.0 API. Such code may then be executed
PSy-layer code for the GOcean 1.0 API and its associated kernels.
Such code may then be executed
on devices such as GPUs and FPGAs (Field-Programmable Gate
Arrays). Since OpenCL code is very different to that which PSyclone
normally generates, its creation is handled by ``gen_ocl`` methods
@@ -1995,15 +1996,6 @@ of this setup is done, the kernel itself is launched by calling
Limitations
-----------

Currently PSyclone can only generate the OpenCL version of the PSy
layer. Execution of the resulting code requires that the kernels
themselves be converted from Fortran to OpenCL (a dialect of C) and at
present this must be done manually. Since all data accessed by an
OpenCL kernel must be passed as an argument, this conversion must also
convert any accesses to module data into routine arguments.
Work is in progress to support kernel transformation and this will be
made available in a future PSyclone release.

In OpenCL, all tasks to be performed (whether copying data or kernel
execution) are associated with a command queue. Tasks submitted to
different command queues may then be executed concurrently,
5 changes: 2 additions & 3 deletions doc/user_guide/examples.rst
@@ -71,9 +71,8 @@ installed.
Example 3: OpenCL
^^^^^^^^^^^^^^^^^

Example of the use of PSyclone to generate an OpenCL version of the
PSy layer. The kernels are not yet transformed automatically (Issue
#249).
Example of the use of PSyclone to generate an OpenCL driver version of
the PSy layer and OpenCL kernels.

Example 4: Kernels containing use statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
41 changes: 23 additions & 18 deletions doc/user_guide/transformations.rst
@@ -57,8 +57,8 @@ provided to show the available transformations

.. _sec_transformations_available:

Available
---------
Available transformations
-------------------------

Most transformations are generic as the schedule structure is
independent of the API, however it often makes sense to specialise
@@ -148,9 +148,6 @@ can be found in the API-specific sections).
:members: apply
:noindex:

.. note:: OpenCL support is still under development. See
:ref:`opencl_dev` for more details.

####

.. autoclass:: psyclone.transformations.OMPLoopTrans
@@ -518,22 +515,30 @@ transformation.
OpenCL
------

In common with OpenMP, the conversion of the generated code to use
OpenCL is performed by a transformation (``OCLTrans`` - see the
:ref:`sec_transformations_available` Section above). Currently this
transformation is only supported for the GOcean1.0 API and is applied
to the whole InvokeSchedule of an Invoke. This means that all kernels in
that Invoke will be executed on the OpenCL device. At present the
``OCLTrans`` transformation only alters the generated PSy-layer code. It
is currently the user's responsibility to convert the actual kernel code
from Fortran into OpenCL. Work is underway to extend PSyclone in
order to perform this translation automatically.

The OpenCL code generated by PSyclone is still Fortran and makes use
OpenCL is added to a code by using the ``OCLTrans`` transformation (see the
:ref:`sec_transformations_available` Section above).
Currently this transformation is only supported for the GOcean1.0 API and
is applied to the whole InvokeSchedule of an Invoke.
This transformation will add an OpenCL driver infrastructure to the PSy layer
and generate an OpenCL kernel for each of the Invoke kernels.
This means that all kernels in that Invoke will be executed on the OpenCL
device.
The PSy-layer OpenCL code generated by PSyclone is still Fortran and makes use
of the FortCL library (https://github.com/stfc/FortCL) to access
OpenCL functionality. It also relies upon the OpenCL support provided
OpenCL functionality. It also relies upon the OpenCL support provided
by the dl_esm_inf library (https://github.com/stfc/dl_esm_inf).

At the moment we don't apply additional transformations to OpenCL kernels;
this means that all references to the same kernel will have identical
generated OpenCL output (with identical names). Nevertheless, we can use
the `--kernel-renaming` psyclone argument to generate either a single output
file (with the `single` option) or multiple index-postfixed (identical)
versions of the kernel (with the `multi` option, which is the default).
Because OpenCL kernels are linked at run-time, it is up to the run-time
environment to specify which of the kernels to use. For instance, one could
merge multiple kernels together in a single binary file and
use the `PSYCLONE_KERNELS_FILE` environment variable provided by the FortCL library.
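
As a minimal sketch of the merging step mentioned above, assuming each generated kernel sits in its own ``.cl`` source file within one directory, the sources could be concatenated into a single file before compilation. The helper name ``merge_kernel_sources`` and the output name ``kernels.cl`` are illustrative assumptions and are not part of PSyclone or FortCL. Because the generated versions of a kernel are described above as identical, the sketch writes repeated content only once.

```python
from pathlib import Path


def merge_kernel_sources(kernel_dir, output_name="kernels.cl"):
    """Concatenate the .cl files found in kernel_dir into one source file.

    Hypothetical helper: files whose contents are identical to one that
    has already been included are skipped.
    """
    kernel_dir = Path(kernel_dir)
    output = kernel_dir / output_name
    seen = set()
    parts = []
    for path in sorted(kernel_dir.glob("*.cl")):
        if path == output:
            continue  # never merge a previous output file into itself
        text = path.read_text()
        if text in seen:
            continue  # identical copy of a kernel that is already included
        seen.add(text)
        parts.append(text.rstrip("\n") + "\n")
    output.write_text("\n".join(parts))
    return output
```

The merged file would then be compiled with whichever OpenCL tool chain is in use, and the resulting binary named in `PSYCLONE_KERNELS_FILE`.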

The introduction of OpenCL code generation in PSyclone has been
largely motivated by the need to target Field Programmable Gate Array
(FPGA) accelerator devices. It is not currently designed to target the other
4 changes: 1 addition & 3 deletions examples/gocean/README
@@ -55,9 +55,7 @@ Example 3
---------

Illustrates the use of PSyclone to generate an OpenCL driver layer for
a four-kernel invoke. Currently the kernels themselves must be converted
from Fortran to OpenCL manually but work is in progress to automate this
(Issue #249).
a four-kernel invoke and an OpenCL version of each of the kernels.

Example 4
---------
22 changes: 16 additions & 6 deletions examples/gocean/eg3/README
@@ -31,10 +31,10 @@
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#------------------------------------------------------------------------------
# Author A. R. Porter, STFC Daresbury Lab
# Author A. R. Porter and S. Siso, STFC Daresbury Lab

The directory containing this file contains an example of the use of
PSyclone to generate OpenCL driver code with the GOcean 1.0 API.
PSyclone to generate OpenCL code with the GOcean 1.0 API.

In order to use PSyclone you must first install it, ideally with pip.
See ../../../README.md for more details.
@@ -52,8 +52,18 @@ provided with a transformation script::
psyclone -api "gocean1.0" -s ./ocl_trans.py alg.f90

where ocl_trans.py simply applies the psyclone.transformations.OCLTrans
transformation to the Schedule of the Invoke.
transformation to the Schedule of the Invoke. This will write the OpenCL
driver layer to stdout and, for each of the kernels referenced in alg.f90,
a 'kernel_name'.cl file containing that kernel translated to OpenCL.
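
A minimal sketch of what such a transformation script could look like is given below. It assumes the usual PSyclone script convention of a trans(psy) entry point and that the Invokes are reachable through psy.invokes.invoke_list. The helper apply_to_all_invokes is a hypothetical name introduced here, and the ocl_trans.py shipped with this example may differ. Depending on the PSyclone version, apply() may also return the transformed schedule, which would then have to be assigned back to the Invoke.

```python
def apply_to_all_invokes(psy, transformation):
    """Apply `transformation` to the schedule of every Invoke in `psy`.

    Hypothetical helper, kept separate so that it can be exercised
    without PSyclone being installed.
    """
    for invoke in psy.invokes.invoke_list:
        # The transformation acts on the whole Schedule of the Invoke.
        transformation.apply(invoke.schedule)
    return psy


def trans(psy):
    """Entry point called by PSyclone when the script is given with -s."""
    # Imported here so that the helper above has no PSyclone dependency.
    from psyclone.transformations import OCLTrans
    return apply_to_all_invokes(psy, OCLTrans())
```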

Currently the (Fortran) kernels called by the Invoke must be manually
translated into OpenCL. This step will be automated in a future
release of PSyclone.
Each OpenCL kernel needs to be compiled before building the driver layer.
For example, the steps to generate the code using the Intel OpenCL SDK
(https://software.intel.com/en-us/opencl-sdk) are::

psyclone -oalg psyalg.f90 -opsy psylayer.f90 -api "gocean1.0" \
-s ./ocl_trans.py alg.f90

# Pre-build OpenCL kernels
ioc64 -cmd=build -device=cpu -input=kernels.cl -spirv64=kernels.spirv \
-bo="-cl-std=CL1.2"
export PSYCLONE_KERNELS_FILE=kernels.spirv
12 changes: 6 additions & 6 deletions examples/gocean/eg3/alg.f90
@@ -53,9 +53,9 @@ program simple

integer :: ncycle

model_grid = grid_type(ARAKAWA_C, &
(/BC_PERIODIC,BC_PERIODIC,BC_NONE/), &
OFFSET_SW)
model_grid = grid_type(GO_ARAKAWA_C, &
(/GO_BC_PERIODIC,GO_BC_PERIODIC,GO_BC_NONE/), &
GO_OFFSET_SW)

! Create fields on this grid
p_fld = r2d_field(model_grid, T_POINTS)
@@ -66,13 +66,13 @@ program simple
z_fld = r2d_field(model_grid, F_POINTS)
h_fld = r2d_field(model_grid, T_POINTS)

do ncycle=1,itmax
write(*,*) "Simulation start"
do ncycle=1, 100
call invoke( compute_cu(CU_fld, p_fld, u_fld), &
compute_cv(CV_fld, p_fld, v_fld), &
compute_z(z_fld, p_fld, u_fld, v_fld), &
compute_h(h_fld, p_fld, u_fld, v_fld) )

end do
write(*,*) "Simulation end"

end program simple
7 changes: 3 additions & 4 deletions examples/gocean/eg3/compute_cu_mod.f90
@@ -44,11 +44,10 @@ module compute_cu_mod

private

public invoke_compute_cu
public compute_cu, compute_cu_code

type, extends(kernel_type) :: compute_cu
type(arg), dimension(3) :: meta_args = &
type(go_arg), dimension(3) :: meta_args = &
(/ go_arg(GO_WRITE, GO_CU, GO_POINTWISE), & ! cu
go_arg(GO_READ, GO_CT, GO_POINTWISE), & ! p
go_arg(GO_READ, GO_CU, GO_POINTWISE) & ! u
@@ -76,8 +75,8 @@ module compute_cu_mod
subroutine compute_cu_code(i, j, cu, p, u)
implicit none
integer, intent(in) :: I, J
real(wp), intent(out), dimension(:,:) :: cu
real(wp), intent(in), dimension(:,:) :: p, u
real(go_wp), intent(out), dimension(:,:) :: cu
real(go_wp), intent(in), dimension(:,:) :: p, u


CU(I,J) = 0.5d0*(P(i,J)+P(I-1,J))*U(I,J)
5 changes: 2 additions & 3 deletions examples/gocean/eg3/compute_h_mod.f90
@@ -41,7 +41,6 @@ module compute_h_mod

private

public invoke_compute_h
public compute_h, compute_h_code

type, extends(kernel_type) :: compute_h
@@ -75,8 +74,8 @@ module compute_h_mod
SUBROUTINE compute_h_code(i, j, h, p, u, v)
IMPLICIT none
integer, intent(in) :: I, J
REAL(wp), INTENT(out), DIMENSION(:,:) :: h
REAL(wp), INTENT(in), DIMENSION(:,:) :: p, u, v
REAL(go_wp), INTENT(out), DIMENSION(:,:) :: h
REAL(go_wp), INTENT(in), DIMENSION(:,:) :: p, u, v

H(I,J) = P(I,J)+.25d0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J) + &
V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
1 change: 0 additions & 1 deletion examples/gocean/eg3/compute_z_mod.f90
@@ -44,7 +44,6 @@ module compute_z_mod

private

public invoke_compute_z
public compute_z, compute_z_code

type, extends(kernel_type) :: compute_z
Binary file modified psyclone.pdf
Binary file not shown.
4 changes: 4 additions & 0 deletions setup.cfg
@@ -41,3 +41,7 @@
[pycodestyle]
ignore = E266,E121,E123,E126,E133,E226,E241,E242,E704,W503,W504,W505

# Ensure that any XPASS ("unexpectedly passing") results are reported
# as failures in the test suite.
[tool:pytest]
xfail_strict=true