Enable Partial Loop Unroll for all Backends #589

jjfumero · 2024-11-07T10:17:55Z

Description

This patch extends the JIT compiler with a partial loop-unrolling phase and adds full loop-unroll suggestions for SPIR-V.

In a nutshell:

It introduces a full loop unroll suggestion for the SPIR-V compiler.
It introduces a partial loop unroller for the PTX compiler.
It fixes the partial loop unroll configuration for OpenCL.
It enables partial loop unroll for OpenCL.

Note that the partial loop unroll flag is off by default, since more experimentation is needed. When it is ON, some tests (~14) fail due to issues in the Graal JIT compiler. However, this optimisation is expected to be beneficial for some compute applications and, in the case of SPIR-V, it works in combination with the SPIR-V unroll suggestions as well as with explicit loop unrolling.
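For reference, here is what partial unrolling by a factor of 4 (the default set in commit 39def8b) does to a simple reduction loop, written out by hand in plain Java. This is only an illustration of the general technique: the class and method names are hypothetical, and it is not the Graal/TornadoVM phase itself, which transforms the compiler graph rather than source code.

```java
/** Hypothetical illustration of partial loop unrolling; not TornadoVM code. */
public final class PartialUnrollSketch {

    // Assumed to mirror the default unroll factor of 4 mentioned in this PR.
    static final int UNROLL_FACTOR = 4;

    private PartialUnrollSketch() {}

    /** Original loop: one element per iteration. */
    public static float sumRolled(float[] a) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    /** Same loop partially unrolled by 4, followed by a remainder loop. */
    public static float sumUnrolled(float[] a) {
        float sum = 0.0f;
        int i = 0;
        // Largest multiple of the unroll factor that fits in the array.
        int limit = a.length - (a.length % UNROLL_FACTOR);
        // Main loop: the body is replicated four times (written out for a
        // factor of 4), so the exit check and the induction-variable update
        // run once per four elements instead of once per element.
        for (; i < limit; i += UNROLL_FACTOR) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        // Remainder loop: handles the last (a.length % UNROLL_FACTOR) elements.
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            float[] data = new float[n];
            for (int k = 0; k < n; k++) {
                data[k] = k + 1;
            }
            float rolled = sumRolled(data);
            float unrolled = sumUnrolled(data);
            // Additions happen in the same order, so the results match exactly.
            System.out.printf("n=%d rolled=%.1f unrolled=%.1f%n", n, rolled, unrolled);
            if (Float.compare(rolled, unrolled) != 0) {
                throw new AssertionError("mismatch for n=" + n);
            }
        }
    }
}
```

The remainder loop is what keeps the transformation correct when the trip count is not a multiple of the factor. The SPIR-V `Unroll` loop control in the generated code shown further down, by contrast, does not rewrite the loop; it only asks the driver to unroll it.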

Problem description

N/A.

Backend/s tested

Mark the backends affected by this PR.

OpenCL
PTX
SPIRV

OS tested

Mark the OS where this PR is tested.

Linux
OSx
Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

Yes
No

How to test the new patch?

## In the case of SPIR-V
make BACKEND=spirv

tornado-test --printKernel --jvm="-Dgraph.mv.device=0:0" -V uk.ac.manchester.tornado.unittests.compute.ComputeTests#matrixVector

A section of the generated code is as follows:

   %B3_kernel0 = OpLabel 
   %102 = OpPhi %float {%55 %B2_kernel0} {%101 %B4_kernel0} 
   %104 = OpPhi %uint {%56 %B2_kernel0} {%103 %B4_kernel0} 
   %105 = OpSLessThan %bool %104 %51
   OpLoopMerge %B5_kernel0 %B4_kernel0 Unroll          << Loop Unroll suggestion enabled
   OpBranchConditional %105 %B4_kernel0 %B5_kernel0

To also enable the explicit partial loop unroll phase:

tornado-test --printKernel --jvm="-Dgraph.mv.device=0:0 -Dtornado.experimental.partial.unroll=True" -V uk.ac.manchester.tornado.unittests.compute.ComputeTests#matrixVector

Run the previous command for each of the backends.

Signed-off-by: Juan Fumero <jjfumero@gmail.com>

stratika

In my system, with the flag enabled I see the following reports. Just for reference:

OpenCL:

==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.foundation.TestIf#test06 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.codegen.CodeGenTest#test02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================

PTX:

==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================

SPIR-V:

==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.codegen.CodeGenTest#test02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================

stratika

LGTM

jjfumero · 2024-11-12T05:57:56Z

> In my system, with the flag enabled I see the following reports. Just for reference:

Yes, some tests fail when this flag is ON; this is expected. We will refine this in future iterations.

mikepapadim

LGTM

mairooni

LGTM

Improvements
=============
- beehive-lab#573: Enhanced output of unit tests with a summary of pass rates and fail rates.
- beehive-lab#576: Extended support for 3D matrices.
- beehive-lab#580: Extended debug information for execution plans.
- beehive-lab#584: Added a helper menu for the ``tornado`` launcher script when no arguments are passed.
- beehive-lab#589: Enable partial loop unrolling for all backends.
- beehive-lab#594: Added RISC-V 64 CPU port support to run OpenCL with vector instructions RVV 1.0 (using the Codeplay OCK Toolkit).
- beehive-lab#598: OpenCL low-level buffers tagged as read, write and read/write based on the data dependency analysis.
- beehive-lab#601: Feature to select an immutable task graph to execute from a multi-task graph execution plan.

Compatibility
=============
- beehive-lab#570: Extended the timeout for the whole suite of unit tests.
- beehive-lab#579: Removed legacy JDK 8 and JDK 11 build options from the TornadoVM installer.
- beehive-lab#582: Restored tornado runner scripts for IntelliJ.
- beehive-lab#583: Automatic generation of IntelliJ IDE runner configuration files from the TornadoVM command.
- beehive-lab#597: Updated the whitelist of unit tests and improved checkstyle.

Bug Fixes
=============
- beehive-lab#571: Fix issues with closing brackets for if/loop conditions.
- beehive-lab#572: Fix for printing default execution plans (execution plans with default parameters).
- beehive-lab#575: Fix the Level Zero version used for building the SPIR-V backend.
- beehive-lab#577: Fix checkstyle.
- beehive-lab#587: Fix the thread scheduler for new NVIDIA drivers.
- beehive-lab#592: Fix ``Float.POSITIVE_INFINITY`` and ``Float.NEGATIVE_INFINITY`` constants for the OpenCL, CUDA and SPIR-V backends.
- beehive-lab#596: Fix an extra closing bracket during code generation for FPGAs.
- Remove the intermediate CUDA pinned memory regions in the JNI code: [link](beehive-lab@9c3f8ce)
- Fix bitwise negation operations for the PTX backend: [link](beehive-lab@0db1cd3)
- `GetBackendImpl::getAllDevices` thread-safe: [link](beehive-lab@0d44252)
- Check size elements for memory segments: [link](beehive-lab@4360385)

jjfumero added 8 commits November 6, 2024 12:17

[spirv] Initial support for PartialLoopUnroll

3b792cf

[spirv] Debug message for new unroll phase removed

00bfc30

[spirv] replace PartialLoopUnroll (SPIRV 1.4 required) with Unroll suggestion

84758e5

[OCL][SPIRV] Partial Loop Unroll set to true

cf3652f

[PTX] Enable partial loop unroll from OpenCL and SPIR-V

6b7356a

Common PartialLoopUnroll fixed

c0676a5

Signed-off-by: Juan Fumero <jjfumero@gmail.com>

Set default loop unrolling factor to 4

39def8b

[spirv] Format compiler-debug trace for SPIRV unroll

cd51fd2

jjfumero added compiler OpenCL PTX spirv labels Nov 7, 2024

jjfumero requested review from mikepapadim, mairooni and stratika November 7, 2024 10:17

jjfumero self-assigned this Nov 7, 2024

stratika reviewed Nov 11, 2024

stratika approved these changes Nov 11, 2024

mikepapadim approved these changes Nov 12, 2024

mairooni approved these changes Nov 12, 2024

jjfumero merged commit 73d1dfb into beehive-lab:develop Nov 12, 2024
2 checks passed

jjfumero deleted the feat/spirv/unroll branch November 12, 2024 10:24

jjfumero mentioned this pull request Dec 20, 2024

[release] TornadoVM 1.0.9 #602

Merged

8 tasks
