
Enable Partial Loop Unroll for all Backends #589

Merged: 8 commits, merged on Nov 12, 2024


@jjfumero (Member) commented on Nov 7, 2024

Description

This patch extends the JIT compiler with a partial loop unrolling phase and adds full loop unroll suggestions for SPIR-V.

In a nutshell:

  1. It introduces a full loop unroll suggestion for the SPIR-V compiler.
  2. It introduces a partial loop unroller for the PTX compiler.
  3. It fixes the partial loop unroll configuration for OpenCL.
  4. It enables partial loop unrolling for OpenCL.

Note that the partial loop unroll flag is off by default, since more experimentation is needed. When it is ON, some tests (~14) fail due to failures in the Graal JIT compiler. However, this optimisation can be beneficial for some compute applications and, in the case of SPIR-V, it works in combination with the SPIR-V unroll suggestions as well as with explicit loop unrolling.
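As background for the list above: partial loop unrolling replicates the loop body a fixed number of times per iteration and adds a remainder loop for the iterations that do not fill a complete group, which typically reduces branch overhead and exposes more independent work to the backend compiler. The plain-Java sketch below illustrates the transformation on a hypothetical dot-product loop with an assumed unroll factor of 4. It is not the code generated by TornadoVM's unrolling phase; the class and method names are made up for illustration, and the factor and heuristics used by the JIT compiler are not described in this PR.

```java
public class PartialUnrollSketch {

    // Original (rolled) loop: one multiply-add per iteration.
    // Assumes a and b have the same length.
    public static float dotRolled(float[] a, float[] b) {
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // The same loop partially unrolled by an illustrative factor of 4.
    // The main loop executes a.length / 4 times with four copies of the body;
    // the remainder loop handles the final a.length % 4 iterations.
    // A single accumulator preserves the original order of the additions,
    // so the result is bit-for-bit identical to dotRolled.
    public static float dotUnrolled4(float[] a, float[] b) {
        final int factor = 4;
        final int limit = a.length - (a.length % factor);
        float sum = 0.0f;
        int i = 0;
        for (; i < limit; i += factor) {
            sum += a[i] * b[i];
            sum += a[i + 1] * b[i + 1];
            sum += a[i + 2] * b[i + 2];
            sum += a[i + 3] * b[i + 3];
        }
        // Remainder (epilogue) loop for the leftover iterations.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = new float[10];
        float[] b = new float[10];
        for (int k = 0; k < a.length; k++) {
            a[k] = k + 1;
            b[k] = 0.5f * k;
        }
        // Both versions perform the same additions in the same order.
        System.out.println(dotRolled(a, b) + " " + dotUnrolled4(a, b)); // prints "165.0 165.0"
    }
}
```

Keeping one accumulator is a deliberate choice in this sketch: splitting the sum across several accumulators would expose more instruction-level parallelism, but it reorders the floating-point additions and can change the rounded result.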

Problem description

n/a.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

```bash
## In the case of SPIR-V
make BACKEND=spirv

tornado-test --printKernel --jvm="-Dgraph.mv.device=0:0" -V uk.ac.manchester.tornado.unittests.compute.ComputeTests#matrixVector
```

A section of the generated code is as follows:

```
   %B3_kernel0 = OpLabel
   %102 = OpPhi %float {%55 %B2_kernel0} {%101 %B4_kernel0}
   %104 = OpPhi %uint {%56 %B2_kernel0} {%103 %B4_kernel0}
   %105 = OpSLessThan %bool %104 %51
   OpLoopMerge %B5_kernel0 %B4_kernel0 Unroll          << Loop Unroll suggestion enabled
   OpBranchConditional %105 %B4_kernel0 %B5_kernel0
```
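The `Unroll` loop control on `OpLoopMerge` is a hint rather than a guarantee: the SPIR-V specification describes it as a strong request to unroll the loop to the extent possible, and the driver's compiler makes the final decision. The header above reads like a reduction loop, with a `float` phi (`%102`) acting as the accumulator, a `uint` phi (`%104`) as the induction variable, and `OpSLessThan` (`%105`) as the exit test. The plain-Java sketch below is a hypothetical stand-in, not the actual `matrixVector` test code: it shows a loop of that shape and what full unrolling would turn it into if the trip count were known to be 4 (an assumed value; full unrolling is only possible when the trip count is known at compile time).

```java
public class FullUnrollSketch {

    // Reduction loop shaped like the SPIR-V header above:
    // 'sum' plays the role of the float OpPhi, 'j' the uint OpPhi,
    // and 'j < n' the OpSLessThan exit condition.
    public static float rowDot(float[] row, float[] vector, int n) {
        float sum = 0.0f;
        for (int j = 0; j < n; j++) {
            sum += row[j] * vector[j];
        }
        return sum;
    }

    // The same reduction fully unrolled for a trip count fixed at 4:
    // no induction variable, no exit test and no back edge remain.
    public static float rowDotFullyUnrolled4(float[] row, float[] vector) {
        float sum = 0.0f;
        sum += row[0] * vector[0];
        sum += row[1] * vector[1];
        sum += row[2] * vector[2];
        sum += row[3] * vector[3];
        return sum;
    }

    public static void main(String[] args) {
        float[] row = {1f, 2f, 3f, 4f};
        float[] vector = {0.5f, 0.25f, 2f, 1f};
        System.out.println(rowDot(row, vector, 4) + " " + rowDotFullyUnrolled4(row, vector)); // prints "11.0 11.0"
    }
}
```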

Enabling explicit loop unroll:

```bash
tornado-test --printKernel --jvm="-Dgraph.mv.device=0:0 -Dtornado.experimental.partial.unroll=True" -V uk.ac.manchester.tornado.unittests.compute.ComputeTests#matrixVector
```
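The flag is passed to the JVM as a system property through `--jvm`. Purely to illustrate the default-off behaviour described above, a hypothetical helper could read such a property as follows; this is an assumed pattern, not TornadoVM's actual option-handling code, and the class and method names are invented. In this sketch, `Boolean.parseBoolean` ignores case, so `True` and `true` both enable the option, and an absent property means off.

```java
public class UnrollFlagSketch {

    // Property name taken from the test command above; the lookup logic is hypothetical.
    public static final String PARTIAL_UNROLL_PROPERTY = "tornado.experimental.partial.unroll";

    // Returns true only when the property equals "true" (case-insensitive);
    // when the property is not set, the default "false" applies.
    public static boolean isPartialUnrollEnabled() {
        return Boolean.parseBoolean(System.getProperty(PARTIAL_UNROLL_PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println("Partial loop unroll enabled: " + isPartialUnrollEnabled());
    }
}
```

Running `java -Dtornado.experimental.partial.unroll=True UnrollFlagSketch` prints `Partial loop unroll enabled: true`; without the property it prints `Partial loop unroll enabled: false`.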

Repeat the previous commands for each of the backends (OpenCL, PTX and SPIR-V).

@stratika (Collaborator) left a comment


On my system, with the flag enabled, I see the following reports. Just for reference:

  • OpenCL:
==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.foundation.TestIf#test06 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.codegen.CodeGenTest#test02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================
  • PTX:
==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================
  • SPIR-V:
==================================================
FAILED TESTS
==================================================
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext01 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.kernelcontext.matrices.TestMatrixMultiplicationKernelContext#mxm2DKernelContext02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallelWithComplexAccesses - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.arrays.TestNewArrays#testInitNewArrayInsideParallel - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.codegen.CodeGenTest#test02 - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testMandelbrot - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testEuler - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.compute.ComputeTests#testJuliaSets - [WHITELISTED]: YES
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test02 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test03 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test04 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#test05 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testVector01 - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasksMultipleCallees - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testSingleTask - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testMultipleTasks - [WHITELISTED]: NO
uk.ac.manchester.tornado.unittests.tasks.TestMultipleFunctions#testNoDoubleCompilation - [WHITELISTED]: NO

==================================================

@stratika (Collaborator) left a comment


LGTM

@jjfumero (Member, Author) commented:

> On my system, with the flag enabled, I see the following reports. Just for reference:

Yes, some tests fail when this flag is ON; this is expected. We will refine this in future iterations.

@mikepapadim (Member) left a comment


LGTM

@mairooni (Collaborator) left a comment


LGTM

@jjfumero merged commit 73d1dfb into beehive-lab:develop on Nov 12, 2024
2 checks passed
@jjfumero deleted the feat/spirv/unroll branch on November 12, 2024 10:24
@jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request on Dec 20, 2024:
Improvements
=============

- beehive-lab#573: Enhanced output of unit-tests with a summary of pass-rates and fail-rates.
- beehive-lab#576: Extended support for 3D matrices.
- beehive-lab#580: Extended debug information for execution plans.
- beehive-lab#584: Added helper menu for the ``tornado`` launcher script when no arguments are passed.
- beehive-lab#589: Enable partial loop unrolling for all backends.
- beehive-lab#594: Added RISC-V 64 CPU port support to run OpenCL with vector instructions RVV 1.0 (using the Codeplay OCK Toolkit).
- beehive-lab#598: OpenCL low-level buffers tagged as read, write and read/write based on the data dependency analysis.
- beehive-lab#601: Feature to select an immutable task graph to execute from a multi-task graph execution plan.

Compatibility
=============

- beehive-lab#570: Extended the timeout for the whole suite of unit-tests.
- beehive-lab#579: Removed legacy JDK 8 and JDK 11 build options from the TornadoVM installer.
- beehive-lab#582: Restored tornado runner scripts for IntelliJ.
- beehive-lab#583: Automatic generation of IntelliJ IDE runner configuration files from the TornadoVM command.
- beehive-lab#597: Updated the white-list of unit-tests and improved checkstyle.

Bug Fixes
=============

- beehive-lab#571: Fix issues with closing brackets for if/loop conditions.
- beehive-lab#572: Fix for printing default execution plans (execution plans with default parameters).
- beehive-lab#575: Fix the Level Zero version used for building the SPIR-V backend.
- beehive-lab#577: Fix checkstyle.
- beehive-lab#587: Fix thread scheduler for new NVIDIA Drivers.
- beehive-lab#592: Fix ``Float.POSITIVE_INFINITY`` and ``Float.NEGATIVE_INFINITY`` constants for the OpenCL, CUDA and SPIR-V backends.
- beehive-lab#596: Fix extra closing bracket during the code-generation for the FPGAs.
- Remove the intermediate CUDA pinned memory regions in the JNI code: [link](beehive-lab@9c3f8ce)
- Fix bitwise negation operations for the PTX backend: [link](beehive-lab@0db1cd3)
- Make `GetBackendImpl::getAllDevices` thread-safe: [link](beehive-lab@0d44252)
- Check size elements for memory segments: [link](beehive-lab@4360385)
@jjfumero mentioned this pull request on Dec 20, 2024