Fix batch computation when the global index is written in the output buffer #400
Conversation
public TornadoHighTierContext(Providers providers, PhaseSuite<HighTierContext> graphBuilderSuite, OptimisticOptimizations optimisticOpts, ResolvedJavaMethod method, Object[] args,
-        TaskMetaData meta, boolean isKernel, long batchThreads) {
+        TaskMetaData meta, boolean isKernel, long batchThreads, int batchNumber, long batchSize) {
Can we have a BatchConfig class, or maybe a record, to pass around long batchThreads, int batchNumber, long batchSize?
I agree with @mikepapadim. Let's create a class called BatchCompilationConfig and store all the related fields there. Then, what we pass around is an object of type BatchCompilationConfig.
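A minimal sketch of what such a record could look like (hypothetical: the field set simply mirrors the three constructor parameters quoted above; the actual class added in the PR may hold additional state):

```java
// Hypothetical sketch of the suggested BatchCompilationConfig record.
// Field names mirror the parameters discussed above (batchThreads,
// batchNumber, batchSize); the real class in the PR may differ.
public record BatchCompilationConfig(long batchThreads, int batchNumber, long batchSize) {
}
```

With this in place, the constructor above could take a single BatchCompilationConfig parameter instead of three separate batch-related arguments.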
    return indexInWrite;
}

public void indexInWrite() {
- public void indexInWrite() {
+ public void getIndexInWrite() {
If you make this change, you will need to refactor all dependencies.
@@ -180,7 +180,11 @@ private static Sketch buildSketch(ResolvedJavaMethod resolvedMethod, Providers p
    mergeAccesses(methodAccesses, invoke.callTarget(), sketch.getArgumentsAccess());
});

- return new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
+ Sketch sketch = new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
Pass the indexInWrite to the Sketch constructor. Also, I think the name should be changed. This is more like batchWriteThreadIndex, and the accessor methods should have get and set as prefixes.
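A hypothetical illustration of the naming suggested above (the real Sketch class in TornadoVM carries much more state; only the field and accessors under discussion are shown):

```java
// Illustrative only: the renamed field with get/set-prefixed accessors,
// as suggested in the review. Not the actual TornadoVM Sketch class.
public class Sketch {
    private int batchWriteThreadIndex;

    public Sketch(int batchWriteThreadIndex) {
        this.batchWriteThreadIndex = batchWriteThreadIndex;
    }

    public int getBatchWriteThreadIndex() {
        return batchWriteThreadIndex;
    }

    public void setBatchWriteThreadIndex(int batchWriteThreadIndex) {
        this.batchWriteThreadIndex = batchWriteThreadIndex;
    }
}
```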
}

@Override
public void setBatchSize(long batchSize) {
What is the difference between batchSize and batchThreads?
void setBatchThreads(long batchThreads);
void setBatchSize(long batchSize);
The value of batchThreads changes when the chunks are not even, while batchSize is always the initial batch size. For example, if we have 3 batches of 60000, 60000 and 20000 elements, the batchSize will be 60000 throughout the computation, even though the batch threads will change from 60000 to 20000 when computing the final chunk.
All batch tests pass for all supported backends (OpenCL, SPIR-V and PTX).
What happens when you have an expression that uses the thread-ID to store a value? For example:
for (@Parallel int i = 0; i < data.getSize(); i++) {
    data.set(i, i * 20 + beta);
}
This case also works; I just included a unit test to check it. To test it, run
All comments have been applied. This PR is ready for another review.
...do-runtime/src/main/java/uk/ac/manchester/tornado/runtime/common/BatchCompilationConfig.java
Missing license header
Done
@@ -0,0 +1,74 @@
/*
 * Copyright (c) 2024 APT Group, Department of Computer Science,
- * Copyright (c) 2024 APT Group, Department of Computer Science,
+ * Copyright (c) 2024, APT Group, Department of Computer Science,
@@ -0,0 +1,107 @@
/*
 * Copyright (c) 2024 APT Group, Department of Computer Science,
- * Copyright (c) 2024 APT Group, Department of Computer Science,
+ * Copyright (c) 2024, APT Group, Department of Computer Science,
We forgot the comma after the year. Please amend all files that have new headers.
  appendPhase(new TornadoTaskSpecialisation(canonicalizer));
+ appendPhase(new TornadoBatchGlobalIndexOffset());
You mentioned that this phase is batch-specific. However, it is also invoked when batch processing is not used, right?
...runtime/src/main/java/uk/ac/manchester/tornado/runtime/graal/compiler/TornadoSketchTier.java
@stratika, anything pending from your side? Can we merge this PR?
From my side, the suggestion for the header parts of the new files and some comments that have not been answered.
I added the commas in the headers.
Improvements
~~~~~~~~~~~~~~~~~~

- beehive-lab#402 <beehive-lab#402>: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403 <beehive-lab#403>: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405 <beehive-lab#405>: Disable Loop-Interchange for CPU offloading.
- beehive-lab#407 <beehive-lab#407>: Debugging of OpenCL kernel builds improved.
- beehive-lab#410 <beehive-lab#410>: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418 <beehive-lab#418>: TornadoOptions and TornadoLogger improved.
- beehive-lab#423 <beehive-lab#423>: MxM using ns instead of ms to report performance.
- beehive-lab#425 <beehive-lab#425>: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429 <beehive-lab#429>: Documentation of the installation process updated and improved.
- beehive-lab#432 <beehive-lab#432>: Support for SPIR-V code generation and dispatch using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- beehive-lab#409 <beehive-lab#409>: Guidelines to build the documentation.
- beehive-lab#411 <beehive-lab#411>: Windows installer improved.
- beehive-lab#412 <beehive-lab#412>: Python installer improved to check and download all Python dependencies before running the main installer.
- beehive-lab#413 <beehive-lab#413>: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424 <beehive-lab#424>: Use the generic GPU scheduler for some older NVIDIA drivers with the OpenCL runtime.
- beehive-lab#430 <beehive-lab#430>: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- beehive-lab#400 <beehive-lab#400>: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414 <beehive-lab#414>: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415 <beehive-lab#415>: Check-style errors fixed.
- beehive-lab#416 <beehive-lab#416>: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417 <beehive-lab#417>: Lazy copy-out fixed for Java fields.
- beehive-lab#420 <beehive-lab#420>: Fix Mandelbrot example.
- beehive-lab#421 <beehive-lab#421>: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422 <beehive-lab#422>: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426 <beehive-lab#426>: Fix Logger for all backends.
- beehive-lab#428 <beehive-lab#428>: Math cos/sin operations supported for vector types.
- beehive-lab#431 <beehive-lab#431>: Jenkins files fixed.
Description
This PR provides a fix for a corner case in batch processing, which was showcased by the uk.ac.manchester.tornado.unittests.batches.TestBatches.testBatchNotEven unit test. The issue occurs when the loop index is directly written in the output buffer, e.g.:

for (@Parallel int i = 0; i < data.getSize(); i++) {
    data.set(i, i);
}

Since, in batch processing, the loop bound of each kernel is equal to the batch size, the i written in the output buffer is correct only for the first batch. To solve this, the value of i is offset based on the number of the batch. For instance, the generated code for the example shown above is changed as follows:
data.set(i, i) -> data.set(i, i + batchNumber * batchSize)
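The effect of the transformation can be illustrated in plain Java (a sketch, not the actual generated kernel; the array stands in for the output buffer and the sizes are made up):

```java
// Illustrative: each "kernel launch" loops only over one batch
// (i = 0 .. batchSize-1), so the stored value must be offset by
// batchNumber * batchSize to recover the global index.
public class BatchOffsetDemo {
    public static void main(String[] args) {
        int batchSize = 60_000;
        int[] data = new int[120_000];
        for (int batchNumber = 0; batchNumber < 2; batchNumber++) {
            for (int i = 0; i < batchSize; i++) {
                // data.set(i, i) after the fix:
                data[i + batchNumber * batchSize] = i + batchNumber * batchSize;
            }
        }
        // Without the offset, the second batch would wrongly store 0..59999 again,
        // e.g. data[70_000] would hold 10_000 instead of 70_000.
        System.out.println(data[70_000]); // prints 70000
    }
}
```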
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
Run
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches