Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix batch computation when the global index is written in the output buffer #400

Merged
merged 16 commits into from
May 13, 2024

Conversation

mairooni
Copy link
Collaborator

Description

This PR provides a fix for a corner case in batch processing, which was showcased by the uk.ac.manchester.tornado.unittests.batches.TestBatches.testBatchNotEven unittest. The issue occurs when the loop index is directly written in the output buffer, e.g.:

for (@Parallel int i = 0; i < data.getSize(); i++) {
      data.set(i, i);
}

Since, in batch processing, the loop bound of each kernel is equal to the batch size, the i written in the output buffer is correct only for the first batch. To solve this, the value of i is offseted based on the number of the batch.
For instance, the generated code for the example shown above is changed as follows:
data.set(i, i) -> data.set(i, i + batchNumber * BatchSize)

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

Run tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches


public TornadoHighTierContext(Providers providers, PhaseSuite<HighTierContext> graphBuilderSuite, OptimisticOptimizations optimisticOpts, ResolvedJavaMethod method, Object[] args,
TaskMetaData meta, boolean isKernel, long batchThreads) {
TaskMetaData meta, boolean isKernel, long batchThreads, int batchNumber, long batchSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a BatchConfig class or maybe a record to pass around long batchThreads, int batchNumber, long batchSize?

Copy link
Member

@jjfumero jjfumero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mikepapadim . Let's create a class called BatchCompilationConfig and store all the related fields here. Then, what we pass around in is an object of type BatchCompilationConfig.

return indexInWrite;
}

public void indexInWrite() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void indexInWrite() {
public void getIndexInWrite() {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do this change, you need to refactor all dependencies.

@@ -180,7 +180,11 @@ private static Sketch buildSketch(ResolvedJavaMethod resolvedMethod, Providers p
mergeAccesses(methodAccesses, invoke.callTarget(), sketch.getArgumentsAccess());
});

return new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
Sketch sketch = new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass the indexInWrite to the Sketch constructor. Also, I think the name should be changed. This is more like batchWriteThreadIndex and the method for get and set should have get and set as prefix.

}

@Override
public void setBatchSize(long batchSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between batchSIze and batchThread

void setBatchThreads(long batchThreads);

void setBatchSize(long batchSize);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of batchThreads changes when the chunks are not even, the batchSize is always the initial batch size. For example, if we have 3 batches, 60000, 60000 and 20000, the batchSize will be 60000 throughout the computation, even though the batch threads will change from 60000 to 20000 when computing the final chunk.

@jjfumero
Copy link
Member

jjfumero commented May 2, 2024

All batch tests passing for all supported backends (OpenCL, SPIR-V and PTX).

@jjfumero
Copy link
Member

jjfumero commented May 2, 2024

What happens when you have an expression that uses the thread-ID to store a value?

For example:

for (@Parallel int i = 0; i < data.getSize(); i++) {
      data.set(i, i * 20 + beta);
}

@mairooni
Copy link
Collaborator Author

mairooni commented May 9, 2024

What happens when you have an expression that uses the thread-ID to store a value?

For example:

for (@Parallel int i = 0; i < data.getSize(); i++) {
      data.set(i, i * 20 + beta);
}

This case also works, I just included a unittest to check. To test it, run
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches#testBatchThreadIndex

@mairooni
Copy link
Collaborator Author

mairooni commented May 9, 2024

All comments have been applied. This PR is ready for another review.

Copy link
Member

@jjfumero jjfumero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header

@jjfumero jjfumero requested a review from stratika May 10, 2024 08:24
@mairooni
Copy link
Collaborator Author

Missing license header

Done

@@ -0,0 +1,74 @@
/*
* Copyright (c) 2024 APT Group, Department of Computer Science,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright (c) 2024 APT Group, Department of Computer Science,
* Copyright (c) 2024, APT Group, Department of Computer Science,

@@ -0,0 +1,107 @@
/*
* Copyright (c) 2024 APT Group, Department of Computer Science,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright (c) 2024 APT Group, Department of Computer Science,
* Copyright (c) 2024, APT Group, Department of Computer Science,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We forgot the comma after the year. Please append all files that have new headers.

appendPhase(new TornadoTaskSpecialisation(canonicalizer));
appendPhase(new TornadoBatchGlobalIndexOffset());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned that this phase is batch-specific. However, it is invoked also when batch processing is not used. Right?

@jjfumero
Copy link
Member

@stratika , anything pending from your side? Can we merge this PR?

@stratika
Copy link
Collaborator

From my side, the suggestion for the header parts of the new files and some comments that have not been answered.

@mairooni
Copy link
Collaborator Author

I added the commas in the headers.

@jjfumero jjfumero merged commit 772905a into beehive-lab:develop May 13, 2024
1 check passed
@mairooni mairooni deleted the fix/batch_index branch May 13, 2024 08:35
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request May 28, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- beehive-lab#402 <beehive-lab#402>: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403 <beehive-lab#403>: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405 <beehive-lab#405>: Disable Loop-Interchange for CPU offloading..
- beehive-lab#407 <beehive-lab#407>: Debugging OpenCL Kernels builds improved.
- beehive-lab#410 <beehive-lab#410>: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418 <beehive-lab#418>: TornadoOptions and TornadoLogger improved.
- beehive-lab#423 <beehive-lab#423>: MxM using ns instead of ms to report performance.
- beehive-lab#425 <beehive-lab#425>: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429 <beehive-lab#429>: Documentation of the installation process updated and improved.
- beehive-lab#432 <beehive-lab#432>: Support for SPIR-V code generation and dispatcher using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- beehive-lab#409 <beehive-lab#409>: Guidelines to build the documentation.
- beehive-lab#411 <beehive-lab#411>: Windows installer improved.
- beehive-lab#412 <beehive-lab#412>: Python installer improved to check download all Python dependencies before the main installer.
- beehive-lab#413 <beehive-lab#413>: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424 <beehive-lab#424>: Use Generic GPU Scheduler for some older NVIDIA Drivers for the OpenCL runtime.
- beehive-lab#430 <beehive-lab#430>: Improved the installer by checking  that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- beehive-lab#400 <beehive-lab#400>: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414 <beehive-lab#414>: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415 <beehive-lab#415>: Check style errors fixed.
- beehive-lab#416 <beehive-lab#416>: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417 <beehive-lab#417>: Lazy-copy out fixed for Java fields.
- beehive-lab#420 <beehive-lab#420>: Fix Mandelbrot example.
- beehive-lab#421 <beehive-lab#421>: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422 <beehive-lab#422>: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426 <beehive-lab#426>: Fix Logger for all backends.
- beehive-lab#428 <beehive-lab#428>: Math cos/sin operations supported for vector types.
- beehive-lab#431 <beehive-lab#431>: Jenkins files fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

4 participants