Fix batch computation when the global index is written in the output buffer #400
Conversation
public TornadoHighTierContext(Providers providers, PhaseSuite<HighTierContext> graphBuilderSuite, OptimisticOptimizations optimisticOpts, ResolvedJavaMethod method, Object[] args,
-        TaskMetaData meta, boolean isKernel, long batchThreads) {
+        TaskMetaData meta, boolean isKernel, long batchThreads, int batchNumber, long batchSize) {
Can we have a BatchConfig class, or maybe a record, to pass around long batchThreads, int batchNumber, long batchSize?
I agree with @mikepapadim. Let's create a class called BatchCompilationConfig and store all the related fields there. Then, what we pass around is an object of type BatchCompilationConfig.
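A minimal sketch of what such a record could look like (hypothetical: the field set simply mirrors the three constructor parameters quoted above; the actual class added in the PR may hold additional state):

```java
// Hypothetical sketch of the suggested BatchCompilationConfig record.
// Field names mirror the parameters discussed above (batchThreads,
// batchNumber, batchSize); the real class in the PR may differ.
public record BatchCompilationConfig(long batchThreads, int batchNumber, long batchSize) {
}
```

With this in place, the constructor above could take a single BatchCompilationConfig parameter instead of three separate batch-related arguments.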
    return indexInWrite;
}

public void indexInWrite() {
- public void indexInWrite() {
+ public void getIndexInWrite() {
If you make this change, you will need to refactor all dependencies.
@@ -180,7 +180,11 @@ private static Sketch buildSketch(ResolvedJavaMethod resolvedMethod, Providers p
    mergeAccesses(methodAccesses, invoke.callTarget(), sketch.getArgumentsAccess());
});

- return new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
+ Sketch sketch = new Sketch(graph.copy(TornadoCoreRuntime.getDebugContext()), methodAccesses);
Pass the indexInWrite to the Sketch constructor. Also, I think the name should be changed. This is more like batchWriteThreadIndex, and the accessor methods should have get and set as prefixes.
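A hypothetical illustration of the naming suggested above (the real Sketch class in TornadoVM carries much more state; only the field and accessors under discussion are shown):

```java
// Illustrative only: the renamed field with get/set-prefixed accessors,
// as suggested in the review. Not the actual TornadoVM Sketch class.
public class Sketch {
    private int batchWriteThreadIndex;

    public Sketch(int batchWriteThreadIndex) {
        this.batchWriteThreadIndex = batchWriteThreadIndex;
    }

    public int getBatchWriteThreadIndex() {
        return batchWriteThreadIndex;
    }

    public void setBatchWriteThreadIndex(int batchWriteThreadIndex) {
        this.batchWriteThreadIndex = batchWriteThreadIndex;
    }
}
```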
}

@Override
public void setBatchSize(long batchSize) {
What is the difference between batchSize and batchThreads?
void setBatchThreads(long batchThreads);
void setBatchSize(long batchSize);
The value of batchThreads changes when the chunks are not even, while batchSize is always the initial batch size. For example, if we have 3 batches of 60000, 60000 and 20000 elements, the batchSize will be 60000 throughout the computation, even though the batch threads will change from 60000 to 20000 when computing the final chunk.
All batch tests pass for all supported backends (OpenCL, SPIR-V and PTX).
What happens when you have an expression that uses the thread-ID to store a value? For example:
for (@Parallel int i = 0; i < data.getSize(); i++) {
    data.set(i, i * 20 + beta);
}
This case also works; I just included a unit test to check it. To test it, run
All comments have been applied. This PR is ready for another review.
...do-runtime/src/main/java/uk/ac/manchester/tornado/runtime/common/BatchCompilationConfig.java
Missing license header
Done
@@ -0,0 +1,74 @@
/*
 * Copyright (c) 2024 APT Group, Department of Computer Science,
- * Copyright (c) 2024 APT Group, Department of Computer Science,
+ * Copyright (c) 2024, APT Group, Department of Computer Science,
@@ -0,0 +1,107 @@
/*
 * Copyright (c) 2024 APT Group, Department of Computer Science,
- * Copyright (c) 2024 APT Group, Department of Computer Science,
+ * Copyright (c) 2024, APT Group, Department of Computer Science,
We forgot the comma after the year. Please amend all files that have new headers.
  appendPhase(new TornadoTaskSpecialisation(canonicalizer));
+ appendPhase(new TornadoBatchGlobalIndexOffset());
You mentioned that this phase is batch-specific. However, it is also invoked when batch processing is not used, right?
...runtime/src/main/java/uk/ac/manchester/tornado/runtime/graal/compiler/TornadoSketchTier.java
@stratika, anything pending from your side? Can we merge this PR?
From my side, the suggestion for the header parts of the new files and some comments that have not been answered.
I added the commas in the headers.
Improvements
~~~~~~~~~~~~~~~~~~

- beehive-lab#402 <beehive-lab#402>: Support for TornadoNativeArrays from FFI buffers.
- beehive-lab#403 <beehive-lab#403>: Clean-up and refactoring for the code analysis of the loop-interchange.
- beehive-lab#405 <beehive-lab#405>: Disable Loop-Interchange for CPU offloading.
- beehive-lab#407 <beehive-lab#407>: Debugging of OpenCL kernel builds improved.
- beehive-lab#410 <beehive-lab#410>: CPU block scheduler disabled by default and option to switch between different thread-schedulers added.
- beehive-lab#418 <beehive-lab#418>: TornadoOptions and TornadoLogger improved.
- beehive-lab#423 <beehive-lab#423>: MxM using ns instead of ms to report performance.
- beehive-lab#425 <beehive-lab#425>: Vector types for ``Float<Width>`` and ``Int<Width>`` supported.
- beehive-lab#429 <beehive-lab#429>: Documentation of the installation process updated and improved.
- beehive-lab#432 <beehive-lab#432>: Support for SPIR-V code generation and dispatch using the TornadoVM OpenCL runtime.

Compatibility
~~~~~~~~~~~~~~~~~~

- beehive-lab#409 <beehive-lab#409>: Guidelines to build the documentation.
- beehive-lab#411 <beehive-lab#411>: Windows installer improved.
- beehive-lab#412 <beehive-lab#412>: Python installer improved to check and download all Python dependencies before running the main installer.
- beehive-lab#413 <beehive-lab#413>: Improved documentation for installing all configurations of backends and OS.
- beehive-lab#424 <beehive-lab#424>: Use the generic GPU scheduler for some older NVIDIA drivers with the OpenCL runtime.
- beehive-lab#430 <beehive-lab#430>: Improved the installer by checking that the TornadoVM environment is loaded upfront.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- beehive-lab#400 <beehive-lab#400>: Fix batch computation when the global thread indexes are used to compute the outputs.
- beehive-lab#414 <beehive-lab#414>: Recover Test-Field unit-tests using Panama types.
- beehive-lab#415 <beehive-lab#415>: Check-style errors fixed.
- beehive-lab#416 <beehive-lab#416>: FPGA execution with multiple tasks in a task-graph fixed.
- beehive-lab#417 <beehive-lab#417>: Lazy copy-out fixed for Java fields.
- beehive-lab#420 <beehive-lab#420>: Fix Mandelbrot example.
- beehive-lab#421 <beehive-lab#421>: OpenCL 2D thread-scheduler fixed for NVIDIA GPUs.
- beehive-lab#422 <beehive-lab#422>: Compilation for NVIDIA Jetson Nano fixed.
- beehive-lab#426 <beehive-lab#426>: Fix Logger for all backends.
- beehive-lab#428 <beehive-lab#428>: Math cos/sin operations supported for vector types.
- beehive-lab#431 <beehive-lab#431>: Jenkins files fixed.
Description
This PR provides a fix for a corner case in batch processing, which was showcased by the uk.ac.manchester.tornado.unittests.batches.TestBatches.testBatchNotEven unit test. The issue occurs when the loop index is directly written in the output buffer, e.g.:

for (@Parallel int i = 0; i < data.getSize(); i++) {
    data.set(i, i);
}

Since, in batch processing, the loop bound of each kernel is equal to the batch size, the i written in the output buffer is correct only for the first batch. To solve this, the value of i is offset based on the number of the batch. For instance, the generated code for the example shown above is changed as follows:
data.set(i, i) -> data.set(i, i + batchNumber * batchSize)
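The effect of the transformation can be illustrated in plain Java (a sketch, not the actual generated kernel; the array stands in for the output buffer and the sizes are made up):

```java
// Illustrative: each "kernel launch" loops only over one batch
// (i = 0 .. batchSize-1), so the stored value must be offset by
// batchNumber * batchSize to recover the global index.
public class BatchOffsetDemo {
    public static void main(String[] args) {
        int batchSize = 60_000;
        int[] data = new int[120_000];
        for (int batchNumber = 0; batchNumber < 2; batchNumber++) {
            for (int i = 0; i < batchSize; i++) {
                // data.set(i, i) after the fix:
                data[i + batchNumber * batchSize] = i + batchNumber * batchSize;
            }
        }
        // Without the offset, the second batch would wrongly store 0..59999 again,
        // e.g. data[70_000] would hold 10_000 instead of 70_000.
        System.out.println(data[70_000]); // prints 70000
    }
}
```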
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
Run
tornado-test -V --fast uk.ac.manchester.tornado.unittests.batches.TestBatches