Enable Under Demand Data Transfers from Device to Host #305

Merged
merged 4 commits into from
Jan 11, 2024

Conversation


@jjfumero jjfumero commented Jan 10, 2024

Description

This PR enables under-demand sub-copies from the device to the host.
This is useful for iterative algorithms in which we do not have to copy all data back and forth between
iterations. Rather, we can keep the data on the device and perform the transfer under demand.

Under-demand transfers of whole arrays have been available since version 0.15 of TornadoVM. This PR extends that functionality by allowing developers to copy sub-regions under demand.

There is a new method in the TornadoExecutionResult to perform partial copies.
The following code snippet shows an example:

// Build a Task Graph and declare the copy out (transfer to host) as an UNDER_DEMAND copy
taskGraph.transferToDevice(DataTransferMode.FIRST_EXECUTION, data) //
                .task("t0", TestArrays::addAccumulator, data, 1) //
                .transferToHost(DataTransferMode.UNDER_DEMAND, data);

// Create an immutable task graph
ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();

// Create an execution plan
TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph);

// Run the execution plan and obtain the execution result.
TornadoExecutionResult executionResult = executionPlan.execute();

// From the Execution Result, we can build a DataRange and specify the range to copy out
DataRange dataRange = new DataRange(data);


// Copy under demand from offset 0 (0 is the default if not specified) until size = N/2.
executionResult.transferToHost(dataRange.withSize(N / 2));

// Perform a second copy from offset = N/2 with size = N/2. 
executionResult.transferToHost(dataRange.withOffset(N / 2).withSize(N / 2));

Using the data range, developers specify the offset and the size to copy.
Both of these parameters are optional (withOffset(x) and withSize(y)).

  • If the offset is not specified, the TornadoVM runtime selects offset 0.
  • If the size is not specified, the TornadoVM runtime copies from the selected (or default) offset until the end of the array.
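
The defaulting rules above can be illustrated with a small, self-contained model. This is only a sketch: `RangeSketch` and `copyRange` are hypothetical names made up for illustration, they are not the TornadoVM `DataRange` class or its runtime code, and two plain Java arrays stand in for the device and host buffers.

```java
/**
 * Hypothetical model of an offset/size range. NOT TornadoVM's DataRange;
 * it only mirrors the defaulting rules described in this PR.
 */
final class RangeSketch {
    private final int length; // length of the array the range refers to
    private int offset = 0;   // default offset when withOffset is never called
    private int size = -1;    // -1 marks "size not specified"

    RangeSketch(int length) {
        this.length = length;
    }

    RangeSketch withOffset(int offset) {
        this.offset = offset;
        return this;
    }

    RangeSketch withSize(int size) {
        this.size = size;
        return this;
    }

    int offset() {
        return offset;
    }

    // If no size was given, the range runs from the offset to the end of the array.
    int resolvedSize() {
        return (size < 0) ? length - offset : size;
    }

    // Simulates a partial device-to-host copy between two plain Java arrays.
    static void copyRange(int[] device, int[] host, RangeSketch range) {
        int off = range.offset();
        int n = range.resolvedSize();
        if (off < 0 || n < 0 || off + n > device.length || off + n > host.length) {
            throw new IllegalArgumentException("range out of bounds");
        }
        System.arraycopy(device, off, host, off, n);
    }

    public static void main(String[] args) {
        int[] device = {1, 2, 3, 4, 5, 6, 7, 8}; // stands in for device memory
        int[] host = new int[device.length];      // stands in for the host array

        // Size only: the offset defaults to 0, so the first half is copied.
        copyRange(device, host, new RangeSketch(device.length).withSize(4));
        System.out.println(java.util.Arrays.toString(host));

        // Offset only: the size defaults to "until the end of the array".
        copyRange(device, host, new RangeSketch(device.length).withOffset(4));
        System.out.println(java.util.Arrays.toString(host));
    }
}
```

In this model, the two calls in `main` together bring the whole array to the host, which mirrors the two `transferToHost` calls in the snippet from the description.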

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • macOS
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

$ make
$ tornado-test -V uk.ac.manchester.tornado.unittests.api.TestAPI#testLazyPartialCopyOut

@jjfumero jjfumero self-assigned this Jan 10, 2024
@jjfumero jjfumero added the enhancement, API, runtime, and feature labels Jan 10, 2024
@mairooni

As far as I understand, the UNDER_DEMAND mode can be used to perform a lazy D2H copy of the whole dataset or a subset of the dataset if a DataRange object is included in the transferToHost, right? Does the partial copy need to be lazy? If not, do you think - not right now, but at a later point - that it would make sense to decouple the two? To have the UNDER_DEMAND mode just to define lazy copies and to have, let's say, a PARTIAL mode or something similar to specify partial copies?

@jjfumero

As far as I understand, the UNDER_DEMAND mode can be used to perform a lazy D2H copy of the whole dataset or a subset of the dataset if a DataRange object is included in the transferToHost, right?

Yes, this is correct.

Does the partial copy need to be lazy? If not, do you think - not right now, but at a later point - that it would make sense to decouple the two? To have the UNDER_DEMAND mode just to define lazy copies and to have, let's say, a PARTIAL mode or something similar to specify partial copies?

This is an interesting point. In my view, if the user needs a partial copy, the way to do it is via DataRanges using the under-demand mode. So, by default, copy-in and copy-out operate on the whole data. Having said this, I think it would be nice to have a way to also specify custom DataRanges for copy-in in a task graph and, quite likely, this needs to be done at the ExecutionPlan level rather than in the ExecutionResult.


@mikepapadim mikepapadim left a comment


LGTM, thanks


@stratika stratika left a comment


LGTM

@jjfumero jjfumero merged commit a3cf412 into beehive-lab:develop Jan 11, 2024
2 checks passed
@jjfumero jjfumero deleted the feat/subregions branch January 11, 2024 15:33
jjfumero added a commit that referenced this pull request Jan 30, 2024
TornadoVM 1.0.1
----------------
30/01/2024

Improvements
~~~~~~~~~~~~~~~~~~

- `#305 <https://github.com/beehive-lab/TornadoVM/pull/305>`_: Under-demand data transfer for custom data ranges.
- `#305 <https://github.com/beehive-lab/TornadoVM/pull/305>`_: Copy out subregions using the execution plan:
- `#313 <https://github.com/beehive-lab/TornadoVM/pull/313>`_: Initial support for Half-Precision (FP16) data types.
- `#311 <https://github.com/beehive-lab/TornadoVM/pull/311>`_: Enable Multi-Task Multiple Device (MTMD) model from the ``TornadoExecutionPlan`` API:
- `#315 <https://github.com/beehive-lab/TornadoVM/pull/315>`_: Math ``Ceil`` function added

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `#294 <https://github.com/beehive-lab/TornadoVM/pull/294>`_: Separation of the OpenCL Headers from the code base.
- `#297 <https://github.com/beehive-lab/TornadoVM/pull/297>`_: Separation of the LevelZero JNI API in a separate repository.
- `#301 <https://github.com/beehive-lab/TornadoVM/pull/301>`_: Temurin configuration supported.
- `#304 <https://github.com/beehive-lab/TornadoVM/pull/304>`_: Refactor of the common phases for the JIT compiler.
- `#316 <https://github.com/beehive-lab/TornadoVM/pull/316>`_: Beehive SPIR-V Toolkit version updated.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#298 <https://github.com/beehive-lab/TornadoVM/pull/298>`_: OpenCL Codegen fixed open-close brackets.
- `#300 <https://github.com/beehive-lab/TornadoVM/pull/300>`_: Python Dependencies fixed for AWS
- `#308 <https://github.com/beehive-lab/TornadoVM/pull/308>`_: Runtime check for Grid-Scheduler names
- `#309 <https://github.com/beehive-lab/TornadoVM/pull/309>`_: Fix check-style to support STR templates
- `#314 <https://github.com/beehive-lab/TornadoVM/pull/314>`_: emit Vector16 Capability for 16-width vectors
jjfumero added a commit that referenced this pull request Jan 30, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- `#305 <https://github.com/beehive-lab/TornadoVM/pull/305>`_: Under-demand data transfer for custom data ranges.
- `#313 <https://github.com/beehive-lab/TornadoVM/pull/313>`_: Initial support for Half-Precision (FP16) data types.
- `#311 <https://github.com/beehive-lab/TornadoVM/pull/311>`_: Enable Multi-Task Multiple Device (MTMD) model from the ``TornadoExecutionPlan`` API.
- `#315 <https://github.com/beehive-lab/TornadoVM/pull/315>`_: Math ``Ceil`` function added.

Compatibility/Integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `#294 <https://github.com/beehive-lab/TornadoVM/pull/294>`_: Separation of the OpenCL Headers from the code base.
- `#297 <https://github.com/beehive-lab/TornadoVM/pull/297>`_: Separation of the LevelZero JNI API in a separate repository.
- `#301 <https://github.com/beehive-lab/TornadoVM/pull/301>`_: Temurin configuration supported.
- `#304 <https://github.com/beehive-lab/TornadoVM/pull/304>`_: Refactor of the common phases for the JIT compiler.
- `#316 <https://github.com/beehive-lab/TornadoVM/pull/316>`_: Beehive SPIR-V Toolkit version updated.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `#298 <https://github.com/beehive-lab/TornadoVM/pull/298>`_: OpenCL Codegen fixed open-close brackets.
- `#300 <https://github.com/beehive-lab/TornadoVM/pull/300>`_: Python Dependencies fixed for AWS.
- `#308 <https://github.com/beehive-lab/TornadoVM/pull/308>`_: Runtime check for Grid-Scheduler names.
- `#309 <https://github.com/beehive-lab/TornadoVM/pull/309>`_: Fix check-style to support STR templates.
- `#314 <https://github.com/beehive-lab/TornadoVM/pull/314>`_: emit Vector16 Capability for 16-width vectors.