Windows variant of Linux installer without MSys2 #356
Thank you @otabuzzman. This is awesome! I was planning to do something like this soon, so this is very timely. Give me a few days to check with my Windows PC and try all the instructions step by step.
Take your time, I'm in no hurry ;-) but glad to hear you find it useful.
When I first started, I used the cmd.exe tool. Later I realized that using
Python would have been better since it is necessary to run and test
TornadoVM interactively anyway.
I now think that customizing the original installer should be possible with
little effort and am considering giving it a try.
(Sent by email in reply to Juan Fumero's comment of Tue, 19 March 2024, 22:02.)
I will start with the dependencies and then switch to this main repo.
I could make it work. However, depending on the backend, I get errors. OpenCL:
python %TORNADO_SDK%\bin\tornado --threadInfo -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixMultiplication2D
[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on NVIDIA GeForce RTX 3070 (Device 0).
Single Threaded CPU Execution: 2.63 GFlops, Total time = 102 ms
Streams Execution: 16.78 GFlops, Total time = 16 ms
TornadoVM Execution on GPU (Accelerated): 268.44 GFlops, Total Time = 1 ms
Speedup: 102.0x
Verification false
But the same kernel, running with SPIR-V (Level Zero) and CUDA PTX, works fine.
It looks to me like a driver issue, but this test passes on Linux and macOS. OpenCL devices:
python %TORNADO_SDK%\bin\tornado --devices
WARNING: Using incubator modules: jdk.incubator.vector
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30
Number of Tornado drivers: 1
Driver: OpenCL
Total number of OpenCL devices : 4
Tornado device=0:0 (DEFAULT)
OPENCL -- [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
Global Memory Size: 8.0 GB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [1024]
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:1
OPENCL -- [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
Global Memory Size: 12.7 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [512]
Max WorkGroup Configuration: [512, 512, 512]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:2
OPENCL -- [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
Global Memory Size: 31.7 GB
Local Memory Size: 32.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [8192]
Max WorkGroup Configuration: [8192, 8192, 8192]
Device OpenCL C version: OpenCL C 3.0
Tornado device=0:3
OPENCL -- [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
Global Memory Size: 31.7 GB
Local Memory Size: 256.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [67108864]
Max WorkGroup Configuration: [67108864, 67108864, 67108864]
Device OpenCL C version: OpenCL C 1.2
[TornadoVM-OCL-JNI] ERROR : clReleaseContext -> Returned: -34
The errors seem to be related to the FPGA, which we need to access in emulation mode.
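Two of the clGetDeviceIDs return values above look like negative OpenCL error codes printed as unsigned 32-bit integers: 4294967295 = 2^32 - 1 would be -1 (CL_DEVICE_NOT_FOUND) and 4294967266 = 2^32 - 30 would be -30 (CL_INVALID_VALUE), the same code clCreateContext reports; -34 from clReleaseContext is CL_INVALID_CONTEXT. The Python sketch below decodes such values, assuming (not confirmed by the log) that the JNI layer prints cl_int results through an unsigned format. The helper names are illustrative.

```python
# Sketch: decode raw return values printed by the OpenCL JNI layer.
# Assumption: large positive values are negative cl_int codes that were
# printed as unsigned 32-bit integers (4294967295 == 2**32 - 1 -> -1).

# Small subset of error codes defined by the OpenCL specification (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -5: "CL_OUT_OF_RESOURCES",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
}


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed cl_int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def describe(raw: int) -> str:
    """Format a raw return value together with its OpenCL error name."""
    code = to_signed32(raw)
    return f"{raw} -> {code} ({OPENCL_ERRORS.get(code, 'unknown')})"


if __name__ == "__main__":
    # Values taken from the clGetDeviceIDs/clCreateContext/clReleaseContext lines.
    for raw in (4294967295, 4294967266, -30, -34):
        print(describe(raw))
```

Decoding the values this way only names the error; it does not say which platform (for example the FPGA emulation platform) produced it.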
@@ -144,7 +144,7 @@ def build_levelzero_jni_lib(rebuild=False):
[
"git",
"clone",
-"https://github.com/beehive-lab/levelzero-jni",
+"https://github.com/otabuzzman/levelzero-jni#winstall",
Just to keep a note: we should first merge the dependencies and then update this URL to the official repos.
Once we merge the develop branch of levelzero-jni into master, we can revert this link.
@@ -184,7 +184,7 @@ def build_spirv_toolkit_and_level_zero(rebuild=False):
[
"git",
"clone",
-"https://github.com/beehive-lab/beehive-spirv-toolkit.git",
+"https://github.com/otabuzzman/beehive-spirv-toolkit.git#winstall",
Same here
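Both review snippets clone a fork with a #winstall suffix on the URL. If that suffix is meant to name the branch to check out, one explicit way to express it is git clone's --branch option. The Python sketch below is illustrative only: the helper name clone_command and the url#branch convention are assumptions, not necessarily how bin/compile or the Windows installer interpret these URLs.

```python
from typing import List, Optional


def clone_command(spec: str, dest: Optional[str] = None) -> List[str]:
    """Build a git clone argument list from a hypothetical 'URL#branch' spec.

    A spec without '#' clones the remote's default branch.
    """
    url, sep, branch = spec.partition("#")
    cmd = ["git", "clone"]
    if sep and branch:
        # --branch makes git check out the named branch instead of the remote HEAD.
        cmd += ["--branch", branch]
    cmd.append(url)
    if dest:
        cmd.append(dest)
    return cmd
```

The resulting list could be passed to subprocess.run(clone_command(spec), check=True). Keeping the URL and branch apart also makes it a one-line change to switch back to the official beehive-lab repos after merging.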
Strange: with OpenCL and my setup, nothing works. It looks to me like a problem with my configuration:
python %TORNADO_SDK%\bin\tornado --devices
WARNING: Using incubator modules: jdk.incubator.vector
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30
Number of Tornado drivers: 1
Driver: OpenCL
Total number of OpenCL devices : 4
Tornado device=0:0 (DEFAULT)
OPENCL -- [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
Global Memory Size: 8.0 GB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [1024]
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:1
OPENCL -- [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
Global Memory Size: 12.7 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [512]
Max WorkGroup Configuration: [512, 512, 512]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:2
OPENCL -- [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
Global Memory Size: 31.7 GB
Local Memory Size: 32.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [8192]
Max WorkGroup Configuration: [8192, 8192, 8192]
Device OpenCL C version: OpenCL C 3.0
Tornado device=0:3
OPENCL -- [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
Global Memory Size: 31.7 GB
Local Memory Size: 256.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [67108864]
Max WorkGroup Configuration: [67108864, 67108864, 67108864]
Device OpenCL C version: OpenCL C 1.2
[TornadoVM-OCL-JNI] ERROR : clReleaseContext -> Returned: -34
C:\Users\jjfum\source\repos\TornadoVM>python %TORNADO_SDK%\bin\tornado-test
python C:/Users/jjfum/source/repos/TornadoVM/bin/sdk/bin/tornado --jvm "-Xmx6g -Dtornado.recover.bailout=False -Dtornado.unittests.verbose=False " -m tornado.unittests/uk.ac.manchester.tornado.unittests.tools.TornadoTestRunner --params "uk.ac.manchester.tornado.unittests.foundation.TestIntegers"
WARNING: Using incubator modules: jdk.incubator.vector
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967295
[TornadoVM-OCL-JNI] ERROR : clGetDeviceIDs -> Returned: 4294967266
[TornadoVM-OCL-JNI] ERROR : clCreateContext -> Returned: -30
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
Strange behavior, indeed. What oneAPI components are installed in your setup? In mine, there is only the Intel® CPU Runtime for OpenCL™ Applications with SYCL support. To make it work, the steps given on the webpage in the section Known Issues and Limitations had to be applied. What is that FPGA emulator? Can you switch it off?
In my case, I installed the oneAPI Base Toolkit, which includes FPGA emulation and other tools. I have also installed the Intel ARC GPU drivers, since I occasionally swap my 3070 for the ARC 750 for experiments, and this might be causing the problem.
I will dig in and investigate the problem, but it is good to know it works for you. I will also work with Thanos to try to reproduce this on a different machine.
Update:
> python %TORNADO_SDK%\bin\tornado --devices
Number of Tornado drivers: 1
Driver: OpenCL
Total number of OpenCL devices : 4
Tornado device=0:0 (DEFAULT)
OPENCL -- [NVIDIA CUDA] -- NVIDIA GeForce RTX 3070
Global Memory Size: 8.0 GB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [1024]
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:1
OPENCL -- [Intel(R) OpenCL Graphics] -- Intel(R) UHD Graphics 770
Global Memory Size: 12.7 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [512]
Max WorkGroup Configuration: [512, 512, 512]
Device OpenCL C version: OpenCL C 1.2
Tornado device=0:2
OPENCL -- [Intel(R) OpenCL] -- 12th Gen Intel(R) Core(TM) i7-12700K
Global Memory Size: 31.7 GB
Local Memory Size: 32.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [8192]
Max WorkGroup Configuration: [8192, 8192, 8192]
Device OpenCL C version: OpenCL C 3.0
Tornado device=0:3
OPENCL -- [Intel(R) FPGA Emulation Platform for OpenCL(TM)] -- Intel(R) FPGA Emulation Device
Global Memory Size: 31.7 GB
Local Memory Size: 256.0 KB
Workgroup Dimensions: 3
Total Number of Block Threads: [67108864]
Max WorkGroup Configuration: [67108864, 67108864, 67108864]
Device OpenCL C version: OpenCL C 1.2
> python %TORNADO_SDK%\bin\tornado-test -V
Test: class uk.ac.manchester.tornado.unittests.foundation.TestIntegers
Running test: test01 ................ [PASS]
Running test: test03 ................ [PASS]
Running test: test04 ................ [PASS]
Running test: test05 ................ [PASS]
Running test: test06 ................ [PASS]
Running test: test07 ................ [PASS]
Running test: test02 ................ [PASS]
Test: class uk.ac.manchester.tornado.unittests.foundation.TestFloats
Running test: testFloatsCopy ................ [PASS]
Running test: testVectorFloatMul ................ [PASS]
Running test: testVectorFloatDiv ................ [PASS]
Running test: testVectorFloatAdd ................ [PASS]
Running test: testVectorFloatSub ................ [PASS]
Test: class uk.ac.manchester.tornado.unittests.foundation.TestDoubles
Running test: testDoublesMul ................ [PASS]
Running test: testDoublesCopy ................ [PASS]
Running test: testDoublesAdd ................ [PASS]
Running test: testDoublesDiv ................ [PASS]
Running test: testDoublesSub ................ [PASS]
...
Test: class uk.ac.manchester.tornado.unittests.compute.ComputeTests
Running test: testNBodyBigNoWorker ................ [PASS]
Running test: testBlackScholes ................ [PASS]
Running test: testHilbert ................ [PASS]
Running test: testNBodySmall ................ [PASS]
Running test: testDFTVectorTypes ................ [PASS]
Running test: matrixVector ................ [PASS]
Running test: testDFTFloat ................ [PASS]
Running test: testRenderTrack ................ [PASS]
Running test: testDFTDouble ................ [PASS]
Running test: testMandelbrot ................ [FAILED]
\_[REASON] expected:<8> but was:<9>
Running test: testMontecarlo ................ [PASS]
Running test: matrixVectorFloat4 ................ [PASS]
Running test: testJuliaSets ................ [FAILED]
\_[REASON] expected:<-1000.0> but was:<1.5197569>
Running test: testNBody ................ [PASS]
Running test: testEuler ................ [PASS]
...
==================================================
Unit tests report
==================================================
{'[PASS]': 579, '[FAILED]': 16, '[UNSUPPORTED]': 22}
Coverage [PASS/(PASS+FAIL)]: 97.31%
Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 93.84%
==================================================
....
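The two coverage percentages in the report follow directly from its counts: 579 / (579 + 16) for the first and 579 / (579 + 16 + 22) for the second. A short Python check of that arithmetic, using only the numbers printed above (the function name coverage is illustrative):

```python
# Counts copied from the unit-test report above.
report = {"[PASS]": 579, "[FAILED]": 16, "[UNSUPPORTED]": 22}


def coverage(counts: dict) -> tuple:
    """Return (PASS/(PASS+FAIL), PASS/(PASS+FAIL+UNSUPPORTED)) in percent, 2 decimals."""
    passed = counts["[PASS]"]
    failed = counts["[FAILED]"]
    unsupported = counts["[UNSUPPORTED]"]
    excluding_unsupported = 100.0 * passed / (passed + failed)
    including_unsupported = 100.0 * passed / (passed + failed + unsupported)
    return round(excluding_unsupported, 2), round(including_unsupported, 2)


print(coverage(report))  # (97.31, 93.84)
```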
Based on the previous test, I now lean towards an OpenCL misconfiguration on my Windows 11 machine.
Ok. My only concern is that, as it is, it kind of branches away from the style we have for Linux and macOS. To simplify merging and review, my suggestion is that, for this iteration of the code, we move on with this CMD tool, and you can open a second PR with the Python migration if you want. Is this something you would like to try?
More updates regarding NVIDIA OpenCL support on Windows 11:
I am running out of ideas, but at least we know it is not caused by installing oneAPI + ARC drivers.
Ok, I think I got it. The error is reported by the driver and captured in our JNI code that dispatches OpenCL kernels:
[TornadoVM-OCL-JNI] ERROR : clEnqueueNDRangeKernel -> Returned: -5
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> CL_OUT_OF_RESOURCES error executing CL_COMMAND_NDRANGE_KERNEL on NVIDIA GeForce RTX 3070 (Device 0).
This mainly suggests an issue with the block size. Since I noticed that smaller block sizes execute correctly with OpenCL, I modified the Matrix Multiplication example in TornadoVM as follows:
TaskGraph taskGraph = new TaskGraph("s0") //
.transferToDevice(DataTransferMode.FIRST_EXECUTION, matrixA, matrixB) //
.task("t0", MatrixMultiplication2D::matrixMultiplication, matrixA, matrixB, matrixC, size) //
.transferToHost(DataTransferMode.EVERY_EXECUTION, matrixC);
ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);
WorkerGrid workerGrid = new WorkerGrid2D(matrixA.getNumRows(), matrixA.getNumColumns());
GridScheduler gridScheduler = new GridScheduler("s0.t0", workerGrid);
workerGrid.setLocalWork(16, 16, 1);
executor.withGridScheduler(gridScheduler).withWarmUp();
Diff:
diff --git a/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java b/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
index 0426e2dbb..a28ed57c6 100644
--- a/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
+++ b/tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixMultiplication2D.java
@@ -20,9 +20,7 @@ package uk.ac.manchester.tornado.examples.compute;
import java.util.Random;
import java.util.stream.IntStream;
-import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
-import uk.ac.manchester.tornado.api.TaskGraph;
-import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
+import uk.ac.manchester.tornado.api.*;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.enums.TornadoDeviceType;
@@ -97,7 +95,12 @@ public class MatrixMultiplication2D {
ImmutableTaskGraph immutableTaskGraph = taskGraph.snapshot();
TornadoExecutionPlan executor = new TornadoExecutionPlan(immutableTaskGraph);
- executor.withWarmUp();
+
+ WorkerGrid workerGrid = new WorkerGrid2D(matrixA.getNumRows(), matrixA.getNumColumns());
+ GridScheduler gridScheduler = new GridScheduler("s0.t0", workerGrid);
+ workerGrid.setLocalWork(16, 16, 1);
+
+ executor.withGridScheduler(gridScheduler).withWarmUp();
// 1. Warm up Tornado
for (int i = 0; i < WARMING_UP_ITERATIONS; i++) {
So I forced execution in blocks of 16x16 instead of the default 32x32, and the result is correct:
Task info: s0.t0
Backend : OPENCL
Device : NVIDIA GeForce RTX 3070 CL_DEVICE_TYPE_GPU (available)
Dims : 2
Global work offset: [0, 0, 0]
Global work size : [512, 512, 1]
Local work size : [16, 16, 1]
Number of workgroups : [32, 32, 1]
Single Threaded CPU Execution: 2.58 GFlops, Total time = 104 ms
Streams Execution: 15.79 GFlops, Total time = 17 ms
TornadoVM Execution on GPU (Accelerated): 268.44 GFlops, Total Time = 1 ms
Speedup: 104.0x
Verification true
Takeaways:
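The work-group figures in the task info above (global size 512x512, local size 16x16, 32x32 work-groups) can be cross-checked with a little arithmetic. The Python sketch below applies the device-wide limits from the tornado --devices listing for the RTX 3070 (1024 threads per block, maximum work-group configuration [1024, 1024, 64]) and the OpenCL 1.2 rule that each global size must be divisible by the corresponding local size. The function name check_local_size is hypothetical, not a TornadoVM API. Both 16x16 and the default 32x32 pass these checks, so, assuming the listing is accurate, the CL_OUT_OF_RESOURCES error more plausibly comes from a per-kernel limit (for example CL_KERNEL_WORK_GROUP_SIZE, which can be lower than the device maximum when a kernel uses many registers) than from the device-wide limits.

```python
from math import prod


def check_local_size(global_size, local_size, max_work_group_size, max_item_sizes):
    """Return work-groups per dimension, or raise ValueError if the local size
    breaks a device-wide limit or does not evenly divide the global size."""
    if prod(local_size) > max_work_group_size:
        raise ValueError(f"{prod(local_size)} work-items exceed {max_work_group_size}")
    for g, l, m in zip(global_size, local_size, max_item_sizes):
        if l > m:
            raise ValueError(f"local size {l} exceeds per-dimension limit {m}")
        if g % l:
            # OpenCL 1.2 requires uniform work-groups.
            raise ValueError(f"global size {g} is not a multiple of local size {l}")
    return [g // l for g, l in zip(global_size, local_size)]


# Limits as reported for the NVIDIA GeForce RTX 3070 in the device listing.
RTX3070 = dict(max_work_group_size=1024, max_item_sizes=(1024, 1024, 64))

print(check_local_size((512, 512, 1), (16, 16, 1), **RTX3070))  # [32, 32, 1]
print(check_local_size((512, 512, 1), (32, 32, 1), **RTX3070))  # [16, 16, 1]
```

Passing this static check is necessary but not sufficient. The kernel-specific limit can only be obtained from the driver at run time, for example via clGetKernelWorkGroupInfo.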
Totally fine. I'll try it and come back with a PR if I succeed.
(Sent by email in reply to Juan Fumero's comment of Fri, 22 March 2024, 07:40.)
LGTM. I think we can iterate later to simplify the installation part of the documentation. Very good work on native installation for Windows.
I tested it on Windows 11.
I will merge this. Awesome work @otabuzzman. Thank you!
Improvements
~~~~~~~~~~~~~~~~~~

- `beehive-lab#344 <https://github.com/beehive-lab/TornadoVM/pull/344>`_: Support for Multi-threaded Execution Plans.
- `beehive-lab#347 <https://github.com/beehive-lab/TornadoVM/pull/347>`_: Enhanced examples.
- `beehive-lab#350 <https://github.com/beehive-lab/TornadoVM/pull/350>`_: Obtain internal memory segment for the Tornado Native Arrays without the object header.
- `beehive-lab#357 <https://github.com/beehive-lab/TornadoVM/pull/357>`_: API extensions to query and apply filters to backends and devices from the ``TornadoExecutionPlan``.
- `beehive-lab#359 <https://github.com/beehive-lab/TornadoVM/pull/359>`_: Support Factory Methods for FFI-based array collections to be used/composed in TornadoVM Task-Graphs.

Compatibility
~~~~~~~~~~~~~~~~~~

- `beehive-lab#351 <https://github.com/beehive-lab/TornadoVM/pull/351>`_: Compatibility of TornadoVM Native Arrays with the Java Vector API.
- `beehive-lab#352 <https://github.com/beehive-lab/TornadoVM/pull/352>`_: Refactor memory limit to take into account primitive types and wrappers.
- `beehive-lab#354 <https://github.com/beehive-lab/TornadoVM/pull/354>`_: Add DFT-sample benchmark in FP32.
- `beehive-lab#356 <https://github.com/beehive-lab/TornadoVM/pull/356>`_: Initial support for Windows 11 using Visual Studio Development tools.
- `beehive-lab#361 <https://github.com/beehive-lab/TornadoVM/pull/361>`_: Compatibility with the SPIR-V toolkit v0.0.4.
- `beehive-lab#366 <https://github.com/beehive-lab/TornadoVM/pull/363>`_: Level Zero JNI Dependency updated to 0.1.3.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- `beehive-lab#346 <https://github.com/beehive-lab/TornadoVM/pull/346>`_: Computation of local-work group sizes for the Level Zero/SPIR-V backend fixed.
- `beehive-lab#360 <https://github.com/beehive-lab/TornadoVM/pull/358>`_: Fix native tests to check the JIT compiler for each backend.
- `beehive-lab#355 <https://github.com/beehive-lab/TornadoVM/pull/355>`_: Fix custom exceptions when a driver/device is not found.
Description
The PR is about an installer script to simplify installation on Windows. The script is supposed to work similarly to the Linux one. It downloads and compiles all repos necessary to build TornadoVM. The script requires standard installations of Windows tools (Visual Studio Community 2022, CMake, Maven, and Python) as well as GraalVM unpacked somewhere in the file system.
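The prerequisites listed above can be checked before a long build starts. The Python sketch below is a minimal pre-flight check under stated assumptions: that CMake, Maven and Python are found on PATH as cmake, mvn and python, that the Visual Studio C/C++ compiler cl is only visible inside a Developer Command Prompt, and that the unpacked GraalVM is pointed to by JAVA_HOME. These names and the function missing_prerequisites are illustrative and do not describe what tornadovm-installer.cmd actually checks.

```python
import os
import shutil

# Assumed command names; on Windows shutil.which also tries PATHEXT suffixes (mvn.cmd etc.).
REQUIRED_TOOLS = ["cmake", "mvn", "python"]


def missing_prerequisites(which=shutil.which, environ=os.environ):
    """Return a list of prerequisites that could not be found (empty if all present)."""
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # Assumption: cl.exe is on PATH only in a Visual Studio Developer Command Prompt.
    if which("cl") is None:
        missing.append("cl (Visual Studio C/C++ compiler)")
    # Assumption: the unpacked GraalVM is referenced through JAVA_HOME.
    java_home = environ.get("JAVA_HOME")
    if not java_home or not os.path.isdir(java_home):
        missing.append("JAVA_HOME (unpacked GraalVM)")
    return missing


if __name__ == "__main__":
    for item in missing_prerequisites():
        print("missing:", item)
```

Injecting which and environ as parameters keeps the check testable without touching the real PATH.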
The script is stored in bin and is named tornadovm-installer.cmd. It provides a help option (--help). Further information is in an additional section on Windows installation in the TornadoVM documentation (readthedocs).
The script downloads forks of the beehive-lab SPIR-V Toolkit and LevelZero JNI repos and checks out the winstall branch of each. Repo URLs and branch names are hard-coded into the script; both need to be changed after merging, if you decide to merge.
Repo URLs and branch names have also been hard-coded into the bin/compile script used by the Linux installer. This was done for testing purposes on Linux, so the compile script also needs the above changes after merging.
Problem description
n/a.
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
The unit tests provided with TornadoVM have been executed on Windows 11, Windows Server 2022 and Amazon Linux 2. Details are in this Google sheet. Some notes after a rough inspection:
- testBatchNotEven failed on every system for every backend, with the same expected/actual values for each failure. It might therefore be a fundamental problem.
- testTornadoMathSinPIDouble and testTornadoMathCosPIDouble failed on every system for the PTX backend with compile errors. SinPi/CosPi might therefore not be implemented at all for PTX.
- testCopyInWithDevice fails sometimes. This might be due to varying timings and a delta value in assertEquals that is not generous enough.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
On a Windows box:
bin\tornadovm-installer.cmd
setvars.cmd
python %TORNADO_SDK%\bin\tornado --devices