Support vectors of float-16 values #372

mairooni · 2024-04-08T09:29:26Z

Description

This PR provides support for vectors containing half-float values.

Mark the backends affected by this PR.

OpenCL
PTX
SPIRV

OS tested

Mark the OS where this PR is tested.

Linux
OSx
Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

Yes
No

How to test the new patch?

Run tornado-test -V uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats

…plemented functionality

…o object

…sult of an operation or a copy of the fields of an existing vector

…t fp16

…alf float vectors

… for each loadindexedvector if it is for a half float vector instead of assuming that all of them are if one is

jjfumero · 2024-04-08T10:08:14Z

Some testing:

A) OpenCL on the Intel HD Graphics:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Task info: s0.t0
	Backend           : OPENCL
	Device            : Intel(R) UHD Graphics 770 CL_DEVICE_TYPE_GPU (available)
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]


Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [PASS] 
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [PASS] 
	Running test: testVectorHalf2            ................  [PASS] 
	Running test: testVectorHalf3            ................  [PASS] 
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [PASS] 
	Running test: testVectorHalf16           ................  [PASS] 
	Running test: testVectorHalf8            ................  [PASS] 
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [PASS] 
	Running test: privateVectorHalf4         ................  [PASS] 
	Running test: privateVectorHalf8         ................  [PASS] 
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS]

B) SPIR-V Backend:

Task info: s0.t0
	Backend           : SPIRV
	Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [FAILED] 
		\_[REASON] expected:<4.0> but was:<1.0>
	Running test: testVectorHalf2            ................  [FAILED] 
		\_[REASON] expected:<16.0> but was:<1.0>
	Running test: testVectorHalf3            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorHalf16           ................  [FAILED] 
		\_[REASON] expected:<16.0> but was:<1.0>
	Running test: testVectorHalf8            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: privateVectorHalf4         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: privateVectorHalf8         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 10, Unsupported: 0

C) For the PTX backend:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf2  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf3  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf4  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf8  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf16 ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleVectorAddition   ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf2            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf3            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorFloat3toString   ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf4            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf16           ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf8            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf8_Storage    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testDotProduct             ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf2         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf4         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf8         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf4_Unary      ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod01    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod02    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod03    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod04    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testAllocationIssue        ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
Test ran: 24, Failed: 24, Unsupported: 0

Commit point: #24c971a95

jjfumero · 2024-04-08T10:08:58Z

Let's work on it together. We can start with the SPIR-V Backend.

mairooni · 2024-04-08T10:13:28Z

I cannot reproduce these errors for some reason. These are the test results for the SPIR-V backend on my machine:

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [2]
        Local  work size  : [2, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0-MAP
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t1-REDUCE
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

mairooni · 2024-04-08T10:15:21Z

For PTX

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [2]
        Blocks dimensions : [2, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0-MAP
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t1-REDUCE
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

jjfumero · 2024-04-08T10:15:25Z

OK, let me check with an older CPU. I noticed that some of the tests do not pass when using > Intel 12th gen HD Graphics.

jjfumero · 2024-04-08T10:18:30Z

My mistake. The PTX tests are passing. The command I used was wrong. Let me work on the SPIR-V and see what I can spot.

jjfumero · 2024-04-08T12:53:34Z

It still fails with an older CPU. I am using Intel compute runtime 23.35.27191.9. I will try to update to a newer version and check again.

jjfumero · 2024-04-08T14:34:56Z

This did the trick for SPIR-V Half2 vectors:

diff --git a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
index 761e060ce..01e2c8ae5 100644
--- a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
+++ b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
@@ -13,7 +13,7 @@
  *
  * This code is distributed in the hope that it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
  * version 2 for more details (a copy is included in the LICENSE file that
  * accompanied this code).
  *
@@ -95,6 +95,8 @@ public class VectorAddNode extends BinaryNode implements LIRLowerable, VectorOp
 
         if (kind.getElementKind().isFloatingPoint()) {
             binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
+        } else if (kind.isHalf()) {
+            binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
         }

Let's replicate this change for all vector types. It might turn out to be a driver fix after all with newer versions of the Intel compute runtime.
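To illustrate why the extra branch matters, here is a minimal sketch of the op selection. It assumes that the half kind is not reported as floating point and that the default op is an integer add; neither is confirmed in this thread. The enums `ElementKind` and `BinaryOp` are hypothetical stand-ins, not TornadoVM's `SPIRVKind` or `SPIRVAssembler.SPIRVBinaryOp`.

```java
// Hypothetical model of the branch added in the diff above.
// ElementKind and BinaryOp are illustrative types, not TornadoVM classes.
final class VectorAddOpSketch {

    enum ElementKind {
        INT(false), FLOAT(true), DOUBLE(true), HALF(false);

        private final boolean floatingPoint;

        ElementKind(boolean floatingPoint) {
            this.floatingPoint = floatingPoint;
        }

        // Assumption: the half kind does not report itself as floating point.
        boolean isFloatingPoint() {
            return floatingPoint;
        }

        boolean isHalf() {
            return this == HALF;
        }
    }

    enum BinaryOp { ADD_INTEGER, ADD_FLOAT }

    static BinaryOp selectAddOp(ElementKind kind) {
        // Assumption: the default op before the checks is an integer add.
        BinaryOp op = BinaryOp.ADD_INTEGER;
        if (kind.isFloatingPoint()) {
            op = BinaryOp.ADD_FLOAT;
        } else if (kind.isHalf()) {
            // The added branch: half vectors also need the float add.
            op = BinaryOp.ADD_FLOAT;
        }
        return op;
    }

    public static void main(String[] args) {
        for (ElementKind kind : ElementKind.values()) {
            System.out.println(kind + " -> " + selectAddOp(kind));
        }
    }
}
```

Under these assumptions, removing the `isHalf()` branch would leave half vectors on the integer add, which is one possible explanation for the wrong results seen earlier.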

jjfumero · 2024-04-08T14:50:52Z

Cool, now it passes all new tests regarding FP16 with SPIR-V:

Task info: s0.t0
	Backend           : SPIRV
	Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [PASS] 
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [PASS] 
	Running test: testVectorHalf2            ................  [PASS] 
	Running test: testVectorHalf3            ................  [PASS] 
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [PASS] 
	Running test: testVectorHalf16           ................  [PASS] 
	Running test: testVectorHalf8            ................  [PASS] 
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [PASS] 
	Running test: privateVectorHalf4         ................  [PASS] 
	Running test: privateVectorHalf8         ................  [PASS] 
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

jjfumero · 2024-04-09T06:44:58Z

tornado-api/src/main/java/uk/ac/manchester/tornado/api/math/TornadoMath.java

+    public static boolean isEqual(HalfFloatArray a, HalfFloatArray b) {
+        boolean result = true;
+        for (int i = 0; i < a.getSize() && result; i++) {
+            result = compareBits(a.get(i).getHalfFloatValue(), b.get(i).getHalfFloatValue());


Shouldn't it be something like:

result = result & compareBits(a.get(i).getHalfFloatValue(), b.get(i).getHalfFloatValue());

This is a copy from the other isEqual methods we have in this class, just for HalfFloatArray data. If I change this one, should I change all the others as well?
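For reference, the two forms appear to behave the same here, because the loop condition already contains `&& result` and so stops at the first mismatch. A small sketch, using plain `short[]` bit patterns in place of `HalfFloatArray`, a hypothetical `compareBits` stand-in, and assuming both arrays have the same length:

```java
// Sketch only: short[] stands in for HalfFloatArray, and compareBits is a
// hypothetical stand-in for the TornadoMath helper of the same name.
final class IsEqualLoopSketch {

    static boolean compareBits(short a, short b) {
        return a == b;
    }

    // Form used in the PR: plain assignment, guarded by "&& result".
    static boolean isEqualAssign(short[] a, short[] b) {
        boolean result = true;
        for (int i = 0; i < a.length && result; i++) {
            result = compareBits(a[i], b[i]);
        }
        return result;
    }

    // Suggested form: accumulate with '&'.
    static boolean isEqualAccumulate(short[] a, short[] b) {
        boolean result = true;
        for (int i = 0; i < a.length && result; i++) {
            result = result & compareBits(a[i], b[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        short[] x = {1, 2, 3, 4};
        short[] same = {1, 2, 3, 4};
        short[] differs = {1, 9, 3, 4}; // mismatch followed by matches

        System.out.println(isEqualAssign(x, same) + " " + isEqualAccumulate(x, same));
        System.out.println(isEqualAssign(x, differs) + " " + isEqualAccumulate(x, differs));
    }
}
```

The accumulating form only changes the outcome if the `&& result` guard is removed from the loop condition; in that case the plain assignment would report the result of the last comparison alone.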

jjfumero · 2024-04-09T06:46:50Z

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/collections/VectorHalf.java

+
+public final class VectorHalf implements TornadoCollectionInterface<ShortBuffer> {
+
+    private static final int ELEMENT_SIZE = 1;


The element size is 2 bytes, correct?

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/collections/VectorHalf.java

jjfumero · 2024-04-09T06:50:13Z

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/collections/VectorHalf16.java

+
+    public static final Class<VectorHalf16> TYPE = VectorHalf16.class;
+
+    private static final int ELEMENT_SIZE = 16;


So, I am confused now. Element size then indicates the number of Half elements, not the half size.

In this case, I suggest renaming this constant: ELEMENT_VECTOR_SIZE

Yes, that makes sense. I kept it like that for consistency, because this is how the field is named in all the other vector collection classes. I was thinking of having a separate PR to refactor all the vector classes at some point, but I can just rename this field in the new classes.
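To make the distinction concrete, a minimal sketch that separates the number of lanes per vector from the size of one half value in bytes. The class, constant, and method names are hypothetical and do not come from the TornadoVM API, and any padding a real collection might apply is ignored:

```java
// Sketch only: names are illustrative and padding is not modelled.
final class HalfVectorSizeSketch {

    // One FP16 value occupies 16 bits, i.e. 2 bytes.
    static final int HALF_BYTES = 2;

    // Bytes needed for numVectors vectors with lanesPerVector half values each.
    static int storageBytes(int numVectors, int lanesPerVector) {
        return numVectors * lanesPerVector * HALF_BYTES;
    }

    public static void main(String[] args) {
        int lanes = 16; // the value the constant holds in VectorHalf16
        System.out.println("4 vectors of 16 halves need " + storageBytes(4, lanes) + " bytes");
    }
}
```

Keeping the lane count and the byte size in separately named constants avoids the ambiguity raised above.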

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/vectors/Half2.java

.../java/uk/ac/manchester/tornado/drivers/opencl/graal/phases/TornadoHalfFloatVectorOffset.java

...ain/java/uk/ac/manchester/tornado/drivers/ptx/graal/phases/TornadoHalfFloatVectorOffset.java

jjfumero · 2024-04-09T07:06:31Z

...in/java/uk/ac/manchester/tornado/drivers/spirv/graal/phases/TornadoHalfFloatReplacement.java

+                for (Node vectorElement : vectorValueNode.inputs()) {
+                    if (vectorElement instanceof VectorLoadElementNode) {
+                        VectorLoadElementNode vectorLoad = (VectorLoadElementNode) vectorElement;
+                        VectorLoadElementNode vectorLoadShort = new VectorLoadElementNode(SPIRVKind.OP_TYPE_FLOAT_16, vectorLoad.getVector(), vectorLoad.getLaneId());


In this case, FLOAT16 is used, instead of SHORT, as we saw in the OpenCL backend.

...n/java/uk/ac/manchester/tornado/drivers/spirv/graal/phases/TornadoHalfFloatVectorOffset.java

jjfumero

On a second review, LGTM. I do not have access to the SPIR-V backend on OSx. I will check the latest changes by Monday on my other laptop.

jjfumero · 2024-04-15T09:01:11Z

tornado-api/src/main/java/uk/ac/manchester/tornado/api/internal/annotations/HalfType.java

@@ -0,0 +1,11 @@
+package uk.ac.manchester.tornado.api.internal.annotations;


Add License Header

jjfumero · 2024-04-15T09:03:18Z

...n/java/uk/ac/manchester/tornado/drivers/spirv/graal/phases/TornadoHalfFloatVectorOffset.java

+            LeftShiftNode leftShiftNode = index.inputs().filter(LeftShiftNode.class).first();
+            ConstantNode currentOffset = leftShiftNode.inputs().filter(ConstantNode.class).first();
+            // if the shifting is by 3 (for float values)
+            if (currentOffset.getValue().toValueString().equals("3")) {


Why is the shift by 3 for a float value? Can we generalize this?

The comment above is wrong: the shift is not there because of float types, it is there because the JavaKind for half is Object (8 bytes). This was done because otherwise we were having issues with the stamp. I will update the comment to reflect that.
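
The arithmetic behind this exchange can be sketched as follows. A left shift by 3 scales an index by 8 bytes (the Object kind mentioned above), while a 2-byte half value needs a shift by 1. This does not reproduce the Graal phase in the PR; the class and method names are hypothetical, and `shiftFor` is only one possible way to derive the shift from an element size instead of matching the literal "3".

```java
final class HalfOffsetSketch {

    // Derives the shift amount for a power-of-two element size in bytes.
    static int shiftFor(int elementBytes) {
        if (elementBytes <= 0 || Integer.bitCount(elementBytes) != 1) {
            throw new IllegalArgumentException("element size must be a positive power of two");
        }
        return Integer.numberOfTrailingZeros(elementBytes);
    }

    // Byte offset of element `index` when elements are 2^shift bytes wide.
    static long byteOffset(long index, int shift) {
        return index << shift;
    }

    public static void main(String[] args) {
        int objectShift = shiftFor(8); // 8-byte Object kind
        int halfShift = shiftFor(2);   // 2-byte half value
        long index = 5;
        System.out.println("object-kind offset: " + byteOffset(index, objectShift));
        System.out.println("half offset:        " + byteOffset(index, halfShift));
    }
}
```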

jjfumero

Minor comments

mikepapadim

LGTM

Improvements
~~~~~~~~~~~~~~~~~~

- [beehive-lab#369](beehive-lab#369): Introduction of Tensor types in TornadoVM API and interoperability with ONNX Runtime.
- [beehive-lab#370](beehive-lab#370): Array concatenation operation for TornadoVM native arrays.
- [beehive-lab#371](beehive-lab#371): TornadoVM installer script ported for Windows 10/11.
- [beehive-lab#372](beehive-lab#372): Add support for ``HalfFloat`` (``Float16``) in vector types.
- [beehive-lab#374](beehive-lab#374): Support for TornadoVM array concatenations from the constructor-level.
- [beehive-lab#375](beehive-lab#375): Support for TornadoVM native arrays using slices from the Panama API.
- [beehive-lab#376](beehive-lab#376): Support for lazy copy-outs in the batch processing mode.
- [beehive-lab#377](beehive-lab#377): Expand the TornadoVM profiler with power metrics for NVIDIA GPUs (OpenCL and PTX backends).
- [beehive-lab#384](beehive-lab#384): Auto-closable Execution Plans for automatic memory management.

Compatibility
~~~~~~~~~~~~~~~~~~

- [beehive-lab#386](beehive-lab#386): OpenJDK 17 support removed.
- [beehive-lab#390](beehive-lab#390): SapMachine OpenJDK 21 supported.
- [beehive-lab#395](beehive-lab#395): OpenJDK 22 and GraalVM 22.0.1 supported.
- TornadoVM tested with Apple M3 chips.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- [beehive-lab#367](beehive-lab#367): Fix for Graal/Truffle languages in which some Java modules were not visible.
- [beehive-lab#373](beehive-lab#373): Fix for data copies of the ``HalfFloat`` types for all backends.
- [beehive-lab#378](beehive-lab#378): Fix free memory markers when running multi-thread execution plans.
- [beehive-lab#379](beehive-lab#379): Refactoring package of vector api unit-tests.
- [beehive-lab#380](beehive-lab#380): Fix event list sizes to accommodate profiling of large applications.
- [beehive-lab#385](beehive-lab#385): Fix code check style.
- [beehive-lab#387](beehive-lab#387): Fix TornadoVM internal events in OpenCL, SPIR-V and PTX for running multi-threaded execution plans.
- [beehive-lab#388](beehive-lab#388): Fix of expected and actual values of tests.
- [beehive-lab#392](beehive-lab#392): Fix installer for using existing JDKs.
- [beehive-lab#389](beehive-lab#389): Fix ``DataObjectState`` for multi-thread execution plans.
- [beehive-lab#396](beehive-lab#396): Fix JNI code for the CUDA NVML library access with OpenCL.

mairooni added 30 commits January 31, 2024 17:45

[WIP] Support for half2 and vectorhalf types

c4d63ed

Merge branch 'develop' into feat/vectorfloat16

e137cbd

[WIP] Support for Half3 and Half4

4311426

Merge branch 'develop' into feat/vectorfloat16

89e0a00

Add initial support for VectorHalf2 and create unittest to track unim…

fd98c2f

…plemented functionality

Initial support for Half8 and Half16

d625b2b

Merge branch 'develop' into feat/vectorfloat16

825e80e

Merge branch 'develop' into feat/vectorfloat16

7151a92

Initial support for VectorHalf3

5c050d9

Initial support for VectorHalf4

f0bad22

Initial support for VectorHalf8

addf721

Initial support for VectorHalf16

3f423be

Rename the unittest

ba656c8

Minor fix in javadoc

fe33326

Support private half vectors and change ocl type of half from short t…

e8faf81

…o object

Support cases where a new half vector is created with fields being re…

3e2c630

…sult of an operation or a copy of the fields of an existing vector

Merge branch 'develop' into feat/vectorfloat16

27119f0

Quick fix for the dot product test

97227d6

Refactor to remove repeating code

2c0e03a

Include missing license headers

9163b08

Throw exception for vectors of half floats if devices does not suppor…

4df1d44

…t fp16

Merge branch 'develop' into feat/vectorfloat16

2850096

Include unittest for vector half floats

1d20689

Insert node that indicates that offsets need to be changed only for h…

f5daccb

…alf float vectors

Support half vector operations for PTX

b613c13

Merge branch 'develop' into feat/vectorfloat16

e2764d5

If field of half vector is not set, initialize it with zero and check…

796bf6b

… for each loadindexedvector if is it for a half float vector instead of assuming that all of them are if one is

Support half float vectors for SPIRV

dca19f2

Merge branch 'develop' into feat/vectorfloat16

3dbf9ad

fix checkstyle violation

d835453

mairooni requested review from jjfumero and mikepapadim April 8, 2024 09:29

mairooni self-assigned this Apr 8, 2024

Generate float operators for half float vectors on the SPIR-V backend

86d24da

jjfumero reviewed Apr 9, 2024

mairooni added 4 commits April 9, 2024 16:29

Refactor the dot function to operate on half floats, add unittest and…

70dc2bc

… apply minor compiler fixes for the new unittest

Merge branch 'develop' into feat/vectorfloat16

654a209

fix checkstyle

e8726dd

Add javadocs, use annotation to identify half types during compilatio…

2ae6e98

…n and remove div function

jjfumero reviewed Apr 11, 2024

Merge branch 'develop' into feat/vectorfloat16

8399d81

jjfumero reviewed Apr 15, 2024

jjfumero requested changes Apr 15, 2024

Include license header and update comments

d701399

jjfumero approved these changes Apr 15, 2024

mikepapadim approved these changes Apr 15, 2024

jjfumero merged commit ac476de into beehive-lab:develop Apr 15, 2024
2 checks passed

mairooni deleted the feat/vectorfloat16 branch April 16, 2024 08:04

jjfumero mentioned this pull request Apr 30, 2024

[release] TornadoVM v1.0.4 #398

Merged
