Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark updates for faster run and more reliable results #2164

Merged
merged 57 commits into from
Oct 15, 2024

Conversation

mateuszpn
Copy link
Contributor

No description provided.

Copy link

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890

Copy link

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890
Job status: failure. Test status: skipped.

Copy link

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890

Copy link

github-actions bot commented Oct 2, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11144890890
Job status: failure. Test status: skipped.

Copy link

github-actions bot commented Oct 9, 2024

Compute Benchmarks level_zero run (with params: --save baseline):
https://github.com/oneapi-src/unified-runtime/actions/runs/11258108066

Copy link

github-actions bot commented Oct 9, 2024

Compute Benchmarks level_zero run (--save baseline):
https://github.com/oneapi-src/unified-runtime/actions/runs/11258108066
Job status: success. Test status: success.

Summary

No diffs to calculate performance change

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): cannot calculate
Benchmark This PR Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 25.633000 μs
api_overhead_benchmark_sycl SubmitKernel in order 22.919000 μs
api_overhead_benchmark_ur SubmitKernel out of order 14.052000 μs
api_overhead_benchmark_ur SubmitKernel in order 13.427000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.118000 μs
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.665000 μs
Relative perf in group memory (4): cannot calculate
Benchmark This PR Relative perf Change -
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 226.961000 μs
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 113.775000 μs
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.885000 μs
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.232000 μs
Relative perf in group miscellaneous (1): cannot calculate
Benchmark This PR Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 863.445000 μs
Relative perf in group Velocity-Bench (5): cannot calculate
Benchmark This PR Relative perf Change -
Velocity-Bench Hashtable 358.493626 M keys/sec
Velocity-Bench Bitcracker 35.366400 s
Velocity-Bench Easywave 241.000000 ms
Velocity-Bench QuickSilver 117.990000 MMS/CTT
Velocity-Bench Sobel Filter 551.561000 ms
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR Relative perf Change -
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 278.081000 ms
Runtime_IndependentDAGTaskThroughput_SingleTask 262.259000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 273.398000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 280.169000 ms
Runtime_DAGTaskThroughput_BasicParallelFor 1886.208000 ms
Runtime_DAGTaskThroughput_SingleTask 1805.332000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor 1837.867000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1857.392000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR Relative perf Change -
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 5.104000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.880000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.049000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.329000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.070000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.084000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.989000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.164000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.666000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.889000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.720000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.279000 ms
MicroBench_LocalMem_int32_4096 30.361000 ms
MicroBench_LocalMem_fp32_4096 30.380000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR Relative perf Change -
Pattern_Reduction_Hierarchical_int32 16.827000 ms
Pattern_Reduction_NDRange_int32 16.746000 ms
Pattern_SegmentedReduction_Hierarchical_int64 12.259000 ms
Pattern_SegmentedReduction_NDRange_int16 6.078000 ms
Pattern_SegmentedReduction_NDRange_int32 5.715000 ms
Pattern_SegmentedReduction_NDRange_int64 6.196000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 12.057000 ms
Pattern_SegmentedReduction_Hierarchical_int16 12.219000 ms
Pattern_SegmentedReduction_Hierarchical_int32 12.060000 ms
Pattern_SegmentedReduction_NDRange_fp32 5.715000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR Relative perf Change -
ScalarProduct_Hierarchical_int32 10.643000 ms
ScalarProduct_NDRange_fp32 6.331000 ms
ScalarProduct_NDRange_int64 8.234000 ms
ScalarProduct_Hierarchical_fp32 10.235000 ms
ScalarProduct_NDRange_int32 6.363000 ms
ScalarProduct_Hierarchical_int64 11.592000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR Relative perf Change -
USM_Allocation_latency_fp32_host 37.417000 ms
USM_Allocation_latency_fp32_device 0.143000 ms
USM_Allocation_latency_fp32_shared 0.132000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.049000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.812000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.225000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.644000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR Relative perf Change -
VectorAddition_int64 3.075000 ms
VectorAddition_fp32 1.448000 ms
VectorAddition_int32 1.449000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR Relative perf Change -
Polybench_2mm 1.219000 ms
Polybench_3mm 1.726000 ms
Polybench_Atax 6.851000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR Relative perf Change -
Kmeans_fp32 16.166000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR Relative perf Change -
LinearRegressionCoeff_fp32 966.679000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR Relative perf Change -
MolecularDynamics 0.026000 ms

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.633,26.014,6.14%,22.013,383.928,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),22.919,22.899,3.21%,22.123,231.962,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.052,14.034,1.79%,13.482,47.895,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.427,13.410,3.03%,12.612,74.553,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),226.961,226.855,1.11%,223.008,435.901,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),113.775,111.709,14.65%,109.649,297.564,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.885,5.604,12.49%,5.219,54.478,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.232,3.243,3.02%,0.547,3.465,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.118,2.114,4.41%,1.923,9.527,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.665,1.659,6.28%,1.573,26.772,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),863.445,863.914,0.46%,813.901,875.333,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.374394 s
358.493626 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00453225 s
bitcracker - total time for whole calculation: 35.3664 s

Velocity-Bench Easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.308920e-01 6.121290e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.647040e-01 7.548150e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.612650e-01 7.696590e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.725320e-01 8.283810e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.605430e-01 7.965100e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.605540e-01 7.713770e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.602550e-01 7.662040e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.341580e-01 7.906460e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.352210e-01 7.853490e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.309550e-01 7.600320e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.125e+07 1.125e+07 1.125e+07 0.000e+00 100.00
cycleInit 10 3.611e+06 3.611e+06 3.611e+06 0.000e+00 100.00
cycleTracking 10 7.635e+06 7.635e+06 7.635e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.943e+06 4.943e+06 4.943e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.250e+05 2.250e+05 2.250e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.300e+02 4.300e+02 4.300e+02 0.000e+00 100.00
Figure Of Merit 117.99 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.44817 s
sobelfilter - total time for whole calculation: 0.551561 s

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.281080', '0.278081', '0.277005', '0.277005 0.277265 0.278081 0.281715 0.291333', '0.006035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.263062', '0.262259', '0.256281', '0.256281 0.261170 0.262259 0.264539 0.271061', '0.005394', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.273478', '0.273398', '0.272563', '0.272563 0.272861 0.273398 0.274165 0.274403', '0.000799', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.286746', '0.280169', '0.273276', '0.273276 0.280037 0.280169 0.282038 0.318211', '0.017902', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.886241', '1.886208', '1.885184', '1.885184 1.885470 1.886208 1.886407 1.887937', '0.001074', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.802568', '1.805332', '1.792171', '1.792171 1.796691 1.805332 1.808887 1.809757', '0.007777', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.837786', '1.837867', '1.835366', '1.835366 1.836542 1.837867 1.838912 1.840243', '0.001918', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.856254', '1.857392', '1.850624', '1.850624 1.854905 1.857392 1.857783 1.860565', '0.003733', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005113', '0.005104', '0.005042', '0.005042 0.005088 0.005104 0.005120 0.005214', '0.000063', '24.790545', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004856', '0.004880', '0.004666', '0.004666 0.004861 0.004880 0.004917 0.004957', '0.000113', '26.790764', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005063', '0.005049', '0.005004', '0.005004 0.005034 0.005049 0.005077 0.005152', '0.000056', '24.981089', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618315', '0.618329', '0.618238', '0.618238 0.618324 0.618329 0.618337 0.618349', '0.000044', '0.202187', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005055', '0.005070', '0.005005', '0.005005 0.005036 0.005070 0.005070 0.005096', '0.000035', '24.973209', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005072', '0.005084', '0.004974', '0.004974 0.005029 0.005084 0.005128 0.005146', '0.000071', '25.131680', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005010', '0.004989', '0.004980', '0.004980 0.004982 0.004989 0.005046 0.005053', '0.000037', '25.100563', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005172', '0.005164', '0.005134', '0.005134 0.005137 0.005164 0.005171 0.005255', '0.000049', '24.346463', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617677', '0.617666', '0.617628', '0.617628 0.617662 0.617666 0.617686 0.617745', '0.000043', '0.202387', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005242', '0.004889', '0.004760', '0.004760 0.004869 0.004889 0.004899 0.006792', '0.000868', '26.260135', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617711', '0.617720', '0.617675', '0.617675 0.617699 0.617720 0.617722 0.617741', '0.000025', '0.202372', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618313', '0.618279', '0.618248', '0.618248 0.618265 0.618279 0.618373 0.618399', '0.000069', '0.202184', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030367', '0.030361', '0.030290', '0.030290 0.030321 0.030361 0.030408 0.030454', '0.000066', '10300.296223', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030377', '0.030380', '0.030297', '0.030297 0.030379 0.030380 0.030391 0.030441', '0.000052', '10298.100977', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016805', '0.016827', '0.016700', '0.016700 0.016808 0.016827 0.016841 0.016845', '0.000060', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016699', '0.016746', '0.016463', '0.016463 0.016712 0.016746 0.016770 0.016804', '0.000136', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010625', '0.010643', '0.010558', '0.010558 0.010616 0.010643 0.010646 0.010661', '0.000041', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006333', '0.006331', '0.006306', '0.006306 0.006314 0.006331 0.006341 0.006370', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008248', '0.008234', '0.008213', '0.008213 0.008216 0.008234 0.008243 0.008332', '0.000049', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010237', '0.010235', '0.010195', '0.010195 0.010224 0.010235 0.010265 0.010266', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006358', '0.006363', '0.006318', '0.006318 0.006342 0.006363 0.006376 0.006392', '0.000029', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011578', '0.011592', '0.011482', '0.011482 0.011561 0.011592 0.011599 0.011657', '0.000064', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012259', '0.012259', '0.012237', '0.012237 0.012254 0.012259 0.012261 0.012280', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006081', '0.006078', '0.006068', '0.006068 0.006078 0.006078 0.006080 0.006099', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005716', '0.005715', '0.005706', '0.005706 0.005712 0.005715 0.005723 0.005726', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006194', '0.006196', '0.006186', '0.006186 0.006195 0.006196 0.006196 0.006199', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012056', '0.012057', '0.012047', '0.012047 0.012055 0.012057 0.012058 0.012061', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012220', '0.012219', '0.012208', '0.012208 0.012212 0.012219 0.012224 0.012235', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012061', '0.012060', '0.012054', '0.012054 0.012055 0.012060 0.012061 0.012077', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005715', '0.005715', '0.005699', '0.005699 0.005711 0.005715 0.005722 0.005726', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037385', '0.037417', '0.037169', '0.037169 0.037340 0.037417 0.037474 0.037527', '0.000140', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000177', '0.000143', '0.000050', '0.000050 0.000081 0.000143 0.000197 0.000416', '0.000145', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000122', '0.000132', '0.000074', '0.000074 0.000129 0.000132 0.000136 0.000138', '0.000027', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001051', '0.001049', '0.001042', '0.001042 0.001045 0.001049 0.001054 0.001062', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001827', '0.001812', '0.001801', '0.001801 0.001807 0.001812 0.001827 0.001888', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001227', '0.001225', '0.001220', '0.001220 0.001222 0.001225 0.001226 0.001245', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001965', '0.001644', '0.001638', '0.001638 0.001644 0.001644 0.001661 0.003240', '0.000712', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003081', '0.003075', '0.003060', '0.003060 0.003068 0.003075 0.003084 0.003118', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001459', '0.001448', '0.001436', '0.001436 0.001448 0.001448 0.001462 0.001501', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001452', '0.001449', '0.001442', '0.001442 0.001445 0.001449 0.001452 0.001470', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001220', '0.001219', '0.001216', '0.001216 0.001216 0.001219 0.001223 0.001229', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001726', '0.001725', '0.001725 0.001725 0.001726 0.001730 0.001751', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006801', '0.006851', '0.006700', '0.006700 0.006732 0.006851 0.006854 0.006869', '0.000079', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016172', '0.016166', '0.016164', '0.016164 0.016165 0.016166 0.016175 0.016192', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966707', '0.966679', '0.966583', '0.966583 0.966614 0.966679 0.966758 0.966900', '0.000127', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000034', '0.000026', '0.000026', '0.000026 0.000026 0.000026 0.000029 0.000061', '0.000016', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Copy link

github-actions bot commented Oct 9, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11258157647

Copy link

github-actions bot commented Oct 9, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11258157647
Job status: success. Test status: success.

Summary

Total 70 benchmarks in mean.
Geomean 99.232%.
Improved 11 Regressed 17 (threshold 0.50%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): 94.078%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.118000 μs 2.118 μs 100.00% 0.00% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.667 μs 1.665000 μs 99.88% -0.12% .
api_overhead_benchmark_sycl SubmitKernel out of order 25.747 μs 25.633000 μs 99.56% -0.44% .
api_overhead_benchmark_ur SubmitKernel in order 14.037 μs 13.427000 μs 95.65% -4.35% --
api_overhead_benchmark_sycl SubmitKernel in order 24.952 μs 22.919000 μs 91.85% -8.15% ----
api_overhead_benchmark_ur SubmitKernel out of order 17.708 μs 14.052000 μs 79.35% -20.65% ----------
Relative perf in group memory (4): 96.856%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.824000 μs 5.885 μs 101.05% 1.05% +
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.218000 μs 3.232 μs 100.44% 0.44% .
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 230.452 μs 226.961000 μs 98.49% -1.51% -
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 129.218 μs 113.775000 μs 88.05% -11.95% ------
Relative perf in group miscellaneous (1): 100.759%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 856.939000 μs 863.445 μs 100.76% 0.76% .
Relative perf in group Velocity-Bench (6): 99.565%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench QuickSilver 117.850 MMS/CTT 117.990000 MMS/CTT 99.88% -0.12% .
Velocity-Bench Hashtable 357.463 M keys/sec 358.493626 M keys/sec 99.71% -0.29% .
Velocity-Bench Sobel Filter 554.014 ms 551.561000 ms 99.56% -0.44% .
Velocity-Bench Bitcracker 35.545 s 35.366400 s 99.50% -0.50% .
Velocity-Bench Easywave 243.000 ms 241.000000 ms 99.18% -0.82% .
Velocity-Bench CudaSift 221.990000 ms -
Relative perf in group Runtime (8): 100.146%
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 277.216000 ms 280.169 ms 101.07% 1.07% +
Runtime_IndependentDAGTaskThroughput_SingleTask 260.403000 ms 262.259 ms 100.71% 0.71% .
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 277.150000 ms 278.081 ms 100.34% 0.34% .
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1854.591000 ms 1857.392 ms 100.15% 0.15% .
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 273.247000 ms 273.398 ms 100.06% 0.06% .
Runtime_DAGTaskThroughput_BasicParallelFor 1892.957 ms 1886.208000 ms 99.64% -0.36% .
Runtime_DAGTaskThroughput_SingleTask 1812.295 ms 1805.332000 ms 99.62% -0.38% .
Runtime_DAGTaskThroughput_NDRangeParallelFor 1845.340 ms 1837.867000 ms 99.60% -0.40% .
Relative perf in group MicroBench (14): 100.034%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.015000 ms 5.070 ms 101.10% 1.10% +
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.853000 ms 4.880 ms 100.56% 0.56% .
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.029000 ms 5.049 ms 100.40% 0.40% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 4.982000 ms 4.989 ms 100.14% 0.14% .
MicroBench_LocalMem_fp32_4096 30.365000 ms 30.380 ms 100.05% 0.05% .
MicroBench_LocalMem_int32_4096 30.354000 ms 30.361 ms 100.02% 0.02% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.706000 ms 617.720 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.374 ms 618.329000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.747 ms 617.666000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.388 ms 618.279000 ms 99.98% -0.02% .
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.896 ms 4.889000 ms 99.86% -0.14% .
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 5.113 ms 5.104000 ms 99.82% -0.18% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.198 ms 5.164000 ms 99.35% -0.65% .
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.123 ms 5.084000 ms 99.24% -0.76% .
Relative perf in group Pattern (10): 99.645%
Benchmark This PR baseline Relative perf Change -
Pattern_SegmentedReduction_NDRange_int64 6.188000 ms 6.196 ms 100.13% 0.13% .
Pattern_SegmentedReduction_Hierarchical_fp32 12.054000 ms 12.057 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int32 12.057000 ms 12.060 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int16 12.219000 ms 12.219 ms 100.00% 0.00% .
Pattern_SegmentedReduction_Hierarchical_int64 12.260 ms 12.259000 ms 99.99% -0.01% .
Pattern_SegmentedReduction_NDRange_int16 6.079 ms 6.078000 ms 99.98% -0.02% .
Pattern_SegmentedReduction_NDRange_fp32 5.717 ms 5.715000 ms 99.97% -0.03% .
Pattern_SegmentedReduction_NDRange_int32 5.717 ms 5.715000 ms 99.97% -0.03% .
Pattern_Reduction_Hierarchical_int32 17.100 ms 16.827000 ms 98.40% -1.60% -
Pattern_Reduction_NDRange_int32 17.089 ms 16.746000 ms 97.99% -2.01% -
Relative perf in group ScalarProduct (6): 100.175%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 6.327000 ms 6.363 ms 100.57% 0.57% .
ScalarProduct_Hierarchical_int32 10.593000 ms 10.643 ms 100.47% 0.47% .
ScalarProduct_NDRange_fp32 6.323000 ms 6.331 ms 100.13% 0.13% .
ScalarProduct_Hierarchical_int64 11.587000 ms 11.592 ms 100.04% 0.04% .
ScalarProduct_NDRange_int64 8.231000 ms 8.234 ms 100.04% 0.04% .
ScalarProduct_Hierarchical_fp32 10.255 ms 10.235000 ms 99.80% -0.20% .
Relative perf in group USM (7): 99.870%
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device 0.136000 ms 0.143 ms 105.15% 5.15% ++
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.215000 ms 1.225 ms 100.82% 0.82% .
USM_Allocation_latency_fp32_host 37.330000 ms 37.417 ms 100.23% 0.23% .
USM_Allocation_latency_fp32_shared 0.133 ms 0.132000 ms 99.25% -0.75% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.839 ms 1.812000 ms 98.53% -1.47% -
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.065 ms 1.049000 ms 98.50% -1.50% -
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.698 ms 1.644000 ms 96.82% -3.18% --
Relative perf in group VectorAddition (3): 98.232%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int64 3.072000 ms 3.075 ms 100.10% 0.10% .
VectorAddition_int32 1.487 ms 1.449000 ms 97.44% -2.56% -
VectorAddition_fp32 1.490 ms 1.448000 ms 97.18% -2.82% -
Relative perf in group Polybench (3): 100.017%
Benchmark This PR baseline Relative perf Change -
Polybench_Atax 6.814000 ms 6.851 ms 100.54% 0.54% .
Polybench_3mm 1.726000 ms 1.726 ms 100.00% 0.00% .
Polybench_2mm 1.225 ms 1.219000 ms 99.51% -0.49% .
Relative perf in group Kmeans (1): 100.000%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 16.166000 ms 16.166 ms 100.00% 0.00% .
Relative perf in group LinearRegressionCoeff (1): 99.991%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 966.763 ms 966.679000 ms 99.99% -0.01% .
Relative perf in group MolecularDynamics (1): 104.000%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.025000 ms 0.026 ms 104.00% 4.00% ++

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.747,26.142,5.99%,22.381,370.828,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.952,25.081,2.52%,21.910,55.090,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),17.708,17.933,6.56%,13.930,265.724,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.037,13.963,3.68%,13.107,93.179,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),230.452,230.348,1.13%,226.610,457.332,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),129.218,118.672,28.47%,114.346,304.381,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.824,5.488,13.41%,5.050,40.359,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.218,3.227,3.13%,0.548,3.500,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.118,2.113,8.68%,1.926,51.485,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.667,1.662,3.67%,1.562,6.200,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),856.939,857.439,0.43%,818.401,869.586,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.375473 s
357.462876 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00450016 s
bitcracker - total time for whole calculation: 35.5449 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1168 1272 31.7133% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1120 1253 30.41% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1257 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1267 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1272 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1271 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1251 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1040 1258 28.2379% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1124 1264 30.5186% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1267 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1090 1260 29.5954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1121 1270 30.4371% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1239 1273 33.6411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1089 1264 29.5683% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1270 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1257 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1112 1260 30.1928% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1130 1268 30.6815% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1008 1253 27.369% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1263 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1255 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1264 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1261 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1098 1261 29.8127% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1210 1255 32.8537% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1087 1258 29.514% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1258 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1260 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1200 1251 32.5821% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1094 1258 29.704% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1258 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1256 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1101 1255 29.8941% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1268 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1064 1260 28.8895% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1271 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1211 1262 32.8808% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1151 1270 31.2517% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1143 1256 31.0345% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1268 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1117 1264 30.3285% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1152 1257 31.2788% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1263 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1258 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 221.99 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.378050e-01 6.169180e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.690640e-01 7.553790e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.645710e-01 7.733100e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.713190e-01 8.297230e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.642790e-01 7.967770e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.646240e-01 7.713840e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.643480e-01 7.649790e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.332100e-01 7.893380e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.326190e-01 7.848430e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.315770e-01 7.618580e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.128e+07 1.128e+07 1.128e+07 0.000e+00 100.00
cycleInit 10 3.633e+06 3.633e+06 3.633e+06 0.000e+00 100.00
cycleTracking 10 7.645e+06 7.645e+06 7.645e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.938e+06 4.938e+06 4.938e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.283e+05 2.283e+05 2.283e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.270e+02 4.270e+02 4.270e+02 0.000e+00 100.00
Figure Of Merit 117.85 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.48189 s
sobelfilter - total time for whole calculation: 0.554014 s

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.279092', '0.277150', '0.275581', '0.275581 0.276837 0.277150 0.277569 0.288323', '0.005213', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.274638', '0.273247', '0.272889', '0.272889 0.273193 0.273247 0.274061 0.279800', '0.002918', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.283881', '0.277216', '0.271103', '0.271103 0.273921 0.277216 0.284601 0.312567', '0.016810', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.263276', '0.260403', '0.255712', '0.255712 0.256254 0.260403 0.262063 0.281949', '0.010781', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.894039', '1.892957', '1.887596', '1.887596 1.892474 1.892957 1.898459 1.898708', '0.004649', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.809558', '1.812295', '1.801447', '1.801447 1.804001 1.812295 1.813560 1.816485', '0.006484', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.845699', '1.845340', '1.844042', '1.844042 1.844710 1.845340 1.846630 1.847773', '0.001502', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.853896', '1.854591', '1.846805', '1.846805 1.853455 1.854591 1.856325 1.858304', '0.004367', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005191', '0.005198', '0.005132', '0.005132 0.005183 0.005198 0.005211 0.005229', '0.000037', '24.357204', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617709', '0.617706', '0.617689', '0.617689 0.617705 0.617706 0.617709 0.617734', '0.000016', '0.202367', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004786', '0.004853', '0.004354', '0.004354 0.004837 0.004853 0.004924 0.004964', '0.000247', '28.711824', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004980', '0.004982', '0.004894', '0.004894 0.004976 0.004982 0.005003 0.005045', '0.000055', '25.541594', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005161', '0.005113', '0.004998', '0.004998 0.005037 0.005113 0.005320 0.005334', '0.000157', '25.007923', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005017', '0.005015', '0.004995', '0.004995 0.004995 0.005015 0.005020 0.005058', '0.000026', '25.023793', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004956', '0.005029', '0.004658', '0.004658 0.004969 0.005029 0.005042 0.005081', '0.000171', '26.833783', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617714', '0.617747', '0.617611', '0.617611 0.617669 0.617747 0.617751 0.617791', '0.000073', '0.202393', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005293', '0.004896', '0.004873', '0.004873 0.004878 0.004896 0.004946 0.006873', '0.000883', '25.648949', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618378', '0.618374', '0.618335', '0.618335 0.618349 0.618374 0.618407 0.618426', '0.000038', '0.202156', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618410', '0.618388', '0.618335', '0.618335 0.618385 0.618388 0.618444 0.618499', '0.000063', '0.202156', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005150', '0.005123', '0.005113', '0.005113 0.005116 0.005123 0.005157 0.005243', '0.000055', '24.446392', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030381', '0.030365', '0.030358', '0.030358 0.030359 0.030365 0.030393 0.030433', '0.000032', '10277.420520', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030360', '0.030354', '0.030291', '0.030291 0.030333 0.030354 0.030397 0.030425', '0.000053', '10300.029969', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.017103', '0.017089', '0.017041', '0.017041 0.017049 0.017089 0.017091 0.017247', '0.000083', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.017103', '0.017100', '0.017069', '0.017069 0.017097 0.017100 0.017112 0.017137', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011589', '0.011587', '0.011511', '0.011511 0.011561 0.011587 0.011611 0.011675', '0.000061', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008228', '0.008231', '0.008212', '0.008212 0.008222 0.008231 0.008234 0.008239', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006320', '0.006323', '0.006299', '0.006299 0.006314 0.006323 0.006328 0.006336', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006340', '0.006327', '0.006314', '0.006314 0.006315 0.006327 0.006336 0.006408', '0.000039', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010258', '0.010255', '0.010232', '0.010232 0.010250 0.010255 0.010266 0.010286', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010591', '0.010593', '0.010540', '0.010540 0.010587 0.010593 0.010616 0.010619', '0.000032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012265', '0.012260', '0.012244', '0.012244 0.012258 0.012260 0.012262 0.012298', '0.000020', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006084', '0.006079', '0.006070', '0.006070 0.006077 0.006079 0.006094 0.006101', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005717', '0.005717', '0.005713', '0.005713 0.005714 0.005717 0.005722 0.005722', '0.000004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012058', '0.012054', '0.012046', '0.012046 0.012050 0.012054 0.012059 0.012084', '0.000015', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012222', '0.012219', '0.012201', '0.012201 0.012213 0.012219 0.012226 0.012251', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006188', '0.006188', '0.006181', '0.006181 0.006185 0.006188 0.006189 0.006198', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005717', '0.005717', '0.005711', '0.005711 0.005714 0.005717 0.005721 0.005722', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012060', '0.012057', '0.012053', '0.012053 0.012056 0.012057 0.012059 0.012075', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037277', '0.037330', '0.037063', '0.037063 0.037245 0.037330 0.037347 0.037401', '0.000132', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000125', '0.000133', '0.000073', '0.000073 0.000125 0.000133 0.000144 0.000148', '0.000030', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000128', '0.000136', '0.000050', '0.000050 0.000080 0.000136 0.000179 0.000192', '0.000062', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001851', '0.001839', '0.001832', '0.001832 0.001832 0.001839 0.001839 0.001913', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.002021', '0.001698', '0.001691', '0.001691 0.001694 0.001698 0.001715 0.003306', '0.000718', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001217', '0.001215', '0.001209', '0.001209 0.001212 0.001215 0.001219 0.001229', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001069', '0.001065', '0.001061', '0.001061 0.001063 0.001065 0.001067 0.001087', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001484', '0.001490', '0.001446', '0.001446 0.001447 0.001490 0.001491 0.001545', '0.000041', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001479', '0.001487', '0.001439', '0.001439 0.001446 0.001487 0.001507 0.001516', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003075', '0.003072', '0.003056', '0.003056 0.003072 0.003072 0.003084 0.003093', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001228', '0.001225', '0.001218', '0.001218 0.001224 0.001225 0.001228 0.001243', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001730', '0.001726', '0.001718', '0.001718 0.001725 0.001726 0.001734 0.001748', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006799', '0.006814', '0.006707', '0.006707 0.006710 0.006814 0.006877 0.006886', '0.000087', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016169', '0.016166', '0.016156', '0.016156 0.016166 0.016166 0.016175 0.016184', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966749', '0.966763', '0.966618', '0.966618 0.966721 0.966763 0.966772 0.966872', '0.000092', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000032', '0.000025', '0.000024', '0.000024 0.000025 0.000025 0.000029 0.000056', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11270589852

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11270589852
Job status: success. Test status: success.

Summary

Total 70 benchmarks in mean.
Geomean 99.698%.
Improved 19 Regressed 19 (threshold 0.50%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): 96.270%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 23.216000 μs 25.633 μs 110.41% 10.41% +++++++
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.647000 μs 1.665 μs 101.09% 1.09% +
api_overhead_benchmark_ur SubmitKernel out of order 14.315 μs 14.052000 μs 98.16% -1.84% -
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.192 μs 2.118000 μs 96.62% -3.38% --
api_overhead_benchmark_sycl SubmitKernel in order 25.701 μs 22.919000 μs 89.18% -10.82% -------
api_overhead_benchmark_ur SubmitKernel in order 15.924 μs 13.427000 μs 84.32% -15.68% ----------
Relative perf in group memory (4): 99.734%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.814000 μs 5.885 μs 101.22% 1.22% +
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.201000 μs 3.232 μs 100.97% 0.97% +
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 225.309000 μs 226.961 μs 100.73% 0.73% .
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 118.389 μs 113.775000 μs 96.10% -3.90% --
Relative perf in group miscellaneous (1): 106.935%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 807.445000 μs 863.445 μs 106.94% 6.94% ++++
Relative perf in group Velocity-Bench (6): 100.213%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable 361.850809 M keys/sec 358.494 M keys/sec 100.94% 0.94% +
Velocity-Bench Sobel Filter 548.075000 ms 551.561 ms 100.64% 0.64% .
Velocity-Bench QuickSilver 118.160000 MMS/CTT 117.990 MMS/CTT 100.14% 0.14% .
Velocity-Bench Bitcracker 35.448 s 35.366400 s 99.77% -0.23% .
Velocity-Bench Easywave 242.000 ms 241.000000 ms 99.59% -0.41% .
Velocity-Bench CudaSift 222.500000 ms -
Relative perf in group Runtime (8): 103.199%
Benchmark This PR baseline Relative perf Change -
Runtime_DAGTaskThroughput_NDRangeParallelFor 1684.806000 ms 1837.867 ms 109.08% 9.08% ++++++
Runtime_DAGTaskThroughput_BasicParallelFor 1729.729000 ms 1886.208 ms 109.05% 9.05% ++++++
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1708.434000 ms 1857.392 ms 108.72% 8.72% ++++++
Runtime_DAGTaskThroughput_SingleTask 1671.292000 ms 1805.332 ms 108.02% 8.02% +++++
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 277.824000 ms 280.169 ms 100.84% 0.84% +
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 280.959 ms 278.081000 ms 98.98% -1.02% -
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 278.319 ms 273.398000 ms 98.23% -1.77% -
Runtime_IndependentDAGTaskThroughput_SingleTask 279.223 ms 262.259000 ms 93.92% -6.08% ----
Relative perf in group MicroBench (14): 100.151%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 4.999000 ms 5.104 ms 102.10% 2.10% +
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.014000 ms 5.070 ms 101.12% 1.12% +
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.830000 ms 4.880 ms 101.04% 1.04% +
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.064000 ms 5.084 ms 100.39% 0.39% .
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.036000 ms 5.049 ms 100.26% 0.26% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.155000 ms 5.164 ms 100.17% 0.17% .
MicroBench_LocalMem_fp32_4096 30.359000 ms 30.380 ms 100.07% 0.07% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.689000 ms 617.720 ms 100.01% 0.01% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.665000 ms 617.666 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.382 ms 618.329000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.341 ms 618.279000 ms 99.99% -0.01% .
MicroBench_LocalMem_int32_4096 30.379 ms 30.361000 ms 99.94% -0.06% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 5.058 ms 4.989000 ms 98.64% -1.36% -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.966 ms 4.889000 ms 98.45% -1.55% -
Relative perf in group Pattern (10): 100.085%
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_Hierarchical_int32 16.734000 ms 16.827 ms 100.56% 0.56% .
Pattern_Reduction_NDRange_int32 16.704000 ms 16.746 ms 100.25% 0.25% .
Pattern_SegmentedReduction_NDRange_int64 6.187000 ms 6.196 ms 100.15% 0.15% .
Pattern_SegmentedReduction_Hierarchical_int64 12.252000 ms 12.259 ms 100.06% 0.06% .
Pattern_SegmentedReduction_Hierarchical_int32 12.056000 ms 12.060 ms 100.03% 0.03% .
Pattern_SegmentedReduction_Hierarchical_fp32 12.054000 ms 12.057 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int16 12.216000 ms 12.219 ms 100.02% 0.02% .
Pattern_SegmentedReduction_NDRange_fp32 5.716 ms 5.715000 ms 99.98% -0.02% .
Pattern_SegmentedReduction_NDRange_int16 6.083 ms 6.078000 ms 99.92% -0.08% .
Pattern_SegmentedReduction_NDRange_int32 5.723 ms 5.715000 ms 99.86% -0.14% .
Relative perf in group ScalarProduct (6): 100.284%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 6.316000 ms 6.363 ms 100.74% 0.74% .
ScalarProduct_Hierarchical_int32 10.593000 ms 10.643 ms 100.47% 0.47% .
ScalarProduct_Hierarchical_int64 11.565000 ms 11.592 ms 100.23% 0.23% .
ScalarProduct_NDRange_fp32 6.320000 ms 6.331 ms 100.17% 0.17% .
ScalarProduct_NDRange_int64 8.225000 ms 8.234 ms 100.11% 0.11% .
ScalarProduct_Hierarchical_fp32 10.238 ms 10.235000 ms 99.97% -0.03% .
Relative perf in group USM (7): 97.754%
Benchmark This PR baseline Relative perf Change -
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.208000 ms 1.225 ms 101.41% 1.41% +
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.044000 ms 1.049 ms 100.48% 0.48% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.816 ms 1.812000 ms 99.78% -0.22% .
USM_Allocation_latency_fp32_host 37.544 ms 37.417000 ms 99.66% -0.34% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.656 ms 1.644000 ms 99.28% -0.72% .
USM_Allocation_latency_fp32_shared 0.140 ms 0.132000 ms 94.29% -5.71% ----
USM_Allocation_latency_fp32_device 0.159 ms 0.143000 ms 89.94% -10.06% ------
Relative perf in group VectorAddition (3): 98.877%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 1.458 ms 1.449000 ms 99.38% -0.62% .
VectorAddition_int64 3.114 ms 3.075000 ms 98.75% -1.25% -
VectorAddition_fp32 1.470 ms 1.448000 ms 98.50% -1.50% -
Relative perf in group Polybench (3): 99.664%
Benchmark This PR baseline Relative perf Change -
Polybench_2mm 1.217000 ms 1.219 ms 100.16% 0.16% .
Polybench_Atax 6.888 ms 6.851000 ms 99.46% -0.54% .
Polybench_3mm 1.737 ms 1.726000 ms 99.37% -0.63% .
Relative perf in group Kmeans (1): 99.710%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 16.213 ms 16.166000 ms 99.71% -0.29% .
Relative perf in group LinearRegressionCoeff (1): 99.977%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 966.902 ms 966.679000 ms 99.98% -0.02% .
Relative perf in group MolecularDynamics (1): 86.667%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.030 ms 0.026000 ms 86.67% -13.33% ---------

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),23.216,23.151,3.60%,22.419,238.862,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.701,25.769,4.79%,22.234,261.380,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.315,14.228,3.32%,13.622,64.408,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),15.924,15.852,6.57%,14.492,245.754,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),225.309,224.689,1.87%,220.410,434.899,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),118.389,118.075,1.75%,113.044,166.132,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.814,5.680,9.28%,5.275,34.077,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.201,3.210,3.20%,0.552,3.404,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.192,2.170,8.11%,1.980,34.911,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.647,1.642,3.64%,1.553,8.575,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),807.445,807.892,0.39%,776.244,817.869,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.37092 s
361.850809 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00451552 s
bitcracker - total time for whole calculation: 35.4479 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1051 1255 28.5365% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1264 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1272 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1213 1256 32.9351% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1273 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1173 1275 31.849% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1200 1234 32.5821% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1267 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1105 1265 30.0027% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1256 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1119 1266 30.3828% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1273 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1264 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1263 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1127 1258 30.6001% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1252 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1267 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1270 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1272 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1177 1252 31.9576% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1260 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1074 1259 29.161% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1209 1242 32.8265% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1118 1266 30.3557% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1136 1253 30.8444% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1271 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1254 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1262 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1198 1252 32.5278% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1264 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1268 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1261 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1264 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1155 1255 31.3603% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1040 1257 28.2379% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1256 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1222 1260 33.1795% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1258 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1133 1271 30.763% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1268 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 222.5 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.236810e-01 6.157730e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.579470e-01 7.484100e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.311310e-01 7.663040e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.618290e-01 8.280920e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.542520e-01 7.973990e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.543670e-01 7.667230e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.276860e-01 7.669270e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.273290e-01 7.888040e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.265220e-01 7.854820e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.275820e-01 7.605360e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.112e+07 1.112e+07 1.112e+07 0.000e+00 100.00
cycleInit 10 3.492e+06 3.492e+06 3.492e+06 0.000e+00 100.00
cycleTracking 10 7.624e+06 7.624e+06 7.624e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.947e+06 4.947e+06 4.947e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.081e+05 2.081e+05 2.081e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.070e+02 4.070e+02 4.070e+02 0.000e+00 100.00
Figure Of Merit 118.16 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.44354 s
sobelfilter - total time for whole calculation: 0.548075 s

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.281391', '0.279223', '0.268487', '0.268487 0.274069 0.279223 0.283389 0.301786', '0.012696', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.290970', '0.277824', '0.274083', '0.274083 0.275831 0.277824 0.308648 0.318461', '0.020949', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.283061', '0.280959', '0.274784', '0.274784 0.280375 0.280959 0.282855 0.296334', '0.008006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.279481', '0.278319', '0.276103', '0.276103 0.277954 0.278319 0.278685 0.286342', '0.003962', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.709239', '1.708434', '1.707458', '1.707458 1.707524 1.708434 1.710720 1.712058', '0.002055', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.729750', '1.729729', '1.727269', '1.727269 1.728815 1.729729 1.730220 1.732720', '0.002004', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.670420', '1.671292', '1.662264', '1.662264 1.670859 1.671292 1.672660 1.675025', '0.004840', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.685440', '1.684806', '1.683864', '1.683864 1.684232 1.684806 1.687116 1.687183', '0.001596', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005002', '0.004999', '0.004975', '0.004975 0.004976 0.004999 0.005013 0.005047', '0.000030', '25.126502', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005027', '0.005064', '0.004845', '0.004845 0.005031 0.005064 0.005072 0.005120', '0.000106', '25.799900', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005062', '0.005014', '0.004999', '0.004999 0.005011 0.005014 0.005139 0.005148', '0.000075', '25.007022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005051', '0.005036', '0.005005', '0.005005 0.005009 0.005036 0.005090 0.005115', '0.000049', '24.975180', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005150', '0.005155', '0.005111', '0.005111 0.005127 0.005155 0.005171 0.005187', '0.000031', '24.457269', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618363', '0.618341', '0.618318', '0.618318 0.618334 0.618341 0.618402 0.618423', '0.000046', '0.202161', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005028', '0.005058', '0.004953', '0.004953 0.005000 0.005058 0.005059 0.005071', '0.000050', '25.235727', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005280', '0.004966', '0.004873', '0.004873 0.004904 0.004966 0.004996 0.006660', '0.000773', '25.652597', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618371', '0.618382', '0.618342', '0.618342 0.618343 0.618382 0.618390 0.618396', '0.000026', '0.202154', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617691', '0.617689', '0.617657', '0.617657 0.617666 0.617689 0.617708 0.617734', '0.000032', '0.202378', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617659', '0.617665', '0.617588', '0.617588 0.617606 0.617665 0.617710 0.617726', '0.000061', '0.202400', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.004836', '0.004830', '0.004739', '0.004739 0.004814 0.004830 0.004865 0.004930', '0.000070', '26.379238', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030397', '0.030379', '0.030342', '0.030342 0.030370 0.030379 0.030433 0.030461', '0.000049', '10282.778722', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030347', '0.030359', '0.030294', '0.030294 0.030313 0.030359 0.030383 0.030384', '0.000041', '10298.915118', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016840', '0.016734', '0.016729', '0.016729 0.016729 0.016734 0.016771 0.017236', '0.000222', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016716', '0.016704', '0.016329', '0.016329 0.016675 0.016704 0.016708 0.017166', '0.000298', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008230', '0.008225', '0.008213', '0.008213 0.008215 0.008225 0.008248 0.008249', '0.000018', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010241', '0.010238', '0.010207', '0.010207 0.010236 0.010238 0.010244 0.010279', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010586', '0.010593', '0.010552', '0.010552 0.010592 0.010593 0.010593 0.010597', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006326', '0.006316', '0.006314', '0.006314 0.006315 0.006316 0.006333 0.006353', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006321', '0.006320', '0.006308', '0.006308 0.006320 0.006320 0.006323 0.006335', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011571', '0.011565', '0.011528', '0.011528 0.011536 0.011565 0.011587 0.011641', '0.000045', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005720', '0.005723', '0.005707', '0.005707 0.005714 0.005723 0.005727 0.005728', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012219', '0.012216', '0.012212', '0.012212 0.012214 0.012216 0.012218 0.012233', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012058', '0.012056', '0.012052', '0.012052 0.012055 0.012056 0.012064 0.012065', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012253', '0.012252', '0.012231', '0.012231 0.012239 0.012252 0.012254 0.012287', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012055', '0.012054', '0.012048', '0.012048 0.012054 0.012054 0.012055 0.012062', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005718', '0.005716', '0.005711', '0.005711 0.005713 0.005716 0.005721 0.005732', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006086', '0.006083', '0.006071', '0.006071 0.006080 0.006083 0.006086 0.006109', '0.000014', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006187', '0.006187', '0.006180', '0.006180 0.006184 0.006187 0.006191 0.006195', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000127', '0.000159', '0.000047', '0.000047 0.000068 0.000159 0.000177 0.000187', '0.000065', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000130', '0.000140', '0.000073', '0.000073 0.000140 0.000140 0.000147 0.000148', '0.000032', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037512', '0.037544', '0.037398', '0.037398 0.037414 0.037544 0.037586 0.037619', '0.000100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001207', '0.001208', '0.001195', '0.001195 0.001207 0.001208 0.001208 0.001219', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001826', '0.001816', '0.001804', '0.001804 0.001808 0.001816 0.001822 0.001880', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001045', '0.001044', '0.001034', '0.001034 0.001037 0.001044 0.001047 0.001059', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001972', '0.001656', '0.001641', '0.001641 0.001650 0.001656 0.001657 0.003257', '0.000718', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003128', '0.003114', '0.003071', '0.003071 0.003102 0.003114 0.003174 0.003180', '0.000047', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001474', '0.001470', '0.001453', '0.001453 0.001458 0.001470 0.001486 0.001504', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001464', '0.001458', '0.001444', '0.001444 0.001445 0.001458 0.001482 0.001491', '0.000022', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001213', '0.001217', '0.001202', '0.001202 0.001211 0.001217 0.001218 0.001219', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001738', '0.001737', '0.001729', '0.001729 0.001734 0.001737 0.001745 0.001745', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006857', '0.006888', '0.006726', '0.006726 0.006870 0.006888 0.006894 0.006908', '0.000074', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016215', '0.016213', '0.016204', '0.016204 0.016212 0.016213 0.016216 0.016227', '0.000008', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.966865', '0.966902', '0.966653', '0.966653 0.966663 0.966902 0.967051 0.967053', '0.000198', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000035', '0.000030', '0.000028', '0.000028 0.000029 0.000030 0.000032 0.000058', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Copy link

Compute Benchmarks level_zero run (with params: --epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562

Copy link

Compute Benchmarks level_zero run (--epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562
Job status: failure. Test status: skipped.

Copy link

Compute Benchmarks level_zero run (with params: --epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562

Copy link

Compute Benchmarks level_zero run (--epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562
Job status: failure. Test status: skipped.

Copy link

Compute Benchmarks level_zero run (with params: --epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562

Copy link

Compute Benchmarks level_zero run (--epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271364562
Job status: failure. Test status: skipped.

Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271740221

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11271740221
Job status: cancelled. Test status: skipped.

Copy link

Compute Benchmarks level_zero run (with params: --epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271780004

Copy link

Compute Benchmarks level_zero run (--epsilon 0.02):
https://github.com/oneapi-src/unified-runtime/actions/runs/11271780004
Job status: failure. Test status: skipped.

Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11290451582

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11290451582
Job status: success. Test status: success.

Summary

Total 70 benchmarks in mean.
Geomean 99.832%.
Improved 15 Regressed 16 (threshold 0.50%)

(result is better)

Performance change in benchmark groups

Relative perf in group api (6): 102.074%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_sycl SubmitKernel out of order 22.460000 μs 25.633 μs 114.13% 14.13% ++++++++++
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.621000 μs 1.665 μs 102.71% 2.71% ++
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.106000 μs 2.118 μs 100.57% 0.57% .
api_overhead_benchmark_sycl SubmitKernel in order 22.827000 μs 22.919 μs 100.40% 0.40% .
api_overhead_benchmark_ur SubmitKernel out of order 14.323 μs 14.052000 μs 98.11% -1.89% -
api_overhead_benchmark_ur SubmitKernel in order 13.786 μs 13.427000 μs 97.40% -2.60% --
Relative perf in group memory (4): 97.713%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.713000 μs 5.885 μs 103.01% 3.01% ++
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.186000 μs 3.232 μs 101.44% 1.44% +
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 116.779 μs 113.775000 μs 97.43% -2.57% --
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 253.468 μs 226.961000 μs 89.54% -10.46% -------
Relative perf in group miscellaneous (1): 100.269%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 861.128000 μs 863.445 μs 100.27% 0.27% .
Relative perf in group Velocity-Bench (6): 99.528%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench QuickSilver 117.910 MMS/CTT 117.990000 MMS/CTT 99.93% -0.07% .
Velocity-Bench Bitcracker 35.414 s 35.366400 s 99.86% -0.14% .
Velocity-Bench Easywave 242.000 ms 241.000000 ms 99.59% -0.41% .
Velocity-Bench Sobel Filter 555.428 ms 551.561000 ms 99.30% -0.70% .
Velocity-Bench Hashtable 354.756 M keys/sec 358.493626 M keys/sec 98.96% -1.04% -
Velocity-Bench CudaSift 222.696000 ms -
Relative perf in group Runtime (8): 101.237%
Benchmark This PR baseline Relative perf Change -
Runtime_DAGTaskThroughput_NDRangeParallelFor 1790.988000 ms 1837.867 ms 102.62% 2.62% ++
Runtime_DAGTaskThroughput_HierarchicalParallelFor 1810.949000 ms 1857.392 ms 102.56% 2.56% ++
Runtime_DAGTaskThroughput_SingleTask 1761.127000 ms 1805.332 ms 102.51% 2.51% ++
Runtime_DAGTaskThroughput_BasicParallelFor 1854.356000 ms 1886.208 ms 101.72% 1.72% +
Runtime_IndependentDAGTaskThroughput_BasicParallelFor 279.638000 ms 280.169 ms 100.19% 0.19% .
Runtime_IndependentDAGTaskThroughput_SingleTask 261.800000 ms 262.259 ms 100.18% 0.18% .
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor 273.161000 ms 273.398 ms 100.09% 0.09% .
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor 277.841000 ms 278.081 ms 100.09% 0.09% .
Relative perf in group MicroBench (14): 99.834%
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous 5.070000 ms 5.104 ms 100.67% 0.67% .
MicroBench_HostDeviceBandwidth_3D_H2D_Strided 5.043000 ms 5.070 ms 100.54% 0.54% .
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous 5.079000 ms 5.084 ms 100.10% 0.10% .
MicroBench_HostDeviceBandwidth_1D_D2H_Strided 5.161000 ms 5.164 ms 100.06% 0.06% .
MicroBench_HostDeviceBandwidth_2D_D2H_Strided 617.674000 ms 617.720 ms 100.01% 0.01% .
MicroBench_LocalMem_fp32_4096 30.380000 ms 30.380 ms 100.00% 0.00% .
MicroBench_HostDeviceBandwidth_3D_D2H_Strided 617.738 ms 617.666000 ms 99.99% -0.01% .
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous 618.451 ms 618.329000 ms 99.98% -0.02% .
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous 618.472 ms 618.279000 ms 99.97% -0.03% .
MicroBench_LocalMem_int32_4096 30.398 ms 30.361000 ms 99.88% -0.12% .
MicroBench_HostDeviceBandwidth_2D_H2D_Strided 5.065 ms 5.049000 ms 99.68% -0.32% .
MicroBench_HostDeviceBandwidth_1D_H2D_Strided 4.912 ms 4.880000 ms 99.35% -0.65% .
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous 5.041 ms 4.989000 ms 98.97% -1.03% -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous 4.963 ms 4.889000 ms 98.51% -1.49% -
Relative perf in group Pattern (10): 99.782%
Benchmark This PR baseline Relative perf Change -
Pattern_SegmentedReduction_Hierarchical_fp32 12.054000 ms 12.057 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int16 12.216000 ms 12.219 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int32 12.058000 ms 12.060 ms 100.02% 0.02% .
Pattern_SegmentedReduction_NDRange_int16 6.077000 ms 6.078 ms 100.02% 0.02% .
Pattern_SegmentedReduction_NDRange_int64 6.195000 ms 6.196 ms 100.02% 0.02% .
Pattern_SegmentedReduction_Hierarchical_int64 12.260 ms 12.259000 ms 99.99% -0.01% .
Pattern_SegmentedReduction_NDRange_fp32 5.717 ms 5.715000 ms 99.97% -0.03% .
Pattern_SegmentedReduction_NDRange_int32 5.719 ms 5.715000 ms 99.93% -0.07% .
Pattern_Reduction_Hierarchical_int32 16.986 ms 16.827000 ms 99.06% -0.94% -
Pattern_Reduction_NDRange_int32 16.952 ms 16.746000 ms 98.78% -1.22% -
Relative perf in group ScalarProduct (6): 99.939%
Benchmark This PR baseline Relative perf Change -
ScalarProduct_Hierarchical_int32 10.593000 ms 10.643 ms 100.47% 0.47% .
ScalarProduct_NDRange_fp32 6.325000 ms 6.331 ms 100.09% 0.09% .
ScalarProduct_NDRange_int32 6.374 ms 6.363000 ms 99.83% -0.17% .
ScalarProduct_Hierarchical_fp32 10.257 ms 10.235000 ms 99.79% -0.21% .
ScalarProduct_NDRange_int64 8.254 ms 8.234000 ms 99.76% -0.24% .
ScalarProduct_Hierarchical_int64 11.627 ms 11.592000 ms 99.70% -0.30% .
Relative perf in group USM (7): 98.083%
Benchmark This PR baseline Relative perf Change -
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch 1.207000 ms 1.225 ms 101.49% 1.49% +
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch 1.042000 ms 1.049 ms 100.67% 0.67% .
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch 1.801000 ms 1.812 ms 100.61% 0.61% .
USM_Allocation_latency_fp32_host 37.382000 ms 37.417 ms 100.09% 0.09% .
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch 1.647 ms 1.644000 ms 99.82% -0.18% .
USM_Allocation_latency_fp32_device 0.150 ms 0.143000 ms 95.33% -4.67% ---
USM_Allocation_latency_fp32_shared 0.148 ms 0.132000 ms 89.19% -10.81% --------
Relative perf in group VectorAddition (3): 98.397%
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 1.458 ms 1.449000 ms 99.38% -0.62% .
VectorAddition_int64 3.107 ms 3.075000 ms 98.97% -1.03% -
VectorAddition_fp32 1.495 ms 1.448000 ms 96.86% -3.14% --
Relative perf in group Polybench (3): 100.261%
Benchmark This PR baseline Relative perf Change -
Polybench_Atax 6.778000 ms 6.851 ms 101.08% 1.08% +
Polybench_2mm 1.219000 ms 1.219 ms 100.00% 0.00% .
Polybench_3mm 1.731 ms 1.726000 ms 99.71% -0.29% .
Relative perf in group Kmeans (1): 99.969%
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 16.171 ms 16.166000 ms 99.97% -0.03% .
Relative perf in group LinearRegressionCoeff (1): 99.978%
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 966.892 ms 966.679000 ms 99.98% -0.02% .
Relative perf in group MolecularDynamics (1): 100.000%
Benchmark This PR baseline Relative perf Change -
MolecularDynamics 0.026000 ms 0.026 ms 100.00% 0.00% .

Details

Benchmark details - environment, command, output...
api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),22.460,22.427,4.42%,21.691,316.403,[CPU],[us]

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),22.827,22.796,1.52%,22.032,76.333,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),14.323,14.300,2.18%,13.850,66.544,[CPU],[us]

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),13.786,13.767,5.12%,13.095,221.034,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),253.468,254.795,2.41%,223.102,516.426,[CPU],[us]

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),116.779,116.794,1.40%,110.592,153.781,[CPU],[us]

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),5.713,5.627,13.16%,5.214,53.853,[CPU],[us]

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.186,3.203,3.53%,0.547,3.463,[CPU],[GB/s]

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.106,2.084,5.91%,1.891,10.528,[CPU],[us]

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),1.621,1.613,4.26%,1.528,6.745,[CPU],[us]

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),861.128,861.548,0.45%,814.692,880.540,[GPU],bw [GB/s]

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.378338 s
354.755890 million keys/second

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00434045 s
bitcracker - total time for whole calculation: 35.4145 s

Velocity-Bench CudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

UNKN:

UNKN: ==================================================
UNKN: User input parameters:
UNKN: Trace: ../../inputData
UNKN: ==================================================
UNKN:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1259 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1272 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1219 1253 33.098% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1261 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1262 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1269 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1262 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1038 1259 28.1835% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1040 1257 28.2379% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1184 1265 32.1477% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1031 1268 27.9935% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1260 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1042 1258 28.2922% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1252 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1099 1264 29.8398% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1183 1270 32.1206% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1259 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1113 1263 30.2199% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1266 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1266 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1274 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1267 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1257 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1107 1266 30.057% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1264 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1080 1262 29.3239% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1022 1256 27.7491% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1263 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1274 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1070 1263 29.0524% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1102 1264 29.9213% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1257 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1265 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1266 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1077 1260 29.2425% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1088 1256 29.5411% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1095 1274 29.7312% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1267 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1143 1256 31.0345% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1086 1261 29.4868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1119 1266 30.3828% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 222.696 ms

Velocity-Bench Easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.30049+10)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.329160e-01 6.125040e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.676720e-01 7.568170e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.641780e-01 7.716000e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.958630e-01 8.289550e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.441940e-01 8.039000e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.337390e-01 7.664410e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.326530e-01 7.678420e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.358090e-01 7.868140e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.343930e-01 7.849760e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.340070e-01 7.605150e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.122e+07 1.122e+07 1.122e+07 0.000e+00 100.00
cycleInit 10 3.575e+06 3.575e+06 3.575e+06 0.000e+00 100.00
cycleTracking 10 7.640e+06 7.640e+06 7.640e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.946e+06 4.946e+06 4.946e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.134e+05 2.134e+05 2.134e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.150e+02 4.150e+02 4.150e+02 0.000e+00 100.00
Figure Of Merit 117.91 [Num Mega Segments / Cycle Tracking Time]

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.46201 s
sobelfilter - total time for whole calculation: 0.555428 s

Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.273257', '0.273161', '0.272544', '0.272544 0.273019 0.273161 0.273589 0.273974', '0.000547', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.262455', '0.261800', '0.254583', '0.254583 0.254589 0.261800 0.263321 0.277981', '0.009566', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.284461', '0.277841', '0.275498', '0.275498 0.276219 0.277841 0.279134 0.313612', '0.016357', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_IndependentDAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_independent --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/IndependentDAGTaskThroughput_multi.csv --size=32768

Output:

['Runtime_IndependentDAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '32768', '0.284269', '0.279638', '0.274740', '0.274740 0.276283 0.279638 0.294870 0.295812', '0.010267', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_SingleTask

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_SingleTask', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.760878', '1.761127', '1.758656', '1.758656 1.759416 1.761127 1.761501 1.763689', '0.001964', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_HierarchicalParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_HierarchicalParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.810039', '1.810949', '1.806372', '1.806372 1.810361 1.810949 1.811107 1.811407', '0.002085', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_NDRangeParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_NDRangeParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.788752', '1.790988', '1.783039', '1.783039 1.785840 1.790988 1.791192 1.792703', '0.004113', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Runtime_DAGTaskThroughput_BasicParallelFor

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/dag_task_throughput_sequential --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/DAGTaskThroughput_multi.csv --size=327680

Output:

['Runtime_DAGTaskThroughput_BasicParallelFor', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '327680', '1.854453', '1.854356', '1.849584', '1.849584 1.850909 1.854356 1.855688 1.861725', '0.004762', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005037', '0.005041', '0.004931', '0.004931 0.005023 0.005041 0.005080 0.005113', '0.000069', '25.351545', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618424', '0.618451', '0.618239', '0.618239 0.618438 0.618451 0.618463 0.618528', '0.000109', '0.202187', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005037', '0.004912', '0.004870', '0.004870 0.004878 0.004912 0.004955 0.005572', '0.000301', '25.666297', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617684', '0.617674', '0.617654', '0.617654 0.617658 0.617674 0.617687 0.617749', '0.000039', '0.202379', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005332', '0.004963', '0.004933', '0.004933 0.004943 0.004963 0.004989 0.006831', '0.000838', '25.337603', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005106', '0.005079', '0.005056', '0.005056 0.005069 0.005079 0.005151 0.005173', '0.000053', '24.722700', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_1D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_1D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005163', '0.005161', '0.005133', '0.005133 0.005140 0.005161 0.005162 0.005219', '0.000034', '24.353607', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_D2H_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_D2H_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.617717', '0.617738', '0.617672', '0.617672 0.617675 0.617738 0.617741 0.617757', '0.000040', '0.202373', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005101', '0.005070', '0.004962', '0.004962 0.004975 0.005070 0.005233 0.005265', '0.000142', '25.188963', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005074', '0.005065', '0.005040', '0.005040 0.005062 0.005065 0.005070 0.005131', '0.000034', '24.800893', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.618462', '0.618472', '0.618382', '0.618382 0.618434 0.618472 0.618496 0.618525', '0.000056', '0.202140', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_HostDeviceBandwidth_3D_H2D_Strided

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/host_device_bandwidth --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/HostDeviceBandwidth_multi.csv --size=512

Output:

['MicroBench_HostDeviceBandwidth_3D_H2D_Strided', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.005039', '0.005043', '0.004958', '0.004958 0.004991 0.005043 0.005050 0.005156', '0.000075', '25.211098', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '0.125000']

MicroBench_LocalMem_fp32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_fp32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030353', '0.030380', '0.030278', '0.030278 0.030285 0.030380 0.030402 0.030419', '0.000067', '10304.573807', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

MicroBench_LocalMem_int32_4096

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/local_mem --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LocalMem_multi.csv --size=10240000

Output:

['MicroBench_LocalMem_int32_4096', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.030420', '0.030398', '0.030378', '0.030378 0.030397 0.030398 0.030422 0.030503', '0.000049', '10270.560469', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '312.000000']

Pattern_Reduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.017008', '0.016986', '0.016967', '0.016967 0.016981 0.016986 0.017017 0.017091', '0.000050', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_Reduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/reduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_Reduction_multi.csv --size=10240000

Output:

['Pattern_Reduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '10240000', '0.016907', '0.016952', '0.016714', '0.016714 0.016939 0.016952 0.016962 0.016969', '0.000109', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010255', '0.010257', '0.010237', '0.010237 0.010238 0.010257 0.010263 0.010282', '0.000019', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006379', '0.006374', '0.006311', '0.006311 0.006331 0.006374 0.006429 0.006449', '0.000060', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.010586', '0.010593', '0.010547', '0.010547 0.010589 0.010593 0.010596 0.010605', '0.000023', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.011602', '0.011627', '0.011528', '0.011528 0.011573 0.011627 0.011632 0.011649', '0.000050', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.008248', '0.008254', '0.008223', '0.008223 0.008228 0.008254 0.008265 0.008269', '0.000021', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

ScalarProduct_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/scalar_prod --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/ScalarProduct_multi.csv --size=102400000

Output:

['ScalarProduct_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006318', '0.006325', '0.006293', '0.006293 0.006309 0.006325 0.006331 0.006334', '0.000017', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006194', '0.006195', '0.006187', '0.006187 0.006187 0.006195 0.006200 0.006201', '0.000007', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005717', '0.005717', '0.005710', '0.005710 0.005714 0.005717 0.005720 0.005723', '0.000005', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012061', '0.012058', '0.012052', '0.012052 0.012056 0.012058 0.012058 0.012079', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.005720', '0.005719', '0.005712', '0.005712 0.005718 0.005719 0.005723 0.005727', '0.000006', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_NDRange_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_NDRange_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.006082', '0.006077', '0.006073', '0.006073 0.006074 0.006077 0.006081 0.006103', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012056', '0.012054', '0.012048', '0.012048 0.012049 0.012054 0.012058 0.012072', '0.000010', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int16

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int16', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012218', '0.012216', '0.012211', '0.012211 0.012214 0.012216 0.012217 0.012234', '0.000009', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Pattern_SegmentedReduction_Hierarchical_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/segmentedreduction --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Pattern_SegmentedReduction_multi.csv --size=102400000

Output:

['Pattern_SegmentedReduction_Hierarchical_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.012262', '0.012260', '0.012242', '0.012242 0.012244 0.012260 0.012263 0.012301', '0.000024', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_shared

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_shared', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000131', '0.000148', '0.000076', '0.000076 0.000135 0.000148 0.000148 0.000148', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_device

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_device', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.000179', '0.000150', '0.000049', '0.000049 0.000086 0.000150 0.000190 0.000418', '0.000144', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Allocation_latency_fp32_host

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_allocation_latency --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Allocation_latency_multi.csv --size=1024000000

Output:

['USM_Allocation_latency_fp32_host', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1024000000', '0.037413', '0.037382', '0.037119', '0.037119 0.037364 0.037382 0.037599 0.037601', '0.000200', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001814', '0.001801', '0.001795', '0.001795 0.001797 0.001801 0.001803 0.001877', '0.000035', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001211', '0.001207', '0.001202', '0.001202 0.001205 0.001207 0.001209 0.001231', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001981', '0.001647', '0.001634', '0.001634 0.001641 0.001647 0.001654 0.003332', '0.000755', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/usm_instr_mix --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/USM_Instr_Mix_multi.csv --size=8192

Output:

['USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.001045', '0.001042', '0.001036', '0.001036 0.001039 0.001042 0.001043 0.001067', '0.000012', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001479', '0.001495', '0.001443', '0.001443 0.001448 0.001495 0.001500 0.001507', '0.000031', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int64

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int64', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.003097', '0.003107', '0.003066', '0.003066 0.003077 0.003107 0.003108 0.003128', '0.000025', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

VectorAddition_int32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/vec_add --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/VectorAddition_multi.csv --size=102400000

Output:

['VectorAddition_int32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '102400000', '0.001471', '0.001458', '0.001444', '0.001444 0.001456 0.001458 0.001494 0.001503', '0.000026', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_2mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/2mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/2mm.csv --size=512

Output:

['Polybench_2mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001224', '0.001219', '0.001213', '0.001213 0.001218 0.001219 0.001227 0.001241', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_3mm

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/3mm --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/3mm.csv --size=512

Output:

['Polybench_3mm', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '512', '0.001732', '0.001731', '0.001719', '0.001719 0.001728 0.001731 0.001731 0.001749', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Polybench_Atax

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/atax --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Atax.csv --size=8192

Output:

['Polybench_Atax', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8192', '0.006769', '0.006778', '0.006696', '0.006696 0.006716 0.006778 0.006788 0.006869', '0.000068', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

Kmeans_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/kmeans --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/Kmeans.csv --size=700000000

Output:

['Kmeans_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '700000000', '0.016176', '0.016171', '0.016167', '0.016167 0.016170 0.016171 0.016181 0.016194', '0.000011', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

LinearRegressionCoeff_fp32

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/lin_reg_coeff --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/LinearRegressionCoeff.csv --size=1638400000

Output:

['LinearRegressionCoeff_fp32', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '1638400000', '0.967064', '0.966892', '0.966668', '0.966668 0.966803 0.966892 0.967335 0.967621', '0.000400', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

MolecularDynamics

Environment Variables:

Command:

/home/test-user/bench_workdir/sycl-bench-build/mol_dyn --warmup-run --num-runs=5 --output=/home/test-user/bench_workdir/MolecularDynamics.csv --size=8196

Output:

['MolecularDynamics', 'PASS', 'Intel(R) Data Center GPU Max 1100', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', '256', '8196', '0.000032', '0.000026', '0.000024', '0.000024 0.000025 0.000026 0.000029 0.000056', '0.000013', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'LLVM (Intel DPC++)', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A', 'N/A']

@pbalcer pbalcer merged commit 568a96a into oneapi-src:main Oct 15, 2024
50 checks passed
Copy link

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/11345546153

Copy link

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/11345546153
Job status: failure. Test status: skipped.

@mateuszpn mateuszpn deleted the bench-update branch October 15, 2024 12:05
igchor pushed a commit to igchor/unified-runtime that referenced this pull request Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants