Improves performance of dspmv, dspr, dsyr, and sdot

Pre-release
@luhenry released this 18 Dec 09:50
· 291 commits to master since this release

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: k | Param: m | Param: n | Param: trans | Param: transa | Param: transb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| dev.ludovic.blas.benchmarks.DaxpyBenchmark blas | thrpt | 1 | 6 | 9891830.622 | 1028065.96 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DaxpyBenchmark f2j | thrpt | 1 | 6 | 17318477.350 | 359818.65 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DaxpyBenchmark vector | thrpt | 1 | 6 | 44742539.869 | 18923558.24 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DaxpyBenchmark blas | thrpt | 1 | 6 | 96.853 | 11.44 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DaxpyBenchmark f2j | thrpt | 1 | 6 | 106.407 | 4.43 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DaxpyBenchmark vector | thrpt | 1 | 6 | 121.109 | 6.17 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark blas | thrpt | 1 | 6 | 9954784.951 | 173474.76 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark f2j | thrpt | 1 | 6 | 10859531.956 | 15222.60 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark vector | thrpt | 1 | 6 | 47541964.333 | 600666.13 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark blas | thrpt | 1 | 6 | 111.008 | 12.33 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark f2j | thrpt | 1 | 6 | 89.610 | 0.39 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DdotBenchmark vector | thrpt | 1 | 6 | 155.458 | 0.76 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2047546.577 | 12526.45 | ops/s | 10 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 923563.891 | 6930.27 | ops/s | 10 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 922385.764 | 14522.30 | ops/s | 10 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 78249.491 | 457.42 | ops/s | 1000 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 10646.235 | 744.95 | ops/s | 1000 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 10732.174 | 384.81 | ops/s | 1000 | 10 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 82551.452 | 2137.37 | ops/s | 10 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 17799.175 | 48.49 | ops/s | 10 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 17778.241 | 62.63 | ops/s | 10 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2424.799 | 995.14 | ops/s | 1000 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 171.261 | 3.06 | ops/s | 1000 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 171.975 | 1.18 | ops/s | 1000 | 1000 | 10 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 80907.458 | 1471.73 | ops/s | 10 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 9834.701 | 23.26 | ops/s | 10 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 9834.416 | 63.73 | ops/s | 10 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 1879.319 | 1764.10 | ops/s | 1000 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 106.984 | 1.04 | ops/s | 1000 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 106.900 | 0.70 | ops/s | 1000 | 10 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2168.814 | 70.41 | ops/s | 10 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 171.909 | 1.55 | ops/s | 10 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 172.358 | 0.75 | ops/s | 10 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 106.865 | 1.63 | ops/s | 1000 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 1.713 | 0.02 | ops/s | 1000 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 1.661 | 0.18 | ops/s | 1000 | 1000 | 1000 | | N | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2063000.639 | 13726.63 | ops/s | 10 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 937191.069 | 3797.43 | ops/s | 10 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 1137394.041 | 4013.79 | ops/s | 10 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 79769.902 | 682.91 | ops/s | 1000 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 10898.293 | 6.75 | ops/s | 1000 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 26585.125 | 399.26 | ops/s | 1000 | 10 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 81745.215 | 4755.09 | ops/s | 10 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 10026.385 | 251.08 | ops/s | 10 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 13146.593 | 151.41 | ops/s | 10 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2615.017 | 386.57 | ops/s | 1000 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 105.069 | 0.51 | ops/s | 1000 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 219.903 | 7.32 | ops/s | 1000 | 1000 | 10 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 81429.819 | 731.09 | ops/s | 10 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 9782.387 | 19.98 | ops/s | 10 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 12203.720 | 1882.68 | ops/s | 10 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2204.799 | 89.43 | ops/s | 1000 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 107.516 | 0.68 | ops/s | 1000 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 250.123 | 2.58 | ops/s | 1000 | 10 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2184.589 | 67.15 | ops/s | 10 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 98.956 | 0.91 | ops/s | 10 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 117.876 | 1.28 | ops/s | 10 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 99.506 | 50.37 | ops/s | 1000 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 1.047 | 0.00 | ops/s | 1000 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 2.185 | 0.05 | ops/s | 1000 | 1000 | 1000 | | T | N |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 1938319.922 | 8216.59 | ops/s | 10 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 925870.239 | 4411.42 | ops/s | 10 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 922341.430 | 2018.15 | ops/s | 10 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 63373.857 | 328.00 | ops/s | 1000 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 10038.596 | 619.56 | ops/s | 1000 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 10088.526 | 749.38 | ops/s | 1000 | 10 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 82053.302 | 2790.34 | ops/s | 10 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 17243.272 | 1084.71 | ops/s | 10 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 16992.581 | 32.94 | ops/s | 10 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2399.037 | 812.39 | ops/s | 1000 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 157.014 | 3.05 | ops/s | 1000 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 154.813 | 5.45 | ops/s | 1000 | 1000 | 10 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 68170.582 | 3560.16 | ops/s | 10 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 9321.041 | 562.80 | ops/s | 10 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 9169.666 | 68.11 | ops/s | 10 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 1960.171 | 23.78 | ops/s | 1000 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 75.022 | 1.66 | ops/s | 1000 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 74.882 | 2.06 | ops/s | 1000 | 10 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2166.652 | 107.49 | ops/s | 10 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 156.969 | 0.12 | ops/s | 10 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 156.916 | 0.41 | ops/s | 10 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 104.763 | 1.61 | ops/s | 1000 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 1.523 | 0.01 | ops/s | 1000 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 1.475 | 0.23 | ops/s | 1000 | 1000 | 1000 | | N | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 1958542.124 | 5059.70 | ops/s | 10 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 703158.229 | 12245.54 | ops/s | 10 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 703801.605 | 16839.36 | ops/s | 10 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 63565.749 | 503.66 | ops/s | 1000 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 10687.994 | 8.36 | ops/s | 1000 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 10670.900 | 93.57 | ops/s | 1000 | 10 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 82941.985 | 1461.85 | ops/s | 10 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 7728.476 | 6.05 | ops/s | 10 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 7732.597 | 4.55 | ops/s | 10 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2688.565 | 54.35 | ops/s | 1000 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 98.873 | 7.21 | ops/s | 1000 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 97.551 | 9.58 | ops/s | 1000 | 1000 | 10 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 68617.303 | 161.47 | ops/s | 10 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 7422.335 | 354.50 | ops/s | 10 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 7526.837 | 25.80 | ops/s | 10 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 1961.076 | 41.30 | ops/s | 1000 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 47.116 | 1.15 | ops/s | 1000 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 47.395 | 0.36 | ops/s | 1000 | 10 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 2184.035 | 59.28 | ops/s | 10 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 74.029 | 0.16 | ops/s | 10 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 74.001 | 0.37 | ops/s | 10 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark blas | thrpt | 1 | 6 | 105.112 | 2.77 | ops/s | 1000 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark f2j | thrpt | 1 | 6 | 0.507 | 0.00 | ops/s | 1000 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemmBenchmark vector | thrpt | 1 | 6 | 0.506 | 0.01 | ops/s | 1000 | 1000 | 1000 | | T | T |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 5036946.031 | 14198.03 | ops/s | | 10 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 7968046.453 | 16005.21 | ops/s | | 10 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 7753130.822 | 20001.27 | ops/s | | 10 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 52098.597 | 1745.40 | ops/s | | 10000 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 15904.137 | 55.87 | ops/s | | 10000 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 15969.944 | 157.17 | ops/s | | 10000 | 10 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 49867.206 | 48140.54 | ops/s | | 10 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 9063.373 | 8.47 | ops/s | | 10 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 9060.903 | 9.15 | ops/s | | 10 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 21.057 | 0.71 | ops/s | | 10000 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 13.394 | 0.30 | ops/s | | 10000 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 13.422 | 0.32 | ops/s | | 10000 | 10000 | N | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 4818015.374 | 34381.58 | ops/s | | 10 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 8282120.525 | 45691.98 | ops/s | | 10 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 11053548.803 | 84947.47 | ops/s | | 10 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 135056.171 | 6646.08 | ops/s | | 10000 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 10867.628 | 5.85 | ops/s | | 10000 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 26082.600 | 1834.41 | ops/s | | 10000 | 10 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 49742.504 | 1263.88 | ops/s | | 10 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 9961.732 | 1364.97 | ops/s | | 10 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 11702.483 | 2933.48 | ops/s | | 10 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark blas | thrpt | 1 | 6 | 22.636 | 7.50 | ops/s | | 10000 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark f2j | thrpt | 1 | 6 | 9.573 | 0.01 | ops/s | | 10000 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DgemvBenchmark vector | thrpt | 1 | 6 | 17.048 | 0.22 | ops/s | | 10000 | 10000 | T | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark blas | thrpt | 1 | 6 | 566008.148 | 20903.34 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark f2j | thrpt | 1 | 6 | 301450.652 | 279.13 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark vector | thrpt | 1 | 6 | 1154920.952 | 59650.94 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark blas | thrpt | 1 | 6 | 59.563 | 0.82 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark f2j | thrpt | 1 | 6 | 3.011 | 0.00 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DscalBenchmark vector | thrpt | 1 | 6 | 11.145 | 0.02 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark blas | thrpt | 1 | 6 | 2796587.673 | 7609.43 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark f2j | thrpt | 1 | 6 | 7585193.214 | 19449.63 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark vector | thrpt | 1 | 6 | 9682274.504 | 98374.74 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark blas | thrpt | 1 | 6 | 6614.227 | 511.27 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark f2j | thrpt | 1 | 6 | 1607.448 | 4.57 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DspmvBenchmark vector | thrpt | 1 | 6 | 3323.576 | 58.79 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark blas | thrpt | 1 | 6 | 3916849.603 | 30989.57 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark f2j | thrpt | 1 | 6 | 10609179.052 | 61752.40 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark vector | thrpt | 1 | 6 | 15466671.558 | 404954.45 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark blas | thrpt | 1 | 6 | 31331.850 | 1414.51 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark f2j | thrpt | 1 | 6 | 3366.354 | 18.58 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsprBenchmark vector | thrpt | 1 | 6 | 8449.482 | 109.29 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark blas | thrpt | 1 | 6 | 4060693.461 | 125839.80 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark f2j | thrpt | 1 | 6 | 11027797.321 | 528986.53 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark vector | thrpt | 1 | 6 | 16096321.087 | 548155.51 | ops/s | | | 10 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark blas | thrpt | 1 | 6 | 30480.039 | 1897.84 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark f2j | thrpt | 1 | 6 | 3177.229 | 27.59 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.DsyrBenchmark vector | thrpt | 1 | 6 | 7269.102 | 419.43 | ops/s | | | 1000 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark blas | thrpt | 1 | 6 | 10972373.748 | 117429.87 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark f2j | thrpt | 1 | 6 | 10897574.968 | 197763.01 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark vector | thrpt | 1 | 6 | 52071656.932 | 1486327.74 | ops/s | | | 100 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark blas | thrpt | 1 | 6 | 327.433 | 1.75 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark f2j | thrpt | 1 | 6 | 99.476 | 1.89 | ops/s | | | 10000000 | | | |
| dev.ludovic.blas.benchmarks.SdotBenchmark vector | thrpt | 1 | 6 | 322.310 | 3.51 | ops/s | | | 10000000 | | | |
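
To read the results for the four routines named in the title, it can help to turn the raw throughput scores into ratios. The following is a minimal sketch, not part of the project: the class name `SpeedupSketch` and its helpers are hypothetical, and the scores are copied by hand from the `vector` and `f2j` rows above at the largest benchmarked size. It compares the two implementations within this release only; it says nothing about earlier releases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the benchmark results; not part of the library.
final class SpeedupSketch {

    /** Ratio of two throughput scores (ops/s); a value above 1 means the first is faster. */
    static double speedup(double score, double baseline) {
        return score / baseline;
    }

    /** Scores copied from the results above as {vector, f2j}, at the largest benchmarked n. */
    static Map<String, double[]> scores() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("dspmv (n=1000)", new double[] {3323.576, 1607.448});
        m.put("dspr (n=1000)", new double[] {8449.482, 3366.354});
        m.put("dsyr (n=1000)", new double[] {7269.102, 3177.229});
        m.put("sdot (n=10000000)", new double[] {322.310, 99.476});
        return m;
    }

    public static void main(String[] args) {
        // Print one line per routine with the vector/f2j throughput ratio.
        for (Map.Entry<String, double[]> e : scores().entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-18s vector/f2j = %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```

At these sizes the `vector` implementation comes out at roughly 2.1x, 2.5x, 2.3x, and 3.2x the `f2j` throughput for dspmv, dspr, dsyr, and sdot respectively. The ratios ignore the reported 99.9% error, so treat them as rough indications for rows where that error is large relative to the score.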