
Detailed MLX benchmark

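Detailed runtime benchmark of mlx operations. All runtimes are in milliseconds (lower is better).

  • mlx_gpu: mlx framework with gpu backend
  • mlx_gpu_compile: mlx framework with gpu backend, with the operation compiled using mx.compile
  • mlx_cpu: mlx framework with cpu backend
  • cpu: torch framework with cpu backend
  • mps: torch framework with mps (gpu) backend
  • mlx_gpu_compile/mlx_gpu speedup: runtime speedup of mlx_gpu_compile compared to mlx_gpu
  • mlx_gpu/mps speedup: runtime speedup of mlx_gpu compared to mps
  • mlx_gpu/mlx_cpu speedup: runtime speedup of mlx_gpu compared to mlx_cpu
  • cuda/cpu speedup: runtime speedup of cuda compared to cpu

A speedup of A over B is (runtime of B / runtime of A - 1) × 100%. Positive values mean A is faster than B, and negative values mean A is slower. nan marks a backend for which no runtime is reported.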
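The speedup columns can be recomputed from the runtime columns. Below is a minimal Python sketch. It assumes a speedup of A over B equals (runtime_B / runtime_A - 1) expressed in percent. This convention is inferred from the table values: for example, M1 Argmax axi=0 gives 3.93 / 2.72 ≈ 1.44, which is reported as +44%.

The helper names `speedup_pct` and `format_pct` are hypothetical and are not taken from the benchmark's own scripts. The published percentages were presumably computed from unrounded runtimes, so recomputing them from the two-decimal values shown here can differ in the last digit.

```python
def speedup_pct(a_ms: float, b_ms: float) -> int:
    """Speedup of backend A over backend B, in percent (hypothetical helper).

    Assumes the convention (runtime_B / runtime_A - 1) * 100:
    positive means A is faster than B, negative means A is slower.
    """
    if a_ms <= 0 or b_ms <= 0:
        raise ValueError("runtimes must be positive")
    return round((b_ms / a_ms - 1) * 100)


def format_pct(pct: int) -> str:
    # Always show the sign, as the tables do for non-zero values, e.g. "+44%".
    return f"{pct:+d}%"


if __name__ == "__main__":
    # M1, Argmax / dim=64x1024x128 axi=0: mlx_gpu 2.72 ms vs mps 3.93 ms
    print(format_pct(speedup_pct(2.72, 3.93)))  # +44%
    # M1, Conv1d / dim=100x256x3 dim=8x3x3: mlx_gpu 1.13 ms vs mps 0.55 ms
    print(format_pct(speedup_pct(1.13, 0.55)))  # -51%
```

To recompute a column such as mlx_gpu_compile/mlx_gpu speedup, pass the mlx_gpu_compile runtime as A and the mlx_gpu runtime as B.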

Apple Silicon

M1 (cores: 4E+4P+8GPU)

| Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 2.72 | 12.43 | 3.93 | 20.56 | +44% | +356% |
| Argmax / dim=64x1024x128 axi=1 | 1.55 | 11.21 | 3.38 | 6.23 | +117% | +621% |
| Argmax / dim=64x1024x128 axi=2 | 1.55 | 8.56 | 2.69 | 3.04 | +73% | +452% |
| Argmax / dim=64x128x1024 axi=2 | 1.42 | 10.32 | 1.49 | 2.35 | +5% | +626% |
| BCE / dim=1000000 dim=1000000 | 1.03 | 7.38 | 1.64 | 1.75 | +58% | +615% |
| BCE / dim=100000x32 dim=100000x32 | 2.53 | 22.57 | 4.90 | 4.59 | +93% | +791% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 8.96 | 87.92 | 21.47 | 18.42 | +139% | +880% |
| BCE / dim=128x100000 dim=128x100000 | 9.52 | 89.39 | 20.74 | 18.74 | +117% | +839% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 14.11 | 64.02 | 13.84 | 38.98 | -1% | +353% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 27.23 | 148.54 | 27.91 | 77.07 | +2% | +445% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 17.61 | 85.42 | 17.58 | 41.08 | 0% | +385% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 17.57 | 105.55 | 17.77 | 41.40 | +1% | +500% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 1.13 | 0.33 | 0.55 | 2.39 | -51% | -70% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 6.46 | 9.77 | 5.85 | 54.03 | -9% | +51% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 5.41 | 7.43 | 7.07 | 359.73 | +30% | +37% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 2.33 | 0.58 | 1.45 | 48.37 | -37% | -75% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 80.43 | 935.78 | 11.06 | 127.84 | -86% | +1063% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 29.16 | 425.12 | 8.57 | 32.76 | -70% | +1358% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 29.64 | 703.02 | 9.62 | 56.69 | -67% | +2271% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.57 | 10.70 | 1.11 | 1.35 | -56% | +316% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 8.82 | 108.78 | 4.94 | 9.04 | -44% | +1133% |
| LeakyReLU / dim=128x16x1024 | 0.98 | 1.44 | 0.67 | 0.70 | -31% | +47% |
| LeakyReLU / dim=64x128x1024 | 3.14 | 4.37 | 1.65 | 2.05 | -47% | +39% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 22.13 | 60.20 | 13.41 | 62.32 | -39% | +171% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 22.15 | 64.97 | 21.02 | 72.80 | -5% | +193% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 40.84 | 96.80 | 79.19 | 170.94 | +93% | +137% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 66.18 | 144.25 | 153.90 | 281.38 | +132% | +117% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.75 | 0.40 | 1.00 | 0.96 | +32% | -47% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.56 | 0.90 | 0.94 | 0.81 | +65% | +60% |
| MatMul / dim=1000x64x256 dim=256x32 | 1.70 | 4.17 | 5.03 | 15.57 | +195% | +144% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 8.28 | 17.74 | 21.41 | 253.21 | +158% | +114% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 25.69 | 93.79 | 25.91 | 1506.15 | +0% | +265% |
| MatMul / dim=64x1000000 dim=1000000x32 | 41.69 | 18.05 | 33.39 | 56.56 | -19% | -56% |
| MatMul / dim=1000000x64 dim=64x1024 | 80.34 | 428.28 | 200.55 | 1194.50 | +149% | +433% |
| PReLU / dim=128x16x1024 dim=1 | 1.52 | 2.48 | 0.65 | 0.56 | -57% | +63% |
| PReLU / dim=64x128x1024 dim=1 | 5.48 | 6.59 | 1.65 | 2.07 | -69% | +20% |
| ReLU / dim=128x16x1024 | 0.54 | 0.53 | 0.66 | 0.68 | +21% | -1% |
| ReLU / dim=64x128x1024 | 1.43 | 1.27 | 1.60 | 2.01 | +12% | -11% |
| SeLU / dim=128x16x1024 | 3.34 | 6.65 | 0.65 | 3.26 | -80% | +98% |
| SeLU / dim=64x128x1024 | 12.27 | 22.81 | 1.63 | 12.17 | -86% | +85% |
| Sigmoid / dim=128x16x1024 | 0.51 | 13.28 | 0.67 | 3.24 | +31% | +2511% |
| Sigmoid / dim=64x128x1024 | 1.42 | 52.05 | 1.64 | 11.21 | +15% | +3565% |
| Softmax / dim=64x1000000 axi=-1 | 13.33 | 36.49 | 17.54 | 55.15 | +31% | +173% |
| Softmax / dim=1000000x64 axi=-1 | 8.85 | 34.18 | 21.06 | 59.54 | +138% | +286% |
| Softmax / dim=64x16x32x1024 axi=-1 | 4.81 | 18.03 | 16.84 | 23.87 | +250% | +274% |
| Softmax / dim=128x16x32x1024 axi=-1 | 9.31 | 36.18 | 31.50 | 54.39 | +238% | +288% |
| Softmax / dim=1024x16x32x128 axi=-1 | 9.43 | 34.99 | 22.24 | 57.47 | +135% | +271% |
| Softmax / dim=1024x64x32x8 axi=-1 | 15.19 | 86.01 | 6.42 | 29.72 | -57% | +466% |
| Softplus / dim=128x16x1024 | 0.69 | 13.48 | 0.94 | 4.55 | +35% | +1860% |
| Softplus / dim=64x128x1024 | 1.46 | 52.67 | 2.52 | 17.42 | +72% | +3503% |
| Sort / dim=64x128x1024 axi=0 | 34.19 | 900.00 | 95.15 | 72.40 | +178% | +2532% |
| Sort / dim=64x128x1024 axi=1 | 18.08 | 976.16 | 85.37 | 63.35 | +372% | +5298% |
| Sort / dim=64x128x1024 axi=2 | 3.20 | 263.54 | 39.22 | 74.59 | +1123% | +8124% |
| Sum / dim=64x128x128x128 axi=0 | 11.36 | 17.39 | 30.79 | 17.97 | +170% | +53% |
| Sum / dim=64x128x128x128 axi=1 | 9.74 | 12.66 | 10.87 | 15.02 | +11% | +30% |
| Sum / dim=64x128x128x128 axi=2 | 9.39 | 11.23 | 10.52 | 10.80 | +11% | +19% |
| Sum / dim=64x128x128x128 axi=3 | 14.82 | 9.51 | 13.21 | 9.90 | -10% | -35% |
| SumAll / dim=64x128x128x128 | 9.02 | 8.81 | 9.82 | 9.38 | +8% | -2% |
| SumAll / dim=1000000 | 0.52 | 0.08 | 0.43 | 0.07 | -17% | -85% |
| SumAll / dim=1000000x128 | 9.13 | 9.22 | 9.89 | 9.34 | +8% | +0% |
| SumAll / dim=128x1000000 | 8.95 | 9.28 | 9.46 | 9.18 | +5% | +3% |

M1 Pro (16GB) mlx 0.2.0

| Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 1.76 | 1.76 | 11.21 | 1.90 | 23.25 | 0% | +7% | +536% |
| Argmax / dim=64x1024x128 axi=1 | 1.76 | 1.75 | 11.07 | 2.47 | 5.79 | +0% | +40% | +530% |
| Argmax / dim=64x1024x128 axi=2 | 1.79 | 1.74 | 11.06 | 1.37 | 2.54 | +2% | -23% | +517% |
| Argmax / dim=64x128x1024 axi=2 | 1.76 | 1.74 | 11.06 | 0.87 | 2.22 | +0% | -50% | +528% |
| BCE / dim=1000000 dim=1000000 | 0.68 | 0.39 | 6.81 | 0.69 | 1.53 | +75% | +1% | +897% |
| BCE / dim=100000x32 dim=100000x32 | 1.17 | 0.74 | 22.68 | 0.99 | 3.88 | +58% | -15% | +1841% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 3.66 | 1.51 | 89.05 | 1.93 | 15.12 | +141% | -47% | +2335% |
| BCE / dim=128x100000 dim=128x100000 | 3.66 | 1.50 | 89.50 | 1.86 | 15.38 | +143% | -49% | +2345% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 4.52 | 4.47 | 66.62 | 4.68 | 30.75 | +1% | +3% | +1374% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 8.76 | 8.86 | 150.48 | 8.90 | 51.52 | -1% | +1% | +1618% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 5.89 | 5.87 | 66.99 | 6.07 | 41.97 | +0% | +2% | +1036% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 5.87 | 5.87 | 87.85 | 6.09 | 41.80 | 0% | +3% | +1396% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 1.16 | 0.48 | 0.48 | 0.52 | 2.82 | +143% | -55% | -58% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 4.01 | 3.09 | 9.22 | 2.51 | 82.08 | +29% | -37% | +130% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 3.01 | 2.50 | 4.98 | 2.55 | 520.37 | +20% | -15% | +65% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 1.73 | 0.73 | 0.76 | 0.83 | 66.68 | +137% | -51% | -55% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 38.01 | 37.62 | 953.57 | 6.67 | 155.57 | +1% | -82% | +2408% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 13.92 | 13.66 | 438.53 | 3.38 | 28.86 | +1% | -75% | +3050% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 15.30 | 14.26 | 737.17 | 2.86 | 39.96 | +7% | -81% | +4719% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.29 | 1.10 | 11.48 | 0.74 | 1.66 | +108% | -67% | +401% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.33 | 4.31 | 112.67 | 2.06 | 9.32 | +23% | -61% | +2012% |
| Gather / dim=64x256 dim=10 | 0.28 | 0.27 | 0.03 | 0.32 | 0.01 | +3% | +14% | -89% |
| Gather / dim=64x256 dim=1000 | 0.39 | 0.39 | 0.06 | 0.46 | 0.15 | +0% | +19% | -84% |
| Gather / dim=64x256 dim=1000000 | 22.90 | 22.90 | 31.88 | 102.21 | 52.84 | +0% | +346% | +39% |
| Gather / dim=1024x32 dim=10 | 0.28 | 0.27 | 0.03 | 0.32 | 0.01 | +3% | +16% | -89% |
| Gather / dim=1024x32 dim=1000 | 0.29 | 0.29 | 0.04 | 0.36 | 0.09 | +0% | +22% | -87% |
| Gather / dim=1024x32 dim=1000000 | 3.60 | 3.15 | 7.91 | 13.14 | 7.50 | +14% | +265% | +119% |
| LeakyReLU / dim=128x16x1024 | 0.50 | 0.34 | 1.95 | 0.50 | 0.89 | +48% | +0% | +287% |
| LeakyReLU / dim=64x128x1024 | 1.26 | 0.66 | 7.37 | 0.88 | 1.61 | +90% | -30% | +487% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 7.68 | 7.37 | 25.04 | 7.35 | 49.39 | +4% | -4% | +226% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 8.65 | 8.51 | 29.16 | 12.55 | 63.51 | +1% | +45% | +237% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 17.59 | 17.50 | 47.12 | 47.71 | 153.45 | +0% | +171% | +167% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 29.37 | 29.38 | 74.01 | 93.98 | 278.74 | 0% | +220% | +152% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.87 | 0.76 | 0.50 | 0.80 | 4.98 | +13% | -8% | -42% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.34 | 0.33 | 1.10 | 0.52 | 0.81 | +3% | +51% | +220% |
| MatMul / dim=1000x64x256 dim=256x32 | 1.02 | 1.00 | 2.36 | 2.41 | 21.91 | +1% | +136% | +131% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 3.18 | 3.18 | 19.13 | 13.42 | 1530.48 | +0% | +321% | +500% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 9.35 | 9.40 | 51.38 | 9.31 | 825.97 | 0% | 0% | +449% |
| MatMul / dim=64x1000000 dim=1000000x32 | 15.17 | 15.04 | 12.41 | 19.47 | 548.59 | +0% | +28% | -18% |
| MatMul / dim=1000000x64 dim=64x1024 | 34.83 | 34.86 | 156.96 | 101.81 | 1390.57 | 0% | +192% | +350% |
| PReLU / dim=128x16x1024 dim=1 | 0.75 | 0.35 | 1.37 | 0.55 | 0.91 | +117% | -27% | +81% |
| PReLU / dim=64x128x1024 dim=1 | 2.11 | 0.67 | 4.81 | 0.86 | 1.61 | +216% | -59% | +128% |
| ReLU / dim=128x16x1024 | 0.38 | 0.40 | 0.59 | 0.52 | 0.85 | -4% | +37% | +55% |
| ReLU / dim=64x128x1024 | 0.69 | 0.66 | 1.54 | 0.87 | 1.78 | +3% | +26% | +123% |
| Scatter / dim=64x16 dim=10 | 0.29 | 0.28 | 0.03 | 0.30 | 0.01 | +3% | +3% | -90% |
| Scatter / dim=64x16 dim=1000 | 0.34 | 0.33 | 0.11 | 0.33 | 0.09 | +2% | -1% | -68% |
| Scatter / dim=64x16 dim=1000000 | 15.66 | 15.64 | 65.73 | 6.72 | 4.37 | +0% | -57% | +319% |
| Scatter / dim=1024x32 dim=10 | 0.32 | 0.30 | 0.03 | 0.37 | 0.01 | +6% | +16% | -89% |
| Scatter / dim=1024x32 dim=1000 | 0.39 | 0.37 | 0.19 | 0.38 | 0.09 | +7% | -3% | -52% |
| Scatter / dim=1024x32 dim=1000000 | 30.93 | 30.87 | 123.75 | 13.22 | 6.70 | +0% | -57% | +300% |
| ScatterSum / dim=64x16 dim=10 | 0.09 | 0.08 | 0.03 | nan | 0.01 | +15% | nan% | -69% |
| ScatterSum / dim=64x16 dim=1000 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +23% | nan% | -70% |
| ScatterSum / dim=64x16 dim=1000000 | 0.08 | 0.07 | 0.03 | nan | 1.95 | +26% | nan% | -68% |
| ScatterSum / dim=1024x32 dim=10 | 0.08 | 0.06 | 0.02 | nan | 0.01 | +29% | nan% | -69% |
| ScatterSum / dim=1024x32 dim=1000 | 0.09 | 0.07 | 0.03 | nan | 0.02 | +27% | nan% | -71% |
| ScatterSum / dim=1024x32 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 7.06 | +19% | nan% | -68% |
| ScatterMax / dim=64x16 dim=10 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +32% | nan% | -68% |
| ScatterMax / dim=64x16 dim=1000 | 0.09 | 0.07 | 0.02 | nan | 0.01 | +21% | nan% | -71% |
| ScatterMax / dim=64x16 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 1.79 | +23% | nan% | -69% |
| ScatterMax / dim=1024x32 dim=10 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +16% | nan% | -69% |
| ScatterMax / dim=1024x32 dim=1000 | 0.08 | 0.07 | 0.03 | nan | 0.02 | +24% | nan% | -65% |
| ScatterMax / dim=1024x32 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 7.21 | +27% | nan% | -71% |
| SeLU / dim=128x16x1024 | 1.15 | 0.34 | 3.04 | 0.51 | 2.95 | +237% | -56% | +163% |
| SeLU / dim=64x128x1024 | 4.36 | 0.67 | 12.73 | 0.89 | 10.44 | +553% | -79% | +192% |
| Sigmoid / dim=128x16x1024 | 0.44 | 0.39 | 13.69 | 0.59 | 2.84 | +12% | +33% | +3021% |
| Sigmoid / dim=64x128x1024 | 0.68 | 0.66 | 54.30 | 0.97 | 9.60 | +1% | +43% | +7929% |
| Softmax / dim=64x1000000 axi=-1 | 11.70 | 8.83 | 58.18 | 6.22 | 40.86 | +32% | -46% | +397% |
| Softmax / dim=1000000x64 axi=-1 | 11.62 | 8.85 | 59.00 | 8.91 | 42.06 | +31% | -23% | +407% |
| Softmax / dim=64x16x32x1024 axi=-1 | 6.26 | 4.84 | 32.68 | 4.43 | 19.57 | +29% | -29% | +421% |
| Softmax / dim=128x16x32x1024 axi=-1 | 12.22 | 9.26 | 61.93 | 8.05 | 39.81 | +31% | -34% | +406% |
| Softmax / dim=1024x16x32x128 axi=-1 | 12.22 | 9.25 | 63.13 | 9.48 | 42.39 | +32% | -22% | +416% |
| Softmax / dim=1024x64x32x8 axi=-1 | 3.40 | 2.64 | 15.58 | 3.10 | 23.94 | +28% | -8% | +358% |
| Softplus / dim=128x16x1024 | 0.50 | 0.33 | 13.88 | 0.70 | 3.94 | +52% | +41% | +2683% |
| Softplus / dim=64x128x1024 | 0.82 | 0.65 | 54.54 | 1.22 | 14.63 | +25% | +48% | +6547% |
| Sort / dim=64x128x1024 axi=0 | 1.84 | 1.81 | 270.84 | 50.79 | 65.70 | +1% | +2664% | +14638% |
| Sort / dim=64x128x1024 axi=1 | 1.76 | 1.75 | 271.31 | 44.70 | 55.19 | +0% | +2437% | +15305% |
| Sort / dim=64x128x1024 axi=2 | 1.76 | 1.75 | 271.42 | 17.12 | 69.15 | +0% | +874% | +15350% |
| Sum / dim=64x128x128x128 axi=0 | 3.49 | 3.48 | 11.27 | 12.43 | 14.32 | +0% | +256% | +222% |
| Sum / dim=64x128x128x128 axi=1 | 3.48 | 3.45 | 10.46 | 3.53 | 13.72 | +0% | +1% | +200% |
| Sum / dim=64x128x128x128 axi=2 | 3.50 | 3.47 | 11.52 | 3.44 | 8.44 | +0% | -1% | +229% |
| Sum / dim=64x128x128x128 axi=3 | 3.47 | 3.47 | 10.42 | 6.07 | 5.91 | +0% | +75% | +200% |
| SumAll / dim=64x128x128x128 | 3.50 | 3.48 | 10.84 | 3.62 | 5.15 | +0% | +3% | +209% |
| SumAll / dim=1000000 | 0.38 | 0.34 | 0.09 | 0.53 | 0.11 | +12% | +38% | -75% |
| SumAll / dim=1000000x128 | 3.34 | 3.35 | 10.20 | 3.52 | 4.88 | 0% | +5% | +205% |
| SumAll / dim=128x1000000 | 3.41 | 3.31 | 10.01 | 3.34 | 4.83 | +2% | -1% | +193% |

M1 Max (64GB) mlx 0.2.0

| Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 2.08 | 1.67 | 11.14 | 4.58 | 25.92 | +25% | +119% | +434% |
| Argmax / dim=64x1024x128 axi=1 | 2.20 | 1.72 | 10.85 | 1.48 | 6.47 | +27% | -32% | +393% |
| Argmax / dim=64x1024x128 axi=2 | 2.19 | 1.68 | 10.62 | 1.01 | 2.35 | +30% | -53% | +385% |
| Argmax / dim=64x128x1024 axi=2 | 2.11 | 1.69 | 10.58 | 0.63 | 1.92 | +25% | -70% | +401% |
| BCE / dim=1000000 dim=1000000 | 0.52 | 0.35 | 6.62 | 0.48 | 1.36 | +50% | -8% | +1172% |
| BCE / dim=100000x32 dim=100000x32 | 0.71 | 0.45 | 21.63 | 0.69 | 3.47 | +56% | -3% | +2953% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 1.98 | 0.91 | 86.42 | 1.47 | 13.99 | +118% | -25% | +4266% |
| BCE / dim=128x100000 dim=128x100000 | 1.98 | 0.91 | 86.40 | 1.39 | 13.53 | +117% | -29% | +4268% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 2.34 | 2.33 | 78.58 | 2.41 | 18.34 | +0% | +2% | +3260% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 4.43 | 4.41 | 146.43 | 4.52 | 42.51 | +0% | +2% | +3206% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 3.02 | 3.02 | 62.03 | 3.07 | 19.04 | +0% | +1% | +1952% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 3.03 | 3.02 | 82.37 | 3.08 | 19.29 | +0% | +1% | +2622% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 0.39 | 0.39 | 0.36 | 0.40 | 2.57 | +0% | +0% | -7% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 4.05 | 1.63 | 8.22 | 1.80 | 69.41 | +149% | -55% | +102% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 2.10 | 1.38 | 4.20 | 1.87 | 497.10 | +52% | -10% | +100% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 2.27 | 0.53 | 0.59 | 0.64 | 59.95 | +326% | -71% | -74% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 18.90 | 18.76 | 1034.14 | 3.49 | 131.73 | +0% | -81% | +5370% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 9.18 | 6.97 | 419.33 | 2.23 | 18.17 | +31% | -75% | +4467% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 9.23 | 7.42 | 697.01 | 2.21 | 18.67 | +24% | -76% | +7449% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.98 | 0.78 | 10.66 | 0.53 | 1.45 | +25% | -46% | +988% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 2.60 | 2.25 | 116.24 | 1.46 | 7.77 | +15% | -43% | +4363% |
| Gather / dim=64x256 dim=10 | 0.23 | 0.23 | 0.01 | 0.23 | 0.01 | 0% | 0% | -94% |
| Gather / dim=64x256 dim=1000 | 0.33 | 0.32 | 0.03 | 0.33 | 0.12 | +4% | +0% | -89% |
| Gather / dim=64x256 dim=1000000 | 11.57 | 11.54 | 28.36 | 50.98 | 46.58 | +0% | +340% | +145% |
| Gather / dim=1024x32 dim=10 | 0.25 | 0.23 | 0.01 | 0.22 | 0.00 | +7% | -8% | -94% |
| Gather / dim=1024x32 dim=1000 | 0.26 | 0.25 | 0.02 | 0.25 | 0.09 | +3% | -4% | -92% |
| Gather / dim=1024x32 dim=1000000 | 2.42 | 1.66 | 7.20 | 6.66 | 6.73 | +45% | +175% | +197% |
| LeakyReLU / dim=128x16x1024 | 0.40 | 0.30 | 1.79 | 0.37 | 0.68 | +31% | -8% | +347% |
| LeakyReLU / dim=64x128x1024 | 0.67 | 0.37 | 7.02 | 0.54 | 0.59 | +83% | -19% | +941% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 4.35 | 3.74 | 19.60 | 3.40 | 31.02 | +16% | -21% | +350% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 4.71 | 4.34 | 27.89 | 5.88 | 37.61 | +8% | +24% | +492% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 8.88 | 8.86 | 49.44 | 24.42 | 53.09 | +0% | +174% | +456% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 14.83 | 14.81 | 64.98 | 47.81 | 75.07 | +0% | +222% | +338% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.85 | 0.71 | 0.37 | 0.67 | 0.44 | +19% | -21% | -56% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.31 | 0.26 | 0.84 | 0.40 | 0.87 | +17% | +29% | +173% |
| MatMul / dim=1000x64x256 dim=256x32 | 0.62 | 0.60 | 1.71 | 2.02 | 2.31 | +3% | +225% | +176% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 1.71 | 1.70 | 17.97 | 7.38 | 20.63 | +0% | +331% | +949% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 4.84 | 4.83 | 64.09 | 4.96 | 122.87 | +0% | +2% | +1224% |
| MatMul / dim=64x1000000 dim=1000000x32 | 2.87 | 2.88 | 10.51 | 11.88 | 10.47 | 0% | +313% | +265% |
| MatMul / dim=1000000x64 dim=64x1024 | 17.58 | 17.56 | 187.87 | 40.33 | 372.75 | +0% | +129% | +968% |
| PReLU / dim=128x16x1024 dim=1 | 0.57 | 0.33 | 1.09 | 0.36 | 0.55 | +73% | -36% | +93% |
| PReLU / dim=64x128x1024 dim=1 | 1.08 | 0.39 | 4.19 | 0.52 | 0.59 | +172% | -51% | +289% |
| ReLU / dim=128x16x1024 | 0.32 | 0.29 | 0.31 | 0.37 | 0.63 | +10% | +15% | -3% |
| ReLU / dim=64x128x1024 | 0.41 | 0.38 | 1.33 | 0.52 | 0.58 | +8% | +26% | +224% |
| Scatter / dim=64x16 dim=10 | 0.23 | 0.24 | 0.01 | 0.22 | 0.00 | -5% | -5% | -93% |
| Scatter / dim=64x16 dim=1000 | 0.31 | 0.29 | 0.07 | 0.25 | 0.05 | +4% | -18% | -75% |
| Scatter / dim=64x16 dim=1000000 | 7.99 | 7.94 | 62.95 | 3.41 | 4.35 | +0% | -57% | +687% |
| Scatter / dim=1024x32 dim=10 | 0.28 | 0.26 | 0.02 | 0.22 | 0.00 | +6% | -19% | -93% |
| Scatter / dim=1024x32 dim=1000 | 0.31 | 0.29 | 0.13 | 0.26 | 0.08 | +6% | -16% | -57% |
| Scatter / dim=1024x32 dim=1000000 | 15.54 | 15.51 | 118.69 | 6.72 | 6.19 | +0% | -56% | +663% |
| ScatterSum / dim=64x16 dim=10 | 0.05 | 0.03 | 0.01 | nan | 0.00 | +46% | nan% | -82% |
| ScatterSum / dim=64x16 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +41% | nan% | -80% |
| ScatterSum / dim=64x16 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 1.60 | +54% | nan% | -83% |
| ScatterSum / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +35% | nan% | -78% |
| ScatterSum / dim=1024x32 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +41% | nan% | -81% |
| ScatterSum / dim=1024x32 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 6.49 | +38% | nan% | -81% |
| ScatterMax / dim=64x16 dim=10 | 0.05 | 0.03 | 0.01 | nan | 0.00 | +41% | nan% | -82% |
| ScatterMax / dim=64x16 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +37% | nan% | -82% |
| ScatterMax / dim=64x16 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 1.54 | +37% | nan% | -80% |
| ScatterMax / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +26% | nan% | -79% |
| ScatterMax / dim=1024x32 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +36% | nan% | -81% |
| ScatterMax / dim=1024x32 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 6.55 | +29% | nan% | -79% |
| SeLU / dim=128x16x1024 | 0.98 | 0.33 | 2.75 | 0.36 | 2.62 | +202% | -63% | +178% |
| SeLU / dim=64x128x1024 | 2.07 | 0.40 | 11.36 | 0.57 | 9.32 | +423% | -72% | +449% |
| Sigmoid / dim=128x16x1024 | 0.34 | 0.32 | 13.10 | 0.40 | 2.43 | +4% | +18% | +3784% |
| Sigmoid / dim=64x128x1024 | 0.43 | 0.39 | 52.05 | 0.60 | 8.43 | +9% | +39% | +12049% |
| Softmax / dim=64x1000000 axi=-1 | 5.86 | 4.49 | 53.32 | 3.30 | 33.16 | +30% | -43% | +809% |
| Softmax / dim=1000000x64 axi=-1 | 5.87 | 4.48 | 53.33 | 4.89 | 34.35 | +31% | -16% | +807% |
| Softmax / dim=64x16x32x1024 axi=-1 | 3.25 | 2.50 | 28.15 | 3.18 | 19.36 | +29% | -2% | +766% |
| Softmax / dim=128x16x32x1024 axi=-1 | 6.13 | 4.68 | 55.99 | 5.01 | 31.86 | +30% | -18% | +813% |
| Softmax / dim=1024x16x32x128 axi=-1 | 6.13 | 4.68 | 55.89 | 5.14 | 33.61 | +31% | -16% | +811% |
| Softmax / dim=1024x64x32x8 axi=-1 | 1.79 | 1.44 | 14.22 | 1.79 | 21.23 | +24% | 0% | +692% |
| Softplus / dim=128x16x1024 | 0.52 | 0.32 | 13.09 | 0.47 | 3.57 | +62% | -9% | +2405% |
| Softplus / dim=64x128x1024 | 0.62 | 0.37 | 52.49 | 0.87 | 12.93 | +68% | +40% | +8373% |
| Sort / dim=64x128x1024 axi=0 | 1.08 | 0.97 | 257.13 | 29.79 | 52.22 | +11% | +2661% | +23735% |
| Sort / dim=64x128x1024 axi=1 | 1.09 | 0.98 | 257.19 | 21.35 | 43.94 | +11% | +1862% | +23537% |
| Sort / dim=64x128x1024 axi=2 | 1.06 | 0.97 | 257.20 | 9.39 | 51.75 | +9% | +782% | +24073% |
| Sum / dim=64x128x128x128 axi=0 | 1.75 | 1.74 | 8.96 | 5.57 | 15.39 | +0% | +218% | +411% |
| Sum / dim=64x128x128x128 axi=1 | 1.76 | 1.74 | 8.84 | 1.80 | 13.53 | +0% | +2% | +403% |
| Sum / dim=64x128x128x128 axi=2 | 1.74 | 1.74 | 8.84 | 1.79 | 7.31 | +0% | +2% | +406% |
| Sum / dim=64x128x128x128 axi=3 | 1.74 | 1.74 | 8.83 | 3.25 | 5.18 | +0% | +86% | +406% |
| SumAll / dim=64x128x128x128 | 1.74 | 1.73 | 8.82 | 1.84 | 4.50 | +0% | +5% | +405% |
| SumAll / dim=1000000 | 0.32 | 0.30 | 0.06 | 0.36 | 0.10 | +7% | +15% | -81% |
| SumAll / dim=1000000x128 | 1.68 | 1.67 | 8.68 | 1.93 | 4.47 | +1% | +14% | +415% |
| SumAll / dim=128x1000000 | 1.68 | 1.67 | 8.95 | 1.87 | 4.35 | +0% | +11% | +434% |

M2

| Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 2.57 | 16.16 | 3.93 | 22.12 | +52% | +528% |
| Argmax / dim=64x1024x128 axi=1 | 1.47 | 18.03 | 3.35 | 6.93 | +127% | +1126% |
| Argmax / dim=64x1024x128 axi=2 | 1.44 | 13.91 | 2.39 | 2.67 | +65% | +866% |
| Argmax / dim=64x128x1024 axi=2 | 1.35 | 16.74 | 1.18 | 2.32 | -12% | +1140% |
| BCE / dim=1000000 dim=1000000 | 0.83 | 11.36 | 1.93 | 1.96 | +132% | +1267% |
| BCE / dim=100000x32 dim=100000x32 | 1.84 | 35.85 | 6.22 | 5.58 | +238% | +1849% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 6.08 | 140.95 | 21.14 | 21.82 | +247% | +2219% |
| BCE / dim=128x100000 dim=128x100000 | 6.10 | 141.21 | 22.96 | 23.99 | +276% | +2215% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 8.65 | 99.86 | 8.95 | 34.82 | +3% | +1054% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 17.19 | 221.80 | 17.74 | 62.82 | +3% | +1190% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 11.35 | 142.18 | 11.57 | 44.15 | +1% | +1152% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 11.37 | 180.44 | 11.78 | 43.52 | +3% | +1487% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 0.56 | 0.56 | 0.59 | 3.62 | +5% | 0% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 5.08 | 13.73 | 5.53 | 56.72 | +8% | +170% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 6.01 | 8.79 | 5.61 | 392.02 | -6% | +46% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 3.00 | 0.95 | 1.44 | 78.40 | -51% | -68% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 77.90 | 1491.19 | 10.80 | 173.72 | -86% | +1814% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 25.70 | 690.25 | 7.00 | 42.99 | -72% | +2585% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 24.70 | 1154.58 | 6.52 | 55.78 | -73% | +4574% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.00 | 17.35 | 1.17 | 1.87 | -41% | +769% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 8.41 | 175.54 | 4.20 | 10.03 | -50% | +1987% |
| LeakyReLU / dim=128x16x1024 | 0.80 | 1.49 | 0.70 | 0.92 | -12% | +84% |
| LeakyReLU / dim=64x128x1024 | 2.20 | 3.39 | 1.50 | 1.75 | -31% | +54% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 16.90 | 74.45 | 14.00 | 83.08 | -17% | +340% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 16.82 | 90.75 | 23.52 | 104.08 | +39% | +439% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 33.88 | 131.98 | 83.12 | 255.48 | +145% | +289% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 56.74 | 201.06 | 164.84 | 471.68 | +190% | +254% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.72 | 0.30 | 0.99 | 0.78 | +37% | -57% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.46 | 0.26 | 0.85 | 1.35 | +85% | -42% |
| MatMul / dim=1000x64x256 dim=256x32 | 1.14 | 4.86 | 5.17 | 13.63 | +351% | +324% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 4.84 | 24.73 | 19.62 | 188.50 | +305% | +410% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 17.95 | 110.02 | 20.34 | 1452.25 | +13% | +512% |
| MatMul / dim=64x1000000 dim=1000000x32 | 39.95 | 21.26 | 33.21 | 49.77 | -16% | -46% |
| MatMul / dim=1000000x64 dim=64x1024 | 67.87 | 562.54 | 389.42 | 2072.27 | +473% | +728% |
| PReLU / dim=128x16x1024 dim=1 | 1.18 | 2.88 | 0.71 | 0.95 | -39% | +145% |
| PReLU / dim=64x128x1024 dim=1 | 3.69 | 6.27 | 1.37 | 1.74 | -62% | +69% |
| ReLU / dim=128x16x1024 | 0.46 | 0.66 | 0.67 | 0.93 | +44% | +42% |
| ReLU / dim=64x128x1024 | 1.08 | 1.33 | 1.34 | 1.76 | +23% | +23% |
| SeLU / dim=128x16x1024 | 2.39 | 7.83 | 0.71 | 3.37 | -70% | +227% |
| SeLU / dim=64x128x1024 | 8.22 | 26.67 | 1.51 | 13.12 | -81% | +224% |
| Sigmoid / dim=128x16x1024 | 0.46 | 21.21 | 0.76 | 3.12 | +66% | +4535% |
| Sigmoid / dim=64x128x1024 | 1.07 | 84.49 | 1.50 | 11.83 | +39% | +7760% |
| Softmax / dim=64x1000000 axi=-1 | 8.80 | 55.52 | 11.74 | 71.44 | +33% | +530% |
| Softmax / dim=1000000x64 axi=-1 | 6.00 | 55.61 | 16.18 | 79.75 | +169% | +827% |
| Softmax / dim=64x16x32x1024 axi=-1 | 3.23 | 28.86 | 12.77 | 28.85 | +294% | +792% |
| Softmax / dim=128x16x32x1024 axi=-1 | 6.16 | 57.68 | 23.52 | 75.04 | +282% | +836% |
| Softmax / dim=1024x16x32x128 axi=-1 | 6.13 | 55.61 | 17.27 | 75.92 | +181% | +807% |
| Softmax / dim=1024x64x32x8 axi=-1 | 12.10 | 140.43 | 5.74 | 34.53 | -52% | +1060% |
| Softplus / dim=128x16x1024 | 0.61 | 21.86 | 1.01 | 5.15 | +64% | +3457% |
| Softplus / dim=64x128x1024 | 1.20 | 86.03 | 2.45 | 19.10 | +104% | +7069% |
| Sort / dim=64x128x1024 axi=0 | 31.34 | 1619.90 | 56.94 | 78.79 | +81% | +5068% |
| Sort / dim=64x128x1024 axi=1 | 16.30 | 1686.63 | 52.80 | 73.93 | +224% | +10249% |
| Sort / dim=64x128x1024 axi=2 | 2.97 | 423.22 | 30.32 | 85.22 | +920% | +14145% |
| Sum / dim=64x128x128x128 axi=0 | 10.23 | 25.07 | 18.06 | 18.94 | +76% | +144% |
| Sum / dim=64x128x128x128 axi=1 | 6.19 | 19.27 | 6.41 | 17.45 | +3% | +211% |
| Sum / dim=64x128x128x128 axi=2 | 6.17 | 17.14 | 6.26 | 11.48 | +1% | +177% |
| Sum / dim=64x128x128x128 axi=3 | 14.01 | 12.03 | 11.15 | 8.90 | -20% | -14% |
| SumAll / dim=64x128x128x128 | 5.75 | 10.66 | 6.72 | 8.27 | +17% | +85% |
| SumAll / dim=1000000 | 0.38 | 0.10 | 0.48 | 0.10 | +27% | -73% |
| SumAll / dim=1000000x128 | 5.53 | 10.20 | 6.37 | 8.07 | +15% | +84% |
| SumAll / dim=128x1000000 | 5.61 | 10.20 | 6.26 | 7.99 | +11% | +81% |

M2 Pro (cores: 4E+6P+16GPU)

| Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 1.55 | 19.78 | 1.88 | 22.70 | +20% | +1173% |
| Argmax / dim=64x1024x128 axi=1 | 0.83 | 10.20 | 1.53 | 4.78 | +83% | +1124% |
| Argmax / dim=64x1024x128 axi=2 | 0.94 | 7.87 | 1.38 | 1.89 | +47% | +741% |
| Argmax / dim=64x128x1024 axi=2 | 1.07 | 9.51 | 1.11 | 1.75 | +4% | +792% |
| BCE / dim=1000000 dim=1000000 | 0.92 | 7.14 | 1.22 | 1.22 | +32% | +674% |
| BCE / dim=100000x32 dim=100000x32 | 1.25 | 21.06 | 3.54 | 3.19 | +182% | +1580% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 3.09 | 81.30 | 13.98 | 13.46 | +352% | +2531% |
| BCE / dim=128x100000 dim=128x100000 | 3.05 | 80.69 | 13.23 | 13.29 | +334% | +2548% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 4.37 | 58.87 | 4.71 | 34.48 | +7% | +1247% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 8.46 | 134.11 | 8.83 | 44.21 | +4% | +1486% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 5.72 | 78.40 | 6.11 | 38.42 | +6% | +1270% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 5.96 | 99.03 | 6.10 | 38.99 | +2% | +1562% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 1.15 | 0.37 | 0.62 | 2.42 | -45% | -67% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 2.68 | 8.07 | 3.15 | 61.01 | +17% | +201% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 3.88 | 4.25 | 3.17 | 458.67 | -18% | +9% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 3.77 | 0.53 | 1.00 | 58.17 | -73% | -85% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 34.84 | 854.67 | 7.60 | 140.59 | -78% | +2352% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 17.97 | 388.74 | 6.66 | 28.82 | -62% | +2063% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 12.00 | 654.79 | 5.26 | 39.17 | -56% | +5358% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.47 | 9.80 | 0.89 | 1.42 | -39% | +567% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.29 | 99.13 | 2.32 | 7.86 | -56% | +1775% |
| LeakyReLU / dim=128x16x1024 | 0.76 | 1.20 | 0.80 | 1.10 | +6% | +58% |
| LeakyReLU / dim=64x128x1024 | 1.26 | 2.70 | 1.21 | 1.34 | -4% | +113% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 8.95 | 42.74 | 7.19 | 51.32 | -19% | +377% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 8.68 | 45.12 | 12.07 | 64.84 | +39% | +419% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 16.92 | 59.49 | 46.26 | 144.12 | +173% | +251% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 28.51 | 80.04 | 92.27 | 253.60 | +223% | +180% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.57 | 0.36 | 1.28 | 1.20 | +123% | -36% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.35 | 0.29 | 0.94 | 0.75 | +169% | -17% |
| MatMul / dim=1000x64x256 dim=256x32 | 0.59 | 1.93 | 3.62 | 23.10 | +517% | +228% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 2.67 | 14.83 | 12.84 | 1009.65 | +381% | +456% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 9.25 | 77.07 | 11.62 | 2264.38 | +25% | +732% |
| MatMul / dim=64x1000000 dim=1000000x32 | 41.00 | 9.28 | 19.02 | 146.63 | -53% | -77% |
| MatMul / dim=1000000x64 dim=64x1024 | 34.28 | 156.61 | 88.69 | 1265.80 | +158% | +356% |
| PReLU / dim=128x16x1024 dim=1 | 0.90 | 2.46 | 0.70 | 1.07 | -21% | +173% |
| PReLU / dim=64x128x1024 dim=1 | 1.83 | 4.38 | 1.05 | 1.38 | -42% | +138% |
| ReLU / dim=128x16x1024 | 0.30 | 0.53 | 0.58 | 0.95 | +91% | +75% |
| ReLU / dim=64x128x1024 | 0.66 | 0.88 | 1.25 | 1.36 | +88% | +32% |
| SeLU / dim=128x16x1024 | 1.51 | 6.16 | 0.77 | 2.69 | -49% | +307% |
| SeLU / dim=64x128x1024 | 4.05 | 18.29 | 1.24 | 8.63 | -69% | +351% |
| Sigmoid / dim=128x16x1024 | 0.68 | 12.36 | 0.85 | 2.34 | +25% | +1725% |
| Sigmoid / dim=64x128x1024 | 0.90 | 48.84 | 1.33 | 8.12 | +48% | +5322% |
| Softmax / dim=64x1000000 axi=-1 | 4.42 | 33.53 | 6.43 | 42.36 | +45% | +658% |
| Softmax / dim=1000000x64 axi=-1 | 3.51 | 32.75 | 10.44 | 45.64 | +197% | +833% |
| Softmax / dim=64x16x32x1024 axi=-1 | 1.70 | 17.05 | 9.39 | 17.86 | +452% | +903% |
| Softmax / dim=128x16x32x1024 axi=-1 | 3.13 | 35.75 | 15.36 | 43.68 | +390% | +1042% |
| Softmax / dim=1024x16x32x128 axi=-1 | 3.14 | 32.62 | 10.44 | 45.19 | +232% | +938% |
| Softmax / dim=1024x64x32x8 axi=-1 | 7.56 | 81.85 | 3.00 | 21.47 | -60% | +983% |
| Softplus / dim=128x16x1024 | 0.79 | 12.80 | 0.86 | 3.18 | +8% | +1518% |
| Softplus / dim=64x128x1024 | 0.92 | 50.19 | 1.40 | 12.44 | +52% | +5371% |
| Sort / dim=64x128x1024 axi=0 | 15.49 | 924.82 | 25.23 | 49.20 | +62% | +5871% |
| Sort / dim=64x128x1024 axi=1 | 8.37 | 955.98 | 27.60 | 44.17 | +229% | +11325% |
| Sort / dim=64x128x1024 axi=2 | 1.63 | 246.31 | 16.75 | 53.48 | +926% | +14986% |
| Sum / dim=64x128x128x128 axi=0 | 3.22 | 15.14 | 3.71 | 14.62 | +15% | +370% |
| Sum / dim=64x128x128x128 axi=1 | 3.24 | 11.50 | 3.65 | 12.97 | +12% | +254% |
| Sum / dim=64x128x128x128 axi=2 | 3.03 | 10.27 | 3.53 | 8.73 | +16% | +239% |
| Sum / dim=64x128x128x128 axi=3 | 6.73 | 9.31 | 6.20 | 5.12 | -7% | +38% |
| SumAll / dim=64x128x128x128 | 3.26 | 9.09 | 3.64 | 4.66 | +11% | +178% |
| SumAll / dim=1000000 | 0.70 | 0.06 | 0.70 | 0.14 | 0% | -90% |
| SumAll / dim=1000000x128 | 3.10 | 8.59 | 3.54 | 4.60 | +14% | +177% |
| SumAll / dim=128x1000000 | 3.16 | 8.64 | 3.53 | 4.49 | +11% | +173% |

M2 Max (cores: 4E+8P+38GPU)

| Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 0.96 | 19.74 | 1.59 | 21.86 | +64% | +1951% |
| Argmax / dim=64x1024x128 axi=1 | 0.95 | 9.53 | 1.24 | 4.13 | +30% | +902% |
| Argmax / dim=64x1024x128 axi=2 | 0.79 | 7.47 | 1.24 | 1.70 | +56% | +845% |
| Argmax / dim=64x128x1024 axi=2 | 0.79 | 9.16 | 1.10 | 1.39 | +39% | +1055% |
| BCE / dim=1000000 dim=1000000 | 0.84 | 6.87 | 1.07 | 1.09 | +28% | +719% |
| BCE / dim=100000x32 dim=100000x32 | 1.06 | 19.99 | 2.84 | 2.73 | +168% | +1785% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 1.63 | 78.65 | 11.70 | 10.60 | +618% | +4733% |
| BCE / dim=128x100000 dim=128x100000 | 1.73 | 79.53 | 11.62 | 10.64 | +570% | +4491% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 2.34 | 56.87 | 2.95 | 17.39 | +26% | +2335% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 4.40 | 131.50 | 5.15 | 40.52 | +17% | +2887% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 3.00 | 76.65 | 3.78 | 18.15 | +25% | +2452% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 3.03 | 96.23 | 3.58 | 18.10 | +18% | +3073% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 0.71 | 0.35 | 0.81 | 2.31 | +14% | -50% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 2.05 | 7.89 | 1.80 | 65.20 | -11% | +285% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 2.72 | 4.11 | 2.45 | 470.47 | -10% | +50% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 10.05 | 0.59 | 0.90 | 52.53 | -91% | -94% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 15.25 | 857.17 | 3.82 | 116.44 | -74% | +5519% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 6.40 | 370.56 | 2.73 | 17.94 | -57% | +5689% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 6.75 | 622.48 | 2.74 | 37.98 | -59% | +9123% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.98 | 9.74 | 0.88 | 1.33 | -10% | +897% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.01 | 98.69 | 1.80 | 7.38 | -64% | +1870% |
| LeakyReLU / dim=128x16x1024 | 0.84 | 0.93 | 0.76 | 1.12 | -8% | +11% |
| LeakyReLU / dim=64x128x1024 | 0.95 | 2.62 | 1.12 | 0.60 | +18% | +176% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 5.10 | 42.76 | 4.18 | 39.49 | -17% | +738% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 4.90 | 45.12 | 6.12 | 50.30 | +24% | +820% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 7.93 | 59.64 | 19.33 | 97.15 | +143% | +652% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 12.49 | 79.32 | 37.50 | 189.29 | +200% | +534% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 1.22 | 0.32 | 1.16 | 0.68 | -5% | -74% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.67 | 0.29 | 0.94 | 0.73 | +41% | -55% |
| MatMul / dim=1000x64x256 dim=256x32 | 0.43 | 1.76 | 3.19 | 25.90 | +642% | +310% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 1.41 | 14.42 | 7.78 | 667.52 | +450% | +920% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 4.40 | 77.69 | 6.27 | 1895.84 | +42% | +1664% |
| MatMul / dim=64x1000000 dim=1000000x32 | 40.15 | 9.05 | 10.97 | 163.00 | -72% | -77% |
| MatMul / dim=1000000x64 dim=64x1024 | 15.02 | 150.79 | 35.25 | 1525.49 | +134% | +903% |
| PReLU / dim=128x16x1024 dim=1 | 0.90 | 1.95 | 0.75 | 1.19 | -16% | +116% |
| PReLU / dim=64x128x1024 dim=1 | 1.57 | 4.34 | 1.20 | 0.57 | -23% | +176% |
| ReLU / dim=128x16x1024 | 0.81 | 0.47 | 0.78 | 1.31 | -3% | -41% |
| ReLU / dim=64x128x1024 | 0.61 | 1.05 | 1.10 | 0.57 | +79% | +71% |
| SeLU / dim=128x16x1024 | 1.47 | 5.58 | 0.83 | 2.05 | -43% | +279% |
| SeLU / dim=64x128x1024 | 2.35 | 18.42 | 1.17 | 6.97 | -50% | +683% |
| Sigmoid / dim=128x16x1024 | 0.54 | 12.64 | 0.50 | 1.94 | -6% | +2238% |
| Sigmoid / dim=64x128x1024 | 0.41 | 49.49 | 0.98 | 6.99 | +142% | +12116% |
| Softmax / dim=64x1000000 axi=-1 | 2.35 | 33.88 | 4.05 | 27.56 | +72% | +1341% |
| Softmax / dim=1000000x64 axi=-1 | 2.25 | 32.44 | 5.53 | 28.25 | +145% | +1341% |
| Softmax / dim=64x16x32x1024 axi=-1 | 0.99 | 16.97 | 6.91 | 15.33 | +599% | +1617% |
| Softmax / dim=128x16x32x1024 axi=-1 | 1.71 | 33.61 | 15.73 | 27.19 | +821% | +1868% |
| Softmax / dim=1024x16x32x128 axi=-1 | 1.72 | 32.31 | 6.26 | 27.34 | +264% | +1778% |
| Softmax / dim=1024x64x32x8 axi=-1 | 5.34 | 81.44 | 2.36 | 17.73 | -55% | +1424% |
| Softplus / dim=128x16x1024 | 0.42 | 12.65 | 0.59 | 2.71 | +41% | +2943% |
| Softplus / dim=64x128x1024 | 0.45 | 50.08 | 1.07 | 9.92 | +137% | +10979% |
| Sort / dim=64x128x1024 axi=0 | 6.74 | 914.48 | 11.56 | 41.90 | +71% | +13469% |
| Sort / dim=64x128x1024 axi=1 | 3.63 | 937.31 | 12.57 | 37.46 | +246% | +25707% |
| Sort / dim=64x128x1024 axi=2 | 0.88 | 245.80 | 8.83 | 44.32 | +908% | +27959% |
| Sum / dim=64x128x128x128 axi=0 | 1.72 | 15.13 | 1.96 | 15.38 | +14% | +782% |
| Sum / dim=64x128x128x128 axi=1 | 1.70 | 11.54 | 2.30 | 12.89 | +35% | +580% |
| Sum / dim=64x128x128x128 axi=2 | 1.66 | 10.31 | 2.35 | 8.08 | +41% | +519% |
| Sum / dim=64x128x128x128 axi=3 | 3.28 | 9.24 | 3.78 | 5.26 | +15% | +181% |
| SumAll / dim=64x128x128x128 | 1.71 | 9.17 | 2.26 | 4.66 | +32% | +434% |
| SumAll / dim=1000000 | 0.66 | 0.06 | 0.74 | 0.11 | +12% | -90% |
| SumAll / dim=1000000x128 | 1.62 | 8.57 | 2.42 | 4.55 | +49% | +429% |
| SumAll / dim=128x1000000 | 1.63 | 8.61 | 2.34 | 4.60 | +43% | +427% |

M2 Ultra (cores: 8E+16P+76GPU) mlx 0.2.0

| Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
|---|---|---|---|---|---|---|---|---|
| Argmax / dim=64x1024x128 axi=0 | 1.96 | 1.71 | 9.13 | 0.73 | 33.55 | +14% | -62% | +366% |
| Argmax / dim=64x1024x128 axi=1 | 1.76 | 1.67 | 9.14 | 0.61 | 2.40 | +5% | -65% | +419% |
| Argmax / dim=64x1024x128 axi=2 | 1.72 | 1.67 | 9.25 | 0.67 | 1.01 | +3% | -60% | +438% |
| Argmax / dim=64x128x1024 axi=2 | 1.76 | 1.67 | 9.14 | 0.41 | 0.86 | +5% | -76% | +419% |
| BCE / dim=1000000 dim=1000000 | 0.47 | 0.33 | 5.78 | 0.33 | 1.04 | +41% | -30% | +1140% |
| BCE / dim=100000x32 dim=100000x32 | 0.54 | 0.44 | 19.18 | 0.40 | 1.70 | +23% | -26% | +3421% |
| BCE / dim=100000x64x2 dim=100000x64x2 | 0.96 | 0.50 | 77.08 | 0.76 | 5.47 | +90% | -20% | +7927% |
| BCE / dim=128x100000 dim=128x100000 | 0.95 | 0.50 | 77.05 | 0.50 | 5.48 | +91% | -47% | +7990% |
| Concat / dim=1000000x64 dim=1000000x32 axi=1 | 1.25 | 1.25 | 75.15 | 1.33 | 29.07 | +0% | +6% | +5918% |
| Concat / dim=1000000x64 dim=1000000x128 axi=1 | 2.30 | 2.27 | 148.17 | 2.45 | 61.83 | +1% | +6% | +6344% |
| Concat / dim=1000000x64 dim=1000000x64 axi=0 | 1.64 | 1.61 | 56.08 | 1.74 | 33.53 | +2% | +5% | +3313% |
| Concat / dim=64x1000000 dim=64x1000000 axi=0 | 1.64 | 1.61 | 76.46 | 1.71 | 35.67 | +1% | +4% | +4575% |
| Conv1d / dim=100x256x3 dim=8x3x3 | 0.38 | 0.36 | 0.39 | 0.46 | 3.04 | +6% | +19% | +2% |
| Conv1d / dim=100x256x256 dim=8x3x256 | 4.06 | 0.84 | 7.20 | 0.74 | 79.68 | +386% | -81% | +77% |
| Conv1d / dim=16x1000x80 dim=128x11x80 | 0.83 | 0.78 | 2.89 | 0.71 | 533.20 | +7% | -14% | +246% |
| Conv1d / dim=16x1000x3 dim=128x11x3 | 3.51 | 0.34 | 0.54 | 0.41 | 60.77 | +946% | -88% | -84% |
| Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 12.22 | 12.57 | 793.00 | 1.70 | 140.13 | -2% | -86% | +6389% |
| Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 5.58 | 2.95 | 366.69 | 0.94 | 14.20 | +88% | -83% | +6470% |
| Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 7.14 | 3.24 | 629.08 | 0.90 | 20.01 | +120% | -87% | +8708% |
| Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.56 | 0.48 | 9.67 | 0.49 | 1.64 | +18% | -12% | +1617% |
| Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 1.44 | 1.29 | 96.38 | 0.67 | 9.37 | +11% | -53% | +6596% |
| Gather / dim=64x256 dim=10 | 0.28 | 0.33 | 0.01 | 0.20 | 0.01 | -15% | -29% | -96% |
| Gather / dim=64x256 dim=1000 | 0.42 | 0.37 | 0.03 | 0.37 | 0.24 | +13% | -11% | -92% |
| Gather / dim=64x256 dim=1000000 | 5.13 | 4.71 | 27.09 | 20.67 | 69.37 | +9% | +302% | +427% |
| Gather / dim=1024x32 dim=10 | 0.23 | 0.22 | 0.01 | 0.21 | 0.00 | +5% | -12% | -95% |
| Gather / dim=1024x32 dim=1000 | 0.30 | 0.26 | 0.02 | 0.24 | 0.11 | +13% | -18% | -94% |
| Gather / dim=1024x32 dim=1000000 | 1.09 | 1.03 | 6.28 | 2.85 | 1.64 | +5% | +162% | +477% |
| LeakyReLU / dim=128x16x1024 | 0.34 | 0.36 | 1.53 | 0.52 | 1.74 | -6% | +52% | +350% |
| LeakyReLU / dim=64x128x1024 | 0.51 | 0.42 | 6.49 | 0.56 | 0.44 | +21% | +10% | +1180% |
| Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 2.71 | 2.12 | 14.87 | 1.89 | 29.93 | +27% | -30% | +448% |
| Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 2.47 | 2.19 | 15.83 | 3.16 | 30.58 | +12% | +27% | +539% |
| Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 4.21 | 3.93 | 22.94 | 11.31 | 44.35 | +7% | +168% | +444% |
| Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 6.38 | 6.39 | 31.64 | 20.35 | 53.68 | 0% | +218% | +395% |
| Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.89 | 0.72 | 0.26 | 0.70 | 0.26 | +23% | -21% | -70% |
| MatMul / dim=32x1x1000 dim=32x1000x128 | 0.39 | 0.44 | 0.19 | 0.35 | 0.26 | -9% | -11% | -51% |
| MatMul / dim=1000x64x256 dim=256x32 | 0.55 | 0.50 | 0.82 | 1.08 | 1.63 | +8% | +98% | +50% |
| MatMul / dim=1000x64x1024 dim=1000x1024x32 | 0.91 | 0.90 | 15.18 | 5.02 | 17.84 | +0% | +449% | +1562% |
| MatMul / dim=1000x1024x64 dim=1000x64x256 | 3.39 | 3.37 | 45.99 | 3.62 | 123.47 | +0% | +6% | +1255% |
| MatMul / dim=64x1000000 dim=1000000x32 | 2.95 | 2.95 | 9.36 | 8.11 | 9.35 | +0% | +174% | +217% |
| MatMul / dim=1000000x64 dim=64x1024 | 7.57 | 7.53 | 66.98 | 17.67 | 263.92 | +0% | +133% | +785% |
| PReLU / dim=128x16x1024 dim=1 | 0.54 | 0.37 | 0.86 | 0.34 | 1.72 | +46% | -36% | +59% |
| PReLU / dim=64x128x1024 dim=1 | 0.89 | 0.55 | 3.92 | 0.63 | 0.43 | +59% | -29% | +343% |
ReLU / dim=128x16x1024 0.46 0.34 0.27 0.64 1.82 +34% +39% -41%
ReLU / dim=64x128x1024 0.55 0.46 1.25 0.45 0.41 +19% -18% +127%
Scatter / dim=64x16 dim=1000 0.63 0.31 0.07 0.74 0.04 +106% +17% -89%
Scatter / dim=64x16 dim=1000000 3.95 3.21 55.13 1.69 3.31 +23% -57% +1294%
Scatter / dim=1024x32 dim=10 0.63 0.33 0.01 0.24 0.00 +92% -61% -97%
Scatter / dim=1024x32 dim=1000 0.45 0.44 0.12 0.52 0.10 +3% +14% -73%
Scatter / dim=1024x32 dim=1000000 6.71 6.24 103.83 2.96 4.22 +7% -55% +1448%
ScatterSum / dim=64x16 dim=10 0.04 0.03 0.01 nan 0.00 +31% nan% -79%
ScatterSum / dim=64x16 dim=1000 0.04 0.03 0.01 nan 0.01 +32% nan% -78%
ScatterSum / dim=64x16 dim=1000000 0.04 0.03 0.01 nan 1.38 +24% nan% -78%
ScatterSum / dim=1024x32 dim=10 0.04 0.03 0.01 nan 0.01 +39% nan% -81%
ScatterSum / dim=1024x32 dim=1000 0.04 0.03 0.01 nan 0.01 +39% nan% -81%
ScatterSum / dim=1024x32 dim=1000000 0.05 0.03 0.01 nan 6.93 +44% nan% -79%
ScatterMax / dim=64x16 dim=10 0.04 0.03 0.01 nan 0.00 +29% nan% -78%
ScatterMax / dim=64x16 dim=1000 0.04 0.03 0.01 nan 0.01 +32% nan% -79%
ScatterMax / dim=64x16 dim=1000000 0.04 0.03 0.01 nan 1.38 +17% nan% -77%
ScatterMax / dim=1024x32 dim=10 0.04 0.03 0.01 nan 0.01 +31% nan% -78%
ScatterMax / dim=1024x32 dim=1000 0.04 0.03 0.01 nan 0.01 +32% nan% -80%
ScatterMax / dim=1024x32 dim=1000000 0.04 0.03 0.01 nan 6.92 +25% nan% -80%
SeLU / dim=128x16x1024 0.96 0.57 2.59 0.61 1.91 +68% -36% +169%
SeLU / dim=64x128x1024 1.57 0.55 15.34 0.87 4.44 +182% -44% +879%
Sigmoid / dim=128x16x1024 0.39 0.81 11.76 0.47 1.81 -52% +20% +2930%
Sigmoid / dim=64x128x1024 0.70 0.84 46.61 0.73 3.41 -16% +4% +6598%
Softmax / dim=64x1000000 axi=-1 3.04 2.34 50.36 2.05 18.30 +29% -32% +1557%
Softmax / dim=1000000x64 axi=-1 3.08 2.39 50.81 2.93 19.76 +28% -4% +1549%
Softmax / dim=64x16x32x1024 axi=-1 1.86 1.34 26.83 2.03 7.07 +38% +9% +1342%
Softmax / dim=128x16x32x1024 axi=-1 3.15 2.35 52.72 3.26 17.29 +33% +3% +1574%
Softmax / dim=1024x16x32x128 axi=-1 3.15 2.36 52.82 2.99 18.05 +33% -4% +1578%
Softmax / dim=1024x64x32x8 axi=-1 1.49 0.82 13.40 0.94 9.24 +81% -36% +802%
Softplus / dim=128x16x1024 0.43 0.26 11.90 0.50 1.81 +65% +16% +2698%
Softplus / dim=64x128x1024 0.70 0.44 48.32 0.71 5.10 +59% +0% +6755%
Sort / dim=64x128x1024 axi=0 0.76 0.60 226.16 6.73 23.90 +26% +782% +29546%
Sort / dim=64x128x1024 axi=1 0.54 0.52 226.28 7.44 21.69 +3% +1274% +41707%
Sort / dim=64x128x1024 axi=2 0.76 0.62 227.19 4.69 21.90 +22% +513% +29646%
Sum / dim=64x128x128x128 axi=0 0.92 0.91 9.07 1.26 9.10 +1% +36% +884%
Sum / dim=64x128x128x128 axi=1 0.92 0.91 9.05 1.27 7.53 +1% +38% +884%
Sum / dim=64x128x128x128 axi=2 0.92 0.90 9.00 1.16 4.48 +2% +26% +877%
Sum / dim=64x128x128x128 axi=3 0.91 0.89 9.02 1.88 3.59 +2% +106% +888%
SumAll / dim=64x128x128x128 0.91 0.91 9.00 1.16 2.63 +0% +27% +891%
SumAll / dim=1000000 0.33 0.41 0.06 0.40 0.24 -18% +20% -82%
SumAll / dim=1000000x128 0.89 0.86 8.60 1.32 2.50 +3% +48% +869%
SumAll / dim=128x1000000 0.89 0.87 8.58 1.20 2.59 +2% +35% +865%
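
Millisecond figures like these depend on how each operation is synchronized before the clock stops. This section does not show the benchmark's own harness, so the following is only a minimal sketch under assumed choices: a few warmup runs, then the mean of several timed runs. `time_op_ms` is a hypothetical name. MLX evaluates lazily, so a sync step such as `mx.eval(result)` is needed to time the actual computation. PyTorch's MPS and CUDA backends are asynchronous in a similar way and provide `torch.mps.synchronize()` and `torch.cuda.synchronize()`.

```python
import statistics
import time
from typing import Any, Callable


def time_op_ms(
    op: Callable[[], Any],
    sync: Callable[[Any], None] = lambda _: None,
    warmup: int = 5,
    runs: int = 20,
) -> float:
    """Return the mean wall-clock time of `op()` in milliseconds.

    `sync` receives the op's result and should block until the device work
    has finished. Assumed usage: `mx.eval` for MLX (lazy evaluation), or
    `lambda _: torch.mps.synchronize()` / `lambda _: torch.cuda.synchronize()`
    for PyTorch. Without it, asynchronous backends would mostly measure
    graph construction or kernel launch.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warmup calls absorb one-off costs such as kernel compilation or caching.
    for _ in range(warmup):
        sync(op())
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        sync(op())
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Pure-Python stand-in workload; a real run would pass an MLX or torch op.
    data = list(range(100_000, 0, -1))
    print(f"sorted(): {time_op_ms(lambda: sorted(data)):.2f} ms")
```

Warmup matters most for the mlx_gpu_compile column. The first call to a function wrapped with `mx.compile` traces and compiles it, so that cost would otherwise land in the first timed sample. Whether this benchmark excludes that cost is an assumption here.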

M3 (RAM: 16GB)

Operation mlx_gpu mlx_cpu mps cpu mlx_gpu/mps speedup mlx_gpu/mlx_cpu speedup
Argmax / dim=64x1024x128 axi=0 1.64 21.87 2.08 19.89 +26% +1233%
Argmax / dim=64x1024x128 axi=1 1.03 9.30 1.62 3.32 +58% +806%
Argmax / dim=64x1024x128 axi=2 1.08 6.99 1.94 2.77 +80% +549%
Argmax / dim=64x128x1024 axi=2 1.05 8.34 1.21 2.41 +15% +696%
BCE / dim=1000000 dim=1000000 0.80 5.31 1.30 1.30 +63% +566%
BCE / dim=100000x32 dim=100000x32 1.91 18.11 3.64 3.50 +90% +846%
BCE / dim=100000x64x2 dim=100000x64x2 6.58 69.43 14.49 14.17 +120% +954%
BCE / dim=128x100000 dim=128x100000 6.92 70.35 14.92 13.61 +115% +917%
Concat / dim=1000000x64 dim=1000000x32 axi=1 8.76 54.43 8.83 28.59 +0% +521%
Concat / dim=1000000x64 dim=1000000x128 axi=1 18.46 126.17 18.08 58.87 -2% +583%
Concat / dim=1000000x64 dim=1000000x64 axi=0 11.48 65.83 11.70 26.79 +1% +473%
Concat / dim=64x1000000 dim=64x1000000 axi=0 11.39 86.71 11.78 26.93 +3% +661%
Conv1d / dim=100x256x3 dim=8x3x3 0.51 0.33 0.44 1.90 -13% -36%
Conv1d / dim=100x256x256 dim=8x3x256 4.31 8.12 2.40 41.79 -44% +88%
Conv1d / dim=16x1000x80 dim=128x11x80 3.63 5.64 3.77 205.70 +3% +55%
Conv1d / dim=16x1000x3 dim=128x11x3 0.91 0.58 1.31 35.53 +43% -36%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 41.44 730.75 7.45 98.91 -82% +1663%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 16.01 318.25 4.73 31.72 -70% +1888%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 18.47 551.98 6.10 42.98 -66% +2888%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 1.57 9.14 0.93 1.13 -40% +480%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 4.83 90.03 2.93 7.04 -39% +1763%
LeakyReLU / dim=128x16x1024 0.77 1.15 0.68 0.76 -11% +49%
LeakyReLU / dim=64x128x1024 2.09 4.96 1.34 1.38 -35% +137%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 15.25 63.82 7.10 66.79 -53% +318%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 15.55 68.92 7.98 78.16 -48% +343%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 28.76 96.20 21.89 202.41 -23% +234%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 47.50 130.18 41.22 260.27 -13% +174%
Linear / dim=100x1x51200 dim=51200x1 dim=1 0.67 0.34 1.02 4.00 +52% -48%
MatMul / dim=32x1x1000 dim=32x1000x128 0.47 0.15 0.97 0.68 +107% -68%
MatMul / dim=1000x64x256 dim=256x32 1.23 3.40 5.30 14.38 +330% +176%
MatMul / dim=1000x64x1024 dim=1000x1024x32 4.76 11.32 4.96 158.75 +4% +137%
MatMul / dim=1000x1024x64 dim=1000x64x256 17.41 89.14 19.42 1214.55 +11% +412%
MatMul / dim=64x1000000 dim=1000000x32 9.07 12.25 7.74 40.94 -14% +35%
MatMul / dim=1000000x64 dim=64x1024 60.00 343.14 161.03 1513.56 +168% +471%
PReLU / dim=128x16x1024 dim=1 1.14 1.10 0.67 0.86 -41% -3%
PReLU / dim=64x128x1024 dim=1 3.58 4.41 1.31 1.36 -63% +23%
ReLU / dim=128x16x1024 0.50 0.33 0.62 0.64 +24% -33%
ReLU / dim=64x128x1024 1.03 2.44 1.31 1.39 +27% +137%
SeLU / dim=128x16x1024 2.28 2.92 0.61 2.87 -73% +27%
SeLU / dim=64x128x1024 8.17 12.51 1.44 10.90 -82% +53%
Sigmoid / dim=128x16x1024 0.53 10.64 0.69 2.66 +30% +1901%
Sigmoid / dim=64x128x1024 1.05 43.31 1.45 8.72 +38% +4022%
Softmax / dim=64x1000000 axi=-1 8.85 42.85 11.65 37.39 +31% +384%
Softmax / dim=1000000x64 axi=-1 6.04 38.42 12.63 40.49 +109% +535%
Softmax / dim=64x16x32x1024 axi=-1 3.29 20.42 10.26 18.81 +212% +521%
Softmax / dim=128x16x32x1024 axi=-1 6.32 41.12 20.14 37.59 +218% +550%
Softmax / dim=1024x16x32x128 axi=-1 6.34 38.48 13.63 39.88 +114% +506%
Softmax / dim=1024x64x32x8 axi=-1 7.02 66.84 4.17 21.11 -40% +852%
Softplus / dim=128x16x1024 0.43 10.47 0.66 3.93 +50% +2309%
Softplus / dim=64x128x1024 1.03 43.17 1.50 14.26 +45% +4101%
Sort / dim=64x128x1024 axi=0 23.39 1015.93 35.02 59.19 +49% +4243%
Sort / dim=64x128x1024 axi=1 12.35 926.84 32.10 56.93 +159% +7405%
Sort / dim=64x128x1024 axi=2 2.27 230.00 25.06 64.75 +1003% +10025%
Sum / dim=64x128x128x128 axi=0 6.42 12.28 6.40 19.30 0% +91%
Sum / dim=64x128x128x128 axi=1 6.25 11.50 6.39 15.18 +2% +84%
Sum / dim=64x128x128x128 axi=2 6.28 11.08 6.34 8.06 +1% +76%
Sum / dim=64x128x128x128 axi=3 8.91 10.09 7.30 6.55 -18% +13%
SumAll / dim=64x128x128x128 5.76 10.70 6.38 5.92 +10% +85%
SumAll / dim=1000000 0.29 0.05 0.39 0.07 +36% -82%
SumAll / dim=1000000x128 5.52 9.80 6.18 5.81 +12% +77%
SumAll / dim=128x1000000 5.48 10.61 6.16 5.73 +12% +93%
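
The speedup columns appear to follow speedup = (baseline time / candidate time - 1) × 100%. Here the candidate is the backend named first in the column header: mlx_gpu in "mlx_gpu/mps speedup", mlx_gpu_compile in "mlx_gpu_compile/mlx_gpu speedup", and cuda in "cuda/cpu speedup". Positive values therefore mean the candidate is faster. This formula is inferred from the rows, not stated in this section. The Python sketch below recomputes two rows of the M3 table above; `speedup_pct` and `format_speedup` are hypothetical helper names.

```python
def speedup_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percent by which the candidate is faster than the baseline.

    Assumes the tables use (baseline / candidate - 1) * 100:
    positive -> candidate faster, negative -> candidate slower.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")
    return (baseline_ms / candidate_ms - 1.0) * 100.0


def format_speedup(pct: float) -> str:
    """Render a percentage with an explicit sign, e.g. '+27%' or '-41%'."""
    return f"{round(pct):+d}%"


# M3, ReLU / dim=64x128x1024: mlx_gpu 1.03 ms, mlx_cpu 2.44 ms, mps 1.31 ms
print(format_speedup(speedup_pct(1.31, 1.03)))  # mlx_gpu/mps     -> +27%
print(format_speedup(speedup_pct(2.44, 1.03)))  # mlx_gpu/mlx_cpu -> +137%
# M3, PReLU / dim=128x16x1024: mlx_gpu 1.14 ms, mps 0.67 ms
print(format_speedup(speedup_pct(0.67, 1.14)))  # mlx_gpu/mps     -> -41%
```

For some rows, recomputing from the rounded millisecond values misses the printed percentage by a point or more. This is presumably because the percentages were derived from unrounded timings.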

M3 Pro (cores: 6E+5P+14GPU)

Operation mlx_gpu mlx_cpu mps cpu mlx_gpu/mps speedup mlx_gpu/mlx_cpu speedup
Argmax / dim=64x1024x128 axi=0 1.18 21.22 1.49 19.26 +25% +1695%
Argmax / dim=64x1024x128 axi=1 0.89 8.85 1.22 2.35 +36% +890%
Argmax / dim=64x1024x128 axi=2 0.97 6.65 1.16 1.58 +19% +587%
Argmax / dim=64x128x1024 axi=2 0.89 8.12 1.07 1.37 +21% +816%
BCE / dim=1000000 dim=1000000 0.81 5.96 1.09 1.04 +34% +634%
BCE / dim=100000x32 dim=100000x32 1.76 17.78 3.01 2.86 +71% +912%
BCE / dim=100000x64x2 dim=100000x64x2 4.20 67.99 11.80 11.81 +180% +1517%
BCE / dim=128x100000 dim=128x100000 4.04 67.80 11.57 11.39 +186% +1576%
Concat / dim=1000000x64 dim=1000000x32 axi=1 5.93 51.84 6.50 29.83 +9% +773%
Concat / dim=1000000x64 dim=1000000x128 axi=1 11.57 118.19 11.91 46.27 +3% +921%
Concat / dim=1000000x64 dim=1000000x64 axi=0 7.75 62.53 8.40 37.28 +8% +706%
Concat / dim=64x1000000 dim=64x1000000 axi=0 7.74 80.78 8.66 39.03 +11% +943%
Conv1d / dim=100x256x3 dim=8x3x3 0.80 0.28 0.80 1.97 0% -64%
Conv1d / dim=100x256x256 dim=8x3x256 2.95 7.32 1.99 50.02 -32% +147%
Conv1d / dim=16x1000x80 dim=128x11x80 3.70 5.40 4.23 233.14 +14% +45%
Conv1d / dim=16x1000x3 dim=128x11x3 1.15 0.44 1.26 47.61 +9% -61%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 28.54 738.77 5.85 107.76 -79% +2488%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 15.23 316.15 3.41 22.42 -77% +1975%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 11.07 512.34 3.73 30.77 -66% +4528%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 1.40 8.92 0.93 1.04 -33% +536%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 4.07 88.95 1.54 7.37 -62% +2086%
LeakyReLU / dim=128x16x1024 1.31 0.96 0.99 0.90 -24% -26%
LeakyReLU / dim=64x128x1024 1.78 2.10 1.54 1.02 -13% +18%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 11.22 42.51 5.76 38.86 -48% +278%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 10.79 46.87 5.71 51.97 -47% +334%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 20.39 70.91 15.84 125.66 -22% +247%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 33.41 103.37 28.62 240.45 -14% +209%
Linear / dim=100x1x51200 dim=51200x1 dim=1 0.69 0.23 1.28 0.50 +86% -67%
MatMul / dim=32x1x1000 dim=32x1000x128 0.87 0.10 1.18 0.66 +36% -88%
MatMul / dim=1000x64x256 dim=256x32 0.83 2.81 3.93 16.03 +373% +238%
MatMul / dim=1000x64x1024 dim=1000x1024x32 3.25 10.29 3.55 272.27 +9% +216%
MatMul / dim=1000x1024x64 dim=1000x64x256 11.40 82.00 13.39 680.84 +17% +619%
MatMul / dim=64x1000000 dim=1000000x32 39.89 10.89 6.81 155.98 -82% -72%
MatMul / dim=1000000x64 dim=64x1024 40.00 309.51 106.34 1214.46 +165% +673%
PReLU / dim=128x16x1024 dim=1 1.47 1.98 1.09 0.79 -25% +34%
PReLU / dim=64x128x1024 dim=1 2.62 3.62 1.60 1.04 -38% +38%
ReLU / dim=128x16x1024 0.82 0.45 1.19 0.81 +43% -45%
ReLU / dim=64x128x1024 1.06 0.78 1.55 1.03 +46% -26%
SeLU / dim=128x16x1024 2.42 4.90 1.09 2.06 -55% +101%
SeLU / dim=64x128x1024 5.53 15.30 1.45 7.33 -73% +176%
Sigmoid / dim=128x16x1024 0.99 10.68 0.97 1.83 -2% +979%
Sigmoid / dim=64x128x1024 1.07 41.87 1.62 6.72 +51% +3800%
Softmax / dim=64x1000000 axi=-1 6.00 29.23 8.74 34.32 +45% +387%
Softmax / dim=1000000x64 axi=-1 4.05 28.19 9.60 37.89 +136% +595%
Softmax / dim=64x16x32x1024 axi=-1 2.27 14.97 8.09 14.76 +256% +560%
Softmax / dim=128x16x32x1024 axi=-1 4.25 29.98 16.10 35.51 +278% +605%
Softmax / dim=1024x16x32x128 axi=-1 4.23 28.42 10.15 36.84 +139% +571%
Softmax / dim=1024x64x32x8 axi=-1 6.93 64.48 3.24 19.37 -53% +830%
Softplus / dim=128x16x1024 0.71 10.54 1.05 2.72 +47% +1383%
Softplus / dim=64x128x1024 1.33 41.36 1.47 10.32 +10% +3012%
Sort / dim=64x128x1024 axi=0 15.52 1012.11 24.23 46.26 +56% +6422%
Sort / dim=64x128x1024 axi=1 8.50 898.97 22.34 46.41 +162% +10479%
Sort / dim=64x128x1024 axi=2 1.99 224.86 17.53 47.45 +782% +11220%
Sum / dim=64x128x128x128 axi=0 4.29 11.99 5.00 12.36 +16% +179%
Sum / dim=64x128x128x128 axi=1 4.17 10.66 4.98 11.53 +19% +155%
Sum / dim=64x128x128x128 axi=2 4.14 9.48 4.87 6.34 +17% +129%
Sum / dim=64x128x128x128 axi=3 6.34 7.11 5.61 5.09 -11% +12%
SumAll / dim=64x128x128x128 4.12 6.45 4.92 4.72 +19% +56%
SumAll / dim=1000000 0.67 0.06 0.75 0.08 +11% -90%
SumAll / dim=1000000x128 3.93 6.18 4.70 4.46 +19% +57%
SumAll / dim=128x1000000 3.95 6.16 4.37 4.48 +10% +56%

M3 Max (cores: 4E+12P+40GPU) mlx 0.2.0

Operation mlx_gpu mlx_gpu_compile mlx_cpu mps cpu mlx_gpu_compile/mlx_gpu speedup mlx_gpu/mps speedup mlx_gpu/mlx_cpu speedup
Argmax / dim=64x1024x128 axi=0 1.56 1.56 8.35 1.47 20.73 +0% -5% +435%
Argmax / dim=64x1024x128 axi=1 1.57 1.55 8.33 0.98 1.67 +1% -37% +430%
Argmax / dim=64x1024x128 axi=2 1.59 1.56 8.33 0.89 1.16 +1% -43% +424%
Argmax / dim=64x128x1024 axi=2 1.57 1.56 8.34 0.73 1.01 +0% -53% +432%
BCE / dim=1000000 dim=1000000 0.37 0.24 4.97 0.33 0.72 +49% -11% +1258%
BCE / dim=100000x32 dim=100000x32 0.51 0.27 16.26 0.44 1.64 +84% -13% +3110%
BCE / dim=100000x64x2 dim=100000x64x2 1.80 0.79 66.58 0.91 6.25 +128% -49% +3597%
BCE / dim=128x100000 dim=128x100000 1.80 0.78 67.05 0.68 6.29 +130% -61% +3624%
Concat / dim=1000000x64 dim=1000000x32 axi=1 2.43 2.41 66.73 2.50 16.76 +0% +2% +2645%
Concat / dim=1000000x64 dim=1000000x128 axi=1 4.58 4.57 146.32 4.68 36.45 +0% +2% +3094%
Concat / dim=1000000x64 dim=1000000x64 axi=0 3.12 3.11 47.47 3.20 19.43 +0% +2% +1419%
Concat / dim=64x1000000 dim=64x1000000 axi=0 3.13 3.12 68.53 3.20 18.91 +0% +2% +2090%
Conv1d / dim=100x256x3 dim=8x3x3 0.33 0.33 0.29 0.43 2.31 -1% +32% -11%
Conv1d / dim=100x256x256 dim=8x3x256 1.21 1.19 5.99 1.27 68.57 +2% +4% +394%
Conv1d / dim=16x1000x80 dim=128x11x80 1.41 1.03 2.88 1.49 502.60 +37% +5% +104%
Conv1d / dim=16x1000x3 dim=128x11x3 0.44 0.45 0.42 0.51 50.53 0% +13% -4%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 10.22 10.13 722.62 2.18 109.15 +0% -78% +6971%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 3.99 3.96 313.85 1.52 11.02 +0% -61% +7770%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 4.57 4.63 512.73 1.87 27.17 -1% -59% +11117%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 0.62 0.60 9.17 0.50 1.35 +3% -19% +1373%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 1.66 1.37 89.00 1.05 7.55 +20% -36% +5269%
Gather / dim=64x256 dim=10 0.33 0.21 0.01 0.58 0.00 +59% +74% -96%
Gather / dim=64x256 dim=1000 0.27 0.27 0.03 0.59 0.12 0% +115% -90%
Gather / dim=64x256 dim=1000000 6.92 6.79 20.55 41.59 38.15 +1% +501% +197%
Gather / dim=1024x32 dim=10 0.24 0.22 0.01 0.56 0.00 +4% +139% -95%
Gather / dim=1024x32 dim=1000 0.23 0.22 0.01 0.58 0.08 +2% +152% -93%
Gather / dim=1024x32 dim=1000000 1.36 1.12 5.58 5.47 1.70 +22% +301% +309%
LeakyReLU / dim=128x16x1024 0.29 0.25 0.96 0.35 0.89 +17% +19% +230%
LeakyReLU / dim=64x128x1024 0.58 0.34 4.18 0.74 0.44 +70% +27% +623%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 4.06 4.03 19.47 1.91 26.53 +0% -52% +380%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 4.24 4.19 23.09 2.37 55.05 +0% -44% +445%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 7.47 7.45 32.52 5.70 87.50 +0% -23% +335%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 12.00 11.97 48.07 10.49 125.36 +0% -12% +300%
Linear / dim=100x1x51200 dim=51200x1 dim=1 0.52 0.63 0.21 0.75 0.77 -16% +42% -60%
MatMul / dim=32x1x1000 dim=32x1000x128 0.24 0.20 0.08 0.58 0.65 +15% +142% -64%
MatMul / dim=1000x64x256 dim=256x32 0.44 0.42 1.35 1.59 15.13 +3% +264% +209%
MatMul / dim=1000x64x1024 dim=1000x1024x32 1.36 1.35 9.85 1.50 428.05 +0% +10% +624%
MatMul / dim=1000x1024x64 dim=1000x64x256 4.65 4.63 43.90 5.21 1111.34 +0% +12% +844%
MatMul / dim=64x1000000 dim=1000000x32 2.75 2.76 7.46 3.89 67.98 0% +41% +171%
MatMul / dim=1000000x64 dim=64x1024 15.77 15.79 90.78 32.97 1891.27 0% +109% +475%
PReLU / dim=128x16x1024 dim=1 0.36 0.22 0.69 0.37 0.77 +59% +2% +91%
PReLU / dim=64x128x1024 dim=1 1.04 0.35 3.43 0.62 0.46 +199% -40% +229%
ReLU / dim=128x16x1024 0.32 0.32 0.25 0.50 0.81 0% +58% -21%
ReLU / dim=64x128x1024 0.70 0.38 1.05 1.06 0.43 +84% +51% +50%
Scatter / dim=64x16 dim=10 0.66 0.43 0.01 0.44 0.00 +52% -33% -98%
Scatter / dim=64x16 dim=1000 0.42 0.38 0.07 0.47 0.07 +9% +13% -83%
Scatter / dim=64x16 dim=1000000 4.17 4.10 52.82 2.79 2.28 +1% -32% +1166%
Scatter / dim=1024x32 dim=10 0.25 0.23 0.01 0.44 0.00 +9% +76% -94%
Scatter / dim=1024x32 dim=1000 0.26 0.24 0.12 0.62 0.07 +8% +134% -55%
Scatter / dim=1024x32 dim=1000000 7.98 7.95 99.38 5.20 3.15 +0% -34% +1145%
ScatterSum / dim=64x16 dim=10 0.05 0.03 0.01 nan 0.00 +60% nan% -82%
ScatterSum / dim=64x16 dim=1000 0.04 0.03 0.01 nan 0.01 +48% nan% -80%
ScatterSum / dim=64x16 dim=1000000 0.04 0.03 0.01 nan 1.18 +54% nan% -81%
ScatterSum / dim=1024x32 dim=10 0.04 0.03 0.01 nan 0.01 +50% nan% -80%
ScatterSum / dim=1024x32 dim=1000 0.04 0.03 0.01 nan 0.01 +49% nan% -81%
ScatterSum / dim=1024x32 dim=1000000 0.04 0.03 0.01 nan 6.13 +51% nan% -80%
ScatterMax / dim=64x16 dim=10 0.04 0.03 0.01 nan 0.00 +55% nan% -80%
ScatterMax / dim=64x16 dim=1000 0.04 0.03 0.01 nan 0.00 +55% nan% -82%
ScatterMax / dim=64x16 dim=1000000 0.04 0.03 0.01 nan 1.21 +60% nan% -81%
ScatterMax / dim=1024x32 dim=10 0.04 0.03 0.01 nan 0.01 +56% nan% -82%
ScatterMax / dim=1024x32 dim=1000 0.05 0.03 0.01 nan 0.01 +46% nan% -80%
ScatterMax / dim=1024x32 dim=1000000 0.04 0.03 0.01 nan 6.16 +43% nan% -81%
SeLU / dim=128x16x1024 0.64 0.22 1.86 0.33 1.44 +187% -47% +191%
SeLU / dim=64x128x1024 2.06 0.36 8.42 0.63 4.41 +468% -69% +309%
Sigmoid / dim=128x16x1024 0.25 0.24 10.47 0.35 1.46 +6% +39% +4066%
Sigmoid / dim=64x128x1024 0.35 0.34 42.10 0.63 4.23 +3% +78% +11895%
Softmax / dim=64x1000000 axi=-1 5.78 4.35 43.94 3.26 21.04 +32% -43% +660%
Softmax / dim=1000000x64 axi=-1 5.78 4.36 43.81 4.02 20.18 +32% -30% +657%
Softmax / dim=64x16x32x1024 axi=-1 3.13 2.38 23.21 2.73 7.76 +31% -12% +641%
Softmax / dim=128x16x32x1024 axi=-1 6.05 4.56 46.01 4.37 19.09 +32% -27% +660%
Softmax / dim=1024x16x32x128 axi=-1 6.06 4.56 46.22 4.28 19.98 +33% -29% +662%
Softmax / dim=1024x64x32x8 axi=-1 1.68 1.32 11.56 1.74 10.91 +26% +3% +589%
Softplus / dim=128x16x1024 0.32 0.24 10.35 0.39 1.84 +33% +22% +3156%
Softplus / dim=64x128x1024 0.39 0.34 41.69 0.62 6.16 +13% +60% +10602%
Sort / dim=64x128x1024 axi=0 0.77 0.75 229.46 9.03 35.77 +2% +1074% +29757%
Sort / dim=64x128x1024 axi=1 0.77 0.76 229.35 8.63 33.35 +1% +1015% +29535%
Sort / dim=64x128x1024 axi=2 0.77 0.76 229.35 6.45 28.18 +1% +737% +29646%
Sum / dim=64x128x128x128 axi=0 1.55 1.55 6.54 1.69 9.59 0% +8% +321%
Sum / dim=64x128x128x128 axi=1 1.54 1.52 6.52 1.66 7.84 +1% +8% +324%
Sum / dim=64x128x128x128 axi=2 1.54 1.54 6.53 1.62 5.63 +0% +5% +323%
Sum / dim=64x128x128x128 axi=3 1.55 1.53 6.53 2.62 4.91 +1% +68% +320%
SumAll / dim=64x128x128x128 1.54 1.54 6.52 1.65 4.38 +0% +7% +323%
SumAll / dim=1000000 0.23 0.21 0.05 0.30 0.08 +8% +29% -77%
SumAll / dim=1000000x128 1.50 1.50 6.30 1.67 4.19 +0% +11% +318%
SumAll / dim=128x1000000 1.49 1.49 6.25 1.66 4.22 +0% +10% +318%

CUDA GPUs

Tesla V100 32GB (PCIe) / Intel Xeon Gold 5120, 14 cores, 28 threads @ 2.2GHz (Skylake), 60GB

Operation cpu cuda cuda/cpu speedup
Argmax / dim=64x1024x128 axi=0 72.96 0.09 +80554%
Argmax / dim=64x1024x128 axi=1 25.43 0.11 +22457%
Argmax / dim=64x1024x128 axi=2 20.35 0.12 +16705%
Argmax / dim=64x128x1024 axi=2 18.61 0.09 +21652%
BCE / dim=1000000 dim=1000000 26.32 0.07 +38400%
BCE / dim=100000x32 dim=100000x32 83.80 0.11 +74241%
BCE / dim=100000x64x2 dim=100000x64x2 341.08 0.30 +115358%
BCE / dim=128x100000 dim=128x100000 341.55 0.29 +116168%
Concat / dim=1000000x64 dim=1000000x32 axi=1 277.05 1.19 +23209%
Concat / dim=1000000x64 dim=1000000x128 axi=1 571.38 2.44 +23338%
Concat / dim=1000000x64 dim=1000000x64 axi=0 336.66 1.53 +21834%
Concat / dim=64x1000000 dim=64x1000000 axi=0 338.85 1.53 +22065%
Conv1d / dim=100x256x3 dim=8x3x3 0.71 0.08 +744%
Conv1d / dim=100x256x256 dim=8x3x256 37.65 0.65 +5736%
Conv1d / dim=16x1000x80 dim=128x11x80 79.15 0.47 +16703%
Conv1d / dim=16x1000x3 dim=128x11x3 3.32 0.12 +2596%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 150.83 1.99 +7480%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 46.20 0.65 +7000%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 49.60 1.41 +3409%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 1.64 0.06 +2490%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 15.40 0.25 +6068%
Gather / dim=64x256 dim=10 0.02 0.04 -49%
Gather / dim=64x256 dim=1000 0.15 0.04 +293%
Gather / dim=64x256 dim=1000000 538.57 2.08 +25740%
Gather / dim=1024x32 dim=10 0.02 0.04 -43%
Gather / dim=1024x32 dim=1000 0.05 0.04 +37%
Gather / dim=1024x32 dim=1000000 40.86 0.31 +13082%
LeakyReLU / dim=128x16x1024 1.07 0.05 +2185%
LeakyReLU / dim=64x128x1024 9.94 0.11 +9104%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 311.56 1.87 +16520%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 454.32 2.04 +22143%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 1287.08 5.26 +24360%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 2455.45 9.67 +25289%
Linear / dim=100x1x51200 dim=51200x1 dim=1 1.49 0.08 +1728%
MatMul / dim=32x1x1000 dim=32x1000x128 1.18 0.06 +1948%
MatMul / dim=1000x64x256 dim=256x32 28.43 0.39 +7143%
MatMul / dim=1000x64x1024 dim=1000x1024x32 101.83 1.59 +6318%
MatMul / dim=1000x1024x64 dim=1000x64x256 1440.54 2.82 +50943%
MatMul / dim=64x1000000 dim=1000000x32 125.20 0.65 +19267%
MatMul / dim=1000000x64 dim=64x1024 5749.54 11.27 +50919%
PReLU / dim=128x16x1024 dim=1 1.07 0.05 +2223%
PReLU / dim=64x128x1024 dim=1 10.03 0.11 +9285%
ReLU / dim=128x16x1024 1.08 0.05 +2232%
ReLU / dim=64x128x1024 9.91 0.11 +9104%
Scatter / dim=64x16 dim=10 0.02 0.03 -40%
Scatter / dim=64x16 dim=1000 0.05 0.03 +44%
Scatter / dim=64x16 dim=1000000 20.02 0.25 +7856%
Scatter / dim=1024x32 dim=10 0.02 0.03 -40%
Scatter / dim=1024x32 dim=1000 0.05 0.03 +65%
Scatter / dim=1024x32 dim=1000000 21.35 0.32 +6560%
ScatterSum / dim=64x16 dim=10 0.02 0.05 -47%
ScatterSum / dim=64x16 dim=1000 0.03 0.05 -25%
ScatterSum / dim=64x16 dim=1000000 7.61 0.19 +3937%
ScatterSum / dim=1024x32 dim=10 0.03 0.05 -39%
ScatterSum / dim=1024x32 dim=1000 0.04 0.05 -9%
ScatterSum / dim=1024x32 dim=1000000 17.76 0.12 +14453%
ScatterMax / dim=64x16 dim=10 nan nan nan%
ScatterMax / dim=64x16 dim=1000 nan nan nan%
ScatterMax / dim=64x16 dim=1000000 nan nan nan%
ScatterMax / dim=1024x32 dim=10 nan nan nan%
ScatterMax / dim=1024x32 dim=1000 nan nan nan%
ScatterMax / dim=1024x32 dim=1000000 nan nan nan%
SeLU / dim=128x16x1024 3.64 0.05 +7573%
SeLU / dim=64x128x1024 19.47 0.11 +17827%
Sigmoid / dim=128x16x1024 2.75 0.05 +5743%
Sigmoid / dim=64x128x1024 16.17 0.11 +14728%
Softmax / dim=64x1000000 axi=-1 287.67 1.63 +17518%
Softmax / dim=1000000x64 axi=-1 274.14 0.66 +41722%
Softmax / dim=64x16x32x1024 axi=-1 140.28 0.36 +38520%
Softmax / dim=128x16x32x1024 axi=-1 280.85 0.70 +40182%
Softmax / dim=1024x16x32x128 axi=-1 279.57 0.68 +40838%
Softmax / dim=1024x64x32x8 axi=-1 66.09 0.20 +33030%
Softplus / dim=128x16x1024 7.93 0.05 +16119%
Softplus / dim=64x128x1024 36.33 0.11 +32760%
Sort / dim=64x128x1024 axi=0 567.66 3.87 +14573%
Sort / dim=64x128x1024 axi=1 409.30 1.89 +21560%
Sort / dim=64x128x1024 axi=2 602.02 2.02 +29717%
Sum / dim=64x128x128x128 axi=0 88.65 0.71 +12444%
Sum / dim=64x128x128x128 axi=1 85.62 0.68 +12433%
Sum / dim=64x128x128x128 axi=2 50.47 0.70 +7131%
Sum / dim=64x128x128x128 axi=3 44.98 0.73 +6063%
SumAll / dim=64x128x128x128 40.95 0.67 +6043%
SumAll / dim=1000000 0.24 0.04 +516%
SumAll / dim=1000000x128 39.07 0.65 +5878%
SumAll / dim=128x1000000 39.02 0.65 +5859%

Tesla V100 32GB (NVLink) / Intel Xeon Gold 6148, 20 cores, 40 threads @ 2.4GHz (Skylake), 60GB

Operation cpu cuda cuda/cpu speedup
Argmax / dim=64x1024x128 axi=0 57.05 0.09 +64142%
Argmax / dim=64x1024x128 axi=1 23.09 0.11 +21144%
Argmax / dim=64x1024x128 axi=2 17.13 0.11 +14831%
Argmax / dim=64x128x1024 axi=2 15.64 0.08 +18718%
BCE / dim=1000000 dim=1000000 22.45 0.06 +35452%
BCE / dim=100000x32 dim=100000x32 72.05 0.11 +65232%
BCE / dim=100000x64x2 dim=100000x64x2 330.74 0.29 +112871%
BCE / dim=128x100000 dim=128x100000 318.96 0.29 +108848%
Concat / dim=1000000x64 dim=1000000x32 axi=1 364.51 1.18 +30747%
Concat / dim=1000000x64 dim=1000000x128 axi=1 841.81 2.42 +34620%
Concat / dim=1000000x64 dim=1000000x64 axi=0 452.68 1.53 +29421%
Concat / dim=64x1000000 dim=64x1000000 axi=0 466.35 1.53 +30291%
Conv1d / dim=100x256x3 dim=8x3x3 0.52 0.07 +600%
Conv1d / dim=100x256x256 dim=8x3x256 47.17 0.62 +7567%
Conv1d / dim=16x1000x80 dim=128x11x80 39.96 0.45 +8734%
Conv1d / dim=16x1000x3 dim=128x11x3 1.83 0.11 +1520%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 207.97 1.90 +10863%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 21.61 0.62 +3404%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 23.09 1.35 +1608%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 1.15 0.06 +1844%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 10.62 0.24 +4407%
Gather / dim=64x256 dim=10 0.02 0.04 -56%
Gather / dim=64x256 dim=1000 0.14 0.04 +266%
Gather / dim=64x256 dim=1000000 846.18 2.03 +41625%
Gather / dim=1024x32 dim=10 0.02 0.04 -54%
Gather / dim=1024x32 dim=1000 0.05 0.04 +26%
Gather / dim=1024x32 dim=1000000 122.97 0.30 +41252%
LeakyReLU / dim=128x16x1024 1.45 0.05 +2933%
LeakyReLU / dim=64x128x1024 32.44 0.11 +29854%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 546.20 1.83 +29689%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 521.31 1.98 +26244%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 827.56 5.03 +16363%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 1437.48 8.98 +15914%
Linear / dim=100x1x51200 dim=51200x1 dim=1 1.39 0.08 +1690%
MatMul / dim=32x1x1000 dim=32x1000x128 1.07 0.05 +1843%
MatMul / dim=1000x64x256 dim=256x32 19.51 0.38 +5093%
MatMul / dim=1000x64x1024 dim=1000x1024x32 69.19 1.52 +4448%
MatMul / dim=1000x1024x64 dim=1000x64x256 1241.71 2.69 +46020%
MatMul / dim=64x1000000 dim=1000000x32 84.51 0.65 +12958%
MatMul / dim=1000000x64 dim=64x1024 4573.74 10.77 +42376%
PReLU / dim=128x16x1024 dim=1 1.42 0.04 +3123%
PReLU / dim=64x128x1024 dim=1 29.68 0.11 +27829%
ReLU / dim=128x16x1024 1.45 0.04 +3135%
ReLU / dim=64x128x1024 26.68 0.11 +24987%
Scatter / dim=64x16 dim=10 0.01 0.03 -49%
Scatter / dim=64x16 dim=1000 0.03 0.03 +28%
Scatter / dim=64x16 dim=1000000 16.90 0.24 +6954%
Scatter / dim=1024x32 dim=10 0.01 0.03 -49%
Scatter / dim=1024x32 dim=1000 0.04 0.03 +37%
Scatter / dim=1024x32 dim=1000000 20.12 0.31 +6358%
ScatterSum / dim=64x16 dim=10 0.02 0.04 -62%
ScatterSum / dim=64x16 dim=1000 0.02 0.04 -38%
ScatterSum / dim=64x16 dim=1000000 12.03 0.18 +6606%
ScatterSum / dim=1024x32 dim=10 0.02 0.04 -50%
ScatterSum / dim=1024x32 dim=1000 0.03 0.04 -21%
ScatterSum / dim=1024x32 dim=1000000 28.86 0.12 +23973%
ScatterMax / dim=64x16 dim=10 nan nan nan%
ScatterMax / dim=64x16 dim=1000 nan nan nan%
ScatterMax / dim=64x16 dim=1000000 nan nan nan%
ScatterMax / dim=1024x32 dim=10 nan nan nan%
ScatterMax / dim=1024x32 dim=1000 nan nan nan%
ScatterMax / dim=1024x32 dim=1000000 nan nan nan%
SeLU / dim=128x16x1024 3.54 0.05 +7533%
SeLU / dim=64x128x1024 38.35 0.11 +35654%
Sigmoid / dim=128x16x1024 2.63 0.05 +5376%
Sigmoid / dim=64x128x1024 37.01 0.11 +34194%
Softmax / dim=64x1000000 axi=-1 333.85 1.62 +20570%
Softmax / dim=1000000x64 axi=-1 302.71 0.65 +46345%
Softmax / dim=64x16x32x1024 axi=-1 157.12 0.36 +43248%
Softmax / dim=128x16x32x1024 axi=-1 318.22 0.69 +45815%
Softmax / dim=1024x16x32x128 axi=-1 304.80 0.68 +44679%
Softmax / dim=1024x64x32x8 axi=-1 105.89 0.20 +53796%
Softplus / dim=128x16x1024 7.13 0.05 +14967%
Softplus / dim=64x128x1024 51.29 0.11 +46704%
Sort / dim=64x128x1024 axi=0 417.40 3.70 +11178%
Sort / dim=64x128x1024 axi=1 360.67 1.81 +19796%
Sort / dim=64x128x1024 axi=2 490.86 1.94 +25253%
Sum / dim=64x128x128x128 axi=0 92.62 0.70 +13038%
Sum / dim=64x128x128x128 axi=1 85.36 0.68 +12454%
Sum / dim=64x128x128x128 axi=2 49.97 0.69 +7165%
Sum / dim=64x128x128x128 axi=3 49.57 0.71 +6847%
SumAll / dim=64x128x128x128 42.74 0.66 +6368%
SumAll / dim=1000000 0.21 0.03 +528%
SumAll / dim=1000000x128 40.79 0.65 +6190%
SumAll / dim=128x1000000 40.77 0.65 +6189%

RTX4090 (Desktop) / 10th Gen Intel Core i9-10940X @ 3.30GHz, 128GB

Operation cpu cuda cuda/cpu speedup
Argmax / dim=64x1024x128 axi=0 15.92 0.04 +39326%
Argmax / dim=64x1024x128 axi=1 4.11 0.05 +7998%
Argmax / dim=64x1024x128 axi=2 3.46 0.05 +6615%
Argmax / dim=64x128x1024 axi=2 3.20 0.04 +8608%
BCE / dim=1000000 dim=1000000 3.84 0.05 +7086%
BCE / dim=100000x32 dim=100000x32 10.57 0.05 +19148%
BCE / dim=100000x64x2 dim=100000x64x2 40.02 0.22 +17863%
BCE / dim=128x100000 dim=128x100000 40.52 0.22 +17958%
Concat / dim=1000000x64 dim=1000000x32 axi=1 38.01 0.97 +3814%
Concat / dim=1000000x64 dim=1000000x128 axi=1 86.99 1.79 +4759%
Concat / dim=1000000x64 dim=1000000x64 axi=0 41.79 1.20 +3376%
Concat / dim=64x1000000 dim=64x1000000 axi=0 41.53 1.22 +3317%
Conv1d / dim=100x256x3 dim=8x3x3 0.33 0.07 +359%
Conv1d / dim=100x256x256 dim=8x3x256 5.11 0.30 +1596%
Conv1d / dim=16x1000x80 dim=128x11x80 5.24 0.13 +4017%
Conv1d / dim=16x1000x3 dim=128x11x3 0.69 0.11 +507%
Conv2d / dim=100x256x256x3 dim=8x3x3x3 22.06 0.74 +2862%
Conv2d / dim=10x256x256x12 dim=8x3x3x12 4.15 0.18 +2247%
Conv2d / dim=1x256x256x128 dim=8x3x3x128 3.45 0.15 +2202%
Conv2d / dim=100x28x28x3 dim=8x3x3x3 0.56 0.06 +832%
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 2.79 0.11 +2449%
Gather / dim=64x256 dim=10 0.02 0.03 -33%
Gather / dim=64x256 dim=1000 0.11 0.04 +156%
Gather / dim=64x256 dim=1000000 103.61 1.23 +8337%
Gather / dim=1024x32 dim=10 0.03 0.05 -45%
Gather / dim=1024x32 dim=1000 0.06 0.04 +23%
Gather / dim=1024x32 dim=1000000 14.67 0.19 +7595%
LeakyReLU / dim=128x16x1024 0.43 0.03 +1519%
LeakyReLU / dim=64x128x1024 4.45 0.04 +11604%
Linear / dim=100x1024x32 dim=32x1024 dim=1024 53.35 0.59 +8943%
Linear / dim=100x1024x64 dim=64x1024 dim=1024 56.93 0.70 +8089%
Linear / dim=100x1024x256 dim=256x1024 dim=1024 79.14 1.26 +6166%
Linear / dim=100x1024x512 dim=512x1024 dim=1024 121.64 2.46 +4854%
Linear / dim=100x1x51200 dim=51200x1 dim=1 0.27 0.05 +401%
MatMul / dim=32x1x1000 dim=32x1000x128 0.16 0.05 +251%
MatMul / dim=1000x64x256 dim=256x32 2.53 0.07 +3323%
MatMul / dim=1000x64x1024 dim=1000x1024x32 8.35 0.73 +1051%
MatMul / dim=1000x1024x64 dim=1000x64x256 108.79 1.60 +6689%
MatMul / dim=64x1000000 dim=1000000x32 9.42 0.50 +1791%
MatMul / dim=1000000x64 dim=64x1024 395.58 5.23 +7468%
PReLU / dim=128x16x1024 dim=1 0.39 0.03 +1103%
PReLU / dim=64x128x1024 dim=1 4.18 0.05 +7858%
ReLU / dim=128x16x1024 0.63 0.03 +2102%
ReLU / dim=64x128x1024 4.32 0.04 +10970%
Scatter / dim=64x16 dim=10 0.01 0.02 -54%
Scatter / dim=64x16 dim=1000 0.03 0.02 +30%
Scatter / dim=64x16 dim=1000000 5.03 0.14 +3531%
Scatter / dim=1024x32 dim=10 0.02 0.02 -36%
Scatter / dim=1024x32 dim=1000 0.04 0.02 +47%
Scatter / dim=1024x32 dim=1000000 5.92 0.17 +3423%
ScatterSum / dim=64x16 dim=10 0.02 0.05 -62%
ScatterSum / dim=64x16 dim=1000 0.03 0.04 -37%
ScatterSum / dim=64x16 dim=1000000 6.75 0.11 +5789%
ScatterSum / dim=1024x32 dim=10 0.04 0.04 0%
ScatterSum / dim=1024x32 dim=1000 0.06 0.05 +16%
ScatterSum / dim=1024x32 dim=1000000 16.28 0.09 +17776%
ScatterMax / dim=64x16 dim=10 0.02 0.04 -53%
ScatterMax / dim=64x16 dim=1000 0.02 0.04 -48%
ScatterMax / dim=64x16 dim=1000000 6.60 0.19 +3439%
ScatterMax / dim=1024x32 dim=10 0.04 0.04 +1%
ScatterMax / dim=1024x32 dim=1000 0.04 0.04 -7%
ScatterMax / dim=1024x32 dim=1000000 16.46 0.12 +13167%
SeLU / dim=128x16x1024 0.77 0.04 +1917%
SeLU / dim=64x128x1024 4.65 0.04 +11906%
Sigmoid / dim=128x16x1024 0.67 0.05 +1367%
Sigmoid / dim=64x128x1024 4.60 0.05 +9782%
Softmax / dim=64x1000000 axi=-1 37.95 1.16 +3157%
Softmax / dim=1000000x64 axi=-1 27.42 0.59 +4530%
Softmax / dim=64x16x32x1024 axi=-1 14.56 0.32 +4490%
Softmax / dim=128x16x32x1024 axi=-1 28.07 0.64 +4269%
Softmax / dim=1024x16x32x128 axi=-1 27.67 0.62 +4343%
Softmax / dim=1024x64x32x8 axi=-1 30.83 0.18 +17281%
Softplus / dim=128x16x1024 1.28 0.05 +2674%
Softplus / dim=64x128x1024 5.73 0.04 +14673%
Sort / dim=64x128x1024 axi=0 42.75 1.34 +3095%
Sort / dim=64x128x1024 axi=1 49.05 0.89 +5388%
Sort / dim=64x128x1024 axi=2 48.21 0.47 +10100%
Sum / dim=64x128x128x128 axi=0 15.71 0.62 +2435%
Sum / dim=64x128x128x128 axi=1 13.05 0.62 +1997%
Sum / dim=64x128x128x128 axi=2 10.13 0.63 +1515%
Sum / dim=64x128x128x128 axi=3 9.89 0.61 +1515%
SumAll / dim=64x128x128x128 9.43 0.61 +1453%
SumAll / dim=1000000 0.04 0.03 +30%
SumAll / dim=1000000x128 9.09 0.58 +1460%
SumAll / dim=128x1000000 9.22 0.59 +1450%