Detailed runtime benchmark of mlx operations, measured in milliseconds.

mlx_gpu
: mlx framework with gpu backend
mlx_cpu
: mlx framework with cpu backend
cpu
: torch framework with cpu backend
mps
: torch framework with mps (gpu) backend
mlx_gpu/mps speedup
: runtime speedup of mlx_gpu compared to mps
mlx_gpu/mlx_cpu speedup
: runtime speedup of mlx_gpu compared to mlx_cpu
cuda/cpu speedup
: runtime speedup of cuda compared to cpu
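
The speedup columns appear to follow the usual relative-runtime convention: the baseline runtime divided by the mlx_gpu runtime, minus one, shown as a signed percentage. This formula is inferred from the table values and is not stated in the source. For example, the first M1 Argmax row has mps = 3.93 ms and mlx_gpu = 2.72 ms, which gives about +44%. A minimal Python sketch of that assumed formula (the function names are hypothetical):

```python
def speedup_percent(target_ms: float, baseline_ms: float) -> float:
    """Relative speedup of `target` over `baseline`, in percent.

    Assumed convention (inferred from the tables, not confirmed by the source):
    positive means the target is faster than the baseline.
    """
    return (baseline_ms / target_ms - 1.0) * 100.0


def format_speedup(target_ms: float, baseline_ms: float) -> str:
    """Render the speedup as a signed, rounded percentage string."""
    return f"{speedup_percent(target_ms, baseline_ms):+.0f}%"


# M1, Argmax / dim=64x1024x128 axi=0: mlx_gpu = 2.72 ms, mps = 3.93 ms
print(format_speedup(2.72, 3.93))  # prints +44%

# M1, Conv1d / dim=100x256x3 dim=8x3x3: mlx_gpu = 1.13 ms, mps = 0.55 ms
print(format_speedup(1.13, 0.55))  # prints -51%
```

Because the tables show runtimes rounded to two decimals, a percentage recomputed from the displayed values can differ from the published one by a point or so.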
M1 (cores: 4E+4P+8GPU)
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 2.72 | 12.43 | 3.93 | 20.56 | +44% | +356% |
Argmax / dim=64x1024x128 axi=1 | 1.55 | 11.21 | 3.38 | 6.23 | +117% | +621% |
Argmax / dim=64x1024x128 axi=2 | 1.55 | 8.56 | 2.69 | 3.04 | +73% | +452% |
Argmax / dim=64x128x1024 axi=2 | 1.42 | 10.32 | 1.49 | 2.35 | +5% | +626% |
BCE / dim=1000000 dim=1000000 | 1.03 | 7.38 | 1.64 | 1.75 | +58% | +615% |
BCE / dim=100000x32 dim=100000x32 | 2.53 | 22.57 | 4.90 | 4.59 | +93% | +791% |
BCE / dim=100000x64x2 dim=100000x64x2 | 8.96 | 87.92 | 21.47 | 18.42 | +139% | +880% |
BCE / dim=128x100000 dim=128x100000 | 9.52 | 89.39 | 20.74 | 18.74 | +117% | +839% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 14.11 | 64.02 | 13.84 | 38.98 | -1% | +353% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 27.23 | 148.54 | 27.91 | 77.07 | +2% | +445% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 17.61 | 85.42 | 17.58 | 41.08 | 0% | +385% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 17.57 | 105.55 | 17.77 | 41.40 | +1% | +500% |
Conv1d / dim=100x256x3 dim=8x3x3 | 1.13 | 0.33 | 0.55 | 2.39 | -51% | -70% |
Conv1d / dim=100x256x256 dim=8x3x256 | 6.46 | 9.77 | 5.85 | 54.03 | -9% | +51% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 5.41 | 7.43 | 7.07 | 359.73 | +30% | +37% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 2.33 | 0.58 | 1.45 | 48.37 | -37% | -75% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 80.43 | 935.78 | 11.06 | 127.84 | -86% | +1063% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 29.16 | 425.12 | 8.57 | 32.76 | -70% | +1358% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 29.64 | 703.02 | 9.62 | 56.69 | -67% | +2271% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.57 | 10.70 | 1.11 | 1.35 | -56% | +316% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 8.82 | 108.78 | 4.94 | 9.04 | -44% | +1133% |
LeakyReLU / dim=128x16x1024 | 0.98 | 1.44 | 0.67 | 0.70 | -31% | +47% |
LeakyReLU / dim=64x128x1024 | 3.14 | 4.37 | 1.65 | 2.05 | -47% | +39% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 22.13 | 60.20 | 13.41 | 62.32 | -39% | +171% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 22.15 | 64.97 | 21.02 | 72.80 | -5% | +193% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 40.84 | 96.80 | 79.19 | 170.94 | +93% | +137% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 66.18 | 144.25 | 153.90 | 281.38 | +132% | +117% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.75 | 0.40 | 1.00 | 0.96 | +32% | -47% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.56 | 0.90 | 0.94 | 0.81 | +65% | +60% |
MatMul / dim=1000x64x256 dim=256x32 | 1.70 | 4.17 | 5.03 | 15.57 | +195% | +144% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 8.28 | 17.74 | 21.41 | 253.21 | +158% | +114% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 25.69 | 93.79 | 25.91 | 1506.15 | +0% | +265% |
MatMul / dim=64x1000000 dim=1000000x32 | 41.69 | 18.05 | 33.39 | 56.56 | -19% | -56% |
MatMul / dim=1000000x64 dim=64x1024 | 80.34 | 428.28 | 200.55 | 1194.50 | +149% | +433% |
PReLU / dim=128x16x1024 dim=1 | 1.52 | 2.48 | 0.65 | 0.56 | -57% | +63% |
PReLU / dim=64x128x1024 dim=1 | 5.48 | 6.59 | 1.65 | 2.07 | -69% | +20% |
ReLU / dim=128x16x1024 | 0.54 | 0.53 | 0.66 | 0.68 | +21% | -1% |
ReLU / dim=64x128x1024 | 1.43 | 1.27 | 1.60 | 2.01 | +12% | -11% |
SeLU / dim=128x16x1024 | 3.34 | 6.65 | 0.65 | 3.26 | -80% | +98% |
SeLU / dim=64x128x1024 | 12.27 | 22.81 | 1.63 | 12.17 | -86% | +85% |
Sigmoid / dim=128x16x1024 | 0.51 | 13.28 | 0.67 | 3.24 | +31% | +2511% |
Sigmoid / dim=64x128x1024 | 1.42 | 52.05 | 1.64 | 11.21 | +15% | +3565% |
Softmax / dim=64x1000000 axi=-1 | 13.33 | 36.49 | 17.54 | 55.15 | +31% | +173% |
Softmax / dim=1000000x64 axi=-1 | 8.85 | 34.18 | 21.06 | 59.54 | +138% | +286% |
Softmax / dim=64x16x32x1024 axi=-1 | 4.81 | 18.03 | 16.84 | 23.87 | +250% | +274% |
Softmax / dim=128x16x32x1024 axi=-1 | 9.31 | 36.18 | 31.50 | 54.39 | +238% | +288% |
Softmax / dim=1024x16x32x128 axi=-1 | 9.43 | 34.99 | 22.24 | 57.47 | +135% | +271% |
Softmax / dim=1024x64x32x8 axi=-1 | 15.19 | 86.01 | 6.42 | 29.72 | -57% | +466% |
Softplus / dim=128x16x1024 | 0.69 | 13.48 | 0.94 | 4.55 | +35% | +1860% |
Softplus / dim=64x128x1024 | 1.46 | 52.67 | 2.52 | 17.42 | +72% | +3503% |
Sort / dim=64x128x1024 axi=0 | 34.19 | 900.00 | 95.15 | 72.40 | +178% | +2532% |
Sort / dim=64x128x1024 axi=1 | 18.08 | 976.16 | 85.37 | 63.35 | +372% | +5298% |
Sort / dim=64x128x1024 axi=2 | 3.20 | 263.54 | 39.22 | 74.59 | +1123% | +8124% |
Sum / dim=64x128x128x128 axi=0 | 11.36 | 17.39 | 30.79 | 17.97 | +170% | +53% |
Sum / dim=64x128x128x128 axi=1 | 9.74 | 12.66 | 10.87 | 15.02 | +11% | +30% |
Sum / dim=64x128x128x128 axi=2 | 9.39 | 11.23 | 10.52 | 10.80 | +11% | +19% |
Sum / dim=64x128x128x128 axi=3 | 14.82 | 9.51 | 13.21 | 9.90 | -10% | -35% |
SumAll / dim=64x128x128x128 | 9.02 | 8.81 | 9.82 | 9.38 | +8% | -2% |
SumAll / dim=1000000 | 0.52 | 0.08 | 0.43 | 0.07 | -17% | -85% |
SumAll / dim=1000000x128 | 9.13 | 9.22 | 9.89 | 9.34 | +8% | +0% |
SumAll / dim=128x1000000 | 8.95 | 9.28 | 9.46 | 9.18 | +5% | +3% |
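
For readers who want to post-process these tables, each row can be split on the pipe character and converted to numbers. The sketch below is an illustration only, not part of the benchmark tooling: `split_cells`, `parse_row`, and `fastest_backend` are hypothetical helper names, and the header and row are copied from the M1 table above.

```python
import math

HEADER = "Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |"
ROW = "Argmax / dim=64x1024x128 axi=0 | 2.72 | 12.43 | 3.93 | 20.56 | +44% | +356% |"


def split_cells(line: str) -> list[str]:
    """Split one pipe-delimited table line into trimmed cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_row(header: str, row: str) -> dict:
    """Map column names to values; timings and percentages become floats."""
    names = split_cells(header)
    cells = split_cells(row)
    if len(names) != len(cells):
        raise ValueError(f"expected {len(names)} cells, got {len(cells)}")
    record = {names[0]: cells[0]}  # first column is the operation label
    for name, cell in zip(names[1:], cells[1:]):
        # "nan" and "nan%" (unsupported operations) parse to float("nan")
        record[name] = float(cell.rstrip("%"))
    return record


def fastest_backend(record: dict) -> str:
    """Return the timing column with the lowest runtime, ignoring nan."""
    timings = {
        name: value
        for name, value in record.items()
        if name != "Operation" and "speedup" not in name and not math.isnan(value)
    }
    return min(timings, key=timings.get)


record = parse_row(HEADER, ROW)
print(fastest_backend(record))  # prints mlx_gpu
```

The same helpers work for the wider tables that include an `mlx_gpu_compile` column, as long as the matching header line is passed in.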
M1 Pro (16GB) mlx 0.2.0
Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.76 | 1.76 | 11.21 | 1.90 | 23.25 | 0% | +7% | +536% |
Argmax / dim=64x1024x128 axi=1 | 1.76 | 1.75 | 11.07 | 2.47 | 5.79 | +0% | +40% | +530% |
Argmax / dim=64x1024x128 axi=2 | 1.79 | 1.74 | 11.06 | 1.37 | 2.54 | +2% | -23% | +517% |
Argmax / dim=64x128x1024 axi=2 | 1.76 | 1.74 | 11.06 | 0.87 | 2.22 | +0% | -50% | +528% |
BCE / dim=1000000 dim=1000000 | 0.68 | 0.39 | 6.81 | 0.69 | 1.53 | +75% | +1% | +897% |
BCE / dim=100000x32 dim=100000x32 | 1.17 | 0.74 | 22.68 | 0.99 | 3.88 | +58% | -15% | +1841% |
BCE / dim=100000x64x2 dim=100000x64x2 | 3.66 | 1.51 | 89.05 | 1.93 | 15.12 | +141% | -47% | +2335% |
BCE / dim=128x100000 dim=128x100000 | 3.66 | 1.50 | 89.50 | 1.86 | 15.38 | +143% | -49% | +2345% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 4.52 | 4.47 | 66.62 | 4.68 | 30.75 | +1% | +3% | +1374% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 8.76 | 8.86 | 150.48 | 8.90 | 51.52 | -1% | +1% | +1618% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 5.89 | 5.87 | 66.99 | 6.07 | 41.97 | +0% | +2% | +1036% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 5.87 | 5.87 | 87.85 | 6.09 | 41.80 | 0% | +3% | +1396% |
Conv1d / dim=100x256x3 dim=8x3x3 | 1.16 | 0.48 | 0.48 | 0.52 | 2.82 | +143% | -55% | -58% |
Conv1d / dim=100x256x256 dim=8x3x256 | 4.01 | 3.09 | 9.22 | 2.51 | 82.08 | +29% | -37% | +130% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 3.01 | 2.50 | 4.98 | 2.55 | 520.37 | +20% | -15% | +65% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 1.73 | 0.73 | 0.76 | 0.83 | 66.68 | +137% | -51% | -55% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 38.01 | 37.62 | 953.57 | 6.67 | 155.57 | +1% | -82% | +2408% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 13.92 | 13.66 | 438.53 | 3.38 | 28.86 | +1% | -75% | +3050% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 15.30 | 14.26 | 737.17 | 2.86 | 39.96 | +7% | -81% | +4719% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.29 | 1.10 | 11.48 | 0.74 | 1.66 | +108% | -67% | +401% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.33 | 4.31 | 112.67 | 2.06 | 9.32 | +23% | -61% | +2012% |
Gather / dim=64x256 dim=10 | 0.28 | 0.27 | 0.03 | 0.32 | 0.01 | +3% | +14% | -89% |
Gather / dim=64x256 dim=1000 | 0.39 | 0.39 | 0.06 | 0.46 | 0.15 | +0% | +19% | -84% |
Gather / dim=64x256 dim=1000000 | 22.90 | 22.90 | 31.88 | 102.21 | 52.84 | +0% | +346% | +39% |
Gather / dim=1024x32 dim=10 | 0.28 | 0.27 | 0.03 | 0.32 | 0.01 | +3% | +16% | -89% |
Gather / dim=1024x32 dim=1000 | 0.29 | 0.29 | 0.04 | 0.36 | 0.09 | +0% | +22% | -87% |
Gather / dim=1024x32 dim=1000000 | 3.60 | 3.15 | 7.91 | 13.14 | 7.50 | +14% | +265% | +119% |
LeakyReLU / dim=128x16x1024 | 0.50 | 0.34 | 1.95 | 0.50 | 0.89 | +48% | +0% | +287% |
LeakyReLU / dim=64x128x1024 | 1.26 | 0.66 | 7.37 | 0.88 | 1.61 | +90% | -30% | +487% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 7.68 | 7.37 | 25.04 | 7.35 | 49.39 | +4% | -4% | +226% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 8.65 | 8.51 | 29.16 | 12.55 | 63.51 | +1% | +45% | +237% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 17.59 | 17.50 | 47.12 | 47.71 | 153.45 | +0% | +171% | +167% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 29.37 | 29.38 | 74.01 | 93.98 | 278.74 | 0% | +220% | +152% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.87 | 0.76 | 0.50 | 0.80 | 4.98 | +13% | -8% | -42% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.34 | 0.33 | 1.10 | 0.52 | 0.81 | +3% | +51% | +220% |
MatMul / dim=1000x64x256 dim=256x32 | 1.02 | 1.00 | 2.36 | 2.41 | 21.91 | +1% | +136% | +131% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 3.18 | 3.18 | 19.13 | 13.42 | 1530.48 | +0% | +321% | +500% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 9.35 | 9.40 | 51.38 | 9.31 | 825.97 | 0% | 0% | +449% |
MatMul / dim=64x1000000 dim=1000000x32 | 15.17 | 15.04 | 12.41 | 19.47 | 548.59 | +0% | +28% | -18% |
MatMul / dim=1000000x64 dim=64x1024 | 34.83 | 34.86 | 156.96 | 101.81 | 1390.57 | 0% | +192% | +350% |
PReLU / dim=128x16x1024 dim=1 | 0.75 | 0.35 | 1.37 | 0.55 | 0.91 | +117% | -27% | +81% |
PReLU / dim=64x128x1024 dim=1 | 2.11 | 0.67 | 4.81 | 0.86 | 1.61 | +216% | -59% | +128% |
ReLU / dim=128x16x1024 | 0.38 | 0.40 | 0.59 | 0.52 | 0.85 | -4% | +37% | +55% |
ReLU / dim=64x128x1024 | 0.69 | 0.66 | 1.54 | 0.87 | 1.78 | +3% | +26% | +123% |
Scatter / dim=64x16 dim=10 | 0.29 | 0.28 | 0.03 | 0.30 | 0.01 | +3% | +3% | -90% |
Scatter / dim=64x16 dim=1000 | 0.34 | 0.33 | 0.11 | 0.33 | 0.09 | +2% | -1% | -68% |
Scatter / dim=64x16 dim=1000000 | 15.66 | 15.64 | 65.73 | 6.72 | 4.37 | +0% | -57% | +319% |
Scatter / dim=1024x32 dim=10 | 0.32 | 0.30 | 0.03 | 0.37 | 0.01 | +6% | +16% | -89% |
Scatter / dim=1024x32 dim=1000 | 0.39 | 0.37 | 0.19 | 0.38 | 0.09 | +7% | -3% | -52% |
Scatter / dim=1024x32 dim=1000000 | 30.93 | 30.87 | 123.75 | 13.22 | 6.70 | +0% | -57% | +300% |
ScatterSum / dim=64x16 dim=10 | 0.09 | 0.08 | 0.03 | nan | 0.01 | +15% | nan% | -69% |
ScatterSum / dim=64x16 dim=1000 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +23% | nan% | -70% |
ScatterSum / dim=64x16 dim=1000000 | 0.08 | 0.07 | 0.03 | nan | 1.95 | +26% | nan% | -68% |
ScatterSum / dim=1024x32 dim=10 | 0.08 | 0.06 | 0.02 | nan | 0.01 | +29% | nan% | -69% |
ScatterSum / dim=1024x32 dim=1000 | 0.09 | 0.07 | 0.03 | nan | 0.02 | +27% | nan% | -71% |
ScatterSum / dim=1024x32 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 7.06 | +19% | nan% | -68% |
ScatterMax / dim=64x16 dim=10 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +32% | nan% | -68% |
ScatterMax / dim=64x16 dim=1000 | 0.09 | 0.07 | 0.02 | nan | 0.01 | +21% | nan% | -71% |
ScatterMax / dim=64x16 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 1.79 | +23% | nan% | -69% |
ScatterMax / dim=1024x32 dim=10 | 0.09 | 0.07 | 0.03 | nan | 0.01 | +16% | nan% | -69% |
ScatterMax / dim=1024x32 dim=1000 | 0.08 | 0.07 | 0.03 | nan | 0.02 | +24% | nan% | -65% |
ScatterMax / dim=1024x32 dim=1000000 | 0.09 | 0.07 | 0.03 | nan | 7.21 | +27% | nan% | -71% |
SeLU / dim=128x16x1024 | 1.15 | 0.34 | 3.04 | 0.51 | 2.95 | +237% | -56% | +163% |
SeLU / dim=64x128x1024 | 4.36 | 0.67 | 12.73 | 0.89 | 10.44 | +553% | -79% | +192% |
Sigmoid / dim=128x16x1024 | 0.44 | 0.39 | 13.69 | 0.59 | 2.84 | +12% | +33% | +3021% |
Sigmoid / dim=64x128x1024 | 0.68 | 0.66 | 54.30 | 0.97 | 9.60 | +1% | +43% | +7929% |
Softmax / dim=64x1000000 axi=-1 | 11.70 | 8.83 | 58.18 | 6.22 | 40.86 | +32% | -46% | +397% |
Softmax / dim=1000000x64 axi=-1 | 11.62 | 8.85 | 59.00 | 8.91 | 42.06 | +31% | -23% | +407% |
Softmax / dim=64x16x32x1024 axi=-1 | 6.26 | 4.84 | 32.68 | 4.43 | 19.57 | +29% | -29% | +421% |
Softmax / dim=128x16x32x1024 axi=-1 | 12.22 | 9.26 | 61.93 | 8.05 | 39.81 | +31% | -34% | +406% |
Softmax / dim=1024x16x32x128 axi=-1 | 12.22 | 9.25 | 63.13 | 9.48 | 42.39 | +32% | -22% | +416% |
Softmax / dim=1024x64x32x8 axi=-1 | 3.40 | 2.64 | 15.58 | 3.10 | 23.94 | +28% | -8% | +358% |
Softplus / dim=128x16x1024 | 0.50 | 0.33 | 13.88 | 0.70 | 3.94 | +52% | +41% | +2683% |
Softplus / dim=64x128x1024 | 0.82 | 0.65 | 54.54 | 1.22 | 14.63 | +25% | +48% | +6547% |
Sort / dim=64x128x1024 axi=0 | 1.84 | 1.81 | 270.84 | 50.79 | 65.70 | +1% | +2664% | +14638% |
Sort / dim=64x128x1024 axi=1 | 1.76 | 1.75 | 271.31 | 44.70 | 55.19 | +0% | +2437% | +15305% |
Sort / dim=64x128x1024 axi=2 | 1.76 | 1.75 | 271.42 | 17.12 | 69.15 | +0% | +874% | +15350% |
Sum / dim=64x128x128x128 axi=0 | 3.49 | 3.48 | 11.27 | 12.43 | 14.32 | +0% | +256% | +222% |
Sum / dim=64x128x128x128 axi=1 | 3.48 | 3.45 | 10.46 | 3.53 | 13.72 | +0% | +1% | +200% |
Sum / dim=64x128x128x128 axi=2 | 3.50 | 3.47 | 11.52 | 3.44 | 8.44 | +0% | -1% | +229% |
Sum / dim=64x128x128x128 axi=3 | 3.47 | 3.47 | 10.42 | 6.07 | 5.91 | +0% | +75% | +200% |
SumAll / dim=64x128x128x128 | 3.50 | 3.48 | 10.84 | 3.62 | 5.15 | +0% | +3% | +209% |
SumAll / dim=1000000 | 0.38 | 0.34 | 0.09 | 0.53 | 0.11 | +12% | +38% | -75% |
SumAll / dim=1000000x128 | 3.34 | 3.35 | 10.20 | 3.52 | 4.88 | 0% | +5% | +205% |
SumAll / dim=128x1000000 | 3.41 | 3.31 | 10.01 | 3.34 | 4.83 | +2% | -1% | +193% |
M1 Max (64GB) mlx 0.2.0
Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 2.08 | 1.67 | 11.14 | 4.58 | 25.92 | +25% | +119% | +434% |
Argmax / dim=64x1024x128 axi=1 | 2.20 | 1.72 | 10.85 | 1.48 | 6.47 | +27% | -32% | +393% |
Argmax / dim=64x1024x128 axi=2 | 2.19 | 1.68 | 10.62 | 1.01 | 2.35 | +30% | -53% | +385% |
Argmax / dim=64x128x1024 axi=2 | 2.11 | 1.69 | 10.58 | 0.63 | 1.92 | +25% | -70% | +401% |
BCE / dim=1000000 dim=1000000 | 0.52 | 0.35 | 6.62 | 0.48 | 1.36 | +50% | -8% | +1172% |
BCE / dim=100000x32 dim=100000x32 | 0.71 | 0.45 | 21.63 | 0.69 | 3.47 | +56% | -3% | +2953% |
BCE / dim=100000x64x2 dim=100000x64x2 | 1.98 | 0.91 | 86.42 | 1.47 | 13.99 | +118% | -25% | +4266% |
BCE / dim=128x100000 dim=128x100000 | 1.98 | 0.91 | 86.40 | 1.39 | 13.53 | +117% | -29% | +4268% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 2.34 | 2.33 | 78.58 | 2.41 | 18.34 | +0% | +2% | +3260% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 4.43 | 4.41 | 146.43 | 4.52 | 42.51 | +0% | +2% | +3206% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 3.02 | 3.02 | 62.03 | 3.07 | 19.04 | +0% | +1% | +1952% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 3.03 | 3.02 | 82.37 | 3.08 | 19.29 | +0% | +1% | +2622% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.39 | 0.39 | 0.36 | 0.40 | 2.57 | +0% | +0% | -7% |
Conv1d / dim=100x256x256 dim=8x3x256 | 4.05 | 1.63 | 8.22 | 1.80 | 69.41 | +149% | -55% | +102% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 2.10 | 1.38 | 4.20 | 1.87 | 497.10 | +52% | -10% | +100% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 2.27 | 0.53 | 0.59 | 0.64 | 59.95 | +326% | -71% | -74% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 18.90 | 18.76 | 1034.14 | 3.49 | 131.73 | +0% | -81% | +5370% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 9.18 | 6.97 | 419.33 | 2.23 | 18.17 | +31% | -75% | +4467% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 9.23 | 7.42 | 697.01 | 2.21 | 18.67 | +24% | -76% | +7449% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.98 | 0.78 | 10.66 | 0.53 | 1.45 | +25% | -46% | +988% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 2.60 | 2.25 | 116.24 | 1.46 | 7.77 | +15% | -43% | +4363% |
Gather / dim=64x256 dim=10 | 0.23 | 0.23 | 0.01 | 0.23 | 0.01 | 0% | 0% | -94% |
Gather / dim=64x256 dim=1000 | 0.33 | 0.32 | 0.03 | 0.33 | 0.12 | +4% | +0% | -89% |
Gather / dim=64x256 dim=1000000 | 11.57 | 11.54 | 28.36 | 50.98 | 46.58 | +0% | +340% | +145% |
Gather / dim=1024x32 dim=10 | 0.25 | 0.23 | 0.01 | 0.22 | 0.00 | +7% | -8% | -94% |
Gather / dim=1024x32 dim=1000 | 0.26 | 0.25 | 0.02 | 0.25 | 0.09 | +3% | -4% | -92% |
Gather / dim=1024x32 dim=1000000 | 2.42 | 1.66 | 7.20 | 6.66 | 6.73 | +45% | +175% | +197% |
LeakyReLU / dim=128x16x1024 | 0.40 | 0.30 | 1.79 | 0.37 | 0.68 | +31% | -8% | +347% |
LeakyReLU / dim=64x128x1024 | 0.67 | 0.37 | 7.02 | 0.54 | 0.59 | +83% | -19% | +941% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 4.35 | 3.74 | 19.60 | 3.40 | 31.02 | +16% | -21% | +350% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 4.71 | 4.34 | 27.89 | 5.88 | 37.61 | +8% | +24% | +492% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 8.88 | 8.86 | 49.44 | 24.42 | 53.09 | +0% | +174% | +456% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 14.83 | 14.81 | 64.98 | 47.81 | 75.07 | +0% | +222% | +338% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.85 | 0.71 | 0.37 | 0.67 | 0.44 | +19% | -21% | -56% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.31 | 0.26 | 0.84 | 0.40 | 0.87 | +17% | +29% | +173% |
MatMul / dim=1000x64x256 dim=256x32 | 0.62 | 0.60 | 1.71 | 2.02 | 2.31 | +3% | +225% | +176% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 1.71 | 1.70 | 17.97 | 7.38 | 20.63 | +0% | +331% | +949% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 4.84 | 4.83 | 64.09 | 4.96 | 122.87 | +0% | +2% | +1224% |
MatMul / dim=64x1000000 dim=1000000x32 | 2.87 | 2.88 | 10.51 | 11.88 | 10.47 | 0% | +313% | +265% |
MatMul / dim=1000000x64 dim=64x1024 | 17.58 | 17.56 | 187.87 | 40.33 | 372.75 | +0% | +129% | +968% |
PReLU / dim=128x16x1024 dim=1 | 0.57 | 0.33 | 1.09 | 0.36 | 0.55 | +73% | -36% | +93% |
PReLU / dim=64x128x1024 dim=1 | 1.08 | 0.39 | 4.19 | 0.52 | 0.59 | +172% | -51% | +289% |
ReLU / dim=128x16x1024 | 0.32 | 0.29 | 0.31 | 0.37 | 0.63 | +10% | +15% | -3% |
ReLU / dim=64x128x1024 | 0.41 | 0.38 | 1.33 | 0.52 | 0.58 | +8% | +26% | +224% |
Scatter / dim=64x16 dim=10 | 0.23 | 0.24 | 0.01 | 0.22 | 0.00 | -5% | -5% | -93% |
Scatter / dim=64x16 dim=1000 | 0.31 | 0.29 | 0.07 | 0.25 | 0.05 | +4% | -18% | -75% |
Scatter / dim=64x16 dim=1000000 | 7.99 | 7.94 | 62.95 | 3.41 | 4.35 | +0% | -57% | +687% |
Scatter / dim=1024x32 dim=10 | 0.28 | 0.26 | 0.02 | 0.22 | 0.00 | +6% | -19% | -93% |
Scatter / dim=1024x32 dim=1000 | 0.31 | 0.29 | 0.13 | 0.26 | 0.08 | +6% | -16% | -57% |
Scatter / dim=1024x32 dim=1000000 | 15.54 | 15.51 | 118.69 | 6.72 | 6.19 | +0% | -56% | +663% |
ScatterSum / dim=64x16 dim=10 | 0.05 | 0.03 | 0.01 | nan | 0.00 | +46% | nan% | -82% |
ScatterSum / dim=64x16 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +41% | nan% | -80% |
ScatterSum / dim=64x16 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 1.60 | +54% | nan% | -83% |
ScatterSum / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +35% | nan% | -78% |
ScatterSum / dim=1024x32 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +41% | nan% | -81% |
ScatterSum / dim=1024x32 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 6.49 | +38% | nan% | -81% |
ScatterMax / dim=64x16 dim=10 | 0.05 | 0.03 | 0.01 | nan | 0.00 | +41% | nan% | -82% |
ScatterMax / dim=64x16 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +37% | nan% | -82% |
ScatterMax / dim=64x16 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 1.54 | +37% | nan% | -80% |
ScatterMax / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +26% | nan% | -79% |
ScatterMax / dim=1024x32 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +36% | nan% | -81% |
ScatterMax / dim=1024x32 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 6.55 | +29% | nan% | -79% |
SeLU / dim=128x16x1024 | 0.98 | 0.33 | 2.75 | 0.36 | 2.62 | +202% | -63% | +178% |
SeLU / dim=64x128x1024 | 2.07 | 0.40 | 11.36 | 0.57 | 9.32 | +423% | -72% | +449% |
Sigmoid / dim=128x16x1024 | 0.34 | 0.32 | 13.10 | 0.40 | 2.43 | +4% | +18% | +3784% |
Sigmoid / dim=64x128x1024 | 0.43 | 0.39 | 52.05 | 0.60 | 8.43 | +9% | +39% | +12049% |
Softmax / dim=64x1000000 axi=-1 | 5.86 | 4.49 | 53.32 | 3.30 | 33.16 | +30% | -43% | +809% |
Softmax / dim=1000000x64 axi=-1 | 5.87 | 4.48 | 53.33 | 4.89 | 34.35 | +31% | -16% | +807% |
Softmax / dim=64x16x32x1024 axi=-1 | 3.25 | 2.50 | 28.15 | 3.18 | 19.36 | +29% | -2% | +766% |
Softmax / dim=128x16x32x1024 axi=-1 | 6.13 | 4.68 | 55.99 | 5.01 | 31.86 | +30% | -18% | +813% |
Softmax / dim=1024x16x32x128 axi=-1 | 6.13 | 4.68 | 55.89 | 5.14 | 33.61 | +31% | -16% | +811% |
Softmax / dim=1024x64x32x8 axi=-1 | 1.79 | 1.44 | 14.22 | 1.79 | 21.23 | +24% | 0% | +692% |
Softplus / dim=128x16x1024 | 0.52 | 0.32 | 13.09 | 0.47 | 3.57 | +62% | -9% | +2405% |
Softplus / dim=64x128x1024 | 0.62 | 0.37 | 52.49 | 0.87 | 12.93 | +68% | +40% | +8373% |
Sort / dim=64x128x1024 axi=0 | 1.08 | 0.97 | 257.13 | 29.79 | 52.22 | +11% | +2661% | +23735% |
Sort / dim=64x128x1024 axi=1 | 1.09 | 0.98 | 257.19 | 21.35 | 43.94 | +11% | +1862% | +23537% |
Sort / dim=64x128x1024 axi=2 | 1.06 | 0.97 | 257.20 | 9.39 | 51.75 | +9% | +782% | +24073% |
Sum / dim=64x128x128x128 axi=0 | 1.75 | 1.74 | 8.96 | 5.57 | 15.39 | +0% | +218% | +411% |
Sum / dim=64x128x128x128 axi=1 | 1.76 | 1.74 | 8.84 | 1.80 | 13.53 | +0% | +2% | +403% |
Sum / dim=64x128x128x128 axi=2 | 1.74 | 1.74 | 8.84 | 1.79 | 7.31 | +0% | +2% | +406% |
Sum / dim=64x128x128x128 axi=3 | 1.74 | 1.74 | 8.83 | 3.25 | 5.18 | +0% | +86% | +406% |
SumAll / dim=64x128x128x128 | 1.74 | 1.73 | 8.82 | 1.84 | 4.50 | +0% | +5% | +405% |
SumAll / dim=1000000 | 0.32 | 0.30 | 0.06 | 0.36 | 0.10 | +7% | +15% | -81% |
SumAll / dim=1000000x128 | 1.68 | 1.67 | 8.68 | 1.93 | 4.47 | +1% | +14% | +415% |
SumAll / dim=128x1000000 | 1.68 | 1.67 | 8.95 | 1.87 | 4.35 | +0% | +11% | +434% |
M2
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 2.57 | 16.16 | 3.93 | 22.12 | +52% | +528% |
Argmax / dim=64x1024x128 axi=1 | 1.47 | 18.03 | 3.35 | 6.93 | +127% | +1126% |
Argmax / dim=64x1024x128 axi=2 | 1.44 | 13.91 | 2.39 | 2.67 | +65% | +866% |
Argmax / dim=64x128x1024 axi=2 | 1.35 | 16.74 | 1.18 | 2.32 | -12% | +1140% |
BCE / dim=1000000 dim=1000000 | 0.83 | 11.36 | 1.93 | 1.96 | +132% | +1267% |
BCE / dim=100000x32 dim=100000x32 | 1.84 | 35.85 | 6.22 | 5.58 | +238% | +1849% |
BCE / dim=100000x64x2 dim=100000x64x2 | 6.08 | 140.95 | 21.14 | 21.82 | +247% | +2219% |
BCE / dim=128x100000 dim=128x100000 | 6.10 | 141.21 | 22.96 | 23.99 | +276% | +2215% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 8.65 | 99.86 | 8.95 | 34.82 | +3% | +1054% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 17.19 | 221.80 | 17.74 | 62.82 | +3% | +1190% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 11.35 | 142.18 | 11.57 | 44.15 | +1% | +1152% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 11.37 | 180.44 | 11.78 | 43.52 | +3% | +1487% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.56 | 0.56 | 0.59 | 3.62 | +5% | 0% |
Conv1d / dim=100x256x256 dim=8x3x256 | 5.08 | 13.73 | 5.53 | 56.72 | +8% | +170% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 6.01 | 8.79 | 5.61 | 392.02 | -6% | +46% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 3.00 | 0.95 | 1.44 | 78.40 | -51% | -68% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 77.90 | 1491.19 | 10.80 | 173.72 | -86% | +1814% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 25.70 | 690.25 | 7.00 | 42.99 | -72% | +2585% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 24.70 | 1154.58 | 6.52 | 55.78 | -73% | +4574% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 2.00 | 17.35 | 1.17 | 1.87 | -41% | +769% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 8.41 | 175.54 | 4.20 | 10.03 | -50% | +1987% |
LeakyReLU / dim=128x16x1024 | 0.80 | 1.49 | 0.70 | 0.92 | -12% | +84% |
LeakyReLU / dim=64x128x1024 | 2.20 | 3.39 | 1.50 | 1.75 | -31% | +54% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 16.90 | 74.45 | 14.00 | 83.08 | -17% | +340% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 16.82 | 90.75 | 23.52 | 104.08 | +39% | +439% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 33.88 | 131.98 | 83.12 | 255.48 | +145% | +289% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 56.74 | 201.06 | 164.84 | 471.68 | +190% | +254% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.72 | 0.30 | 0.99 | 0.78 | +37% | -57% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.46 | 0.26 | 0.85 | 1.35 | +85% | -42% |
MatMul / dim=1000x64x256 dim=256x32 | 1.14 | 4.86 | 5.17 | 13.63 | +351% | +324% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 4.84 | 24.73 | 19.62 | 188.50 | +305% | +410% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 17.95 | 110.02 | 20.34 | 1452.25 | +13% | +512% |
MatMul / dim=64x1000000 dim=1000000x32 | 39.95 | 21.26 | 33.21 | 49.77 | -16% | -46% |
MatMul / dim=1000000x64 dim=64x1024 | 67.87 | 562.54 | 389.42 | 2072.27 | +473% | +728% |
PReLU / dim=128x16x1024 dim=1 | 1.18 | 2.88 | 0.71 | 0.95 | -39% | +145% |
PReLU / dim=64x128x1024 dim=1 | 3.69 | 6.27 | 1.37 | 1.74 | -62% | +69% |
ReLU / dim=128x16x1024 | 0.46 | 0.66 | 0.67 | 0.93 | +44% | +42% |
ReLU / dim=64x128x1024 | 1.08 | 1.33 | 1.34 | 1.76 | +23% | +23% |
SeLU / dim=128x16x1024 | 2.39 | 7.83 | 0.71 | 3.37 | -70% | +227% |
SeLU / dim=64x128x1024 | 8.22 | 26.67 | 1.51 | 13.12 | -81% | +224% |
Sigmoid / dim=128x16x1024 | 0.46 | 21.21 | 0.76 | 3.12 | +66% | +4535% |
Sigmoid / dim=64x128x1024 | 1.07 | 84.49 | 1.50 | 11.83 | +39% | +7760% |
Softmax / dim=64x1000000 axi=-1 | 8.80 | 55.52 | 11.74 | 71.44 | +33% | +530% |
Softmax / dim=1000000x64 axi=-1 | 6.00 | 55.61 | 16.18 | 79.75 | +169% | +827% |
Softmax / dim=64x16x32x1024 axi=-1 | 3.23 | 28.86 | 12.77 | 28.85 | +294% | +792% |
Softmax / dim=128x16x32x1024 axi=-1 | 6.16 | 57.68 | 23.52 | 75.04 | +282% | +836% |
Softmax / dim=1024x16x32x128 axi=-1 | 6.13 | 55.61 | 17.27 | 75.92 | +181% | +807% |
Softmax / dim=1024x64x32x8 axi=-1 | 12.10 | 140.43 | 5.74 | 34.53 | -52% | +1060% |
Softplus / dim=128x16x1024 | 0.61 | 21.86 | 1.01 | 5.15 | +64% | +3457% |
Softplus / dim=64x128x1024 | 1.20 | 86.03 | 2.45 | 19.10 | +104% | +7069% |
Sort / dim=64x128x1024 axi=0 | 31.34 | 1619.90 | 56.94 | 78.79 | +81% | +5068% |
Sort / dim=64x128x1024 axi=1 | 16.30 | 1686.63 | 52.80 | 73.93 | +224% | +10249% |
Sort / dim=64x128x1024 axi=2 | 2.97 | 423.22 | 30.32 | 85.22 | +920% | +14145% |
Sum / dim=64x128x128x128 axi=0 | 10.23 | 25.07 | 18.06 | 18.94 | +76% | +144% |
Sum / dim=64x128x128x128 axi=1 | 6.19 | 19.27 | 6.41 | 17.45 | +3% | +211% |
Sum / dim=64x128x128x128 axi=2 | 6.17 | 17.14 | 6.26 | 11.48 | +1% | +177% |
Sum / dim=64x128x128x128 axi=3 | 14.01 | 12.03 | 11.15 | 8.90 | -20% | -14% |
SumAll / dim=64x128x128x128 | 5.75 | 10.66 | 6.72 | 8.27 | +17% | +85% |
SumAll / dim=1000000 | 0.38 | 0.10 | 0.48 | 0.10 | +27% | -73% |
SumAll / dim=1000000x128 | 5.53 | 10.20 | 6.37 | 8.07 | +15% | +84% |
SumAll / dim=128x1000000 | 5.61 | 10.20 | 6.26 | 7.99 | +11% | +81% |
M2 Pro (cores: 4E+6P+16GPU)
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.55 | 19.78 | 1.88 | 22.70 | +20% | +1173% |
Argmax / dim=64x1024x128 axi=1 | 0.83 | 10.20 | 1.53 | 4.78 | +83% | +1124% |
Argmax / dim=64x1024x128 axi=2 | 0.94 | 7.87 | 1.38 | 1.89 | +47% | +741% |
Argmax / dim=64x128x1024 axi=2 | 1.07 | 9.51 | 1.11 | 1.75 | +4% | +792% |
BCE / dim=1000000 dim=1000000 | 0.92 | 7.14 | 1.22 | 1.22 | +32% | +674% |
BCE / dim=100000x32 dim=100000x32 | 1.25 | 21.06 | 3.54 | 3.19 | +182% | +1580% |
BCE / dim=100000x64x2 dim=100000x64x2 | 3.09 | 81.30 | 13.98 | 13.46 | +352% | +2531% |
BCE / dim=128x100000 dim=128x100000 | 3.05 | 80.69 | 13.23 | 13.29 | +334% | +2548% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 4.37 | 58.87 | 4.71 | 34.48 | +7% | +1247% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 8.46 | 134.11 | 8.83 | 44.21 | +4% | +1486% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 5.72 | 78.40 | 6.11 | 38.42 | +6% | +1270% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 5.96 | 99.03 | 6.10 | 38.99 | +2% | +1562% |
Conv1d / dim=100x256x3 dim=8x3x3 | 1.15 | 0.37 | 0.62 | 2.42 | -45% | -67% |
Conv1d / dim=100x256x256 dim=8x3x256 | 2.68 | 8.07 | 3.15 | 61.01 | +17% | +201% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 3.88 | 4.25 | 3.17 | 458.67 | -18% | +9% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 3.77 | 0.53 | 1.00 | 58.17 | -73% | -85% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 34.84 | 854.67 | 7.60 | 140.59 | -78% | +2352% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 17.97 | 388.74 | 6.66 | 28.82 | -62% | +2063% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 12.00 | 654.79 | 5.26 | 39.17 | -56% | +5358% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.47 | 9.80 | 0.89 | 1.42 | -39% | +567% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.29 | 99.13 | 2.32 | 7.86 | -56% | +1775% |
LeakyReLU / dim=128x16x1024 | 0.76 | 1.20 | 0.80 | 1.10 | +6% | +58% |
LeakyReLU / dim=64x128x1024 | 1.26 | 2.70 | 1.21 | 1.34 | -4% | +113% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 8.95 | 42.74 | 7.19 | 51.32 | -19% | +377% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 8.68 | 45.12 | 12.07 | 64.84 | +39% | +419% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 16.92 | 59.49 | 46.26 | 144.12 | +173% | +251% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 28.51 | 80.04 | 92.27 | 253.60 | +223% | +180% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.57 | 0.36 | 1.28 | 1.20 | +123% | -36% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.35 | 0.29 | 0.94 | 0.75 | +169% | -17% |
MatMul / dim=1000x64x256 dim=256x32 | 0.59 | 1.93 | 3.62 | 23.10 | +517% | +228% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 2.67 | 14.83 | 12.84 | 1009.65 | +381% | +456% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 9.25 | 77.07 | 11.62 | 2264.38 | +25% | +732% |
MatMul / dim=64x1000000 dim=1000000x32 | 41.00 | 9.28 | 19.02 | 146.63 | -53% | -77% |
MatMul / dim=1000000x64 dim=64x1024 | 34.28 | 156.61 | 88.69 | 1265.80 | +158% | +356% |
PReLU / dim=128x16x1024 dim=1 | 0.90 | 2.46 | 0.70 | 1.07 | -21% | +173% |
PReLU / dim=64x128x1024 dim=1 | 1.83 | 4.38 | 1.05 | 1.38 | -42% | +138% |
ReLU / dim=128x16x1024 | 0.30 | 0.53 | 0.58 | 0.95 | +91% | +75% |
ReLU / dim=64x128x1024 | 0.66 | 0.88 | 1.25 | 1.36 | +88% | +32% |
SeLU / dim=128x16x1024 | 1.51 | 6.16 | 0.77 | 2.69 | -49% | +307% |
SeLU / dim=64x128x1024 | 4.05 | 18.29 | 1.24 | 8.63 | -69% | +351% |
Sigmoid / dim=128x16x1024 | 0.68 | 12.36 | 0.85 | 2.34 | +25% | +1725% |
Sigmoid / dim=64x128x1024 | 0.90 | 48.84 | 1.33 | 8.12 | +48% | +5322% |
Softmax / dim=64x1000000 axi=-1 | 4.42 | 33.53 | 6.43 | 42.36 | +45% | +658% |
Softmax / dim=1000000x64 axi=-1 | 3.51 | 32.75 | 10.44 | 45.64 | +197% | +833% |
Softmax / dim=64x16x32x1024 axi=-1 | 1.70 | 17.05 | 9.39 | 17.86 | +452% | +903% |
Softmax / dim=128x16x32x1024 axi=-1 | 3.13 | 35.75 | 15.36 | 43.68 | +390% | +1042% |
Softmax / dim=1024x16x32x128 axi=-1 | 3.14 | 32.62 | 10.44 | 45.19 | +232% | +938% |
Softmax / dim=1024x64x32x8 axi=-1 | 7.56 | 81.85 | 3.00 | 21.47 | -60% | +983% |
Softplus / dim=128x16x1024 | 0.79 | 12.80 | 0.86 | 3.18 | +8% | +1518% |
Softplus / dim=64x128x1024 | 0.92 | 50.19 | 1.40 | 12.44 | +52% | +5371% |
Sort / dim=64x128x1024 axi=0 | 15.49 | 924.82 | 25.23 | 49.20 | +62% | +5871% |
Sort / dim=64x128x1024 axi=1 | 8.37 | 955.98 | 27.60 | 44.17 | +229% | +11325% |
Sort / dim=64x128x1024 axi=2 | 1.63 | 246.31 | 16.75 | 53.48 | +926% | +14986% |
Sum / dim=64x128x128x128 axi=0 | 3.22 | 15.14 | 3.71 | 14.62 | +15% | +370% |
Sum / dim=64x128x128x128 axi=1 | 3.24 | 11.50 | 3.65 | 12.97 | +12% | +254% |
Sum / dim=64x128x128x128 axi=2 | 3.03 | 10.27 | 3.53 | 8.73 | +16% | +239% |
Sum / dim=64x128x128x128 axi=3 | 6.73 | 9.31 | 6.20 | 5.12 | -7% | +38% |
SumAll / dim=64x128x128x128 | 3.26 | 9.09 | 3.64 | 4.66 | +11% | +178% |
SumAll / dim=1000000 | 0.70 | 0.06 | 0.70 | 0.14 | 0% | -90% |
SumAll / dim=1000000x128 | 3.10 | 8.59 | 3.54 | 4.60 | +14% | +177% |
SumAll / dim=128x1000000 | 3.16 | 8.64 | 3.53 | 4.49 | +11% | +173% |
M2 Max (cores: 4E+8P+38GPU)
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 0.96 | 19.74 | 1.59 | 21.86 | +64% | +1951% |
Argmax / dim=64x1024x128 axi=1 | 0.95 | 9.53 | 1.24 | 4.13 | +30% | +902% |
Argmax / dim=64x1024x128 axi=2 | 0.79 | 7.47 | 1.24 | 1.70 | +56% | +845% |
Argmax / dim=64x128x1024 axi=2 | 0.79 | 9.16 | 1.10 | 1.39 | +39% | +1055% |
BCE / dim=1000000 dim=1000000 | 0.84 | 6.87 | 1.07 | 1.09 | +28% | +719% |
BCE / dim=100000x32 dim=100000x32 | 1.06 | 19.99 | 2.84 | 2.73 | +168% | +1785% |
BCE / dim=100000x64x2 dim=100000x64x2 | 1.63 | 78.65 | 11.70 | 10.60 | +618% | +4733% |
BCE / dim=128x100000 dim=128x100000 | 1.73 | 79.53 | 11.62 | 10.64 | +570% | +4491% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 2.34 | 56.87 | 2.95 | 17.39 | +26% | +2335% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 4.40 | 131.50 | 5.15 | 40.52 | +17% | +2887% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 3.00 | 76.65 | 3.78 | 18.15 | +25% | +2452% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 3.03 | 96.23 | 3.58 | 18.10 | +18% | +3073% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.71 | 0.35 | 0.81 | 2.31 | +14% | -50% |
Conv1d / dim=100x256x256 dim=8x3x256 | 2.05 | 7.89 | 1.80 | 65.20 | -11% | +285% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 2.72 | 4.11 | 2.45 | 470.47 | -10% | +50% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 10.05 | 0.59 | 0.90 | 52.53 | -91% | -94% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 15.25 | 857.17 | 3.82 | 116.44 | -74% | +5519% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 6.40 | 370.56 | 2.73 | 17.94 | -57% | +5689% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 6.75 | 622.48 | 2.74 | 37.98 | -59% | +9123% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.98 | 9.74 | 0.88 | 1.33 | -10% | +897% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 5.01 | 98.69 | 1.80 | 7.38 | -64% | +1870% |
LeakyReLU / dim=128x16x1024 | 0.84 | 0.93 | 0.76 | 1.12 | -8% | +11% |
LeakyReLU / dim=64x128x1024 | 0.95 | 2.62 | 1.12 | 0.60 | +18% | +176% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 5.10 | 42.76 | 4.18 | 39.49 | -17% | +738% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 4.90 | 45.12 | 6.12 | 50.30 | +24% | +820% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 7.93 | 59.64 | 19.33 | 97.15 | +143% | +652% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 12.49 | 79.32 | 37.50 | 189.29 | +200% | +534% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 1.22 | 0.32 | 1.16 | 0.68 | -5% | -74% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.67 | 0.29 | 0.94 | 0.73 | +41% | -55% |
MatMul / dim=1000x64x256 dim=256x32 | 0.43 | 1.76 | 3.19 | 25.90 | +642% | +310% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 1.41 | 14.42 | 7.78 | 667.52 | +450% | +920% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 4.40 | 77.69 | 6.27 | 1895.84 | +42% | +1664% |
MatMul / dim=64x1000000 dim=1000000x32 | 40.15 | 9.05 | 10.97 | 163.00 | -72% | -77% |
MatMul / dim=1000000x64 dim=64x1024 | 15.02 | 150.79 | 35.25 | 1525.49 | +134% | +903% |
PReLU / dim=128x16x1024 dim=1 | 0.90 | 1.95 | 0.75 | 1.19 | -16% | +116% |
PReLU / dim=64x128x1024 dim=1 | 1.57 | 4.34 | 1.20 | 0.57 | -23% | +176% |
ReLU / dim=128x16x1024 | 0.81 | 0.47 | 0.78 | 1.31 | -3% | -41% |
ReLU / dim=64x128x1024 | 0.61 | 1.05 | 1.10 | 0.57 | +79% | +71% |
SeLU / dim=128x16x1024 | 1.47 | 5.58 | 0.83 | 2.05 | -43% | +279% |
SeLU / dim=64x128x1024 | 2.35 | 18.42 | 1.17 | 6.97 | -50% | +683% |
Sigmoid / dim=128x16x1024 | 0.54 | 12.64 | 0.50 | 1.94 | -6% | +2238% |
Sigmoid / dim=64x128x1024 | 0.41 | 49.49 | 0.98 | 6.99 | +142% | +12116% |
Softmax / dim=64x1000000 axi=-1 | 2.35 | 33.88 | 4.05 | 27.56 | +72% | +1341% |
Softmax / dim=1000000x64 axi=-1 | 2.25 | 32.44 | 5.53 | 28.25 | +145% | +1341% |
Softmax / dim=64x16x32x1024 axi=-1 | 0.99 | 16.97 | 6.91 | 15.33 | +599% | +1617% |
Softmax / dim=128x16x32x1024 axi=-1 | 1.71 | 33.61 | 15.73 | 27.19 | +821% | +1868% |
Softmax / dim=1024x16x32x128 axi=-1 | 1.72 | 32.31 | 6.26 | 27.34 | +264% | +1778% |
Softmax / dim=1024x64x32x8 axi=-1 | 5.34 | 81.44 | 2.36 | 17.73 | -55% | +1424% |
Softplus / dim=128x16x1024 | 0.42 | 12.65 | 0.59 | 2.71 | +41% | +2943% |
Softplus / dim=64x128x1024 | 0.45 | 50.08 | 1.07 | 9.92 | +137% | +10979% |
Sort / dim=64x128x1024 axi=0 | 6.74 | 914.48 | 11.56 | 41.90 | +71% | +13469% |
Sort / dim=64x128x1024 axi=1 | 3.63 | 937.31 | 12.57 | 37.46 | +246% | +25707% |
Sort / dim=64x128x1024 axi=2 | 0.88 | 245.80 | 8.83 | 44.32 | +908% | +27959% |
Sum / dim=64x128x128x128 axi=0 | 1.72 | 15.13 | 1.96 | 15.38 | +14% | +782% |
Sum / dim=64x128x128x128 axi=1 | 1.70 | 11.54 | 2.30 | 12.89 | +35% | +580% |
Sum / dim=64x128x128x128 axi=2 | 1.66 | 10.31 | 2.35 | 8.08 | +41% | +519% |
Sum / dim=64x128x128x128 axi=3 | 3.28 | 9.24 | 3.78 | 5.26 | +15% | +181% |
SumAll / dim=64x128x128x128 | 1.71 | 9.17 | 2.26 | 4.66 | +32% | +434% |
SumAll / dim=1000000 | 0.66 | 0.06 | 0.74 | 0.11 | +12% | -90% |
SumAll / dim=1000000x128 | 1.62 | 8.57 | 2.42 | 4.55 | +49% | +429% |
SumAll / dim=128x1000000 | 1.63 | 8.61 | 2.34 | 4.60 | +43% | +427% |
M2 Ultra (cores: 8E+16P+76GPU) mlx 0.2.0
Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.96 | 1.71 | 9.13 | 0.73 | 33.55 | +14% | -62% | +366% |
Argmax / dim=64x1024x128 axi=1 | 1.76 | 1.67 | 9.14 | 0.61 | 2.40 | +5% | -65% | +419% |
Argmax / dim=64x1024x128 axi=2 | 1.72 | 1.67 | 9.25 | 0.67 | 1.01 | +3% | -60% | +438% |
Argmax / dim=64x128x1024 axi=2 | 1.76 | 1.67 | 9.14 | 0.41 | 0.86 | +5% | -76% | +419% |
BCE / dim=1000000 dim=1000000 | 0.47 | 0.33 | 5.78 | 0.33 | 1.04 | +41% | -30% | +1140% |
BCE / dim=100000x32 dim=100000x32 | 0.54 | 0.44 | 19.18 | 0.40 | 1.70 | +23% | -26% | +3421% |
BCE / dim=100000x64x2 dim=100000x64x2 | 0.96 | 0.50 | 77.08 | 0.76 | 5.47 | +90% | -20% | +7927% |
BCE / dim=128x100000 dim=128x100000 | 0.95 | 0.50 | 77.05 | 0.50 | 5.48 | +91% | -47% | +7990% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 1.25 | 1.25 | 75.15 | 1.33 | 29.07 | +0% | +6% | +5918% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 2.30 | 2.27 | 148.17 | 2.45 | 61.83 | +1% | +6% | +6344% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 1.64 | 1.61 | 56.08 | 1.74 | 33.53 | +2% | +5% | +3313% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 1.64 | 1.61 | 76.46 | 1.71 | 35.67 | +1% | +4% | +4575% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.38 | 0.36 | 0.39 | 0.46 | 3.04 | +6% | +19% | +2% |
Conv1d / dim=100x256x256 dim=8x3x256 | 4.06 | 0.84 | 7.20 | 0.74 | 79.68 | +386% | -81% | +77% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 0.83 | 0.78 | 2.89 | 0.71 | 533.20 | +7% | -14% | +246% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 3.51 | 0.34 | 0.54 | 0.41 | 60.77 | +946% | -88% | -84% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 12.22 | 12.57 | 793.00 | 1.70 | 140.13 | -2% | -86% | +6389% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 5.58 | 2.95 | 366.69 | 0.94 | 14.20 | +88% | -83% | +6470% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 7.14 | 3.24 | 629.08 | 0.90 | 20.01 | +120% | -87% | +8708% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.56 | 0.48 | 9.67 | 0.49 | 1.64 | +18% | -12% | +1617% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 1.44 | 1.29 | 96.38 | 0.67 | 9.37 | +11% | -53% | +6596% |
Gather / dim=64x256 dim=10 | 0.28 | 0.33 | 0.01 | 0.20 | 0.01 | -15% | -29% | -96% |
Gather / dim=64x256 dim=1000 | 0.42 | 0.37 | 0.03 | 0.37 | 0.24 | +13% | -11% | -92% |
Gather / dim=64x256 dim=1000000 | 5.13 | 4.71 | 27.09 | 20.67 | 69.37 | +9% | +302% | +427% |
Gather / dim=1024x32 dim=10 | 0.23 | 0.22 | 0.01 | 0.21 | 0.00 | +5% | -12% | -95% |
Gather / dim=1024x32 dim=1000 | 0.30 | 0.26 | 0.02 | 0.24 | 0.11 | +13% | -18% | -94% |
Gather / dim=1024x32 dim=1000000 | 1.09 | 1.03 | 6.28 | 2.85 | 1.64 | +5% | +162% | +477% |
LeakyReLU / dim=128x16x1024 | 0.34 | 0.36 | 1.53 | 0.52 | 1.74 | -6% | +52% | +350% |
LeakyReLU / dim=64x128x1024 | 0.51 | 0.42 | 6.49 | 0.56 | 0.44 | +21% | +10% | +1180% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 2.71 | 2.12 | 14.87 | 1.89 | 29.93 | +27% | -30% | +448% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 2.47 | 2.19 | 15.83 | 3.16 | 30.58 | +12% | +27% | +539% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 4.21 | 3.93 | 22.94 | 11.31 | 44.35 | +7% | +168% | +444% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 6.38 | 6.39 | 31.64 | 20.35 | 53.68 | 0% | +218% | +395% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.89 | 0.72 | 0.26 | 0.70 | 0.26 | +23% | -21% | -70% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.39 | 0.44 | 0.19 | 0.35 | 0.26 | -9% | -11% | -51% |
MatMul / dim=1000x64x256 dim=256x32 | 0.55 | 0.50 | 0.82 | 1.08 | 1.63 | +8% | +98% | +50% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 0.91 | 0.90 | 15.18 | 5.02 | 17.84 | +0% | +449% | +1562% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 3.39 | 3.37 | 45.99 | 3.62 | 123.47 | +0% | +6% | +1255% |
MatMul / dim=64x1000000 dim=1000000x32 | 2.95 | 2.95 | 9.36 | 8.11 | 9.35 | +0% | +174% | +217% |
MatMul / dim=1000000x64 dim=64x1024 | 7.57 | 7.53 | 66.98 | 17.67 | 263.92 | +0% | +133% | +785% |
PReLU / dim=128x16x1024 dim=1 | 0.54 | 0.37 | 0.86 | 0.34 | 1.72 | +46% | -36% | +59% |
PReLU / dim=64x128x1024 dim=1 | 0.89 | 0.55 | 3.92 | 0.63 | 0.43 | +59% | -29% | +343% |
ReLU / dim=128x16x1024 | 0.46 | 0.34 | 0.27 | 0.64 | 1.82 | +34% | +39% | -41% |
ReLU / dim=64x128x1024 | 0.55 | 0.46 | 1.25 | 0.45 | 0.41 | +19% | -18% | +127% |
Scatter / dim=64x16 dim=1000 | 0.63 | 0.31 | 0.07 | 0.74 | 0.04 | +106% | +17% | -89% |
Scatter / dim=64x16 dim=1000000 | 3.95 | 3.21 | 55.13 | 1.69 | 3.31 | +23% | -57% | +1294% |
Scatter / dim=1024x32 dim=10 | 0.63 | 0.33 | 0.01 | 0.24 | 0.00 | +92% | -61% | -97% |
Scatter / dim=1024x32 dim=1000 | 0.45 | 0.44 | 0.12 | 0.52 | 0.10 | +3% | +14% | -73% |
Scatter / dim=1024x32 dim=1000000 | 6.71 | 6.24 | 103.83 | 2.96 | 4.22 | +7% | -55% | +1448% |
ScatterSum / dim=64x16 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.00 | +31% | nan% | -79% |
ScatterSum / dim=64x16 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +32% | nan% | -78% |
ScatterSum / dim=64x16 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 1.38 | +24% | nan% | -78% |
ScatterSum / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +39% | nan% | -81% |
ScatterSum / dim=1024x32 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +39% | nan% | -81% |
ScatterSum / dim=1024x32 dim=1000000 | 0.05 | 0.03 | 0.01 | nan | 6.93 | +44% | nan% | -79% |
ScatterMax / dim=64x16 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.00 | +29% | nan% | -78% |
ScatterMax / dim=64x16 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +32% | nan% | -79% |
ScatterMax / dim=64x16 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 1.38 | +17% | nan% | -77% |
ScatterMax / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +31% | nan% | -78% |
ScatterMax / dim=1024x32 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +32% | nan% | -80% |
ScatterMax / dim=1024x32 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 6.92 | +25% | nan% | -80% |
SeLU / dim=128x16x1024 | 0.96 | 0.57 | 2.59 | 0.61 | 1.91 | +68% | -36% | +169% |
SeLU / dim=64x128x1024 | 1.57 | 0.55 | 15.34 | 0.87 | 4.44 | +182% | -44% | +879% |
Sigmoid / dim=128x16x1024 | 0.39 | 0.81 | 11.76 | 0.47 | 1.81 | -52% | +20% | +2930% |
Sigmoid / dim=64x128x1024 | 0.70 | 0.84 | 46.61 | 0.73 | 3.41 | -16% | +4% | +6598% |
Softmax / dim=64x1000000 axi=-1 | 3.04 | 2.34 | 50.36 | 2.05 | 18.30 | +29% | -32% | +1557% |
Softmax / dim=1000000x64 axi=-1 | 3.08 | 2.39 | 50.81 | 2.93 | 19.76 | +28% | -4% | +1549% |
Softmax / dim=64x16x32x1024 axi=-1 | 1.86 | 1.34 | 26.83 | 2.03 | 7.07 | +38% | +9% | +1342% |
Softmax / dim=128x16x32x1024 axi=-1 | 3.15 | 2.35 | 52.72 | 3.26 | 17.29 | +33% | +3% | +1574% |
Softmax / dim=1024x16x32x128 axi=-1 | 3.15 | 2.36 | 52.82 | 2.99 | 18.05 | +33% | -4% | +1578% |
Softmax / dim=1024x64x32x8 axi=-1 | 1.49 | 0.82 | 13.40 | 0.94 | 9.24 | +81% | -36% | +802% |
Softplus / dim=128x16x1024 | 0.43 | 0.26 | 11.90 | 0.50 | 1.81 | +65% | +16% | +2698% |
Softplus / dim=64x128x1024 | 0.70 | 0.44 | 48.32 | 0.71 | 5.10 | +59% | +0% | +6755% |
Sort / dim=64x128x1024 axi=0 | 0.76 | 0.60 | 226.16 | 6.73 | 23.90 | +26% | +782% | +29546% |
Sort / dim=64x128x1024 axi=1 | 0.54 | 0.52 | 226.28 | 7.44 | 21.69 | +3% | +1274% | +41707% |
Sort / dim=64x128x1024 axi=2 | 0.76 | 0.62 | 227.19 | 4.69 | 21.90 | +22% | +513% | +29646% |
Sum / dim=64x128x128x128 axi=0 | 0.92 | 0.91 | 9.07 | 1.26 | 9.10 | +1% | +36% | +884% |
Sum / dim=64x128x128x128 axi=1 | 0.92 | 0.91 | 9.05 | 1.27 | 7.53 | +1% | +38% | +884% |
Sum / dim=64x128x128x128 axi=2 | 0.92 | 0.90 | 9.00 | 1.16 | 4.48 | +2% | +26% | +877% |
Sum / dim=64x128x128x128 axi=3 | 0.91 | 0.89 | 9.02 | 1.88 | 3.59 | +2% | +106% | +888% |
SumAll / dim=64x128x128x128 | 0.91 | 0.91 | 9.00 | 1.16 | 2.63 | +0% | +27% | +891% |
SumAll / dim=1000000 | 0.33 | 0.41 | 0.06 | 0.40 | 0.24 | -18% | +20% | -82% |
SumAll / dim=1000000x128 | 0.89 | 0.86 | 8.60 | 1.32 | 2.50 | +3% | +48% | +869% |
SumAll / dim=128x1000000 | 0.89 | 0.87 | 8.58 | 1.20 | 2.59 | +2% | +35% | +865% |
M3 (RAM: 16GB)
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.64 | 21.87 | 2.08 | 19.89 | +26% | +1233% |
Argmax / dim=64x1024x128 axi=1 | 1.03 | 9.30 | 1.62 | 3.32 | +58% | +806% |
Argmax / dim=64x1024x128 axi=2 | 1.08 | 6.99 | 1.94 | 2.77 | +80% | +549% |
Argmax / dim=64x128x1024 axi=2 | 1.05 | 8.34 | 1.21 | 2.41 | +15% | +696% |
BCE / dim=1000000 dim=1000000 | 0.80 | 5.31 | 1.30 | 1.30 | +63% | +566% |
BCE / dim=100000x32 dim=100000x32 | 1.91 | 18.11 | 3.64 | 3.50 | +90% | +846% |
BCE / dim=100000x64x2 dim=100000x64x2 | 6.58 | 69.43 | 14.49 | 14.17 | +120% | +954% |
BCE / dim=128x100000 dim=128x100000 | 6.92 | 70.35 | 14.92 | 13.61 | +115% | +917% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 8.76 | 54.43 | 8.83 | 28.59 | +0% | +521% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 18.46 | 126.17 | 18.08 | 58.87 | -2% | +583% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 11.48 | 65.83 | 11.70 | 26.79 | +1% | +473% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 11.39 | 86.71 | 11.78 | 26.93 | +3% | +661% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.51 | 0.33 | 0.44 | 1.90 | -13% | -36% |
Conv1d / dim=100x256x256 dim=8x3x256 | 4.31 | 8.12 | 2.40 | 41.79 | -44% | +88% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 3.63 | 5.64 | 3.77 | 205.70 | +3% | +55% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 0.91 | 0.58 | 1.31 | 35.53 | +43% | -36% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 41.44 | 730.75 | 7.45 | 98.91 | -82% | +1663% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 16.01 | 318.25 | 4.73 | 31.72 | -70% | +1888% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 18.47 | 551.98 | 6.10 | 42.98 | -66% | +2888% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.57 | 9.14 | 0.93 | 1.13 | -40% | +480% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 4.83 | 90.03 | 2.93 | 7.04 | -39% | +1763% |
LeakyReLU / dim=128x16x1024 | 0.77 | 1.15 | 0.68 | 0.76 | -11% | +49% |
LeakyReLU / dim=64x128x1024 | 2.09 | 4.96 | 1.34 | 1.38 | -35% | +137% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 15.25 | 63.82 | 7.10 | 66.79 | -53% | +318% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 15.55 | 68.92 | 7.98 | 78.16 | -48% | +343% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 28.76 | 96.20 | 21.89 | 202.41 | -23% | +234% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 47.50 | 130.18 | 41.22 | 260.27 | -13% | +174% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.67 | 0.34 | 1.02 | 4.00 | +52% | -48% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.47 | 0.15 | 0.97 | 0.68 | +107% | -68% |
MatMul / dim=1000x64x256 dim=256x32 | 1.23 | 3.40 | 5.30 | 14.38 | +330% | +176% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 4.76 | 11.32 | 4.96 | 158.75 | +4% | +137% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 17.41 | 89.14 | 19.42 | 1214.55 | +11% | +412% |
MatMul / dim=64x1000000 dim=1000000x32 | 9.07 | 12.25 | 7.74 | 40.94 | -14% | +35% |
MatMul / dim=1000000x64 dim=64x1024 | 60.00 | 343.14 | 161.03 | 1513.56 | +168% | +471% |
PReLU / dim=128x16x1024 dim=1 | 1.14 | 1.10 | 0.67 | 0.86 | -41% | -3% |
PReLU / dim=64x128x1024 dim=1 | 3.58 | 4.41 | 1.31 | 1.36 | -63% | +23% |
ReLU / dim=128x16x1024 | 0.50 | 0.33 | 0.62 | 0.64 | +24% | -33% |
ReLU / dim=64x128x1024 | 1.03 | 2.44 | 1.31 | 1.39 | +27% | +137% |
SeLU / dim=128x16x1024 | 2.28 | 2.92 | 0.61 | 2.87 | -73% | +27% |
SeLU / dim=64x128x1024 | 8.17 | 12.51 | 1.44 | 10.90 | -82% | +53% |
Sigmoid / dim=128x16x1024 | 0.53 | 10.64 | 0.69 | 2.66 | +30% | +1901% |
Sigmoid / dim=64x128x1024 | 1.05 | 43.31 | 1.45 | 8.72 | +38% | +4022% |
Softmax / dim=64x1000000 axi=-1 | 8.85 | 42.85 | 11.65 | 37.39 | +31% | +384% |
Softmax / dim=1000000x64 axi=-1 | 6.04 | 38.42 | 12.63 | 40.49 | +109% | +535% |
Softmax / dim=64x16x32x1024 axi=-1 | 3.29 | 20.42 | 10.26 | 18.81 | +212% | +521% |
Softmax / dim=128x16x32x1024 axi=-1 | 6.32 | 41.12 | 20.14 | 37.59 | +218% | +550% |
Softmax / dim=1024x16x32x128 axi=-1 | 6.34 | 38.48 | 13.63 | 39.88 | +114% | +506% |
Softmax / dim=1024x64x32x8 axi=-1 | 7.02 | 66.84 | 4.17 | 21.11 | -40% | +852% |
Softplus / dim=128x16x1024 | 0.43 | 10.47 | 0.66 | 3.93 | +50% | +2309% |
Softplus / dim=64x128x1024 | 1.03 | 43.17 | 1.50 | 14.26 | +45% | +4101% |
Sort / dim=64x128x1024 axi=0 | 23.39 | 1015.93 | 35.02 | 59.19 | +49% | +4243% |
Sort / dim=64x128x1024 axi=1 | 12.35 | 926.84 | 32.10 | 56.93 | +159% | +7405% |
Sort / dim=64x128x1024 axi=2 | 2.27 | 230.00 | 25.06 | 64.75 | +1003% | +10025% |
Sum / dim=64x128x128x128 axi=0 | 6.42 | 12.28 | 6.40 | 19.30 | 0% | +91% |
Sum / dim=64x128x128x128 axi=1 | 6.25 | 11.50 | 6.39 | 15.18 | +2% | +84% |
Sum / dim=64x128x128x128 axi=2 | 6.28 | 11.08 | 6.34 | 8.06 | +1% | +76% |
Sum / dim=64x128x128x128 axi=3 | 8.91 | 10.09 | 7.30 | 6.55 | -18% | +13% |
SumAll / dim=64x128x128x128 | 5.76 | 10.70 | 6.38 | 5.92 | +10% | +85% |
SumAll / dim=1000000 | 0.29 | 0.05 | 0.39 | 0.07 | +36% | -82% |
SumAll / dim=1000000x128 | 5.52 | 9.80 | 6.18 | 5.81 | +12% | +77% |
SumAll / dim=128x1000000 | 5.48 | 10.61 | 6.16 | 5.73 | +12% | +93% |
M3 Pro (cores: 6E+5P+14GPU)
Operation | mlx_gpu | mlx_cpu | mps | cpu | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.18 | 21.22 | 1.49 | 19.26 | +25% | +1695% |
Argmax / dim=64x1024x128 axi=1 | 0.89 | 8.85 | 1.22 | 2.35 | +36% | +890% |
Argmax / dim=64x1024x128 axi=2 | 0.97 | 6.65 | 1.16 | 1.58 | +19% | +587% |
Argmax / dim=64x128x1024 axi=2 | 0.89 | 8.12 | 1.07 | 1.37 | +21% | +816% |
BCE / dim=1000000 dim=1000000 | 0.81 | 5.96 | 1.09 | 1.04 | +34% | +634% |
BCE / dim=100000x32 dim=100000x32 | 1.76 | 17.78 | 3.01 | 2.86 | +71% | +912% |
BCE / dim=100000x64x2 dim=100000x64x2 | 4.20 | 67.99 | 11.80 | 11.81 | +180% | +1517% |
BCE / dim=128x100000 dim=128x100000 | 4.04 | 67.80 | 11.57 | 11.39 | +186% | +1576% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 5.93 | 51.84 | 6.50 | 29.83 | +9% | +773% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 11.57 | 118.19 | 11.91 | 46.27 | +3% | +921% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 7.75 | 62.53 | 8.40 | 37.28 | +8% | +706% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 7.74 | 80.78 | 8.66 | 39.03 | +11% | +943% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.80 | 0.28 | 0.80 | 1.97 | 0% | -64% |
Conv1d / dim=100x256x256 dim=8x3x256 | 2.95 | 7.32 | 1.99 | 50.02 | -32% | +147% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 3.70 | 5.40 | 4.23 | 233.14 | +14% | +45% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 1.15 | 0.44 | 1.26 | 47.61 | +9% | -61% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 28.54 | 738.77 | 5.85 | 107.76 | -79% | +2488% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 15.23 | 316.15 | 3.41 | 22.42 | -77% | +1975% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 11.07 | 512.34 | 3.73 | 30.77 | -66% | +4528% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.40 | 8.92 | 0.93 | 1.04 | -33% | +536% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 4.07 | 88.95 | 1.54 | 7.37 | -62% | +2086% |
LeakyReLU / dim=128x16x1024 | 1.31 | 0.96 | 0.99 | 0.90 | -24% | -26% |
LeakyReLU / dim=64x128x1024 | 1.78 | 2.10 | 1.54 | 1.02 | -13% | +18% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 11.22 | 42.51 | 5.76 | 38.86 | -48% | +278% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 10.79 | 46.87 | 5.71 | 51.97 | -47% | +334% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 20.39 | 70.91 | 15.84 | 125.66 | -22% | +247% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 33.41 | 103.37 | 28.62 | 240.45 | -14% | +209% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.69 | 0.23 | 1.28 | 0.50 | +86% | -67% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.87 | 0.10 | 1.18 | 0.66 | +36% | -88% |
MatMul / dim=1000x64x256 dim=256x32 | 0.83 | 2.81 | 3.93 | 16.03 | +373% | +238% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 3.25 | 10.29 | 3.55 | 272.27 | +9% | +216% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 11.40 | 82.00 | 13.39 | 680.84 | +17% | +619% |
MatMul / dim=64x1000000 dim=1000000x32 | 39.89 | 10.89 | 6.81 | 155.98 | -82% | -72% |
MatMul / dim=1000000x64 dim=64x1024 | 40.00 | 309.51 | 106.34 | 1214.46 | +165% | +673% |
PReLU / dim=128x16x1024 dim=1 | 1.47 | 1.98 | 1.09 | 0.79 | -25% | +34% |
PReLU / dim=64x128x1024 dim=1 | 2.62 | 3.62 | 1.60 | 1.04 | -38% | +38% |
ReLU / dim=128x16x1024 | 0.82 | 0.45 | 1.19 | 0.81 | +43% | -45% |
ReLU / dim=64x128x1024 | 1.06 | 0.78 | 1.55 | 1.03 | +46% | -26% |
SeLU / dim=128x16x1024 | 2.42 | 4.90 | 1.09 | 2.06 | -55% | +101% |
SeLU / dim=64x128x1024 | 5.53 | 15.30 | 1.45 | 7.33 | -73% | +176% |
Sigmoid / dim=128x16x1024 | 0.99 | 10.68 | 0.97 | 1.83 | -2% | +979% |
Sigmoid / dim=64x128x1024 | 1.07 | 41.87 | 1.62 | 6.72 | +51% | +3800% |
Softmax / dim=64x1000000 axi=-1 | 6.00 | 29.23 | 8.74 | 34.32 | +45% | +387% |
Softmax / dim=1000000x64 axi=-1 | 4.05 | 28.19 | 9.60 | 37.89 | +136% | +595% |
Softmax / dim=64x16x32x1024 axi=-1 | 2.27 | 14.97 | 8.09 | 14.76 | +256% | +560% |
Softmax / dim=128x16x32x1024 axi=-1 | 4.25 | 29.98 | 16.10 | 35.51 | +278% | +605% |
Softmax / dim=1024x16x32x128 axi=-1 | 4.23 | 28.42 | 10.15 | 36.84 | +139% | +571% |
Softmax / dim=1024x64x32x8 axi=-1 | 6.93 | 64.48 | 3.24 | 19.37 | -53% | +830% |
Softplus / dim=128x16x1024 | 0.71 | 10.54 | 1.05 | 2.72 | +47% | +1383% |
Softplus / dim=64x128x1024 | 1.33 | 41.36 | 1.47 | 10.32 | +10% | +3012% |
Sort / dim=64x128x1024 axi=0 | 15.52 | 1012.11 | 24.23 | 46.26 | +56% | +6422% |
Sort / dim=64x128x1024 axi=1 | 8.50 | 898.97 | 22.34 | 46.41 | +162% | +10479% |
Sort / dim=64x128x1024 axi=2 | 1.99 | 224.86 | 17.53 | 47.45 | +782% | +11220% |
Sum / dim=64x128x128x128 axi=0 | 4.29 | 11.99 | 5.00 | 12.36 | +16% | +179% |
Sum / dim=64x128x128x128 axi=1 | 4.17 | 10.66 | 4.98 | 11.53 | +19% | +155% |
Sum / dim=64x128x128x128 axi=2 | 4.14 | 9.48 | 4.87 | 6.34 | +17% | +129% |
Sum / dim=64x128x128x128 axi=3 | 6.34 | 7.11 | 5.61 | 5.09 | -11% | +12% |
SumAll / dim=64x128x128x128 | 4.12 | 6.45 | 4.92 | 4.72 | +19% | +56% |
SumAll / dim=1000000 | 0.67 | 0.06 | 0.75 | 0.08 | +11% | -90% |
SumAll / dim=1000000x128 | 3.93 | 6.18 | 4.70 | 4.46 | +19% | +57% |
SumAll / dim=128x1000000 | 3.95 | 6.16 | 4.37 | 4.48 | +10% | +56% |
M3 Max (cores: 4E+12P+40GPU) mlx 0.2.0
Operation | mlx_gpu | mlx_gpu_compile | mlx_cpu | mps | cpu | mlx_gpu_compile/mlx_gpu speedup | mlx_gpu/mps speedup | mlx_gpu/mlx_cpu speedup |
---|---|---|---|---|---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 1.56 | 1.56 | 8.35 | 1.47 | 20.73 | +0% | -5% | +435% |
Argmax / dim=64x1024x128 axi=1 | 1.57 | 1.55 | 8.33 | 0.98 | 1.67 | +1% | -37% | +430% |
Argmax / dim=64x1024x128 axi=2 | 1.59 | 1.56 | 8.33 | 0.89 | 1.16 | +1% | -43% | +424% |
Argmax / dim=64x128x1024 axi=2 | 1.57 | 1.56 | 8.34 | 0.73 | 1.01 | +0% | -53% | +432% |
BCE / dim=1000000 dim=1000000 | 0.37 | 0.24 | 4.97 | 0.33 | 0.72 | +49% | -11% | +1258% |
BCE / dim=100000x32 dim=100000x32 | 0.51 | 0.27 | 16.26 | 0.44 | 1.64 | +84% | -13% | +3110% |
BCE / dim=100000x64x2 dim=100000x64x2 | 1.80 | 0.79 | 66.58 | 0.91 | 6.25 | +128% | -49% | +3597% |
BCE / dim=128x100000 dim=128x100000 | 1.80 | 0.78 | 67.05 | 0.68 | 6.29 | +130% | -61% | +3624% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 2.43 | 2.41 | 66.73 | 2.50 | 16.76 | +0% | +2% | +2645% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 4.58 | 4.57 | 146.32 | 4.68 | 36.45 | +0% | +2% | +3094% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 3.12 | 3.11 | 47.47 | 3.20 | 19.43 | +0% | +2% | +1419% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 3.13 | 3.12 | 68.53 | 3.20 | 18.91 | +0% | +2% | +2090% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.33 | 0.33 | 0.29 | 0.43 | 2.31 | -1% | +32% | -11% |
Conv1d / dim=100x256x256 dim=8x3x256 | 1.21 | 1.19 | 5.99 | 1.27 | 68.57 | +2% | +4% | +394% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 1.41 | 1.03 | 2.88 | 1.49 | 502.60 | +37% | +5% | +104% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 0.44 | 0.45 | 0.42 | 0.51 | 50.53 | 0% | +13% | -4% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 10.22 | 10.13 | 722.62 | 2.18 | 109.15 | +0% | -78% | +6971% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 3.99 | 3.96 | 313.85 | 1.52 | 11.02 | +0% | -61% | +7770% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 4.57 | 4.63 | 512.73 | 1.87 | 27.17 | -1% | -59% | +11117% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.62 | 0.60 | 9.17 | 0.50 | 1.35 | +3% | -19% | +1373% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 1.66 | 1.37 | 89.00 | 1.05 | 7.55 | +20% | -36% | +5269% |
Gather / dim=64x256 dim=10 | 0.33 | 0.21 | 0.01 | 0.58 | 0.00 | +59% | +74% | -96% |
Gather / dim=64x256 dim=1000 | 0.27 | 0.27 | 0.03 | 0.59 | 0.12 | 0% | +115% | -90% |
Gather / dim=64x256 dim=1000000 | 6.92 | 6.79 | 20.55 | 41.59 | 38.15 | +1% | +501% | +197% |
Gather / dim=1024x32 dim=10 | 0.24 | 0.22 | 0.01 | 0.56 | 0.00 | +4% | +139% | -95% |
Gather / dim=1024x32 dim=1000 | 0.23 | 0.22 | 0.01 | 0.58 | 0.08 | +2% | +152% | -93% |
Gather / dim=1024x32 dim=1000000 | 1.36 | 1.12 | 5.58 | 5.47 | 1.70 | +22% | +301% | +309% |
LeakyReLU / dim=128x16x1024 | 0.29 | 0.25 | 0.96 | 0.35 | 0.89 | +17% | +19% | +230% |
LeakyReLU / dim=64x128x1024 | 0.58 | 0.34 | 4.18 | 0.74 | 0.44 | +70% | +27% | +623% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 4.06 | 4.03 | 19.47 | 1.91 | 26.53 | +0% | -52% | +380% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 4.24 | 4.19 | 23.09 | 2.37 | 55.05 | +0% | -44% | +445% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 7.47 | 7.45 | 32.52 | 5.70 | 87.50 | +0% | -23% | +335% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 12.00 | 11.97 | 48.07 | 10.49 | 125.36 | +0% | -12% | +300% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.52 | 0.63 | 0.21 | 0.75 | 0.77 | -16% | +42% | -60% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.24 | 0.20 | 0.08 | 0.58 | 0.65 | +15% | +142% | -64% |
MatMul / dim=1000x64x256 dim=256x32 | 0.44 | 0.42 | 1.35 | 1.59 | 15.13 | +3% | +264% | +209% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 1.36 | 1.35 | 9.85 | 1.50 | 428.05 | +0% | +10% | +624% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 4.65 | 4.63 | 43.90 | 5.21 | 1111.34 | +0% | +12% | +844% |
MatMul / dim=64x1000000 dim=1000000x32 | 2.75 | 2.76 | 7.46 | 3.89 | 67.98 | 0% | +41% | +171% |
MatMul / dim=1000000x64 dim=64x1024 | 15.77 | 15.79 | 90.78 | 32.97 | 1891.27 | 0% | +109% | +475% |
PReLU / dim=128x16x1024 dim=1 | 0.36 | 0.22 | 0.69 | 0.37 | 0.77 | +59% | +2% | +91% |
PReLU / dim=64x128x1024 dim=1 | 1.04 | 0.35 | 3.43 | 0.62 | 0.46 | +199% | -40% | +229% |
ReLU / dim=128x16x1024 | 0.32 | 0.32 | 0.25 | 0.50 | 0.81 | 0% | +58% | -21% |
ReLU / dim=64x128x1024 | 0.70 | 0.38 | 1.05 | 1.06 | 0.43 | +84% | +51% | +50% |
Scatter / dim=64x16 dim=10 | 0.66 | 0.43 | 0.01 | 0.44 | 0.00 | +52% | -33% | -98% |
Scatter / dim=64x16 dim=1000 | 0.42 | 0.38 | 0.07 | 0.47 | 0.07 | +9% | +13% | -83% |
Scatter / dim=64x16 dim=1000000 | 4.17 | 4.10 | 52.82 | 2.79 | 2.28 | +1% | -32% | +1166% |
Scatter / dim=1024x32 dim=10 | 0.25 | 0.23 | 0.01 | 0.44 | 0.00 | +9% | +76% | -94% |
Scatter / dim=1024x32 dim=1000 | 0.26 | 0.24 | 0.12 | 0.62 | 0.07 | +8% | +134% | -55% |
Scatter / dim=1024x32 dim=1000000 | 7.98 | 7.95 | 99.38 | 5.20 | 3.15 | +0% | -34% | +1145% |
ScatterSum / dim=64x16 dim=10 | 0.05 | 0.03 | 0.01 | nan | 0.00 | +60% | nan% | -82% |
ScatterSum / dim=64x16 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +48% | nan% | -80% |
ScatterSum / dim=64x16 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 1.18 | +54% | nan% | -81% |
ScatterSum / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +50% | nan% | -80% |
ScatterSum / dim=1024x32 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +49% | nan% | -81% |
ScatterSum / dim=1024x32 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 6.13 | +51% | nan% | -80% |
ScatterMax / dim=64x16 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.00 | +55% | nan% | -80% |
ScatterMax / dim=64x16 dim=1000 | 0.04 | 0.03 | 0.01 | nan | 0.00 | +55% | nan% | -82% |
ScatterMax / dim=64x16 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 1.21 | +60% | nan% | -81% |
ScatterMax / dim=1024x32 dim=10 | 0.04 | 0.03 | 0.01 | nan | 0.01 | +56% | nan% | -82% |
ScatterMax / dim=1024x32 dim=1000 | 0.05 | 0.03 | 0.01 | nan | 0.01 | +46% | nan% | -80% |
ScatterMax / dim=1024x32 dim=1000000 | 0.04 | 0.03 | 0.01 | nan | 6.16 | +43% | nan% | -81% |
SeLU / dim=128x16x1024 | 0.64 | 0.22 | 1.86 | 0.33 | 1.44 | +187% | -47% | +191% |
SeLU / dim=64x128x1024 | 2.06 | 0.36 | 8.42 | 0.63 | 4.41 | +468% | -69% | +309% |
Sigmoid / dim=128x16x1024 | 0.25 | 0.24 | 10.47 | 0.35 | 1.46 | +6% | +39% | +4066% |
Sigmoid / dim=64x128x1024 | 0.35 | 0.34 | 42.10 | 0.63 | 4.23 | +3% | +78% | +11895% |
Softmax / dim=64x1000000 axi=-1 | 5.78 | 4.35 | 43.94 | 3.26 | 21.04 | +32% | -43% | +660% |
Softmax / dim=1000000x64 axi=-1 | 5.78 | 4.36 | 43.81 | 4.02 | 20.18 | +32% | -30% | +657% |
Softmax / dim=64x16x32x1024 axi=-1 | 3.13 | 2.38 | 23.21 | 2.73 | 7.76 | +31% | -12% | +641% |
Softmax / dim=128x16x32x1024 axi=-1 | 6.05 | 4.56 | 46.01 | 4.37 | 19.09 | +32% | -27% | +660% |
Softmax / dim=1024x16x32x128 axi=-1 | 6.06 | 4.56 | 46.22 | 4.28 | 19.98 | +33% | -29% | +662% |
Softmax / dim=1024x64x32x8 axi=-1 | 1.68 | 1.32 | 11.56 | 1.74 | 10.91 | +26% | +3% | +589% |
Softplus / dim=128x16x1024 | 0.32 | 0.24 | 10.35 | 0.39 | 1.84 | +33% | +22% | +3156% |
Softplus / dim=64x128x1024 | 0.39 | 0.34 | 41.69 | 0.62 | 6.16 | +13% | +60% | +10602% |
Sort / dim=64x128x1024 axi=0 | 0.77 | 0.75 | 229.46 | 9.03 | 35.77 | +2% | +1074% | +29757% |
Sort / dim=64x128x1024 axi=1 | 0.77 | 0.76 | 229.35 | 8.63 | 33.35 | +1% | +1015% | +29535% |
Sort / dim=64x128x1024 axi=2 | 0.77 | 0.76 | 229.35 | 6.45 | 28.18 | +1% | +737% | +29646% |
Sum / dim=64x128x128x128 axi=0 | 1.55 | 1.55 | 6.54 | 1.69 | 9.59 | 0% | +8% | +321% |
Sum / dim=64x128x128x128 axi=1 | 1.54 | 1.52 | 6.52 | 1.66 | 7.84 | +1% | +8% | +324% |
Sum / dim=64x128x128x128 axi=2 | 1.54 | 1.54 | 6.53 | 1.62 | 5.63 | +0% | +5% | +323% |
Sum / dim=64x128x128x128 axi=3 | 1.55 | 1.53 | 6.53 | 2.62 | 4.91 | +1% | +68% | +320% |
SumAll / dim=64x128x128x128 | 1.54 | 1.54 | 6.52 | 1.65 | 4.38 | +0% | +7% | +323% |
SumAll / dim=1000000 | 0.23 | 0.21 | 0.05 | 0.30 | 0.08 | +8% | +29% | -77% |
SumAll / dim=1000000x128 | 1.50 | 1.50 | 6.30 | 1.67 | 4.19 | +0% | +11% | +318% |
SumAll / dim=128x1000000 | 1.49 | 1.49 | 6.25 | 1.66 | 4.22 | +0% | +10% | +318% |
Tesla V100 32GB (PCIe) / Intel Xeon Gold 5120, 14 cores, 28 threads @ 2.2GHz (Skylake), 60GB
Operation | cpu | cuda | cuda/cpu speedup |
---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 72.96 | 0.09 | +80554% |
Argmax / dim=64x1024x128 axi=1 | 25.43 | 0.11 | +22457% |
Argmax / dim=64x1024x128 axi=2 | 20.35 | 0.12 | +16705% |
Argmax / dim=64x128x1024 axi=2 | 18.61 | 0.09 | +21652% |
BCE / dim=1000000 dim=1000000 | 26.32 | 0.07 | +38400% |
BCE / dim=100000x32 dim=100000x32 | 83.80 | 0.11 | +74241% |
BCE / dim=100000x64x2 dim=100000x64x2 | 341.08 | 0.30 | +115358% |
BCE / dim=128x100000 dim=128x100000 | 341.55 | 0.29 | +116168% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 277.05 | 1.19 | +23209% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 571.38 | 2.44 | +23338% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 336.66 | 1.53 | +21834% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 338.85 | 1.53 | +22065% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.71 | 0.08 | +744% |
Conv1d / dim=100x256x256 dim=8x3x256 | 37.65 | 0.65 | +5736% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 79.15 | 0.47 | +16703% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 3.32 | 0.12 | +2596% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 150.83 | 1.99 | +7480% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 46.20 | 0.65 | +7000% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 49.60 | 1.41 | +3409% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.64 | 0.06 | +2490% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 15.40 | 0.25 | +6068% |
Gather / dim=64x256 dim=10 | 0.02 | 0.04 | -49% |
Gather / dim=64x256 dim=1000 | 0.15 | 0.04 | +293% |
Gather / dim=64x256 dim=1000000 | 538.57 | 2.08 | +25740% |
Gather / dim=1024x32 dim=10 | 0.02 | 0.04 | -43% |
Gather / dim=1024x32 dim=1000 | 0.05 | 0.04 | +37% |
Gather / dim=1024x32 dim=1000000 | 40.86 | 0.31 | +13082% |
LeakyReLU / dim=128x16x1024 | 1.07 | 0.05 | +2185% |
LeakyReLU / dim=64x128x1024 | 9.94 | 0.11 | +9104% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 311.56 | 1.87 | +16520% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 454.32 | 2.04 | +22143% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 1287.08 | 5.26 | +24360% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 2455.45 | 9.67 | +25289% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 1.49 | 0.08 | +1728% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 1.18 | 0.06 | +1948% |
MatMul / dim=1000x64x256 dim=256x32 | 28.43 | 0.39 | +7143% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 101.83 | 1.59 | +6318% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 1440.54 | 2.82 | +50943% |
MatMul / dim=64x1000000 dim=1000000x32 | 125.20 | 0.65 | +19267% |
MatMul / dim=1000000x64 dim=64x1024 | 5749.54 | 11.27 | +50919% |
PReLU / dim=128x16x1024 dim=1 | 1.07 | 0.05 | +2223% |
PReLU / dim=64x128x1024 dim=1 | 10.03 | 0.11 | +9285% |
ReLU / dim=128x16x1024 | 1.08 | 0.05 | +2232% |
ReLU / dim=64x128x1024 | 9.91 | 0.11 | +9104% |
Scatter / dim=64x16 dim=10 | 0.02 | 0.03 | -40% |
Scatter / dim=64x16 dim=1000 | 0.05 | 0.03 | +44% |
Scatter / dim=64x16 dim=1000000 | 20.02 | 0.25 | +7856% |
Scatter / dim=1024x32 dim=10 | 0.02 | 0.03 | -40% |
Scatter / dim=1024x32 dim=1000 | 0.05 | 0.03 | +65% |
Scatter / dim=1024x32 dim=1000000 | 21.35 | 0.32 | +6560% |
ScatterSum / dim=64x16 dim=10 | 0.02 | 0.05 | -47% |
ScatterSum / dim=64x16 dim=1000 | 0.03 | 0.05 | -25% |
ScatterSum / dim=64x16 dim=1000000 | 7.61 | 0.19 | +3937% |
ScatterSum / dim=1024x32 dim=10 | 0.03 | 0.05 | -39% |
ScatterSum / dim=1024x32 dim=1000 | 0.04 | 0.05 | -9% |
ScatterSum / dim=1024x32 dim=1000000 | 17.76 | 0.12 | +14453% |
ScatterMax / dim=64x16 dim=10 | nan | nan | nan% |
ScatterMax / dim=64x16 dim=1000 | nan | nan | nan% |
ScatterMax / dim=64x16 dim=1000000 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=10 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=1000 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=1000000 | nan | nan | nan% |
SeLU / dim=128x16x1024 | 3.64 | 0.05 | +7573% |
SeLU / dim=64x128x1024 | 19.47 | 0.11 | +17827% |
Sigmoid / dim=128x16x1024 | 2.75 | 0.05 | +5743% |
Sigmoid / dim=64x128x1024 | 16.17 | 0.11 | +14728% |
Softmax / dim=64x1000000 axi=-1 | 287.67 | 1.63 | +17518% |
Softmax / dim=1000000x64 axi=-1 | 274.14 | 0.66 | +41722% |
Softmax / dim=64x16x32x1024 axi=-1 | 140.28 | 0.36 | +38520% |
Softmax / dim=128x16x32x1024 axi=-1 | 280.85 | 0.70 | +40182% |
Softmax / dim=1024x16x32x128 axi=-1 | 279.57 | 0.68 | +40838% |
Softmax / dim=1024x64x32x8 axi=-1 | 66.09 | 0.20 | +33030% |
Softplus / dim=128x16x1024 | 7.93 | 0.05 | +16119% |
Softplus / dim=64x128x1024 | 36.33 | 0.11 | +32760% |
Sort / dim=64x128x1024 axi=0 | 567.66 | 3.87 | +14573% |
Sort / dim=64x128x1024 axi=1 | 409.30 | 1.89 | +21560% |
Sort / dim=64x128x1024 axi=2 | 602.02 | 2.02 | +29717% |
Sum / dim=64x128x128x128 axi=0 | 88.65 | 0.71 | +12444% |
Sum / dim=64x128x128x128 axi=1 | 85.62 | 0.68 | +12433% |
Sum / dim=64x128x128x128 axi=2 | 50.47 | 0.70 | +7131% |
Sum / dim=64x128x128x128 axi=3 | 44.98 | 0.73 | +6063% |
SumAll / dim=64x128x128x128 | 40.95 | 0.67 | +6043% |
SumAll / dim=1000000 | 0.24 | 0.04 | +516% |
SumAll / dim=1000000x128 | 39.07 | 0.65 | +5878% |
SumAll / dim=128x1000000 | 39.02 | 0.65 | +5859% |

Tesla V100 32 GB (NVLink) / Intel Xeon Gold 6148, 20 cores, 40 threads @ 2.4 GHz (Skylake), 60 GB

Operation | cpu | cuda | cuda/cpu speedup |
---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 57.05 | 0.09 | +64142% |
Argmax / dim=64x1024x128 axi=1 | 23.09 | 0.11 | +21144% |
Argmax / dim=64x1024x128 axi=2 | 17.13 | 0.11 | +14831% |
Argmax / dim=64x128x1024 axi=2 | 15.64 | 0.08 | +18718% |
BCE / dim=1000000 dim=1000000 | 22.45 | 0.06 | +35452% |
BCE / dim=100000x32 dim=100000x32 | 72.05 | 0.11 | +65232% |
BCE / dim=100000x64x2 dim=100000x64x2 | 330.74 | 0.29 | +112871% |
BCE / dim=128x100000 dim=128x100000 | 318.96 | 0.29 | +108848% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 364.51 | 1.18 | +30747% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 841.81 | 2.42 | +34620% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 452.68 | 1.53 | +29421% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 466.35 | 1.53 | +30291% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.52 | 0.07 | +600% |
Conv1d / dim=100x256x256 dim=8x3x256 | 47.17 | 0.62 | +7567% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 39.96 | 0.45 | +8734% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 1.83 | 0.11 | +1520% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 207.97 | 1.90 | +10863% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 21.61 | 0.62 | +3404% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 23.09 | 1.35 | +1608% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 1.15 | 0.06 | +1844% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 10.62 | 0.24 | +4407% |
Gather / dim=64x256 dim=10 | 0.02 | 0.04 | -56% |
Gather / dim=64x256 dim=1000 | 0.14 | 0.04 | +266% |
Gather / dim=64x256 dim=1000000 | 846.18 | 2.03 | +41625% |
Gather / dim=1024x32 dim=10 | 0.02 | 0.04 | -54% |
Gather / dim=1024x32 dim=1000 | 0.05 | 0.04 | +26% |
Gather / dim=1024x32 dim=1000000 | 122.97 | 0.30 | +41252% |
LeakyReLU / dim=128x16x1024 | 1.45 | 0.05 | +2933% |
LeakyReLU / dim=64x128x1024 | 32.44 | 0.11 | +29854% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 546.20 | 1.83 | +29689% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 521.31 | 1.98 | +26244% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 827.56 | 5.03 | +16363% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 1437.48 | 8.98 | +15914% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 1.39 | 0.08 | +1690% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 1.07 | 0.05 | +1843% |
MatMul / dim=1000x64x256 dim=256x32 | 19.51 | 0.38 | +5093% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 69.19 | 1.52 | +4448% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 1241.71 | 2.69 | +46020% |
MatMul / dim=64x1000000 dim=1000000x32 | 84.51 | 0.65 | +12958% |
MatMul / dim=1000000x64 dim=64x1024 | 4573.74 | 10.77 | +42376% |
PReLU / dim=128x16x1024 dim=1 | 1.42 | 0.04 | +3123% |
PReLU / dim=64x128x1024 dim=1 | 29.68 | 0.11 | +27829% |
ReLU / dim=128x16x1024 | 1.45 | 0.04 | +3135% |
ReLU / dim=64x128x1024 | 26.68 | 0.11 | +24987% |
Scatter / dim=64x16 dim=10 | 0.01 | 0.03 | -49% |
Scatter / dim=64x16 dim=1000 | 0.03 | 0.03 | +28% |
Scatter / dim=64x16 dim=1000000 | 16.90 | 0.24 | +6954% |
Scatter / dim=1024x32 dim=10 | 0.01 | 0.03 | -49% |
Scatter / dim=1024x32 dim=1000 | 0.04 | 0.03 | +37% |
Scatter / dim=1024x32 dim=1000000 | 20.12 | 0.31 | +6358% |
ScatterSum / dim=64x16 dim=10 | 0.02 | 0.04 | -62% |
ScatterSum / dim=64x16 dim=1000 | 0.02 | 0.04 | -38% |
ScatterSum / dim=64x16 dim=1000000 | 12.03 | 0.18 | +6606% |
ScatterSum / dim=1024x32 dim=10 | 0.02 | 0.04 | -50% |
ScatterSum / dim=1024x32 dim=1000 | 0.03 | 0.04 | -21% |
ScatterSum / dim=1024x32 dim=1000000 | 28.86 | 0.12 | +23973% |
ScatterMax / dim=64x16 dim=10 | nan | nan | nan% |
ScatterMax / dim=64x16 dim=1000 | nan | nan | nan% |
ScatterMax / dim=64x16 dim=1000000 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=10 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=1000 | nan | nan | nan% |
ScatterMax / dim=1024x32 dim=1000000 | nan | nan | nan% |
SeLU / dim=128x16x1024 | 3.54 | 0.05 | +7533% |
SeLU / dim=64x128x1024 | 38.35 | 0.11 | +35654% |
Sigmoid / dim=128x16x1024 | 2.63 | 0.05 | +5376% |
Sigmoid / dim=64x128x1024 | 37.01 | 0.11 | +34194% |
Softmax / dim=64x1000000 axi=-1 | 333.85 | 1.62 | +20570% |
Softmax / dim=1000000x64 axi=-1 | 302.71 | 0.65 | +46345% |
Softmax / dim=64x16x32x1024 axi=-1 | 157.12 | 0.36 | +43248% |
Softmax / dim=128x16x32x1024 axi=-1 | 318.22 | 0.69 | +45815% |
Softmax / dim=1024x16x32x128 axi=-1 | 304.80 | 0.68 | +44679% |
Softmax / dim=1024x64x32x8 axi=-1 | 105.89 | 0.20 | +53796% |
Softplus / dim=128x16x1024 | 7.13 | 0.05 | +14967% |
Softplus / dim=64x128x1024 | 51.29 | 0.11 | +46704% |
Sort / dim=64x128x1024 axi=0 | 417.40 | 3.70 | +11178% |
Sort / dim=64x128x1024 axi=1 | 360.67 | 1.81 | +19796% |
Sort / dim=64x128x1024 axi=2 | 490.86 | 1.94 | +25253% |
Sum / dim=64x128x128x128 axi=0 | 92.62 | 0.70 | +13038% |
Sum / dim=64x128x128x128 axi=1 | 85.36 | 0.68 | +12454% |
Sum / dim=64x128x128x128 axi=2 | 49.97 | 0.69 | +7165% |
Sum / dim=64x128x128x128 axi=3 | 49.57 | 0.71 | +6847% |
SumAll / dim=64x128x128x128 | 42.74 | 0.66 | +6368% |
SumAll / dim=1000000 | 0.21 | 0.03 | +528% |
SumAll / dim=1000000x128 | 40.79 | 0.65 | +6190% |
SumAll / dim=128x1000000 | 40.77 | 0.65 | +6189% |

RTX4090 (Desktop) / 10th Gen Intel Core i9-10940X @ 3.30 GHz, 128 GB

Operation | cpu | cuda | cuda/cpu speedup |
---|---|---|---|
Argmax / dim=64x1024x128 axi=0 | 15.92 | 0.04 | +39326% |
Argmax / dim=64x1024x128 axi=1 | 4.11 | 0.05 | +7998% |
Argmax / dim=64x1024x128 axi=2 | 3.46 | 0.05 | +6615% |
Argmax / dim=64x128x1024 axi=2 | 3.20 | 0.04 | +8608% |
BCE / dim=1000000 dim=1000000 | 3.84 | 0.05 | +7086% |
BCE / dim=100000x32 dim=100000x32 | 10.57 | 0.05 | +19148% |
BCE / dim=100000x64x2 dim=100000x64x2 | 40.02 | 0.22 | +17863% |
BCE / dim=128x100000 dim=128x100000 | 40.52 | 0.22 | +17958% |
Concat / dim=1000000x64 dim=1000000x32 axi=1 | 38.01 | 0.97 | +3814% |
Concat / dim=1000000x64 dim=1000000x128 axi=1 | 86.99 | 1.79 | +4759% |
Concat / dim=1000000x64 dim=1000000x64 axi=0 | 41.79 | 1.20 | +3376% |
Concat / dim=64x1000000 dim=64x1000000 axi=0 | 41.53 | 1.22 | +3317% |
Conv1d / dim=100x256x3 dim=8x3x3 | 0.33 | 0.07 | +359% |
Conv1d / dim=100x256x256 dim=8x3x256 | 5.11 | 0.30 | +1596% |
Conv1d / dim=16x1000x80 dim=128x11x80 | 5.24 | 0.13 | +4017% |
Conv1d / dim=16x1000x3 dim=128x11x3 | 0.69 | 0.11 | +507% |
Conv2d / dim=100x256x256x3 dim=8x3x3x3 | 22.06 | 0.74 | +2862% |
Conv2d / dim=10x256x256x12 dim=8x3x3x12 | 4.15 | 0.18 | +2247% |
Conv2d / dim=1x256x256x128 dim=8x3x3x128 | 3.45 | 0.15 | +2202% |
Conv2d / dim=100x28x28x3 dim=8x3x3x3 | 0.56 | 0.06 | +832% |
Conv2d / dim=1000x28x28x3 dim=8x3x3x3 | 2.79 | 0.11 | +2449% |
Gather / dim=64x256 dim=10 | 0.02 | 0.03 | -33% |
Gather / dim=64x256 dim=1000 | 0.11 | 0.04 | +156% |
Gather / dim=64x256 dim=1000000 | 103.61 | 1.23 | +8337% |
Gather / dim=1024x32 dim=10 | 0.03 | 0.05 | -45% |
Gather / dim=1024x32 dim=1000 | 0.06 | 0.04 | +23% |
Gather / dim=1024x32 dim=1000000 | 14.67 | 0.19 | +7595% |
LeakyReLU / dim=128x16x1024 | 0.43 | 0.03 | +1519% |
LeakyReLU / dim=64x128x1024 | 4.45 | 0.04 | +11604% |
Linear / dim=100x1024x32 dim=32x1024 dim=1024 | 53.35 | 0.59 | +8943% |
Linear / dim=100x1024x64 dim=64x1024 dim=1024 | 56.93 | 0.70 | +8089% |
Linear / dim=100x1024x256 dim=256x1024 dim=1024 | 79.14 | 1.26 | +6166% |
Linear / dim=100x1024x512 dim=512x1024 dim=1024 | 121.64 | 2.46 | +4854% |
Linear / dim=100x1x51200 dim=51200x1 dim=1 | 0.27 | 0.05 | +401% |
MatMul / dim=32x1x1000 dim=32x1000x128 | 0.16 | 0.05 | +251% |
MatMul / dim=1000x64x256 dim=256x32 | 2.53 | 0.07 | +3323% |
MatMul / dim=1000x64x1024 dim=1000x1024x32 | 8.35 | 0.73 | +1051% |
MatMul / dim=1000x1024x64 dim=1000x64x256 | 108.79 | 1.60 | +6689% |
MatMul / dim=64x1000000 dim=1000000x32 | 9.42 | 0.50 | +1791% |
MatMul / dim=1000000x64 dim=64x1024 | 395.58 | 5.23 | +7468% |
PReLU / dim=128x16x1024 dim=1 | 0.39 | 0.03 | +1103% |
PReLU / dim=64x128x1024 dim=1 | 4.18 | 0.05 | +7858% |
ReLU / dim=128x16x1024 | 0.63 | 0.03 | +2102% |
ReLU / dim=64x128x1024 | 4.32 | 0.04 | +10970% |
Scatter / dim=64x16 dim=10 | 0.01 | 0.02 | -54% |
Scatter / dim=64x16 dim=1000 | 0.03 | 0.02 | +30% |
Scatter / dim=64x16 dim=1000000 | 5.03 | 0.14 | +3531% |
Scatter / dim=1024x32 dim=10 | 0.02 | 0.02 | -36% |
Scatter / dim=1024x32 dim=1000 | 0.04 | 0.02 | +47% |
Scatter / dim=1024x32 dim=1000000 | 5.92 | 0.17 | +3423% |
ScatterSum / dim=64x16 dim=10 | 0.02 | 0.05 | -62% |
ScatterSum / dim=64x16 dim=1000 | 0.03 | 0.04 | -37% |
ScatterSum / dim=64x16 dim=1000000 | 6.75 | 0.11 | +5789% |
ScatterSum / dim=1024x32 dim=10 | 0.04 | 0.04 | 0% |
ScatterSum / dim=1024x32 dim=1000 | 0.06 | 0.05 | +16% |
ScatterSum / dim=1024x32 dim=1000000 | 16.28 | 0.09 | +17776% |
ScatterMax / dim=64x16 dim=10 | 0.02 | 0.04 | -53% |
ScatterMax / dim=64x16 dim=1000 | 0.02 | 0.04 | -48% |
ScatterMax / dim=64x16 dim=1000000 | 6.60 | 0.19 | +3439% |
ScatterMax / dim=1024x32 dim=10 | 0.04 | 0.04 | +1% |
ScatterMax / dim=1024x32 dim=1000 | 0.04 | 0.04 | -7% |
ScatterMax / dim=1024x32 dim=1000000 | 16.46 | 0.12 | +13167% |
SeLU / dim=128x16x1024 | 0.77 | 0.04 | +1917% |
SeLU / dim=64x128x1024 | 4.65 | 0.04 | +11906% |
Sigmoid / dim=128x16x1024 | 0.67 | 0.05 | +1367% |
Sigmoid / dim=64x128x1024 | 4.60 | 0.05 | +9782% |
Softmax / dim=64x1000000 axi=-1 | 37.95 | 1.16 | +3157% |
Softmax / dim=1000000x64 axi=-1 | 27.42 | 0.59 | +4530% |
Softmax / dim=64x16x32x1024 axi=-1 | 14.56 | 0.32 | +4490% |
Softmax / dim=128x16x32x1024 axi=-1 | 28.07 | 0.64 | +4269% |
Softmax / dim=1024x16x32x128 axi=-1 | 27.67 | 0.62 | +4343% |
Softmax / dim=1024x64x32x8 axi=-1 | 30.83 | 0.18 | +17281% |
Softplus / dim=128x16x1024 | 1.28 | 0.05 | +2674% |
Softplus / dim=64x128x1024 | 5.73 | 0.04 | +14673% |
Sort / dim=64x128x1024 axi=0 | 42.75 | 1.34 | +3095% |
Sort / dim=64x128x1024 axi=1 | 49.05 | 0.89 | +5388% |
Sort / dim=64x128x1024 axi=2 | 48.21 | 0.47 | +10100% |
Sum / dim=64x128x128x128 axi=0 | 15.71 | 0.62 | +2435% |
Sum / dim=64x128x128x128 axi=1 | 13.05 | 0.62 | +1997% |
Sum / dim=64x128x128x128 axi=2 | 10.13 | 0.63 | +1515% |
Sum / dim=64x128x128x128 axi=3 | 9.89 | 0.61 | +1515% |
SumAll / dim=64x128x128x128 | 9.43 | 0.61 | +1453% |
SumAll / dim=1000000 | 0.04 | 0.03 | +30% |
SumAll / dim=1000000x128 | 9.09 | 0.58 | +1460% |
SumAll / dim=128x1000000 | 9.22 | 0.59 | +1450% |
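
The speedup columns in the tables above appear consistent with the relative ratio `(baseline / target - 1) * 100`, where `target` is the backend named first in the column title (for example `cuda` in `cuda/cpu speedup`). The snippet below is a minimal sketch of that calculation, not the benchmark's actual code: the function names `speedup_percent` and `format_speedup` are hypothetical, and rounding to the nearest integer is an assumption. Because the tables list runtimes rounded to two decimals, recomputed percentages can differ slightly from the published ones.

```python
def speedup_percent(baseline_ms: float, target_ms: float) -> float:
    """Relative speedup of `target` over `baseline`, in percent.

    Positive means the target backend is faster than the baseline,
    negative means it is slower. Both runtimes are in milliseconds.
    """
    if target_ms <= 0:
        raise ValueError("target_ms must be positive")
    return (baseline_ms / target_ms - 1.0) * 100.0


def format_speedup(baseline_ms: float, target_ms: float) -> str:
    """Format the speedup with an explicit sign, e.g. '+7464%' or '-33%'.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return f"{speedup_percent(baseline_ms, target_ms):+.0f}%"


if __name__ == "__main__":
    # RTX4090 row: MatMul / dim=1000000x64 dim=64x1024 (cpu=395.58, cuda=5.23).
    # The table reports +7468%; the rounded inputs give a close value.
    print(format_speedup(395.58, 5.23))

    # RTX4090 row: Gather / dim=64x256 dim=10 (cpu=0.02, cuda=0.03).
    # A negative value means the GPU is slower, here likely due to launch overhead.
    print(format_speedup(0.02, 0.03))
```

Read this way, `+100%` means the first backend is twice as fast as the second, `0%` means equal runtimes, and `-50%` means it takes twice as long.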