Example results on AWS

I ran these on a p3.2xlarge AWS EC2 instance with the following specs:

  • 8 vCPUs
  • 61 GB memory
  • 1 NVIDIA Tesla V100 GPU

Software stack:

  • Ubuntu 22.04
  • Python 3.10.8
  • CUDA 11.4
  • Packages pulled from pip
  • Backend versions:

    aesara==2.8.9
    cupy==11.4.0
    jax==0.4.1
    numba==0.56.4
    numpy==1.23.5
    taichi==1.3.0
    torch==1.13.1
    tensorflow==2.11.0

Contents

  • Equation of state
  • Isoneutral mixing
  • Turbulent kinetic energy

Equation of state

An equation consisting of >100 terms with no data dependencies and only elementary math. This benchmark should represent a best-case scenario for vector instructions and GPU performance.
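To give a feel for the access pattern, here is a minimal NumPy sketch of this kind of kernel. The function name and coefficients are made up for illustration; this is not the actual >100-term formula used by the benchmark.

```python
import numpy as np

def eos_sketch(salinity, temperature, pressure):
    """Hypothetical stand-in for the benchmark kernel: a purely elementwise
    polynomial in three input fields with no data dependencies between
    array elements (the real equation of state has far more terms)."""
    t2 = temperature * temperature
    sqrt_s = np.sqrt(salinity)
    # A handful of made-up polynomial terms; every operation maps one input
    # element to one output element, so it vectorizes and offloads trivially.
    return (
        999.84
        + 6.79e-2 * temperature
        - 9.09e-3 * t2
        + 8.24e-1 * salinity
        - 5.72e-3 * salinity * temperature
        + 4.83e-4 * sqrt_s * t2
        + 4.50e-3 * pressure
    )

# Example call at the smallest benchmark size
s, t, p = (np.random.rand(4096) for _ in range(3))
density = eos_sketch(s, t, p)
```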

CPU

$ taskset -c 0 python run.py benchmarks/equation_of_state/

benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  pytorch       10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.011     6.560
       4,096  jax           10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.008     6.515
       4,096  numba         10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.012     3.760
       4,096  taichi        10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.008     3.007
       4,096  aesara        10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.012     2.535
       4,096  tensorflow    10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.012     2.123
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.009     1.000

      16,384  pytorch       10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.009     7.007
      16,384  jax           10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.009     6.092
      16,384  tensorflow    10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.013     4.393
      16,384  numba         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.009     3.988
      16,384  taichi         1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.009     3.359
      16,384  aesara        10,000     0.003     0.000     0.003     0.003     0.003     0.003     0.013     2.938
      16,384  numpy          1,000     0.008     0.001     0.007     0.007     0.008     0.008     0.021     1.000

      65,536  pytorch        1,000     0.005     0.000     0.004     0.005     0.005     0.005     0.012     6.202
      65,536  jax            1,000     0.005     0.000     0.004     0.005     0.005     0.005     0.012     6.190
      65,536  tensorflow     1,000     0.005     0.001     0.005     0.005     0.005     0.005     0.017     5.797
      65,536  numba          1,000     0.007     0.001     0.007     0.007     0.007     0.007     0.019     3.849
      65,536  taichi         1,000     0.009     0.001     0.009     0.009     0.009     0.009     0.024     3.243
      65,536  aesara         1,000     0.010     0.000     0.010     0.010     0.010     0.010     0.022     2.888
      65,536  numpy          1,000     0.028     0.003     0.027     0.027     0.028     0.028     0.080     1.000

     262,144  pytorch        1,000     0.013     0.002     0.012     0.013     0.013     0.013     0.036    13.655
     262,144  tensorflow     1,000     0.015     0.001     0.014     0.014     0.015     0.015     0.035    12.085
     262,144  jax            1,000     0.016     0.001     0.014     0.016     0.016     0.016     0.043    11.357
     262,144  numba            100     0.027     0.000     0.026     0.026     0.027     0.027     0.027     6.617
     262,144  taichi           100     0.031     0.004     0.030     0.030     0.030     0.030     0.068     5.753
     262,144  aesara           100     0.035     0.000     0.035     0.035     0.035     0.036     0.036     4.988
     262,144  numpy            100     0.176     0.005     0.165     0.174     0.176     0.180     0.196     1.000

   1,048,576  pytorch          100     0.056     0.000     0.055     0.056     0.056     0.056     0.060    12.895
   1,048,576  jax              100     0.068     0.005     0.065     0.065     0.065     0.071     0.084    10.578
   1,048,576  tensorflow       100     0.070     0.005     0.065     0.065     0.068     0.072     0.084    10.407
   1,048,576  numba            100     0.111     0.001     0.109     0.111     0.111     0.111     0.114     6.523
   1,048,576  taichi           100     0.132     0.000     0.131     0.131     0.131     0.132     0.133     5.500
   1,048,576  aesara           100     0.146     0.001     0.144     0.146     0.146     0.146     0.148     4.947
   1,048,576  numpy             10     0.723     0.011     0.714     0.719     0.720     0.723     0.754     1.000

   4,194,304  pytorch           10     0.231     0.000     0.231     0.231     0.231     0.231     0.231    17.234
   4,194,304  tensorflow        10     0.334     0.001     0.331     0.334     0.334     0.334     0.335    11.927
   4,194,304  jax               10     0.335     0.001     0.334     0.334     0.335     0.335     0.338    11.870
   4,194,304  numba             10     0.444     0.001     0.443     0.444     0.444     0.445     0.446     8.961
   4,194,304  taichi            10     0.529     0.002     0.528     0.528     0.529     0.529     0.534     7.525
   4,194,304  aesara            10     0.586     0.001     0.585     0.585     0.585     0.586     0.589     6.796
   4,194,304  numpy             10     3.981     0.032     3.951     3.957     3.972     3.997     4.058     1.000

(time in wall seconds, less is better; Δ is the speedup relative to the NumPy baseline)

$ taskset -c 0 python run.py benchmarks/equation_of_state/ -s 16777216

benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
  16,777,216  pytorch           10     0.976     0.004     0.971     0.972     0.976     0.978     0.985    16.375
  16,777,216  tensorflow        10     1.277     0.003     1.272     1.275     1.276     1.278     1.284    12.518
  16,777,216  jax               10     1.310     0.003     1.305     1.308     1.311     1.311     1.315    12.200
  16,777,216  numba             10     1.741     0.003     1.739     1.739     1.740     1.745     1.745     9.177
  16,777,216  taichi            10     1.985     0.002     1.982     1.983     1.986     1.988     1.988     8.048
  16,777,216  aesara            10     2.329     0.005     2.322     2.326     2.328     2.332     2.339     6.861
  16,777,216  numpy             10    15.980     0.071    15.908    15.930    15.951    16.024    16.122     1.000

GPU

$ for backend in cupy jax pytorch taichi tensorflow; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/equation_of_state/ --gpu -b $backend -b numpy; done

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  numpy         10,000     0.002     0.001     0.002     0.002     0.002     0.002     0.016     1.000
       4,096  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.018     0.208

      16,384  numpy          1,000     0.007     0.001     0.007     0.007     0.007     0.008     0.017     1.000
      16,384  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.018     0.901

      65,536  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.017     3.311
      65,536  numpy          1,000     0.028     0.002     0.026     0.026     0.027     0.027     0.040     1.000

     262,144  cupy           1,000     0.008     0.001     0.008     0.008     0.008     0.008     0.022    21.066
     262,144  numpy            100     0.176     0.012     0.111     0.176     0.176     0.179     0.185     1.000

   1,048,576  cupy             100     0.011     0.001     0.011     0.011     0.011     0.011     0.020    64.168
   1,048,576  numpy             10     0.716     0.005     0.710     0.713     0.716     0.718     0.724     1.000

   4,194,304  cupy             100     0.040     0.000     0.040     0.040     0.040     0.040     0.040    99.265
   4,194,304  numpy             10     3.956     0.041     3.924     3.928     3.932     3.961     4.045     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  jax           10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.012    12.803
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.014     1.000

      16,384  jax           10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.011    60.102
      16,384  numpy          1,000     0.008     0.001     0.007     0.007     0.007     0.008     0.017     1.000

      65,536  jax           10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.013   199.058
      65,536  numpy          1,000     0.029     0.006     0.026     0.027     0.027     0.028     0.052     1.000

     262,144  jax            1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.009   810.167
     262,144  numpy            100     0.179     0.010     0.110     0.175     0.178     0.180     0.205     1.000

   1,048,576  jax            1,000     0.000     0.001     0.000     0.000     0.000     0.000     0.010  1686.376
   1,048,576  numpy             10     0.728     0.007     0.713     0.725     0.729     0.732     0.736     1.000

   4,194,304  jax              100     0.001     0.000     0.001     0.001     0.001     0.001     0.002  3202.356
   4,194,304  numpy             10     3.999     0.063     3.893     3.945     4.017     4.030     4.110     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  pytorch       10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.010     8.881
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.009     1.000

      16,384  pytorch       10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.010    35.893
      16,384  numpy          1,000     0.008     0.001     0.007     0.007     0.008     0.008     0.016     1.000

      65,536  pytorch       10,000     0.000     0.000     0.000     0.000     0.000     0.000     0.010   147.702
      65,536  numpy            100     0.032     0.005     0.027     0.027     0.033     0.036     0.047     1.000

     262,144  pytorch        1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.010   732.045
     262,144  numpy            100     0.197     0.005     0.188     0.192     0.198     0.202     0.210     1.000

   1,048,576  pytorch        1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.008  1524.343
   1,048,576  numpy             10     0.739     0.009     0.728     0.733     0.737     0.739     0.757     1.000

   4,194,304  pytorch          100     0.002     0.000     0.002     0.002     0.002     0.002     0.002  2410.544
   4,194,304  numpy             10     4.029     0.026     4.000     4.015     4.019     4.035     4.092     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  taichi        10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.016    21.539
       4,096  numpy         10,000     0.002     0.001     0.002     0.002     0.002     0.002     0.018     1.000

      16,384  taichi        10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.016    74.277
      16,384  numpy          1,000     0.008     0.001     0.007     0.007     0.008     0.008     0.023     1.000

      65,536  taichi        10,000     0.000     0.001     0.000     0.000     0.000     0.000     0.015   308.928
      65,536  numpy          1,000     0.029     0.006     0.026     0.027     0.027     0.028     0.076     1.000

     262,144  taichi         1,000     0.000     0.001     0.000     0.000     0.000     0.000     0.012  1517.780
     262,144  numpy            100     0.178     0.007     0.174     0.175     0.176     0.178     0.202     1.000

   1,048,576  taichi         1,000     0.000     0.001     0.000     0.000     0.000     0.000     0.012  3093.831
   1,048,576  numpy             10     0.719     0.012     0.710     0.711     0.713     0.720     0.742     1.000

   4,194,304  taichi           100     0.001     0.000     0.001     0.001     0.001     0.001     0.001  5964.008
   4,194,304  numpy             10     3.934     0.011     3.917     3.924     3.934     3.943     3.949     1.000

(time in wall seconds, less is better)

benchmarks.equation_of_state
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  tensorflow    10,000     0.001     0.000     0.000     0.000     0.001     0.001     0.011     3.284
       4,096  numpy         10,000     0.002     0.000     0.002     0.002     0.002     0.002     0.012     1.000

      16,384  tensorflow    10,000     0.001     0.000     0.000     0.000     0.001     0.001     0.010    14.329
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.008     0.009     1.000

      65,536  tensorflow     1,000     0.001     0.000     0.000     0.000     0.001     0.001     0.009    50.879
      65,536  numpy            100     0.026     0.001     0.026     0.026     0.026     0.027     0.029     1.000

     262,144  tensorflow     1,000     0.001     0.000     0.000     0.000     0.001     0.001     0.001   233.719
     262,144  numpy            100     0.119     0.009     0.112     0.115     0.116     0.120     0.147     1.000

   1,048,576  tensorflow     1,000     0.001     0.000     0.001     0.001     0.001     0.001     0.001  1150.812
   1,048,576  numpy             10     0.674     0.010     0.667     0.667     0.669     0.673     0.699     1.000

   4,194,304  tensorflow       100     0.001     0.000     0.001     0.001     0.001     0.001     0.001  5013.081
   4,194,304  numpy             10     3.929     0.040     3.884     3.888     3.930     3.963     3.983     1.000

(time in wall seconds, less is better)

Isoneutral mixing

A more balanced routine with many data dependencies (stencil operations) and tensor shapes of up to 5 dimensions. This is the most expensive part of Veros, so in a way this is the benchmark that interests me the most.
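As an illustration of what "stencil operation" means here, the following is a minimal NumPy sketch of a neighbor-coupled update on a 3-D field; it is not taken from the benchmark itself.

```python
import numpy as np

def stencil_sketch(field, dx):
    """Hypothetical stencil: a centered finite difference along the first
    axis of a 3-D field. Every output element reads its neighbors, so the
    kernel is dominated by memory traffic and intermediate temporaries
    rather than by pure elementwise arithmetic."""
    out = np.zeros_like(field)
    # Interior points only; the boundary slices stay at zero for simplicity.
    out[1:-1, :, :] = (field[2:, :, :] - field[:-2, :, :]) / (2.0 * dx)
    return out

# Example call on a small 3-D field
field = np.random.rand(64, 64, 16)
gradient = stencil_sketch(field, dx=0.5)
```

Slice-based stencils like this allocate a temporary array for every operation in NumPy, which is presumably part of why the compiled backends (Numba, Taichi) pull ahead at the larger sizes below.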

CPU

$ taskset -c 0 python run.py benchmarks/isoneutral_mixing/

benchmarks.isoneutral_mixing
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.029     3.252
       4,096  taichi         1,000     0.001     0.001     0.001     0.001     0.001     0.001     0.027     3.213
       4,096  jax            1,000     0.001     0.001     0.001     0.001     0.001     0.001     0.026     3.197
       4,096  aesara         1,000     0.003     0.001     0.003     0.003     0.003     0.003     0.027     1.459
       4,096  numpy          1,000     0.004     0.002     0.004     0.004     0.004     0.004     0.032     1.000
       4,096  pytorch        1,000     0.004     0.001     0.004     0.004     0.004     0.005     0.029     0.990

      16,384  taichi         1,000     0.006     0.001     0.006     0.006     0.006     0.006     0.029     2.665
      16,384  jax            1,000     0.006     0.001     0.006     0.006     0.006     0.006     0.036     2.407
      16,384  numba          1,000     0.006     0.001     0.006     0.006     0.006     0.006     0.030     2.366
      16,384  aesara         1,000     0.010     0.001     0.010     0.010     0.010     0.011     0.038     1.492
      16,384  pytorch        1,000     0.011     0.001     0.011     0.011     0.011     0.011     0.035     1.394
      16,384  numpy          1,000     0.015     0.001     0.015     0.015     0.015     0.015     0.040     1.000

      65,536  taichi           100     0.025     0.000     0.024     0.025     0.025     0.025     0.025     2.388
      65,536  jax              100     0.027     0.000     0.027     0.027     0.027     0.028     0.028     2.146
      65,536  numba            100     0.028     0.000     0.028     0.028     0.028     0.028     0.028     2.081
      65,536  pytorch          100     0.039     0.000     0.038     0.038     0.038     0.039     0.039     1.527
      65,536  aesara           100     0.040     0.003     0.039     0.039     0.040     0.040     0.067     1.471
      65,536  numpy            100     0.059     0.000     0.058     0.059     0.059     0.059     0.060     1.000

     262,144  taichi           100     0.105     0.002     0.104     0.105     0.105     0.105     0.129     1.970
     262,144  jax              100     0.108     0.001     0.107     0.107     0.107     0.108     0.109     1.923
     262,144  numba            100     0.109     0.001     0.108     0.109     0.109     0.109     0.111     1.895
     262,144  pytorch          100     0.139     0.002     0.137     0.138     0.138     0.140     0.149     1.486
     262,144  aesara           100     0.147     0.001     0.145     0.146     0.147     0.147     0.153     1.407
     262,144  numpy             10     0.207     0.003     0.202     0.205     0.206     0.208     0.215     1.000

   1,048,576  taichi            10     0.411     0.000     0.411     0.411     0.411     0.411     0.413     2.325
   1,048,576  numba             10     0.468     0.001     0.468     0.468     0.468     0.468     0.470     2.041
   1,048,576  jax               10     0.623     0.003     0.621     0.621     0.622     0.624     0.630     1.534
   1,048,576  pytorch           10     0.698     0.007     0.690     0.695     0.696     0.699     0.712     1.370
   1,048,576  aesara            10     0.706     0.007     0.693     0.703     0.705     0.707     0.718     1.355
   1,048,576  numpy             10     0.956     0.010     0.950     0.951     0.952     0.957     0.984     1.000

   4,194,304  taichi            10     1.658     0.002     1.656     1.657     1.657     1.658     1.661     3.031
   4,194,304  numba             10     2.374     0.003     2.369     2.373     2.374     2.375     2.380     2.116
   4,194,304  jax               10     2.974     0.006     2.968     2.970     2.971     2.977     2.988     1.689
   4,194,304  aesara            10     3.662     0.013     3.650     3.653     3.656     3.663     3.694     1.372
   4,194,304  pytorch           10     4.187     0.103     3.974     4.133     4.193     4.227     4.363     1.200
   4,194,304  numpy             10     5.024     0.034     4.993     5.002     5.009     5.025     5.091     1.000

(time in wall seconds, less is better)

$ taskset -c 0 python run.py benchmarks/isoneutral_mixing/ -s 16777216

benchmarks.isoneutral_mixing
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
  16,777,216  numba             10     9.473     0.040     9.392     9.476     9.485     9.496     9.514     2.711
  16,777,216  taichi            10     9.845     0.038     9.818     9.822     9.826     9.840     9.923     2.609
  16,777,216  jax               10    12.243     0.044    12.155    12.235    12.247    12.281    12.292     2.098
  16,777,216  aesara            10    15.542     0.067    15.460    15.476    15.547    15.566    15.662     1.652
  16,777,216  numpy             10    25.681     0.157    25.448    25.566    25.692    25.776    25.957     1.000
  16,777,216  pytorch           10    28.955     0.054    28.869    28.937    28.948    28.958    29.098     0.887

(time in wall seconds, less is better)

GPU

$ for backend in cupy jax pytorch; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/isoneutral_mixing/ --gpu -b $backend -b numpy; done

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.027     1.000
       4,096  cupy           1,000     0.014     0.001     0.014     0.014     0.014     0.014     0.037     0.292

      16,384  cupy           1,000     0.014     0.001     0.014     0.014     0.014     0.014     0.037     1.054
      16,384  numpy          1,000     0.015     0.001     0.015     0.015     0.015     0.015     0.040     1.000

      65,536  cupy             100     0.014     0.002     0.014     0.014     0.014     0.014     0.037     3.836
      65,536  numpy            100     0.055     0.002     0.054     0.054     0.055     0.055     0.066     1.000

     262,144  cupy             100     0.015     0.001     0.014     0.014     0.014     0.014     0.023    16.322
     262,144  numpy             10     0.237     0.007     0.228     0.229     0.238     0.244     0.247     1.000

   1,048,576  cupy              10     0.015     0.001     0.014     0.015     0.015     0.015     0.018    72.092
   1,048,576  numpy             10     1.070     0.002     1.067     1.068     1.070     1.072     1.073     1.000

   4,194,304  cupy              10     0.051     0.000     0.050     0.050     0.051     0.051     0.051    98.410
   4,194,304  numpy             10     4.974     0.029     4.945     4.954     4.960     4.981     5.037     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  jax            1,000     0.000     0.002     0.000     0.000     0.000     0.000     0.023     8.668
       4,096  numpy          1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.027     1.000

      16,384  jax            1,000     0.000     0.001     0.000     0.000     0.000     0.000     0.023    33.054
      16,384  numpy          1,000     0.015     0.001     0.015     0.015     0.015     0.015     0.043     1.000

      65,536  jax              100     0.001     0.000     0.000     0.000     0.001     0.001     0.004   100.245
      65,536  numpy            100     0.055     0.001     0.054     0.054     0.055     0.055     0.062     1.000

     262,144  jax              100     0.002     0.001     0.002     0.002     0.002     0.002     0.011   118.359
     262,144  numpy            100     0.231     0.006     0.213     0.227     0.230     0.234     0.251     1.000

   1,048,576  jax               10     0.009     0.001     0.008     0.009     0.009     0.010     0.010   114.054
   1,048,576  numpy             10     1.062     0.011     1.051     1.056     1.058     1.067     1.086     1.000

   4,194,304  jax               10     0.025     0.000     0.024     0.025     0.025     0.025     0.025   199.321
   4,194,304  numpy             10     4.954     0.054     4.914     4.924     4.935     4.946     5.104     1.000

(time in wall seconds, less is better)

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.004     0.001     0.004     0.004     0.004     0.004     0.027     1.000
       4,096  pytorch        1,000     0.006     0.001     0.005     0.005     0.005     0.006     0.029     0.775

      16,384  pytorch        1,000     0.006     0.002     0.005     0.005     0.005     0.006     0.029     2.709
      16,384  numpy          1,000     0.015     0.000     0.015     0.015     0.015     0.015     0.024     1.000

      65,536  pytorch          100     0.006     0.000     0.005     0.006     0.006     0.006     0.006     9.853
      65,536  numpy            100     0.055     0.001     0.055     0.055     0.055     0.056     0.066     1.000

     262,144  pytorch          100     0.006     0.000     0.006     0.006     0.006     0.006     0.008    38.100
     262,144  numpy             10     0.227     0.009     0.202     0.226     0.230     0.233     0.236     1.000

   1,048,576  pytorch           10     0.008     0.000     0.008     0.008     0.008     0.008     0.008   134.397
   1,048,576  numpy             10     1.086     0.011     1.074     1.076     1.084     1.096     1.103     1.000

   4,194,304  pytorch           10     0.022     0.000     0.022     0.022     0.022     0.022     0.023   223.333
   4,194,304  numpy             10     5.021     0.027     4.988     5.000     5.016     5.044     5.068     1.000

(time in wall seconds, less is better)

$ CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/isoneutral_mixing/ --gpu -b taichi -b numpy -s 1_048_576

benchmarks.isoneutral_mixing
============================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
   1,048,576  taichi            10     0.101     0.004     0.096     0.096     0.103     0.104     0.104    10.831
   1,048,576  numpy             10     1.089     0.010     1.073     1.085     1.089     1.092     1.110     1.000

(time in wall seconds, less is better)

Turbulent kinetic energy

This routine consists of some stencil operations and some linear algebra (a tridiagonal matrix solver, which cannot be vectorized along the solve dimension).
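To illustrate why the solver part resists vectorization, here is a minimal sketch of the Thomas algorithm for a single tridiagonal system; it is not the benchmark's actual implementation.

```python
import numpy as np

def thomas_sketch(a, b, c, d):
    """Hypothetical single-system Thomas algorithm (a: sub-diagonal,
    b: diagonal, c: super-diagonal, d: right-hand side). Step i of the
    forward sweep needs the result of step i - 1, and likewise for the
    back substitution, so neither loop can be expressed as a single
    vectorized NumPy operation along the solve dimension."""
    n = len(d)
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                    # forward elimination (sequential)
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back substitution (sequential)
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example call on a small diagonally dominant system
n = 8
a = np.full(n, -1.0); a[0] = 0.0
b = np.full(n, 4.0)
c = np.full(n, -1.0); c[-1] = 0.0
d = np.ones(n)
x = thomas_sketch(a, b, c, d)
```

Many independent systems of this kind can still be solved in parallel (batched over the other dimensions), even though each individual solve steps through its dimension sequentially.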

CPU

$ taskset -c 0 python run.py benchmarks/turbulent_kinetic_energy/

benchmarks.turbulent_kinetic_energy
===================================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  jax            1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.010     6.161
       4,096  numba          1,000     0.001     0.000     0.001     0.001     0.001     0.001     0.010     2.180
       4,096  pytorch        1,000     0.002     0.001     0.002     0.002     0.002     0.002     0.016     1.062
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.003     1.000

      16,384  jax            1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.011     3.961
      16,384  numba          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.013     2.000
      16,384  pytorch        1,000     0.005     0.001     0.004     0.004     0.004     0.005     0.017     1.635
      16,384  numpy          1,000     0.007     0.001     0.007     0.007     0.007     0.008     0.017     1.000

      65,536  jax              100     0.008     0.001     0.008     0.008     0.008     0.008     0.018     3.114
      65,536  numba            100     0.012     0.000     0.012     0.012     0.012     0.012     0.013     2.091
      65,536  pytorch          100     0.015     0.000     0.015     0.015     0.015     0.016     0.016     1.661
      65,536  numpy            100     0.026     0.000     0.024     0.025     0.026     0.026     0.027     1.000

     262,144  jax              100     0.030     0.000     0.030     0.030     0.030     0.031     0.032     2.976
     262,144  numba            100     0.040     0.000     0.040     0.040     0.040     0.040     0.042     2.258
     262,144  pytorch          100     0.051     0.001     0.050     0.051     0.051     0.051     0.054     1.777
     262,144  numpy            100     0.091     0.001     0.089     0.090     0.090     0.091     0.097     1.000

   1,048,576  numba             10     0.163     0.002     0.160     0.161     0.163     0.164     0.165     2.694
   1,048,576  jax               10     0.167     0.004     0.158     0.166     0.167     0.169     0.172     2.625
   1,048,576  pytorch           10     0.254     0.004     0.250     0.252     0.253     0.254     0.262     1.723
   1,048,576  numpy             10     0.438     0.004     0.435     0.435     0.436     0.440     0.446     1.000

   4,194,304  numba             10     0.827     0.014     0.800     0.821     0.826     0.836     0.849     2.578
   4,194,304  jax               10     1.073     0.008     1.063     1.068     1.072     1.078     1.088     1.985
   4,194,304  pytorch           10     1.691     0.054     1.630     1.643     1.689     1.709     1.810     1.260
   4,194,304  numpy             10     2.131     0.018     2.119     2.120     2.124     2.135     2.181     1.000

(time in wall seconds, less is better)

$ taskset -c 0 python run.py benchmarks/turbulent_kinetic_energy/ -s 16777216

benchmarks.turbulent_kinetic_energy
===================================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
  16,777,216  numba             10     3.708     0.015     3.686     3.693     3.715     3.721     3.728     3.117
  16,777,216  jax               10     4.734     0.008     4.724     4.728     4.732     4.740     4.749     2.441
  16,777,216  pytorch           10     9.745     0.040     9.696     9.707     9.735     9.790     9.801     1.186
  16,777,216  numpy             10    11.558     0.041    11.515    11.532    11.549    11.568    11.667     1.000

(time in wall seconds, less is better)

GPU

$ for backend in jax pytorch; do CUDA_VISIBLE_DEVICES="0" python run.py benchmarks/turbulent_kinetic_energy/ --gpu -b $backend -b numpy; done

benchmarks.turbulent_kinetic_energy
===================================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  jax            1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.001     6.956
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.003     1.000

      16,384  jax            1,000     0.000     0.000     0.000     0.000     0.000     0.000     0.001    18.489
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.008     0.008     1.000

      65,536  jax              100     0.001     0.000     0.000     0.001     0.001     0.001     0.001    44.388
      65,536  numpy            100     0.026     0.000     0.025     0.025     0.026     0.026     0.027     1.000

     262,144  jax              100     0.001     0.000     0.001     0.001     0.001     0.001     0.002    64.117
     262,144  numpy            100     0.091     0.002     0.089     0.089     0.090     0.092     0.095     1.000

   1,048,576  jax               10     0.005     0.000     0.005     0.005     0.005     0.005     0.005    93.975
   1,048,576  numpy             10     0.493     0.007     0.488     0.489     0.489     0.500     0.506     1.000

   4,194,304  jax               10     0.020     0.000     0.019     0.020     0.020     0.020     0.020   109.825
   4,194,304  numpy             10     2.159     0.036     2.115     2.128     2.154     2.179     2.230     1.000

(time in wall seconds, less is better)

benchmarks.turbulent_kinetic_energy
===================================
Running on GPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ       
------------------------------------------------------------------------------------------------------------------
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.006     1.000
       4,096  pytorch        1,000     0.005     0.001     0.005     0.005     0.005     0.005     0.010     0.498

      16,384  pytorch        1,000     0.005     0.001     0.005     0.005     0.005     0.005     0.009     1.432
      16,384  numpy          1,000     0.008     0.001     0.007     0.008     0.008     0.008     0.011     1.000

      65,536  pytorch          100     0.006     0.000     0.006     0.006     0.006     0.006     0.009     4.611
      65,536  numpy            100     0.028     0.003     0.025     0.026     0.026     0.032     0.033     1.000

     262,144  pytorch          100     0.007     0.001     0.007     0.007     0.007     0.007     0.010    16.117
     262,144  numpy            100     0.117     0.003     0.100     0.117     0.117     0.118     0.123     1.000

   1,048,576  pytorch           10     0.009     0.000     0.009     0.009     0.009     0.009     0.009    55.791
   1,048,576  numpy             10     0.516     0.010     0.507     0.509     0.512     0.519     0.541     1.000

   4,194,304  pytorch           10     0.023     0.001     0.023     0.023     0.023     0.023     0.025    94.396
   4,194,304  numpy             10     2.174     0.010     2.150     2.171     2.173     2.178     2.189     1.000

(time in wall seconds, less is better)