Commit

updated documentations
bab2min committed Feb 2, 2023
1 parent 743a303 commit d289a61
Showing 20 changed files with 52 additions and 378 deletions.
163 changes: 32 additions & 131 deletions EigenRand/doc.h
@@ -12,7 +12,7 @@
You can get 5~10 times speed by just replacing old Eigen's Random
or unvectorizable c++11 random number generators with EigenRand.
-EigenRand currently supports only x86-64 architecture (SSE, AVX, AVX2) and ARM64 NEON (experimental).
+EigenRand currently supports only x86-64 architecture (SSE, AVX, AVX2) and ARM64 NEON.
EigenRand is distributed under the MIT License.
@@ -264,136 +264,37 @@
*
* @page performance Performance
-* The following charts show the relative speed-up of EigenRand compared to Reference(C++ std or Eigen functions). Detailed results are below the charts.
-@section performance_1 Overview of Results at x86-64 Architecture
-\image html perf_no_vect.png
-\image html perf_sse2.png
-\image html perf_avx.png
-\image html perf_avx2.png
-\image html perf_mv_part1.png
-\image html perf_mv_part2.png
-* The following result is a measure of the time in seconds it takes to generate 1M random numbers. It shows the average of 20 times.
-@section performance_2 Overview of Results at ARM64 NEON (experimental)
-\image html perf_neon_v0.3.90.png
-\image html perf_mv_part1_neon_v0.3.90.png
-\image html perf_mv_part2_neon_v0.3.90.png
-* The following result is a measure of the time in seconds it takes to generate 1M random numbers. It shows the average of 20 times.
-@section performance_3 Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz (Ubuntu 16.04, gcc7.5)
-| | C++ std (or Eigen) | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
-|---|---:|---:|---:|---:|---:|---:|
-| `balanced`* | 9.0 | 5.9 | 1.5 | 1.4 | 1.3 | 0.9 |
-| `balanced`(double)* | 8.7 | 6.4 | 3.3 | 2.9 | 1.7 | 1.7 |
-| `binomial(20, 0.5)` | 400.8 | 118.5 | 32.7 | 36.6 | 30.0 | 22.7 |
-| `binomial(50, 0.01)` | 71.7 | 22.5 | 7.7 | 8.3 | 7.9 | 6.6 |
-| `binomial(100, 0.75)` | 340.5 | 454.5 | 91.7 | 111.5 | 106.3 | 86.4 |
-| `cauchy` | 36.1 | 54.4 | 6.1 | 7.1 | 4.7 | 3.9 |
-| `chiSquared` | 80.5 | 249.5 | 64.6 | 58.0 | 29.4 | 28.8 |
-| `discrete`(int32) | - | 14.0 | 2.9 | 2.6 | 2.4 | 1.7 |
-| `discrete`(fp32) | - | 21.9 | 4.3 | 4.0 | 3.6 | 3.0 |
-| `discrete`(fp64) | 72.4 | 21.4 | 6.9 | 6.5 | 4.9 | 3.7 |
-| `exponential` | 31.0 | 25.3 | 5.5 | 5.3 | 3.3 | 2.9 |
-| `extremeValue` | 66.0 | 60.1 | 11.9 | 10.7 | 6.5 | 5.8 |
-| `fisherF(1, 1)` | 178.1 | 35.1 | 33.2 | 39.3 | 22.9 | 18.7 |
-| `fisherF(5, 5)` | 141.8 | 415.2 | 136.47 | 172.4 | 92.4 | 74.9 |
-| `gamma(0.2, 1)` | 207.8 | 211.4 | 54.6 | 51.2 | 26.9 | 27.0 |
-| `gamma(5, 3)` | 80.9 | 60.0 | 14.3 | 13.3 | 11.4 | 8.0 |
-| `gamma(10.5, 1)` | 81.1 | 248.6 | 63.3 | 58.5 | 29.2 | 28.4 |
-| `geometric` | 43.0 | 22.4 | 6.7 | 7.4 | 5.8 | |
-| `lognormal` | 66.3 | 55.4 | 12.8 | 11.8 | 6.2 | 6.2 |
-| `negativeBinomial(10, 0.5)` | 312.0 | 301.4 | 82.9 | 100.6 | 95.3 | 77.9 |
-| `negativeBinomial(20, 0.25)` | 483.4 | 575.9 | 125.0 | 158.2 | 148.4 | 119.5 |
-| `normal(0, 1)` | 38.1 | 28.5 | 6.8 | 6.2 | 3.8 | 3.7 |
-| `normal(2, 3)` | 37.6 | 29.0 | 7.3 | 6.6 | 4.0 | 3.9 |
-| `poisson(1)` | 31.8 | 25.2 | 9.8 | 10.8 | 9.7 | 8.2 |
-| `poisson(16)` | 231.8 | 274.1 | 66.2 | 80.7 | 74.4 | 64.2 |
-| `randBits` | 5.2 | 5.4 | 1.4 | 1.3 | 1.1 | 1.0 |
-| `studentT(1)` | 122.7 | 120.1 | 15.3 | 19.2 | 12.6 | 9.4 |
-| `studentT(20)` | 102.2 | 111.1 | 15.4 | 19.2 | 12.2 | 9.4 |
-| `uniformInt(0~63)` | 22.4 | 4.7 | 1.7 | 1.6 | 1.4 | 1.1 |
-| `uniformInt(0~100k)` | 21.8 | 10.1 | 6.2 | 6.7 | 6.6 | 5.4 |
-| `uniformReal` | 12.9 | 5.7 | 1.4 | 1.2 | 1.4 | 0.7 |
-| `weibull` | 41.0 | 35.8 | 17.7 | 15.5 | 8.5 | 8.5 |
-* Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
-| | C++ std | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
-|---|---:|---:|---:|---:|---:|---:|
-| Mersenne Twister(int32) | 4.7 | 5.6 | 4.0 | 3.7 | 3.5 | 3.6 |
-| Mersenne Twister(int64) | 5.4 | 5.3 | 4.0 | 3.9 | 3.4 | 2.6 |
-| | Python 3.6 + scipy 1.5.2 + numpy 1.19.2 | EigenRand (No Vect.) | EigenRand (SSE2) | EigenRand (SSSE3) | EigenRand (AVX) | EigenRand (AVX2) |
-|---|---:|---:|---:|---:|---:|---:|
-| `Dirichlet(4)` | 6.47 | 6.60 | 2.39 | 2.49 | 1.34 | 1.67 |
-| `Dirichlet(100)` | 75.95 | 189.97 | 66.60 | 72.11 | 38.86 | 34.98 |
-| `InvWishart(4)` | 140.18 | 7.62 | 4.21 | 4.54 | 3.58 | 3.39 |
-| `InvWishart(50)` | 1510.47 | 1737.4 | 697.39 | 733.69 | 604.59 | 554.006 |
-| `Multinomial(4, t=20)` | 3.32 | 4.12 | 0.95 | 1.06 | 1.00 | 1.03 |
-| `Multinomial(4, t=1000)` | 3.51 | 192.51 | 35.99 | 39.58 | 27.84 | 35.45 |
-| `Multinomial(100, t=20)` | 69.19 | 4.80 | 2.00 | 2.20 | 2.28 | 2.09 |
-| `Multinomial(100, t=1000)` | 139.74 | 179.43 | 49.48 | 56.19 | 40.78 | 43.18 |
-| `MvNormal(4)` | 2.32 | 0.96 | 0.36 | 0.37 | 0.25 | 0.30 |
-| `MvNormal(100)` | 49.09 | 57.18 | 17.17 | 18.51 | 10.82 | 11.03 |
-| `Wishart(4)` | 71.19 | 5.28 | 2.70 | 2.93 | 2.04 | 1.94 |
-| `Wishart(50)` | 1185.26 | 1360.49 | 492.91 | 517.44 | 359.03 | 324.60 |
-@section performance_4 AMD Ryzen 7 3700x CPU @ 3.60GHz (Windows 10, MSVC2017)
-| | C++ std (or Eigen) | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
-|---|---:|---:|---:|---:|
-| `balanced`* | 20.8 | 1.9 | 2.0 | 1.4 |
-| `balanced`(double)* | 21.7 | 4.1 | 2.7 | 3.0 |
-| `binomial(20, 0.5)` | 416.0 | 27.7 | 28.9 | 29.1 |
-| `binomial(50, 0.01)` | 37.8 | 6.3 | 6.0 | 6.6 |
-| `binomial(100, 0.75)` | 309.1 | 72.4 | 66.0 | 67.0 |
-| `cauchy` | 42.2 | 4.8 | 5.1 | 2.7 |
-| `chiSquared` | 153.8 | 33.5 | 21.2 | 17.0 |
-| `discrete`(int32) | - | 2.4 | 2.3 | 2.5 |
-| `discrete`(fp32) | - | 2.6 | 2.3 | 3.5 |
-| `discrete`(fp64) | 55.8 | 5.1 | 4.7 | 4.3 |
-| `exponential` | 33.4 | 6.4 | 2.8 | 2.2 |
-| `extremeValue` | 39.4 | 7.8 | 4.6 | 4.0 |
-| `fisherF(1, 1)` | 103.9 | 25.3 | 14.9 | 11.7 |
-| `fisherF(5, 5)` | 295.7 | 85.5 | 58.3 | 44.8 |
-| `gamma(0.2, 1)` | 128.8 | 31.9 | 18.3 | 15.8 |
-| `gamma(5, 3)` | 156.1 | 9.7 | 8.0 | 5.0 |
-| `gamma(10.5, 1)` | 148.5 | 33.1 | 21.1 | 17.2 |
-| `geometric` | 27.1 | 6.6 | 4.3 | 4.1 |
-| `lognormal` | 104.0 | 6.6 | 4.7 | 3.5 |
-| `negativeBinomial(10, 0.5)` | 462.1 | 60.0 | 56.4 | 58.6 |
-| `negativeBinomial(20, 0.25)` | 357.6 | 84.5 | 80.6 | 78.4 |
-| `normal(0, 1)` | 48.8 | 4.2 | 3.7 | 2.3 |
-| `normal(2, 3)` | 48.8 | 4.5 | 3.8 | 2.4 |
-| `poisson(1)` | 46.4 | 7.9 | 7.4 | 8.2 |
-| `poisson(16)` | 192.4 | 43.2 | 40.4 | 40.9 |
-| `randBits` | 4.2 | 1.7 | 1.5 | 1.8 |
-| `studentT(1)` | 107.0 | 12.3 | 6.8 | 5.7 |
-| `studentT(20)` | 107.1 | 12.3 | 6.8 | 5.8 |
-| `uniformInt(0~63)` | 31.2 | 1.1 | 1.0 | 1.2 |
-| `uniformInt(0~100k)` | 27.7 | 5.6 | 5.6 | 5.4 |
-| `uniformReal` | 30.7 | 1.1 | 1.0 | 0.6 |
-| `weibull` | 46.5 | 10.6 | 6.4 | 5.2 |
-* Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
-| | C++ std | EigenRand (SSE2) | EigenRand (AVX) | EigenRand (AVX2) |
-|---|---:|---:|---:|---:|
-| Mersenne Twister(int32) | 5.0 | 3.4 | 3.4 | 3.3 |
-| Mersenne Twister(int64) | 5.1 | 3.9 | 3.9 | 3.3 |
+* The following charts show the relative speed-up of EigenRand compared to references(equivalent functions of C++ std or Eigen for univariate distributions and Scipy for multivariate distributions).
+Since there is no equivalent class to `balanced` in C++11 std, we used Eigen::DenseBase::Random instead.
+Cases filled with orange are generators that are slower than reference functions.
+@section performance_1 Windows 2019, MSVC 19.29.30147, Intel(R) Xeon(R) Platinum 8171M CPU, AVX2, Eigen 3.4.0
+\image html perf_avx2_win.png width=80%
+\image html perf_avx2_win_mv1.png width=80%
+\image html perf_avx2_win_mv2.png width=80%
+@section performance_2 Ubuntu 18.04, gcc 7.5.0, Intel(R) Xeon(R) Platinum 8370C CPU, AVX2, Eigen 3.4.0
+\image html perf_avx2_ubu.png width=80%
+\image html perf_avx2_ubu_mv1.png width=80%
+\image html perf_avx2_ubu_mv2.png width=80%
+@section performance_3 macOS Monterey 12.2.1, clang 13.1.6, Apple M1 Pro, NEON, Eigen 3.4.0
+\image html perf_neon_mac.png width=80%
+\image html perf_neon_mac_mv1.png width=80%
+\image html perf_neon_mac_mv2.png width=80%
+You can see the detailed numerical values used to plot the above charts on the <a href="https://github.com/bab2min/EigenRand/actions/workflows/release.yml" target="_blank">Action Results of GitHub repository</a>.
*
*/