Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp implementation of parEvalMap, fixing numerous issues and increasing performance #2673

Merged
merged 8 commits into from
Dec 3, 2021

Conversation

nikiforo
Copy link
Contributor

@nikiforo nikiforo commented Oct 9, 2021

Resolves issues #2297, #2686, #2195, #2726

@nikiforo
Copy link
Contributor Author

roughly shows 5x performance boost in the benchmark

benchmark/ jmh:run -i 5 -wi 3 -f1 -t1 fs2.benchmark.ParEvalMapBenchmark

original

[info] ParEvalMapBenchmark.evalMap                10     100  thrpt    5  16597,710 ± 5339,488  ops/s
[info] ParEvalMapBenchmark.evalMap                10   10000  thrpt    5    263,320 ±   11,201  ops/s
[info] ParEvalMapBenchmark.evalMap               100     100  thrpt    5  19980,999 ±  458,301  ops/s
[info] ParEvalMapBenchmark.evalMap               100   10000  thrpt    5    319,355 ±   11,039  ops/s
[info] ParEvalMapBenchmark.parEvalMap10           10     100  thrpt    5    127,434 ±   83,615  ops/s
[info] ParEvalMapBenchmark.parEvalMap10           10   10000  thrpt    5      1,618 ±    0,104  ops/s
[info] ParEvalMapBenchmark.parEvalMap10          100     100  thrpt    5    146,220 ±    9,165  ops/s
[info] ParEvalMapBenchmark.parEvalMap10          100   10000  thrpt    5      1,661 ±    0,166  ops/s

current

[info] Benchmark                         (chunkSize)  (size)   Mode  Cnt      Score       Error  Units
[info] ParEvalMapBenchmark.evalMap                10     100  thrpt    5  18323,931 ±   322,369  ops/s
[info] ParEvalMapBenchmark.evalMap                10   10000  thrpt    5    262,852 ±    65,935  ops/s
[info] ParEvalMapBenchmark.evalMap               100     100  thrpt    5  15407,024 ± 18788,319  ops/s
[info] ParEvalMapBenchmark.evalMap               100   10000  thrpt    5    252,926 ±     5,626  ops/s
[info] ParEvalMapBenchmark.parEvalMap10           10     100  thrpt    5    680,162 ±    35,416  ops/s
[info] ParEvalMapBenchmark.parEvalMap10           10   10000  thrpt    5      8,157 ±     0,429  ops/s
[info] ParEvalMapBenchmark.parEvalMap10          100     100  thrpt    5    717,905 ±    40,958  ops/s
[info] ParEvalMapBenchmark.parEvalMap10          100   10000  thrpt    5      7,857 ±     0,469  ops/s

@nikiforo
Copy link
Contributor Author

Seems, that the last commit fixes #2686

These are the logs:

going to sleep for 10 seconds
going to sleep for 3 seconds
woken up after 3 seconds
woken up after 10 seconds
going to sleep for 2 seconds
after mapAsync/parEvalMap 10
after mapAsync/parEvalMap 3
going to sleep for 1 second
woken up after 1 second
woken up after 2 seconds
after mapAsync/parEvalMap 2
going to sleep for 1 second
after mapAsync/parEvalMap 1
woken up after 1 second
after mapAsync/parEvalMap 1

@nikiforo nikiforo marked this pull request as ready for review October 25, 2021 15:56
@diesalbla
Copy link
Contributor

diesalbla commented Nov 3, 2021

So, I have rerun the benchmarks. I used the command

benchmark/ jmh:run -i 10 -wi 5 -f1 -t4 -gc true -jvmArgs "-XX:MaxRecursiveInlineLevel=3 -XX:MaxInlineSize=50" fs2.benchmark.ParEvalMapBenchmark

Since in this case we are concerned about parEvalMap, I saw it better to run the benchmarks with more threads. Also, since the measures by @nikiforo had a bit too much error, I added more measuring and warmup iterations.

Current Version (main):

Benchmark              (chunkSize)  (size)   Mode  Cnt      Score      Error  Units
evalMap                         10     100  thrpt   10  47488.576 ± 3183.148  ops/s
evalMap                         10   10000  thrpt   10    801.994 ±   80.927  ops/s
evalMap                        100     100  thrpt   10  50132.632 ± 7662.933  ops/s
evalMap                        100   10000  thrpt   10    882.255 ±   44.519  ops/s
parEvalMapUnordered10           10     100  thrpt   10   1595.386 ±  172.605  ops/s
parEvalMapUnordered10           10   10000  thrpt   10     18.865 ±    1.011  ops/s
parEvalMapUnordered10          100     100  thrpt   10   1798.733 ±   65.134  ops/s
parEvalMapUnordered10          100   10000  thrpt   10     19.772 ±    0.609  ops/s

New version (branch)

Benchmark              (chunkSize)  (size)   Mode  Cnt      Score      Error  Units
evalMap                         10     100  thrpt   10  52744.008 ± 4044.386  ops/s
evalMap                         10   10000  thrpt   10    844.634 ±   37.255  ops/s
evalMap                        100     100  thrpt   10  55705.254 ± 1655.890  ops/s
evalMap                        100   10000  thrpt   10    998.729 ±   36.727  ops/s

parEvalMapUnordered10           10     100  thrpt   10   1435.734 ±  209.149  ops/s
parEvalMapUnordered10           10   10000  thrpt   10     16.003 ±    2.574  ops/s
parEvalMapUnordered10          100     100  thrpt   10   1303.686 ±   56.432  ops/s
parEvalMapUnordered10          100   10000  thrpt   10     12.682 ±    2.126  ops/s

These results seem to indicate that the new implementation is faster with the evalMap function, but slower with the parEvalMapUnordered10.

@nikiforo
Copy link
Contributor Author

nikiforo commented Nov 3, 2021

Great, many thanks!
However, I don’t see the results of the optimized function in your benchmark output.
Benchmark function evalMap was added as a reference point. In other words, it can be used to adjust different benchmark runs by evalMap results. The PR targets parEvalMap, which performance is measured by parEvalMap10 benchmark function.
I’ve also rerun the benchmark that compares new parEvalMap with the main implementation using your command.

benchmark/ jmh:run -i 10 -wi 5 -f1 -t4 -gc true -jvmArgs "-XX:MaxRecursiveInlineLevel=3 -XX:MaxInlineSize=50" fs2.benchmark.ParEvalMapBenchmark

Current Version (main e7bcbb1):

[info] ParEvalMapBenchmark.evalMap                         10     100  thrpt   10  44379,438 ± 4398,278  ops/s
[info] ParEvalMapBenchmark.evalMap                         10   10000  thrpt   10    742,395 ±   27,320  ops/s
[info] ParEvalMapBenchmark.evalMap                        100     100  thrpt   10  50553,217 ± 3633,609  ops/s
[info] ParEvalMapBenchmark.evalMap                        100   10000  thrpt   10    962,177 ±   26,027  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10     100  thrpt   10    212,248 ±   12,463  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10   10000  thrpt   10      2,127 ±    0,137  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100     100  thrpt   10    219,562 ±    2,204  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100   10000  thrpt   10      2,203 ±    0,043  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10     100  thrpt   10   1781,774 ±   22,137  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10   10000  thrpt   10     17,999 ±    1,234  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100     100  thrpt   10   1794,607 ±   16,252  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100   10000  thrpt   10     19,297 ±    0,754  ops/s

New version(branch):

[info] ParEvalMapBenchmark.evalMap                         10     100  thrpt   10  47426,136 ± 2354,241  ops/s
[info] ParEvalMapBenchmark.evalMap                         10   10000  thrpt   10    735,417 ±   28,730  ops/s
[info] ParEvalMapBenchmark.evalMap                        100     100  thrpt   10  49624,932 ± 2298,322  ops/s
[info] ParEvalMapBenchmark.evalMap                        100   10000  thrpt   10    939,003 ±   52,114  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10     100  thrpt   10   1532,008 ±  102,198  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10   10000  thrpt   10     17,050 ±    0,490  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100     100  thrpt   10   1649,628 ±   53,896  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100   10000  thrpt   10     16,495 ±    3,106  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10     100  thrpt   10   1353,258 ±   78,360  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10   10000  thrpt   10     16,804 ±    1,217  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100     100  thrpt   10   1673,306 ±   18,525  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100   10000  thrpt   10     15,995 ±    0,490  ops/s

The performance improvement of parEvalMap using your command is even more pronounced(7x). However, as you've already mentioned we loose some performance in parEvalMapUnordered.

@diesalbla diesalbla requested review from SystemFw, mpilquist and diesalbla and removed request for SystemFw and mpilquist November 4, 2021 13:04
@nikiforo
Copy link
Contributor Author

nikiforo commented Nov 11, 2021

@nikiforo
Copy link
Contributor Author

nikiforo commented Nov 16, 2021

Resolves issues #2297, #2686, #2195

benchmark/ jmh:run -i 10 -wi 5 -f1 -t4 -gc true -jvmArgs "-XX:MaxRecursiveInlineLevel=3 -XX:MaxInlineSize=50" fs2.benchmark.ParEvalMapBenchmark
[info] ParEvalMapBenchmark.evalMap                         10     100  thrpt   10  44379,438 ± 4398,278  ops/s
[info] ParEvalMapBenchmark.evalMap                         10   10000  thrpt   10    742,395 ±   27,320  ops/s
[info] ParEvalMapBenchmark.evalMap                        100     100  thrpt   10  50553,217 ± 3633,609  ops/s
[info] ParEvalMapBenchmark.evalMap                        100   10000  thrpt   10    962,177 ±   26,027  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10     100  thrpt   10    212,248 ±   12,463  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10   10000  thrpt   10      2,127 ±    0,137  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100     100  thrpt   10    219,562 ±    2,204  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100   10000  thrpt   10      2,203 ±    0,043  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10     100  thrpt   10   1781,774 ±   22,137  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10   10000  thrpt   10     17,999 ±    1,234  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100     100  thrpt   10   1794,607 ±   16,252  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100   10000  thrpt   10     19,297 ±    0,754  ops/s
[info] Benchmark                                  (chunkSize)  (size)   Mode  Cnt      Score      Error  Units
[info] ParEvalMapBenchmark.evalMap                         10     100  thrpt   10  49705,203 ±  852,694  ops/s
[info] ParEvalMapBenchmark.evalMap                         10   10000  thrpt   10    810,757 ±   44,645  ops/s
[info] ParEvalMapBenchmark.evalMap                        100     100  thrpt   10  56156,468 ± 1798,545  ops/s
[info] ParEvalMapBenchmark.evalMap                        100   10000  thrpt   10    968,758 ±    8,043  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10     100  thrpt   10   1645,180 ±  158,254  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                    10   10000  thrpt   10     16,816 ±    1,512  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100     100  thrpt   10   1663,895 ±  111,153  ops/s
[info] ParEvalMapBenchmark.parEvalMap10                   100   10000  thrpt   10     18,461 ±    0,249  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10     100  thrpt   10   1404,604 ±   17,503  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10           10   10000  thrpt   10     16,798 ±    0,279  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100     100  thrpt   10   1589,831 ±   20,365  ops/s
[info] ParEvalMapBenchmark.parEvalMapUnordered10          100   10000  thrpt   10     16,640 ±    0,949  ops/s

@nikiforo nikiforo marked this pull request as ready for review November 16, 2021 13:31
@nikiforo nikiforo marked this pull request as draft November 16, 2021 20:50
@nikiforo nikiforo marked this pull request as ready for review November 16, 2021 21:13
@nikiforo
Copy link
Contributor Author

#2726 is checked by https://github.com/typelevel/fs2/pull/2673/files#diff-dc1dd19e9cd3fc7eb816b45bf31ef275ff79ea85a4e1473ae58349eb78259434R1214

@nikiforo nikiforo force-pushed the par-eval-v3 branch 2 times, most recently from fc832a9 to 8a3bc7b Compare November 30, 2021 22:14
@mpilquist
Copy link
Member

Test coverage looks great. Thoughts on moving all these tests to a new ParEvalMapSuite.scala?

@mpilquist mpilquist merged commit b8d2c35 into typelevel:main Dec 3, 2021
@mpilquist mpilquist changed the title parEvalMap Revamp implementation of parEvalMap, fixing numerous issues and increasing performance Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants