Reimplemented Channel in terms of Queue #2856

Closed

djspiewak
Member

@djspiewak djspiewak commented Mar 26, 2022

Do not merge. Depends on a snapshot of Cats Effect 3.4 (well, technically could be implemented without the snapshot, but it's better to wait for tryTakeN). The snapshot in question includes the (still unmerged) improvements to Queue on any Async[F] (typelevel/cats-effect#2885 and typelevel/cats-effect#2914), but critically does not include an implementation of typelevel/cats-effect#2890, which is unquestionably the main bottleneck here:

Before:

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  7348.611 ± 207.064  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  3850.531 ±  10.141  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   372.989 ±   1.049  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  9122.642 ±  49.265  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  3558.006 ±   5.951  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   254.749 ±   0.650  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  9367.322 ±  25.509  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10   897.318 ±  17.716  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    39.973 ±   0.367  ops/s

After:

[info] Benchmark                              (size)   Mode  Cnt     Score    Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  5383.059 ± 42.282  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  2564.141 ±  5.962  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   254.210 ±  0.407  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  4840.209 ± 21.118  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  2427.766 ±  2.502  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   179.873 ±  0.524  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  7256.407 ± 29.061  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10  1334.681 ±  6.618  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    77.232 ±  1.446  ops/s

To me, the fact that this is even close to the hand-rolled version is pretty cool, and a slightly less naive stream would probably put it almost on par even without the optimized tryTakeN. Once we get the latter, though, this should be head and shoulders above the prior implementation.

Closes #2852

@djspiewak djspiewak marked this pull request as draft March 26, 2022 02:46
@djspiewak djspiewak marked this pull request as ready for review September 29, 2022 18:59
@djspiewak
Member Author

Working on this PR uncovered a bug in Cats Effect 3.4.0-RC1 (fixed in typelevel/cats-effect#3180), so this PR is based on a snapshot. If it's accepted, we'll do an RC2.

Comment on lines -283 to -288
// allocate once
@inline private final def closed[A]: Either[Closed, A] = _closed
private[this] final val _closed: Either[Closed, Nothing] = Left(Closed)
private final val rightUnit: Either[Closed, Unit] = Right(())
private final val rightTrue: Either[Closed, Boolean] = Right(true)
private final val rightFalse: Either[Closed, Boolean] = Right(false)
Member

Huh, why dump all of these?

Member Author

Can bring them back. I think I chopped them out a while ago.

Comment on lines 146 to 149
closedR.complete(()).map {
  case false => Left(Channel.Closed)
  case true => Right(())
}
Member

Also was going to suggest you use the cached left, but I see they're gone 😅

Suggested change
- closedR.complete(()).map {
-   case false => Left(Channel.Closed)
-   case true => Right(())
- }
+ closedR.complete(()).ifF(Either.unit, Left(Channel.Closed))
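
For context on the two points raised in this thread, here is a minimal, library-free sketch of the "allocate once" pattern from the removed lines, combined with the branch that the suggestion expresses through `ifF`. The object name `CachedResults` and the helper `fromFirstCompletion` are hypothetical, and `Closed` is a local stand-in for `Channel.Closed`; the real code performs this choice inside `F` rather than on a bare `Boolean`.

```scala
object CachedResults {
  // Local stand-in for fs2's Channel.Closed so the sketch is self-contained.
  case object Closed
  type Closed = Closed.type

  // Allocate each result once and reuse it on every call.
  private val _closed: Either[Closed, Nothing] = Left(Closed)
  def closed[A]: Either[Closed, A] = _closed
  val rightUnit: Either[Closed, Unit] = Right(())

  // Hypothetical helper: `first` plays the role of the Boolean produced by
  // `closedR.complete(())` (true when this call was the one that completed it).
  def fromFirstCompletion(first: Boolean): Either[Closed, Unit] =
    if (first) rightUnit else closed
}
```

Both branches return preallocated values, so repeated calls allocate nothing new; whether that matters in practice is something the benchmarks would have to show.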

Comment on lines 138 to 142
def sendAll: Pipe[F, A, Nothing] =
  _.evalMapChunk(send(_))
    .takeWhile(_.isRight)
    .onComplete(Stream.exec(close.void))
    .drain
Member

I was reminded of #2890.

Comment on lines 202 to 203
// you can do this more efficiently, just proves a point
Stream.eval(takeN).repeat.takeWhile(!_.isEmpty).flatMap(Stream.chunk(_))
Member

lol?

Member

Suggested change
- // you can do this more efficiently, just proves a point
- Stream.eval(takeN).repeat.takeWhile(!_.isEmpty).flatMap(Stream.chunk(_))
+ // you can do this more efficiently, just proves a point
+ Stream.eval(takeN).repeat.takeWhile(!_.isEmpty).unchunks

Member Author

Oh nice. Is that new?

Member

It was one of my first PRs to FS2 :)

Member Author

I've honestly wanted that convenience function for so long. I spend my whole life doing it.
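
To illustrate the equivalence discussed in this thread, here is a small sketch that assumes FS2 3.x is on the classpath. The object name `UnchunksSketch` and the sample data are hypothetical; a pure stream of chunks stands in for `Stream.eval(takeN).repeat`, with an empty chunk marking the end as in the reviewed code.

```scala
import fs2.{Chunk, Pure, Stream}

object UnchunksSketch {
  // Pure stand-in for `Stream.eval(takeN).repeat`: an empty chunk marks the end.
  val batches: Stream[Pure, Chunk[Int]] =
    Stream(Chunk(1, 2), Chunk(3), Chunk.empty[Int], Chunk(4))

  // The original form: flatten each chunk by hand.
  val viaFlatMap: List[Int] =
    batches.takeWhile(!_.isEmpty).flatMap(Stream.chunk(_)).toList

  // The suggested form: `unchunks` flattens a stream of chunks directly.
  val viaUnchunks: List[Int] =
    batches.takeWhile(!_.isEmpty).unchunks.toList
}
```

Both values contain the elements that precede the first empty chunk, so the two forms can be swapped without changing what the stream emits.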

@armanbilge
Member

Error: Exception in thread "io-compute-4" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
	at cats.effect.std.Queue$UnsafeBounded.<init>(Queue.scala:770)
	at cats.effect.std.Queue$BoundedAsyncQueue.<init>(Queue.scala:412)
	at cats.effect.std.Queue$.boundedForAsync$$anonfun$1(Queue.scala:96)
	at cats.effect.std.Queue$$$Lambda$1222/0x0000000800f5d398.apply(Unknown Source)
	at cats.effect.IOFiber.runLoop(IOFiber.scala:401)
	at cats.effect.IOFiber.execR(IOFiber.scala:1309)
	at cats.effect.IOFiber.run(IOFiber.scala:118)
	at cats.effect.unsafe.WorkerThread.run(WorkerThread.scala:580)

@djspiewak
Member Author

Wow. How many elements were you trying to stick into that queue??

Co-authored-by: Arman Bilge <armanbilge@gmail.com>
@armanbilge
Member

armanbilge commented Sep 30, 2022

Probably this?

val subscriptions =
  Vector
    .fill(subs)(topic.subscribeAwait(Int.MaxValue))

def subscribeAwait(maxQueued: Int): Resource[F, Stream[F, A]] =
  Resource
    .eval(Channel.bounded[F, A](maxQueued))

@armanbilge
Member

So it seems like anyone who was using bounded with a large number and expecting it to be allocated lazily is in for a bad time 🙃

@SystemFw SystemFw self-assigned this Sep 30, 2022
@djspiewak
Member Author

> So it seems like anyone who was using bounded with a large number and expecting it to be allocated lazily is in for a bad time 🙃

Yes. They shouldn't do that. :-P I mean, we can special case large bounds, but genuinely this just isn't a case which should be supported. It only worked before by coincidence.
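
The difference behind this failure can be illustrated with a minimal, library-free sketch. Both classes below are hypothetical and are not the Cats Effect or FS2 implementations; they only contrast a buffer that reserves its backing array at construction with one whose bound is merely a limit checked on insert.

```scala
object EagerVsLazyBound {
  // Hypothetical array-backed buffer: the constructor allocates `capacity` slots immediately.
  final class ArrayBacked[A](capacity: Int) {
    private val slots = new Array[AnyRef](capacity)
    def reservedSlots: Int = slots.length
  }

  // Hypothetical list-backed buffer: the bound is only a number checked on insert,
  // so memory grows with the elements actually stored.
  final class ListBacked[A](capacity: Int) {
    private var values: List[A] = Nil
    private var size: Int = 0

    def offer(a: A): Boolean =
      if (size < capacity) { values = a :: values; size += 1; true }
      else false

    def reservedSlots: Int = size
  }
}
```

With the list-backed shape, `new ListBacked[Int](Int.MaxValue)` costs almost nothing until elements arrive. With the array-backed shape, the same bound asks for an array of that length up front, which is the kind of request the stack trace above shows being rejected.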

@djspiewak
Member Author

djspiewak commented Oct 3, 2022

This is being caused by typelevel/cats-effect#3187

@djspiewak
Member Author

Getting better!

Not quite better across the board yet, but doing really well. Because of how things are optimized, I'm pretty sure that the new implementation gets better the faster the producers run relative to the consumer, though the numbers don't seem to intuitively bear that out. Still fiddling with things, and the upstream Queue has plenty of room for improvement (especially unbounded).

Before

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  7735.063 ±  33.841  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  3760.871 ±   9.716  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   359.674 ±   1.309  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  8533.059 ± 182.194  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  3464.678 ±   8.370  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   247.827 ±   2.399  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  8921.664 ±  27.248  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10  1002.150 ±  26.415  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    39.949 ±   0.363  ops/s

After

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  8092.902 ± 112.613  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  1934.434 ±  13.293  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   144.031 ±   2.505  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  7045.267 ±  37.947  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  2515.080 ±  13.203  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   225.750 ±   1.203  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  7241.213 ± 106.143  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10  1468.275 ±  20.139  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    76.752 ±  10.437  ops/s

@djspiewak
Member Author

…and a little better:

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  8847.457 ±  77.061  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  1870.886 ±  16.247  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   132.900 ±   0.244  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  8171.220 ± 166.752  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  2673.881 ±  26.653  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   218.006 ±   1.377  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  8604.675 ± 247.214  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10  1415.921 ±  18.295  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    79.396 ±   2.864  ops/s

@djspiewak djspiewak marked this pull request as draft October 26, 2022 18:24
@djspiewak
Member Author

@SystemFw I think this is ready for another look. Your Ref consolidation advice is very good, and probably actually improved the performance of send (I'm checking now). I also corrected the issue with closed being fulfilled after quiescence rather than upon closure. Additionally, I fixed a bug in CE's Queue.synchronous which was the origin of the broadcast hang.

Depends on a hash snapshot for the time being, which is why this is marked as draft.

@djspiewak
Member Author

Updated with benchmark results. The simple cases got faster with the consolidated Ref, but the more complex cases got much slower, and any case involving contention also got noticeably slower:

main

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  6893.574 ± 125.608  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  3535.811 ±  39.588  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10   361.574 ±   1.307  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  8176.797 ± 118.350  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  3407.982 ±  22.908  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   250.676 ±   1.205  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  8622.935 ±  48.615  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10   925.061 ±  27.972  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    39.134 ±   0.659  ops/s

This PR

[info] Benchmark                              (size)   Mode  Cnt     Score     Error  Units
[info] ChannelBenchmark.sendPull                  64  thrpt   10  9343.856 ±  42.086  ops/s
[info] ChannelBenchmark.sendPull                1024  thrpt   10  2161.634 ±  12.066  ops/s
[info] ChannelBenchmark.sendPull               16384  thrpt   10    22.743 ±   0.856  ops/s
[info] ChannelBenchmark.sendPullPar8              64  thrpt   10  7830.841 ±  41.508  ops/s
[info] ChannelBenchmark.sendPullPar8            1024  thrpt   10  2175.548 ±  56.683  ops/s
[info] ChannelBenchmark.sendPullPar8           16384  thrpt   10   116.707 ±   2.704  ops/s
[info] ChannelBenchmark.sendPullParUnlimited      64  thrpt   10  8125.314 ± 164.499  ops/s
[info] ChannelBenchmark.sendPullParUnlimited    1024  thrpt   10  1375.947 ±  33.510  ops/s
[info] ChannelBenchmark.sendPullParUnlimited   16384  thrpt   10    54.626 ±   1.014  ops/s

@djspiewak
Member Author

Fiddled around some more with the performance cases. I took a flier at porting one of the most dramatic performance deltas from the Cats Effect Queue benchmarks over to Channel. It was very similar to the sendPar benchmarks in terms of results, which suggests a couple of conclusions:

  • A lot of CE Queue's performance benefits may derive from true multi-consumer cases. This can be verified by expanding the CE benchmark set a bit.
  • Ref as a single synchronization point may be imposing some fundamental overhead. This overhead is shared with the old Channel implementation. Another way of parsing this is to conclude that Channel is bottlenecked by its state, rather than by its queue.

I suspect this is still a worthwhile change for a few reasons (e.g. the GC tends to do a lot better with contiguous array structures rather than linked-list equivalents, and this is not well measured in microbenchmarks), and I have some ideas for how to improve it a bit further, but I think we can probably foreclose seeing the kinds of huge performance leaps in Channel that we see in the lower level Queue.

@SystemFw
Collaborator

SystemFw commented Nov 1, 2022

> I suspect this is still a worthwhile change for a few reasons (e.g. the GC tends to do a lot better with contiguous array structures rather than linked-list equivalents, and this is not well measured in microbenchmarks), and I have some ideas for how to improve it a bit further, but I think we can probably foreclose seeing the kinds of huge performance leaps in Channel that we see in the lower level Queue.

mm, I don't want to be extremely gatekeeper-y, but this is the type of worst case scenario I feared: changing something that works, complicating the implementation (double refs, complex sentinel logic, Any) for unclear perf benefits.

Btw, what do you mean by linked list here? Is it the fact that the current impl is based on List? That was benchmarked against a couple of alternatives, but it doesn't seem fundamental one way or another.

@nikiforo
Contributor

nikiforo commented Nov 2, 2022

Here are the benchmark results for List vs Vector: #2751 (comment)

In the List version, an insert whose CAS fails discards only one list node, whereas in the Vector version several nodes (each holding a 64-element array) might be discarded. That intuition was behind the change from Vector to List in Channel.
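
The intuition about wasted work on a failed CAS can be sketched without any library. This is a hypothetical illustration, not the Channel code: it assumes the insert is a prepend onto an immutable `List` held in a `java.util.concurrent.atomic.AtomicReference`, and the name `CasRetrySketch.push` is made up for the example.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object CasRetrySketch {
  // Prepends `a` with a compare-and-set loop and returns how many attempts failed.
  @tailrec
  def push[A](ref: AtomicReference[List[A]], a: A, retries: Int = 0): Int = {
    val current = ref.get()
    val updated = a :: current // each attempt allocates exactly one cons cell
    if (ref.compareAndSet(current, updated)) retries
    else push(ref, a, retries + 1) // a losing attempt throws away only that one cell
  }
}
```

With a `Vector`, the `updated` value built on each attempt may involve copying part of the tree, so each failed attempt would discard more than a single small object. Without contention the loop succeeds on the first try and returns 0.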

@djspiewak
Member Author

djspiewak commented Nov 11, 2022

I did my final set of experiments! TL;DR: I'm closing this PR.

So my last set of experiments looked at GC pressure (using -prof gc), under the hypothesis that the benchmarks aren't capturing the heap differences between a queue based on a mutable circular buffer projected into an array vs a large set of immutable List allocations. In theory, the new Queue-based implementation should have lower heap pressure, which would imply that its real-world performance could be dramatically better than the old Ref-based implementation, even if the microbenchmarks don't show a significant difference.

The results were the exact opposite. The implementation on this PR imposes about double the heap pressure, GC count, GC time, and survivor count. I genuinely have no intuition for why, but at this point I think we can safely conclude that the performance of this implementation is somewhere between "worse" and "much worse" than the performance of the old version. Full output attached.

[info] Benchmark                                                               (size)   Mode  Cnt         Score         Error   Units
[info] ChannelBenchmark.sendPull                                                   64  thrpt   10      6782.901 ±      28.760   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                    64  thrpt   10       586.217 ±       2.353  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                               64  thrpt   10     95159.257 ±     198.598    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                           64  thrpt   10       596.385 ±       9.230  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                      64  thrpt   10     96808.929 ±    1324.711    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                       64  thrpt   10         0.004 ±       0.001  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm                  64  thrpt   10         0.615 ±       0.113    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                         64  thrpt   10       412.000                counts
[info] ChannelBenchmark.sendPull:·gc.time                                          64  thrpt   10       218.000                    ms
[info] ChannelBenchmark.sendPull                                                 1024  thrpt   10      3667.399 ±       6.255   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                  1024  thrpt   10      1686.637 ±       2.895  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                             1024  thrpt   10    506374.626 ±      71.439    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                         1024  thrpt   10      1711.977 ±      11.091  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                    1024  thrpt   10    513983.473 ±    3648.213    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                     1024  thrpt   10         0.044 ±       0.003  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm                1024  thrpt   10        13.225 ±       0.752    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                       1024  thrpt   10      1184.000                counts
[info] ChannelBenchmark.sendPull:·gc.time                                        1024  thrpt   10       683.000                    ms
[info] ChannelBenchmark.sendPull                                                16384  thrpt   10       373.175 ±       0.676   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                 16384  thrpt   10      2457.521 ±       4.449  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                            16384  thrpt   10   7250852.186 ±    3199.563    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                        16384  thrpt   10      2484.281 ±      10.115  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                   16384  thrpt   10   7329808.003 ±   28407.348    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                    16384  thrpt   10         0.983 ±       0.063  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm               16384  thrpt   10      2901.019 ±     182.717    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                      16384  thrpt   10      1727.000                counts
[info] ChannelBenchmark.sendPull:·gc.time                                       16384  thrpt   10      1231.000                    ms
[info] ChannelBenchmark.sendPullPar8                                               64  thrpt   10      8470.950 ±      53.434   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                                64  thrpt   10      1082.979 ±       5.580  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                           64  thrpt   10    140765.782 ±     183.785    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                       64  thrpt   10      1101.575 ±       6.918  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm                  64  thrpt   10    143183.823 ±     980.898    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                   64  thrpt   10         0.038 ±       0.005  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm              64  thrpt   10         4.947 ±       0.622    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                     64  thrpt   10       761.000                counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                      64  thrpt   10       431.000                    ms
[info] ChannelBenchmark.sendPullPar8                                             1024  thrpt   10      3495.435 ±      17.577   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                              1024  thrpt   10      1927.029 ±       9.210  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                         1024  thrpt   10    607009.446 ±     284.258    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                     1024  thrpt   10      1959.943 ±      15.296  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm                1024  thrpt   10    617375.244 ±    2903.127    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                 1024  thrpt   10         0.106 ±       0.012  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm            1024  thrpt   10        33.493 ±       3.890    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                   1024  thrpt   10      1354.000                counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                    1024  thrpt   10       779.000                    ms
[info] ChannelBenchmark.sendPullPar8                                            16384  thrpt   10       244.234 ±       1.311   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                             16384  thrpt   10      4121.189 ±       9.629  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                        16384  thrpt   10  18579061.185 ±   75647.808    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                    16384  thrpt   10      4173.111 ±      14.027  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm               16384  thrpt   10  18813089.512 ±   60135.784    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                16384  thrpt   10         1.426 ±       0.089  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm           16384  thrpt   10      6429.944 ±     386.002    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                  16384  thrpt   10      2892.000                counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                   16384  thrpt   10      1812.000                    ms
[info] ChannelBenchmark.sendPullParUnlimited                                       64  thrpt   10      8805.282 ±      41.722   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                        64  thrpt   10      1312.361 ±       5.923  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                   64  thrpt   10    164104.123 ±     121.178    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space               64  thrpt   10      1333.176 ±       6.924  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm          64  thrpt   10    166708.220 ±    1118.023    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space           64  thrpt   10         0.072 ±       0.011  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm      64  thrpt   10         8.992 ±       1.354    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                             64  thrpt   10       921.000                counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                              64  thrpt   10       531.000                    ms
[info] ChannelBenchmark.sendPullParUnlimited                                     1024  thrpt   10       897.736 ±      10.136   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                      1024  thrpt   10      2374.364 ±      26.982  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                 1024  thrpt   10   2912092.936 ±    5023.615    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space             1024  thrpt   10      2410.847 ±      26.040  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm        1024  thrpt   10   2956855.055 ±   13186.708    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space         1024  thrpt   10         2.605 ±       0.234  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm    1024  thrpt   10      3194.565 ±     290.602    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                           1024  thrpt   10      1671.000                counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                            1024  thrpt   10      1360.000                    ms
[info] ChannelBenchmark.sendPullParUnlimited                                    16384  thrpt   10        41.062 ±       0.506   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                     16384  thrpt   10      2295.805 ±      44.861  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                16384  thrpt   10  61565539.220 ± 1750956.837    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space            16384  thrpt   10      2322.799 ±      46.934  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm       16384  thrpt   10  62290026.498 ± 1852356.934    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Old_Gen               16384  thrpt   10         1.547 ±       3.829  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Old_Gen.norm          16384  thrpt   10     41295.132 ±  102080.320    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space        16384  thrpt   10        41.170 ±      11.103  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm   16384  thrpt   10   1104620.006 ±  304109.883    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                          16384  thrpt   10      1992.000                counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                           16384  thrpt   10      8442.000                    ms
[info] Benchmark                                                               (size)   Mode  Cnt         Score          Error   Units
[info] ChannelBenchmark.sendPull                                                   64  thrpt   10     10039.570 ±      107.793   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                    64  thrpt   10      1170.589 ±        8.503  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                               64  thrpt   10    128382.008 ±      536.542    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                           64  thrpt   10      1189.875 ±        9.231  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                      64  thrpt   10    130498.660 ±     1192.881    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                       64  thrpt   10         0.006 ±        0.001  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm                  64  thrpt   10         0.619 ±        0.113    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                         64  thrpt   10       822.000                 counts
[info] ChannelBenchmark.sendPull:·gc.time                                          64  thrpt   10       423.000                     ms
[info] ChannelBenchmark.sendPull                                                 1024  thrpt   10      2145.795 ±       52.664   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                  1024  thrpt   10      3040.809 ±       82.097  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                             1024  thrpt   10   1560268.777 ±     4275.598    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                         1024  thrpt   10      3090.520 ±       87.175  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                    1024  thrpt   10   1585760.608 ±     7824.988    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                     1024  thrpt   10         0.071 ±        0.007  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm                1024  thrpt   10        36.608 ±        3.815    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                       1024  thrpt   10      2136.000                 counts
[info] ChannelBenchmark.sendPull:·gc.time                                        1024  thrpt   10      1178.000                     ms
[info] ChannelBenchmark.sendPull                                                16384  thrpt   10       156.244 ±        5.478   ops/s
[info] ChannelBenchmark.sendPull:·gc.alloc.rate                                 16384  thrpt   10      3572.497 ±      130.383  MB/sec
[info] ChannelBenchmark.sendPull:·gc.alloc.rate.norm                            16384  thrpt   10  25174763.792 ±    60788.092    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space                        16384  thrpt   10      3610.699 ±      135.223  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Eden_Space.norm                   16384  thrpt   10  25443633.606 ±    93383.333    B/op
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space                    16384  thrpt   10         1.177 ±        0.081  MB/sec
[info] ChannelBenchmark.sendPull:·gc.churn.G1_Survivor_Space.norm               16384  thrpt   10      8291.601 ±      415.727    B/op
[info] ChannelBenchmark.sendPull:·gc.count                                      16384  thrpt   10      2506.000                 counts
[info] ChannelBenchmark.sendPull:·gc.time                                       16384  thrpt   10      1656.000                     ms
[info] ChannelBenchmark.sendPullPar8                                               64  thrpt   10      9132.617 ±       81.065   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                                64  thrpt   10      1474.601 ±        9.031  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                           64  thrpt   10    177783.311 ±      515.665    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                       64  thrpt   10      1499.647 ±       11.306  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm                  64  thrpt   10    180804.322 ±     1473.409    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                   64  thrpt   10         0.053 ±        0.006  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm              64  thrpt   10         6.441 ±        0.745    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                     64  thrpt   10      1036.000                 counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                      64  thrpt   10       556.000                     ms
[info] ChannelBenchmark.sendPullPar8                                             1024  thrpt   10      2397.046 ±        6.897   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                              1024  thrpt   10      4018.816 ±       16.190  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                         1024  thrpt   10   1845989.082 ±     5782.995    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                     1024  thrpt   10      4083.333 ±       16.129  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm                1024  thrpt   10   1875622.413 ±     3897.232    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                 1024  thrpt   10         0.200 ±        0.009  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm            1024  thrpt   10        91.671 ±        4.004    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                   1024  thrpt   10      2821.000                 counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                    1024  thrpt   10      1570.000                     ms
[info] ChannelBenchmark.sendPullPar8                                            16384  thrpt   10       119.866 ±        2.194   ops/s
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate                             16384  thrpt   10      6952.255 ±      304.750  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.alloc.rate.norm                        16384  thrpt   10  63850898.099 ±  1863356.851    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space                    16384  thrpt   10      7026.122 ±      306.869  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Eden_Space.norm               16384  thrpt   10  64529395.444 ±  1874113.296    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space                16384  thrpt   10         3.007 ±        0.216  MB/sec
[info] ChannelBenchmark.sendPullPar8:·gc.churn.G1_Survivor_Space.norm           16384  thrpt   10     27614.071 ±     1674.822    B/op
[info] ChannelBenchmark.sendPullPar8:·gc.count                                  16384  thrpt   10      4870.000                 counts
[info] ChannelBenchmark.sendPullPar8:·gc.time                                   16384  thrpt   10      2837.000                     ms
[info] ChannelBenchmark.sendPullParUnlimited                                       64  thrpt   10      8977.668 ±       39.735   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                        64  thrpt   10      1572.078 ±        6.945  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                   64  thrpt   10    192805.665 ±      704.396    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space               64  thrpt   10      1598.066 ±       15.296  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm          64  thrpt   10    195991.914 ±     1502.941    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space           64  thrpt   10         0.076 ±        0.012  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm      64  thrpt   10         9.364 ±        1.526    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                             64  thrpt   10      1104.000                 counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                              64  thrpt   10       612.000                     ms
[info] ChannelBenchmark.sendPullParUnlimited                                     1024  thrpt   10      1344.809 ±       10.434   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                      1024  thrpt   10      3693.881 ±       29.326  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                 1024  thrpt   10   3024335.518 ±     3888.827    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space             1024  thrpt   10      3747.242 ±       30.733  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm        1024  thrpt   10   3068028.827 ±    11173.430    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space         1024  thrpt   10         4.513 ±        0.270  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm    1024  thrpt   10      3695.416 ±      232.419    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                           1024  thrpt   10      2598.000                 counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                            1024  thrpt   10      1957.000                     ms
[info] ChannelBenchmark.sendPullParUnlimited                                    16384  thrpt   10        57.699 ±        3.618   ops/s
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate                     16384  thrpt   10      3816.738 ±      840.063  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.alloc.rate.norm                16384  thrpt   10  72530374.170 ± 12985547.163    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space            16384  thrpt   10      3862.335 ±      845.905  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Eden_Space.norm       16384  thrpt   10  73398498.755 ± 13043295.117    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Old_Gen               16384  thrpt   10         2.861 ±        3.738  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Old_Gen.norm          16384  thrpt   10     54057.998 ±    70717.960    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space        16384  thrpt   10        89.222 ±       32.824  MB/sec
[info] ChannelBenchmark.sendPullParUnlimited:·gc.churn.G1_Survivor_Space.norm   16384  thrpt   10   1688874.681 ±   581524.809    B/op
[info] ChannelBenchmark.sendPullParUnlimited:·gc.count                          16384  thrpt   10      3295.000                 counts
[info] ChannelBenchmark.sendPullParUnlimited:·gc.time                           16384  thrpt   10     10503.000                     ms

As an aside, I have separately verified that the performance improvements in CE's Queue are not restricted to MPMC cases: MPSC is also much faster than its old implementation. I'm having a hard time figuring out how to reconcile this information with the investigation on this PR.


All in all, I think we can conclude a few things:

  • Ref/Deferred is a startlingly and unintuitively fast general method for implementing stuff
  • Channel's performance is bounded mostly by the Ref#modify loop, not by the state management in List
  • Generational GC benefits a lot from the fact that the state objects here are very short-lived. I think this is where this PR really fell behind: most of its heavy lifting is handled by data structures which are long-lived
  • The tests in main are woefully insufficient.
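
To make the Ref#modify point concrete, here is a rough sketch of what such a loop looks like in principle. It uses only the Scala standard library and `java.util.concurrent`, not Cats Effect itself: `CasRef` and `CasRefDemo` are hypothetical names invented for this illustration, and the code is not the actual `Ref` or `Channel` implementation. The assumption is simply that a modify on the JVM boils down to a compare-and-set retry over an immutable state value, so each update allocates a small, short-lived state object.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical stand-in for the shape of a Ref#modify loop: read the current
// immutable state, compute a new state plus a result, and retry if another
// thread replaced the state before our compare-and-set landed.
final class CasRef[A](initial: A) {
  private val cell = new AtomicReference[A](initial)

  @tailrec
  def modify[B](f: A => (A, B)): B = {
    val current = cell.get()
    val (next, result) = f(current)
    if (cell.compareAndSet(current, next)) result
    else modify(f) // lost the race: recompute against the newer state
  }

  def get: A = cell.get()
}

object CasRefDemo {
  def main(args: Array[String]): Unit = {
    // The "channel state" is just an immutable List; every send allocates
    // one new cons cell, and the previous state becomes garbage immediately.
    val state = new CasRef[List[Int]](Nil)

    val producers = (0 until 4).map { _ =>
      new Thread(() => (0 until 1000).foreach(i => state.modify(s => (i :: s, ()))))
    }
    producers.foreach(_.start())
    producers.foreach(_.join())

    // A single consumer drains everything in one modify, swapping in an empty state.
    val drained = state.modify(s => (List.empty[Int], s.reverse))
    println(drained.size) // prints 4000
  }
}
```

Under contention the cost is dominated by retries of the compare-and-set rather than by the `List` operations, which is consistent with the conclusion above; how closely this mirrors the real implementation is an assumption of the sketch.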

I'll open a new PR bringing over the tests from this PR. :-)

@djspiewak djspiewak closed this Nov 11, 2022
djspiewak added a commit to djspiewak/fs2 that referenced this pull request Nov 11, 2022
mpilquist added a commit that referenced this pull request Nov 12, 2022