
Fast and scalable channels algorithm #3621

Closed
qwwdfsad opened this issue Feb 13, 2023 · 3 comments

Comments

@qwwdfsad
Collaborator

In #3103 we propose a brand new underlying data structure for channels.
The change is technical and should not affect user-visible invariants, behaviour and API shape.
The full-blown algorithm description and correctness proof can be found here: https://arxiv.org/abs/2211.04986

The rework addresses the following problems:

  • The previous implementation is based on a concurrent doubly-linked list, which has proven to be incorrect, hard to reason about, and almost impossible to maintain, meaning that linearizability issues and non-trivial data races cannot be fixed or reasoned about in a predictable manner
  • The implementation also imposes a non-trivial cost in both bytecode and dex size due to its complex implementation details
  • All non-trivial operations are expressed in terms of descriptors (DCAS, N-word CAS), which limits scalability, complicates correctness reasoning, and hurts performance characteristics

Additionally, it's well known that array-based data structures significantly outperform linked structures, and that fetch-and-add algorithms outperform CAS-based ones. Both of these facts are leveraged by the new implementation.
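The core idea, claiming cells in a flat array with a single fetch-and-add instead of manipulating list nodes via CAS loops and descriptors, can be illustrated with a deliberately simplified sketch. This is plain Java for self-containment; the class and field names, the single fixed-size array, and the non-suspending `try*` operations are all assumptions of this sketch, not the actual kotlinx.coroutines code, which adds suspension, cancellation, and unbounded storage via linked segments:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative sketch only: each sender claims a cell with one
// fetch-and-add on a shared counter, then synchronizes on that single
// cell rather than on the whole structure. Receivers claim cells the
// same way. The real algorithm resolves sender/receiver races through
// per-cell state transitions; this sketch omits them.
final class FaaQueueSketch<E> {
    private final AtomicReferenceArray<Object> cells;
    private final AtomicLong sendIdx = new AtomicLong();
    private final AtomicLong receiveIdx = new AtomicLong();

    FaaQueueSketch(int capacity) {
        cells = new AtomicReferenceArray<>(capacity);
    }

    boolean trySend(E element) {
        long i = sendIdx.getAndIncrement();    // claim a cell: one FAA, no CAS retry loop
        if (i >= cells.length()) return false; // sketch only: no segment growth
        return cells.compareAndSet((int) i, null, element);
    }

    @SuppressWarnings("unchecked")
    E tryReceive() {
        while (true) {
            long r = receiveIdx.get();
            if (r >= sendIdx.get()) return null;        // nothing sent yet
            if (receiveIdx.compareAndSet(r, r + 1)) {   // claim the matching cell
                return (E) cells.getAndSet((int) r, null);
            }
        }
    }
}
```

The point of the FAA-based design is that the common path for a sender is a single uncontended atomic increment, whereas descriptor-based designs require multi-word CAS sequences on shared list nodes.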

@qwwdfsad
Collaborator Author

This issue is also the final straw for the *BroadcastChannel deprecation -- broadcasts were never meant to be complete channels (i.e. to comply with the Channel interface), and it's time for us to let them go and stop maintaining them

ndkoval added a commit that referenced this issue Feb 13, 2023
See #3621 for detailed description and rationale

Fixes #3621
@qwwdfsad
Collaborator Author

qwwdfsad commented Feb 28, 2023

After #3084 our performance-related outline is the following:

  • Any loops over channel contents (including repeated send as well as receive) are, on average, allocation-free, even if every operation suspends. No CancellableContinuation (it's cached in the loop's state machine and allocated once), no cancellation handler (it's cached in the CC instance), no intermediate list nodes (nodes are batched, so there is one allocation every 64 operations)
  • Average operation throughput in a sequential rendezvous scenario is at least 35% higher, while the allocation rate is significantly lower (no exact number; it depends on your workload). It is reasonably conservative to assume that the channel's amortized influence on system throughput is cut in half
  • Thanks to the optimized data structure (fetch-and-add-based arrays vs. the "6-CAS DLL"), contended throughput is orders of magnitude higher. I won't post any numbers, as they are really sensitive to the benchmark (we measure a stressed-out channel, which is unlikely to be realistic in real applications) and look unrealistic anyway :)

It seems we have no other low-hanging fruit in channels.
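The "one allocation every 64 operations" point above comes from segment batching: elements occupy slots of fixed-size segments, and a fresh segment is allocated only when the previous one is exhausted. A minimal sketch of that pattern in plain Java (the segment size of 64 matches the comment above, but all names here are illustrative, not the real implementation):

```java
// Illustrative sketch: instead of one linked-list node per element,
// elements fill fixed-size segments, so node allocation is amortized
// to one per SEGMENT_SIZE operations.
final class SegmentListSketch<E> {
    static final int SEGMENT_SIZE = 64;

    static final class Segment<E> {
        final Object[] slots = new Object[SEGMENT_SIZE];
        Segment<E> next;
    }

    private Segment<E> tail = new Segment<>();
    private int tailCount = 0;
    int allocations = 1; // segments allocated so far

    void add(E element) {
        if (tailCount == SEGMENT_SIZE) { // rare path: current segment is full
            Segment<E> s = new Segment<>();
            tail.next = s;
            tail = s;
            tailCount = 0;
            allocations++;
        }
        tail.slots[tailCount++] = element; // common path: plain array store
    }
}
```

Adding 128 elements to this sketch allocates only two segments, which is the amortization the bullet point refers to.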

@ndkoval
Member

ndkoval commented Mar 2, 2023

Also, a customized ChannelSinkBenchmark for buffered channels (capacity=BUFFERED) suggests x1.7 speed up on my laptop 💪
