
Optimize Mutex & AtomicCell #3346

Merged — 12 commits merged into typelevel:series/3.x from improve-mutex on Feb 7, 2023
Conversation

@BalmungSan (Contributor) commented on Jan 5, 2023


This PR adds two alternative implementations of Mutex & AtomicCell based on Async rather than just Concurrent; these implementations should be more efficient.
The change is binary- and source-compatible, and users don't need to do anything special to get the new implementations: if the underlying data type they use supports Async (e.g. IO), they get the optimized versions automatically.
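
For illustration, here is a minimal sketch of what this looks like from the caller's side, using the public cats.effect.std.Mutex API (nothing changes in user code):

```scala
import cats.effect.{IO, IOApp}
import cats.effect.std.Mutex

object Example extends IOApp.Simple {
  def run: IO[Unit] =
    Mutex[IO].flatMap { mutex =>
      // IO is Async, so this Mutex is the optimized callback-based one;
      // for an F[_] that is only Concurrent, the very same call would
      // transparently return the Concurrent-based implementation.
      mutex.lock.surround(IO.println("inside the critical section"))
    }
}
```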


Benchmark results

Configuration

  • forks: 2
  • threads: 1
  • warm-up iterations: 10
  • iterations: 10
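
In JMH terms, this configuration corresponds roughly to the following annotations (a sketch; the actual benchmark classes in the PR may differ):

```scala
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(2)
@Threads(1)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
class MutexBenchmarkSketch {
  // JMH runs each benchmark once per combination of these parameters.
  @Param(Array("10", "50", "100"))
  var fibers: Int = _

  @Param(Array("1000"))
  var iterations: Int = _

  @Benchmark
  def happyPath(): Unit = {
    // would run `iterations` lock/unlock cycles across `fibers` fibers
  }
}
```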

Mutex

Concurrent

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| MutexBenchmark.happyPathConcurrent | 10 | 1000 | thrpt | 20 | 192.475 | ± 2.296 | ops/s |
| MutexBenchmark.happyPathConcurrent | 50 | 1000 | thrpt | 20 | 41.280 | ± 0.486 | ops/s |
| MutexBenchmark.happyPathConcurrent | 100 | 1000 | thrpt | 20 | 19.415 | ± 0.289 | ops/s |
| MutexBenchmark.highContentionConcurrent | 10 | 1000 | thrpt | 20 | 26.426 | ± 0.330 | ops/s |
| MutexBenchmark.highContentionConcurrent | 50 | 1000 | thrpt | 20 | 6.963 | ± 0.204 | ops/s |
| MutexBenchmark.highContentionConcurrent | 100 | 1000 | thrpt | 20 | 3.750 | ± 0.055 | ops/s |
| MutexBenchmark.cancellationConcurrent | 10 | 1000 | thrpt | 20 | 20.768 | ± 1.628 | ops/s |
| MutexBenchmark.cancellationConcurrent | 50 | 1000 | thrpt | 20 | 6.835 | ± 0.065 | ops/s |
| MutexBenchmark.cancellationConcurrent | 100 | 1000 | thrpt | 20 | 3.653 | ± 0.128 | ops/s |

Async

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| MutexBenchmark.happyPathAsync | 10 | 1000 | thrpt | 20 | 212.839 | ± 1.069 | ops/s |
| MutexBenchmark.happyPathAsync | 50 | 1000 | thrpt | 20 | 46.925 | ± 1.366 | ops/s |
| MutexBenchmark.happyPathAsync | 100 | 1000 | thrpt | 20 | 23.266 | ± 0.127 | ops/s |
| MutexBenchmark.highContentionAsync | 10 | 1000 | thrpt | 20 | 33.913 | ± 2.316 | ops/s |
| MutexBenchmark.highContentionAsync | 50 | 1000 | thrpt | 20 | 8.227 | ± 0.082 | ops/s |
| MutexBenchmark.highContentionAsync | 100 | 1000 | thrpt | 20 | 4.577 | ± 0.160 | ops/s |
| MutexBenchmark.cancellationAsync | 10 | 1000 | thrpt | 20 | 24.807 | ± 0.694 | ops/s |
| MutexBenchmark.cancellationAsync | 50 | 1000 | thrpt | 20 | 8.832 | ± 0.086 | ops/s |
| MutexBenchmark.cancellationAsync | 100 | 1000 | thrpt | 20 | 5.034 | ± 0.027 | ops/s |

AtomicCell (using the respective Mutex)

Concurrent

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AtomicCellBenchmark.happyPathConcurrent | 10 | 1000 | thrpt | 20 | 164.129 | ± 0.711 | ops/s |
| AtomicCellBenchmark.happyPathConcurrent | 50 | 1000 | thrpt | 20 | 34.864 | ± 0.628 | ops/s |
| AtomicCellBenchmark.happyPathConcurrent | 100 | 1000 | thrpt | 20 | 17.276 | ± 0.024 | ops/s |
| AtomicCellBenchmark.highContentionConcurrent | 10 | 1000 | thrpt | 20 | 27.247 | ± 1.409 | ops/s |
| AtomicCellBenchmark.highContentionConcurrent | 50 | 1000 | thrpt | 20 | 6.315 | ± 0.073 | ops/s |
| AtomicCellBenchmark.highContentionConcurrent | 100 | 1000 | thrpt | 20 | 2.893 | ± 0.034 | ops/s |
| AtomicCellBenchmark.cancellationConcurrent | 10 | 1000 | thrpt | 20 | 20.799 | ± 0.854 | ops/s |
| AtomicCellBenchmark.cancellationConcurrent | 50 | 1000 | thrpt | 20 | 7.363 | ± 0.195 | ops/s |
| AtomicCellBenchmark.cancellationConcurrent | 100 | 1000 | thrpt | 20 | 3.723 | ± 0.025 | ops/s |

Async

| Benchmark | (fibers) | (iterations) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| AtomicCellBenchmark.happyPathAsync | 10 | 1000 | thrpt | 20 | 192.392 | ± 0.191 | ops/s |
| AtomicCellBenchmark.happyPathAsync | 50 | 1000 | thrpt | 20 | 40.638 | ± 0.139 | ops/s |
| AtomicCellBenchmark.happyPathAsync | 100 | 1000 | thrpt | 20 | 18.958 | ± 0.139 | ops/s |
| AtomicCellBenchmark.highContentionAsync | 10 | 1000 | thrpt | 20 | 29.978 | ± 0.128 | ops/s |
| AtomicCellBenchmark.highContentionAsync | 50 | 1000 | thrpt | 20 | 8.026 | ± 0.366 | ops/s |
| AtomicCellBenchmark.highContentionAsync | 100 | 1000 | thrpt | 20 | 4.632 | ± 0.045 | ops/s |
| AtomicCellBenchmark.cancellationAsync | 10 | 1000 | thrpt | 20 | 23.468 | ± 0.547 | ops/s |
| AtomicCellBenchmark.cancellationAsync | 50 | 1000 | thrpt | 20 | 8.633 | ± 0.073 | ops/s |
| AtomicCellBenchmark.cancellationAsync | 100 | 1000 | thrpt | 20 | 5.680 | ± 0.101 | ops/s |

Conclusions

IMHO the improvements are noticeable, being around a 20% throughput increase in most situations.
Also, the AtomicCell results are roughly equivalent to (though still slightly better than) the Mutex ones, which suggests that most of the gain comes from the Mutex itself; still, I think the Async-based AtomicCell is worth it.


AtomicCellBenchmarkResults.txt
MutexBenchmarkResults.txt

@BalmungSan force-pushed the improve-mutex branch 2 times, most recently from e38fe68 to f0827a1 on January 5, 2023 21:47
@armanbilge (Member) commented:
> I said "improve" but I am not totally sure this version is better than just delegating to Semaphore

Here's an idea that I think could be a material improvement. What if we extracted the `UnsafeUnbounded` data structure introduced for the async Queue (`final class UnsafeUnbounded[A]`) into a common place, so that we can use it here?

We can use the Semaphore-based implementation for the Concurrent constraint, but use a runtime check for Async to upgrade to the UnsafeUnbounded implementation.
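
A rough sketch of that runtime-check pattern (the trait and constructor names below are stand-ins, not the actual code):

```scala
import cats.effect.kernel.{Async, Concurrent}

trait Lock[F[_]] // stand-in for the real Mutex trait

object Lock {
  // Hypothetical constructors for the two implementations:
  private def asyncImpl[F[_]](implicit F: Async[F]): F[Lock[F]] =
    F.pure(new Lock[F] {})
  private def concurrentImpl[F[_]](implicit F: Concurrent[F]): F[Lock[F]] =
    F.pure(new Lock[F] {})

  def apply[F[_]](implicit F: Concurrent[F]): F[Lock[F]] =
    F match {
      // Runtime check: if F is actually Async, upgrade to the optimized version.
      case async: Async[F] => asyncImpl(async)
      // Otherwise, fall back to the Concurrent-only implementation.
      case _ => concurrentImpl
    }
}
```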

@armanbilge (Member) commented:

> We can use the Semaphore-based implementation for the Concurrent constraint

Or the implementation proposed in this PR, which is good too :)


Btw: I believe this PR can be targeted at 3.4.x, since it's just internal optimizations (no new APIs)

@BalmungSan force-pushed the improve-mutex branch 5 times, most recently from 07be7bf to 9dd1b92 on January 7, 2023 03:10
Comment on lines 101 to 176
```scala
// Cancels a Fiber waiting for the Mutex.
private def cancel(thisCB: CB, thisCell: LockCell, previousCell: LockCell): F[Unit] =
  F.delay {
    // If we are canceled,
    // we first check whether the state still contains our own cell;
    // if so, we swap it with the previousCell.
    // This ensures any subsequent attempt to acquire the Mutex
    // will register its callback on the appropriate cell.
    // Additionally, it confirms there is no Fiber
    // currently waiting for us.
    if (!state.compareAndSet(thisCell, previousCell)) {
      // Otherwise,
      // it means we have a Fiber waiting for us.
      // Thus, we need to tell the previous cell
      // to awake that Fiber instead.
      var nextCB = thisCell.get()
      while (nextCB eq null) {
        // There is a tiny window during which
        // the next cell has already acquired our cell
        // but hasn't registered its callback yet.
        // Thus, we spin until it does.
        nextCB = thisCell.get()
      }
      if (!previousCell.compareAndSet(thisCB, nextCB)) {
        // However, if the previous cell had already completed,
        // then the Mutex is free and we can awake the waiting Fiber ourselves.
        if (nextCB ne null) nextCB.apply(RUnit)
      }
    }
  }

// Awaits until the Mutex is free.
private def await(thisCell: LockCell): F[Unit] =
  F.asyncCheckAttempt[Unit] { thisCB =>
    F.delay {
      val previousCell = state.getAndSet(thisCell)

      if (previousCell eq null) {
        // If the previous cell was null,
        // then the Mutex is free.
        RUnit.asInstanceOf[Either[Option[F[Unit]], Unit]]
      } else {
        // Otherwise,
        // we check again that the previous cell hasn't been completed yet;
        // if not, we tell the previous cell to awake us when it finishes.
        if (!previousCell.compareAndSet(null, thisCB)) {
          // If it was already completed,
          // then the Mutex is free.
          RUnit.asInstanceOf[Either[Option[F[Unit]], Unit]]
        } else {
          Left(Some(cancel(thisCB, thisCell, previousCell)))
        }
      }
    }
  }

// Acquires the Mutex.
private def acquire(poll: Poll[F]): F[LockCell] =
  F.delay(new AtomicReference[CB]()).flatMap { thisCell =>
    poll(await(thisCell).map(_ => thisCell))
  }

// Releases the Mutex.
private def release(thisCell: LockCell): F[Unit] =
  F.delay {
    // If the state still contains our own cell,
    // then it means nobody was waiting for the Mutex,
    // and thus it can be put back into the free state.
    if (!state.compareAndSet(thisCell, null)) {
      // Otherwise,
      // our cell is probably not empty;
      // we must awake whatever Fiber is waiting for us.
      val nextCB = thisCell.getAndSet(Sentinel)
      if (nextCB ne null) nextCB.apply(RUnit)
    }
  }
```
@BalmungSan (Contributor, Author) commented:

The idea of this new implementation is to save the Deferred instantiation, based on the assumption that at any point in time a Fiber waits for at most one other Fiber, and is awaited by at most one other Fiber.

Concretely, we have (see the sketch after this list):

  • state, a mutable reference to the last Cell in the chain.
  • thisCell, the chain Cell of the Fiber trying to acquire the Mutex.
  • previousCell, the previous Cell in the chain; possibly the one currently holding the Mutex.
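
Assuming definitions along these lines (a sketch mirroring the names used in the diff above; in the PR they live inside the Async-based Mutex):

```scala
import java.util.concurrent.atomic.AtomicReference

object MutexStateSketch {
  // Callback used to awake a Fiber blocked on the Mutex.
  type CB = Either[Throwable, Unit] => Unit
  // Each Fiber in the chain owns one cell; the cell holds the callback
  // of the Fiber waiting on it, or null while nobody is waiting yet.
  type LockCell = AtomicReference[CB]
  // Always points at the last cell in the chain; null means the Mutex is free.
  val state = new AtomicReference[LockCell]()
  // Value handed to a callback when the lock is passed on.
  val RUnit: Either[Throwable, Unit] = Right(())
  // Sentinel stored in a released cell, marking it as completed.
  val Sentinel: CB = _ => ()
}
```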

Acquire:
First, we create our thisCell and atomically swap it into the state, retrieving the previousCell.
We then use F.asyncCheckAttempt to check whether the Mutex is currently in use. If it is, we register our callback in the previousCell and wait for it; otherwise, we can just acquire it.

Release:
We take the callback that was registered in our Cell and invoke it, accounting for the possibility that nobody is actually waiting for us.

Cancelation:
In case we were canceled while waiting for the Mutex:

  • Situation A: No one is waiting for us.
    Thus, we only need to ensure that any Fiber that follows awaits the previous cell; it will know how to handle the case where that cell is already released.
  • Situation B: Someone is waiting for us.
    Then we need to unregister our callback from the previous cell and replace it with the callback of the Fiber waiting for us; we must double-check whether the Mutex was released in the meantime and, if so, notify whoever is waiting for us directly.

I believe this allocates the bare minimum and is concurrency-safe.


PS: Thanks a lot to Arman (@armanbilge) for brainstorming with me on this implementation :)

@BalmungSan force-pushed the improve-mutex branch 2 times, most recently from 5b8b9d8 to 1c1b674 on January 7, 2023 14:45
@BalmungSan (Contributor, Author) commented:
@djspiewak @armanbilge before running the benchmarks I want to confirm with both of you whether you think they are appropriate.

@BalmungSan BalmungSan changed the title Improve Mutex implementation Optimize Mutex & AtomicCell Feb 5, 2023
@djspiewak (Member) previously approved these changes on Feb 5, 2023 and left a comment:
Nice work!

@djspiewak (Member) commented:

Let's see the benchmark results but overall lgtm

@BalmungSan requested review from djspiewak and armanbilge and removed the requests for armanbilge and djspiewak on February 7, 2023 14:51
@BalmungSan (Contributor, Author) commented:

@djspiewak @armanbilge benchmark results added to the description of the PR! :D

@djspiewak djspiewak closed this Feb 7, 2023
@djspiewak djspiewak reopened this Feb 7, 2023
@djspiewak (Member) commented:

Benchmarks look compelling! Thank you!

@djspiewak djspiewak merged commit 2f3ed2a into typelevel:series/3.x Feb 7, 2023
@BalmungSan BalmungSan deleted the improve-mutex branch February 7, 2023 23:46