WeakRB: New and proven SPSC FIFO implementation #2314

uklotzde · 2019-10-04T09:45:33Z

A Weak-consistency Ring Buffer implementing a bounded, lock-free, single-producer/single-consumer FIFO. This could gradually replace the legacy implementation in lib/portaudio. The element type must provide a default constructor and support move assignment.

Used for:

Multi-threaded analysis (replaces MpscFifo)
~~Lock-free CachingReaderWorker (new)~~ deferred
CachingReaderWorker <-> CachingReader (engine thread)
StatsManager
- ...fixing a memory leak and avoiding allocations

References:

daschuer · 2019-10-04T12:04:51Z

While the new fifo looks intersting, I think it is not the best place to use it.
It is because, in the double load situation, we don't want to process old tracks on the queue. We always immediately want to process the last track the user has thrown into the deck.
So I think a ControlValuAtomic based solution fits better.

daschuer · 2019-10-04T12:19:18Z

No ... we don't enven need a ControlValueAtomic. Just a QAtomicPointer, that contains the track to load and a reference track to compare if the track is already loaded.

uklotzde · 2019-10-04T12:29:32Z

No ... we don't enven need a ControlValueAtomic. Just a QAtomicPointer, that contains the track to load and a reference track to compare if the track is already loaded.

pre C++20: https://en.cppreference.com/w/cpp/memory/shared_ptr
since C++20: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic2

uklotzde · 2019-10-04T12:30:33Z

I'll remove the commit from this PR.

uklotzde · 2019-10-05T11:22:56Z

Getting all the modulo calculations right was tricky. The new interface includes some nice convenience methods.

daschuer · 2019-10-08T06:38:55Z

This is really an interesting topic. Unfortunately it disproves some of my assumptions about the use of the volatile keyword, which are also part of the Mixxx source.
https://stackoverflow.com/questions/26307071/does-the-c-volatile-keyword-introduce-a-memory-fence

Does it mean that the original PaUtilRingBuffer is not guaranteed to work, because memory fences are not guaranteed with volatile? Scary.

The Lamport queue from https://www.di.ens.fr/~guatto/papers/sbac13.pdf has also no memory explicit fences. The author assumes sequential access. How is this related?

uklotzde · 2019-10-08T06:52:40Z

volatile is useless for synchronization purposes. It guarantees that data is actually written to an address, i.e. required for memory-mapped access.

The PortAudio ring buffer uses explicit memory barriers, probably an implementation variant of the mentioned Lamport queue.

daschuer

can we make use of https://en.cppreference.com/w/cpp/memory/uninitialized_move
and avoid to initially fill the cue with default constructed objects?

daschuer · 2019-10-08T07:53:50Z

src/util/spscfifo.h

+        if (writable > 0) {
+            DEBUG_ASSERT(writable < m_size);
+            for (size_type i = 0; i < writable; ++i) {
+                m_data[(m_pfront + i) % m_size] = std::move(data[i]);


Here we move the object out of the callers memory. I guess that is surprising and should be avoided.
Can we use value_type&& for the move type?
For me it is also unclear how the compiler distinguishes both push variants. Can we use different names to make it obvious?

(m_pfront + i) % m_size fails in case of overflow and max size_type is not dividable by m_size.

I've fixed the overflow detection, adjusting the capacity if necessary.

The const vs. non-const overload is chosen be the argument type. I have renamed the copying push operations to distinguish them explicitly. By default push moves values into the queue while pop moves values out of the queue. Moving is supposed to be more general and efficient than copying.

daschuer · 2019-10-08T08:11:04Z

src/util/spscfifo.h

+            for (size_type i = 0; i < writable; ++i) {
+                m_data[(m_pfront + i) % m_size] = data[i];
+            }
+            m_pfront = (m_pfront + writable) % m_size;


why we store here "relaxed" and "release" short after.

I think we should make sure to read and write the atomics only once par fifo access and cache their values in a local variable during usage.

m_cback and m_pfront are already non-shared, cached values. Both members are documented accordingly, that's the basic idea of the algorithm.

Introducing a local variable is not necessary, let's leave this optimization to the compiler.

It is IMHO necessary, to better understand the code. And to have a performance gain independent from the used comilerer.

The atomic type gurantees that the value is actually written and not optimized away. So we must not use it if optimization is desired.

In this case I thing = is overloaded with load which defaults to a sequential store.

It could be also interesting, if the new code actually is faster. Sometimes trivial quirks destroys a theoretical performance gain making this PR pointless.

In this case a memory fence is IMHO negligible compared to calling a sequential constructor vs. to a vectorized memcpy.

Again: m_pfront and m_cback are simple integers, not atomics. The atomics m_front and m_back are accessed through explicit load and store operations.

The use cases were this new FIFO is used only push or pop single messages into/from the queue, not for plain audio data. The memcpy is dangerous if those types are not PODs. Safety first. For PODs the compiler might be able to optimize the loops away. If this is prevented by the modulo operation then two separate loops could be used.

The existing code already contains a memory leak, because the detached alloc/free operations of the POD message type are used incorrectly. And this is only a minor bug.

Ok, now I see the issue.
So we store the same value redundant in two variables?
That should be fixed.

That's the point of the algorithm!!!

uklotzde · 2019-10-08T19:42:56Z

I've continued with the original Rust naming style after starting it this way out of habit. Negotiable.

daschuer · 2019-10-09T10:34:11Z

src/util/spscfifo.h

+              m_cback(0) {
+        DEBUG_ASSERT(m_size > capacity);
+        DEBUG_ASSERT(m_size < m_size + capacity); // no overflow
+        DEBUG_ASSERT(std::atomic_is_lock_free(&m_front));


m_front.is_lock_free() will look nicer.

daschuer · 2019-10-09T10:38:10Z

src/util/spscfifo.h

+    typedef S size_type;
+
+    explicit SpscFifo(size_type capacity)
+            : m_size(adjust_capacity(capacity) + 1),


What is the use for the extra field?
In one use, we rely on the fact that only one element is queued. Does this fail now?

The use of the cue is there ugly anyway. I think we should fix the user code as well.

daschuer · 2019-10-09T10:39:00Z

src/analyzer/analyzerthread.cpp

@@ -83,7 +83,7 @@ AnalyzerThread::AnalyzerThread(
          m_dbConnectionPool(std::move(dbConnectionPool)),
          m_pConfig(std::move(pConfig)),
          m_modeFlags(modeFlags),
-          m_nextTrack(MpscFifoConcurrency::SingleProducer),
+          m_nextTrack(1),


I think we should not use a queue here.

daschuer · 2019-10-09T11:06:12Z

src/util/spscfifo.h

+
+  private:
+    static size_type adjust_capacity(size_type capacity) {
+        VERIFY_OR_DEBUG_ASSERT(capacity <= std::numeric_limits<size_type>::max() / 2) {


Why we limit the capacity to half the possible size?

The % trick only works if there is no rest after an overflow. We need to adjust the size for this.

Let's say size_type is char.
We have possible index 0...255
If capacity is 100 we have 100 + 1 (vor whatever reason)
Index 255 % 101 = 53
Index (255 + 1) % 101 = 0

If we force the capacity to 128, it works
Index 255 % 128 = 127 (last index)
Index (255 + 1) % 127 = 0 (first index)

This is done here:

mixxx/src/util/fifo.h

Line 19 in 2944e2f

size = roundUpToPowerOf2(size);

daschuer · 2019-10-09T11:15:42Z

What is the main point of this PR?
Just fixing a memory leak?

I think if we introduce our own FIFO here, It should be fully optimized and real greate.

uklotzde added this to the 2.3.0 milestone Oct 4, 2019

uklotzde changed the title ~~WeakRB: New and proven SPSC FIFO implementation~~ [WiP] WeakRB: New and proven SPSC FIFO implementation Oct 4, 2019

uklotzde changed the title ~~[WiP] WeakRB: New and proven SPSC FIFO implementation~~ WeakRB: New and proven SPSC FIFO implementation Oct 5, 2019

uklotzde added 7 commits October 8, 2019 08:13

New and proven SpscFifo implementation (WeakRB)

ae7b97d

Use new SpscFifo in CachingReader

91327b7

Use SpscFifo in WorkerThreadScheduler

ff727f1

Prevent copy/move construction

86804e1

Fix modulo index operations

0a72220

Use SpscFifo for StatsPipe

75d34b5

Fix memory leak caused by dropped StatReports

5bfe11b

daschuer requested changes Oct 8, 2019

View reviewed changes

uklotzde added 2 commits October 8, 2019 21:26

Add capacity overflow checks and prevent overflow

5cd7981

Give overloaded, copying push operations a separate name

969defb

daschuer requested changes Oct 9, 2019

View reviewed changes

uklotzde closed this Oct 9, 2019

uklotzde mentioned this pull request Oct 13, 2019

Add a bounded SPSC queue crossbeam-rs/crossbeam#338

Open

uklotzde deleted the weakrb branch October 17, 2019 22:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WeakRB: New and proven SPSC FIFO implementation #2314

WeakRB: New and proven SPSC FIFO implementation #2314

uklotzde commented Oct 4, 2019 •

edited

Loading

daschuer commented Oct 4, 2019

daschuer commented Oct 4, 2019

uklotzde commented Oct 4, 2019

uklotzde commented Oct 4, 2019

uklotzde commented Oct 5, 2019

daschuer commented Oct 8, 2019

uklotzde commented Oct 8, 2019

daschuer left a comment

daschuer Oct 8, 2019

daschuer Oct 8, 2019

uklotzde Oct 8, 2019

uklotzde Oct 8, 2019

daschuer Oct 8, 2019

daschuer Oct 8, 2019

uklotzde Oct 8, 2019

uklotzde Oct 8, 2019

daschuer Oct 9, 2019

uklotzde Oct 9, 2019

daschuer Oct 9, 2019

uklotzde Oct 9, 2019

uklotzde commented Oct 8, 2019

daschuer Oct 9, 2019

daschuer Oct 9, 2019

daschuer Oct 9, 2019

daschuer Oct 9, 2019

daschuer commented Oct 9, 2019

WeakRB: New and proven SPSC FIFO implementation #2314

WeakRB: New and proven SPSC FIFO implementation #2314

Conversation

uklotzde commented Oct 4, 2019 • edited Loading

daschuer commented Oct 4, 2019

daschuer commented Oct 4, 2019

uklotzde commented Oct 4, 2019

uklotzde commented Oct 4, 2019

uklotzde commented Oct 5, 2019

daschuer commented Oct 8, 2019

uklotzde commented Oct 8, 2019

daschuer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uklotzde commented Oct 8, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daschuer commented Oct 9, 2019

uklotzde commented Oct 4, 2019 •

edited

Loading