-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WeakRB: New and proven SPSC FIFO implementation #2314
Conversation
While the new fifo looks intersting, I think it is not the best place to use it. |
No ... we don't enven need a ControlValueAtomic. Just a QAtomicPointer, that contains the track to load and a reference track to compare if the track is already loaded. |
pre C++20: https://en.cppreference.com/w/cpp/memory/shared_ptr |
I'll remove the commit from this PR. |
Getting all the modulo calculations right was tricky. The new interface includes some nice convenience methods. |
This is really an interesting topic. Unfortunately it disproves some of my assumptions about the use of the volatile keyword, which are also part of the Mixxx source. Does it mean that the original PaUtilRingBuffer is not guaranteed to work, because memory fences are not guaranteed with volatile? Scary. The Lamport queue from https://www.di.ens.fr/~guatto/papers/sbac13.pdf has also no memory explicit fences. The author assumes sequential access. How is this related? |
The PortAudio ring buffer uses explicit memory barriers, probably an implementation variant of the mentioned Lamport queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make use of https://en.cppreference.com/w/cpp/memory/uninitialized_move
and avoid to initially fill the cue with default constructed objects?
if (writable > 0) { | ||
DEBUG_ASSERT(writable < m_size); | ||
for (size_type i = 0; i < writable; ++i) { | ||
m_data[(m_pfront + i) % m_size] = std::move(data[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we move the object out of the callers memory. I guess that is surprising and should be avoided.
Can we use value_type&& for the move type?
For me it is also unclear how the compiler distinguishes both push variants. Can we use different names to make it obvious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(m_pfront + i) % m_size fails in case of overflow and max size_type is not dividable by m_size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the overflow detection, adjusting the capacity if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The const vs. non-const overload is chosen be the argument type. I have renamed the copying push operations to distinguish them explicitly. By default push moves values into the queue while pop moves values out of the queue. Moving is supposed to be more general and efficient than copying.
for (size_type i = 0; i < writable; ++i) { | ||
m_data[(m_pfront + i) % m_size] = data[i]; | ||
} | ||
m_pfront = (m_pfront + writable) % m_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we store here "relaxed" and "release" short after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make sure to read and write the atomics only once par fifo access and cache their values in a local variable during usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_cback
and m_pfront
are already non-shared, cached values. Both members are documented accordingly, that's the basic idea of the algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introducing a local variable is not necessary, let's leave this optimization to the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is IMHO necessary, to better understand the code. And to have a performance gain independent from the used comilerer.
The atomic type gurantees that the value is actually written and not optimized away. So we must not use it if optimization is desired.
In this case I thing = is overloaded with load which defaults to a sequential store.
It could be also interesting, if the new code actually is faster. Sometimes trivial quirks destroys a theoretical performance gain making this PR pointless.
In this case a memory fence is IMHO negligible compared to calling a sequential constructor vs. to a vectorized memcpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again: m_pfront
and m_cback
are simple integers, not atomics. The atomics m_front
and m_back
are accessed through explicit load and store operations.
The use cases were this new FIFO is used only push or pop single messages into/from the queue, not for plain audio data. The memcpy is dangerous if those types are not PODs. Safety first. For PODs the compiler might be able to optimize the loops away. If this is prevented by the modulo operation then two separate loops could be used.
The existing code already contains a memory leak, because the detached alloc/free operations of the POD message type are used incorrectly. And this is only a minor bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I see the issue.
So we store the same value redundant in two variables?
That should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the point of the algorithm!!!
I've continued with the original Rust naming style after starting it this way out of habit. Negotiable. |
m_cback(0) { | ||
DEBUG_ASSERT(m_size > capacity); | ||
DEBUG_ASSERT(m_size < m_size + capacity); // no overflow | ||
DEBUG_ASSERT(std::atomic_is_lock_free(&m_front)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_front.is_lock_free() will look nicer.
typedef S size_type; | ||
|
||
explicit SpscFifo(size_type capacity) | ||
: m_size(adjust_capacity(capacity) + 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use for the extra field?
In one use, we rely on the fact that only one element is queued. Does this fail now?
The use of the cue is there ugly anyway. I think we should fix the user code as well.
@@ -83,7 +83,7 @@ AnalyzerThread::AnalyzerThread( | |||
m_dbConnectionPool(std::move(dbConnectionPool)), | |||
m_pConfig(std::move(pConfig)), | |||
m_modeFlags(modeFlags), | |||
m_nextTrack(MpscFifoConcurrency::SingleProducer), | |||
m_nextTrack(1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not use a queue here.
|
||
private: | ||
static size_type adjust_capacity(size_type capacity) { | ||
VERIFY_OR_DEBUG_ASSERT(capacity <= std::numeric_limits<size_type>::max() / 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we limit the capacity to half the possible size?
The % trick only works if there is no rest after an overflow. We need to adjust the size for this.
Let's say size_type is char.
We have possible index 0...255
If capacity is 100 we have 100 + 1 (vor whatever reason)
Index 255 % 101 = 53
Index (255 + 1) % 101 = 0
If we force the capacity to 128, it works
Index 255 % 128 = 127 (last index)
Index (255 + 1) % 127 = 0 (first index)
This is done here:
Line 19 in 2944e2f
size = roundUpToPowerOf2(size); |
What is the main point of this PR? I think if we introduce our own FIFO here, It should be fully optimized and real greate. |
A Weak-consistency Ring Buffer implementing a bounded, lock-free, single-producer/single-consumer FIFO. This could gradually replace the legacy implementation in lib/portaudio. The element type must provide a default constructor and support move assignment.
Used for:
MpscFifo
)Lock-freedeferredCachingReaderWorker
(new)CachingReaderWorker
<->CachingReader
(engine thread)StatsManager
References: