Improve audio player behavior #4572

Merged: 8 commits merged into dev from audio_player_atomic on Feb 17, 2024

rom1v (Collaborator) commented Jan 7, 2024

This PR improves internal implementation details of the audio player.

Atomics

The main change consists of removing lock contention between the audio receiver thread and the audio output thread in the "happy path". Synchronization is replaced by atomics. Locking is kept for corner cases where the writer (receiver) thread needs to "read" (to consume/drop samples).
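
As an illustration of the idea only (this is not the scrcpy implementation), here is a minimal single-producer/single-consumer ring buffer using C11 atomics. All names (`struct ring`, `ring_push`, `ring_pop`, `RING_CAPACITY`) are hypothetical. The producer only writes `head` and the consumer only writes `tail`, so no lock is needed as long as each index has a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8 // hypothetical size; one slot is always left empty

struct ring {
    int16_t data[RING_CAPACITY];
    atomic_size_t head; // next slot to write, modified only by the producer
    atomic_size_t tail; // next slot to read, modified only by the consumer
};

static void ring_init(struct ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

// Producer side: returns false if the buffer is full
static bool ring_push(struct ring *r, int16_t sample) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t next = (head + 1) % RING_CAPACITY;
    if (next == atomic_load_explicit(&r->tail, memory_order_acquire)) {
        return false;
    }
    r->data[head] = sample;
    // Release: the sample is published before the new head becomes visible
    atomic_store_explicit(&r->head, next, memory_order_release);
    return true;
}

// Consumer side: returns false if the buffer is empty
static bool ring_pop(struct ring *r, int16_t *out) {
    assert(out);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&r->head, memory_order_acquire)) {
        return false;
    }
    *out = r->data[tail];
    size_t next = (tail + 1) % RING_CAPACITY;
    atomic_store_explicit(&r->tail, next, memory_order_release);
    return true;
}
```

In such a design, a producer that needs to drop samples would have to move `tail` itself, which breaks the single-writer assumption. That is the kind of corner case for which this PR keeps a lock.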

Compensation thresholds

To preserve a target latency between the input and the output, compensation (think "resampling", see blogpost) is applied to the audio samples. The compensation is proportional to the difference between the actual buffering level and the target buffering level.

To avoid spurious compensation caused by estimation noise, it was previously enabled only if this difference exceeded 1 ms. However, the buffering level does not change continuously: it increases abruptly when a packet is received and decreases abruptly when an audio block is consumed, so it is estimated with a rolling average. This estimate may sometimes deviate enough to trigger unwanted compensation.

To avoid the problem, make two changes:

  • increase the rolling average smoothness
  • increase the threshold to enable compensation from 1 ms to 4 ms

But keep a smaller threshold (1 ms) for disabling compensation, so that the buffering level is restored closer to the target value. This avoids keeping the actual level close to the compensation threshold.
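
The two thresholds form a hysteresis. A minimal sketch, assuming a 48 kHz sample rate (so 4 ms is 192 samples and 1 ms is 48 samples); the helper name `compensation_update` and the macros are hypothetical, not taken from the scrcpy sources:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SAMPLE_RATE 48000 // assumed sample rate, in samples per second
#define ENABLE_THRESHOLD (SAMPLE_RATE * 4 / 1000)  // 4 ms = 192 samples
#define DISABLE_THRESHOLD (SAMPLE_RATE * 1 / 1000) // 1 ms = 48 samples

// Hypothetical helper: returns whether compensation should be active, given
// whether it is currently active and the averaged buffering level
static bool
compensation_update(bool active, int avg_buffering, int target_buffering) {
    assert(target_buffering >= 0);
    int diff = abs(avg_buffering - target_buffering);
    // A larger threshold to start compensating, a smaller one to stop
    int threshold = active ? DISABLE_THRESHOLD : ENABLE_THRESHOLD;
    return diff > threshold;
}
```

With these values, an average of 2300 against a target of 2400 (a difference of 100 samples) does not start compensation, but it does keep an already active compensation running until the difference falls to 48 samples or less.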

Here is a log capture before the changes (scrcpy -Vverbose); note the spurious non-zero compensation values:

VERBOSE: [Audio] Buffering: target=2400 avg=2353.506104 cur=2610 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2354.768311 cur=2370 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2300.147705 cur=1890 compensation=99
VERBOSE: [Audio] Buffering: target=2400 avg=2376.177734 cur=2635 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2359.578613 cur=2395 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2340.571777 cur=2635 compensation=59
VERBOSE: [Audio] Buffering: target=2400 avg=2343.081055 cur=2649 compensation=56
VERBOSE: [Audio] Buffering: target=2400 avg=2360.970947 cur=2423 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2365.144531 cur=2423 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2361.825684 cur=2423 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2369.891357 cur=2663 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2368.985352 cur=2423 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2354.009277 cur=2183 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2355.908203 cur=2903 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2385.742920 cur=2663 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2274.370605 cur=1943 compensation=125
VERBOSE: [Audio] Buffering: target=2400 avg=2205.103271 cur=1735 compensation=194
VERBOSE: [Audio] Buffering: target=2400 avg=2144.965820 cur=2023 compensation=255
VERBOSE: [Audio] Buffering: target=2400 avg=2328.314941 cur=2327 compensation=71
VERBOSE: [Audio] Buffering: target=2400 avg=2413.046875 cur=2345 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2303.484375 cur=1865 compensation=96
VERBOSE: [Audio] Buffering: target=2400 avg=2282.799561 cur=2369 compensation=117
VERBOSE: [Audio] Buffering: target=2400 avg=2449.577148 cur=2398 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2423.165527 cur=2398 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2492.875244 cur=2398 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2480.594482 cur=2398 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2394.034912 cur=2398 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2286.450928 cur=1438 compensation=113
VERBOSE: [Audio] Buffering: target=2400 avg=2260.310547 cur=2186 compensation=139
VERBOSE: [Audio] Buffering: target=2400 avg=2252.251953 cur=1981 compensation=147
VERBOSE: [Audio] Buffering: target=2400 avg=2422.288330 cur=2498 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2572.453613 cur=2498 compensation=-172

The compensation values are expressed in samples per 4 seconds. For example, a value of 96 means 24 samples compensated per second: for 48000 input samples, 48024 samples are written to the buffer, which adds 500 µs of compensation.
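
The same arithmetic as a small sketch. The function name and the macro are hypothetical; the 4-second period and the 48000 samples per second come from the paragraph above.

```c
#include <assert.h>

#define COMPENSATION_PERIOD_SECONDS 4 // logged values are per 4 seconds

// Duration added (or removed, if negative) per second of audio, in
// microseconds, for a logged compensation value
static long long
compensation_us_per_second(int logged_value, int sample_rate) {
    assert(sample_rate > 0);
    long long samples_per_second = logged_value / COMPENSATION_PERIOD_SECONDS;
    return samples_per_second * 1000000 / sample_rate;
}
```

For a logged value of 96 at 48000 samples per second, this gives 24 samples per second, that is 500 µs.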

And after the changes:

VERBOSE: [Audio] Buffering: target=2400 avg=2364.726318 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2350.423096 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2358.727783 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2339.984131 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2335.473145 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2332.891602 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2333.311523 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2334.060791 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2327.629883 cur=2500 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2338.355957 cur=2740 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2361.206299 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2349.394043 cur=1540 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2339.491943 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2339.862305 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2344.531494 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2381.400879 cur=1780 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2437.357422 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2486.074463 cur=2740 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2541.514648 cur=2740 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2544.961914 cur=1540 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2518.584473 cur=1780 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2484.096680 cur=2740 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2474.057373 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2465.440430 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2469.177002 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2489.142090 cur=2020 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2481.343994 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2496.606689 cur=2500 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2511.326172 cur=2260 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2512.120361 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2507.924072 cur=2980 compensation=0
VERBOSE: [Audio] Buffering: target=2400 avg=2491.568604 cur=2980 compensation=0

Spurious compensation is still possible, but less likely (of course, expected compensation still occurs, for example on buffer underflow).

rom1v force-pushed the audio_player_atomic branch 4 times, most recently from 52fe7b5 to c8f00f0 (January 19, 2024 16:13)
rom1v force-pushed the audio_player_atomic branch 3 times, most recently from 0126abd to 4dd201e (January 24, 2024 15:17)
rom1v force-pushed the audio_player_atomic branch 3 times, most recently from edad610 to 8fe2e29 (February 2, 2024 13:47)
rom1v force-pushed the audio_player_atomic branch from 0c5b7af to 4e35761 (February 16, 2024 10:59)
This avoids unreasonable values which could lead to integer overflow.

PR #4572 <#4572>
The audio output thread only reads samples from the buffer, and most of
the time, the audio receiver thread only writes samples to the buffer.
In these cases, using atomics avoids lock contention.

There are still corner cases where the audio receiver thread needs to
"read" samples (and drop them), so lock only in these cases.

PR #4572 <#4572>
Use different thresholds for enabling and disabling compensation.

Concretely, enable compensation if the difference between the average
and the target buffering levels exceeds 4 ms (instead of 1 ms). This
avoids unnecessary compensation due to small noise in buffering level
estimation.

But keep a smaller threshold (1 ms) for disabling compensation, so that
the buffering level is restored closer to the target value. This avoids
keeping the actual level close to the compensation threshold.

PR #4572 <#4572>
The buffering level does not change continuously: it increases abruptly
when a packet is received, and decreases abruptly when an audio block is
consumed.

To estimate the buffering level, a rolling average is used.

To make the buffering more stable, increase the smoothness of this
rolling average. This decreases the risk of enabling audio compensation
due to an estimation error.

PR #4572 <#4572>
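
To illustrate the commit message above, here is a minimal rolling average where a larger `range` gives a smoother estimate. This is a sketch under assumptions: the names (`struct rolling_average`, `rolling_average_push`) and the ranges used in the note below are hypothetical and are not the values changed by this PR.

```c
#include <assert.h>

// Hypothetical rolling average: once `range` values have been pushed, each
// new value contributes 1/range of the result. A larger range is smoother.
struct rolling_average {
    float avg;
    unsigned range;
    unsigned count;
};

static void
rolling_average_init(struct rolling_average *ra, unsigned range) {
    assert(range > 0);
    ra->avg = 0;
    ra->range = range;
    ra->count = 0;
}

static void
rolling_average_push(struct rolling_average *ra, float value) {
    if (ra->count < ra->range) {
        ++ra->count; // plain mean until `range` values have been seen
    }
    ra->avg = ((ra->count - 1) * ra->avg + value) / ra->count;
}
```

After a steady level of 2400, a single reading of 2900 moves the average by 62.5 samples with a range of 8, but only by 15.625 samples with a range of 32, so the smoother estimate is less likely to cross a compensation threshold by accident.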
If playback starts too early, insert silence until the buffer is filled
up to at least target_buffering before playing.

PR #4572 <#4572>
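
A sketch of the behavior described in the commit message above, assuming a simplified player that only tracks sample counts. The struct, its fields other than `target_buffering`, and `player_fill` are hypothetical names, and a real output callback would copy actual samples from its buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Hypothetical player state; names are illustrative
struct player {
    uint32_t buffered;         // samples currently available
    uint32_t target_buffering; // samples required before playback starts
    bool played;               // true once playback has started
};

// Fills `out` with `count` mono 16-bit samples.
// Returns the number of samples consumed from the buffer.
static uint32_t
player_fill(struct player *p, int16_t *out, uint32_t count) {
    assert(out);
    // Start from silence; a real player would then copy its samples over it
    memset(out, 0, count * sizeof(*out));
    if (!p->played && p->buffered < p->target_buffering) {
        // Too early: output silence only and leave the buffer untouched
        return 0;
    }
    p->played = true;
    uint32_t n = count < p->buffered ? count : p->buffered;
    p->buffered -= n; // this sketch only tracks the counters
    return n;
}
```

Until the buffer reaches `target_buffering`, every call outputs silence and consumes nothing, so the buffer keeps filling at the producer's rate.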
The assumption that underflow and overbuffering are caused by jitter
(and that the delay between the producer and consumer will be caught up)
does not always hold.

For example, if the consumer does not consume at the expected rate (the
SDL callback is not called often enough, which is an audio output
issue), many samples will be dropped due to overbuffering, decreasing
the average buffering indefinitely.

Prevent the average buffering from becoming negative, to limit the
consequences of unexpected behavior.

PR #4572 <#4572>
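
The clamping described in the commit message above can be sketched as follows. This assumes the average estimate is lowered by the number of dropped samples; the helper name is hypothetical and the actual update in scrcpy may differ.

```c
#include <assert.h>

// Hypothetical helper: lower the average buffering estimate after `dropped`
// samples were discarded, without letting it go below zero
static float
avg_buffering_after_drop(float avg, unsigned dropped) {
    assert(avg >= 0);
    float updated = avg - (float) dropped;
    return updated < 0 ? 0 : updated;
}
```

Without the clamp, repeated drops caused by a stalled consumer would push the estimate further and further below zero.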
rom1v force-pushed the audio_player_atomic branch from 4e35761 to a7cf4da (February 17, 2024 15:14)
rom1v merged commit a7cf4da into dev on Feb 17, 2024
armm29393 added a commit to armm29393/scrcpy-root that referenced this pull request May 24, 2024
scrcpy v2.4

Changes since v2.3.1:
 - Add UHID keyboard and mouse support (Genymobile#4473)
 - Simulate tilt multitouch by pressing Shift (Genymobile#4529)
 - Add rotation support for non-default display (Genymobile#4698)
 - Improve audio player (Genymobile#4572)
 - Adapt to display API changes in Android 15 (Genymobile#4646, Genymobile#4656, Genymobile#4657)
 - Adapt audio workarounds to Android 14 (Genymobile#4492)
 - Fix clipboard for IQOO devices on Android 14 (Genymobile#4492, Genymobile#4589, Genymobile#4703)
 - Fix integer overflow for audio packet duration (Genymobile#4536)
 - Rework cleanup (Genymobile#4649)
 - Upgrade FFmpeg to 6.1.1 in Windows releases (Genymobile#4713)
 - Upgrade libusb to 1.0.27 in Windows releases (Genymobile#4713)
 - Various technical fixes
rom1v added a commit that referenced this pull request May 27, 2024
PR #4572 removed the need for locks except for corner cases. Now replace
the remaining lock sections by atomics.

Refs #4572 <#4572>
rom1v added a commit that referenced this pull request May 28, 2024
PR #4572 removed the need for locks except for corner cases. Now replace
the remaining lock sections by atomics.

Refs #4572 <#4572>
rom1v added a commit that referenced this pull request May 29, 2024
PR #4572 removed the need for locks except for corner cases. Now replace
the remaining lock sections by atomics.

Refs #4572 <#4572>
rom1v added a commit that referenced this pull request May 30, 2024
PR #4572 removed the need for locks except for corner cases. Now replace
the remaining lock sections by atomics.

Refs #4572 <#4572>