Towards Fixing ALSA

I wish to raise a discussion about taking ALSA to the next level:

In 2009, Paul Davis gave an interesting talk at the Linux Plumbers Conference in which he outlined a number of design flaws in the way we do audio I/O in GNU/Linux; in 2018 those flaws still stand. [1]

I believe we should take action and come up with a patchset for ALSA that fixes these flaws:

The #1 thing that causes issues with audio on Linux today is the lack of
a single unified audio API stack that is used by all applications. --Paul Davis

Common user complaints and the associated technical flaws:

  1. User: "Why can't I use different sound cards simultaneously as an aggregated device with synced clocks?" Flaw: ALSA does not support asynchronous sample rate conversion to align two or more sound cards' clocks.

  2. User: "Why can't I run a low-latency audio program like Ardour and something with higher latency like my desktop notification sounds simultaneously to share the same sound card?" Flaw: ALSA has no support for multiple buffer queue lengths with different quality of service behavior on the same card.

  3. User: "How come when I reboot, my two sound cards swap IDs and I have to reconfigure my sound set up?" Flaw: ALSA does not provide unique device IDs.

  4. User: "Why can't I easily map my sound ports to different channels?" Flaw: ALSA does not provide port names.

  5. User: "Why do I need to open a low-level hardware device myself when I want to write a simple audio application?" Flaw: The way ALSA is structured, it currently encourages applications to control hardware directly and use a push model at the lowest level (this part should be deprecated).

  6. User: "Why can't I get decent audio latency without using a RT-patched kernel and fiddling with IRQ scheduling?" Flaw: Timing of I/O too coarse, buffer positions only known at resolution of last audio IRQ, and transfer of samples to/from device is driven directly by arrival of audio IRQ.

  7. User: "I have two (or more) of the same device. They show up in seemingly random order." and "I want to temporarily dis/reconnect devices without loosing the mapping (audio+midi ports)." FLAW: ALSA does not provide reliable unique device IDs.

Discussion

Why do audio apps expect to get direct access to the device? We don't do this for video. The "device" oriented approach that ALSA uses needs to be deprecated and made less visible, and applications should use a more "service" oriented API. Both JACK and CoreAudio provide examples of this.

One may argue that PulseAudio proves you can catch and redirect the use of ALSA and force it into a "service" oriented model. But there are aspects of ALSA that still make this a broken approach: Robin Gareus recently commented on LAD [3] that a number of problems simply cannot be overcome by building on ALSA. For example, many users complain "I can't use my many USB mics with Ardour" or "My MIDI devices are garbled"; ALSA does not provide port names; and the same device cannot be used in different configurations (aggregates) by different applications at the same time, with potentially different sample rates and buffer sizes for each application. CoreAudio on macOS handles all of this; why can't we get it right?

OK, so PulseAudio has a "service" based audio API; why can't it make all of this happen? Because it is not a "Pull" model at the lowest level: it uses a "Push" model!

A Pull model is one where the subsystem responsible for serving audio decides when to read from and write to the actual sound hardware and how many bytes to move, rather than a Push model where the application producing the sound decides all of these things. A Pull model is clearly necessary at the lowest level of the audio subsystem, because capture and playback have hard timing constraints: the sound card is a real-time device. That means deprecating OSS and the part of ALSA that gives users raw device access, and providing a new unified server API from the kernel that a userspace daemon can use to serve audio.
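
To illustrate the distinction, here is a self-contained toy in C. None of the names are real ALSA, PulseAudio or JACK calls; the point is only who decides when data moves and how much: in the push loop the application picks the timing and the block size, while in the pull version the service owns the clock and calls the application back (which is how JACK and CoreAudio clients work).

```c
/* Toy contrast of "push" vs "pull" audio models (no real API is used). */
#include <stdio.h>
#include <stddef.h>

/* Stands in for an application's synth/mixer code: fill a buffer. */
static void render_audio(float *buf, size_t nframes)
{
    for (size_t i = 0; i < nframes; i++)
        buf[i] = 0.0f;
}

/* Push model: the application chooses both the timing and the block size
 * and "writes" to the device whenever it feels like it. */
static void push_model(void)
{
    float buf[256];
    for (int i = 0; i < 4; i++) {
        render_audio(buf, 256);          /* app decides: 256 frames, now */
        printf("push: app wrote 256 frames\n");
    }
}

/* Pull model: the service owns the clock.  It decides when audio is
 * needed and how many frames, and calls the application back. */
typedef void (*process_cb)(float *out, size_t nframes);

static void service_run(process_cb cb)
{
    float buf[64];
    for (int cycle = 0; cycle < 4; cycle++) {
        cb(buf, 64);                     /* service decides: 64 frames, now */
        printf("pull: service asked the app for 64 frames\n");
    }
}

static void app_callback(float *out, size_t nframes)
{
    render_audio(out, nframes);          /* app only fills what it is asked for */
}

int main(void)
{
    push_model();
    service_run(app_callback);
    return 0;
}
```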

Timing

We need to ensure that a new unified server API meets everyone's needs, from ultra low-latency pro-audio users, to users who just want to hear their notifications ding once in a while and not waste much battery.

If we're to fix this properly, we need to get the timing of buffer positions right. Another design flaw of ALSA is that the timing of I/O is measured too coarsely: the positions of the read and write pointers are currently only known at the resolution of the last audio IRQ, and the arrival of that IRQ drives the transfer process. Instead, the kernel could take clean observations of the timing of sound buffer positions and give us access to them. Using these [buffer position, system time] pairs, we could recover a higher-resolution estimate of the buffer position at any time, and thus requeue audio samples via DMA to adjust for clock jitter and for the desired latency of the connected clients, up to some hardware limit.
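
As a sketch of what those pairs buy us (the struct and numbers are invented, not an existing kernel interface): given the last observed [position, time] pair and an estimate of the effective sample rate, userspace can estimate the hardware pointer at any moment instead of being stuck at IRQ resolution.

```c
/* Hypothetical sketch: estimating the hardware buffer position between
 * audio IRQs from the last [buffer position, system time] observation. */
#include <stdio.h>

struct pos_stamp {
    long   frames;    /* hardware pointer at the time of the observation */
    double time_sec;  /* monotonic system time of that observation       */
};

/* Linear extrapolation; `rate` is the effective sample rate measured
 * against the system clock (ideally DLL-filtered, see below). */
static long estimate_position(struct pos_stamp last, double rate, double now)
{
    return last.frames + (long)((now - last.time_sec) * rate);
}

int main(void)
{
    struct pos_stamp last = { .frames = 480000, .time_sec = 10.0000 };
    double rate = 48000.3;   /* slightly off nominal 48 kHz, as measured */

    /* 2.5 ms after the last IRQ we still know (approximately) where the
     * pointer is, rather than only knowing it at IRQ boundaries. */
    printf("estimated position: %ld frames\n",
           estimate_position(last, rate, 10.0025));
    return 0;
}
```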

Loosely speaking, the Linux kernel currently treats a sound device or sound card as a device that streams audio samples. But we need to remember that it should also stream a set of timing data, that is, the positions of the buffer pointers at given times, instead of assuming that the system clock is always in sync with the delivery of samples. This is partly because the audio master clock is never exactly in sync with the system clock, and partly because the kernel may not deliver samples exactly when you expect it to.

Paul Davis came up with a solution and spoke about this many years ago:

His idea was that audio IRQs become a source to resync a delay-locked loop (DLL [2]) that predicts the position of the sound buffer pointers, instead of the IRQs directly determining when to read and write. This would, first, let the user run different audio clients at different latencies, and second, enable theoretically lower latency, because you could now write/read the buffers "in between" audio IRQs. If a DLL tells you how fast the pointers are moving, you can use a different interrupt source (e.g. high-resolution timers) to write/read at other times. How far you can push this depends on the hardware design: in theory you could write just 8 samples ahead of the playback pointer, which would require DMA bus-master access. But, as Paul says, driving everything from the audio IRQ is a limiting design.
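
Below is a rough, self-contained implementation of such a DLL, following the second-order loop described in Fons Adriaensen's paper [2] (JACK uses a similar loop). The device clock error and IRQ jitter are simulated here; in real use the `observed` timestamps would be taken when each audio IRQ arrives.

```c
/* Sketch of the DLL from [2]: audio IRQs (simulated with jittered
 * timestamps) resync the loop, which then predicts period start times
 * and tracks the true period length / sample rate. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    const double fs      = 48000.0;         /* nominal sample rate        */
    const int    nframes = 256;             /* frames per audio IRQ       */
    const double tper    = nframes / fs;    /* nominal period in seconds  */

    /* Second-order loop coefficients for a ~0.5 Hz loop bandwidth. */
    const double omega = 2.0 * M_PI * 0.5 * tper;
    const double b = sqrt(2.0) * omega;
    const double c = omega * omega;

    double e2 = tper;       /* filtered period duration                   */
    double t0 = 0.0;        /* filtered start time of the current period  */
    double t1 = t0 + e2;    /* predicted start time of the next period    */

    double irq_time = 0.0;  /* simulated raw IRQ arrival time             */
    srand(1);

    for (int n = 1; n <= 2000; n++) {
        /* Device clock runs ~100 ppm fast; IRQ delivery jitters by 0.2 ms. */
        irq_time += tper / 1.0001;
        double observed = irq_time + 2e-4 * rand() / RAND_MAX;

        double e = observed - t1;   /* error: observed vs. predicted time */
        t0 = t1;
        t1 += b * e + e2;
        e2 += c * e;
    }

    /* The filtered period tells us the device's true rate; t0/t1 tell us
     * when periods start, so we can read/write "in between" IRQs. */
    printf("filtered rate: %.2f Hz (nominal %.0f Hz)\n", nframes / e2, fs);
    printf("current period started at t = %.6f s, next at %.6f s\n", t0, t1);
    return 0;
}
```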

All sound devices seem to trigger interrupts, so you can feed a DLL to recover the timing of buffer positions, but some devices might not have DMA. This means you would need to simulate DMA by keeping a 'safety buffer' ahead of the hardware pointer that you never read or write within. If the hardware expects to receive a whole buffer of contents and the protocol never allows modifying what the hardware has already received, then the 'safety buffer' size would correspond to the transfer block size for the device: you can never write closer than the chunk that is about to be transferred. This may or may not correspond to the IRQ period. In all cases the size of the 'safety buffer' would need to be reported to userspace. The sound hardware itself could perhaps be hardcoded to run at its lowest-latency setting.
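
A small sketch of that constraint (names and numbers are illustrative only): with the DLL estimating where the hardware pointer is right now, whoever fills the buffer may rewrite anything except the `safety` frames immediately ahead of that estimate, since the hardware may already have fetched them.

```c
/* Hypothetical sketch of the 'safety buffer' constraint. */
#include <stdio.h>
#include <stdbool.h>

/* Is it still safe to (re)write the frame at `pos`, given the current
 * DLL-estimated hardware pointer and the device's safety margin? */
static bool can_rewrite(long pos, long hw_estimate, long safety)
{
    return pos >= hw_estimate + safety;
}

int main(void)
{
    long hw     = 480120;  /* estimated hardware position, in frames */
    long safety = 64;      /* e.g. the device's transfer block size  */

    printf("frame %ld: %s\n", hw + 32,
           can_rewrite(hw + 32, hw, safety) ? "safe to rewrite" : "too late");
    printf("frame %ld: %s\n", hw + 128,
           can_rewrite(hw + 128, hw, safety) ? "safe to rewrite" : "too late");
    return 0;
}
```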

A further discussion of a complete solution to the audio problem may include:

  1. A plan to deprecate "device" based access API

  2. Formulating a server API that handles the following (a hypothetical sketch of such an API follows this outline):

  • Data format, including sample rate
  • Signal routing
  • Start/stop
  • Latency inquiries
  • Synchronization
  • Device interaction, handled by a server daemon

Details of timing considerations:

  1. Capture and communication of best-possible timestamps of sound buffer positions

  2. Communication of the latency/buffering sizes to userspace

  3. Kernel <-> userspace audio communication (e.g. an "audio framebuffer" interface instead of read()/write())

  • Communicating audio data

  • Communicating timing information

  • Communicating control flow (e.g. does the kernel schedule/wake user threads, or does userspace schedule everything using high-resolution timers?)

  4. What should go in the kernel and what should go in userspace
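
To make the outline above concrete, here is a purely hypothetical sketch of what a pull-model server API could look like. It is not an existing or proposed interface; every name is invented, and it exists only to show the shape of the items listed above (format negotiation, routing by port name, start/stop, latency inquiry, and a process callback that carries timing information).

```c
/* Hypothetical pull-model server API sketch (not a real or proposed
 * interface; all names are invented for illustration). */
#include <stddef.h>
#include <stdint.h>

/* The server pulls: it calls the client back when it needs nframes of
 * audio, passing the time at which the first output frame will reach
 * the converters, so clients can do their own synchronization. */
typedef int (*audio_process_cb)(float **inputs, float **outputs,
                                size_t nframes,
                                uint64_t output_time_ns,
                                void *arg);

typedef struct audio_stream audio_stream;   /* opaque client handle */

/* Data format, sample rate and channel count are negotiated up front;
 * per-stream buffering (and thus the latency/power trade-off) is chosen
 * by the server, not by poking the hardware directly. */
audio_stream *audio_stream_open(const char *client_name,
                                unsigned rate, unsigned channels,
                                audio_process_cb cb, void *arg);

/* Routing is by stable port name, not by card index (flaws 3, 4, 7). */
int audio_stream_connect(audio_stream *s,
                         const char *src_port, const char *dst_port);

int audio_stream_start(audio_stream *s);
int audio_stream_stop(audio_stream *s);

/* Latency inquiry: worst-case time from callback to the speaker,
 * including whatever buffering the server chose for this stream. */
int audio_stream_get_latency(const audio_stream *s, uint64_t *latency_ns);

void audio_stream_close(audio_stream *s);
```

Under such a scheme a notification client could open a stream with a large, server-chosen buffer, while Ardour would ask for the smallest one the hardware's safety buffer allows; both talk to the same daemon, which owns the device.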

[1] https://blog.linuxplumbersconf.org/2009/slides/Paul-Davis-lpc2009.pdf

[2] https://kokkinizita.linuxaudio.org/papers/usingdll.pdf

[3] https://lists.linuxaudio.org/archives/linux-audio-dev/2018-August/037247.html

Comments welcome here -> https://github.com/linuxaudio/Linux-Audio-Workgroup/issues/1