<pre class='metadata'>
Title: Web Audio API performance and debugging notes
Status: ED
ED: https://padenot.github.io/web-audio-perf
shortname: web-audio-perf
Level:1
Editor: Paul Adenot <padenot@mozilla.com>
Abstract: These notes present the Web Audio API from a performance and debugging point of view, outlining some differences between implementations.
group: plain
Boilerplate: omit property-index logo copyright references property-index
</pre>
<style>
a[data-file] {
border-bottom: 1px dotted gray;
}
a[data-file]:hover {
border-bottom: 2px dotted gray;
text-decoration: none;
}
</style>
<section>
<h2>Introduction</h2>
In this tutorial, we will look at two different aspects of working with the Web
Audio API.
First, we'll have a look at the different implementations
available today, how to inspect their source code, and how to report problems found
while testing applications that use the Web Audio API with them.
We'll then have a look into the performance characteristics of the different
<code>AudioNode</code>s available, their performance profile, overall CPU and
memory cost.
We'll continue by exploring the different strategies and techniques
implementors have used when writing their implementation of the Web Audio API.
We'll then look into ways to make processing lighter, while still retaining
the essence of the application, for example to make a "degraded" mode for
mobile. We'll use techniques such as substituting rendering methods to trade
fidelity against CPU load, pre-baking assets, and minimizing resampling.
Finally, we'll touch on tools and techniques useful to debug audio problems,
both using the browser developer tools, or JavaScript code designed to inspect
static and dynamic audio graphs and related Web Audio API objects.
</section>
<section>
<h2>The different implementations</h2>
Four complete (if there is such a thing, considering the standard is always
evolving) Web Audio API implementations are available as of today in browsers:
<ul>
<li>The first ever implementation was part of WebKit. At the time, Chrome
and Safari were sharing the same code.</li>
<li>Then, Blink got forked from WebKit, and the two gradually diverged. They
share a lot of code, but can be considered separate implementations these
days.</li>
<li>Gecko was the second implementation, mostly from scratch, but borrowing
a few files from the Blink fork, for some processing code.</li>
<li>Edge's source is not available, but its implementation is based on an old
snapshot of Blink.</li>
</ul>
The source code of the first three implementations can be read, compiled and
modified. Here are the relevant locations:
<ul>
<li>
WebKit's implementation lives at <a
href="https://trac.webkit.org/browser/trunk/">https://trac.webkit.org/browser/trunk/</a>,
at the path <code>/Source/WebCore/Modules/webaudio</code>.
</li>
<li>
Blink's implementation can be found at
<a href="https://code.google.com/p/chromium/codesearch#chromium/src/">
https://code.google.com/p/chromium/codesearch#chromium/src/</a>
which is a handy web interface, with cross-referencing of symbols. The Web
Audio API implementation lives at
<code>third_party/WebKit/Source/modules/webaudio</code>, but a number of
classes and functions, intended to be shared among different Chromium modules
are located at <code>./third_party/WebKit/Source/platform/audio/</code>.
</li>
<li>
Gecko's implementation can be found at <a
href="https://dxr.mozilla.org/mozilla-central">
https://dxr.mozilla.org/mozilla-central</a>, which is also a nice web
interface with cross-referencing of symbols,
and the Web Audio API implementation is located in
<code>dom/media/webaudio</code>. Some shared components are in
<code>dom/media</code>.
</li>
</ul>
Issues (about performance or correctness) can be filed in each project's bug
tracker:
<ul>
<li>WebKit: <a
href="https://bugs.webkit.org/enter_bug.cgi?product=WebKit&component=Web%20Audio">https://bugs.webkit.org/enter_bug.cgi?product=WebKit&component=Web%20Audio</a>
(WebKit bugzilla account needed).</li>
<li>Blink: <a href="https://new.crbug.com/">https://new.crbug.com/</a>
(Google account needed).</li>
<li>Gecko: <a
href="https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Web%20Audio">https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Web%20Audio</a>
(GitHub, Persona or Mozilla Bugzilla account needed).</li>
</ul>
When filing issues, a minimal test case reproducing the issue is very welcome,
as is a benchmark in case of performance problems. A standalone HTML file is
usually preferred to a jsfiddle, jsbin or similar service, for archival
purposes.
<h2>Performance analysis</h2>
<h3><code>AudioNode</code>s characteristics</h3>
This section explains the characteristics of each of the
<code>AudioNode</code>s available in the Web Audio API, from four
angles.
<ul>
<li>
CPU, that is the temporal complexity of the processing algorithm;
</li>
<li>
Memory, whether the node needs to keep buffers around, or needs internal memory
for processing;
</li>
<li>
Latency, whether the processing induces a delay in the processing chain. If
this section is not present, the node does not add latency;
</li>
<li>
Tail, whether you can have a non-zero output when the
input is continuously silent (for example because the audio source has
stopped). If this section is not present, the node does not have a tail.
</li>
</ul>
<h4> AudioBufferSourceNode </h4>
<dl>
<dt>CPU</dt>
<dd>The <code>AudioBufferSourceNode</code> automatically resamples its
<code>buffer</code> attribute to the sample-rate of the <code>AudioContext</code>. Resampling is
done differently in different browsers. Edge, Blink and WebKit-based browsers
use <a class="chromium" data-file="AudioBufferSourceNode.cpp:302">linear
resampling</a>, which is cheap and has no latency, but has low quality. Gecko-based
browsers use a <a class="firefox"
data-file="AudioBufferSourceNode.cpp:294">more expensive</a> but higher
quality technique, which introduces some latency.</dd>
<dt>Memory</dt>
<dd>The <code>AudioBufferSourceNode</code> reads samples from an
<code>AudioBuffer</code> that can be shared between multiple nodes. The
resampler used in Gecko uses some memory for the filter, but nothing major.</dd>
</dl>
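To make the trade-off concrete, here is a minimal sketch of linear resampling on a single channel. It is not the code any engine ships: the function name <code>resampleLinear</code> and its parameters are assumptions made for this illustration. Each output frame is a weighted average of the two nearest input frames, which is why the technique is cheap, adds no latency, and has limited quality.

```javascript
// Illustrative sketch only: resample one channel from `inputRate` to
// `outputRate` by interpolating linearly between neighbouring input frames.
function resampleLinear(input, inputRate, outputRate) {
  if (input.length === 0) {
    return new Float32Array(0);
  }
  // Distance, in input frames, between two consecutive output frames.
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor((input.length - 1) / ratio) + 1;
  const output = new Float32Array(outputLength);
  for (let i = 0; i < outputLength; i++) {
    const position = i * ratio;
    const index = Math.floor(position);
    const fraction = position - index;
    const current = input[index];
    // Past the last frame, hold the last value instead of reading garbage.
    const next = index + 1 < input.length ? input[index + 1] : current;
    output[i] = current + (next - current) * fraction;
  }
  return output;
}
```

Only two input frames are read per output frame, and no history is kept between calls, which matches the "no latency, no extra memory" profile described above.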
<h4> ScriptProcessorNode </h4>
<dl>
<dt>CPU</dt>
<dd>On Gecko-based browsers, this node uses a <a
data-file="ScriptProcessorNode.cpp:31" class="firefox">message queue</a>
to send buffers back and forth between the main thread and the rendering
thread. On other browsers, <a data-file="ScriptProcessorNode.cpp:153"
class="chromium">buffer ping-ponging</a> is used. This means that the
former is more reliable against dropouts, but can have a higher latency
(depending on the main thread event loop load), whereas the latter drops out
more easily, but has fixed latency.</dd>
<dt>Memory</dt>
<dd>
Buffers have to be allocated to move audio back and forth between threads.
Since Gecko uses a buffer queue, more memory can be used.
</dd>
<dt>Latency</dt>
<dd> The latency is specified when creating the node. If Gecko has trouble
keeping up, the latency will increase, up to a point where audio will <a
class="firefox" data-file="ScriptProcessorNode.cpp:133">start to
drop</a>.</dd>
</dl>
<h4> AnalyserNode </h4>
<dl>
<dt>CPU</dt>
<dd>This node can give frequency domain data, using a Fast Fourier Transform
algorithm, which is expensive to compute. The larger the buffer size, the more
expensive the computation is. The <code>byte</code> versions of the analysis methods
are <a data-file="AnalyserNode.cpp:228" class="firefox">not cheaper</a> than
the <code>float</code> alternatives; they are provided for
convenience: the <code>byte</code> versions are computed from the
<code>float</code> versions, using simple quantization to 2^8 values.</dd>
<dt>Memory</dt>
<dd>Fast Fourier Transform algorithms use internal memory for processing.
Different platforms and browsers have different algorithms, so it's hard to
quantify exactly how much memory is going to be used. Additionally, some
memory is going to be used for the typed arrays passed in to the
analysis methods. </dd>
<dt>Latency</dt>
<dd>Because of the windowing function there can be some perceived latency in
this node, but windowing can be disabled by setting it to 0.</dd>
<dt>Tail</dt>
<dd>Because of the windowing function there can be a tail with
this node, but windowing can be disabled by setting it to 0.</dd>
</dl>
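The quantization from <code>float</code> to <code>byte</code> data can be sketched as follows. This is an illustration under assumptions, not engine code: the function names are made up, and the default range of -100 dB to -30 dB mirrors the default <code>minDecibels</code> and <code>maxDecibels</code> of an <code>AnalyserNode</code>. It shows why the <code>byte</code> methods cannot be cheaper: the decibel values have to be computed first.

```javascript
// Illustrative sketch only: map a value in decibels to the 0..255 range,
// clamping anything outside of [minDecibels, maxDecibels].
function floatDbToByte(db, minDecibels = -100, maxDecibels = -30) {
  const scaled = (255 * (db - minDecibels)) / (maxDecibels - minDecibels);
  return Math.max(0, Math.min(255, Math.floor(scaled)));
}

// Quantize a whole array of decibel values into a Uint8Array.
function toByteFrequencyData(floatData, minDecibels = -100, maxDecibels = -30) {
  const bytes = new Uint8Array(floatData.length);
  for (let i = 0; i < floatData.length; i++) {
    bytes[i] = floatDbToByte(floatData[i], minDecibels, maxDecibels);
  }
  return bytes;
}
```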
<h4> GainNode </h4>
<dl>
<dt>CPU</dt>
<dd>In Gecko-based browsers, the gain is always <a data-file="GainNode.cpp:72"
class="firefox">applied lazily</a>, and folded in before any processing that
requires touching the samples, or before sending the
rendered buffer back to the operating system, so <code>GainNode</code>s with
a fixed gain are essentially free. In other engines, the gain is applied to
the input buffer <a data-file="GainNode.cpp:51" class="chromium">as it's
received</a>. When automating the gain using <code>AudioParam</code>
methods, the gain is applied to the buffer in all browsers. </dd>
<dt>Memory</dt>
<dd>A <code>GainNode</code> is stateless and has therefore no associated
memory cost.</dd>
</dl>
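The idea of a lazily applied gain can be sketched in a few lines. This is a simplified model written for this tutorial, not Gecko's actual data structure: the names <code>makeBlock</code>, <code>applyFixedGain</code> and <code>materialize</code> are hypothetical. A block carries its samples together with a pending scalar; fixed gains only multiply the scalar, and the samples are touched once, when something needs the real values.

```javascript
// Illustrative sketch only: a block of audio with a pending `volume` that
// has not yet been multiplied into the samples.
function makeBlock(samples) {
  return { samples, volume: 1 };
}

// A fixed gain only updates the scalar: no per-sample work, no copy.
function applyFixedGain(block, gain) {
  return { samples: block.samples, volume: block.volume * gain };
}

// Fold the pending volume into the samples, for processing that needs to
// read the actual values (or right before handing the buffer to the system).
function materialize(block) {
  if (block.volume === 1) {
    return block.samples;
  }
  const out = new Float32Array(block.samples.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = block.samples[i] * block.volume;
  }
  return out;
}
```

With this model, chaining several fixed gains costs a handful of multiplications per block rather than per sample.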
<h4> DelayNode </h4>
<dl>
<dt>CPU</dt>
<dd>This node essentially copies input data into a buffer, and reads from this
buffer at a different location to compute its output buffer.</dd>
<dt> Memory</dt>
<dd> The memory cost is a function of the number of input and output
channels and the length of the delay line. </dd>
<dt>Latency</dt>
<dd>Obviously this node introduces latency, but no more than the latency set
by its parameter.</dd>
<dt>Tail</dt>
<dd>This node is being kept around (not collected) until it has finished
reading and has output all of its internal buffer.</dd>
</dl>
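The copy-then-read-elsewhere behaviour described above is the classic circular buffer. Here is a minimal single-channel sketch, assuming a delay expressed as a whole number of frames; the class name <code>DelayLine</code> is hypothetical, and real implementations also interpolate to support fractional delays.

```javascript
// Illustrative sketch only: a single-channel delay line backed by a
// circular buffer. Memory use grows with the maximum delay.
class DelayLine {
  constructor(maxDelayFrames) {
    this.buffer = new Float32Array(maxDelayFrames + 1);
    this.writeIndex = 0;
  }

  // Writes one input frame, and returns the frame that was written
  // `delayFrames` frames ago (silence if nothing was written yet).
  process(input, delayFrames) {
    const length = this.buffer.length;
    this.buffer[this.writeIndex] = input;
    const readIndex = (this.writeIndex - delayFrames + length) % length;
    const output = this.buffer[readIndex];
    this.writeIndex = (this.writeIndex + 1) % length;
    return output;
  }
}
```

One such buffer is needed per channel, which is where the memory cost mentioned above comes from.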
<h4> BiquadFilterNode </h4>
<dl>
<dt>CPU</dt>
<dd>Biquad filters are relatively cheap (<a data-file="blink/Biquad.cpp:66"
class="firefox">five multiplications and four additions per
sample</a>).</dd>
<dt>Memory</dt>
<dd>Very cheap, four floats for the memory of the filter.</dd>
<dt>Latency</dt>
<dd>Exactly <a data-file="blink/Biquad.cpp:71" class="firefox">two frames</a>
of latency, due to how the filter works.</dd>
<dt>Tail</dt>
<dd>Variable tail, depending on the filter setting (in particular the
resonance).</dd>
</dl>
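The per-sample cost and the four floats of state are easy to see in a direct form I biquad. The sketch below is an illustration, not the linked engine code: <code>makeBiquad</code> is a hypothetical name, and the coefficients are assumed to be already normalized (the leading feedback coefficient is 1).

```javascript
// Illustrative sketch only: a direct form I biquad section.
// State is four values: the two previous inputs and the two previous outputs.
function makeBiquad(b0, b1, b2, a1, a2) {
  let x1 = 0;
  let x2 = 0;
  let y1 = 0;
  let y2 = 0;
  return function processSample(x) {
    // Five multiplications and four additions (or subtractions) per sample.
    const y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
    x2 = x1;
    x1 = x;
    y2 = y1;
    y1 = y;
    return y;
  };
}
```

Feeding a single impulse followed by silence into a filter with non-zero feedback coefficients keeps producing output, which is the tail mentioned above.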
<h4> IIRFilterNode </h4>
<dl>
<dt>CPU</dt>
<dd>Like biquad filters, these are rather cheap. The complexity
depends on the number of coefficients, which is set at construction.</dd>
<dt>Memory</dt>
<dd>Again, the memory usage depends on the number of coefficients, but is
overall very small (a couple of floats per coefficient).</dd>
<dt>Latency</dt>
<dd>A frame per coefficient.</dd>
<dt>Tail</dt>
<dd>Variable, depending on the value of the coefficients.</dd>
</dl>
<h4> WaveShaperNode </h4>
<dl>
<dt>CPU</dt>
<dd>The computational complexity depends on the oversampling. If no
oversampling is used, a sample is read in the wave table, <a
data-file="WaveShaperNode.cpp:197" class="firefox">using linear
interpolation</a>, which is a cheap process in itself. If oversampling is used, a
resampler is used. Depending on the browser engine, different resampling
techniques can be used (FIR, linear, etc.).</dd>
<dt>Memory</dt>
<dd>This node is making a copy of the curve, so it can be quite expensive in
terms of memory.</dd>
<dt>Latency</dt>
<dd>This node does not add latency if oversampling is not used. If
over-sampling is used, and depending on the resampling technique, latency can
be added by the processing.</dd>
<dt>Tail</dt>
<dd>Similarly, depending on the resampling technique used, and when using
over-sampling, a tail can be present.</dd>
</dl>
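Without oversampling, shaping one sample boils down to a table lookup with linear interpolation. The sketch below follows the lookup as we understand it from the Web Audio API specification; the function name <code>shapeSample</code> is an assumption for this illustration, and engines may differ in details.

```javascript
// Illustrative sketch only: map an input sample in [-1, 1] onto a curve,
// interpolating linearly between the two nearest curve points.
function shapeSample(curve, x) {
  const n = curve.length;
  // Position of the input on the curve, from 0 to n - 1.
  const v = ((n - 1) / 2) * (x + 1);
  if (v <= 0) {
    return curve[0];
  }
  if (v >= n - 1) {
    return curve[n - 1];
  }
  const k = Math.floor(v);
  const f = v - k;
  return (1 - f) * curve[k] + f * curve[k + 1];
}
```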
<h4> PannerNode, when <code>panningModel == "HRTF"</code> </h4>
<dl>
<dt>CPU</dt>
<dd><strong>Very</strong> expensive. This node is constantly doing
convolutions between the input data and a set of HRTF impulses that are
characteristic of the elevation and azimuth. Additionally, when the position
changes, it <a data-file="blink/HRTFPanner.cpp:272"
class="firefox">interpolates</a> (cross-fades) between the old and new
position, so that the transition between two HRTF impulses is smooth. This
means that for a stereo source, and while moving, there can be <a
data-file="blink/HRTFPanner.cpp:258" class="firefox">four convolvers</a>
processing at once. The HRTF panning also needs short delay
lines.</dd>
<dt>Memory</dt>
<dd>The HRTF panner needs to keep a set of HRTF impulses in memory when
operating. Gecko loads the HRTF database only if needed, while other engines
load it unconditionally. The convolver and delay lines require memory as
well, depending on the Fast Fourier Transform implementation used.</dd>
<dt>Latency</dt>
<dd>HRTF always adds <a data-file="blink/HRTFPanner.cpp:312"
class="firefox">some amount of delay</a>, but the amount depends on the
azimuth and elevation.</dd>
<dt>Tail</dt>
<dd>Similarly, depending on the azimuth and elevation, a tail of different
duration is present.</dd>
</dl>
<h4> PannerNode, when <code>panningModel == "equalpower"</code> </h4>
<dl>
<dt>CPU</dt>
<dd>Rather cheap. The processing has two parts:
<ul>
<li>
First, the <a data-file="PannerNode.cpp:409" class="firefox">azimuth
needs to be determined</a> from the Cartesian coordinate of the source
and listener, this is a bit of vector maths, and can be cached by the
implementation for static sources.
</li>
<li>
Then, <a data-file="PanningUtils.h:52" class="firefox">gain is
applied</a>, possibly blending the two channels if the source is stereo.
</li>
</ul> </dd>
<dt>Memory</dt>
<dd>The processing being stateless, this has no memory cost.</dd>
</dl>
<h4> StereoPannerNode </h4>
<dl>
<dt>CPU</dt>
<dd>Similar to the <code>"equalpower"</code> panning, but the azimuth is
cheaper to compute since there is no need to do the vector math: the position
is already known. </dd>
<dt>Memory</dt>
<dd>Stateless processing, no memory cost.</dd>
</dl>
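For a mono source, the gain part of equal-power panning can be sketched as below. The names <code>equalPowerGains</code> and <code>panMono</code> are hypothetical, and the sketch ignores stereo input, which additionally blends the two channels. The pan position, from -1 (left) to 1 (right), is mapped onto a quarter of a circle so that the total power stays constant.

```javascript
// Illustrative sketch only: equal-power gains for a mono source.
function equalPowerGains(pan) {
  const clamped = Math.max(-1, Math.min(1, pan));
  // Normalize the position to [0, 1].
  const x = (clamped + 1) / 2;
  return {
    left: Math.cos((x * Math.PI) / 2),
    right: Math.sin((x * Math.PI) / 2),
  };
}

// Pan a mono buffer to a stereo position: two multiplications per frame.
function panMono(input, pan) {
  const { left: leftGain, right: rightGain } = equalPowerGains(pan);
  const left = new Float32Array(input.length);
  const right = new Float32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    left[i] = input[i] * leftGain;
    right[i] = input[i] * rightGain;
  }
  return { left, right };
}
```

No state is carried from one block to the next, which is why these nodes have no memory cost.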
<h4> ConvolverNode </h4>
<dl>
<dt>CPU</dt>
<dd>Very expensive, with a cost that depends on the duration of the convolution
impulse. A <a data-file="blink/ReverbConvolver.cpp:156"
class="firefox">background thread</a> is used to offload some of the
processing, but computational bursts can occur in some browsers. Basically,
multiple FFTs are computed for each block.</dd>
<dt>Memory</dt>
<dd>The node is making a copy of the buffer for internal use, so it's taking
a fair bit of memory (depending on the duration of the impulse).
Additionally, some memory can be used for the Fast Fourier Transform
implementation, depending on the platform.
</dd>
<dt>Latency</dt>
<dd>Convolvers can be used to create delay-like effects, so latency can
certainly be introduced by a <code>ConvolverNode</code>.</dd>
<dt>Tail</dt>
<dd>Depending on the convolution impulse, there can be a tail.</dd>
</dl>
<h4> ChannelSplitterNode / ChannelMergerNode </h4>
<dl>
<dt>CPU</dt>
<dd>This is merely splitting or merging channels, that is, copying buffers
around.
</dd>
<dt>Memory</dt>
<dd>No memory implications.</dd>
</dl>
<h4> DynamicsCompressorNode </h4>
<dl>
<dt>CPU</dt>
<dd>The exact algorithm is not specified yet. In practice, it's the same in
all browsers, a peak detecting look-ahead, with a pre-emphasis and
post-de-emphasis, not too expensive.</dd>
<dt>Memory</dt>
<dd>Not very expensive in terms of memory, just some floats to track the
internal state.</dd>
<dt>Latency</dt>
<dd>Being a look-ahead compressor, it introduces a fixed look-ahead of
six milliseconds.</dd>
<dt>Tail</dt>
<dd>Because of the emphasis, there is a tail. Also, compression can boost
quiet audio, so audible sound can appear to last longer.</dd>
</dl>
<h4> OscillatorNode </h4>
<dl>
<dt>CPU</dt>
<dd>The basic wave forms are implemented using multiple <a
data-file="OscillatorNode.cpp:261" class=firefox>wave tables</a> computed
using the inverse Fourier transform of a buffer with carefully chosen
coefficients (apart from the sine that is <a
data-file="OscillatorNode.cpp:219" class=firefox>computed directly</a> in
Gecko). This means that there is an initial cost when changing the wave
form, that is <a data-file="OscillatorNode.cpp:118"
class="firefox">cached</a> in Gecko-based browsers. After the initial cost,
processing is essentially doing linear interpolation between multiple wave
tables. When the frequency changes, new tables have to be computed.
</dd>
<dt>Memory</dt>
<dd>A number of wave tables have to be stored, that can take up some memory.
Those are shared in Gecko-based browsers, apart from the sine wave in Gecko,
that is directly computed.</dd>
</dl>
<h3> Other noteworthy performance characteristics </h3>
<h4> Memory model </h4>
Web Audio API implementations use two threads. The <em>control thread</em> is
the thread on which the Web Audio API calls are issued:
<code>createGain</code>, <code>setTargetAtTime</code>, etc. The <em>rendering
thread</em> is the thread that is responsible for rendering the audio. This
can be a normal thread (for example for an <code>OfflineAudioContext</code>) or a
system provided, high-priority audio thread (for a normal
<code>AudioContext</code>). Of course, information has to be communicated
between the two threads.
Current Web Audio API implementations have taken two different approaches to
implement the specification. Gecko-based browsers use a <em>message
passing</em> model, whereas all the other implementations use a <em>shared
memory</em> model. This has a number of implications in practice.
First, in engines that are using the <em>shared memory</em> model, changes to
the graph and <code>AudioParam</code> can occur at any time. This means that
in some scenarios, manipulation (from the main thread) of internal Web Audio
API data structures can be reflected more quickly in the rendering thread. For
example, if the audio thread is currently rendering, a modification from the
main thread will be reflected immediately on the rendering thread.
A drawback of this approach is that it is necessary to have some
synchronization between the control thread and the rendering thread. The
rendering thread is often very high priority (usually the highest priority on
the system), to guarantee no under-runs (or dropouts), which are considered
catastrophic failures for most audio rendering systems. Under-runs usually
occur when the audio rendering thread did not make its deadline, for example
when it took more than 5 milliseconds of processing to process 5 milliseconds of
audio. Non-Gecko based browsers often use <em>try locks</em> to ensure
smooth operation.
In certain parts of the Web Audio API, the main thread needs to be able to access
data structures that have been sent to the rendering thread. Gecko-based
browsers keep two synchronized copies of the data structure to implement this,
which has a memory cost that other engines don't have to pay.
<h4> AudioParam </h4>
The way a web application uses <code>AudioParam</code> plays an important role
in the performance of the application. <code>AudioParam</code>s come in two
flavours, <em>a-rate</em> and <em>k-rate</em> parameters. <em>a-rate</em>
parameters have their value computed for each audio sample, whereas
<em>k-rate</em> parameters are computed once per block of 128 frames.
<code>AudioParam</code> methods (<code>setValueAtTime</code>,
<code>linearRampToValueAtTime</code>, etc.) each insert <em>events</em> in a
list of events that is later accessed by the rendering thread.
Handling <code>AudioParam</code>, for an engine, means first finding the right
event (or events) to consider for the block of audio to render. Different
approaches are taken by different browsers. Gecko prunes all events
that are in the past, except the one that is right before the current time
(certain events require a look at the previous event's value). This
guarantees amortized <em>O(1)</em> complexity (amortized because deallocations
can take some time). Other engines do a linear scan in the event list to find
the right one.
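The two lookup strategies can be contrasted with a small sketch. It is a simplified model written for this tutorial, not code from any engine: events are assumed to be plain objects with a <code>time</code> field, sorted by time, and the function names are hypothetical. Instead of freeing past events, the pruning variant just advances a <code>head</code> index; a real engine would also release their memory.

```javascript
// Illustrative sketch only. Linear scan: walk the list from the start to
// find the last event at or before `time`. Cost grows with the list length.
function findEventLinear(events, time) {
  let found = null;
  for (const event of events) {
    if (event.time > time) {
      break;
    }
    found = event;
  }
  return found;
}

// Pruning: remember where the previous lookup ended, so that events in the
// past are never visited again. Assumes `time` never goes backwards.
function makeTimeline(events) {
  return { events, head: 0 };
}

function pruneAndFind(timeline, time) {
  const { events } = timeline;
  // Skip past events, but keep the one right before `time`.
  while (timeline.head + 1 < events.length && events[timeline.head + 1].time <= time) {
    timeline.head++;
  }
  const first = events[timeline.head];
  return first !== undefined && first.time <= time ? first : null;
}
```

Each event is skipped at most once over the lifetime of the timeline, which is where the amortized constant cost comes from.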
In practice, both techniques perform well enough so that the difference is not
noticeable most of the time. If the application uses <em>a lot</em> of
<code>AudioParam</code> events, non-Gecko based browsers can have performance
issues, because scanning through the list starts to take a non-trivial amount
of time. Strategies can be employed to mitigate this issue, by creating new
<code>AudioNode</code>, with new <code>AudioParam</code>, that start with an
empty list of events.
For example, let's take a <code>GainNode</code> that is used as the envelope
of an <code>OscillatorNode</code>-based kick drum, playing on each beat at 140
beats per minute. The envelope is often implemented using a
<code>setValueAtTime</code> call to set the initial volume of the hit, which
often depends on the velocity, immediately followed by a
<code>setTargetAtTime</code> call, at the same time, with a target value of 0 to get
a curve that decays to silence, and a time constant that depends on the
release parameter of the synth. At 140 BPM, 280 events will be inserted per
minute. To ensure stable performance, and leveraging the fact that a
<code>GainNode</code> is very cheap to create and connect, it might be worth
considering regularly swapping the node that is responsible for the
envelope.
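One possible way to do this swapping is sketched below. The helper name <code>createEnvelopePool</code> and the <code>maxEventsPerNode</code> threshold are assumptions made for this example, and a good threshold would have to be measured. The sketch only uses <code>createGain</code>, <code>connect</code>, <code>setValueAtTime</code> and <code>setTargetAtTime</code>, and leaves it to the caller to connect each hit's source to the node it returns.

```javascript
// Illustrative sketch only: hand out an envelope GainNode, replacing it with
// a fresh one (and thus an empty event list) every `maxEventsPerNode` events.
function createEnvelopePool(ctx, destination, maxEventsPerNode = 64) {
  let envelope = null;
  let eventCount = 0;
  return function trigger(time, velocity, timeConstant) {
    // Each hit inserts two events, swap before going over the limit.
    if (envelope === null || eventCount + 2 > maxEventsPerNode) {
      envelope = ctx.createGain();
      envelope.connect(destination);
      eventCount = 0;
    }
    // Jump to the hit volume, then decay towards silence.
    envelope.gain.setValueAtTime(velocity, time);
    envelope.gain.setTargetAtTime(0, time, timeConstant);
    eventCount += 2;
    return envelope;
  };
}
```

In this sketch, older envelope nodes stay connected while the hits that use them are still sounding, and are simply no longer referenced afterwards.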
Gecko-based browsers, because of their event-loop based model, have two copies
of the event list. One on the main thread, that contains all the events, and
one on the rendering thread, that is regularly pruned to only contain relevant
events. Other engines simply synchronize the event list using a lock, and only
have one copy of the event list. Depending on how the application schedules
its <code>AudioParam</code> events (the worst case being when all the events
are scheduled in advance), having two copies of the event list can take a
non-negligible amount of memory.
The second part of <em>AudioParam</em> handling is computing the actual value
(or values, for an <em>a-rate</em> parameter), based on past, present and future
events. Gecko-based browsers are not very efficient at computing those values
(and suffer from a number of bugs in the implementation). This code is going
to go through a rewrite <em>real soon (tm)</em>. Other engines are much more
efficient (which, most of the time, offsets their less efficient way of
searching for the right events to consider). Blink even has optimized code to
compute the automation curves, using SSE intrinsics on x86.
All that said, and while implementations are becoming more and more efficient
at computing automation curves, it is even more efficient to not use
<code>AudioParam</code> automation when it is not necessary: implementations often take a different
code path if they know an <code>AudioParam</code> value is going to be
constant for a block, and this is easier to detect if the <code>value</code>
attribute has been directly set.
<h4> Node ordering </h4>
Audio nodes have to be processed in a specific order to get correct results.
For example, consider a graph with an <code>AudioBufferSourceNode</code>
connected to a <code>GainNode</code>, connected to the
<code>AudioDestinationNode</code>. The <code>AudioBufferSourceNode</code> has
to be processed first: it gets
the values from the <code>AudioBuffer</code> and maybe resamples them.
The output of this node is passed to the <code>GainNode</code>, that applies
its own processing, passing down the computed data to the
<code>AudioDestinationNode</code>.
Gecko-based browsers and other engines have taken two different approaches for
ordering Web Audio API graphs.
Gecko uses an algorithm based on Tim Leslie's iterative implementation
[<a href="http://www.timl.id.au/?p=327">1</a>][<a
href="https://github.com/scipy/scipy/blob/e2c502fca/scipy/sparse/csgraph/_traversal.pyx#L582">2</a>]
of Pearce's variant [<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1707">3</a>] of Tarjan's strongly connected components (SCC)
algorithm, which is basically a kind of topological sort that takes cycles
into account to properly handle cycles with <code>DelayNode</code>.
Other engines simply do a depth-first walk of the graph, use a graph coloring
approach to detect cycles, and pull from the <code>DelayNode</code> internal
buffer in case of cycles. This is much simpler, but its rendering differs from
that of the other approach.
Gecko is only running the algorithm if the topology of the graph has changed
(for example if a node has been added, or a connection has been removed),
while other engines perform the traversal for each rendering quantum. Depending
on whether the application has a mostly static or very dynamic graph topology, this
can have an impact on performance.
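The simpler of the two approaches, a depth-first walk with coloring, can be sketched as follows. This is a toy model, not engine code: the graph is assumed to be a <code>Map</code> from each node to the list of nodes feeding it, and <code>orderNodes</code> is a hypothetical name. Nodes are emitted after all of their inputs, and reaching a node that is still being visited reveals a cycle, which is recorded rather than followed (this is where an engine would read from the <code>DelayNode</code> internal buffer).

```javascript
// Illustrative sketch only: order nodes so that inputs come before the nodes
// they feed, starting from the destination and pulling from the inputs.
function orderNodes(destination, inputsOf) {
  const WHITE = 0; // not visited yet
  const GREY = 1; // visit in progress
  const BLACK = 2; // fully processed
  const color = new Map();
  const order = [];
  const cycleEdges = [];

  function visit(node) {
    color.set(node, GREY);
    for (const input of inputsOf.get(node) || []) {
      const inputColor = color.get(input) || WHITE;
      if (inputColor === GREY) {
        // `input` is one of our ancestors in the walk: this edge closes a cycle.
        cycleEdges.push([input, node]);
      } else if (inputColor === WHITE) {
        visit(input);
      }
    }
    color.set(node, BLACK);
    order.push(node);
  }

  visit(destination);
  return { order, cycleEdges };
}
```

Running this walk for every rendering quantum is cheap for small graphs, but the cost grows with the number of nodes and connections, hence the interest in only re-running it when the topology changes.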
<h4> Latency </h4>
Audio output latency is very important for the Web Audio API. Usually,
implementations use the lowest available audio latency on the system to run
the <em>rendering thread</em>. Roughly, this means on the order of 10ms on
Windows (post-Vista, using WASAPI), very low on OSX and iOS (a few
milliseconds), and 30-40ms on Linux/PulseAudio. There are not yet engines that
support Jack with low-latency code, although this would certainly have amazing
latency numbers. Windows XP does not usually have great output latency, and
licensing terms (and available time to write code!) prevent the use of ASIO on
Windows, which would also have great performance.
Android devices are a bit of a problem, as the latency can vary between 12.5ms
on a Galaxy Nexus (which was the device with the lowest latency) up to 150ms
on some very cheap devices. Although there is a way to know the latency using
system API (that browsers use for audio/video synchronization), Web App
authors don't currently have a way to know the output latency. This is
currently being addressed (<a
href="https://github.com/WebAudio/web-audio-api/issues/12">#12</a>).
Creating <code>AudioContext</code> with a higher latency is a feature that is
currently under discussion (<a
href="https://github.com/WebAudio/web-audio-api/issues/348">#348</a>). This allows a
number of things, such as battery savings for use-cases that don't require
very low latency. For example, an application that only does playback, but
wants to display the Fourier transform of the signal or simply its wave form,
put an equalizer on the signal path, or apply compression,
would benefit from this feature, because real-time interaction is not
required. Of course, the latency that is the most efficient battery-wise,
regardless of audio output latency, depends on the operating system and
hardware.
Using a bigger rendering quantum (that is, bigger buffers, which on most
systems is a consequence of increasing the audio latency) also allows creating and
rendering more complex Web Audio API graphs. Indeed, audio systems often pull
quite a lot of memory into CPU cache lines to compute each rendering block (be
it audio sample data, automation timeline event arrays, the memory for the
nodes themselves, etc.). Rendering for a longer period of time means having
caches that are hotter, and lets the CPU crunch numbers without having to
wait for memory. Digital Audio Workstation users often increase the
latency when they hear "audio crackles" (under-runs); the same logic applies
here. Of course, there is a point beyond which a higher latency no longer
helps the computation. This value depends on the system.
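The relation between buffer size, latency and the rendering deadline is simple arithmetic, sketched below. The function names are hypothetical and the figures are only examples: a buffer of a given number of frames lasts <code>frames / sampleRate</code> seconds, and rendering it has to take less time than that to avoid an under-run.

```javascript
// Illustrative sketch only: duration of a buffer, in milliseconds.
function bufferDurationMs(frames, sampleRate) {
  return (frames * 1000) / sampleRate;
}

// The rendering thread makes its deadline if rendering the buffer took no
// longer than the time the buffer takes to play.
function meetsDeadline(renderTimeMs, frames, sampleRate) {
  return renderTimeMs <= bufferDurationMs(frames, sampleRate);
}
```

For example, 480 frames at 48000Hz last 10 milliseconds, while a single block of 128 frames at 44100Hz lasts a bit less than 3 milliseconds, which leaves much less room for a slow block.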
<h4> Browser architecture </h4>
Firefox (that is based on Gecko) uses only one process for all the tabs and
the chrome (which is the user interface of the browser, that is, the tabs, the
menus, etc., a very confusing term nowadays), whereas other browsers use
multiple processes.
While the rendering of Web Audio API graphs happens on a dedicated thread, this
has important implications in terms of responsiveness. The event loop in Gecko
is shared between all the tabs. When using a Web Application that uses the Web
Audio API, if all the other tabs are still doing some processing, receiving
events, running script, etc. in the background, the Web Audio API calls will
be delayed, so it is necessary to plan a bit more in advance (see Chris
Wilson's <a
href="http://www.html5rocks.com/en/tutorials/audio/scheduling/">Tale of two
clocks</a>). This is somewhat being addressed by the electrolysis (e10s)
project, but there is still a long way to go to catch up in responsiveness.
Other engines usually use multiple processes, so the event loop load is split
among multiple processes, and there is less chance that the script that is
issuing Web Audio API calls is delayed, allowing tighter delays for
scheduling things (be it an <code>AudioParam</code>, the start of an
<code>AudioBufferSourceNode</code>, etc.).
<h4><code>decodeAudioData</code></h4>
Gecko-based browsers use a thread pool (with multiple threads) when handling
multiple <code>decodeAudioData</code> calls, whereas other browsers serialize
the decoding on a single thread.
Additionally, audio is resampled differently, and different audio decoders are
used, with different optimizations on different OSes and architectures, leading
to a wide variety of performance profiles.
Authors should take advantage of (nowadays ubiquitous) multi-core machines
(even phones often have four or more cores these days), and try to saturate the
CPUs with decoding operations. Since the rendering thread has a higher priority
than the decoding threads, no audio under-runs are to be expected.
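One way to keep several cores busy is to issue all the <code>decodeAudioData</code> calls up front, rather than waiting for each one to finish before starting the next. This is a minimal sketch, assuming <code>ctx</code> is an <code>AudioContext</code> and <code>encoded</code> is an array of <code>ArrayBuffer</code>s that have already been fetched; <code>decodeAll</code> is a hypothetical name. Whether the decodes actually run in parallel depends on the browser, as described above.

```javascript
// Hypothetical helper: starts every decode before waiting for any of them,
// and resolves with the decoded AudioBuffers in the same order as `encoded`.
function decodeAll(ctx, encoded) {
  return Promise.all(encoded.map((data) => ctx.decodeAudioData(data)));
}
```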
<h4>Micro-optimizations</h4>
Audio processing is composed of a lot of very tight loops, repeating a
particular operation over and over on audio buffers.
Implementations take advantage of SIMD (Single Instruction Multiple Data)
instructions available in CPUs, and optimize certain very common operations
after profiling, to make processing faster, in order to render bigger graphs
and allow lower audio latency.
Gecko has NEON (on ARM) and SSE (on x86) functions implementing all the very
common operations (applying a gain in place, copying a buffer while applying a
gain value, panning a mono buffer to a stereo position, etc.), as well as
optimized FFT code (taken from the FFmpeg project on x86, and OpenMAX DL
routines on ARM). Of course, these optimizations are only used if the CPU on
which the browser is running has the necessary extensions, falling back to
normal scalar code if not. Depending on the function, those optimizations have
proven to be between three and sixteen times faster than their scalar
counterparts.
Blink has SSE code for some <code>AudioParam</code> event handling, and has
the same FFT code as Gecko (both on x86 and ARM).
WebKit can use different FFT implementations, depending on the platform, for
example the FFT from the <em>Accelerate</em> framework on OS X.
This results in a variety of performance profiles for FFT-based processing
(HRTF panner, <code>ConvolverNode</code>, <code>AnalyserNode</code>, etc.).
Additionally, very often, the decoding code used by the implementation of
<code>decodeAudioData</code> is very well optimized using SIMD and a variety
of other techniques.
</section>
<section>
<h2>Using lighter processing</h2>
Audio processing is a hard real-time process. If the audio has not been
computed when the system is about to output it, dropouts will occur, in the
form of clicks, silence, noise and other unpleasant sounds.
Web browsers run on a variety of platforms and devices, some of them being very
powerful, and some of them with limited resources. To ensure a smooth
experience for all users, and depending on the application itself, it can be
necessary to use techniques that do not sound as good as the normal
technique, but still provide a convincing experience, retaining the
essence of the interaction.
<h3>Custom processing</h3>
Sometimes, the Web Audio API falls short of what is needed to solve
particular problems. It can be necessary to use an <code>AudioWorklet</code>
or a <code>ScriptProcessorNode</code> to implement custom processing.
JavaScript is a very fast language if written properly, but there are a number
of rules to obey to achieve this result.
<ul>
<li>
Using typed arrays is a must. They are very fast compared to normal
arrays. In the Web Audio API, the <code>Float32Array</code> is the most
common. Re-using arrays is important: it avoids spending a lot of
time in the allocator code.
</li>
<li>
Keeping the working set small is important: using few arrays and
always working on the same data is faster.
</li>
<li>
No DOM manipulation or fiddling with the prototypes of objects during
processing, as this invalidates the JITed code.
</li>
<li>
Trying to stay as monomorphic as possible and always using the same code
path yields more optimized JITed code. Suddenly calling a function with
a normal array instead of a <code>Float32Array</code> will invalidate a lot
of code and take a slow path.
</li>
<li>
Compiling C or C++ to JavaScript (or WebAssembly) yields the best
performance. <em>emscripten</em> or other tools can be used to compile
libraries or custom code into a typed subset of JavaScript that does not
allocate and is very fast at number crunching.
</li>
<li>
Experimental extensions, such as <code>SIMD.js</code> or
<code>SharedArrayBuffer</code> can make things run faster, but they are not
available in all browsers.
</li>
</ul>
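A few of the rules above can be illustrated with a small processing kernel. This is only a sketch: <code>createOnePoleLowPass</code> is a hypothetical helper, not part of the Web Audio API. It allocates nothing while processing, only ever receives <code>Float32Array</code>s, and always takes the same code path.

```javascript
// Hypothetical kernel: a one-pole low-pass filter that processes blocks in
// place. All state is created once, outside of the processing loop.
function createOnePoleLowPass(coefficient) {
  let previous = 0; // filter memory, carried from one block to the next
  return function processBlock(block) { // block is always a Float32Array
    for (let i = 0; i < block.length; i++) {
      previous += coefficient * (block[i] - previous);
      block[i] = previous;
    }
    return block;
  };
}

const lowPass = createOnePoleLowPass(0.5);
const workBlock = new Float32Array(128); // allocated once, re-used each time
lowPass(workBlock);
```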
<h3>Worker-based custom processing</h3>
All of the above techniques can be used with workers, to offload heavy
processing. While the worker does not have access to the Web Audio API,
buffers can be transferred without copy, and the audio can then be sent back to
the main thread for use with the Web Audio API.
This has latency implications, and care must be taken so that the worker
always finishes processing on time, generally using a FIFO.
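The FIFO mentioned above can be as simple as a ring buffer backed by a single <code>Float32Array</code>. The sketch below is single-threaded, and the class and method names are hypothetical. Sharing such a queue between threads would need, for instance, a <code>SharedArrayBuffer</code> and atomic index updates, which are not shown here.

```javascript
// Hypothetical single-threaded ring buffer of audio samples.
class SampleFifo {
  constructor(capacity) {
    this.storage = new Float32Array(capacity); // allocated once
    this.readIndex = 0;
    this.available = 0; // number of samples currently queued
  }
  // Queues as many samples from `input` as fit; returns the count written.
  push(input) {
    const capacity = this.storage.length;
    const count = Math.min(input.length, capacity - this.available);
    let writeIndex = (this.readIndex + this.available) % capacity;
    for (let i = 0; i < count; i++) {
      this.storage[writeIndex] = input[i];
      writeIndex = (writeIndex + 1) % capacity;
    }
    this.available += count;
    return count;
  }
  // Fills `output` with queued samples; returns the count read. Reading
  // fewer samples than requested means the producer was late (under-run).
  pop(output) {
    const capacity = this.storage.length;
    const count = Math.min(output.length, this.available);
    for (let i = 0; i < count; i++) {
      output[i] = this.storage[this.readIndex];
      this.readIndex = (this.readIndex + 1) % capacity;
    }
    this.available -= count;
    return count;
  }
}
```

The capacity of the queue sets the trade-off: a larger queue absorbs longer stalls of the worker, but adds latency.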
<h3>Copying audio data</h3>
Reusing <code>AudioBuffer</code> internal data can be performed more
efficiently than by calling <code>getChannelData()</code>.
<code>copyFromChannel()</code> and <code>copyToChannel()</code> allow the
engine to optimize away some copies and allocations.
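For instance, a block of samples can be read into a scratch array that is allocated once and re-used, assuming <code>buffer</code> is an <code>AudioBuffer</code>. The helper name and the block size are hypothetical.

```javascript
// Scratch storage allocated once and re-used for every read.
const scratch = new Float32Array(128);

// Hypothetical helper: copies up to scratch.length frames of the given
// channel, starting at frame `offset`, into the scratch array.
function readBlock(buffer, channel, offset) {
  buffer.copyFromChannel(scratch, channel, offset);
  return scratch;
}
```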
<h3>Built-in resampling</h3>
The <code>OfflineAudioContext</code> can be used to perform off-main-thread and
off-rendering-thread resampling of audio buffers. This can have two
different uses:
<ul>
<li>
By resampling an <code>AudioBuffer</code> to the sample-rate of the
<code>AudioContext</code>, no time is spent resampling in the
<code>AudioBufferSourceNode</code>.
</li>
<li>
By resampling an <code>AudioBuffer</code> to a low sample rate, memory can
be saved. This has sound quality and CPU implications, but not all sounds
have partials that are high enough to require a full-band rate.
</li>
</ul>
<pre highlight=javascript>
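function resample(input, target_rate) {
  return new Promise((resolve, reject) => {
    // Validate the arguments, and stop early if they are invalid.
    if (!(input instanceof AudioBuffer)) {
      reject(new TypeError("input must be an AudioBuffer"));
      return;
    }
    if (typeof target_rate != "number" || target_rate <= 0) {
      reject(new RangeError("target_rate must be a positive number"));
      return;
    }
    // The duration is unchanged, so the length in frames scales with the
    // ratio of the target rate to the source rate.
    var resampling_ratio = target_rate / input.sampleRate;
    var final_length = Math.ceil(input.length * resampling_ratio);
    var off = new OfflineAudioContext(input.numberOfChannels,
                                      final_length, target_rate);
    // The source node resamples the buffer to the rate of the context.
    var source = off.createBufferSource();
    source.buffer = input;
    source.connect(off.destination);
    source.start(0);
    off.startRendering().then(resolve).catch(reject);
  });
}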
</pre>
<h3>Track or asset freezing</h3>
The <code>OfflineAudioContext</code> can be used to apply, ahead of playback,
complex processing to an audio buffer, to avoid having to recompute the
processing each time. This is called <em>baking</em> or <em>freezing</em>.
This can be employed for a full sound-track, as well as, for example, the
individual sound effects of a video game.
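A minimal sketch of freezing, assuming <code>offline</code> is an <code>OfflineAudioContext</code> created with the desired channel count, length and sample rate, and <code>buildChain</code> is a hypothetical callback that builds the effect chain and returns its last node:

```javascript
function freeze(offline, input, buildChain) {
  const source = offline.createBufferSource();
  source.buffer = input;
  // buildChain(context, source) is a hypothetical callback: it connects
  // `source` to the processing nodes and returns the last node of the chain.
  const tail = buildChain(offline, source);
  tail.connect(offline.destination);
  source.start(0);
  // Resolves with an AudioBuffer that contains the processed audio.
  return offline.startRendering();
}
```

The resulting buffer can then be played with a plain <code>AudioBufferSourceNode</code>, without paying for the effect chain at playback time. If the chain has a tail (a reverb, for example), the length of the context needs to account for it.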
<h3>Cheaper reverb</h3>
When using the Web Audio API, the simplest way of adding a reverberation
effect to a sound is to connect it to a <code>ConvolverNode</code>, setting
its <code>buffer</code> attribute to a reverb impulse, often a decaying curve
of some sort, synthesized using a program, or recorded and processed from a
real location. While this setup has a very good sound quality (of course,
depending on the impulse chosen), it is very expensive to compute. While
computers and modern phones are very powerful, longer reverb tails, or running
the Web App on a cheap mobile device, might not work.
Cheaper reverb can be created using delay lines, all-pass and low-pass
filters, to achieve a very convincing effect. This also has the advantage of
having parameters you can change, instead of an impulse buffer.
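As an illustration, one common building block for such a reverb is a feedback comb filter with a low-pass filter in its feedback path. The sketch below is written as plain JavaScript (for example for use in an <code>AudioWorklet</code>) rather than with native nodes; the function name and the parameter values are assumptions, and a usable reverb needs several of these blocks plus all-pass filters.

```javascript
// Hypothetical building block: a feedback comb filter with a one-pole
// low-pass filter in the feedback path. Processes `block` in place and
// outputs the delayed (wet) signal only.
function createDampedComb(delayInFrames, feedback, damping) {
  const line = new Float32Array(delayInFrames); // delay line, allocated once
  let index = 0;
  let filtered = 0; // low-pass filter memory
  return function processBlock(block) {
    for (let i = 0; i < block.length; i++) {
      const delayed = line[index];
      // Low-pass the delayed signal so that high frequencies decay faster.
      filtered = delayed * (1 - damping) + filtered * damping;
      line[index] = block[i] + filtered * feedback;
      index = (index + 1) % delayInFrames;
      block[i] = delayed;
    }
    return block;
  };
}
```

Here <code>feedback</code> controls the length of the tail and <code>damping</code> how quickly it gets darker; both can be changed at any time, unlike an impulse buffer.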
<h3>Cheaper panning</h3>
HRTF panning is based on convolution: it sounds good, but is very expensive. It
can be replaced, for example on mobile, by a very short reverb and an
equal-power panner, with a distance function and Cartesian positions adjusted
properly. This has the advantage of not requiring very expensive
computation when continuously changing the positions of the listener and
source, something that is often the case when basing those positions on sensor
data, such as accelerometers and gyroscopes.
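The equal-power part can be sketched as follows: the left and right gains follow a quarter of a cosine and sine cycle, so that the sum of their squares (the power) stays constant across the stereo field. <code>equalPowerGains</code> is a hypothetical helper; in a graph, a <code>StereoPannerNode</code>, or a <code>PannerNode</code> with <code>panningModel</code> set to <code>"equalpower"</code>, provides this natively.

```javascript
// Hypothetical helper: equal-power gains for a pan position in [-1, 1],
// where -1 is full left, 0 is center and 1 is full right.
function equalPowerGains(pan) {
  const x = (Math.min(1, Math.max(-1, pan)) + 1) / 2; // map to [0, 1]
  return {
    left: Math.cos(x * Math.PI / 2),
    right: Math.sin(x * Math.PI / 2),
  };
}
```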
</section>
<section>
<h2>Debugging Web Audio API applications</h2>
<h3>Node Wrapping</h3>
A good strategy to debug Web Audio API applications is to wrap
<code>AudioNode</code>s using ES6's <code>Proxy</code> or a normal prototype
override, to keep track of state, and to be able to tap in and re-route audio,
insert analysers and so on.
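A minimal sketch of the <code>Proxy</code> approach is shown here. The names <code>wrapNode</code>, <code>originals</code> and <code>log</code> are hypothetical, and this version only records method calls; tapping or re-routing audio would build on the same hooks.

```javascript
const originals = new WeakMap(); // maps each wrapper back to its real node

// Hypothetical wrapper: records every method call made on a node.
function wrapNode(node, log) {
  const proxy = new Proxy(node, {
    get(target, property) {
      const value = target[property];
      if (typeof value !== "function") {
        return value;
      }
      return function (...args) {
        // Native methods need real nodes, both as `this` and as arguments.
        const realArgs = args.map((arg) => originals.get(arg) || arg);
        log.push({ node: target, method: property, args: realArgs });
        return value.apply(target, realArgs);
      };
    },
    set(target, property, value) {
      target[property] = value; // set on the real node, not on the wrapper
      return true;
    },
  });
  originals.set(proxy, node);
  return proxy;
}
```

With this in place, calling <code>connect()</code> on a wrapped node with another wrapped node as argument appends an entry to <code>log</code> and forwards the call to the real nodes. Note that the value returned by such a call is the real node, not its wrapper.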
<h3>Firefox' Web Audio API debugger</h3>
Firefox has a custom developer tool panel, that shows the graph, with the
following features:
<ul>
<li>Ability to see the topology of the graph, garbage collection of nodes
and sub-graph portion.</li>
<li>Setting and getting the value of <code>AudioNode</code> attributes.</li>
<li>Bypass nodes</li>
</ul>
There are some upcoming features as well:
<ul>
<li>Memory consumption of <code>AudioNode</code>s and
<code>AudioBuffer</code>s.</li>
<li> Tapping into nodes, inserting analysers. </li>
<li> Inspecting <code>AudioParam</code> time-lines </li>
<li> CPU profiling of real-time budget </li>
</ul>
<h3>Memory profiling</h3>
It is pretty easy to determine the size in bytes of the buffer space used by
an <code>AudioBuffer</code>:
<pre highlight=javascript>
function AudioBuffer_to_bytes(buffer) {
  if (!(buffer instanceof AudioBuffer)) {
    throw new TypeError("not an AudioBuffer.");
  }
  // Assumes samples are stored as 32-bit floats (4 bytes each).
  const sizeoffloat32 = 4;
  return buffer.length * buffer.numberOfChannels * sizeoffloat32;
}
</pre>
Other memory figures can be obtained in Firefox by going to
<code>about:memory</code>, clicking "Measure", and looking for the tab containing your
page.
</section>
<div data-fill-with="conformance">
</div>
<script>
var chromiumFiles = document.querySelectorAll(".chromium");
for (var i = 0; i < chromiumFiles.length; i++) {
var tuple = chromiumFiles[i].dataset.file.split(":");
var file = tuple[0];
var line = tuple[1];
chromiumFiles[i].href =
"https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/modules/webaudio/"+file+"&l="+line;
}
var firefoxFile = document.querySelectorAll(".firefox");
for (var i = 0; i < firefoxFile.length; i++) {
var tuple = firefoxFile[i].dataset.file.split(":");
var file = tuple[0];
var line = tuple[1];
firefoxFile[i].href =
"https://dxr.mozilla.org/mozilla-central/source/dom/media/webaudio/" + file + "#" + line;
}
</script>
</body>
</html>