Storing AudioBuffers in native sample bit depth (was: Add support for 16-bit sample type?) #2396
Comments
My opinion in this matter is pretty much in sync with KG... The thing here is that we have a lot of different scenarios that have different needs. For a game it might be OK to let the UA decide the quality/performance tradeoff, but for a DAW degrading the quality is a dealbreaker. I don't want us to encourage the UA to do this sort of stuff. The difference between letting the developer decide and letting the UA decide is that, especially on mobile, the developer can update the choice-making logic much faster, whereas the UA might be maintained for a longer period. The developer knows what the application does, and will do, whereas the UA will have to resort to heuristics that can easily go wrong (as I pointed out, something that's an optimization for one application is a bug for another). Hiding performance as an implementation detail is a terrible idea in my experience. Another thing to consider is that both use cases I mentioned here (games and DAWs) traditionally don't actually do any heuristics, but let the users choose the performance options. This makes sense because ultimately the user is in the best position to make the decision, since they can see its impact. If we let the UA decide this sort of thing, those applications lose the ability to let the user decide. All in all, I think letting the UA decide is more or less a useless feature that would cause a lot of implementation complexity, and people would try to work around it anyway. As for the subject of compressed assets, with <audio> there's the unresolved problem of time-syncing it with the rest of the API. One option would be to allow creating an AudioBuffer from an <audio> element. This would of course throw if the asset hasn't finished loading yet. |
I still think it is a good idea to let the UA decide what representation to use internally, but it should not degrade the quality of samples (unless there is some hint in the API that lets a developer tell the UA what quality level it can accept). Here's a fairly non-intrusive way to support integer formats that I think would solve most problems (comments welcome):
Furthermore, I suggest that:
In general, I don't see how we could both have integer formats internally AND resample the data. To me it seems that we need to go to float32 when resampling, right? |
What's the value in that? Because the cost is really high if you expect the UAs to actually deliver heuristics that help in most cases and don't at least hurt in others. At most, to me, having the UA decide would be better as an NTH feature that is either opt-in or opt-out, because in the end it's going to be the user who has access to the most relevant information, so I'm against doing anything that prevents the application from providing the user with the choice in this. Don't get me wrong, I'm all for reasonable defaults, but in matters where the performance impact is this high, it should be possible to override the defaults. |
I think that actually the hinting should work the other way around; the UA could give hints to the application on what's the best thing to do, then the application can use that as a default unless the user overrides it, or if there is a known limitation with a given device, etc. |
At the moment implementations can choose (without normative spec changes) to store decoded buffers in some compressed format that is expanded to floats when read or played back. That might or might not perform well relative to other implementations that don't do this. The playback quality relative to other decoding treatments might vary. It's not for the spec to say. That said, we plan to revisit in-memory compression more thoroughly in the next version of the spec. |
I wonder if there have been any thoughts or communication about this recently? I'm currently getting bitten by this again when porting a game from Android with Emscripten to run in a web browser on a mobile phone, and facing considerable memory pressure trying to get it to run. Being able to store audio as the original 16-bit samples instead of expanding to 32-bit would save around 50MB of RAM at runtime for the application, which would be a huge saving when trying to run on phones with 256MB/512MB of RAM. |
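To put the memory figure in perspective, here is a rough back-of-the-envelope sketch in TypeScript. The clip length, sample rate, and channel count are illustrative assumptions, not numbers from the game described above; they are chosen only to show what kind of sound bank lands near a 50MB saving.

```typescript
// Bytes needed to hold uncompressed PCM at a given sample width.
function pcmBytes(
  seconds: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number
): number {
  const frames = Math.round(seconds * sampleRate);
  return frames * channels * bytesPerSample;
}

// Hypothetical sound bank: 5 minutes of stereo audio at 44.1 kHz in total.
const bankSeconds = 300;
const bankAsFloat32 = pcmBytes(bankSeconds, 44100, 2, 4); // 4 bytes per float32 sample
const bankAsInt16 = pcmBytes(bankSeconds, 44100, 2, 2);   // 2 bytes per int16 sample

// Difference between the two representations, in MiB.
const bankSavedMiB = (bankAsFloat32 - bankAsInt16) / (1024 * 1024);
console.log(bankAsFloat32, bankAsInt16, bankSavedMiB.toFixed(1));
```

Under these assumptions, roughly five minutes of preloaded stereo audio is already enough for the float32 expansion to cost about 50MB extra.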
We have not been talking about this recently, but we understand it is still an issue. Joe, did you mean to push that back to v.next on June 3rd? |
I did mean to push it back, because I thought the sense of the group was that implementations could store audio any way they want internally inside an AudioBuffer even if the externally visible data is represented as floats. That doesn't mean I'm dismissing it as an issue, I understand it's a big deal, but we hadn't agreed on a straightforward solution in the spec and there seems to be some room for implementations to make this better without changing the spec. |
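The idea of keeping a compact internal representation while exposing floats could look roughly like the sketch below. This is not how any browser implements AudioBuffer; `Int16BackedChannel` and its methods are hypothetical names, and the scaling by 32768 is one common convention assumed here for illustration.

```typescript
// Hypothetical container: keeps samples as 16-bit integers and only
// expands them to float32 when a caller asks for the data.
class Int16BackedChannel {
  private readonly data: Int16Array;

  constructor(data: Int16Array) {
    this.data = data;
  }

  // Quantize float samples in [-1, 1] to int16. This is lossy for
  // arbitrary float input, which is the quality concern raised in the thread.
  static fromFloat32(src: Float32Array): Int16BackedChannel {
    const out = new Int16Array(src.length);
    for (let i = 0; i < src.length; i++) {
      const scaled = Math.round(src[i] * 32768);
      out[i] = Math.max(-32768, Math.min(32767, scaled));
    }
    return new Int16BackedChannel(out);
  }

  // Expand to float32 on demand (assumed scaling: divide by 32768).
  toFloat32(): Float32Array {
    const out = new Float32Array(this.data.length);
    for (let i = 0; i < this.data.length; i++) {
      out[i] = this.data[i] / 32768;
    }
    return out;
  }

  // Resident size of the stored samples in bytes.
  get byteLength(): number {
    return this.data.byteLength;
  }
}
```

If the source asset was 16-bit to begin with, the round trip through this container is exact; the float copy only needs to exist for as long as something is reading it.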
By the way, I wonder if the |
If an option lived in |
Have there been any recent advances on this, or thoughts on if/when support for 16-bit audio buffers might realistically be introduced? We've been working with a major game company partner on an HTML5 title to be deployed on Facebook, and out-of-memory crashes affect more than 30% of the initial QA test runs. Profiling shows that having support for 16-bit audio would allow optimizing the game to use 10-20% less memory, which would definitely help with the OOM crashes. Games often utilize a lot of different sound effects, and they are preloaded up front since they need to be played back in real time as a response to a game logic event, so they typically have large banks of audio stored in memory. The native version of the game utilizes 16-bit audio buffers, so needing to expand them to 32-bit on the web causes a big discrepancy between the native app and HTML5 app memory footprints. |
Hi, I have worked on porting a mobile game to WebGL, which you can check out at www.topeleven.com. In our case using 16-bit audio would decrease memory usage about 10%. So this is an optimization worth considering. |
This is still showing up in most Unreal Engine 4 and Unity3D ported titles on the web, and being able to use 16-bit integer formats for audio effects would be a big size saving for these demos. I wonder what the latest thinking is on this? A "V1" label was added to this bug earlier, but it was then removed by @mdjp . What does that mean, and what does its removal mean? Has there been any thought of adding this feature in the future? Thanks! |
Based on skimming over the issue and the labels set here, this will not be in v1, but in the next version. I think there are a couple issues that need to be worked out. First, what does I think specifying a new
|
I have created a test suite of different audio files and effects that currently are problematic. You can visit https://github.com/juj/audio_test_suite to find it, or http://clb.demon.fi/audio_test_suite/ to check it out live.
While creating the above set of tests, I notice Overall, I'd like to see the manipulation of compressed and uncompressed audio be much more symmetric in the API, so that all features are available on both formats. At that point, different uncompressed formats would probably also become easier to express. Though I only know of a small subsection of Web Audio API overall, so not sure how easy or hard that would be to achieve. |
I would imagine that playback is the 90% case, with an extremely shallow effect graph. For something like ConvolverNode, mandating float inputs/outputs is fine. For AudioBufferSourceNode (and perhaps ScriptProcessorNode), I'd really like to see 16-bit-depth support here. |
It would also be nice to have better control over the sample rate - the fact that decodeAudioData always resamples to the context's sample rate is unfortunate, as it's lossy. |
Although perhaps not the API you'd choose, control over the sample rate is available through OfflineAudioContext. |
Hmm, interesting, since AudioBuffers can (IIRC) be shared across contexts... |
@hoch had opened a discussion about AudioDeviceClient API, which led to a conversation about efficient compressed audio sample playback. That prompted an illustration/sketch of an API to play back compressed audio clips, something like the following:
var audioFeatures = AudioDevice.enumerateAudioSupport(); // Returns a list of e.g. {sampleRate: 44100, channels: 'stereo' }, {sampleRate: 48000, channels: '5.1' }
var device = new AudioDevice({sampleRate: 44100, channels: 'stereo' });
// Compressed audio playback:
var mediaSource = new MediaSource(myTypedArray, /*offset*/43242, /*length*/5325, 'audio/ogg'); // weak reference to typed array bits, no deep copy of byte data
// or mediaSource = new MediaSource(fetch('foo.ogg'));
mediaSource.downloadHint = 'download on first play'/'download up front'/'decode up front';
mediaSource.onloaded / .readystate etc. to provide information
var mediaInstance = new MediaInstance(mediaSource);
mediaInstance.start = 0;
mediaInstance.loopStart = 2342;
mediaInstance.loopEnd = 53114;
mediaInstance.loopTimes = 3; // default=infinity
mediaInstance.end = 350000;
mediaInstance.pitch/.volume/.worldPosition = ...;
mediaInstance.onloopended/.onended = function() {};
var playbackGroup = device.createAudioPlaybackGroup();
var playbackInstance1 = playbackGroup.play(mediaInstance, timeFromNow=0);
var playbackInstance2 = playbackGroup.play(mediaInstance, timeFromNow=2);
var playbackInstance3 = playbackGroup.play(mediaInstance, timeFromNow=4);
playbackInstance2.pitch = ...; // animate the playback pitch
playbackGroup.volume/.pitch = ...; // animate clips in a group
playbackGroup.stop(); // stops all audio files playing in this group
// soft real time push mode synthesis:
var playbackGroup = device.createAudioPlaybackGroup();
var mediaInstance1 = new MediaInstance(myTypedArray, /*offset=*/2000, /*length=*/400000);
myTypedArray[2000 through 402000] = /*synthesized audio frames*/;
playbackGroup.appendQueue(mediaInstance1);
var mediaInstance2 = new MediaInstance(myTypedArray, /*offset=*/402000, /*length=*/400000);
myTypedArray[402000 through 802000] = /*more synthesized audio frames*/;
playbackGroup.appendQueue(mediaInstance2); // queue up to be played back after above buffer
@hoch asked to drop it to the issue tracker for reference. Not sure how to tie in to Web Audio, but hopefully it gives ideas of the use cases. |
Virtual F2F:
|
Teleconf: This is useful. @padenot mentioned that Firefox already does this internally and transparently. The question is if it should be exposed to the developer and what the API should look like. Proposals welcome. |
TPAC 2020:
The last two points are complementary and don't serve the same use case; I believe both would have their use. |
Hey, thanks for the ping! I was not aware of TPAC, and missed out on that - but would love to join in a call if that would help the progress. My take on raw 8-bit/16-bit vs 4-bit DPCM is that neither can obviate a need for the other. Both types of formats are used in native game projects, so I would vote to see support for both in Web Audio. (preference towards raw if only one had to be chosen) |
To reference your second bullet point: choosing the overall bit depth of the audio context would significantly help my application's memory footprint. Being forced to use 32 bit floating point is maxing out my memory when I have 16+ long-form audio files loaded in. |
Virtual F2F 2021: Increase this to priority-1. We will support additional depths for linear PCM (i.e. not 4-bit DPCM). Lots of details need to be worked out, but probably |
@rtoy It's rare, but folks may want to decode these to 24-bit.
Are you saying that the web app can specify the desired target bit depth, in cases where the original isn't known? If so, that sounds great. (For example, an MP3 encoder may take 24-bit PCM samples, and the decoder may be able to output 24-bit PCM samples, but as far as I understand it there is no inherent bit depth while in MP3-land. The web application could request 24-bit PCM if it wanted.) |
Sorry. I really meant that for an encoded file, decodeAudioData can return a buffer of whatever the appropriate bit depth is if there is one. So a 24-bit wav file gets a 24-bit buffer. Well, I guess there isn't really a 24-bit array type, so it would probably be a 32-bit array type.
Ah, I'm not sure about that. I think we want to minimize the changes to decodeAudioData since WebCodecs can probably do everything better. So, I'm not sure about what you can specify for decodeAudioData. But certainly as a user, I want to be able to create an AudioBuffer manually with a specified bit depth. If nothing else, this is useful for testing that AudioBuffers behave correctly. |
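Since there is no 24-bit typed array, storing 24-bit samples means widening them into a 32-bit container, as noted above. A minimal sketch of that step follows, assuming packed little-endian signed PCM as found in WAV files; `unpackPcm24` is a hypothetical helper name and not part of any proposed API.

```typescript
// Unpack packed 24-bit little-endian signed PCM into a 32-bit integer array.
function unpackPcm24(bytes: Uint8Array): Int32Array {
  if (bytes.length % 3 !== 0) {
    throw new RangeError("24-bit PCM data must be a multiple of 3 bytes");
  }
  const out = new Int32Array(bytes.length / 3);
  for (let i = 0, j = 0; i < bytes.length; i += 3, j++) {
    // Assemble the three bytes, least significant first.
    const unsigned = bytes[i] | (bytes[i + 1] << 8) | (bytes[i + 2] << 16);
    // Sign-extend from 24 to 32 bits.
    out[j] = (unsigned << 8) >> 8;
  }
  return out;
}

// Example: max positive, min negative, and -1 as 24-bit samples.
const pcm24Example = unpackPcm24(
  new Uint8Array([0xff, 0xff, 0x7f, 0x00, 0x00, 0x80, 0xff, 0xff, 0xff])
);
console.log(Array.from(pcm24Example));
```

Note that a 32-bit integer container uses the same four bytes per sample as float32, so the memory benefit discussed in this issue applies to 8-bit and 16-bit data rather than to 24-bit sources.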
AudioWG call:
|
Next step - straw man and draft spec text required. |
Using an integer 16-bit sample type instead of float32 would allow saving half of the memory on audio data when it's resident in RAM.
Consider adding support for users to utilize audio data in such formats.
Discussion thread about this is at http://lists.w3.org/Archives/Public/public-audio/2013OctDec/0294.html