Decoding mp3/ogg/aac to fix Web Audio API .decodeAudioData() shortcomings? #366
Comments
WebCodecs does not decode files; it decodes raw bitstreams. In the common case the bitstream data will be paired with metadata in a container (e.g. mp4a is in an ISO BMFF, aka MP4, container), and it is the job of the application to extract both.
If your file is an mp4a file, you will need to use an ISO BMFF parser (e.g. mp4box.js) to extract this information. I recommend https://gpac.github.io/mp4box.js/test/filereader.html to get a feel for the sort of metadata that is in an mp4a file.
This information is all in the ISO BMFF container metadata. I'm not certain offhand, but the number of channels may require parsing the AAC codec-specific data. (Edit: channel count is in the
WebCodecs provides only codecs, not a player implementation. It is our hope that these sorts of APIs will be built on top of WebCodecs.
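For the AAC case, the codec-specific data mentioned above is the MPEG-4 AudioSpecificConfig (in MP4 files it typically sits in the `esds` box). The following is a minimal sketch of reading its leading fields, assuming a container parser has already extracted the config bytes. `makeBitReader` and `parseAudioSpecificConfig` are hypothetical helper names, not part of WebCodecs or mp4box.js, and the sketch ignores SBR/PS extension signalling.

```javascript
// Sampling-frequency table from the MPEG-4 Audio spec (indices 0..12).
const SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                      24000, 22050, 16000, 12000, 11025, 8000, 7350];

// Hypothetical helper: MSB-first bit reader over a Uint8Array.
function makeBitReader(bytes) {
  let pos = 0; // current position, in bits
  return (count) => {
    let value = 0;
    for (let i = 0; i < count; i++, pos++) {
      const bit = (bytes[pos >> 3] >> (7 - (pos & 7))) & 1;
      value = value * 2 + bit;
    }
    return value;
  };
}

// Hypothetical helper: reads audioObjectType, sampling frequency and
// channelConfiguration from the start of an AudioSpecificConfig.
function parseAudioSpecificConfig(bytes) {
  const read = makeBitReader(bytes);
  let objectType = read(5);
  if (objectType === 31) objectType = 32 + read(6); // escape value
  const freqIndex = read(4);
  // Index 15 means an explicit 24-bit sample rate follows.
  const sampleRate = freqIndex === 15 ? read(24) : SAMPLE_RATES[freqIndex];
  const channelConfig = read(4);
  // 1..6 map directly to a channel count, 7 means 8 channels (7.1).
  // 0 means the layout is defined elsewhere; other values are not handled here.
  let numberOfChannels = null;
  if (channelConfig >= 1 && channelConfig <= 6) numberOfChannels = channelConfig;
  else if (channelConfig === 7) numberOfChannels = 8;
  return { codec: `mp4a.40.${objectType}`, sampleRate, numberOfChannels };
}
```

For the common AAC-LC, 44.1 kHz, stereo config bytes `0x12 0x10`, this yields `mp4a.40.2`, 44100 and 2. Those values could then be passed as `codec`, `sampleRate` and `numberOfChannels` to `AudioDecoder.configure()`, with the raw config bytes as `description`.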
Apologies for the confusion. With all the conversations that have happened in the past and the eagerness to close out Web Audio bugs in favor of WebCodecs, there has been a misunderstanding that it would directly address these use cases, but now it is clear that it does not. Thanks for the clarification.
@juj, to clarify, while WebCodecs does not include all of the components to address these use cases, it is intended that WebCodecs fills the "decoding" role (while JavaScript fills the "demuxing" role). Folks who presently build mixing engines on top of WebAudio should find this really useful, as it allows you to decode only what you need (vs. the whole file) and just-in-time. Better flexibility, less memory. Btw, we are working on a demo that decodes audio (and video) here
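As one illustration of the "demuxing" role in JavaScript, here is a minimal sketch for the simplest AAC case: a bare ADTS stream (a typical `.aac` file), where every frame starts with a header that carries the frame's own length. `splitAdtsFrames` is a hypothetical helper, and the sketch assumes the buffer starts on a frame boundary. MP4, Ogg and MP3 inputs need their own parsers.

```javascript
// Hypothetical helper: splits a raw ADTS AAC byte stream into whole frames.
// Returns the frames found plus the number of bytes consumed, so a caller
// can keep any truncated tail and retry once more data has arrived.
function splitAdtsFrames(bytes) {
  const frames = [];
  let offset = 0;
  while (offset + 7 <= bytes.length) {
    // Every ADTS header starts with a 12-bit syncword of all ones.
    if (bytes[offset] !== 0xff || (bytes[offset + 1] & 0xf0) !== 0xf0) {
      throw new Error(`No ADTS syncword at byte ${offset}`);
    }
    // aac_frame_length is 13 bits spread over header bytes 3..5 and
    // counts the header itself.
    const frameLength = ((bytes[offset + 3] & 0x03) << 11) |
                        (bytes[offset + 4] << 3) |
                        (bytes[offset + 5] >> 5);
    if (frameLength < 7) {
      throw new Error(`Corrupt ADTS frame length at byte ${offset}`);
    }
    if (offset + frameLength > bytes.length) break; // truncated tail
    frames.push(bytes.subarray(offset, offset + frameLength));
    offset += frameLength;
  }
  return { frames, consumed: offset };
}
```

Each returned frame could then be wrapped in an `EncodedAudioChunk` and passed to `decode()` one at a time. This assumes the decoder is configured without a `description`, which, as far as I understand the WebCodecs AAC registration, is treated as the ADTS case. Slicing a file at arbitrary byte offsets instead hands the decoder partial frames, which is one plausible cause of a generic decoding error.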
In various conversations throughout the years, the Web Audio API working group has generally closed out feature requests to the API, and the expectation has been that WebCodecs+AudioWorklets+SAB+Wasm would allow developers to write their own audio decoding+mixing engine to fix the shortcomings that the Web Audio API currently has in audio playback parity with native apps ([1], [2], [3], [4], [5] etc.). To recall, the two major shortcomings are:
The general result is that web sites using the Web Audio API typically consume excessive system memory and CPU power. That makes WebCodecs extremely appealing to a large number of users.
So I sat down today to try to use WebCodecs to fix up `.decodeAudioData()` in a way that would patch up the above issues by decoding audio on demand. I could not find any examples of how to use the WebCodecs AudioDecoder API to decompress an audio file on the fly, so I tried to build my own based on the spec alone.

However, I ran into a few issues from the get-go. My small example looks like
which raises the following observations:
I need to specify the codec for the input file. For general Ogg Vorbis and MP3, that can be inferred from the file suffix (this requires that all asset files carry an identifying suffix, or some side-channel information, but that's passable). For AAC, however, the codec description strings seem to be more complex; there seems to be some kind of profile system in play? How would I know which profile the input AAC file has? I'd like to just be able to say `mp4a` or `aac` for an AAC-encoded file, and not have to specify the `*` part of `mp4a.*` (unless it is something trivial that can be statically reasoned without having to write a file format parser?). Ideally I'd just pass `codec: 'autodetect'` (or omit it altogether) and have the system know what kind of audio I have passed.

I don't know what to put into the fields marked with (2). These are fields I would want to just leave out, and have the codec tell me what the input file had. With the exception of the `duration` field, the other fields look like they are mandatory, and Chrome won't decode unless they are specified. Shouldn't the codec be able to know this info based on the input file that is provided to it?

Ignoring (2) for now and hardcoding known values: when starting off `.decode()`, it will in my test call the output callback 1149 times to immediately decode the whole file to completion, just like `.decodeAudioData()` did, hence winning nothing. I suppose it is doing what I asked for, since I provided the whole file as one chunk. What I instead want is to be able to tell the API to e.g. "decode 1 second forward from current position", or "decode 4096 new samples".

I presume I should be slicing up the input file to pass as the `data` field to pace the decoder, but the issue is that I don't know what chunk size I should pass to the decoder to achieve that 1 second or 4096 new samples that I want. If I choose too high a value, I do excess work and cause excess memory usage.
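The number of compressed frames needed for a given amount of audio can be estimated from the codec's frame size. Here is a rough sketch, assuming fixed frame sizes of 1024 samples for AAC-LC and 1152 for MPEG-1 Layer III (Vorbis uses variable block sizes, so it is not covered). `framesForSamples` and `framesForSeconds` are hypothetical helpers, and the codec keys are only labels for this table.

```javascript
// Assumed samples per compressed frame.
// 'mp3' here means MPEG-1 Layer III (32 / 44.1 / 48 kHz streams).
const SAMPLES_PER_FRAME = { 'mp4a.40.2': 1024, 'mp3': 1152 };

// Hypothetical helper: how many whole frames cover `sampleCount` samples.
function framesForSamples(codec, sampleCount) {
  const perFrame = SAMPLES_PER_FRAME[codec];
  if (!perFrame) throw new Error(`Unknown frame size for codec ${codec}`);
  return Math.ceil(sampleCount / perFrame);
}

// Hypothetical helper: how many whole frames cover `seconds` of audio.
function framesForSeconds(codec, sampleRate, seconds) {
  return framesForSamples(codec, Math.ceil(sampleRate * seconds));
}
```

Under these assumptions, 4096 samples of AAC-LC need 4 frames, and one second at 44.1 kHz needs 44 frames. The byte size of those frames still has to come from the container or the frame headers, since compressed frame sizes vary.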
As a hack, I try to choose an arbitrary small value, e.g. 16 kB. My test reads like this:
This code decodes the first chunk and calls the `output` callback 29 times, and when I try to call `decodeMore()` after that, I just get a generic `DOMException: Decoding error.` no matter what I try. This is where I hit the wall.

There does not seem to exist an API to tell when all decoding is complete? The `output` callback will get invoked a number of times as a result of calling `.decode()` once, so I don't know how many times the `output` callback will trigger before I should issue a new call to start decoding the next chunk. How should this be achieved?

Am I trying this right? I don't even know if the API is supposed to work like this. Or is there something I am missing?
Thanks for any help!
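On pacing and completion: `AudioDecoder.flush()` returns a promise that resolves once all pending outputs have been emitted, which covers the end-of-stream case. For decoding just ahead of playback, one option is to track how many samples are buffered or in flight and submit new chunks only below a target. A minimal sketch follows, assuming the input is already split into per-frame chunks and that each chunk produces one output of `samplesPerChunk` samples. `createPacer` and its parameters are hypothetical.

```javascript
// Hypothetical helper: submits chunks only while the decoded-ahead amount
// (buffered samples plus chunks still in flight) is below `targetSamples`.
// `submit` is whatever hands one chunk to the decoder,
// e.g. (chunk) => decoder.decode(chunk).
function createPacer(submit, pending, targetSamples, samplesPerChunk) {
  let inFlight = 0; // chunks submitted but not yet output
  let buffered = 0; // decoded samples not yet consumed by the mixer
  const pump = () => {
    while (pending.length > 0 &&
           buffered + inFlight * samplesPerChunk < targetSamples) {
      submit(pending.shift());
      inFlight++;
    }
  };
  return {
    pump,
    // Call from the decoder's output callback, e.g. with
    // audioData.numberOfFrames.
    onOutput(sampleCount) {
      inFlight = Math.max(0, inFlight - 1);
      buffered += sampleCount;
    },
    // Call when the mixer has played `sampleCount` samples.
    consume(sampleCount) {
      buffered = Math.max(0, buffered - sampleCount);
      pump();
    },
  };
}

// Browser-side wiring (not executed here; names are illustrative):
//   const pacer = createPacer((c) => decoder.decode(c), chunks, 4096, 1024);
//   const decoder = new AudioDecoder({
//     output: (data) => { pacer.onOutput(data.numberOfFrames); /* mix */ },
//     error: (e) => console.error(e),
//   });
//   decoder.configure(config);
//   pacer.pump();
//   // At end of stream: await decoder.flush();
```

With a 4096-sample target and 1024-sample chunks, the pacer keeps at most four chunks decoded ahead and submits more only as the mixer consumes audio.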