Add Image Decoding, associated interfaces and algorithms #152
Conversation
@dalecurtis, mind taking a first pass? Interface-wise, this is everything we discussed. But behind those friendly interfaces hides quite a bit of state. @aboba, FYI - will request review formally once Dale has had a go.
Thanks @dalecurtis, great feedback.
Also includes other minor fixes.
@cconcolato, we're interested to get your take on how ImageDecoder describes a list of ImageTracks (ImageDecoder.tracks). We think it maps well onto video-based image formats like avif (given that "tracks" is a longstanding concept for video), but there's some debate about whether it is over-engineered. That is, do formats like avif support an arbitrary number of tracks? If we instead expect all image formats to define at most 2 tracks (one animated, one still), then we could simplify the API to remove all mention of tracks and make selections simply by giving a value for preferAnimation.
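For concreteness, the two shapes compare roughly like this from author code. This is a sketch only: the init members (type, data) and the exact track properties are assumptions about where this lands, not final IDL.

// Tracks-based shape in this PR: the decoder exposes a list of ImageTracks.
const decoder = new ImageDecoder({ type: 'image/avif', data: avifBytes });  // avifBytes: encoded bytes (assumed)
for (let i = 0; i < decoder.tracks.length; i++) {
  const track = decoder.tracks[i];  // each track would report animated, frameCount, ...
  console.log(i, track.animated, track.frameCount);
}

// Simplified alternative: no track list, just a construction-time hint.
const simple = new ImageDecoder({ type: 'image/avif', data: avifBytes, preferAnimation: true });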
@aboba, I think Dale's review is sufficiently concluded for you to begin. Standing by for questions and feedback.
@chcunningham Not sure I have enough information to answer. Let me try. An ISOBMFF file can indeed contain many tracks. For example, an mp4 file could contain an AV1 video track, an audio track, and a subtitle track. It could also contain an image sequence track (e.g. using only the key frames of the video track) meant to be viewed as a GIF-like animation before the video is clicked. It could also contain one image item to give a representative image of the video. Theoretically, you can construct files with as many tracks and items as you want: N video tracks, K audio tracks, P image sequence tracks, Q subtitle tracks, R image items, etc. In practice, typical files will have simple configurations. Reading:
It seems to me that you want to expose only image items and image sequences (which is fine). I'm curious how an implementation is supposed to decide whether it exposes a sequence of images as a video track or as an image track. Do you expect the track container to guide the implementation? For example, in ISOBMFF, video tracks and image sequence tracks are differentiated by the track handler. In practice, I don't think files containing images will contain more than one image sequence.

As for image items, there are use cases where people think about storing multiple image items in the same file, for example because they are the result of a capture burst, bracketed images, multi-angle or multi-view images, or even to package multiple resolutions in the same file. But these use cases are rather rare IMHO.

If you want to keep the API simple for now, the preferAnimation approach seems reasonable and matches the hypothetical reader API that MIAF defines:
Maybe you want to consider adding preferThumbnail? @joedrago may have additional feedback based on the libavif API and its integration in browsers.
Thank you @cconcolato!
Yes. And I'm calling both a "track", where an image item is just a track with one frame.
When you say "video track", I take it you mean an ImageTrack for which track.animated=true. If so, my intent with animated is to provide an early signal that this track will have a frameCount > 1. Ideally we would do away with animated entirely and just have frameCount, but frameCount is not always known at the outset (particularly for GIF), so this may cause folks to prematurely consider a track with 1 frame as non-animated.
An earlier draft did just have preferAnimation without the tracks mechanism. But then we had the frameCount and animated properties directly on ImageDecoder. This works, but it's limiting if we later do want to add some description of alternative tracks. I noticed that HTML has long defined AudioTrackList and VideoTrackList interfaces, so this seemed like a pattern we might follow with ImageTrack(List).
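For illustration, selecting from such a track list might look roughly like this, given a constructed decoder (a sketch; indexed access into tracks and the per-track selected setter are assumptions about the eventual shape):

// Prefer an animated track, using animated as the early signal since
// frameCount may still be unknown (e.g. for GIF) at this point.
let animatedTrack = null;
for (let i = 0; i < decoder.tracks.length; i++) {
  if (decoder.tracks[i].animated) {
    animatedTrack = decoder.tracks[i];
    break;
  }
}
if (animatedTrack)
  animatedTrack.selected = true;  // assumed selection mechanism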
Could do. What happens when preferThumbnail and preferAnimated compete? Maybe prefer should be an enum w/ either type?
No, I really meant a VideoTrack. This spec defines an ImageTrack and the HTML spec defines a VideoTrack. When both are implemented and the browser is presented with an MP4 video track or an MP4 image sequence track, how will it decide whether to use a VideoTrack or an ImageTrack?
Maybe, no strong opinion.
Ah, I follow. Do you expect to see files like this in the wild? How often? Would it be reasonable to describe these with the image/* MIME type (vs video/*)?
I can't predict the future, but I could envisage people creating dual-headed files (with an image sequence track and a video track, possibly sharing coded frames) to be used in both |
Thanks. Probably you meant
undefined reset();
undefined close();

static Promise<boolean> isTypeSupported(DOMString type);
Why is this a promise-based API? Can we instead return the boolean synchronously? https://github.com/dalecurtis/image-decoder-api/issues/6
This was made a promise because we anticipate cases where the UA may not synchronously have the answer. As image formats have started to use video codecs, decoding an image may require instantiating video decoders backed by platform APIs. A browser's architecture may be such that these APIs are called in a separate process, sandboxed for improved security. Determining supported types then becomes a question for those same APIs, which is implemented via async IPC.
Earlier media capability detection APIs, <video>.canPlayType(), MSE.isTypeSupported(), and WebRTC's RTCRtpSender.getCapabilities(), have all been sync. This has at times been problematic for implementers. Newer APIs like MediaCapabilities.decodingInfo() were made async. Note that the other isConfigSupported() interfaces in WebCodecs are also async.
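As a usage sketch (inside an async function; only isTypeSupported() itself appears in the IDL above, the constructor init shape here is an assumption):

// Feature-detect asynchronously before constructing a decoder.
if (await ImageDecoder.isTypeSupported('image/avif')) {
  const decoder = new ImageDecoder({ type: 'image/avif', data: avifBytes });  // avifBytes assumed
  // ... decode frames ...
} else {
  // Fall back, e.g. fetch a JPEG/PNG variant instead.
}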
I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)?" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?
If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.
I understand that actual image decoding might depend on another process, but answering the question "do I know what to do with this image type at all (without necessarily doing the work)" seems orthogonal to that. Couldn't browsers just maintain a list of supported image types and synchronously check it whenever isTypeSupported is called?
If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically what codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged sandboxed process.
If not, then have we considered changing the name of this API? IMHO it's surprising to give it the same name as an existing API without matching the signature of its return value.
Open to suggestions. I liked this name for its similarity actually. It is performing essentially the same function as its predecessor. I think any confusion would be pretty immediately resolved at dev-time.
If the browser entirely relies on the OS to provide the codec, it may not be possible to know statically what codecs are supported (particularly for newer formats). Instead, we may be forced to query OS APIs that are adjacent to the actual decoding APIs. Often this involves the same async IPC to a privileged sandboxed process.
In my experience, this is mostly true for video, in the sense that it's entirely possible that the device that allows (say) power- or CPU-efficient decoding is simply physically removed, so it's impossible to store the capabilities somewhere in the browser for synchronous access.
Do we have any evidence of a similar constraint for images? In our experience, hardware decoding for images has this problem where the setup time dwarfs the decoding time, and the setup needs to happen per image (with a possible edge case when lots of images have the same dimensions/format, maybe?).
At least in Chrome, in the event of a GPU process crash, it is possible that support for some formats is no longer available.
Otherwise I agree, we're just making this async for symmetry and a hypothetical. Maybe @jernoble or @aboba want to chime in to avoid a decision that would preclude any formats they might want to support.
@dalecurtis what formats aren't supported in Chrome when the GPU process crashes?
In the hypothetical world where a browser only has support for HEIC via a platform decoder, there is a case where repeated GPU crashes may put the browser into a software-rendering/no-GPU mode that prevents access to the platform decoder.
As a practical example, in some cases the platform decoder may have hard limits on the number of instances available and/or may not be working reliably (e.g., it's hanging or crashing too frequently) to the point that it's disabled at runtime. This happens today in Chrome on Android for H.264 support.
Ok, I think it's unlikely that a browser would choose to support an image format and not have a software fallback, but even if such a browser did exist, the web author would still need to deal with decoding support going away after they've already checked for support with isTypeSupported().
Practically we don't have software fallback on Android for H.264 today -- so that at least is a real case. In the case where the codec goes away during usage the decoder would trigger a decoding error. The same would occur if the loss occurred between construction and the first decode call.
I think authors would have to handle this case regardless of whether isTypeSupported() is sync or not, though. For security reasons (e.g., malicious invocation of the platform decoder), even if software fallback is available, it's unclear that automatic fallback is the right operation.
Ultimately my initial implementation was always synchronous, so if everyone wants this to be sync and isn't swayed by the hypotheticals I don't mind switching it back. I believe @chcunningham only suggested it for symmetry with the rest of the WebCodecs APIs.
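Either way, authors end up with roughly the same handling pattern (a sketch; the decode() call shape and the result field are assumptions at this stage, and decoder/ctx are assumed to exist):

// Support can disappear between the isTypeSupported() check and decoding,
// so treat a rejected decode as the authoritative signal.
try {
  const result = await decoder.decode({ frameIndex: 0 });
  ctx.drawImage(result.image, 0, 0);  // assumes result.image is drawable and ctx is a canvas 2D context
} catch (err) {
  // Decoding support went away (or the data is bad): fall back or retry.
}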
Queue a task to establish tracks upon construction. As this is no longer user-driven, we replace the method with an attribute to let users know when the track list is "ready". Establishing tracks was previously user-driven via a call to decodeMetadata(). After further consideration, this seems needlessly complex. Decoding track metadata is not resource-intensive, and we expect that most users of ImageDecoder wish to decode actual frames right away. Users who wish to defer decoding track metadata may still do so by deferring construction of the ImageDecoder. This commit also includes other small fixes (formatting, output timestamp and duration, and closing the ImageDecoder if we fail to establish tracks once data is "complete").
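Under this change, typical usage would look something like the sketch below; the promise-valued ready attribute is illustrative of "an attribute to let users know when the track list is ready", not necessarily the final name.

const decoder = new ImageDecoder({ type: 'image/gif', data: gifBytes });  // gifBytes assumed
// Tracks are established by a queued task shortly after construction;
// wait on the readiness attribute instead of calling decodeMetadata().
await decoder.tracks.ready;
console.log(decoder.tracks.length, decoder.tracks[0].frameCount);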
We have quite a few feedback items already, and this is being circulated through various teams internally at Mozilla. Because of the type and number of comments we have, I think it's best to fix a few immediate things here (not a lot), merge, and then I'll open a series of issues on this repo, tagged with an appropriate tag.

That said, taking more than a month to review a fundamentally new way to decode images on the web is not exactly a long time, given that the space hasn't really moved for years, that quite a few image formats are being added currently, and that they all come with peculiarities and possible optimizations that could make this a really compelling option for authors and unlock new classes of apps.
Thanks Paul! I look forward to your reports. Your proposal sgtm. While the PR may have only been out for a month, the API hasn't changed materially around the primary use case from the initial explainer, and that's been circulating for over a year.
Clicked by mistake!
To clarify, do I understand correctly that you will soon provide a short list of things to consider for immediate fixes?
I was going to add a regular review here as usual for a couple of things, yes.
undefined reset();
undefined close();

static Promise<boolean> isTypeSupported(DOMString type);
Do you mean HEIC? HEIF is supported in software today by Chrome and Firefox.
I went ahead and created a tag called |
SHA: f5a294b Reason: push, by @chcunningham
Fixes #50.