inaccurate seeking #3
Comments
Hi! 😃 Terribly sorry about the late reply!! If pyav solves this issue, then we would be happy to use it instead of opencv as a backend. We have not noticed your specific issue before but are aware that seeking is a bit imprecise with opencv. Would you be able to share a video we can use to test pyav? Thanks!
Hi Jan, no worries. Fun timing: I just put something quick together yesterday. It's not as versatile as napari-video, so I am happy to fold anything you find useful into napari-video if there is interest. https://github.com/danionella/napari-pyav

Regarding an example video: ours are all large, so let me see if I can make a small one with the same problem. Also, note that it's hard to see the problem unless you have a reference. We noticed it only during annotation and when seeking back and forth.
That looks perfect! I think this is much cleaner and more focused than what we have atm, and probably better suited for the use case of working with videos in napari. Would love to incorporate this into napari-video. Wanna open a pull request? I would then also make you a co-owner of the repo and move the package to a new shared organization...
Thank you for the offer. No need to move it, and I'm happy to create a branch if you give me rights. Being listed as a contributor is more than enough credit. However, I should caution against replacing the old approach without further testing. For example, we only use grayscale videos (more specifically, h264 .mp4) and haven't tested rgb (it should work, but just giving an example). Also, there is a strict timestamp check to ensure reproducible seeking, but I could imagine it might simply fail for videos with variable frame rate or other peculiarities – raising an exception. We like that, but others might prefer a plugin that gives you a frame no matter what, even if it might be off by a few ms. The nice thing about opencv is that it is high-level and others have done the job of testing various formats with it.
Hi both, just chiming in here to say that I'm really interested in these developments, as I gave Also, apart from
Hello friends!

### A general primer on frame-accurate, reliable seeking

This is hard for a bunch of reasons, the primary one being that different video container formats have varying support for precise seeking depending on how they store timestamps / seek points / keyframes. The container formats most commonly used for behavior are MP4 and AVI, but the container is different from the video codec, which is the algorithm that defines how the pixel data is represented and compressed. Both MP4 and AVI support multiple different codecs, which makes things even more complicated.

MP4 containers are ideal for our purposes. MP4 files have an index of keyframes (I-frames), which makes seeking to those fast and precise. Once you're there, MP4 stores frames with a combination of moov atoms and mdat atoms. The moov atoms contain timestamp metadata (and other stuff), which makes precise indexing way easier, while the mdat atoms contain the actual encoded image data. AVI containers have an index of keyframes, but not the intermediate metadata, which makes it tricky to seek precisely unless you make some assumptions about the preciseness of the frame rate, or you have every frame stored as a keyframe (e.g., MJPEG).

The specifics of the codec will affect the precise performance and seeking strategy. In general, the idea is to seek to the nearest keyframe, then decode forwards or backwards until you hit the exact frame index you want. In FFMPEG, this strategy is enabled through the

In practice, this is trickier due to how non-keyframe data is stored and depends on other frames, and the fuzziness in the specification of different codecs and container formats.

### So why do we get inconsistent results when seeking?

In short: most software that writes video files is bad. One big stumbling block is that seeking is all designed around timestamps, not frame indices. This means that we have to use the FPS to calculate the timestamp corresponding to our frame. As you might imagine, this creates big issues if the video was recorded with a variable frame rate, stored at a variable frame rate, or with incorrect timing information. Depending on the software used to encode the video, it might try to store the precise time that the frame was received by the camera, by the encoder process, or not at all! When using free-running cameras rather than externally triggered ones, this gets annoying real fast.

Other issues are gnarlier:
A major culprit of a lot of these is

### Could we just check when videos are bad, then?

Ideally, if we could detect some of these conditions, we could try to mitigate them (e.g., use a slower frame-by-frame decoding strategy with integrity checks and no skipping), but this is a huge pain in the ass considering how many edge cases and combinations there are. In principle, you could use Other issues like missing or corrupted timestamps or keyframe indices are detectable by parsing the Even assuming you could detect these issues, it's not clear that frame-by-frame reading is the answer (and it certainly isn't when we want to do a precise random seek).

### Can you just force a video to be seekable?

Since these are pretty annoying technical issues, our recommended solution is to just re-encode your video if you have any seekability issues. From the SLEAP docs:
The `superfast` preset corresponds to the following x264 parameter overrides:

```c
param->analyse.inter = X264_ANALYSE_I8x8|X264_ANALYSE_I4x4;
param->analyse.i_me_method = X264_ME_DIA;
param->analyse.i_subpel_refine = 1;
param->i_frame_reference = 1;
param->analyse.b_mixed_references = 0;
param->analyse.i_trellis = 0;
param->rc.b_mb_tree = 0;
param->analyse.i_weighted_pred = X264_WEIGHTP_NONE;
param->rc.i_lookahead = 0;
```

These will overwrite the defaults for the codec. I haven't tested these extensively, but these are the ones that are likely to interact with seekability:
We've put a lot of work into just directly encoding to this preset at acquisition time to avoid having to recompress data, but I appreciate that this isn't always possible depending on the acquisition software and available bindings.

### What about decoding, then?

Since we don't have much control over how people encode their videos, what should we use to maximize compatibility on the decoding side? Ideally, we'd like to use something that is both fast AND supports the accurate seeking strategies outlined above (failing gracefully when it can't).

The most feature-complete implementation of all of this complex decoding logic is definitely The most commonly used high-level reader that uses While it has the advantage that it can decode the video bytes in-memory, it comes with the rest of the In principle, we could just use Recently, however, we dropped

So what's the best solution? We now default to the tried-and-true (if a bit janky) pipe protocol. This works by doing There are like 1000 wrappers for doing this, but we use The only thing that feels icky is having to spin up a subprocess and communicate via pipes. This is unavoidable, but the nice thing is that every OS has solid support for pipes, with overall performance (while governed by multiple factors) generally following the trend of Linux > macOS > Windows. FFMPEG itself also has a pretty fast startup time, so you probably won't feel that overhead. Empirically, when comparing this approach to

### Takeaways
Hope this deep dive is helpful! I'll try to cross-link to here from other places too, for findability.
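To make the pipe-protocol idea above a bit more concrete, here is a minimal sketch of grabbing one frame by having an FFMPEG subprocess decode it and write raw RGB bytes to stdout. This is only an illustration, not how imageio-ffmpeg or any particular wrapper is implemented internally; the helper name is made up, a constant frame rate is assumed, and width/height/fps are assumed to have been probed beforehand (e.g. with ffprobe).

```python
import subprocess
import numpy as np

def read_frame_rawpipe(path, frame_idx, fps, width, height):
    """Hypothetical helper: decode one frame via an ffmpeg pipe."""
    # Placing -ss after -i requests the slow, decode-and-discard (accurate)
    # seek rather than a fast keyframe jump.
    timestamp = frame_idx / fps
    cmd = [
        "ffmpeg", "-loglevel", "error",
        "-i", str(path),
        "-ss", f"{timestamp:.6f}",
        "-frames:v", "1",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "pipe:1",
    ]
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    return np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
```

For sequential playback you would instead keep one long-lived ffmpeg process and read `width * height * 3` bytes per frame from its stdout, which is roughly what the wrapper libraries handle for you.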
Hi @talmo, thank you for this very helpful summary! I'm a big fan of SLEAP and you obviously have a lot of experience working with a wide variety of sources. I'm very inclined to go with your recommendation, but what still holds me back is the speed – at least in my perhaps naive test – of imageio. Here is a quick comparison with an h264-encoded mp4 (on Ubuntu Linux):

Is there a faster approach for frame-accurate seeking with imageio-ffmpeg? Or do you think another wrapper might be faster?

EDIT: I should add that FastVideoReader is the PyAV-based class currently being used in napari-pyav.
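For reference, one way to run this kind of comparison is a small timing loop over random frame indices; `read_frame` below is just a stand-in for whichever backend is under test (FastVideoReader, imageio, etc.), and the absolute numbers will of course depend on codec, keyframe interval, and disk cache:

```python
import time
import numpy as np

def time_random_seeks(read_frame, n_frames, n_trials=100, seed=0):
    """Average wall time per random-access read.

    `read_frame(idx) -> ndarray` is a placeholder for the backend being benchmarked.
    """
    rng = np.random.default_rng(seed)
    indices = rng.integers(0, n_frames, size=n_trials)
    t0 = time.perf_counter()
    for idx in indices:
        read_frame(int(idx))
    return (time.perf_counter() - t0) / n_trials
```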
@niksirbi I guess the future of napari-pyav is a bit uncertain at this stage, but I'd be curious to hear more about the issue (feel very free to open one, especially if you can link to a file that fails).
Thanks for the in-depth explanation of the issues, @talmo! To me, it looks like using imageio with pyav as the default backend (offering ffmpeg as an option) would be the way forward. It seems sufficiently fast and accurate for working with videos in napari, and it should be straightforward to implement. @bjudkewitz's FastVideoReader seems faster but fails for some videos. Not sure why that is; we could look into this in the future to make FastVideoReader work with more video formats.
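If I understand the imageio v3 API correctly, single-frame access through the pyav plugin would look roughly like this (file name and index are just examples):

```python
import imageio.v3 as iio

# Read a single frame through the pyav plugin; `index` selects the frame.
frame = iio.imread("behavior.mp4", index=1234, plugin="pyav")
print(frame.shape, frame.dtype)
```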
Some thoughts on minimal criteria for a video reader in data analysis:
For our work, I would say we need a reader that fulfills at least criteria 1 and 2 (if some frames are out of order, that would be acceptable, as long as they are displayed deterministically). For others, all three might be important. Based on experience, opencv can fail on criteria (1) and (2). A superficial inspection of imageio-pyav makes me think it could fail silently on (2), but I'd need to inspect it a bit more. Linking to the relevant sections of the imageio-pyav seek and read code.

Seeking constant-frame-rate videos in imageio-pyav is quite similar to FastVideoReader, except that the latter fails if the resulting frame has a different pts from the one requested. One could turn the exception into a mere warning and return the frame anyway. I suspect this would work similarly to how imageio-pyav works now (except with a warning when things are wrong).

imageio-pyav handles variable frame rate videos in a different way: it just rewinds to the first frame and reads frame by frame until the target. That's obviously very slow for frames that are not at the very beginning. FastVideoReader currently doesn't handle variable frame rate and would likely fail due to a pts mismatch. It would be easy to make it handle this like imageio-pyav does, but I'm inclined to disallow seeking in such cases and just emit a warning that gives instructions on transcoding (basically Talmo's superfast recipe above). Maybe we should do the same if there are b-frames present in the stream. PyAV exposes this information via

The main difference between the two pyav approaches is how the decoded frame is converted into an rgb numpy array. Imageio-pyav seems to use an ffmpeg filter pipeline, while FastVideoReader uses libav's pretty efficient sws_scale (frame_obj.to_ndarray() in pyav). I suspect this is the main reason why their code is slower. I don't quite understand why they chose the less efficient route, except maybe for compatibility across their plugin options.
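For concreteness, here is roughly what the seek-to-keyframe-then-decode-forward strategy with a pts check looks like in PyAV. This is a sketch of the idea being discussed, not the actual FastVideoReader or imageio-pyav code; it assumes a constant frame rate and ignores details like `stream.start_time` offsets. `strict=False` corresponds to the warn-instead-of-raise behaviour mentioned above.

```python
import warnings
import av

def read_frame_pyav(path, frame_idx, strict=True):
    """Sketch: seek to the keyframe at/before the target pts, decode forward,
    and verify the presentation timestamp of the frame we land on."""
    with av.open(path) as container:
        stream = container.streams.video[0]
        # Target pts in stream time_base units, derived from frame index and
        # average frame rate - exactly where variable frame rate breaks things.
        target_pts = int(round(frame_idx / (stream.average_rate * stream.time_base)))
        container.seek(target_pts, stream=stream)  # backward keyframe seek by default
        for frame in container.decode(stream):
            if frame.pts is None or frame.pts < target_pts:
                continue  # still before the requested frame
            if frame.pts != target_pts:
                msg = f"requested pts {target_pts}, got {frame.pts}"
                if strict:
                    raise RuntimeError(msg)
                warnings.warn(msg)
            return frame.to_ndarray(format="rgb24")
    raise EOFError(f"frame {frame_idx} not reached")
```

Whether a pts mismatch should raise, warn, or be silently accepted is exactly the policy question discussed above.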
I've opened an issue here: danionella/napari-pyav#1. It contains a link to the offending video, to help with debugging/testing.
Thanks a lot @talmo and @bjudkewitz for the detailed explanations; they have definitely improved my understanding of the problem. This kind of information should be made available as a DOI-ed public document, to be honest. Lots of people in the field struggle with similar issues. I hope this discussion helps us move towards a functional napari plugin for video playback.
Thanks @talmo and @bjudkewitz for your input! It seems there are solutions for robust and accurate video reading, though currently only in Rust. Given that existing Python solutions are imperfect at best, developing a robust and fast video reading framework in Python would benefit us all. Rerun appears to have found some viable solutions that we might be able to adapt. Regarding napari-video, I suggest a two-phase approach:
I will look into implementing step 1, but things are busy around here atm, so this will probably not happen in the very near future. Any help would be greatly appreciated!
Hi all, in case this is useful to anyone: we've consolidated some much-used video (and hdf5) IO tools under https://github.com/danionella/daio (see
Thanks a lot for making this plugin! I noticed that seeking can be off by a few frames. This becomes apparent when using napari_video for annotation: the annotation layer and the video suddenly get out of sync when you seek to another frame and come back. I assume this is due to a known limitation of OpenCV (opencv/opencv#9053).

It is possible that this is worse for some codecs than others, but it is a problem for our videos. Has this been observed by others? Is there any interest in switching this plugin to another backend, e.g. pyav?
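For anyone who wants to check whether their own files are affected, a quick consistency probe with OpenCV might look like this (a sketch, assuming an h264 .mp4 like ours): read a frame, seek somewhere else, seek back, and compare the pixels. Inconsistent results reproduce the problem described above.

```python
import cv2
import numpy as np

def check_seek_consistency(path, frame_idx, detour_idx=0):
    """Return True if seeking to `frame_idx` gives identical pixels before and
    after a detour seek; False indicates imprecise seeking."""
    cap = cv2.VideoCapture(path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
    ok1, first = cap.read()
    cap.set(cv2.CAP_PROP_POS_FRAMES, detour_idx)   # seek somewhere else...
    cap.read()
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)    # ...and come back
    ok2, second = cap.read()
    cap.release()
    return ok1 and ok2 and np.array_equal(first, second)
```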