
VideoClips Assertion Error #1884

Closed
mjunyent opened this issue Feb 13, 2020 · 20 comments · Fixed by #2201 or #5489

Comments

@mjunyent
Contributor

Hello,

I'm trying to load a big video. Following #1446 I used a VideoClips object, but it crashes when trying to get clips with certain IDs, with this error:

AssertionError Traceback (most recent call last)
in ()
----> 1 x = video_clips.get_clip(1)

/usr/local/lib/python3.6/dist-packages/torchvision/datasets/video_utils.py in get_clip(self, idx)
324 video = video[resampling_idx]
325 info["video_fps"] = self.frame_rate
--> 326 assert len(video) == self.num_frames, "{} x {}".format(video.shape, self.num_frames)
327 return video, audio, info, video_idx

AssertionError: torch.Size([0, 1, 1, 3]) x 32

The code I use is just this:

from torchvision.datasets.video_utils import VideoClips
video_clips = VideoClips(["test_video.mp4"], clip_length_in_frames=32, frames_between_clips=32)
for i in range(video_clips.num_clips()):
    x = video_clips.get_clip(i)

video_clips.num_clips() is much bigger than the IDs that are failing. Changing clip_length_in_frames or frames_between_clips doesn't help.

Checking the code, I see that [0, 1, 1, 3] is returned by read_video when no vframes are read:

if vframes:
    vframes = torch.as_tensor(np.stack(vframes))
else:
    vframes = torch.empty((0, 1, 1, 3), dtype=torch.uint8)

But for some clip IDs and clip lengths the sizes simply don't match, and the assertion error looks like this: AssertionError: torch.Size([19, 360, 640, 3]) x 128

I traced the issue to _read_from_stream and checked that no AV exceptions were raised. Running this part of the function:

for idx, frame in enumerate(container.decode(**stream_name)):
    frames[frame.pts] = frame
    if frame.pts >= end_offset:
        if should_buffer and buffer_count < max_buffer_size:
            buffer_count += 1
            continue
        break

I saw that for start_pts=32032, end_pts=63063 it returned just one frame in frames, with pts=237237, which is later discarded because it's much bigger than end_pts.

Also, the stream.time_base is Fraction(1, 24000), which doesn't match the start and end pts provided by VideoClips.

So it seems there is a problem with seeking in my video, but it has standard h264 encoding and I have no problem reading it sequentially with PyAV.
I'm wondering if I'm doing something wrong or whether there might be an issue with read_video's seeking (as the warning says it should be using seconds?).
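
To illustrate the unit mismatch, this is roughly how seconds map to pts under that time_base (just an illustration using the Fraction(1, 24000) value above, not code from torchvision):

from fractions import Fraction

time_base = Fraction(1, 24000)  # stream.time_base reported by PyAV for this video

def seconds_to_pts(seconds):
    # pts counts ticks of time_base, so 1 second corresponds to 24000 ticks here
    return int(round(seconds / time_base))

def pts_to_seconds(pts):
    return float(pts * time_base)

print(seconds_to_pts(1.0))    # 24000
print(pts_to_seconds(32032))  # ~1.33 s (the start_pts above)
print(pts_to_seconds(63063))  # ~2.63 s (the end_pts above)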

This is the video info according to ffmpeg:

  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2016-10-10T15:36:46.000000Z
  Duration: 00:21:24.37, start: 0.000000, bitrate: 1002 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 900 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059
      encoder         : AVC
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 93 kb/s (default)
    Metadata:
      handler_name    : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059

Thanks!

@bjuncek
Contributor

bjuncek commented Mar 5, 2020

Hello and thank you for the thorough analysis.

This issue looks like it could be caused by a corrupted file, but as you say, the FFmpeg info looks ok.
Have you tried using a different backend ('video_reader' vs 'pyav')? That saved my ass in one case at least.
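
For reference, switching looks roughly like this (a sketch; it only does something useful if the video_reader extension was actually built):

import torchvision

# 'pyav' is the default backend; 'video_reader' uses the compiled C++ decoder
torchvision.set_video_backend("video_reader")
print(torchvision.get_video_backend())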

Best,
Bruno

@fepegar
Contributor

fepegar commented May 6, 2020

I'm having a similar problem:

Traceback (most recent call last):
  File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 99, in <module>
    sample = dataset[i]
  File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 74, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/datasets/video_utils.py", line 367, in get_clip
    video.shape, self.num_frames
AssertionError: torch.Size([6, 128, 228, 3]) x 8

I'm iterating over a Dataset built using VideoClips. The error happens while retrieving sample number 156 out of 174, so it's not the end of the video. For now, I just commented out the assertion, but this way I can't use a DataLoader because the samples will have different sizes.
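
A possible alternative to removing the assertion would be a collate_fn that just drops the short clips, something like this sketch (EXPECTED_FRAMES and the sample layout are assumptions about my dataset, not a general recipe):

from torch.utils.data.dataloader import default_collate

EXPECTED_FRAMES = 8  # the clip_length_in_frames used when building VideoClips

def drop_short_clips_collate(batch):
    # keep only samples whose video tensor has the expected number of frames
    batch = [sample for sample in batch if sample[0].shape[0] == EXPECTED_FRAMES]
    return default_collate(batch)

# loader = DataLoader(dataset, batch_size=4, collate_fn=drop_short_clips_collate)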

@fepegar
Contributor

fepegar commented May 6, 2020

I haven't been able to try with video_reader:

/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/__init__.py:64: UserWarning: video_reader video backend is not available
  warnings.warn("video_reader video backend is not available")

@bjuncek
Contributor

bjuncek commented May 7, 2020

@fmassa this seems similar to a problem I had on my dev machine, which I attributed to the overall messiness of my conda installation: namely, I've had several cases where a standard install would not build video_reader and I'd have to
a) manually install the dependencies, and
b) build torchvision from source.

Note: it often took a few iterations of a) and b) before everything was working properly.

@fepegar can you confirm that this is what's happening?
If so, I'll try to get a clean repro for this and see if I can tackle the build system.

Thanks and best wishes,
Bruno

@fepegar
Contributor

fepegar commented May 7, 2020

@fepegar can you confirm that this is what's happening?

I'm not sure exactly what you'd like me to confirm 😅

I'm on macOS and ran this:

$ conda create -n tv python -y && conda activate tv && pip install torch torchvision
$ python -c "import torchvision; torchvision.set_video_backend('video_reader')"

And got the above message. I'll investigate further. But I feel like this discussion should maybe move to a new issue.

@fepegar
Contributor

fepegar commented May 7, 2020

My value of ext_specs is None here, in case it helps.

ext_specs = extfinder.find_spec("video_reader")

I just tried building from source, but I'm still not able to set the video_reader backend.
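
In case it's useful, this is how I'm inspecting what got built (get_video_backend is public API; _HAS_VIDEO_OPT is an internal flag, so treat that part as a guess that may differ between versions):

import torchvision
import torchvision.io

print(torchvision.get_video_backend())                         # backend currently in use
print(getattr(torchvision.io, "_HAS_VIDEO_OPT", "not found"))  # truthy only if the video_reader extension loaded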

@zhangguanheng66
Contributor

@fmassa do you have an idea about the issue?

@mjunyent
Contributor Author

mjunyent commented May 7, 2020

Hello,

We dug a bit more into this and found that setting should_buffer to True fixes the issue:

should_buffer = False

The problem is in this section that reads the frames:

for idx, frame in enumerate(container.decode(**stream_name)):
    frames[frame.pts] = frame
    if frame.pts >= end_offset:
        if should_buffer and buffer_count < max_buffer_size:
            buffer_count += 1
            continue
        break

PTS values might not be read in order, and this causes the break to happen before all the relevant frames have been read.

For example, in our case end_offset is 15, but first a frame with PTS 15 is received and then one with PTS 14. So we hit the break without reading frame 14 and later crash on the size assertion.

It seems this can happen with AVI videos; this PyAV discussion is relevant: PyAV-Org/PyAV#534. We confirmed we are in a similar case: our AVI video has frames without PTS, since PTS is not strictly required.

Setting should_buffer to True seems like a good solution; is there any reason why it's set to False and not exposed as a parameter? Another solution could be a hard comparison, frame.pts == end_offset. I'm not fully sure that frame always appears, but if end_offset is chosen as in VideoClips (selecting keyframes) it should work too.
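
To make the buffering idea concrete, here is a rough standalone sketch (not the actual torchvision code; read_window and its defaults are made up for illustration) of a decode loop that keeps reading a few extra frames after crossing end_offset, so late-arriving frames with smaller PTS are still captured:

import av  # PyAV

def read_window(path, start_offset, end_offset, max_buffer_size=5):
    # Collect frames whose pts falls in [start_offset, end_offset], tolerating
    # out-of-order pts by buffering a few extra frames after crossing end_offset.
    frames = {}
    buffer_count = 0
    with av.open(path) as container:
        stream = container.streams.video[0]
        container.seek(start_offset, any_frame=False, backward=True, stream=stream)
        for frame in container.decode(video=0):
            if frame.pts is None:
                continue  # some containers (e.g. AVI) may have frames without pts
            frames[frame.pts] = frame
            if frame.pts >= end_offset:
                if buffer_count < max_buffer_size:
                    buffer_count += 1
                    continue
                break
    return [frames[pts] for pts in sorted(frames) if start_offset <= pts <= end_offset]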

@fmassa
Member

fmassa commented May 11, 2020

Hi @mjunyent

Thanks for the investigation!

We could make should_buffer True by default. This would have a small impact on runtime speed, but it might be better to do this in order to avoid those corner-case issues.

The issue I found with empty pts was due to packed b-frames in DivX, but that was the only case I found for this type of video. I agree that the handling for this is very fragile though.

If you could run some performance benchmarks comparing the runtime with should_buffer always set to True against the current default, and the results are not much slower, could you send a PR setting should_buffer to True?
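
Something simple like this would be enough (a sketch; test_video.mp4 is a placeholder, and the comparison would be between the current code and a local edit that hard-codes should_buffer = True in _read_from_stream):

import time
from torchvision.io import read_video

start = time.time()
for s in range(0, 60, 2):
    # decode 2-second windows; run once with the current code and once with
    # should_buffer forced to True, then compare the elapsed times
    read_video("test_video.mp4", start_pts=s, end_pts=s + 2, pts_unit="sec")
print("elapsed: {:.2f}s".format(time.time() - start))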

@fepegar
Contributor

fepegar commented May 14, 2020

Thanks!

Shall I create an issue about video_reader not being available, or do you think it's been fixed by #2183?

@fmassa
Member

fmassa commented May 14, 2020

@fepegar please open a new issue. I hope it was fixed with #2183, so if you could try that first it would be great.

@fepegar
Contributor

fepegar commented Jun 11, 2020

I'm still having this issue (on Linux). I'm using version 0.6.0 and I set should_buffer = True.

My video:

$ ffprobe 006_01_L.mp4                                                                        
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '006_01_L.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf57.83.100
  Duration: 00:01:52.13, start: 0.000000, bitrate: 691 kb/s
    Stream #0:0(und): Video: hevc (Rext) (hev1 / 0x31766568), yuv444p(tv, progressive), 640x360, 558 kb/s, 15 fps, 15 tbr, 15360 tbn, 15 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

@fepegar
Contributor

fepegar commented Jul 22, 2020

Should I open a new issue for this?

@GELIELEO

@fepegar Hi, did you solve this problem?

@GELIELEO

@mjunyent Hi, did you solve this problem?

@mhubii

mhubii commented Nov 16, 2021

I still run into this issue

@fmassa
Member

fmassa commented Nov 18, 2021

This is still an issue that was recently re-introduced in #3791

This is the same problem as #4839 and #4112

Raising the priority to high because it's been broken for several months already

@jramapuram

Still seeing this error.

@prabhat00155
Contributor

Still seeing this error.

@jramapuram Could you please confirm if #5489 fixes your error?

@miltonllera

I am running into a similar issue, where the VideoClips instance returns exactly one more frame than expected (tested with several values).

I am using PyAV as a backend on torch=1.12.1 and torchvision=0.12.0. The dataset is Kinetics, downloaded from the S3 bucket referenced in the Kinetics dataset class.

I have no idea how to solve this, or whether it's even a problem. I could just drop the last frame, but that doesn't seem like what I should do.
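
For reference, the drop-the-extra-frame workaround I mean would look roughly like this (a sketch; clip_length_in_frames stands for whatever I passed when constructing the dataset, and get_fixed_length_clip is just a name for the example):

def get_fixed_length_clip(video_clips, idx, clip_length_in_frames):
    # video_clips is the torchvision VideoClips instance used by the dataset
    video, audio, info, video_idx = video_clips.get_clip(idx)
    if video.shape[0] > clip_length_in_frames:
        # drop the extra trailing frame(s) so every sample has the same temporal size
        video = video[:clip_length_in_frames]
    return video, audio, info, video_idx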
