
VideoClips Assertion Error #1884

Closed
mjunyent opened this issue Feb 13, 2020 · 20 comments · Fixed by #2201 or #5489

Comments

@mjunyent
Contributor

Hello,

I'm trying to load a big video. Following #1446 I used a VideoClips object, but it crashes when trying to get clips with certain IDs, with this error:

AssertionError Traceback (most recent call last)
in ()
----> 1 x = video_clips.get_clip(1)

/usr/local/lib/python3.6/dist-packages/torchvision/datasets/video_utils.py in get_clip(self, idx)
324 video = video[resampling_idx]
325 info["video_fps"] = self.frame_rate
--> 326 assert len(video) == self.num_frames, "{} x {}".format(video.shape, self.num_frames)
327 return video, audio, info, video_idx

AssertionError: torch.Size([0, 1, 1, 3]) x 32

The code I use is just this:

from torchvision.datasets.video_utils import VideoClips
video_clips = VideoClips(["test_video.mp4"], clip_length_in_frames=32, frames_between_clips=32)
for i in range(video_clips.num_clips()):
    x = video_clips.get_clip(i)

video_clips.num_clips() is much bigger than the IDs that are failing. Changing clip_length_in_frames or frames_between_clips doesn't help.

Checking the code, I see that [0, 1, 1, 3] is returned by read_video when no vframes are read:

if vframes:
    vframes = torch.as_tensor(np.stack(vframes))
else:
    vframes = torch.empty((0, 1, 1, 3), dtype=torch.uint8)

But for some clip IDs and clip lengths the sizes simply don't match, and the assertion error looks like this: AssertionError: torch.Size([19, 360, 640, 3]) x 128

I traced the issue to _read_from_stream and checked that no AV exceptions were raised. Running this part of the function:

for idx, frame in enumerate(container.decode(**stream_name)):
    frames[frame.pts] = frame
    if frame.pts >= end_offset:
        if should_buffer and buffer_count < max_buffer_size:
            buffer_count += 1
            continue
        break

I saw that for start_pts=32032, end_pts=63063 it returned just one frame in frames, with pts=237237, which is later discarded because it's much bigger than end_pts.

Also, the stream.time_base is Fraction(1, 24000), which doesn't match the start and end pts provided by VideoClips.

So it seems there is a problem with seeking in my video, but it has standard h264 encoding and I have no problem reading it sequentially with PyAV.
I'm wondering if I'm doing something wrong or whether there might be an issue with read_video's seeking (as the warning says it should be using seconds?).
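
To illustrate the unit mismatch, this is roughly how seconds map to pts under that time_base (just an illustration using the Fraction(1, 24000) value above, not code from torchvision):

from fractions import Fraction

time_base = Fraction(1, 24000)  # stream.time_base reported by PyAV for this video

def seconds_to_pts(seconds):
    # pts counts ticks of time_base, so 1 second corresponds to 24000 ticks here
    return int(round(seconds / time_base))

def pts_to_seconds(pts):
    return float(pts * time_base)

print(seconds_to_pts(1.0))    # 24000
print(pts_to_seconds(32032))  # ~1.33 s (the start_pts above)
print(pts_to_seconds(63063))  # ~2.63 s (the end_pts above)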

This is the video info according to ffmpeg:

  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2016-10-10T15:36:46.000000Z
  Duration: 00:21:24.37, start: 0.000000, bitrate: 1002 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 900 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059
      encoder         : AVC
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 93 kb/s (default)
    Metadata:
      handler_name    : Telestream Inc. Telestream Media Framework - Release TXGP 2016.42.192059

Thanks!

@bjuncek
Contributor

bjuncek commented Mar 5, 2020

Hello and thank you for the thorough analysis.

This issue looks like it could be caused by a corrupted file, but as you say, the FFmpeg info looks ok.
Have you tried using a different backend ('video_reader' vs 'pyav')? That saved my ass in one case at least.
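
For reference, switching looks roughly like this (a sketch; it only does something useful if the video_reader extension was actually built):

import torchvision

# 'pyav' is the default backend; 'video_reader' uses the compiled C++ decoder
torchvision.set_video_backend("video_reader")
print(torchvision.get_video_backend())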

Best,
Bruno

@fepegar
Contributor

fepegar commented May 6, 2020

I'm having a similar problem:

Traceback (most recent call last):
  File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 99, in <module>
    sample = dataset[i]
  File "/Users/fernando/git/sudep/scripts/infer_video_kinetics.py", line 74, in __getitem__
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
  File "/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/datasets/video_utils.py", line 367, in get_clip
    video.shape, self.num_frames
AssertionError: torch.Size([6, 128, 228, 3]) x 8

I'm iterating over a Dataset built using VideoClips. The error happens while retrieving sample number 156 out of 174, so it's not the end of the video. For now, I just commented out the assertion, but this way I can't use a DataLoader because the samples will have different sizes.
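
A possible alternative to removing the assertion would be a collate_fn that just drops the short clips, something like this sketch (EXPECTED_FRAMES and the sample layout are assumptions about my dataset, not a general recipe):

from torch.utils.data.dataloader import default_collate

EXPECTED_FRAMES = 8  # the clip_length_in_frames used when building VideoClips

def drop_short_clips_collate(batch):
    # keep only samples whose video tensor has the expected number of frames
    batch = [sample for sample in batch if sample[0].shape[0] == EXPECTED_FRAMES]
    return default_collate(batch)

# loader = DataLoader(dataset, batch_size=4, collate_fn=drop_short_clips_collate)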

@fepegar
Contributor

fepegar commented May 6, 2020

I haven't been able to try with video_reader:

/usr/local/Caskroom/miniconda/base/envs/sudep/lib/python3.6/site-packages/torchvision/__init__.py:64: UserWarning: video_reader video backend is not available
  warnings.warn("video_reader video backend is not available")

@bjuncek
Contributor

bjuncek commented May 7, 2020

@fmassa this seems similar to a problem I had on my dev machine, which I attributed to the overall messiness of my conda installation: namely, I've had several cases where a standard install would not build video_reader and I'd have to
a) manually install the dependencies, and
b) build torchvision from source.

Note: it often took a few iterations of a) and b) before everything was working properly.

@fepegar can you confirm that this is what's happening?
If so, I'll try to get a clean repro for this and see if I can tackle the build system.

Thanks and best wishes,
Bruno

@fepegar
Contributor

fepegar commented May 7, 2020

@fepegar can you confirm that this is what's happening?

I'm not sure exactly what you'd like me to confirm 😅

I'm on macOS and ran this:

$ conda create -n tv python -y && conda activate tv && pip install torch torchvision
$ python -c "import torchvision; torchvision.set_video_backend('video_reader')"

And got the above message. I'll investigate further. But I feel like this discussion should maybe move to a new issue.

@fepegar
Contributor

fepegar commented May 7, 2020

My value of ext_specs is None here, in case it helps.

ext_specs = extfinder.find_spec("video_reader")

I just tried building from source, but I'm still not able to set the video_reader backend.
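
In case it's useful, this is how I'm inspecting what got built (get_video_backend is public API; _HAS_VIDEO_OPT is an internal flag, so treat that part as a guess that may differ between versions):

import torchvision
import torchvision.io

print(torchvision.get_video_backend())                         # backend currently in use
print(getattr(torchvision.io, "_HAS_VIDEO_OPT", "not found"))  # truthy only if the video_reader extension loaded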

@zhangguanheng66
Contributor

@fmassa do you have an idea about the issue?

@mjunyent
Contributor Author

mjunyent commented May 7, 2020

Hello,

We dug a bit more into this and found that setting should_buffer to True fixes the issue:

should_buffer = False

The problem is in this section that reads the frames:

for idx, frame in enumerate(container.decode(**stream_name)):
    frames[frame.pts] = frame
    if frame.pts >= end_offset:
        if should_buffer and buffer_count < max_buffer_size:
            buffer_count += 1
            continue
        break

PTS values might not be read in order, and this causes the break to happen before all the relevant frames have been read.

For example, in our case end_offset is 15, but first a frame with PTS 15 is received and then one with PTS 14. So we hit the break without reading frame 14 and later crash on the size assertion.

It seems this can happen with AVI videos; this PyAV discussion is relevant: PyAV-Org/PyAV#534. We confirmed we are in a similar case: our AVI video has frames without PTS, since PTS is not strictly required.

Setting should_buffer to True seems like a good solution; is there any reason why it's set to False and not exposed as a parameter? Another solution could be a hard comparison, frame.pts == end_offset. I'm not fully sure that frame always appears, but if end_offset is chosen as in VideoClips (selecting keyframes) it should work too.
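
To make the buffering idea concrete, here is a rough standalone sketch (not the actual torchvision code; read_window and its defaults are made up for illustration) of a decode loop that keeps reading a few extra frames after crossing end_offset, so late-arriving frames with smaller PTS are still captured:

import av  # PyAV

def read_window(path, start_offset, end_offset, max_buffer_size=5):
    # Collect frames whose pts falls in [start_offset, end_offset], tolerating
    # out-of-order pts by buffering a few extra frames after crossing end_offset.
    frames = {}
    buffer_count = 0
    with av.open(path) as container:
        stream = container.streams.video[0]
        container.seek(start_offset, any_frame=False, backward=True, stream=stream)
        for frame in container.decode(video=0):
            if frame.pts is None:
                continue  # some containers (e.g. AVI) may have frames without pts
            frames[frame.pts] = frame
            if frame.pts >= end_offset:
                if buffer_count < max_buffer_size:
                    buffer_count += 1
                    continue
                break
    return [frames[pts] for pts in sorted(frames) if start_offset <= pts <= end_offset]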

@fmassa
Member

fmassa commented May 11, 2020

Hi @mjunyent

Thanks for the investigation!

We could make should_buffer True by default. This would have a small impact on runtime speed, but it might be better to do this in order to avoid those corner-case issues.

The issue I found with empty pts was due to packed b-frames in DivX, but that was the only case I found for this type of video. I agree that the handling for this is very fragile though.

If you could run some performance benchmarks comparing the runtime with should_buffer always set to True against the current default, and the results are not much slower, could you send a PR setting should_buffer to True?
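
Something simple like this would be enough (a sketch; test_video.mp4 is a placeholder, and the comparison would be between the current code and a local edit that hard-codes should_buffer = True in _read_from_stream):

import time
from torchvision.io import read_video

start = time.time()
for s in range(0, 60, 2):
    # decode 2-second windows; run once with the current code and once with
    # should_buffer forced to True, then compare the elapsed times
    read_video("test_video.mp4", start_pts=s, end_pts=s + 2, pts_unit="sec")
print("elapsed: {:.2f}s".format(time.time() - start))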

@fepegar
Contributor

fepegar commented May 14, 2020

Thanks!

Shall I create an issue about video_reader not being available, or do you think it's been fixed by #2183?

@fmassa
Member

fmassa commented May 14, 2020

@fepegar please open a new issue. I hope it was fixed with #2183, so if you could try that first it would be great.

@fepegar
Contributor

fepegar commented Jun 11, 2020

I'm still having this issue (on Linux). I'm using version 0.6.0 and I set should_buffer = True.

My video:

$ ffprobe 006_01_L.mp4                                                                        
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '006_01_L.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf57.83.100
  Duration: 00:01:52.13, start: 0.000000, bitrate: 691 kb/s
    Stream #0:0(und): Video: hevc (Rext) (hev1 / 0x31766568), yuv444p(tv, progressive), 640x360, 558 kb/s, 15 fps, 15 tbr, 15360 tbn, 15 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

@fepegar
Contributor

fepegar commented Jul 22, 2020

Should I open a new issue for this?

@GELIELEO

@fepegar Hi, did you solve this problem?

@GELIELEO

@mjunyent Hi, did you solve this problem?

@mhubii

mhubii commented Nov 16, 2021

I still run into this issue

@fmassa
Member

fmassa commented Nov 18, 2021

This is still an issue that was recently re-introduced in #3791

This is the same problem as #4839 and #4112

Raising the priority to high because it's been broken for several months already

@jramapuram

Still seeing this error.

@prabhat00155
Contributor

Still seeing this error.

@jramapuram Could you please confirm if #5489 fixes your error?

@miltonllera

I am running into a similar issue, where the VideoClips instance returns exactly one more frame than expected (tested with several values).

I am using PyAV as a backend on torch=1.12.1 and torchvision=0.12.0. The dataset is Kinetics, downloaded from the S3 bucket referenced in the Kinetics dataset class.

I have no idea how to solve this, or whether it's even a problem. I could just drop the last frame, but that doesn't seem like what I should do.
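
For reference, the drop-the-extra-frame workaround I mean would look roughly like this (a sketch; clip_length_in_frames stands for whatever I passed when constructing the dataset, and get_fixed_length_clip is just a name for the example):

def get_fixed_length_clip(video_clips, idx, clip_length_in_frames):
    # video_clips is the torchvision VideoClips instance used by the dataset
    video, audio, info, video_idx = video_clips.get_clip(idx)
    if video.shape[0] > clip_length_in_frames:
        # drop the extra trailing frame(s) so every sample has the same temporal size
        video = video[:clip_length_in_frames]
    return video, audio, info, video_idx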
