Latest PyAV version breaks torchvision.io.write_video when writing video with audio #6814

nateraw · 2022-10-22T03:44:13Z

🐛 Describe the bug

Description

It seems the latest PyPI release of PyAV (av==10.0.0) breaks torchvision.io.write_video when writing a video that contains audio. If you call write_video without an audio array, it still seems to work fine; it only breaks when an audio array is provided.

Reproducible Example

Here is a colab notebook reproducing the issue:
Here is that same notebook as GitHub gist: https://gist.github.com/nateraw/1123039fc90cefd90d282cf6c297d1d2

Stack Trace

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-28056a084160> in <module>
      6     audio_fps=44100,
      7     audio_codec='aac',
----> 8     audio_options=None
      9 )

/usr/local/lib/python3.7/dist-packages/torchvision/io/video.py in write_video(filename, video_array, fps, video_codec, options, audio_array, audio_fps, audio_codec, audio_options)
    114             num_channels = audio_array.shape[0]
    115             audio_layout = "stereo" if num_channels > 1 else "mono"
--> 116             audio_sample_fmt = container.streams.audio[0].format.name
    117 
    118             format_dtype = np.dtype(audio_format_dtypes[audio_sample_fmt])

IndexError: tuple index out of range

Temporary Solution

Downgrading to av==9.2.0 fixed the issue for me.

Versions

PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.22.6
Libc version: glibc-2.26

Python version: 3.7.15 (default, Oct 12 2022, 19:14:55)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: False
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.6
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.13.1
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect


pmeier · 2022-10-24T05:36:05Z

Duplicate of #6790. Keeping this open for now, because it includes reproduction.

YosuaMichael · 2022-10-24T15:58:59Z

Thanks for the report and the reproduction @nateraw!
Looking into the problem, I debugged a bit and could fix this particular error by replacing this line with:

# old: audio_sample_fmt = container.streams.audio[0].format.name
audio_sample_fmt = container.streams[1].format.name

After inspecting the resulting container, I noticed that there seems to be a problem with the stream type when adding streams, since all the streams end up under other (which is unexpected):

print(container.streams.audio)
# ()

print(container.streams.video)
# ()

print(container.streams.other)
# (<av.Stream #0 video/libx264 at 0x12ad48a40>, <av.Stream #1 audio/aac at 0x12ad49800>)

Note that in av==9.2 we got the following output:

print(container.streams.audio)
# (<av.AudioStream #1 aac at 44100Hz, stereo, fltp at 0x149cd2020>)

print(container.streams.video)
# (<av.VideoStream #0 libx264, yuv420p 320x240 at 0x149cd1fc0>,)

print(container.streams.other)
# ()

I think this is a bug in how PyAV classifies the stream type, but I am not entirely sure. I will try opening an issue on PyAV for this.
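As a rough sketch of a lookup that does not depend on how PyAV buckets the streams, one could select the stream by its type instead of going through container.streams.audio. The helper name `find_stream_by_type` is hypothetical (it is not part of torchvision or PyAV), and it assumes each stream exposes a `type` string such as "audio" or "video", as the video/libx264 and audio/aac reprs above suggest. The stand-in objects below only mimic the av==10.0.0 output, so this runs without PyAV installed.

```python
from types import SimpleNamespace


def find_stream_by_type(streams, kind):
    """Return the first stream whose ``type`` equals ``kind``, or None.

    Hypothetical helper: it only iterates the streams and reads a ``type``
    attribute, so it does not matter whether a stream was filed under
    ``.audio``, ``.video`` or ``.other``.
    """
    for stream in streams:
        if getattr(stream, "type", None) == kind:
            return stream
    return None


# Stand-in objects mimicking the av==10.0.0 output container above,
# where both streams ended up under ``other``.
fake_streams = [
    SimpleNamespace(index=0, type="video", name="libx264"),
    SimpleNamespace(index=1, type="audio", name="aac"),
]

audio_stream = find_stream_by_type(fake_streams, "audio")
print(audio_stream.index, audio_stream.name)
```

With a real container, the same call would presumably be `find_stream_by_type(container.streams, "audio")`, but that has not been verified here against av==10.0.0.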

cc @bjuncek @jdsgomes, who have more insight into the video decoder.

YosuaMichael · 2023-01-31T13:22:26Z

An update on this issue: it seems to have been fixed in PyAV in this PR. See this issue for more detail.

nateraw · 2023-01-31T18:53:36Z

Nice!! thanks for the update @YosuaMichael :)

So after they make a release, what will be done here? I noticed the lines that check the installed version simply link to GitHub. We will want to warn not to install 10.0.0, right?

Lines I'm referring to:

vision/torchvision/io/video.py

Lines 15 to 36 in 7cf0f4c

    
try:
    import av

    av.logging.set_level(av.logging.ERROR)
    if not hasattr(av.video.frame.VideoFrame, "pict_type"):
        av = ImportError(
            """\
Your version of PyAV is too old for the necessary video operations in torchvision.
If you are on Python 3.5, you will have to build from source (the conda-forge
packages are not up-to-date).  See
https://github.com/mikeboers/PyAV#installation for instructions on how to
install PyAV on your system.
"""
        )
except ImportError:
    av = ImportError(
        """\
PyAV is not installed, and is necessary for the video operations in torchvision.
See https://github.com/mikeboers/PyAV#installation for instructions on how to
install PyAV on your system.
"""
    )

pmeier · 2023-02-01T07:47:47Z

Yeah, it seems we could simply add a version check for av==10.0.0 there. Not sure if we want to error out or just raise a warning.
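To make the two options concrete, here is a minimal sketch of such a check. The names `BROKEN_PYAV_VERSIONS` and `check_pyav_version` are hypothetical, and the assumption that only 10.0.0 is affected comes from the reports in this thread rather than from testing other releases. The `strict` flag switches between warning and erroring out.

```python
import warnings

# Assumption: 10.0.0 is the only release known (from this thread) to be affected.
BROKEN_PYAV_VERSIONS = {"10.0.0"}


def check_pyav_version(version, strict=False):
    """Return True if ``version`` is usable, False if it is known to be broken.

    For a known-broken version this warns by default, or raises ImportError
    when ``strict`` is True.
    """
    if version not in BROKEN_PYAV_VERSIONS:
        return True
    message = (
        f"av=={version} is known to break torchvision.io.write_video when an "
        "audio array is provided. Downgrading to av==9.2.0 has been reported "
        "to work."
    )
    if strict:
        raise ImportError(message)
    warnings.warn(message, RuntimeWarning)
    return False


print(check_pyav_version("9.2.0"))
```

In video.py this would presumably be called with `av.__version__` right after the import succeeds; whether to pass `strict=True` is exactly the open question above.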

bjuncek · 2023-02-28T16:03:11Z

Update: for now, the pypi/conda-forge pyav wheels still contain this bug.
I'll revise the setup to avoid 10.0.0 until further notice.

- check whether the PR fixes the CI build and solves this issue
- close when a new PyAV is released
- remove the pin

pmeier assigned bjuncek Oct 24, 2022

pmeier added module: video dependency issue duplicate labels Oct 24, 2022

pmeier mentioned this issue Oct 24, 2022

av==10.0.0 breaks CI #6790

Closed

YosuaMichael mentioned this issue Oct 24, 2022

[av v10.0] The output container does not put audio stream under audio but under other instead PyAV-Org/PyAV#1044

Closed

6 tasks

pmeier mentioned this issue Feb 3, 2023

Add Python 3.11 Linux CPU Unittesting #7155

Closed

metal3d mentioned this issue Apr 21, 2023

container.streams.audio[0].format.name is empty #7534

Closed
