Fixed missing audio with video_reader backend #3934

Merged
prabhat00155 merged 6 commits into pytorch:master from prabhat00155/fix_audio
Jun 15, 2021

prabhat00155

Resolves #3890.

from torchvision import get_video_backend, set_video_backend
from torchvision.io import read_video

video_path = "data/WUzgd7C1pWA.mp4"
set_video_backend('video_reader')
print(f'set backend: {get_video_backend()}')

visual, audio, info = read_video(video_path, pts_unit='pts')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)

visual, audio, info = read_video(video_path, pts_unit='sec')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)
---
Visual: torch.Size([327, 256, 340, 3]) Audio: torch.Size([523264, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}
Visual: torch.Size([327, 256, 340, 3]) Audio: torch.Size([523264, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}

video_path = "data/WUzgd7C1pWA.mp4"
set_video_backend('video_reader')
print(f'set backend: {get_video_backend()}')

visual, audio, info = read_video(video_path, start_pts=1001, pts_unit='pts')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)

visual, audio, info = read_video(video_path, start_pts=0.0333667, pts_unit='sec')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)
---
set backend: video_reader
Visual: torch.Size([326, 256, 340, 3]) Audio: torch.Size([521216, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}
Visual: torch.Size([326, 256, 340, 3]) Audio: torch.Size([521216, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}


video_path = "data/WUzgd7C1pWA.mp4"
set_video_backend('video_reader')
print(f'set backend: {get_video_backend()}')

visual, audio, info = read_video(video_path, start_pts=1001, end_pts=2002, pts_unit='pts')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)

visual, audio, info = read_video(video_path, start_pts=0.0333667, end_pts=0.1001000, pts_unit='sec')
print('Visual:', visual.shape, 'Audio:', audio.shape, info)
---
set backend: video_reader
Visual: torch.Size([2, 256, 340, 3]) Audio: torch.Size([2048, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}
Visual: torch.Size([3, 256, 340, 3]) Audio: torch.Size([3072, 1]) {'video_fps': 29.970029830932617, 'audio_fps': 48000.0}
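
The sec-based call above does not cover the same range as the pts-based one, which may explain the extra video frame and the extra 1024 audio samples. A minimal sketch of the conversion, assuming a time base of 1/30000 s per pts tick (inferred from 1001 pts ≈ 0.0333667 s in the calls above, not stated anywhere in this thread). `TICKS_PER_SECOND`, `pts_to_sec` and `sec_to_pts` are hypothetical helper names, not torchvision APIs:

```python
# Assumption: 30000 pts ticks per second, inferred from 1001 pts ~ 0.0333667 s.
TICKS_PER_SECOND = 30000


def pts_to_sec(pts, ticks_per_second=TICKS_PER_SECOND):
    """Convert a presentation timestamp in ticks to seconds."""
    return pts / ticks_per_second


def sec_to_pts(sec, ticks_per_second=TICKS_PER_SECOND):
    """Convert seconds to the nearest presentation timestamp in ticks."""
    return round(sec * ticks_per_second)


if __name__ == "__main__":
    print(sec_to_pts(0.0333667))  # start of both calls: 1001
    print(sec_to_pts(0.1001000))  # end of the sec-based call: 3003, not 2002
    print(pts_to_sec(2002))       # end of the pts-based call, in seconds
```

Under that assumption, `end_pts=2002` would correspond to roughly 0.0667333 s, so the two calls would only match if the sec-based call used that value as its end.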


@datumbox datumbox left a comment

The changes look good. Do you think we should add a unit test for this case?

@prabhat00155
Yeah, that makes sense. I added a unit test.

@datumbox datumbox self-requested a review May 27, 2021 18:24

@datumbox datumbox left a comment

The failing test seems related. Marking this as "changes requested" to avoid an accidental merge. Let's check it out next week. :)

datumbox commented Jun 11, 2021

Following the reported internal issue, I took this patch and applied it to the latest master:

$ git diff
diff --git a/torchvision/io/_video_opt.py b/torchvision/io/_video_opt.py
index a34b023b..e92ac1bd 100644
--- a/torchvision/io/_video_opt.py
+++ b/torchvision/io/_video_opt.py
@@ -155,7 +155,7 @@ def _align_audio_frames(aframes, aframe_pts, audio_pts_range):
     e_idx = num_samples
     if start < audio_pts_range[0]:
         s_idx = int((audio_pts_range[0] - start) / step_per_aframe)
-    if end > audio_pts_range[1]:
+    if audio_pts_range[1] != -1 and end > audio_pts_range[1]:
         e_idx = int((audio_pts_range[1] - end) / step_per_aframe)
     return aframes[s_idx:e_idx, :]
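
To see why the added `!= -1` check matters, here is a list-based sketch of the trimming logic. It is not the torchvision function itself: the name `align_audio_frames`, the `guard_open_end` switch, and the way `start`, `end` and `step_per_aframe` are computed are assumptions made for illustration, since the diff shows only the lines from `e_idx = num_samples` onward. It also assumes that an end of -1 means "read to the end of the stream", which is what the added check suggests.

```python
def align_audio_frames(aframes, aframe_pts, audio_pts_range, guard_open_end=True):
    """Trim decoded audio samples to the requested pts range (list-based stand-in).

    guard_open_end=False mimics the condition before the patch,
    guard_open_end=True mimics the condition after it.
    """
    # Assumed setup: first and last pts of the decoded audio, and pts per sample.
    start, end = aframe_pts[0], aframe_pts[-1]
    num_samples = len(aframes)
    step_per_aframe = float(end - start + 1) / float(num_samples)

    s_idx = 0
    e_idx = num_samples
    if start < audio_pts_range[0]:
        s_idx = int((audio_pts_range[0] - start) / step_per_aframe)

    # The patched condition skips end trimming when the range end is -1.
    end_is_bounded = audio_pts_range[1] != -1 if guard_open_end else True
    if end_is_bounded and end > audio_pts_range[1]:
        # Negative index, counted back from the last decoded sample.
        e_idx = int((audio_pts_range[1] - end) / step_per_aframe)
    return aframes[s_idx:e_idx]


samples = list(range(1000))  # 1000 fake audio samples
pts = [0, 999]               # one pts tick per sample, for simplicity

before = align_audio_frames(samples, pts, (0, -1), guard_open_end=False)
after = align_audio_frames(samples, pts, (0, -1), guard_open_end=True)
print(len(before), len(after))
```

With an open-ended range, the unguarded version computes `e_idx = int((-1 - 999) / 1.0) = -1000`, so the slice is empty. The guarded version leaves `e_idx` at `num_samples` and keeps every sample.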

Then I opened the same file as in the bug report:

>>> from torchvision.io import read_video
>>> vid_path = "./---0tKA3iYI.mp4"
>>> vid, aud, meta = read_video(vid_path, 1, 2, pts_unit="sec")
>>> vid.shape
torch.Size([31, 360, 204, 3])
>>> aud.shape
torch.Size([1, 0])

Not sure if it's the same issue or a different one. Thoughts?

Edit:

The problem persists even after setting the backend. It might be unrelated to the fixes in this PR and be a separate problem instead:

>>> import torchvision
>>> torchvision.set_video_backend('video_reader')
>>> from torchvision.io import read_video
>>> vid_path = "./---0tKA3iYI.mp4"
>>> vid, aud, meta = read_video(vid_path, 1, 2, pts_unit="sec")
>>> vid.shape, aud.shape
(torch.Size([31, 360, 204, 3]), torch.Size([1, 0]))

prabhat00155 commented Jun 11, 2021

Sorry, the current PR only fixes missing audio with the video_reader backend, which has to be selected explicitly:

set_video_backend('video_reader')

By default we use the pyav backend, which also returns empty audio frames (see #3779). I'll fix that in a different PR.

@datumbox datumbox self-requested a review June 11, 2021 17:03
datumbox previously approved these changes Jun 11, 2021

@datumbox datumbox left a comment

@prabhat00155 thanks for the PR, LGTM.

Edit:
Unfortunately, I see a segmentation fault that looks related. See unittest_linux_cpu_py3.9.

@datumbox datumbox dismissed their stale review June 11, 2021 17:06

test failed

@prabhat00155 prabhat00155 requested a review from datumbox June 12, 2021 09:27
prabhat00155 commented Jun 12, 2021

@datumbox The segfault was caused by an incompatible ffmpeg version, which was fixed in #4041. It should be fine now.


@datumbox datumbox left a comment

LGTM, thanks!

@prabhat00155 prabhat00155 merged commit b74366b into pytorch:master Jun 15, 2021
@prabhat00155 prabhat00155 deleted the prabhat00155/fix_audio branch June 15, 2021 10:32
facebook-github-bot pushed a commit that referenced this pull request Jun 21, 2021
Summary:
* Fixed missing audio with video_reader backend

* Added unit test

Reviewed By: fmassa

Differential Revision: D29264318

fbshipit-source-id: de95e0bd38d2f844c756652fe42de99b1ab32210
Successfully merging this pull request may close these issues.

Audio missing when using read_video() with video_reader backend
3 participants