Add file-like object support to Streaming API (#2400)
Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally, `seek(self, offset, whence)` is used if it is available.
- Without a `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to specify which decoder to use.
  - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, mirroring `add_[audio|video]_stream`.
  - The order of the arguments was changed so that the arguments common to audio and video come before the rest.
  - The `dtype` and `format` arguments were also changed to make them consistent across the audio and video methods.
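As a rough illustration of the `read(self, n)` contract, here is a minimal sketch of a file-like object that implements only `read`. It adapts an iterable of byte chunks (for example, pieces of an HTTP response body) and deliberately has no `seek`, so it falls into the non-seekable case described above. The class name `IterReader` is hypothetical and not part of torchaudio; its `read` follows the `io.RawIOBase.read` convention of returning at most `n` bytes and `b""` at end of stream.

```python
from typing import Iterable, Iterator


class IterReader:
    """Hypothetical adapter exposing an iterable of byte chunks via read(n).

    It intentionally defines no ``seek`` method, so a consumer that only
    uses ``seek`` when present would treat it as non-seekable.
    """

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks: Iterator[bytes] = iter(chunks)
        self._buffer = b""

    def read(self, n: int = -1) -> bytes:
        # A negative size means "read everything that is left".
        if n is None or n < 0:
            data = self._buffer + b"".join(self._chunks)
            self._buffer = b""
            return data
        # Pull chunks until n bytes are buffered or the source is exhausted.
        while len(self._buffer) < n:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            self._buffer += chunk
        # Return at most n bytes; an empty result signals end of stream.
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data
```

Assuming the behavior described in this commit, such an object could be passed as `StreamReader(src=IterReader(chunks))`, with `decoder` supplied to `add_basic_audio_stream` / `add_basic_video_stream` when the format cannot be detected without `seek`.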

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input `src` is a string, it is passed to the implementation bound with TorchBind;
if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.
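The dispatch rule above can be sketched in a few lines of Python. `select_binding` is a hypothetical helper that only returns a label naming the binding that would receive `src`; the real code calls into the TorchBind or PyBind11 bindings instead, and raising `TypeError` for other inputs is an assumption made for the sketch.

```python
from typing import Any


def select_binding(src: Any) -> str:
    """Hypothetical helper mirroring the src-type dispatch described above."""
    if isinstance(src, str):
        # Paths and URLs go to the TorchBind-bound implementation.
        return "torchbind"
    if hasattr(src, "read"):
        # Anything exposing ``read`` is treated as a file-like object
        # and goes to the same implementation bound via PyBind11.
        return "pybind11"
    raise TypeError(f"Unsupported source type: {type(src).__name__}")
```

Checking for `str` first keeps the rule unambiguous, and duck typing on `read` means any object with that method qualifies, without requiring it to subclass `io.IOBase`.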

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to #2402
  - Some of the implementation in the original TorchBind surface layer was converted into a Wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
  - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they only constructed a string and passed it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum: all `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. Since these types work fine with PyBind11, the Streamer core methods now deal with them directly.
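To illustrate why building that string is simple on the Python side, here is a sketch that chains standard FFmpeg filters (`aresample`, `aformat`, `fps`, `scale`, `format`) from `add_basic_*`-style parameters. The function names are hypothetical and the exact strings torchaudio produces may differ; the values in the usage check mirror the tutorial (8 kHz audio, 320x320 `gray` frames at 3 FPS, `rgb24`).

```python
from typing import List, Optional


def audio_filter_desc(
    sample_rate: Optional[int] = None, fmt: Optional[str] = None
) -> Optional[str]:
    """Hypothetical helper: build an FFmpeg audio filter chain, or None."""
    descs: List[str] = []
    if sample_rate is not None:
        descs.append(f"aresample={sample_rate}")  # resample on the fly
    if fmt is not None:
        descs.append(f"aformat=sample_fmts={fmt}")  # convert sample format
    # Filters are chained with commas; no filters means no filter graph.
    return ",".join(descs) or None


def video_filter_desc(
    frame_rate: Optional[float] = None,
    width: Optional[int] = None,
    height: Optional[int] = None,
    fmt: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: build an FFmpeg video filter chain, or None."""
    descs: List[str] = []
    if frame_rate is not None:
        descs.append(f"fps={frame_rate}")  # drop/duplicate frames
    scale_opts: List[str] = []
    if width is not None:
        scale_opts.append(f"width={width}")
    if height is not None:
        scale_opts.append(f"height={height}")
    if scale_opts:
        descs.append("scale=" + ":".join(scale_opts))  # resize frames
    if fmt is not None:
        descs.append(f"format=pix_fmts={fmt}")  # e.g. rgb24, gray
    return ",".join(descs) or None
```

For example, `video_filter_desc(frame_rate=3, width=320, height=320, fmt="gray")` yields `"fps=3,scale=width=320:height=320,format=pix_fmts=gray"`, which could then be handed to the generic `add_video_stream` as its filter description.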

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: #2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: 271c86a09bdddb1c66c19ce5586be663cb1f7725
mthrok authored and facebook-github-bot committed May 21, 2022
1 parent 6776299 commit aa91aa0
Showing 14 changed files with 763 additions and 359 deletions.
163 changes: 122 additions & 41 deletions examples/tutorials/streaming_api_tutorial.py
@@ -33,6 +33,7 @@
# It can
# - Load audio/video in variety of formats
# - Load audio/video from local/remote source
# - Load audio/video from file-like object
# - Load audio/video from microphone, camera and screen
# - Generate synthetic audio/video signals.
# - Load audio/video chunk by chunk
@@ -51,7 +52,7 @@
# `<some media source> -> <optional processing> -> <tensor>`
#
# If you have other forms that can be useful to your usecases,
# (such as integration with `torch.Tensor` type and file-like objects)
# (such as integration with `torch.Tensor` type)
# please file a feature request.
#

@@ -60,11 +61,15 @@
# --------------
#

import IPython
import matplotlib.pyplot as plt
import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

######################################################################
#

try:
    from torchaudio.io import StreamReader
except ModuleNotFoundError:
@@ -87,8 +92,8 @@
        pass
    raise

print(torch.__version__)
print(torchaudio.__version__)
import IPython
import matplotlib.pyplot as plt

base_url = "https://download.pytorch.org/torchaudio/tutorial-assets"
AUDIO_URL = f"{base_url}/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
@@ -102,16 +107,26 @@
# handle. Whichever source is used, the remaining processes
# (configuring the output, applying preprocessing) are same.
#
# 1. Common media formats
# 1. Common media formats (resource indicator of string type or file-like object)
# 2. Audio / Video devices
# 3. Synthetic audio / video sources
#
# The following section covers how to open common media formats.
# For the other streams, please refer to the
# `Advanced I/O streams` section.
#
# .. note::
#
#    The coverage of the supported media (such as containers, codecs and protocols)
#    depend on the FFmpeg libraries found in the system.
#
#    If `StreamReader` raises an error opening a source, please check
#    that `ffmpeg` command can handle it.
#

######################################################################
# Local files
# ~~~~~~~~~~~
#
# To open a media file, you can simply pass the path of the file to
# the constructor of `StreamReader`.
@@ -132,12 +147,73 @@
#    # Video file
#    StreamReader(src="video.mpeg")
#

######################################################################
# Network protocols
# ~~~~~~~~~~~~~~~~~
#
# You can directly pass a URL as well.
#
# .. code::
#
#    # Video on remote server
#    StreamReader(src="https://example.com/video.mp4")
#
#    # Playlist format
#    StreamReader(src="https://example.com/playlist.m3u")
#
#    # RTMP
#    StreamReader(src="rtmp://example.com:1935/live/app")
#

######################################################################
# File-like objects
# ~~~~~~~~~~~~~~~~~
#
# You can also pass a file-like object. A file-like object must implement
# ``read`` method conforming to :py:attr:`io.RawIOBase.read`.
#
# If the given file-like object has ``seek`` method, StreamReader uses it
# as well. In this case the ``seek`` method is expected to conform to
# :py:attr:`io.IOBase.seek`.
#
# .. code::
#
#    # Open as fileobj with seek support
#    with open("input.mp4", "rb") as src:
#        StreamReader(src=src)
#
# In case where third-party libraries implement ``seek`` so that it raises
# an error, you can write a wrapper class to mask the ``seek`` method.
#
# .. code::
#
#    class Wrapper:
#        def __init__(self, obj):
#            self.obj = obj
#
#        def read(self, n):
#            return self.obj.read(n)
#
# .. code::
#
#    import requests
#
#    response = requests.get("https://example.com/video.mp4", stream=True)
#    s = StreamReader(Wrapper(response.raw))
#
# .. code::
#
#    import boto3
#
#    response = boto3.client("s3").get_object(Bucket="my_bucket", Key="key")
#    s = StreamReader(Wrapper(response["Body"]))
#

######################################################################
# Opening a headerless data
# ~~~~~~~~~~~~~~~~~~~~~~~~~
#
# If attempting to load headerless raw data, you can use ``format`` and
# ``option`` to specify the format of the data.
#
@@ -213,8 +289,8 @@
#

######################################################################
# 5.1. Default streams
# --------------------
# Default streams
# ~~~~~~~~~~~~~~~
#
# When there are multiple streams in the source, it is not immediately
# clear which stream should be used.
@@ -227,8 +303,8 @@
#

######################################################################
# 5.2. Configuring output streams
# -------------------------------
# Configuring output streams
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Once you know which source stream you want to use, then you can
# configure output streams with
@@ -250,21 +326,25 @@
#   When the StreamReader buffered this number of chunks and is asked to pull
#   more frames, StreamReader drops the old frames/chunks.
# - ``stream_index``: The index of the source stream.
# - ``decoder``: If provided, override the decoder. Useful if it fails to detect
#   the codec.
# - ``decoder_option``: The option for the decoder.
#
# For audio output stream, you can provide the following additional
# parameters to change the audio properties.
#
# - ``sample_rate``: When provided, StreamReader resamples the audio on-the-fly.
# - ``dtype``: By default the StreamReader returns tensor of `float32` dtype,
#   with sample values ranging `[-1, 1]`. By providing ``dtype`` argument
# - ``format``: By default the StreamReader returns tensor of `float32` dtype,
#   with sample values ranging `[-1, 1]`. By providing ``format`` argument
#   the resulting dtype and value range is changed.
# - ``sample_rate``: When provided, StreamReader resamples the audio on-the-fly.
#
# For video output stream, the following parameters are available.
#
# - ``format``: Image frame format. By default StreamReader returns
#   frames in 8-bit 3 channel, in RGB order.
# - ``frame_rate``: Change the frame rate by dropping or duplicating
#   frames. No interpolation is performed.
# - ``width``, ``height``: Change the image size.
# - ``format``: Change the image format.
#

######################################################################
@@ -298,7 +378,7 @@
#    streamer.add_basic_video_stream(
#        frames_per_chunk=10,
#        frame_rate=30,
#        format="RGB"
#        format="rgb24"
#    )
#
#    # Stream video from source stream `j`,
@@ -310,7 +390,7 @@
#        frame_rate=30,
#        width=128,
#        height=128,
#        format="BGR"
#        format="bgr24"
#    )
#

@@ -341,8 +421,8 @@
#

######################################################################
# 5.3. Streaming
# --------------
# 6. Streaming
# ------------
#
# To stream media data, the streamer alternates the process of
# fetching and decoding the source data, and passing the resulting
@@ -368,7 +448,7 @@
#

######################################################################
# 6. Example
# 7. Example
# ----------
#
# Let's take an example video to configure the output streams.
@@ -392,9 +472,9 @@
#

######################################################################
# Opening the source media
# ~~~~~~~~~~~~~~~~~~~~~~~~
#
# 6.1. Opening the source media
# ------------------------------
# Firstly, let's list the available streams and its properties.
#

@@ -406,8 +486,8 @@
#
# Now we configure the output stream.
#
# 6.2. Configuring ouptut streams
# -------------------------------
# Configuring ouptut streams
# ~~~~~~~~~~~~~~~~~~~~~~~~~~

# fmt: off
# Audio stream with 8k Hz
@@ -428,7 +508,7 @@
    frame_rate=1,
    width=960,
    height=540,
    format="RGB",
    format="rgb24",
)

# Video stream with 320x320 (stretched) at 3 FPS, grayscale
@@ -437,7 +517,7 @@
    frame_rate=3,
    width=320,
    height=320,
    format="GRAY",
    format="gray",
)
# fmt: on

@@ -466,8 +546,8 @@
print(streamer.get_out_stream_info(i))

######################################################################
# 6.3. Streaming
# --------------
# Streaming
# ~~~~~~~~~
#

######################################################################
@@ -542,7 +622,9 @@
#
# .. seealso::
#
# `Device ASR with Emformer RNN-T <./device_asr.html>`__.
# - `Accelerated Video Decoding with NVDEC <../hw_acceleration_tutorial.html>`__.
# - `Online ASR with Emformer RNN-T <./online_asr_tutorial.html>`__.
# - `Device ASR with Emformer RNN-T <./device_asr.html>`__.
#
# Given that the system has proper media devices and libavdevice is
# configured to use the devices, the streaming API can
@@ -622,14 +704,13 @@
#

######################################################################
# 2.1. Synthetic audio examples
# -----------------------------
# Synthetic audio examples
# ------------------------
#

######################################################################
# Sine wave with
# ~~~~~~~~~~~~~~
#
# Sine wave
# ~~~~~~~~~
# https://ffmpeg.org/ffmpeg-filters.html#sine
#
# .. code::
@@ -675,8 +756,8 @@
#

######################################################################
# Generate noise with
# ~~~~~~~~~~~~~~~~~~~
# Noise
# ~~~~~
# https://ffmpeg.org/ffmpeg-filters.html#anoisesrc
#
# .. code::
@@ -694,8 +775,8 @@
#

######################################################################
# 2.2. Synthetic video examples
# -----------------------------
# Synthetic video examples
# ------------------------
#

######################################################################
@@ -811,8 +892,8 @@
#

######################################################################
# 3.1. Custom audio streams
# -------------------------
# Custom audio streams
# --------------------
#
#

@@ -897,8 +978,8 @@ def _display(i):
_display(3)

######################################################################
# 3.2. Custom video streams
# -------------------------
# Custom video streams
# --------------------
#

# fmt: off
2 changes: 1 addition & 1 deletion test/torchaudio_unittest/common_utils/case_utils.py
@@ -36,7 +36,6 @@ def get_base_temp_dir(cls):

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        if cls.temp_dir_ is not None:
            try:
                cls.temp_dir_.cleanup()
@@ -52,6 +51,7 @@ def tearDownClass(cls):
                #
                # Following the above thread, we ignore it.
                pass
        super().tearDownClass()

    def get_temp_path(self, *paths):
        temp_dir = os.path.join(self.get_base_temp_dir(), self.id())