Add file-like object support to Streaming API (pytorch#2400)
Summary:
This commit adds file-like object support to the Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally, `seek(self, offset, whence)` is used if available.
- Without a `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to specify which decoder to use.
  - The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
  - The order of the arguments was changed so that the arguments common to both audio and video come in front of the rest.
  - The `dtype` and `format` arguments were also changed to make them consistent across the audio and video methods.
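
As an illustration of the interface described above, here is a minimal sketch of two file-like objects: one that implements only `read(self, n)` and one that also implements `seek(self, offset, whence)`. The class names `ReadOnlySource` and `SeekableSource` are hypothetical and not part of this commit, and the payload is placeholder bytes rather than real media data.

```python
import io


class ReadOnlySource:
    """Hypothetical file-like wrapper exposing only `read(n)`."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to `n` bytes; an empty bytes object signals end of stream.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Same wrapper, additionally exposing `seek(offset, whence)`."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        # Move the read position and return the new absolute position.
        return self._buffer.seek(offset, whence)


payload = bytes(range(16))  # placeholder bytes, not real media data

src = SeekableSource(payload)
first = src.read(4)   # read the first four bytes
src.seek(0)           # rewind to the beginning
again = src.read(4)   # the same four bytes are read again

# The read-only variant has no `seek` attribute at all.
read_only_can_seek = hasattr(ReadOnlySource(payload), "seek")
```

An object shaped like `SeekableSource` is the kind of `src` this commit is meant to accept. For an object shaped like `ReadOnlySource`, the commit message suggests passing the `decoder` option when the format cannot be detected without seeking.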

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O.
In the Streaming API, if the input `src` is a string, it is passed to the implementation bound with TorchBind;
if `src` has a `read` attribute, it is passed to the same implementation bound via PyBind11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)
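
The dispatch described in this section can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the code from this commit: `Reader`, `_PathBackend`, `_FileObjBackend` and `describe` are hypothetical names, and the two backend classes are stubs standing in for the TorchBind and PyBind11 bindings.

```python
import io


class _PathBackend:
    """Stub standing in for the binding that receives a string source."""

    name = "path"

    def __init__(self, src: str):
        self.src = src

    def describe(self) -> str:
        return f"{self.name}:{self.src}"


class _FileObjBackend:
    """Stub standing in for the binding that receives a file-like object."""

    name = "fileobj"

    def __init__(self, src):
        self.src = src

    def describe(self) -> str:
        return f"{self.name}:{type(self.src).__name__}"


class Reader:
    """Hypothetical front end; not the actual `StreamReader` class."""

    def __init__(self, src):
        # The binding is chosen once, here, based on the type of `src`.
        if isinstance(src, str):
            self._be = _PathBackend(src)
        elif hasattr(src, "read"):
            self._be = _FileObjBackend(src)
        else:
            raise ValueError("src must be a string or an object with a `read` method")

    def describe(self) -> str:
        # Public methods only forward to the backend selected in the constructor.
        return self._be.describe()


from_path = Reader("video.mp4").describe()
from_fileobj = Reader(io.BytesIO(b"")).describe()
```

Because the choice is made in the constructor and every method forwards to `self._be`, user code calls the same methods regardless of which binding is in use.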

## Refactoring involved
- Extracted to pytorch#2402
  - Some of the implementation in the original TorchBind surface layer was converted to a wrapper class so that it can be reused from the PyBind11 bindings. The wrapper class serves to simplify the binding.
  - The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they only constructed a string and passed it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum: all `c10::optional` and `c10::Dict` values were converted to their `std` equivalents at the binding layer. Since these types work fine with PyBind11, the Streamer core methods now deal with them directly.
- On the Python side, the binding is switched in the constructor of the `StreamReader` class. Since all the methods have to be delegated to the same set of bindings, a backend was introduced, which is abstracted away from user code.
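
To illustrate why the string construction mentioned in the second item is simple to do in Python, here is a rough sketch of building filter description strings from the basic arguments. The function names are hypothetical. `aresample`, `aformat`, `fps`, `scale` and `format` are existing FFmpeg filters, but whether the commit builds exactly these strings is an assumption.

```python
from typing import Optional


def audio_filter_desc(sample_rate: Optional[int], fmt: Optional[str]) -> Optional[str]:
    """Build an FFmpeg-style audio filter description (hypothetical helper)."""
    parts = []
    if sample_rate is not None:
        parts.append(f"aresample={sample_rate}")
    if fmt is not None:
        parts.append(f"aformat=sample_fmts={fmt}")
    # Return None when no conversion is requested.
    return ",".join(parts) if parts else None


def video_filter_desc(
    frame_rate: Optional[int],
    width: Optional[int],
    height: Optional[int],
    fmt: Optional[str],
) -> Optional[str]:
    """Build an FFmpeg-style video filter description (hypothetical helper)."""
    parts = []
    if frame_rate is not None:
        parts.append(f"fps={frame_rate}")
    scale = []
    if width is not None:
        scale.append(f"width={width}")
    if height is not None:
        scale.append(f"height={height}")
    if scale:
        parts.append("scale=" + ":".join(scale))
    if fmt is not None:
        parts.append(f"format=pix_fmts={fmt}")
    return ",".join(parts) if parts else None


audio_desc = audio_filter_desc(8000, "fltp")
video_desc = video_filter_desc(30, 128, 128, "bgr24")
no_filter = audio_filter_desc(None, None)
```

With helpers like these, a basic method only has to assemble the description and forward it to the more general `add_[audio|video]_stream` method.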

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: pytorch#2400

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: 9ceb5a2470abf3b764a12f3abe1355311ccc7eb4
mthrok authored and facebook-github-bot committed May 20, 2022
1 parent 38cf5b7 commit beceba4
Showing 14 changed files with 636 additions and 326 deletions.
19 changes: 11 additions & 8 deletions examples/tutorials/streaming_api_tutorial.py
@@ -250,21 +250,24 @@
 #   When the StreamReader buffered this number of chunks and is asked to pull
 #   more frames, StreamReader drops the old frames/chunks.
 # - ``stream_index``: The index of the source stream.
+# - ``decoder``: If provided, override the decoder. Useful if it fails to detect
+#   the codec.
+# - ``decoder_option``: The option for the decoder.
 #
 # For audio output stream, you can provide the following additional
 # parameters to change the audio properties.
 #
-# - ``sample_rate``: When provided, StreamReader resamples the audio on-the-fly.
-# - ``dtype``: By default the StreamReader returns tensor of `float32` dtype,
-#   with sample values ranging `[-1, 1]`. By providing ``dtype`` argument
+# - ``format``: By default the StreamReader returns tensor of `float32` dtype,
+#   with sample values ranging `[-1, 1]`. By providing ``format`` argument
 #   the resulting dtype and value range is changed.
+# - ``sample_rate``: When provided, StreamReader resamples the audio on-the-fly.
 #
 # For video output stream, the following parameters are available.
 #
+# - ``format``: Change the image format.
 # - ``frame_rate``: Change the frame rate by dropping or duplicating
 #   frames. No interpolation is performed.
 # - ``width``, ``height``: Change the image size.
-# - ``format``: Change the image format.
 #

######################################################################
@@ -298,7 +301,7 @@
 # streamer.add_basic_video_stream(
 # frames_per_chunk=10,
 # frame_rate=30,
-# format="RGB"
+# format="rgb24"
 # )
 #
 # # Stream video from source stream `j`,
@@ -310,7 +313,7 @@
 # frame_rate=30,
 # width=128,
 # height=128,
-# format="BGR"
+# format="bgr24"
 # )
 #

@@ -428,7 +431,7 @@
     frame_rate=1,
     width=960,
     height=540,
-    format="RGB",
+    format="rgb24",
 )

 # Video stream with 320x320 (stretched) at 3 FPS, grayscale
@@ -437,7 +440,7 @@
     frame_rate=3,
     width=320,
     height=320,
-    format="GRAY",
+    format="gray",
 )
 # fmt: on

2 changes: 1 addition & 1 deletion test/torchaudio_unittest/common_utils/case_utils.py
@@ -36,7 +36,6 @@ def get_base_temp_dir(cls):

     @classmethod
     def tearDownClass(cls):
-        super().tearDownClass()
         if cls.temp_dir_ is not None:
             try:
                 cls.temp_dir_.cleanup()
@@ -52,6 +51,7 @@ def tearDownClass(cls):
                 #
                 # Following the above thread, we ignore it.
                 pass
+        super().tearDownClass()

     def get_temp_path(self, *paths):
         temp_dir = os.path.join(self.get_base_temp_dir(), self.id())