Conformer RNN-T with TCPGen for biasing
first commit

BrianSun

Conformer RNN-T with TCPGen for biasing

parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674296079 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674296047 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674295932 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674295795 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674295664 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674295524 +0000
parent c68625152ad84f9ea4e881fac695f7d98ee326a9 author Caroline Chen <carolinechen@fb.com> 1659983982 -0700 committer G. Sun <gs534@login-e-3.data.cluster> 1674295462 +0000

Fix stylecheck (#2606)
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2606
Reviewed By: nateanl
Differential Revision: D38502666
Pulled By: carolineechen
fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a

Add NNLM support to CTC Decoder (#2528)
Summary: Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs. The `ctc_decoder` API is as follows:
- To decode with KenLM, pass in KenLM language model path to `lm` variable
- To decode with custom LM, create Python class with `CTCDecoderLM` subclass, and pass in the class to `lm` variable.
Additionally create a file of LM words listed in order of the LM index, with a word per line, and pass in the file to `lm_path`. - To decode without a language model, set `lm` to `None` (default) Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM. Follow ups: - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory cc jacobkahn Pull Request resolved: https://github.com/pytorch/audio/pull/2528 Reviewed By: mthrok Differential Revision: D38243802 Pulled By: carolineechen fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7 Fix dataset docs parsing issue with extra spaces (#2607) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607 Reviewed By: carolineechen, nateanl Differential Revision: D38522606 Pulled By: skim0514 fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668 Fixed argument validation in TorchAudio filtering (#2609) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2609 Converted argument validations in torchaudio/functional/filtering from assert based validation to the preferred if-then raise validation. Added specific error messages in all cases. Reviewed By: mthrok Differential Revision: D38515029 fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9 Fix bug in Conformer RNN-T recipe (#2611) Summary: https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script. 
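As a rough illustration of the filtering change summarized above (#2609), the conversion from assert-based validation to if-then-raise validation might look like the sketch below. The function names and the `q` parameter are hypothetical and are not taken from `torchaudio/functional/filtering`; only the pattern itself comes from the summary.

```python
def check_q_factor_assert(q: float) -> None:
    # Old style: the check disappears when Python runs with -O,
    # and a failure carries no explanation.
    assert q > 0


def check_q_factor(q: float) -> None:
    # Preferred style: always enforced, with a specific error message.
    if q <= 0:
        raise ValueError(f"Q factor must be positive. Found: {q}")


# The explicit check reports what went wrong.
try:
    check_q_factor(-1.0)
except ValueError as err:
    message = str(err)
```

The explicit `raise` keeps the check active in optimized runs and lets the caller see which argument was rejected.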
Pull Request resolved: https://github.com/pytorch/audio/pull/2611
Reviewed By: carolineechen
Differential Revision: D38578892
Pulled By: hwangjeff
fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

Add additive noise function (#2608)
Summary: Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
Pull Request resolved: https://github.com/pytorch/audio/pull/2608
Reviewed By: nateanl
Differential Revision: D38557141
Pulled By: hwangjeff
fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

Introducing pytorch-cuda metapackage (#2612)
Summary: Introducing pytorch-cuda metapackage.
Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called pytorch-cuda. This way we can make sure to install the correct version of the CUDA dependencies without depending on conda-forge.
Pull Request resolved: https://github.com/pytorch/audio/pull/2612
Reviewed By: hwangjeff, seemethere, nateanl
Differential Revision: D38633332
Pulled By: atalman
fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

Remove outdated doc (#2617)
Summary: `ctc_decoder` has become beta, so remove it from the prototype documents.
Pull Request resolved: https://github.com/pytorch/audio/pull/2617
Reviewed By: hwangjeff
Differential Revision: D38706869
Pulled By: nateanl
fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

Update doc version selector link (#2605)
Summary: The link to the version selector was an absolute link, which was a trap when reviewing gh-pages deployments from a fork. This commit changes it to a relative link.
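To see why an absolute version selector link is a trap on a fork's gh-pages deployment, the sketch below resolves both kinds of link against a page URL with the standard library. The relative value `../versions.html` is an assumption made for illustration; the summary does not state the exact link the commit uses.

```python
from urllib.parse import urljoin

ABSOLUTE_LINK = "https://pytorch.org/audio/versions.html"
RELATIVE_LINK = "../versions.html"  # hypothetical relative form


def resolve(page_url: str, link: str) -> str:
    # Resolve a link the way a browser would, relative to the current page.
    return urljoin(page_url, link)


fork_page = "https://mthrok.github.io/audio/main/index.html"

# The absolute link always leaves the fork and lands on the upstream site.
absolute_target = resolve(fork_page, ABSOLUTE_LINK)

# The relative link stays within whichever deployment is being reviewed.
relative_target = resolve(fork_page, RELATIVE_LINK)
```

Under that assumption, the relative link resolves to `versions.html` inside the fork's own deployment, while the absolute link always points at the upstream site.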
Pull Request resolved: https://github.com/pytorch/audio/pull/2605 Test Plan: - https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html - https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html Reviewed By: carolineechen, nateanl Differential Revision: D38695645 Pulled By: mthrok fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2 Fix anaconda upload (#2621) Summary: Same as: https://github.com/pytorch/vision/pull/6422 Testing: ``` export ANACONDA_PATH=$(conda info --base)/bin echo $ANACONDA_PATH /opt/homebrew/Caskroom/miniconda/base/bin $ANACONDA_PATH/anaconda -V anaconda Command line client (version 1.10.0) ``` Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/audio/pull/2621 Reviewed By: weiwangmeta, seemethere Differential Revision: D38714324 Pulled By: atalman fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e Move xcode to 14 from 12.5 (#2622) Summary: Similar to https://github.com/pytorch/vision/pull/6218 Fixing MacOS builds Pull Request resolved: https://github.com/pytorch/audio/pull/2622 Reviewed By: weiwangmeta Differential Revision: D38722983 Pulled By: atalman fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85 Added example for MelScale transform (#2616) Summary: Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2616 Reviewed By: carolineechen Differential Revision: D38743145 Pulled By: nateanl fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486 Added example for AmplitudeToDB transform (#2615) Summary: Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. 
Pull Request resolved: https://github.com/pytorch/audio/pull/2615 Reviewed By: carolineechen Differential Revision: D38743117 Pulled By: nateanl fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207 Use double quotes for string in functional and transforms (#2618) Summary: To make the code consistent, we should use double quotation marks for all strings. This PR make such changes in functional and transforms. Pull Request resolved: https://github.com/pytorch/audio/pull/2618 Reviewed By: carolineechen Differential Revision: D38744137 Pulled By: nateanl fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd Fix doc warning (#2627) Summary: Resolves the following warning ``` /torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short. :hidden:`Loudness` ----------------- ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2627 Reviewed By: carolineechen Differential Revision: D38814802 Pulled By: mthrok fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749 Fix Sphinx-gallery display and pin sphinx-related packages (#2629) Summary: This commit fixes the issue with the recent Sphinx-Gallery update. Also it pins the versions of Sphinx-related packages. Before: <img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png"> After: https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2629 Reviewed By: carolineechen Differential Revision: D38816417 Pulled By: mthrok fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852 Tweak tutorials (#2630) Summary: Resolves the following warnings ``` /torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation. 
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation. /torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found. /torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent. ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2630 Reviewed By: nateanl Differential Revision: D38816632 Pulled By: mthrok fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635 Update notes around nightly build and third parties (#2632) Summary: Google Colab now has torchaudio 0.12 pre-installed. This commit removes the note about nightly build. Pull Request resolved: https://github.com/pytorch/audio/pull/2632 Reviewed By: carolineechen Differential Revision: D38827632 Pulled By: mthrok fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb Added example for InverseMelScale transform (#2635) Summary: Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. 
Pull Request resolved: https://github.com/pytorch/audio/pull/2635
Reviewed By: carolineechen
Differential Revision: D38830318
Pulled By: nateanl
fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83

Update ASR inference tutorial (#2631)
Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import
Pull Request resolved: https://github.com/pytorch/audio/pull/2631
Reviewed By: carolineechen
Differential Revision: D38830395
Pulled By: mthrok
fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

Update README.md (#2633)
Summary: Update compatibility matrix
Pull Request resolved: https://github.com/pytorch/audio/pull/2633
Reviewed By: nateanl
Differential Revision: D38827670
Pulled By: mthrok
fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100

Refactor sox pybind source code (#2636)
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636
In the early stage of the torchaudio extension module, the `torchaudio/csrc/pybind` directory was created so that all the code defining the Python interface would be placed there and there would be only one extension module, called `torchaudio._torchaudio`. However, the codebase has since evolved so that a separate extension is defined for each feature (third party dependency), for the sake of a more modular file organization. What is left in `csrc/pybind` is the libsox Python bindings. This commit moves them under `csrc/sox`.
Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.
Reviewed By: carolineechen
Differential Revision: D38829253
fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d

Added example for MFCC transform (#2637)
Summary: Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.
Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example).
Please let me know if there is a different linter/format/convention that is preferred! Pull Request resolved: https://github.com/pytorch/audio/pull/2637 Reviewed By: carolineechen Differential Revision: D38873729 Pulled By: nateanl fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3 Added example for Loudness transform (#2641) Summary: Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2641 Reviewed By: nateanl Differential Revision: D38907782 Pulled By: carolineechen fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63 Update Sphinx-gallery to 0.11.1 (#2638) Summary: The minor release fixes some gallery issue, which allows to remove some of the customization we had in https://github.com/pytorch/audio/issues/2629 https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle Pull Request resolved: https://github.com/pytorch/audio/pull/2638 Reviewed By: carolineechen, nateanl Differential Revision: D38909097 Pulled By: mthrok fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9 [Nova] Added draft calling GHA workflow for building linux wheels (#2548) Summary: As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out. 
Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506 Pull Request resolved: https://github.com/pytorch/audio/pull/2548 Reviewed By: osalpekar Differential Revision: D38947733 Pulled By: mehtanirav fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01 Added example for LFCC transform (#2640) Summary: Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564. Pull Request resolved: https://github.com/pytorch/audio/pull/2640 Reviewed By: carolineechen Differential Revision: D38908975 Pulled By: nateanl fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d Add StreamWriter (#2628) Summary: This commit adds FFmpeg-based encoder StreamWriter class. StreamWriter is pretty much the opposite of StreamReader class, and it supports; * Encoding audio / still image / video * Exporting to local file / streaming protocol / devices etc... * File-like object support (in later commit) * HW video encoding (in later commit) See also: https://fburl.com/gslide/z85kn5a9 (Meta internal) Pull Request resolved: https://github.com/pytorch/audio/pull/2628 Reviewed By: nateanl Differential Revision: D38816650 Pulled By: mthrok fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8 [Nova] Build Linux Conda Binaries using reusable workflow (#2626) Summary: Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux. Pull Request resolved: https://github.com/pytorch/audio/pull/2626 Reviewed By: mehtanirav Differential Revision: D39028057 Pulled By: osalpekar fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183 Replace bg_iterator in examples (#2645) Summary: `bg_iterator` was deprecated in 0.11 because it was known to have issues (deadlock) without speed up. Remove instances of `bg_iterator` used in torchaudio examples. 
Resolves https://github.com/pytorch/audio/issues/2642 Pull Request resolved: https://github.com/pytorch/audio/pull/2645 Reviewed By: nateanl Differential Revision: D38954292 Pulled By: carolineechen fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789 [Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650 Reviewed By: mehtanirav Differential Revision: D39040559 Pulled By: osalpekar fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75 add CUDA 11.7 builds (#2623) Summary: CC atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2623 Reviewed By: hwangjeff, nateanl Differential Revision: D39036432 Pulled By: atalman fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8 Add file-like object support to StreamWriter (#2648) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648 Reviewed By: nateanl Differential Revision: D38976874 Pulled By: mthrok fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864 Add CUDA HW encoding support to StreamWriter (#2505) Summary: This commits add CUDA hardware encoding to StreamWriter. For certain video formats, it can encode video directly from CUDA Tensor, without needing to move the data to host CPU. Pull Request resolved: https://github.com/pytorch/audio/pull/2505 Reviewed By: hwangjeff Differential Revision: D37446830 Pulled By: mthrok fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a Remove obsolete examples (#2655) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2655 Removed obsolete example and the corresponding test Reviewed By: mthrok Differential Revision: D39260253 fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7 Add metadata function for LibriSpeech (#2653) Summary: Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. 
This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode. Pull Request resolved: https://github.com/pytorch/audio/pull/2653 Reviewed By: nateanl, mthrok Differential Revision: D39105114 Pulled By: carolineechen fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348 Fix random Gaussian generation (#2639) Summary: This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634. In particular, previously the Box Muller transform was used to generate Gaussian variates for dithering based on `torch.rand` uniform variates, but it was incorrectly implemented (e.g. the same uniform variate was used as input to the transform, rather than two different uniform variates), which led to a different (non-Gaussian) distribution. This PR instead uses `torch.randn` to generate the Gaussian variates. Pull Request resolved: https://github.com/pytorch/audio/pull/2639 Reviewed By: mthrok Differential Revision: D39101144 Pulled By: carolineechen fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd Tweak documentation (#2656) Summary: 1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary 2. Fix SourceSeparationBundle member attribute Pull Request resolved: https://github.com/pytorch/audio/pull/2656 Reviewed By: carolineechen Differential Revision: D39293053 Pulled By: mthrok fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6 Fix LibriSpeech Conforner RNN-T eval script (#2666) Summary: `ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly. 
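The Box-Muller issue described in #2639 above can be illustrated with a small standard-library sketch. It is not the torchaudio dithering code: `sample_reused_uniform` is only one assumed way the "same uniform variate" mistake could look, and all function names here are hypothetical.

```python
import math
import random


def box_muller(u1: float, u2: float) -> float:
    # Map two uniform variates (u1 in (0, 1], u2 in [0, 1)) to one
    # standard normal variate, provided u1 and u2 are independent.
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sample_two_uniforms(rng: random.Random) -> float:
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is defined
    u2 = rng.random()
    return box_muller(u1, u2)


def sample_reused_uniform(rng: random.Random) -> float:
    # Assumed form of the bug: one uniform variate feeds both inputs,
    # so the output is a deterministic function of a single value.
    u = 1.0 - rng.random()
    return box_muller(u, u)


rng = random.Random(0)
correct = [sample_two_uniforms(rng) for _ in range(50_000)]
reused = [sample_reused_uniform(rng) for _ in range(50_000)]

# A Gaussian has tails on both sides; the reused-variate samples do not.
lowest_correct = min(correct)
lowest_reused = min(reused)
```

In this sketch the reused-variate samples never fall below roughly -1.21, whereas a standard normal places about 11% of its mass below that value, which is one visible sign that the distribution is no longer Gaussian. Calling a library Gaussian generator directly, as the PR does with `torch.randn`, avoids the problem.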
Pull Request resolved: https://github.com/pytorch/audio/pull/2666 Reviewed By: carolineechen Differential Revision: D39386968 Pulled By: hwangjeff fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1 [Nova] Remove the old caller GitHub Actions Linux wheels/conda Build Workflows (#2660) Summary: We moved over to a new design for release workflows that encompass all the build logic in the test-infra repo (apart from custom pre-build and post-build scripts). Thus, we no longer need these caller workflows in the audio repo. This PR removes them entirely. Pull Request resolved: https://github.com/pytorch/audio/pull/2660 Reviewed By: seemethere Differential Revision: D39392456 Pulled By: osalpekar fbshipit-source-id: a8bdeb4738b91666abcdc883f6f8f1bf359f1d42 Move hybrid demucs model out of prototype (#2668) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668 Reviewed By: nateanl, mthrok Differential Revision: D39433671 Pulled By: carolineechen fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c Do not use nested namespaces in torchaudio/sox (#2663) Summary: As it is a C++17 feature, and PyTorch and its extensions must still be C++14 compatible, as also specified in the top level CMakeLists.txt: https://github.com/pytorch/audio/blob/8a0d7b36f7821fe55175f0d4e3ca6299b3817a6c/CMakeLists.txt#L30 Otherwise, it pollutes build logs with noisy ``` /Users/runner/work/test-infra/test-infra/pytorch/audio/torchaudio/csrc/sox/pybind/io.cpp:12:21: warning: nested namespace definition is a C++17 extension; define each namespace separately [-Wc++17-extensions] namespace torchaudio::sox_io { ^~~~~~~~ { namespace sox_io ``` Pull Request resolved: https://github.com/pytorch/audio/pull/2663 Reviewed By: atalman, nateanl Differential Revision: D39362842 Pulled By: malfet fbshipit-source-id: f9659d4420f1cc0194990d531455cf59b66c26b9 [Bootcamp] Fix Typo (#2661) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2661 Fixed typo in 
`audio_data_augmentation_tutorial.py` Reviewed By: malfet, mthrok Differential Revision: D39352353 fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92 Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669 Reviewed By: carolineechen, mthrok Differential Revision: D39433560 Pulled By: nateanl fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb CUDA 11.3 remove. New Stable version is 11.6 (#2670) Summary: CUDA 11.3 Removing. Core PR: https://github.com/pytorch/pytorch/pull/84866 cc malfet ptrblck Pull Request resolved: https://github.com/pytorch/audio/pull/2670 Reviewed By: malfet, osalpekar Differential Revision: D39449263 Pulled By: atalman fbshipit-source-id: f86bb119685ead3ffcabd92c4bb8076aecde4095 Move Hybrid Demucs pipeline to beta (#2673) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673 Reviewed By: mthrok Differential Revision: D39507612 Pulled By: carolineechen fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53 Add Decoder LM Docs (#2658) Summary: modifications to ctc decoder LM docstrings on top of https://github.com/pytorch/audio/issues/2657 Pull Request resolved: https://github.com/pytorch/audio/pull/2658 Reviewed By: mthrok Differential Revision: D39468921 Pulled By: carolineechen fbshipit-source-id: c5497cc2fa22fb98a304d037e27c91bf68a9ad6a Tweak badge link URL generation (#2677) Summary: Currently, the way feature badges are generated assumes that both documentations and the supported features page are on the same level from the root. This does not work when we introduce `:autosummary:` which generates individual documentation pages one level below. This commit changes it so that links to the supported features page are properly relative from the documentation level. There is no appearance change from this commit. 
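The badge link change in #2677 above amounts to computing the link relative to wherever the documentation page lives. A minimal sketch of that idea follows, assuming a hypothetical `supported_features.html` page at the documentation root; the real page name and the generation code are not shown in the summary.

```python
import posixpath


def link_to(target: str, page: str) -> str:
    # Return a link to `target` relative to the directory that holds `page`.
    # Both paths are given relative to the documentation root.
    page_dir = posixpath.dirname(page) or "."
    return posixpath.relpath(target, start=page_dir)


# A module page at the root links to the features page directly.
top_level = link_to("supported_features.html", "transforms.html")

# A generated page one level below needs to climb up first.
nested = link_to(
    "supported_features.html",
    "generated/torchaudio.io.StreamReader.html",
)
```

Pages generated one level below the root get a `../` prefix, while pages at the root keep the plain file name.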
Pull Request resolved: https://github.com/pytorch/audio/pull/2677
Reviewed By: carolineechen
Differential Revision: D39507451
Pulled By: mthrok
fbshipit-source-id: f18da4201f0eb747586be21c8bd9a958217aebc2

Move conv_tasnet_base doc out of prototype (#2675)
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2675
Reviewed By: carolineechen
Differential Revision: D39515996
Pulled By: nateanl
fbshipit-source-id: 5824375f6a758af21b6ad6c635dd06081663644f

Consolidate bibliography / reference (#2676)
Summary: Preparation for the adoption of `autosummary`. Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`.
Example:
https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic
https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2676
Reviewed By: carolineechen
Differential Revision: D39509431
Pulled By: mthrok
fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

Update doc theme to the latest (#2679)
Summary: To follow the change related to the Linux Foundation move. (We are still pinning the theme version so that our customization does not break randomly.)
Pull Request resolved: https://github.com/pytorch/audio/pull/2679
Reviewed By: carolineechen
Differential Revision: D39531566
Pulled By: mthrok
fbshipit-source-id: 64353577d05f9dbda00dd9d10b9ebcedddfdce5b

Update Sphinx to 5.1.1 (#2678)
Summary: Previous versions of Sphinx reported the wrong path for the return class. This issue is fixed in the latest Sphinx, which allows us to remove the patch we apply in `conf.py`. This is essential for the adoption of `:autosummary:`, as it won't render correctly with the patch.
https://output.circle-artifacts.com/output/job/19d93ede-08de-4b9e-9d66-67ca5dab964e/artifacts/0/docs/pipelines.html
Pull Request resolved: https://github.com/pytorch/audio/pull/2678
Reviewed By: carolineechen
Differential Revision: D39509447
Pulled By: mthrok
fbshipit-source-id: e104bc6a87f32cba6c549a9fe8f2d1e489ee27e4

Switch to use conda install action for m1 builds (#2674)
Summary: Use the setup-miniconda action for m1 builds. We want to try to address space issues on m1. The following action:
```
pytorch/test-infra/.github/actions/setup-miniconda@main
```
sets up miniconda in a temp folder, which should be cleaned between runs.
Pull Request resolved: https://github.com/pytorch/audio/pull/2674
Reviewed By: jeanschmidt
Differential Revision: D39540481
Pulled By: atalman
fbshipit-source-id: 0596598ab6b2f99c775aa0c9e14a3a388533068d

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681)
Summary: This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc. It adds a table of contents at the `torchaudio.io` level.
https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html <img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png"> https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader <img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2681 Reviewed By: carolineechen Differential Revision: D39560459 Pulled By: mthrok fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684) Summary: * Adopts `:autosummary:` in decoder module doc * Hide the constructor signature of `CTCDecoder` as `ctc_decoder` function is the one client code is supposed to be using. * Introduce `children` property to `CTCDecoderLMState` otherwise it does not show up in the doc. 
https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html <img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png"> https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder <img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2684 Reviewed By: carolineechen Differential Revision: D39574272 Pulled By: mthrok fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773 Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683) Summary: * Introduce the mini-index at `torchaudio.transforms` page. * Add "Augmentations" subsection. * Also updated the overall introduction. https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html <img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png"> <img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2683 Reviewed By: carolineechen Differential Revision: D39574255 Pulled By: mthrok fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627 [Nova] Remove Extraneous Build Scripts (#2695) Summary: There is a single pre/post script needed for building torchaudio. This PR: 1. Removes the old conda-specific build script 2. 
Renames the wheel script to be a general name Pull Request resolved: https://github.com/pytorch/audio/pull/2695 Reviewed By: kit1980 Differential Revision: D39631971 Pulled By: osalpekar fbshipit-source-id: 52b49a6e792536b6264228c01ac356d247b18ea8 Update nightly wheels to ROCm5.2 (#2672) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2672 Reviewed By: atalman Differential Revision: D39468320 Pulled By: mthrok fbshipit-source-id: 0e7bd4fd922ba0db51700e140b95328a5b687a6f Adopt `:autosummary:` in `torchaudio.functional` module doc (#2693) Summary: https://output.circle-artifacts.com/output/job/b23174d2-5cee-4ee9-be39-3228b9ae4abe/artifacts/0/docs/functional.html <img width="1133" alt="Screen Shot 2022-09-20 at 11 19 23 AM" src="https://user-images.githubusercontent.com/855818/191152824-96c5b16c-bd38-4656-b1ae-0b58699dbd62.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2693 Reviewed By: carolineechen Differential Revision: D39650930 Pulled By: mthrok fbshipit-source-id: 28b5b03d21b922e37e02bfddda2bf1dea696cc18 Add Speech Commands metadata function (#2687) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2687 Reviewed By: mthrok Differential Revision: D39647596 Pulled By: carolineechen fbshipit-source-id: 8ff874fc1e828130f6754e83ce1f702ca13dfac0 Adopt `:autosummary:` in `torchaudio.models` module doc (#2690) Summary: * Introduce the mini-index at `torchaudio.models` page. 
https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html <img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png"> <img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2690 Reviewed By: carolineechen Differential Revision: D39654948 Pulled By: mthrok fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8 Support in-memory decoding via Tensor wrapper in StreamReader (#2694) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2694 This commit adds Tensor type as input to `StreamReader`. The Tensor is interpreted as byte string buffer. Reviewed By: hwangjeff Differential Revision: D39467630 fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e Add StreamReader Tensor Binding to src (#2699) Summary: In https://github.com/pytorch/audio/issues/2694 CMakeLists.txt was not properly updated, so the tests are failing. This commit fix it. Pull Request resolved: https://github.com/pytorch/audio/pull/2699 Reviewed By: carolineechen Differential Revision: D39687409 Pulled By: mthrok fbshipit-source-id: 2e14f3c478f1f8a112a03839f2dbcca51215fed7 Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689) Summary: * Introduce the mini-index at `torchaudio.pipelines` page. 
* Add introductions * Update pipeline tutorials https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html <img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png"> <img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png"> https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle <img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2689 Reviewed By: carolineechen Differential Revision: D39691253 Pulled By: mthrok fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49 Add metadata mode for various datasets (#2697) Summary: Add metadata mode for the following SUPERB benchmark datasets - QUESST14 - Fluent Speech Commands - VoxCeleb1 follow ups: - Add metadata mode for LibriMix -- waiting for unit tests to merge - Add IEMOCAP + SNIPS datasets Pull Request resolved: https://github.com/pytorch/audio/pull/2697 Reviewed By: mthrok Differential Revision: D39666809 Pulled By: carolineechen fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2 Update and fix tutorials (#2701) Summary: * Fix Sphinx warning * Update asset management Pull Request resolved: https://github.com/pytorch/audio/pull/2701 Reviewed By: carolineechen Differential Revision: D39714126 Pulled By: mthrok fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313 Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692) Summary: * Introduce the mini-index at `torchaudio.datasets` page. 
* Standardize the format of return type docstring. https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html <img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png"> https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict <img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2692 Reviewed By: carolineechen Differential Revision: D39687463 Pulled By: mthrok fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df Introduce IO section to getting started tutorials (#2703) Summary: Since new tutorials for StreamWriter are being added, there are more tutorials for media IO than for the rest, so this commit introduces a sub-index for IO tutorials. Pull Request resolved: https://github.com/pytorch/audio/pull/2703 Reviewed By: carolineechen Differential Revision: D39769049 Pulled By: mthrok fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812 [Nova] Moving Linux Wheels over to Nova (#2702) Summary: This does 2 things: Comments out Linux Wheels-related jobs in CircleCI so that they are not run on nightlies/releases. Adds a GHA workflow that calls the build workflow in pytorch/test-infra.
Testing: Verified that the builds are triggered by this workflow, and all builds are green: https://github.com/pytorch/audio/actions/runs/3109635749/jobs/5040029155 Pull Request resolved: https://github.com/pytorch/audio/pull/2702 Reviewed By: seemethere Differential Revision: D39756852 Pulled By: osalpekar fbshipit-source-id: 7e222d80ca0720e3be43b929f1e55f5c0166b947 [perf][5/5] Replace IValue::toString()->string() with IValue::toStringRef() (#2700) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2700 ATT for pytorch/audio Reviewed By: mthrok Differential Revision: D39707243 fbshipit-source-id: 1dc2a5a9fe913a9071e6df679e39d632b75212fb Add CUDA version check (#2707) Summary: Adds check to ensure that TorchAudio and PyTorch versions use the same CUDA version. Pull Request resolved: https://github.com/pytorch/audio/pull/2707 Reviewed By: mthrok Differential Revision: D39791154 Pulled By: hwangjeff fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d Fix CUDA check (#2710) Summary: `torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this. Pull Request resolved: https://github.com/pytorch/audio/pull/2710 Reviewed By: carolineechen, nateanl Differential Revision: D39796810 Pulled By: hwangjeff fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64 Remove linux wheel from circleci (#2714) Summary: Remove linux wheel from circleci Pull Request resolved: https://github.com/pytorch/audio/pull/2714 Reviewed By: weiwangmeta Differential Revision: D39816121 Pulled By: atalman fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480 Fix windows tests related to old conda on circleci (#2704) Summary: Conda version on circleCI prints following message: ``` ==> WARNING: A newer version of conda exists. 
<== current version: 4.6.14 latest version: 4.14.0 ``` and as a result this error: ``` + /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6 Collecting package metadata: ...working... done Solving environment: ...working... Too long with no output (exceeded 30m0s): context deadline exceeded ``` This should update the conda version running on the system and allow us to install pytorch and run some tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2704 Reviewed By: weiwangmeta Differential Revision: D39820037 Pulled By: atalman fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa [Nova] Add build-type argument for when upload should be triggered (#2706) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2706 Reviewed By: kit1980 Differential Revision: D39786253 Pulled By: osalpekar fbshipit-source-id: 2a0c427f57e5c70ff1cf419b7e0c2316e5f0e16c Back out "[audio][PR] [Nova] Moving Linux Wheels over to Nova" (#2718) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2718 Original commit changeset: 7e222d80ca07 Original Phabricator Diff: D39756852 (https://github.com/pytorch/audio/commit/7ba7cf4d24a2967b8fa4aaff437116524281f8fd) Reviewed By: weiwangmeta Differential Revision: D39839899 fbshipit-source-id: f5605eb9882f7c7f0008e88338ab711131b29404 Fix mismatched cuda version in smoke tests on windows wheels (#2721) Summary: Example job that was failing previously: https://app.circleci.com/pipelines/github/pytorch/audio/12796/workflows/ae96794a-6df4-4a2a-84df-ada7a7250045/jobs/927709 The failure: ``` "Detected that PyTorch and TorchAudio were compiled with different CUDA versions. " RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.7 whereas TorchAudio has CUDA version 11.6. Please install the TorchAudio version that matches your PyTorch version. 
``` Has install command: ``` pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html" pip install /c/Users/circleci/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-win_amd64.whl -f https://download.pytorch.org/whl/nightly/torch_nightly.html ``` Linux job (succeeds) for uses different "-f" (find links) url, that includes specific cuda version: https://app.circleci.com/pipelines/github/pytorch/audio/12809/workflows/aadca2ab-5a00-4a0a-ab6a-4a1b7a503713/jobs/927861 Command: ``` pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${CU_VERSION}/torch_${UPLOAD_CHANNEL}.html" pip install /root/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html ``` This PR makes Windows installation match the linux one. Testing: * verified command manually on Circle CI: ``` >>> import torch >>> import torchaudio C:\tools\miniconda3\lib\site-packages\torchaudio\compliance\kaldi.py:22: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:77.) EPSILON = torch.tensor(torch.finfo(torch.float).eps) C:\tools\miniconda3\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available. 
warnings.warn("No audio backend is available.") ``` Co-authored: weiwangmeta Pull Request resolved: https://github.com/pytorch/audio/pull/2721 Reviewed By: hwangjeff Differential Revision: D39870805 Pulled By: izaitsevfb fbshipit-source-id: 2957cba4f53d00783a5c07099f24050ce15e7d1c Removing cuda102 (#2715) Summary: Removing cuda102 Pull Request resolved: https://github.com/pytorch/audio/pull/2715 Reviewed By: hwangjeff Differential Revision: D39823444 Pulled By: atalman fbshipit-source-id: c11d798ab86cf9a6d5ed3804958b4a0c2f8a87ff Revert "Removing cuda102 (#2715)" (#2723) Summary: Revert this for now until docker is updated Pull Request resolved: https://github.com/pytorch/audio/pull/2723 Reviewed By: nateanl Differential Revision: D39900382 Pulled By: atalman fbshipit-source-id: f8701e359bc11e8f9f3a29144f7e7da336a470da Cuda 102 deprecation (#2724) Summary: Cuda 10.2 deprecation, migration of unit tests from cuda 10.2 to cuda 11.6 Pull Request resolved: https://github.com/pytorch/audio/pull/2724 Reviewed By: weiwangmeta Differential Revision: D39912484 Pulled By: atalman fbshipit-source-id: e760b630375eae94384cda68d24f83ef46ada6d9 Delete packaging/README.md (#2730) Summary: The file looks hopelessly outdated. Pull Request resolved: https://github.com/pytorch/audio/pull/2730 Reviewed By: mthrok Differential Revision: D39993805 Pulled By: kit1980 fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d Add citation for MuST-C dataset in Emformer RNNT pipeline. (#2728) Summary: The MuST-C reference is added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation.
Pull Request resolved: https://github.com/pytorch/audio/pull/2728 Reviewed By: carolineechen Differential Revision: D39990882 Pulled By: nateanl fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e Adopt :autosummary: to multiple modules (#2664) Summary: Adopt `:autosummary:` to various modules * torchaudio.compliance.kaldi * torchaudio.sox_effects * torchaudio.utils Pull Request resolved: https://github.com/pytorch/audio/pull/2664 Reviewed By: nateanl Differential Revision: D39841873 Pulled By: mthrok fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac Add StreamWriter media device/streaming tutorial (#2708) Summary: https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html Pull Request resolved: https://github.com/pytorch/audio/pull/2708 Reviewed By: carolineechen Differential Revision: D40013310 Pulled By: mthrok fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124 Tweak tutorials (#2733) Summary: * Port downstream change https://github.com/pytorch/tutorials/pull/2060 * Fix inter-tutorial links and references Pull Request resolved: https://github.com/pytorch/audio/pull/2733 Reviewed By: hwangjeff Differential Revision: D40086902 Pulled By: hwangjeff fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8 Increase CircleCi no_output_timeout for `install binaries` steps (#2734) Summary: The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example.
Pull Request resolved: https://github.com/pytorch/audio/pull/2734 Reviewed By: weiwangmeta, atalman Differential Revision: D40077578 fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec Torchaudio load libary path fix for windows python 3.8 (#2735) Summary: Torchaudio load libary path fix for windows and python = 3.8 Fixes: https://github.com/pytorch/audio/issues/2726 Fixes following issue: ``` >>> import torchaudio Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module> from torchaudio import ( # noqa: F401 File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module> _init_extension() File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension _load_lib("libtorchaudio") File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib torch.ops.load_library(path) File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library ctypes.CDLL(path) File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__ self._handle = _dlopen(self._name, mode) FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax. 
>>> ``` Caused by dlls not being found in the conda environment ``` C:\Users\atalman\miniconda3\envs\mywin38\bin\ ``` While this environment is set correctly in PATH its ignored with Python = 3.8 Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python Pull Request resolved: https://github.com/pytorch/audio/pull/2735 Reviewed By: carolineechen Differential Revision: D40112293 Pulled By: carolineechen fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e Add StreamWriter tutorial (#2698) Summary: Add a tutorial for basic usage of torchaudio.io.StreamWriter. https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2698 Reviewed By: carolineechen Differential Revision: D40133007 Pulled By: carolineechen fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623 Fix sphinx gallery list in io doc (#2736) Summary: Specifying multiple object in `:minigallery:` directive shows duplicated tutorials. This commit fixes it by listing tutorials based on module used. 
https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html Before: <img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png"> After: <img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2736 Reviewed By: carolineechen Differential Revision: D40160247 Pulled By: carolineechen fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477 Modify `info_audio` to compute and return number of frames if not found in stream info (#2740) Summary: Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524. Pull Request resolved: https://github.com/pytorch/audio/pull/2740 Reviewed By: nateanl Differential Revision: D40168639 Pulled By: nateanl fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24 Update sox info docstring to account for mp3 frame count handling (#2742) Summary: Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740. 
Pull Request resolved: https://github.com/pytorch/audio/pull/2742 Reviewed By: nateanl Differential Revision: D40189846 Pulled By: nateanl fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30 Update HW video processing tutorial (#2739) Summary: * Add HW encoding to HW tutorial https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS Pull Request resolved: https://github.com/pytorch/audio/pull/2739 Reviewed By: hwangjeff Differential Revision: D40197086 Pulled By: hwangjeff fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db Add IEMOCAP dataset (#2732) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732 Reviewed By: nateanl Differential Revision: D40186996 Pulled By: nateanl fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4 Fix HuBERT docstring (#2746) Summary: The docstring of `wav2vec2` argument is wrong. Fix it in this PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2746 Reviewed By: carolineechen Differential Revision: D40225995 Pulled By: nateanl fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da Add unit test for LibriMix dataset (#2659) Summary: Besides the unit test, the PR also addresses these issues: - The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. It is default for source separation task. Users may also want to use "max" mode which allows for end-to-end separation and recognition. The PR adds ``mode`` argument to let users decide which dataset they want to use. - If the task is ``"enh_both"``, the target is the audios in ``mix_clean`` instead of separate clean sources. The PR fixes it to use ``mix_clean`` as target. 
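The difference between the "min" and "max" modes described above can be illustrated with a minimal sketch. It assumes that "min" truncates every clean source to the shortest one and that "max" zero-pads every source to the longest one; `align_sources` is a hypothetical helper written for illustration and is not part of the torchaudio API.

```python
from typing import List, Sequence


def align_sources(sources: Sequence[Sequence[float]], mode: str = "min") -> List[List[float]]:
    """Align clean sources to a common length (illustrative helper, not torchaudio API)."""
    if mode not in ("min", "max"):
        raise ValueError(f'mode must be "min" or "max", got {mode!r}')
    lengths = [len(source) for source in sources]
    if mode == "min":
        # Assumption: "min" keeps only the span covered by every source.
        target = min(lengths)
        return [list(source[:target]) for source in sources]
    # Assumption: "max" keeps the longest source whole and zero-pads the others.
    target = max(lengths)
    return [list(source) + [0.0] * (target - len(source)) for source in sources]


speaker_1 = [0.1, 0.2, 0.3, 0.4]
speaker_2 = [0.5, 0.6]
min_aligned = align_sources([speaker_1, speaker_2], mode="min")
max_aligned = align_sources([speaker_1, speaker_2], mode="max")
```

In this sketch "min" discards the tail of the longer utterance, which suits source separation, while "max" keeps every utterance whole, which is what an end-to-end separation and recognition setup would need.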
Pull Request resolved: https://github.com/pytorch/audio/pull/2659 Reviewed By: carolineechen Differential Revision: D40229227 Pulled By: nateanl fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235 Add Snips Dataset (#2738) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738 Reviewed By: carolineechen Differential Revision: D40238099 Pulled By: nateanl fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e Fix windows python 3.8 loading path (#2747) Summary: Fix windows python 3.8 loading path Pull Request resolved: https://github.com/pytorch/audio/pull/2747 Reviewed By: nateanl Differential Revision: D40264326 Pulled By: nateanl fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327 Add metadata for Librimix (#2751) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751 Reviewed By: nateanl Differential Revision: D40267874 Pulled By: carolineechen fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927 Increase inactivity timeout for binary build jobs (#2754) Summary: Increase inactivity timeout for binary build jobs Pull Request resolved: https://github.com/pytorch/audio/pull/2754 Reviewed By: carolineechen Differential Revision: D40275368 Pulled By: atalman fbshipit-source-id: 5e682bb78bda640d615f874fbdf0e650b5a38ee0 Skip hubert xlarge torchscript test (#2758) Summary: a couple of circleci unittests are failing during hubert xlarge torchscript test, which has been known to fail on Windows in the past (#65776). this PR disables this test on circleci cc atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2758 Reviewed By: mthrok Differential Revision: D40290535 Pulled By: carolineechen fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57 Improve wav2vec2/hubert model for pre-training (#2716) Summary: This PR improves the Wav2Vec2/HuBERT model regarding model pre-training. 
- The model initialization of positional embedding and transformer module is essential to model pre-training. The accuracy of unmasked frames should be higher than masked frames, as it is an easier task. but without the initialization, the accuracy of masked frames is higher than unmasked frames. Compared the performance after two epochs with 16 GPUs. - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11. - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04. - After adding the model initialization, the gradient is easy to overflow (aka `nan` gradient). In paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778) the authors propose a simple but effective method to mitigate the overflow issue, by scaling down the multiplication of query and key and subtracting the maximum value from it (subtracting a constant value won't change the output of softmax). Then it guarantees the value won't be overflowed. - In the original fairseq, the mask indices are generated by `numpy.random.choice`. Here replace `torch.multinomial` with `torch.randperm`. (cc carolineechen). Other improvements within training scripts will be included in a separate PR. Pull Request resolved: https://github.com/pytorch/audio/pull/2716 Reviewed By: xiaohui-zhang Differential Revision: D39832189 Pulled By: nateanl fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27 Improve hubert recipe for pre-training and fine-tuning (#2744) Summary: following pr https://github.com/pytorch/audio/issues/2716 - For preprocessing - The HuBERT feature takes lots of memory which may not fit some machines. Enable to use a subset of feature for training a k-means model. - For pre-training - Normalize the loss based on the total number of masked frames across all GPUs. - Use mixed precision training. fp16 is not well supported in pytorch_lightning. 
- Log accuracies of masked/unmasked frames during training. - Clip the gradients with norm `10.0`. - For ASR fine-tuning - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio. - Use mixed precision training. - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe. - Update the WER results on LibriSpeech dev and test sets. | | WER% (Viterbi)| WER% (KenLM) | |:-----------------:|--------------:|--------------:| | dev-clean | 10.9 | 4.2 | | dev-other | 17.5 | 9.4 | | test-clean | 10.9 | 4.4 | | test-other | 17.8 | 9.5 | Pull Request resolved: https://github.com/pytorch/audio/pull/2744 Reviewed By: carolineechen Differential Revision: D40282322 Pulled By: nateanl fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90 Fix typos in tacotron2 tutorial (#2761) Summary: `publishe`->`published` Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published` Pull Request resolved: https://github.com/pytorch/audio/pull/2761 Reviewed By: carolineechen Differential Revision: D40313042 Pulled By: malfet fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b Add gtzan download note (#2763) Summary: GTZAN download link is no longer working, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this discovery. Pull Request resolved: https://github.com/pytorch/audio/pull/2763 Reviewed By: nateanl, mthrok Differential Revision: D40315071 Pulled By: carolineechen fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242 Update tutorial author information (#2764) Summary: Adding and updating author information. 
Pull Request resolved: https://github.com/pytorch/audio/pull/2764 Reviewed By: carolineechen Differential Revision: D40332427 Pulled By: mthrok fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a Add custom lm example to decoder tutorial (#2762) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762 Reviewed By: mthrok Differential Revision: D40332603 Pulled By: carolineechen fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251 Fix CTCDecoder doc (#2766) Summary: * Document `__call__` instead of `__init__` * List CTCHypothesis first as it is used in combination with CTCDecoder * Fix indentation of score method docstring Pull Request resolved: https://github.com/pytorch/audio/pull/2766 Reviewed By: carolineechen Differential Revision: D40349388 Pulled By: mthrok fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c Fix fading in hybrid demucs tutorial (#2769) Summary: The separation applies on chunks of audios to avoid OOM. The combination of consecutive chunks is described in the graph: ![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png) In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded. Pull Request resolved: https://github.com/pytorch/audio/pull/2769 Reviewed By: carolineechen Differential Revision: D40358382 Pulled By: nateanl fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e Fix leaking matplotlib figure (#2771) Summary: In StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure used is left unshown in the resulting tutorial with the use of ``sphinx_gallery_defer_figures`` command. It turned out that this figure is shown in the next code block executed by Sphinx Gallery, and the figure is placed in totally unrelated place. 
https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html <img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png"> This commit fixes it by closing the figure. Pull Request resolved: https://github.com/pytorch/audio/pull/2771 Reviewed By: nateanl Differential Revision: D40382076 Pulled By: mthrok fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a Update resampling tutorial (#2773) Summary: * Refactor benchmark script * Rename `time` variable to avoid (potential) conflicting with time module * Fix `beta` parameter in benchmark (it was not used previously) * Use `timeit` module for benchmark * Add plot * Move the comment on result at the end * Add link to an explanation of aliasing https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2773 Reviewed By: carolineechen Differential Revision: D40421337 Pulled By: mthrok fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a Update description of HDemucs pipelines (#2774) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774 Reviewed By: carolineechen Differential Revision: D40445274 Pulled By: nateanl fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d Add file_name to the returned item in Snips dataset (#2775) Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e Update download path for speechcommands (#2777) Summary: previous download link for v0.02 did not download the entire dataset, but only the training dataset, resulting in issues when trying to access the testing or validation data. 
Pull Request resolved: https://github.com/pytorch/audio/pull/2777 Reviewed By: nateanl Differential Revision: D40480605 Pulled By: carolineechen fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103 Add notes on file structure in Voxceleb1 based datasets (#2776) Summary: The file structure of VoxCeleb1 is as follows: ``` root/ └── wav/ └── speaker_id folders ``` Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, in file lists like https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt there is no such differentiation, so it is not necessary to put the extracted files into separate folders. This PR adds notes in the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the file structure. Pull Request resolved: https://github.com/pytorch/audio/pull/2776 Reviewed By: carolineechen Differential Revision: D40483707 Pulled By: nateanl fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d [Nova] New GHA Workflow for Docstring Sync (#2720) Summary: Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts.
Pull Request resolved: https://github.com/pytorch/audio/pull/2720 Reviewed By: izaitsevfb, seemethere Differential Revision: D39863574 Pulled By: osalpekar fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b [Nova] Clean commit for Enabling Nova Linux Wheels Workflows (#2719) Summary: Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries. We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova. Pull Request resolved: https://github.com/pytorch/audio/pull/2719 Reviewed By: hwangjeff, weiwangmeta Differential Revision: D39866440 Pulled By: osalpekar fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410 Add iemocap variants (#2778) Summary: Add the ability to load only improvised or only scripted utterances.
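A rough sketch of what selecting only improvised or only scripted IEMOCAP utterances could look like. It assumes that utterance IDs carry an `impro` or `script` marker (for example `Ses01F_impro01_F000`); `filter_utterances` and its `utterance_type` argument are hypothetical names used for illustration, not the dataset class's confirmed interface.

```python
from typing import Iterable, List, Optional


def filter_utterances(utterance_ids: Iterable[str], utterance_type: Optional[str] = None) -> List[str]:
    """Keep only improvised or scripted utterances (illustrative helper)."""
    if utterance_type is None:
        # No filter requested: return every utterance.
        return list(utterance_ids)
    # Assumption: the utterance type is encoded as a substring of the ID.
    markers = {"improvised": "impro", "scripted": "script"}
    if utterance_type not in markers:
        raise ValueError(f"utterance_type must be one of {sorted(markers)} or None")
    marker = markers[utterance_type]
    return [utt for utt in utterance_ids if marker in utt]


ids = ["Ses01F_impro01_F000", "Ses01F_script01_1_F000", "Ses01M_impro02_M001"]
improvised = filter_utterances(ids, "improvised")
scripted = filter_utterances(ids, "scripted")
```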
Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539 Bump version to 0.14 (#2779) Summary: Bump version to 0.14 Pull Request resolved: https://github.com/pytorch/audio/pull/2779 Reviewed By: carolineechen Differential Revision: D40523034 Pulled By: atalman fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31 Fix doc in torchaudio.backend (#2781) Summary: Address https://github.com/pytorch/audio/issues/2780 Pull Request resolved: https://github.com/pytorch/audio/pull/2781 Reviewed By: carolineechen, mthrok Differential Revision: D40556794 Pulled By: nateanl fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e Remove archive file in gh-pages branch (#2786) Summary: The motivation for generating `artifact.tar.gz` in the `build_docs` job is to make it easy to add documentation for each stable release. But it is committed to the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit. Pull Request resolved: https://github.com/pytorch/audio/pull/2786 Reviewed By: caroli…