Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CTC Beam Search Decoder (KenLM Lexicon) #2072

Closed
wants to merge 13 commits into from

Conversation

carolineechen
Copy link
Contributor

Add support for a CTC Beam Search Decoder with KenLM language model support and lexicon constraint

  • bindings from flashlight / wav2letter project
  • decoder API similar to fairseq's KenLMDecoder
  • factory function for constructing the decoder (while core decoder API is not finalized)

@mthrok
Copy link
Collaborator

mthrok commented Dec 14, 2021

Thanks for the PR.

A) Can you import this diff and see if there is any unexpected memory issue?

Then, Could you split the PR (maybe leaving this PR as-is and make new ones) into

  1. The code ported from FL with only minor cosmetic changes
  2. The custom code you added for torchaudio (C++ and Python)
  3. Build update. (CMake and stuff)

RE 1. we can land it with simple review without actual build process as long as A) does not yield any issue.
Then 2 and 3 become easier to handle.

@facebook-github-bot
Copy link
Contributor

@carolineechen has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@nateanl
Copy link
Member

nateanl commented Dec 14, 2021

Looks like the CI test failures are from Python 3.6. Rebasing the PR should resolve it.

@@ -0,0 +1,39 @@
# Flashlight Decoder Binding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file can be put somewhere more explicit IMO. For example, CONTRIBUTING.md or in prototype directory.

@@ -0,0 +1,222 @@
import torch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The factory function is not imported in prototype/__init__.py. Could you add kenlm_lexicon_decoder to it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prototype/__init__.py should not import anything (with external library dependencies).
I missed it when I was reviewing emformer/rnn-t, but they are pure-python implementation, and they won't cause segmentation fault, so they are exceptionally okay.

If ctcdecoder is imported in prototype/__init__.py, then KenLM becomes mandatory dependency for using anything in prototype module. It will force user who do not care about this decoder but want to use other prototype module to install KenLM, which is inconvenient.

Copy link
Member

@nateanl nateanl Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If users want to try prototype feature, are they expected to add it to __init__.py manually?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. If we put everything in a separate submodule under prototype, and as long as prototype.__init__.py does not import this submodule, users should be able to perform import normally (from torchaudio.prototype.ctc_decoder import foo), while we mitigate the risk of broken dependency.

torchaudio
└── prototype
    ├── __init__.py  # does not import ctc_decoder module
    └── ctc_decoder
        ├── __init__.py  # Initialize extension module here
        └── ctc_decoder.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Sounds good to me.

torchaudio/prototype/ctc_decoder.py Outdated Show resolved Hide resolved
@carolineechen carolineechen force-pushed the ctc-decoder-binding branch 2 times, most recently from 0f17a0a to 6080f39 Compare December 16, 2021 20:45

#include <stdexcept>

#include <kenlm/lm/model.hh>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <kenlm/lm/model.hh>
#include "lm/model.hh"

Although this is not wrong, but the upstream KenLM defines the header structure this way, so we need to use the local include here. It causes an error like this https://fburl.com/sandcastle/jivo6e9y

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I had a similar issue when installing torchaudio with ctc decoder. The kenlm repository has to be inside the audio directory in order to be built. You can try by installing KenLM outside of audio directory and build torchaudio.

Do you have any idea how to mitigate this issue?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas for build process, but I need to experiment it as it has to be compatible with local build/CI build/fbcode build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would "kenlm/lm/model.hh" work? the current suggestion "lm/model.hh" causes my local build to fail

@facebook-github-bot
Copy link
Contributor

@carolineechen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Dec 17, 2021
Summary:
part of #2072 -- splitting up the PR for easier review

Add C++ files from [flashlight](https://github.com/flashlight/flashlight) that are needed for building CTC decoder w/ Lexicon and KenLM support

Note: the code here will not be compiled until the build process is changed (future PR)

Pull Request resolved: #2075

Reviewed By: mthrok

Differential Revision: D33186825

Pulled By: carolineechen

fbshipit-source-id: 5b69eea7634f3fae686471d988422942bb784cd9
facebook-github-bot pushed a commit that referenced this pull request Dec 17, 2021
Summary:
part of #2072 -- splitting up the PR for easier review

Add C++ files for binding CTC decoder functionality for Python

Note: the code here will not be compiled until the build process is changed

Pull Request resolved: #2079

Reviewed By: mthrok

Differential Revision: D33196286

Pulled By: carolineechen

fbshipit-source-id: 9fe4a8635b60ebfb594918bab00f5c3dccf96bd2
facebook-github-bot pushed a commit that referenced this pull request Dec 18, 2021
Summary:
After all the C++ code from #2072 are added, this commit will enable decoder/KenLM integration in the build process.

Pull Request resolved: #2078

Reviewed By: carolineechen

Differential Revision: D33198183

Pulled By: mthrok

fbshipit-source-id: 9d7fa76151d06fbbac3785183c7c2ff9862d3128
facebook-github-bot pushed a commit that referenced this pull request Dec 23, 2021
Summary:
Part of #2072 -- splitting up PR for easier review

This PR adds Python decoder API and basic README

Pull Request resolved: #2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
@carolineechen carolineechen deleted the ctc-decoder-binding branch December 29, 2021 20:39
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
part of pytorch#2072 -- splitting up the PR for easier review

Add C++ files from [flashlight](https://github.com/flashlight/flashlight) that are needed for building CTC decoder w/ Lexicon and KenLM support

Note: the code here will not be compiled until the build process is changed (future PR)

Pull Request resolved: pytorch#2075

Reviewed By: mthrok

Differential Revision: D33186825

Pulled By: carolineechen

fbshipit-source-id: 5b69eea7634f3fae686471d988422942bb784cd9
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
part of pytorch#2072 -- splitting up the PR for easier review

Add C++ files for binding CTC decoder functionality for Python

Note: the code here will not be compiled until the build process is changed

Pull Request resolved: pytorch#2079

Reviewed By: mthrok

Differential Revision: D33196286

Pulled By: carolineechen

fbshipit-source-id: 9fe4a8635b60ebfb594918bab00f5c3dccf96bd2
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
After all the C++ code from pytorch#2072 are added, this commit will enable decoder/KenLM integration in the build process.

Pull Request resolved: pytorch#2078

Reviewed By: carolineechen

Differential Revision: D33198183

Pulled By: mthrok

fbshipit-source-id: 9d7fa76151d06fbbac3785183c7c2ff9862d3128
xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
Part of pytorch#2072 -- splitting up PR for easier review

This PR adds Python decoder API and basic README

Pull Request resolved: pytorch#2089

Reviewed By: mthrok

Differential Revision: D33299818

Pulled By: carolineechen

fbshipit-source-id: 778ec3692331e95258d3734f0d4ab60b6618ddbc
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
* Add Profiling PyTorch workloads with the Instrumentation and Tracing Technology (ITT) API recipe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants