Showing 9 changed files with 381 additions and 0 deletions.
docs/source/model-export/code/export-model-state-dict-pretrained-out.txt
@@ -0,0 +1,21 @@
2022-10-13 19:09:02,233 INFO [pretrained.py:265] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'encoder_dim': 512, 'nhead': 8, 'dim_feedforward': 2048, 'num_encoder_layers': 12, 'decoder_dim': 512, 'joiner_dim': 512, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.21', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4810e00d8738f1a21278b0156a42ff396a2d40ac', 'k2-git-date': 'Fri Oct 7 19:35:03 2022', 'lhotse-version': '1.3.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'onnx-doc-1013', 'icefall-git-sha1': 'c39cba5-dirty', 'icefall-git-date': 'Thu Oct 13 15:17:20 2022', 'icefall-path': '/k2-dev/fangjun/open-source/icefall-master', 'k2-path': '/k2-dev/fangjun/open-source/k2-master/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-fj/fangjun/open-source-2/lhotse-jsonl/lhotse/__init__.py', 'hostname': 'de-74279-k2-test-4-0324160024-65bfd8b584-jjlbn', 'IP address': '10.177.74.203'}, 'checkpoint': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt', 'bpe_model': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model', 'method': 'greedy_search', 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav'], 'sample_rate': 16000, 'beam_size': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 8, 'context_size': 2, 'max_sym_per_frame': 1, 'simulate_streaming': False, 'decode_chunk_size': 16, 'left_context': 64, 
'dynamic_chunk_training': False, 'causal_convolution': False, 'short_chunk_size': 25, 'num_left_chunks': 4, 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
2022-10-13 19:09:02,233 INFO [pretrained.py:271] device: cpu
2022-10-13 19:09:02,233 INFO [pretrained.py:273] Creating model
2022-10-13 19:09:02,612 INFO [train.py:458] Disable giga
2022-10-13 19:09:02,623 INFO [pretrained.py:277] Number of model parameters: 78648040
2022-10-13 19:09:02,951 INFO [pretrained.py:285] Constructing Fbank computer
2022-10-13 19:09:02,952 INFO [pretrained.py:295] Reading sound files: ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav']
2022-10-13 19:09:02,957 INFO [pretrained.py:301] Decoding started
2022-10-13 19:09:06,700 INFO [pretrained.py:329] Using greedy_search
2022-10-13 19:09:06,912 INFO [pretrained.py:388]
./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2022-10-13 19:09:06,912 INFO [pretrained.py:390] Decoding Done
@@ -0,0 +1,135 @@
Export model.state_dict()
=========================

When to use it
--------------

During model training, we save checkpoints periodically to disk.

A checkpoint contains the following information:

- ``model.state_dict()``
- ``optimizer.state_dict()``
- and some other information related to training

To resume training from some point, we need a checkpoint.
However, to publish the model for inference, only
``model.state_dict()`` is needed. In that case, we strip everything
except ``model.state_dict()`` to reduce the file size of the published model.
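
The stripping step can be illustrated with a minimal sketch. It uses plain
Python containers and ``pickle`` as a stand-in so that it runs without
PyTorch; real checkpoints hold tensors and are written with ``torch.save()``.
The key names ``optimizer`` and ``batch_idx_train`` and the helper functions
are illustrative assumptions, not the exact layout used by every recipe.

```python
import pickle

# Stand-in for a training checkpoint (assumed layout, for illustration only).
# Real checkpoints store tensors and are written with torch.save().
checkpoint = {
    "model": {"encoder.weight": [0.1, 0.2], "decoder.weight": [0.3]},
    "optimizer": {"state": {0: {"exp_avg": [0.0, 0.0]}}, "param_groups": []},
    "batch_idx_train": 1224000,
}


def strip_for_inference(ckpt):
    """Keep only the model parameters, wrapped as {"model": state_dict}."""
    return {"model": ckpt["model"]}


def serialized_size(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj))


published = strip_for_inference(checkpoint)
print(sorted(published))
print(serialized_size(published) < serialized_size(checkpoint))
```

Everything except the ``model`` entry is dropped, so the published file is
smaller than the training checkpoint it came from.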

How to export
-------------

Every recipe contains a file ``export.py`` that you can use to
export ``model.state_dict()`` by taking some checkpoints as inputs.

.. hint::

   Each ``export.py`` contains well-documented usage information.

In the following, we use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless3/export.py>`_
as an example.

.. note::

   The steps for other recipes are almost the same.

.. code-block:: bash

   cd egs/librispeech/ASR

   ./pruned_transducer_stateless3/export.py \
     --exp-dir ./pruned_transducer_stateless3/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --epoch 20 \
     --avg 10

The above command generates a file ``pruned_transducer_stateless3/exp/pretrained.pt``, which
is a dict containing ``{"model": model.state_dict()}`` saved by ``torch.save()``.
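
To give a rough idea of what ``--epoch 20 --avg 10`` means, here is a
minimal sketch of checkpoint averaging. It assumes that the last ``avg``
epoch checkpoints ending at ``epoch`` are averaged elementwise; the exact
selection rule depends on the recipe and its options, so check the usage
text in ``export.py``. The helper names are hypothetical, and plain lists
stand in for tensors so that the sketch runs without PyTorch.

```python
def checkpoint_names(epoch, avg):
    """Assumed selection rule: the last `avg` epochs ending at `epoch`."""
    start = max(1, epoch - avg + 1)
    return [f"epoch-{i}.pt" for i in range(start, epoch + 1)]


def average_state_dicts(state_dicts):
    """Elementwise mean of parameter dicts that share the same keys."""
    n = len(state_dicts)
    return {
        key: [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }


names = checkpoint_names(epoch=20, avg=10)
print(names[0], names[-1], len(names))

# Two toy "state dicts"; real ones map parameter names to tensors.
toy = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
exported = {"model": average_state_dicts(toy)}
print(exported)
```

The result has the same ``{"model": ...}`` shape as the exported
``pretrained.pt`` described above.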

How to use the exported model
-----------------------------

For each recipe, we provide pretrained models hosted on Hugging Face.
You can find links to pretrained models in ``RESULTS.md`` of each dataset.

In the following, we demonstrate how to use the pretrained model from
`<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13>`_.

.. code-block:: bash

   cd egs/librispeech/ASR

   git lfs install
   git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

After cloning the repo with ``git lfs``, you will find several files in the folder
``icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp``
that have a prefix ``pretrained-``. Those files contain ``model.state_dict()``
exported by the above ``export.py``.

In each recipe, there is also a file ``pretrained.py``, which can use
``pretrained-xxx.pt`` to decode waves. The following is an example:

.. code-block:: bash

   cd egs/librispeech/ASR

   ./pruned_transducer_stateless3/pretrained.py \
     --checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt \
     --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model \
     --method greedy_search \
     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav \
     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav \
     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav

The above commands show how to use the exported model with ``pretrained.py`` to
decode multiple sound files. Its output is given as follows for reference:

.. literalinclude:: ./code/export-model-state-dict-pretrained-out.txt

Use the exported model to run decode.py
---------------------------------------

When we publish a model, we always record its WERs on some test
datasets in ``RESULTS.md``. This section describes how to use the
pretrained model to reproduce those WERs.

.. code-block:: bash

   cd egs/librispeech/ASR
   git lfs install
   git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13

   cd icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
   ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
   cd ../..

We create a symlink named ``epoch-9999.pt`` that points to ``pretrained-iter-1224000-avg-14.pt``,
so that we can pass ``--epoch 9999 --avg 1`` to ``decode.py`` in the following
command:

.. code-block:: bash

   ./pruned_transducer_stateless3/decode.py \
     --epoch 9999 \
     --avg 1 \
     --exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp \
     --lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 \
     --max-duration 600 \
     --decoding-method greedy_search

You will find the decoding results in
``./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/greedy_search``.
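
The symlink trick can be sketched in a scratch directory. The sketch
assumes, as the ``epoch-9999.pt`` name above suggests, that ``decode.py``
looks for a file named ``epoch-<N>.pt`` inside ``--exp-dir``; the helper
``checkpoint_filename`` is hypothetical, and a placeholder file stands in
for the real exported model.

```python
import os
import tempfile


def checkpoint_filename(epoch):
    """Assumed naming convention: epoch-<N>.pt inside --exp-dir."""
    return f"epoch-{epoch}.pt"


with tempfile.TemporaryDirectory() as exp_dir:
    pretrained = os.path.join(exp_dir, "pretrained-iter-1224000-avg-14.pt")
    with open(pretrained, "wb") as f:
        f.write(b"placeholder")  # stands in for the real exported file

    link = os.path.join(exp_dir, checkpoint_filename(9999))
    # Relative target, like `ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt`
    os.symlink(os.path.basename(pretrained), link)

    link_target = os.readlink(link)
    same_file = os.path.samefile(link, pretrained)

print(link_target)
print(same_file)
```

Because the link target is relative, the ``exp`` directory can be moved or
copied as a whole without breaking the link.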

.. caution::

   For some recipes, you also need to pass ``--use-averaged-model False``
   to ``decode.py``. The reason is that the exported pretrained model is already
   the averaged one.

.. hint::

   Before running ``decode.py``, we assume that you have already run
   ``prepare.sh`` to prepare the test dataset.
@@ -0,0 +1,12 @@
Export to ncnn
==============

We support exporting LSTM transducer models to `ncnn <https://github.com/tencent/ncnn>`_.

Please refer to :ref:`export-model-for-ncnn` for details.

We also provide `<https://github.com/k2-fsa/sherpa-ncnn>`_,
which performs speech recognition using ``ncnn`` with exported models.
It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is
self-contained and can be statically linked to produce a binary containing
everything needed.
@@ -0,0 +1,69 @@
Export to ONNX
==============

In this section, we describe how to export models to ONNX.

.. hint::

   Only non-streaming conformer transducer models are tested.


When to use it
--------------

Use it if you want to run the pretrained model with an inference
framework that supports ONNX.


How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

   cd egs/librispeech/ASR
   epoch=14
   avg=2

   ./pruned_transducer_stateless3/export.py \
     --exp-dir ./pruned_transducer_stateless3/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --epoch $epoch \
     --avg $avg \
     --onnx 1

It will generate the following files inside ``pruned_transducer_stateless3/exp``:

- ``encoder.onnx``
- ``decoder.onnx``
- ``joiner.onnx``
- ``joiner_encoder_proj.onnx``
- ``joiner_decoder_proj.onnx``
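
Before using the files, it can help to confirm that the export produced
all five of them. The following is a minimal sketch of such a check; the
helper ``missing_onnx_files`` is hypothetical and not part of icefall. It
is demonstrated on a temporary directory that contains only three of the
files.

```python
import tempfile
from pathlib import Path

# File names listed in this section as the output of `--onnx 1`.
EXPECTED_ONNX_FILES = [
    "encoder.onnx",
    "decoder.onnx",
    "joiner.onnx",
    "joiner_encoder_proj.onnx",
    "joiner_decoder_proj.onnx",
]


def missing_onnx_files(exp_dir):
    """Return the expected ONNX files that are absent from exp_dir."""
    exp_dir = Path(exp_dir)
    return [name for name in EXPECTED_ONNX_FILES if not (exp_dir / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    # Simulate an incomplete export: only the first three files exist.
    for name in EXPECTED_ONNX_FILES[:3]:
        (Path(tmp) / name).touch()
    missing = missing_onnx_files(tmp)

print(missing)
```

On a real export you would call
``missing_onnx_files("pruned_transducer_stateless3/exp")`` and expect an
empty list.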

You can use ``./pruned_transducer_stateless3/onnx_pretrained.py`` to decode
waves with the generated files:

.. code-block:: bash

   ./pruned_transducer_stateless3/onnx_pretrained.py \
     --bpe-model ./data/lang_bpe_500/bpe.model \
     --encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx \
     --decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx \
     --joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx \
     --joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx \
     --joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx \
     /path/to/foo.wav \
     /path/to/bar.wav \
     /path/to/baz.wav

How to use the exported model
-----------------------------

We also provide `<https://github.com/k2-fsa/sherpa-onnx>`_,
which performs speech recognition using `onnxruntime <https://github.com/microsoft/onnxruntime>`_
with exported models.
It has been tested on Linux, macOS, and Windows.
@@ -0,0 +1,58 @@
.. _export-model-with-torch-jit-script:

Export model with torch.jit.script()
====================================

In this section, we describe how to export a model via
``torch.jit.script()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.script()``.

.. hint::

   See :ref:`export-model-with-torch-jit-trace`
   if you want to use ``torch.jit.trace()``.

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_
as an example in the following.

.. code-block:: bash

   cd egs/librispeech/ASR
   epoch=14
   avg=1

   ./pruned_transducer_stateless3/export.py \
     --exp-dir ./pruned_transducer_stateless3/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --epoch $epoch \
     --avg $avg \
     --jit 1

It will generate a file ``cpu_jit.pt`` in ``pruned_transducer_stateless3/exp``.

.. caution::

   Don't be confused by ``cpu`` in ``cpu_jit.pt``. We move all parameters
   to CPU before saving it into a ``pt`` file; that's why we use ``cpu``
   in the filename.

How to use the exported model
-----------------------------

Please refer to the following pages for usage:

- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conv_emformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/index.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/gigaspeech.html>`_
- `<https://k2-fsa.github.io/sherpa/cpp/offline_asr/wenetspeech.html>`_
@@ -0,0 +1,69 @@
.. _export-model-with-torch-jit-trace:

Export model with torch.jit.trace()
===================================

In this section, we describe how to export a model via
``torch.jit.trace()``.

When to use it
--------------

If we want to use our trained model with TorchScript,
we can use ``torch.jit.trace()``.

.. hint::

   See :ref:`export-model-with-torch-jit-script`
   if you want to use ``torch.jit.script()``.

How to export
-------------

We use
`<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_
as an example in the following.

.. code-block:: bash

   iter=468000
   avg=16

   cd egs/librispeech/ASR

   ./lstm_transducer_stateless2/export.py \
     --exp-dir ./lstm_transducer_stateless2/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --iter $iter \
     --avg $avg \
     --jit-trace 1

It will generate three files inside ``lstm_transducer_stateless2/exp``:

- ``encoder_jit_trace.pt``
- ``decoder_jit_trace.pt``
- ``joiner_jit_trace.pt``

You can use
`<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/jit_pretrained.py>`_
to decode sound files with the following commands:

.. code-block:: bash

   cd egs/librispeech/ASR
   ./lstm_transducer_stateless2/jit_pretrained.py \
     --bpe-model ./data/lang_bpe_500/bpe.model \
     --encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt \
     --decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt \
     --joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt \
     /path/to/foo.wav \
     /path/to/bar.wav \
     /path/to/baz.wav

How to use the exported models
------------------------------

Please refer to
`<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html>`_
for its usage in `sherpa <https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html>`_.
You can also find pretrained models there.
@@ -0,0 +1,14 @@
Model export
============

In this section, we describe various ways to export models.

.. toctree::

   export-model-state-dict
   export-with-torch-jit-trace
   export-with-torch-jit-script
   export-onnx
   export-ncnn