From 223341b3939fbc505481ed6a2641da2c01bc2bcd Mon Sep 17 00:00:00 2001 From: Fangjun Kuang Date: Thu, 13 Oct 2022 20:34:23 +0800 Subject: [PATCH 1/2] Add doc about model export --- docs/source/index.rst | 1 + ...export-model-state-dict-pretrained-out.txt | 21 +++ .../model-export/export-model-state-dict.rst | 135 ++++++++++++++++++ docs/source/model-export/export-ncnn.rst | 9 ++ docs/source/model-export/export-onnx.rst | 68 +++++++++ .../export-with-torch-jit-script.rst | 58 ++++++++ .../export-with-torch-jit-trace.rst | 69 +++++++++ docs/source/model-export/index.rst | 14 ++ .../lstm_pruned_stateless_transducer.rst | 2 + 9 files changed, 377 insertions(+) create mode 100644 docs/source/model-export/code/export-model-state-dict-pretrained-out.txt create mode 100644 docs/source/model-export/export-model-state-dict.rst create mode 100644 docs/source/model-export/export-ncnn.rst create mode 100644 docs/source/model-export/export-onnx.rst create mode 100644 docs/source/model-export/export-with-torch-jit-script.rst create mode 100644 docs/source/model-export/export-with-torch-jit-trace.rst create mode 100644 docs/source/model-export/index.rst diff --git a/docs/source/index.rst b/docs/source/index.rst index 29491e3dcf..be9977ca91 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -21,6 +21,7 @@ speech recognition recipes using `k2 `_. :caption: Contents: installation/index + model-export/index recipes/index contributing/index huggingface/index diff --git a/docs/source/model-export/code/export-model-state-dict-pretrained-out.txt b/docs/source/model-export/code/export-model-state-dict-pretrained-out.txt new file mode 100644 index 0000000000..8d2d6d34be --- /dev/null +++ b/docs/source/model-export/code/export-model-state-dict-pretrained-out.txt @@ -0,0 +1,21 @@ +2022-10-13 19:09:02,233 INFO [pretrained.py:265] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'encoder_dim': 512, 'nhead': 8, 'dim_feedforward': 2048, 'num_encoder_layers': 12, 'decoder_dim': 512, 'joiner_dim': 512, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.21', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4810e00d8738f1a21278b0156a42ff396a2d40ac', 'k2-git-date': 'Fri Oct 7 19:35:03 2022', 'lhotse-version': '1.3.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'onnx-doc-1013', 'icefall-git-sha1': 'c39cba5-dirty', 'icefall-git-date': 'Thu Oct 13 15:17:20 2022', 'icefall-path': '/k2-dev/fangjun/open-source/icefall-master', 'k2-path': '/k2-dev/fangjun/open-source/k2-master/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-fj/fangjun/open-source-2/lhotse-jsonl/lhotse/__init__.py', 'hostname': 'de-74279-k2-test-4-0324160024-65bfd8b584-jjlbn', 'IP address': '10.177.74.203'}, 'checkpoint': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt', 'bpe_model': './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model', 'method': 'greedy_search', 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', 
'./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav'], 'sample_rate': 16000, 'beam_size': 4, 'beam': 4, 'max_contexts': 4, 'max_states': 8, 'context_size': 2, 'max_sym_per_frame': 1, 'simulate_streaming': False, 'decode_chunk_size': 16, 'left_context': 64, 'dynamic_chunk_training': False, 'causal_convolution': False, 'short_chunk_size': 25, 'num_left_chunks': 4, 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
+2022-10-13 19:09:02,233 INFO [pretrained.py:271] device: cpu
+2022-10-13 19:09:02,233 INFO [pretrained.py:273] Creating model
+2022-10-13 19:09:02,612 INFO [train.py:458] Disable giga
+2022-10-13 19:09:02,623 INFO [pretrained.py:277] Number of model parameters: 78648040
+2022-10-13 19:09:02,951 INFO [pretrained.py:285] Constructing Fbank computer
+2022-10-13 19:09:02,952 INFO [pretrained.py:295] Reading sound files: ['./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav']
+2022-10-13 19:09:02,957 INFO [pretrained.py:301] Decoding started
+2022-10-13 19:09:06,700 INFO [pretrained.py:329] Using greedy_search
+2022-10-13 19:09:06,912 INFO [pretrained.py:388]
+./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav:
+AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav:
+GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav:
+YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+
+2022-10-13 19:09:06,912 INFO [pretrained.py:390] Decoding Done
diff --git a/docs/source/model-export/export-model-state-dict.rst b/docs/source/model-export/export-model-state-dict.rst
new file mode 100644
index 0000000000..cc315dacc7
--- /dev/null
+++ b/docs/source/model-export/export-model-state-dict.rst
@@ -0,0 +1,135 @@
+Export model.state_dict()
+=========================
+
+When to use it
+--------------
+
+During model training, we save checkpoints periodically to disk.
+
+A checkpoint contains the following information:
+
+  - ``model.state_dict()``
+  - ``optimizer.state_dict()``
+  - and some other information related to training
+
+To resume training from a given point, we need a checkpoint.
+However, if we want to publish the model for inference, then only
+``model.state_dict()`` is needed. In this case, we need to strip all other information
+except ``model.state_dict()`` to reduce the file size of the published model.
+
+How to export
+-------------
+
+Every recipe contains a file ``export.py`` that you can use to
+export ``model.state_dict()`` from one or more checkpoints.
+
+.. hint::
+
+   Each ``export.py`` contains well-documented usage information.
+
+In the following, we use
+``_
+as an example.
+
+.. note::
+
+   The steps for other recipes are almost the same.
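+
+Conceptually, stripping a checkpoint down to ``model.state_dict()`` is a
+two-step operation. The following is a minimal sketch of the idea only, not
+of the actual ``export.py`` (which, among other things, can average several
+checkpoints); it assumes a checkpoint that stores the model weights under a
+``model`` key, and the filenames are made up for illustration:
+
+.. code-block:: python
+
+   import torch
+
+   # A training checkpoint bundles the model weights with optimizer
+   # state and other bookkeeping information.
+   checkpoint = torch.load("epoch-20.pt", map_location="cpu")
+
+   # Keep only the model weights to shrink the published file.
+   torch.save({"model": checkpoint["model"]}, "pretrained.pt")
+
+To export the model, run: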
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+
+   ./pruned_transducer_stateless3/export.py \
+     --exp-dir ./pruned_transducer_stateless3/exp \
+     --bpe-model data/lang_bpe_500/bpe.model \
+     --epoch 20 \
+     --avg 10
+
+It will generate a file ``pruned_transducer_stateless3/exp/pretrained.pt``, which
+is a dict containing ``{"model": model.state_dict()}`` saved by ``torch.save()``.
+
+How to use the exported model
+-----------------------------
+
+For each recipe, we provide pretrained models hosted on Hugging Face.
+You can find links to pretrained models in ``RESULTS.md`` of each dataset.
+
+In the following, we demonstrate how to use the pretrained model from
+``_.
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+
+   git lfs install
+   git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
+
+After cloning the repo with ``git lfs``, you will find several files in the folder
+``icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp``
+that have a prefix ``pretrained-``. Those files contains ``model.state_dict()``
+exported by the above ``export.py``.
+
+In each recipe, there is also a file ``pretrained.py``, which can use
+``pretrained-xxx.pt`` to decode sound files. The following is an example:
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+
+   ./pruned_transducer_stateless3/pretrained.py \
+     --checkpoint ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/pretrained-iter-1224000-avg-14.pt \
+     --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500/bpe.model \
+     --method greedy_search \
+     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1089-134686-0001.wav \
+     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0001.wav \
+     ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/test_wavs/1221-135766-0002.wav
+
+The above commands show how to use the exported model with ``pretrained.py`` to
+decode multiple sound files. Its output is given below for reference:
+
+.. literalinclude:: ./code/export-model-state-dict-pretrained-out.txt
+
+Use the exported model to run decode.py
+---------------------------------------
+
+When we publish the model, we always record its WERs on some test
+dataset in ``RESULTS.md``. This section describes how to use the
+pretrained model to reproduce the WER.
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+   git lfs install
+   git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
+
+   cd icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp
+   ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
+   cd ../..
+
+We create a symlink named ``epoch-9999.pt`` pointing to ``pretrained-iter-1224000-avg-14.pt``,
+so that we can pass ``--epoch 9999 --avg 1`` to ``decode.py`` in the following
+command:
+
+.. code-block:: bash
+
+   ./pruned_transducer_stateless3/decode.py \
+     --epoch 9999 \
+     --avg 1 \
+     --exp-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp \
+     --lang-dir ./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/data/lang_bpe_500 \
+     --max-duration 600 \
+     --decoding-method greedy_search
+
+You will find the decoding results in
+``./icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp/greedy_search``.
+
+.. caution::
+
+   For some recipes, you also need to pass ``--use-averaged-model False``
+   to ``decode.py``. The reason is that the exported pretrained model is already
+   the averaged one.
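+
+For context, "averaging" means averaging the model parameters of several
+checkpoints elementwise, which usually gives a better model than any single
+checkpoint. The following is a rough sketch of the idea only, not icefall's
+actual implementation, and the checkpoint filenames are made up:
+
+.. code-block:: python
+
+   import torch
+
+   # Average the "model" entries of several checkpoints elementwise.
+   # A sketch only: it assumes every entry is a floating-point tensor.
+   filenames = ["epoch-18.pt", "epoch-19.pt", "epoch-20.pt"]
+
+   avg = torch.load(filenames[0], map_location="cpu")["model"]
+   for f in filenames[1:]:
+       state = torch.load(f, map_location="cpu")["model"]
+       for k in avg:
+           avg[k] += state[k]
+
+   for k in avg:
+       avg[k] /= len(filenames)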
+
+.. hint::
+
+   Before running ``decode.py``, we assume that you have already run
+   ``prepare.sh`` to prepare the test dataset.
diff --git a/docs/source/model-export/export-ncnn.rst b/docs/source/model-export/export-ncnn.rst
new file mode 100644
index 0000000000..7e754105de
--- /dev/null
+++ b/docs/source/model-export/export-ncnn.rst
@@ -0,0 +1,9 @@
+Export to ncnn
+==============
+
+We support exporting LSTM transducer models to `ncnn `_.
+
+Please refer to :ref:`export-model-for-ncnn` for details.
+
+We also provide ``_
+do speech recognition using ``ncnn`` with exported models.
diff --git a/docs/source/model-export/export-onnx.rst b/docs/source/model-export/export-onnx.rst
new file mode 100644
index 0000000000..01f427acbb
--- /dev/null
+++ b/docs/source/model-export/export-onnx.rst
@@ -0,0 +1,68 @@
+Export to ONNX
+==============
+
+In this section, we describe how to export models to ONNX.
+
+.. hint::
+
+   Only non-streaming conformer transducer models are tested.
+
+
+When to use it
+--------------
+
+If you want to use an inference framework that supports ONNX
+to run the pretrained model.
+
+
+How to export
+-------------
+
+We use
+``_
+as an example in the following.
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+   epoch=14
+   avg=2
+
+   ./pruned_transducer_stateless3/export.py \
+     --exp-dir ./pruned_transducer_stateless3/exp \
+     --bpe-model data/lang_bpe_500/bpe.model \
+     --epoch $epoch \
+     --avg $avg \
+     --onnx 1
+
+It will generate the following files inside ``pruned_transducer_stateless3/exp``:
+
+  - ``encoder.onnx``
+  - ``decoder.onnx``
+  - ``joiner.onnx``
+  - ``joiner_encoder_proj.onnx``
+  - ``joiner_decoder_proj.onnx``
+
+You can use ``./pruned_transducer_stateless3/onnx_pretrained.py`` to decode
+waves with the generated files:
+
+.. code-block:: bash
+
+   ./pruned_transducer_stateless3/onnx_pretrained.py \
+     --bpe-model ./data/lang_bpe_500/bpe.model \
+     --encoder-model-filename ./pruned_transducer_stateless3/exp/encoder.onnx \
+     --decoder-model-filename ./pruned_transducer_stateless3/exp/decoder.onnx \
+     --joiner-model-filename ./pruned_transducer_stateless3/exp/joiner.onnx \
+     --joiner-encoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_encoder_proj.onnx \
+     --joiner-decoder-proj-model-filename ./pruned_transducer_stateless3/exp/joiner_decoder_proj.onnx \
+     /path/to/foo.wav \
+     /path/to/bar.wav \
+     /path/to/baz.wav
+
+
+How to use the exported model
+-----------------------------
+
+Please refer to
+``_ for usage.
+You can also find pretrained models there.
diff --git a/docs/source/model-export/export-with-torch-jit-script.rst b/docs/source/model-export/export-with-torch-jit-script.rst
new file mode 100644
index 0000000000..cdf3cabc2e
--- /dev/null
+++ b/docs/source/model-export/export-with-torch-jit-script.rst
@@ -0,0 +1,58 @@
+.. _export-model-with-torch-jit-script:
+
+Export model with torch.jit.script()
+====================================
+
+In this section, we describe how to export a model via
+``torch.jit.script()``.
+
+When to use it
+--------------
+
+If we want to use our trained model with TorchScript,
+we can use ``torch.jit.script()``.
+
+.. hint::
+
+   See :ref:`export-model-with-torch-jit-trace`
+   if you want to use ``torch.jit.trace()``.
+
+How to export
+-------------
+
+We use
+``_
+as an example in the following.
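+
+Passing ``--jit 1`` below asks ``export.py`` to compile the model with
+``torch.jit.script()`` and serialize the result. The following is a minimal,
+self-contained sketch of that idea, with a toy module standing in for the
+real transducer model:
+
+.. code-block:: python
+
+   import torch
+   import torch.nn as nn
+
+   # A stand-in for the real transducer model; scripting works the same way.
+   class ToyModel(nn.Module):
+       def forward(self, x: torch.Tensor) -> torch.Tensor:
+           return torch.relu(x)
+
+   model = ToyModel().eval()
+
+   # Compile the model to TorchScript and save it. The saved file can be
+   # loaded with torch.jit.load(), even from C++, without the Python class.
+   scripted = torch.jit.script(model)
+   scripted.save("cpu_jit.pt")
+
+To export, run: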
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+   epoch=14
+   avg=1
+
+   ./pruned_transducer_stateless3/export.py \
+     --exp-dir ./pruned_transducer_stateless3/exp \
+     --bpe-model data/lang_bpe_500/bpe.model \
+     --epoch $epoch \
+     --avg $avg \
+     --jit 1
+
+It will generate a file ``cpu_jit.pt`` in ``pruned_transducer_stateless3/exp``.
+
+.. caution::
+
+   Don't be confused by ``cpu`` in ``cpu_jit.pt``. We move all parameters
+   to CPU before saving it into a ``pt`` file, that's why we use ``cpu``
+   in the filename.
+
+How to use the exported model
+-----------------------------
+
+Please refer to the following pages for usage:
+
+- ``_
+- ``_
+- ``_
+- ``_
+- ``_
+- ``_
diff --git a/docs/source/model-export/export-with-torch-jit-trace.rst b/docs/source/model-export/export-with-torch-jit-trace.rst
new file mode 100644
index 0000000000..506459909c
--- /dev/null
+++ b/docs/source/model-export/export-with-torch-jit-trace.rst
@@ -0,0 +1,69 @@
+.. _export-model-with-torch-jit-trace:
+
+Export model with torch.jit.trace()
+===================================
+
+In this section, we describe how to export a model via
+``torch.jit.trace()``.
+
+When to use it
+--------------
+
+If we want to use our trained model with TorchScript,
+we can use ``torch.jit.trace()``.
+
+.. hint::
+
+   See :ref:`export-model-with-torch-jit-script`
+   if you want to use ``torch.jit.script()``.
+
+How to export
+-------------
+
+We use
+``_
+as an example in the following.
+
+.. code-block:: bash
+
+   iter=468000
+   avg=16
+
+   cd egs/librispeech/ASR
+
+   ./lstm_transducer_stateless2/export.py \
+     --exp-dir ./lstm_transducer_stateless2/exp \
+     --bpe-model data/lang_bpe_500/bpe.model \
+     --iter $iter \
+     --avg $avg \
+     --jit-trace 1
+
+It will generate three files inside ``lstm_transducer_stateless2/exp``:
+
+  - ``encoder_jit_trace.pt``
+  - ``decoder_jit_trace.pt``
+  - ``joiner_jit_trace.pt``
+
+You can use
+``_
+to decode sound files with the following commands:
+
+.. code-block:: bash
+
+   cd egs/librispeech/ASR
+   ./lstm_transducer_stateless2/jit_pretrained.py \
+     --bpe-model ./data/lang_bpe_500/bpe.model \
+     --encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt \
+     --decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt \
+     --joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt \
+     /path/to/foo.wav \
+     /path/to/bar.wav \
+     /path/to/baz.wav
+
+How to use the exported models
+------------------------------
+
+Please refer to
+``_
+for its usage in `sherpa `_.
+You can also find pretrained models there.
diff --git a/docs/source/model-export/index.rst b/docs/source/model-export/index.rst
new file mode 100644
index 0000000000..9b7a2ee2d2
--- /dev/null
+++ b/docs/source/model-export/index.rst
@@ -0,0 +1,14 @@
+Model export
+============
+
+In this section, we describe various ways to export models.
+
+
+
+.. toctree::
+
+   export-model-state-dict
+   export-with-torch-jit-trace
+   export-with-torch-jit-script
+   export-onnx
+   export-ncnn
diff --git a/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst b/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst
index b9d5bdcba4..643855cc29 100644
--- a/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst
+++ b/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst
@@ -515,6 +515,8 @@ To use the generated files with ``./lstm_transducer_stateless2/jit_pretrained``:
 
 Please see ``_
 for how to use the exported models in ``sherpa``.
 
+.. _export-model-for-ncnn:
+
 Export model for ncnn
 ~~~~~~~~~~~~~~~~~~~~~

From fc47ec9684187f3b1fed0a0e76fec52d869785e1 Mon Sep 17 00:00:00 2001
From: Fangjun Kuang
Date: Fri, 14 Oct 2022 10:15:58 +0800
Subject: [PATCH 2/2] fix typos

---
 docs/source/model-export/export-model-state-dict.rst      | 2 +-
 docs/source/model-export/export-ncnn.rst                  | 5 ++++-
 docs/source/model-export/export-onnx.rst                  | 7 ++++---
 docs/source/model-export/export-with-torch-jit-script.rst | 2 +-
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/source/model-export/export-model-state-dict.rst b/docs/source/model-export/export-model-state-dict.rst
index cc315dacc7..c3bbd57084 100644
--- a/docs/source/model-export/export-model-state-dict.rst
+++ b/docs/source/model-export/export-model-state-dict.rst
@@ -66,7 +66,7 @@ In the following, we demonstrate how to use the pretrained model from
 
 After cloning the repo with ``git lfs``, you will find several files in the folder
 ``icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13/exp``
-that have a prefix ``pretrained-``. Those files contains ``model.state_dict()``
+that have a prefix ``pretrained-``. Those files contain ``model.state_dict()``
 exported by the above ``export.py``.
 
 In each recipe, there is also a file ``pretrained.py``, which can use
diff --git a/docs/source/model-export/export-ncnn.rst b/docs/source/model-export/export-ncnn.rst
index 7e754105de..3dbb8b514a 100644
--- a/docs/source/model-export/export-ncnn.rst
+++ b/docs/source/model-export/export-ncnn.rst
@@ -6,4 +6,7 @@ We support exporting LSTM transducer models to `ncnn `_.
 Please refer to :ref:`export-model-for-ncnn` for details.
 
 We also provide ``_
-do speech recognition using ``ncnn`` with exported models.
+for performing speech recognition using ``ncnn`` with exported models.
+It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is
+self-contained and can be statically linked to produce a binary containing
+everything needed.
diff --git a/docs/source/model-export/export-onnx.rst b/docs/source/model-export/export-onnx.rst
index 01f427acbb..dd4b3437a7 100644
--- a/docs/source/model-export/export-onnx.rst
+++ b/docs/source/model-export/export-onnx.rst
@@ -63,6 +63,7 @@ waves with the generated files:
 How to use the exported model
 -----------------------------
 
-Please refer to
-``_ for usage.
-You can also find pretrained models there.
+We also provide ``_
+for performing speech recognition using `onnxruntime `_
+with exported models.
+It has been tested on Linux, macOS, and Windows.
diff --git a/docs/source/model-export/export-with-torch-jit-script.rst b/docs/source/model-export/export-with-torch-jit-script.rst
index cdf3cabc2e..a041dc1d5a 100644
--- a/docs/source/model-export/export-with-torch-jit-script.rst
+++ b/docs/source/model-export/export-with-torch-jit-script.rst
@@ -42,7 +42,7 @@ It will generate a file ``cpu_jit.pt`` in ``pruned_transducer_stateless3/exp``.
 .. caution::
 
    Don't be confused by ``cpu`` in ``cpu_jit.pt``. We move all parameters
-   to CPU before saving it into a ``pt`` file, that's why we use ``cpu``
+   to CPU before saving it into a ``pt`` file; that's why we use ``cpu``
    in the filename.
 
 How to use the exported model