Rand combine update result #467

Merged: 14 commits, Jul 11, 2022

egs/librispeech/ASR/RESULTS.md (70 changes: 35 additions & 35 deletions)
@@ -556,9 +556,9 @@ Number of model parameters 118129516 (i.e., 118.13 M).

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|----------------------------------------|
- | greedy search (max sym per frame 1) | 2.39 | 5.57 | --epoch 39 --avg 7 --max-duration 600  |
- | modified beam search                | 2.35 | 5.50 | --epoch 39 --avg 7 --max-duration 600  |
- | fast beam search                    | 2.38 | 5.50 | --epoch 39 --avg 7 --max-duration 600  |
+ | greedy search (max sym per frame 1) | 2.43 | 5.72 | --epoch 30 --avg 10 --max-duration 600 |
+ | modified beam search                | 2.43 | 5.69 | --epoch 30 --avg 10 --max-duration 600 |
+ | fast beam search                    | 2.43 | 5.67 | --epoch 30 --avg 10 --max-duration 600 |

The training commands are:

@@ -567,8 +567,8 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
--world-size 8 \
- --num-epochs 40 \
- --start-epoch 0 \
+ --num-epochs 30 \
+ --start-epoch 1 \
--full-libri 1 \
--exp-dir pruned_transducer_stateless5/exp-L \
--max-duration 300 \
@@ -582,15 +582,15 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
```

The tensorboard log can be found at
- <https://tensorboard.dev/experiment/Zq0h3KpnQ2igWbeR4U82Pw/>
+ <https://tensorboard.dev/experiment/aWzDj5swSE2VmcOYgoe3vQ>

The decoding commands are:

```bash
for method in greedy_search modified_beam_search fast_beam_search; do
./pruned_transducer_stateless5/decode.py \
- --epoch 39 \
- --avg 7 \
+ --epoch 30 \
+ --avg 10 \
--exp-dir ./pruned_transducer_stateless5/exp-L \
--max-duration 600 \
--decoding-method $method \
@@ -600,13 +600,14 @@ for method in greedy_search modified_beam_search fast_beam_search; do
--nhead 8 \
--encoder-dim 512 \
--decoder-dim 512 \
- --joiner-dim 512
+ --joiner-dim 512 \
+ --use-averaged-model True
done
```
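The decode commands now pass `--use-averaged-model True`. For readers unfamiliar with the flag, the sketch below illustrates the general idea of averaging model weights over several checkpoints. The helper name and file layout are assumptions for illustration only; icefall's actual implementation computes the average differently internally.

```python
# A minimal sketch of checkpoint weight averaging, assuming each .pt file
# stores a plain state_dict of tensors (icefall checkpoints actually wrap
# the state_dict in a larger dict; this helper is hypothetical).
import torch


def average_checkpoints(paths):
    """Return the element-wise average of several model state_dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    for k in avg:
        avg[k] /= len(paths)
    return avg


# Without --use-averaged-model, --epoch 30 --avg 10 corresponds roughly to
# averaging the checkpoints from epochs 21 through 30:
# averaged = average_checkpoints(
#     [f"exp-L/epoch-{i}.pt" for i in range(21, 31)]
# )
```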

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
- <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13>
+ <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>


#### Medium
@@ -615,9 +616,9 @@ Number of model parameters 30896748 (i.e., 30.9 M).

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|-----------------------------------------|
- | greedy search (max sym per frame 1) | 2.88 | 6.69 | --epoch 39 --avg 17 --max-duration 600  |
- | modified beam search                | 2.83 | 6.59 | --epoch 39 --avg 17 --max-duration 600  |
- | fast beam search                    | 2.83 | 6.61 | --epoch 39 --avg 17 --max-duration 600  |
+ | greedy search (max sym per frame 1) | 2.87 | 6.92 | --epoch 30 --avg 10 --max-duration 600  |
+ | modified beam search                | 2.83 | 6.75 | --epoch 30 --avg 10 --max-duration 600  |
+ | fast beam search                    | 2.81 | 6.76 | --epoch 30 --avg 10 --max-duration 600  |

The training commands are:

@@ -626,8 +627,8 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
--world-size 8 \
- --num-epochs 40 \
- --start-epoch 0 \
+ --num-epochs 30 \
+ --start-epoch 1 \
--full-libri 1 \
--exp-dir pruned_transducer_stateless5/exp-M \
--max-duration 300 \
@@ -641,15 +642,15 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
```

The tensorboard log can be found at
- <https://tensorboard.dev/experiment/bOQvULPsQ1iL7xpdI0VbXw/>
+ <https://tensorboard.dev/experiment/04xtWUKPRmebSnpzN1GMHQ>

The decoding commands are:

```bash
for method in greedy_search modified_beam_search fast_beam_search; do
./pruned_transducer_stateless5/decode.py \
- --epoch 39 \
- --avg 17 \
+ --epoch 30 \
+ --avg 10 \
--exp-dir ./pruned_transducer_stateless5/exp-M \
--max-duration 600 \
--decoding-method $method \
@@ -659,13 +660,14 @@ for method in greedy_search modified_beam_search fast_beam_search; do
--nhead 4 \
--encoder-dim 256 \
--decoder-dim 512 \
- --joiner-dim 512
+ --joiner-dim 512 \
+ --use-averaged-model True
done
```

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
- <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-M-2022-05-13>
+ <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-M-2022-07-07>


#### Baseline-2
@@ -675,19 +677,19 @@ layers (24 vs. 12) but a narrower model (1536 feedforward dim and 384 encoder dim)

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|-----------------------------------------|
- | greedy search (max sym per frame 1) | 2.41 | 5.70 | --epoch 31 --avg 17 --max-duration 600  |
- | modified beam search                | 2.41 | 5.69 | --epoch 31 --avg 17 --max-duration 600  |
- | fast beam search                    | 2.41 | 5.69 | --epoch 31 --avg 17 --max-duration 600  |
+ | greedy search (max sym per frame 1) | 2.54 | 5.72 | --epoch 30 --avg 10 --max-duration 600  |
+ | modified beam search                | 2.47 | 5.71 | --epoch 30 --avg 10 --max-duration 600  |
+ | fast beam search                    | 2.50 | 5.72 | --epoch 30 --avg 10 --max-duration 600  |

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
--world-size 8 \
- --num-epochs 40 \
- --start-epoch 0 \
+ --num-epochs 30 \
+ --start-epoch 1 \
--full-libri 1 \
- --exp-dir pruned_transducer_stateless5/exp \
+ --exp-dir pruned_transducer_stateless5/exp-B \
--max-duration 300 \
--use-fp16 0 \
--num-encoder-layers 24 \
@@ -699,19 +701,16 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
```

The tensorboard log can be found at
- <https://tensorboard.dev/experiment/73oY9U1mQiq0tbbcovZplw/>
-
- **Caution**: The training script is updated so that epochs are counted from 1
- after the training.
+ <https://tensorboard.dev/experiment/foVHNyqiRi2LhybmRUOAyg>

The decoding commands are:

```bash
for method in greedy_search modified_beam_search fast_beam_search; do
./pruned_transducer_stateless5/decode.py \
- --epoch 31 \
- --avg 17 \
- --exp-dir ./pruned_transducer_stateless5/exp-M \
+ --epoch 30 \
+ --avg 10 \
+ --exp-dir ./pruned_transducer_stateless5/exp-B \
--max-duration 600 \
--decoding-method $method \
--max-sym-per-frame 1 \
@@ -720,13 +719,14 @@ for method in greedy_search modified_beam_search fast_beam_search; do
--nhead 8 \
--encoder-dim 384 \
--decoder-dim 512 \
- --joiner-dim 512
+ --joiner-dim 512 \
+ --use-averaged-model True
done
```

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
- <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-narrower-2022-05-13>
+ <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-B-2022-07-07>


### LibriSpeech BPE training results (Pruned Stateless Transducer 4)
@@ -1064,10 +1064,6 @@ class RandomCombine(nn.Module):
is a random combination of all the inputs; but which in test time
will be just the last input.

- All but the last input will have a linear transform before we
- randomly combine them; these linear transforms will be initialized
- to the identity transform.

The idea is that the list of Tensors will be a list of outputs of multiple
conformer layers. This has a similar effect as iterated loss. (See:
DEJA-VU: DOUBLE FEATURE PRESENTATION AND ITERATED LOSS IN DEEP TRANSFORMER NETWORKS)
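As the deleted docstring lines above indicate, this PR drops the per-input linear transforms, so the module now simply forms a random convex combination of the layer outputs during training and returns the last output in eval mode. Below is a minimal sketch of that behavior, assuming PyTorch; the class name and the weight-biasing heuristic are illustrative assumptions, not the actual icefall code.

```python
import math
from typing import List

import torch
import torch.nn as nn


class RandomCombineSketch(nn.Module):
    """Randomly combine same-shaped tensors in training; return the last
    one in eval mode. A simplified sketch, not icefall's RandomCombine."""

    def __init__(self, num_inputs: int, final_weight: float = 0.5,
                 pure_prob: float = 0.5, stddev: float = 2.0):
        super().__init__()
        self.num_inputs = num_inputs
        self.final_weight = final_weight  # target weight of the last input
        self.pure_prob = pure_prob  # prob. of using the last input alone
        self.stddev = stddev  # spread of the random combination weights

    def forward(self, inputs: List[torch.Tensor]) -> torch.Tensor:
        assert len(inputs) == self.num_inputs
        if not self.training or torch.rand(()) < self.pure_prob:
            return inputs[-1]
        # Random convex weights, biased so the last input tends to get
        # weight final_weight (the bias term is a rough heuristic).
        logits = torch.randn(self.num_inputs, device=inputs[0].device)
        logits = logits * self.stddev
        logits[-1] += math.log(
            self.final_weight / (1.0 - self.final_weight) * (self.num_inputs - 1)
        )
        weights = logits.softmax(dim=0)
        stacked = torch.stack(inputs, dim=0)  # (num_inputs, ...)
        shape = (self.num_inputs,) + (1,) * (stacked.dim() - 1)
        return (weights.view(shape) * stacked).sum(dim=0)
```

In the conformer, the inputs would be the outputs of several encoder layers, which is what gives the iterated-loss-like effect the docstring refers to.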
@@ -1267,7 +1263,6 @@ def _test_random_combine(final_weight: float, pure_prob: float, stddev: float):
num_channels = 50
m = RandomCombine(
num_inputs=num_inputs,
- num_channels=num_channels,
final_weight=final_weight,
pure_prob=pure_prob,
stddev=stddev,
@@ -1289,9 +1284,7 @@ def _test_random_combine_main():
_test_random_combine(0.5, 0.5, 0.3)

feature_dim = 50
- c = Conformer(
-     num_features=feature_dim, output_dim=256, d_model=128, nhead=4
- )
+ c = Conformer(num_features=feature_dim, d_model=128, nhead=4)
batch_size = 5
seq_len = 20
# Just make sure the forward pass runs.
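To mirror the spirit of `_test_random_combine_main`, a forward-pass smoke test for the hypothetical `RandomCombineSketch` above might look like this:

```python
# Smoke test for the RandomCombineSketch defined earlier (hypothetical).
import torch


def _smoke_test():
    num_inputs, shape = 3, (5, 20, 50)  # (batch, seq_len, channels)
    m = RandomCombineSketch(num_inputs=num_inputs)
    inputs = [torch.randn(shape) for _ in range(num_inputs)]
    m.train()
    assert m(inputs).shape == shape  # training: random combination
    m.eval()
    assert torch.equal(m(inputs), inputs[-1])  # eval: just the last input


_smoke_test()
```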