From ca40b1a1aa6e3b3575831dd020fc3f0d0f8f9178 Mon Sep 17 00:00:00 2001
From: Philippe Hebert
Date: Thu, 2 Nov 2023 16:00:26 -0400
Subject: [PATCH 1/2] docs: defines relative speed in README

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 20532574c..a50594293 100644
--- a/README.md
+++ b/README.md
@@ -68,6 +68,8 @@ There are five model sizes, four with English-only versions, offering speed and
 | medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
 | large | 1550 M | N/A | `large` | ~10 GB | 1x |
 
+Note that "relative speed" here refers to the speed of transcription of each model relative to each other, rather than relative to the duration of the sample to transcribe. Speed of transcription will vary based on the available hardware resources.
+
 The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
 
 Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the `large-v2` model (The smaller the numbers, the better the performance). Additional WER scores corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4. Meanwhile, more BLEU (Bilingual Evaluation Understudy) scores can be found in Appendix D.3. Both are found in [the paper](https://arxiv.org/abs/2212.04356).

From 9c762731da3f9c7d576c30766ba795011e4f79e1 Mon Sep 17 00:00:00 2001
From: Jong Wook Kim
Date: Mon, 6 Nov 2023 02:42:52 -0800
Subject: [PATCH 2/2] combined paragraphs

---
 README.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/README.md b/README.md
index a50594293..3dc26c682 100644
--- a/README.md
+++ b/README.md
@@ -57,8 +57,7 @@ pip install setuptools-rust
 ```
 
 ## Available models and languages
 
-There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
-
+There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.
 | Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
 |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
@@ -68,8 +67,6 @@ There are five model sizes, four with English-only versions, offering speed and
 | medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
 | large | 1550 M | N/A | `large` | ~10 GB | 1x |
 
-Note that "relative speed" here refers to the speed of transcription of each model relative to each other, rather than relative to the duration of the sample to transcribe. Speed of transcription will vary based on the available hardware resources.
-
 The `.en` models for English-only applications tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
 
 Whisper's performance varies widely depending on the language. The figure below shows a WER (Word Error Rate) breakdown by languages of the Fleurs dataset using the `large-v2` model (The smaller the numbers, the better the performance). Additional WER scores corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4. Meanwhile, more BLEU (Bilingual Evaluation Understudy) scores can be found in Appendix D.3. Both are found in [the paper](https://arxiv.org/abs/2212.04356).
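
For readers who want to check the "relative speed" column on their own hardware, here is a minimal sketch (not part of either patch) that times each model on the same clip, using the `whisper.load_model` and `model.transcribe` APIs documented in this README; the sample file `audio.mp3` and the chosen model names are assumptions for the example:

```python
import time

import whisper  # the openai-whisper package from this repository

# Assumption: audio.mp3 is any local sample clip; names come from the README table.
for name in ["tiny", "base", "small"]:
    model = whisper.load_model(name)
    start = time.perf_counter()
    model.transcribe("audio.mp3")
    elapsed = time.perf_counter() - start
    print(f"{name}: transcribed in {elapsed:.1f}s")
```

Because the absolute timings such a script prints depend entirely on the hardware it runs on, the patched README expresses speed only relative to the `large` model.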