Commit
* add stair_case graph to show number of words vs. delays (ms). Connected visualize.py via argparse. Now user may add --visualize in the terminal to have the graph output to the output folder
* add visualize.py
* add a buffer in front of audio file to show delay
* modify visual.ipynb to include both staircase and waveform graph in 1 .png file. Add ability to read multiple dictionaries from instances.log
* add ability to run --score-only with --visualize
* add unit test for visualize and update .gitignore
* untrack the python notebook used for prototyping
* auto-generates output/visual directory when visual folder is not created
* used black . to format everything
* edit according to Xutai's suggestions
* add visualize unit test to git workflow
* fix black formatting
* fix remaining file
* come on black
* black is weird. removed white space
* add install matplotlib to workflow
* idk man black is not blacking
* replace ... with pass
* pip==24.0
* returned to ... for dataloader
* nvm pass is better for both python 3.7 and 3.8 black formatter
* check for empty config.yaml file, if empty, system exits
* change whisper to openai-whisper
* using only python=3.8
* add editdistance for pip install in setup.py
* remove creating an output directory
* fix path issue
* add matplotlib for pip
* put 3.7 and 3.8 for python version
* correct error on matplotlib
* change to only 3.8 for github workflow
* moved whisper dependencies to main.yml
* add speech_to_text documentation
* formatting changes
* add line space to correct formatting
* add nltk download
Showing 10 changed files with 442 additions and 4 deletions.
@@ -1,2 +1,31 @@
Speech-to-Text
==============

Whisper Agent
-------------
Use Whisper to evaluate speech-to-text transcription on custom audio.
First, change directory to :code:`speech_to_text`:

.. code-block:: bash

    cd examples/speech_to_text

Then, run the example code:

.. code-block:: bash

    simuleval \
        --agent whisper_waitk.py \
        --source-segment-size 500 \
        --waitk-lagging 3 \
        --source source.txt --target reference/transcript.txt \
        --output output --quality-metrics WER --visualize

The optional :code:`--visualize` flag writes one graph per source audio file to the :code:`speech_to_text/output/visual` directory. An example graph can be seen `here <https://github.com/facebookresearch/SimulEval/pull/107>`_.

|

SimulEval also supports the :code:`--score-only` option, which reads results from :code:`instances.log` without running inference. This saves time when only the scores are needed.

.. code-block:: bash

    simuleval --score-only --output output --visualize
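The documentation above says that :code:`--score-only` reads from :code:`instances.log` and that :code:`--visualize` plots the number of words against delay. The following is a minimal sketch of that data flow, not SimulEval's own implementation. It assumes that :code:`instances.log` holds one JSON object per line and that each object carries the :code:`delays` field seen in the test fixture of this commit. The names :code:`read_instances` and :code:`staircase_points` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def read_instances(log_path: Path) -> List[Dict]:
    """Parse a log file, assuming one JSON object per non-empty line."""
    instances = []
    with open(log_path, "r") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                instances.append(json.loads(line))
    return instances


def staircase_points(delays: List[float]) -> List[Tuple[float, int]]:
    """Return (delay_ms, words_emitted_so_far) pairs for a step plot.

    Words that share the same delay collapse into a single step whose
    height is the cumulative word count at that delay.
    """
    points: List[Tuple[float, int]] = []
    for word_count, delay in enumerate(delays, start=1):
        if points and points[-1][0] == delay:
            # Same delay as the previous word: raise the existing step.
            points[-1] = (delay, word_count)
        else:
            points.append((delay, word_count))
    return points
```

Applied to the 19 delays in the test fixture, this would yield 12 steps, the last one at the end of the audio with a word count of 19. The resulting pairs could then be handed to a plotting call such as matplotlib's :code:`step`.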
@@ -1 +1 @@
This is a synthesized audio file to test your simultaneous speech to text and to speech to speach translation system.
@@ -0,0 +1,144 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import os
import tempfile
from pathlib import Path
import simuleval.cli as cli
import shutil
import json

ROOT_PATH = Path(__file__).parents[2]


def test_visualize(root_path=ROOT_PATH):
    args_path = Path.joinpath(root_path, "examples", "speech_to_text")
    os.chdir(args_path)
    with tempfile.TemporaryDirectory() as tmpdirname:
        cli.sys.argv[1:] = [
            "--agent",
            os.path.join(root_path, "examples", "speech_to_text", "whisper_waitk.py"),
            "--source-segment-size",
            "500",
            "--waitk-lagging",
            "3",
            "--source",
            os.path.join(root_path, "examples", "speech_to_text", "source.txt"),
            "--target",
            os.path.join(
                root_path, "examples", "speech_to_text", "reference/transcript.txt"
            ),
            "--output",
            "output",
            "--quality-metrics",
            "WER",
            "--visualize",
        ]
        cli.main()

        visual_folder_path = os.path.join("output", "visual")
        source_path = os.path.join(
            root_path, "examples", "speech_to_text", "source.txt"
        )
        source_length = 0

        with open(source_path, "r") as f:
            source_length = len(f.readlines())
        images = list(Path(visual_folder_path).glob("*.png"))
        assert len(images) == source_length
        shutil.rmtree("output")


def test_visualize_score_only(root_path=ROOT_PATH):
    args_path = Path.joinpath(root_path, "examples", "speech_to_text")
    os.chdir(args_path)

    # Create sample instances.log and config.yaml in output directory
    output = Path("output")
    output.mkdir()
    os.chdir(output)
    with open("config.yaml", "w") as config:
        config.write("source_type: speech\n")
        config.write("target_type: speech")
    with open("instances.log", "w") as instances:
        json.dump(
            {
                "index": 0,
                "prediction": "This is a synthesized audio file to test your simultaneous speech, to speak to speech, to speak translation system.",
                "delays": [
                    1500.0,
                    2000.0,
                    2500.0,
                    3000.0,
                    3500.0,
                    4000.0,
                    4500.0,
                    5000.0,
                    5500.0,
                    6000.0,
                    6500.0,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                    6849.886621315192,
                ],
                "elapsed": [
                    1947.3278522491455,
                    2592.338800430298,
                    3256.8109035491943,
                    3900.0539779663086,
                    4561.986684799194,
                    5216.205835342407,
                    5874.6888637542725,
                    6526.906728744507,
                    7193.655729293823,
                    7852.792739868164,
                    8539.628744125366,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                    9043.279374916267,
                ],
                "prediction_length": 19,
                "reference": "This is a synthesized audio file to test your simultaneous speech to text and to speech to speach translation system.",
                "source": [
                    "test.wav",
                    "samplerate: 22050 Hz",
                    "channels: 1",
                    "duration: 6.850 s",
                    "format: WAV (Microsoft) [WAV]",
                    "subtype: Signed 16 bit PCM [PCM_16]",
                ],
                "source_length": 6849.886621315192,
            },
            instances,
        )

    os.chdir(args_path)

    with tempfile.TemporaryDirectory() as tmpdirname:
        cli.sys.argv[1:] = ["--score-only", "--output", "output", "--visualize"]
        cli.main()

        visual_folder_path = os.path.join("output", "visual")
        source_path = os.path.join(
            root_path, "examples", "speech_to_text", "source.txt"
        )
        source_length = 0

        with open(source_path, "r") as f:
            source_length = len(f.readlines())
        images = list(Path(visual_folder_path).glob("*.png"))
        assert len(images) == source_length
        shutil.rmtree("output")
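Both tests end with the same check: the number of PNG files in the visual folder must equal the number of entries in the source list. A minimal sketch of that check as a standalone helper follows. The name `count_sources_and_graphs` is hypothetical and not part of this commit. Unlike the `len(f.readlines())` call in the tests, this sketch skips blank lines, which is an assumption about how a trailing empty line in `source.txt` should be treated.

```python
from pathlib import Path
from typing import Tuple


def count_sources_and_graphs(source_list: Path, visual_dir: Path) -> Tuple[int, int]:
    """Return (number of source entries, number of PNG graphs)."""
    with open(source_list, "r") as f:
        # Count non-empty lines only, so a trailing blank line is ignored.
        n_sources = sum(1 for line in f if line.strip())
    # Only files directly inside visual_dir are counted, not subfolders.
    n_graphs = len(list(Path(visual_dir).glob("*.png")))
    return n_sources, n_graphs
```

A test could then assert that the two returned values are equal, which keeps the comparison in one place if more visualization tests are added.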