
Feat: Visualization tool #107

Merged: 37 commits merged into facebookresearch:main on Aug 14, 2024
Conversation

@Epic-Eric (Collaborator) commented Jul 21, 2024

Description

Use Matplotlib to generate graphs that allow users to visualize speech transcription & translation data.

1st graph: Staircase graph.
Horizontal arrows represent the wait-k delays from reading the source (x-axis, in seconds); vertical arrows represent the output words from writing to the target (y-axis, in words).

2nd graph: Waveform graph.
The waveform is taken from the provided audio and is displayed below the staircase graph with a shared x-axis (delay time), which allows convenient comparison and lookup of timestamps of interest.
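For a rough idea of how the two panels fit together, here is a minimal Matplotlib sketch using synthetic delays and a synthetic waveform; the variable names and values are illustrative and are not taken from simuleval/utils/visualize.py.

import numpy as np
import matplotlib.pyplot as plt

# Illustrative per-word emission delays (seconds) and the emitted target words.
delays = [1.5, 2.0, 2.5, 3.0, 3.5, 4.5]
words = ["hello", "world", "this", "is", "a", "test"]

# Synthetic 22.05 kHz waveform standing in for the source audio.
sample_rate = 22050
t = np.linspace(0, 5.0, int(5.0 * sample_rate), endpoint=False)
waveform = 0.3 * np.sin(2 * np.pi * 220 * t) * np.exp(-t / 3)

fig, (ax_steps, ax_wave) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))

# Staircase: horizontal segments are source reads (wait-k delay),
# vertical jumps are target writes.
ax_steps.step(delays, range(1, len(words) + 1), where="post")
for i, word in enumerate(words):
    ax_steps.annotate(word, (delays[i], i + 1), xytext=(4, 4), textcoords="offset points")
ax_steps.set_ylabel("target words written")

# Waveform on the shared x-axis so timestamps line up with the staircase.
ax_wave.plot(t, waveform, linewidth=0.5)
ax_wave.set_xlabel("delay / time (s)")
ax_wave.set_ylabel("amplitude")

fig.savefig("visual_example.png", dpi=150)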

Related issues: #15, #84

Example inputs

  1. Simply add --visualize to the command-line arguments, for example:
simuleval \
    --agent whisper_waitk.py \
    --source-segment-size 500 \
    --waitk-lagging 3 \
    --source source.txt --target reference/transcript.txt \
    --output output --quality-metrics WER --visualize
  2. Visualization also works with the --score-only command, which reads data from instances.log without running inference and saves time if you just want the scores.
simuleval --score-only --output output --visualize

Both commands will generate the corresponding graphs in the output/visual directory.

Example output

[example output image: staircase graph with the waveform below]

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • test_visualization.py: both modes (inference and --score-only) pass, with the expected number of graphs in the output/visual directory (see the sketch after this list).
  • A variety of audio files were visualized to confirm that the format stays consistent and the words are easy to read.
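As a rough illustration of the kind of check described above (not the PR's actual test_visualization.py), a test could assert that the visual directory exists and contains generated figures:

from pathlib import Path

def test_visual_outputs_present():
    # Assumes the CLI has already been run with --visualize and --output output.
    visual_dir = Path("output") / "visual"
    assert visual_dir.is_dir(), "output/visual directory was not created"
    assert list(visual_dir.glob("*.png")), "expected at least one generated graph"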

Note

  • Only audio files with a 22 kHz (22050 Hz) sample rate have been tested to work. If you use iPhone's Voice Memos, which record at 44.1 kHz, lower the sample rate first. Install sox (macOS users can use Homebrew):
    brew install sox

Then:

pip install sox
sox test.wav -r 22050 test_22k.wav

Then, put test_22k.wav in source.txt and provide its transcript as the reference text in reference/transcript.txt.
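If you prefer to resample from Python, pip install sox installs the pysox wrapper, which exposes the same conversion (it still needs the sox binary on your PATH). A minimal sketch, with the file names matching the example above:

import sox

# Downsample a 44.1 kHz recording to the 22.05 kHz rate expected above.
tfm = sox.Transformer()
tfm.rate(samplerate=22050)
tfm.build("test.wav", "test_22k.wav")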

Special thanks to:

My MLH Fellowship mentor: @xutaima

@facebook-github-bot (Contributor) commented:

Hi @Epic-Eric!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot added the "CLA Signed" label on Jul 21, 2024. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@facebook-github-bot (Contributor) commented:

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@xutaima self-requested a review on July 24, 2024 at 15:21.
Review threads (now outdated/resolved) were opened on the following files:
.gitignore
examples/speech_to_text/output/config.yaml
examples/speech_to_text/reference/transcript.txt
examples/speech_to_text/whisper_waitk.py
simuleval/cli.py
simuleval/data/dataloader/dataloader.py
simuleval/evaluator/evaluator.py
simuleval/utils/agent.py
simuleval/utils/visualize.py
@xutaima (Contributor) commented Jul 31, 2024

Hi @Epic-Eric, thanks for the PR! It looks good in general! A few suggestions:

  • Could you clean up the code a little bit? E.g., remove the comments and debug files.
  • Make sure the test cases all pass.
  • Add your own test here.

@xutaima mentioned this pull request on Jul 31, 2024.
@@ -125,8 +125,8 @@ class IterableDataloader:

     @abstractmethod
     def __iter__(self):
-        ...
+        pass
@Epic-Eric (Collaborator, Author) commented:
Changed to pass, since the Black formatter versions used under Python 3.7 and 3.8 treat the ellipsis differently. The keyword pass also implies coming back to this code later on.

pytest simuleval/test/test_evaluator.py
pytest simuleval/test/test_remote_evaluation.py
pytest simuleval/test/test_s2s.py
pytest simuleval/test/test_visualize.py
@Epic-Eric (Collaborator, Author) commented:

Black's formatting

@@ -40,6 +40,8 @@
         "bitarray==2.6.0",
         "yt-dlp",
         "pydub",
+        "openai-whisper",
+        "editdistance",
     ],
@Epic-Eric (Collaborator, Author) commented:

Needed for running Whisper.

@xutaima (Contributor) commented:

Hi Eric, could you remove the dependency on openai-whisper and pip install openai-whisper in the test plan?

@Epic-Eric (Collaborator, Author) commented:

Sure! Just curious, why don't we put it in setup.py so users can run custom audio files?

@Epic-Eric (Collaborator, Author) commented:

You mean to put it in main.yaml instead, right?

@Epic-Eric (Collaborator, Author) commented:

Done!

open(self.output / "instances.log", "a")
if self.output
else contextlib.nullcontext()
) as file:
system.reset()
@Epic-Eric (Collaborator, Author) commented:

Black's formatting
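For context, the snippet above uses a conditional context manager: append to instances.log only when an output directory is set, otherwise enter a no-op context. A standalone sketch of that pattern (the path and the write below are illustrative, not the evaluator's actual code):

import contextlib
from pathlib import Path

output = Path("output")  # stands in for self.output; None when no --output is given
if output:
    output.mkdir(parents=True, exist_ok=True)

with (
    open(output / "instances.log", "a") if output else contextlib.nullcontext()
) as file:
    # nullcontext() yields None, so guard before writing.
    if file is not None:
        file.write("instance record\n")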

@xutaima (Contributor) left a review comment:

Hi Eric, it looks great! Could you address the comments on openai-whisper? After that we can merge the PR.


@xutaima merged commit 7b45f68 into facebookresearch:main on Aug 14, 2024.
2 checks passed