Feat: Docker environment for remote speech to text evaluation #110

Merged: 16 commits merged into facebookresearch:main on Aug 14, 2024


Epic-Eric (Collaborator) commented Aug 11, 2024

Description

Docker runs the remote evaluation server inside a container, so the operating system and dependencies are the same on every machine and the setup process is streamlined.

To build the Docker image, first change into the speech_to_text example directory from the SimulEval root folder:

cd examples/speech_to_text

Then, build the Docker image with:

docker build -t simuleval-speech-to-text:1.0 .

Next, run the remote evaluation server using the Docker image:

docker run -p 8888:8888 simuleval-speech-to-text:1.0

This publishes port 8888 of the container (the server) on port 8888 of the local machine (the client), so a client running locally can reach the server through that port.
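The client in the second terminal can be started before the server inside the container is actually listening. A minimal readiness check, assuming bash (it uses bash's built-in /dev/tcp redirection, so it will not work under plain sh); the function name wait_for_port is hypothetical and not part of SimulEval:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SimulEval): poll until a TCP port accepts
# connections, e.g. the evaluation server published on port 8888.
# Requires bash, because /dev/tcp is a bash redirection feature.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local waited=0
  while (( waited < timeout )); do
    # Open a TCP connection in a subshell; the descriptor closes when the
    # subshell exits. Success means something is listening on host:port.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

For example, wait_for_port localhost 8888 60 && simuleval --remote-eval --remote-port 8888 ... would start the client only once the port answers (or give up after about 60 seconds).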

To send data to the server and run the remote evaluation, open another terminal and change its directory to examples/speech_to_text. Then start the client, for example:

Example input

simuleval --remote-eval --remote-port 8888 \
    --source-segment-size 500 \
    --source source.txt --target reference/transcript.txt \
    --source-type speech --target-type text \
    --output output --quality-metrics WER

Example output

[Screenshot, 2024-08-11: example evaluation output]
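If port 8888 is already in use on the local machine, only the host side of the -p mapping needs to change; the client's --remote-port must then match that host port, not the container port. A minimal sketch, assuming bash, that keeps the port in one place; the variable HOST_PORT and the functions start_server and start_client are hypothetical illustrations, not part of SimulEval or this PR:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers (not part of SimulEval or this PR).
# HOST_PORT is the port published on the local machine; the server inside
# the container listens on 8888, as in the docker run -p 8888:8888 example.
HOST_PORT="${HOST_PORT:-8888}"
IMAGE="simuleval-speech-to-text:1.0"

start_server() {
  # Publish container port 8888 on HOST_PORT of the host.
  docker run -p "${HOST_PORT}:8888" "${IMAGE}"
}

start_client() {
  # The client connects to the host side of the mapping, so --remote-port
  # must equal HOST_PORT rather than the container port.
  simuleval --remote-eval --remote-port "${HOST_PORT}" \
    --source-segment-size 500 \
    --source source.txt --target reference/transcript.txt \
    --source-type speech --target-type text \
    --output output --quality-metrics WER
}
```

Run start_server in one terminal and start_client in another (from examples/speech_to_text), with the same HOST_PORT exported in both.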

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

I tested locally first, because the upstream repository had some bugs, as detailed in #109.
By using COPY . /Simuleval instead of RUN git clone https://github.com/facebookresearch/SimulEval in the Dockerfile, I tested my local changes and confirmed they worked as expected.

@facebook-github-bot added the CLA Signed label on Aug 11, 2024.
@@ -34,6 +34,7 @@ jobs:
 pip install sentencepiece
 pip install -e .
 if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+python -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')"
Contributor:

Just curious, why is nltk added here?

Collaborator (author):

Without the nltk download, test_tree_pipeline_cmd in test_agent_pipeline.py fails. This started only recently; it wasn't an issue when I submitted the visualization PR. I'm not sure of the cause, but the error message said to call nltk.download, so I added that step.

@xutaima merged commit 6101ba1 into facebookresearch:main on Aug 14, 2024
2 checks passed