[Bug] formatting_your_dataset doc doesn't match the LJSpeech formatter #3050

Closed
sqrt10pi opened this issue Oct 9, 2023 · 2 comments
Labels
bug Something isn't working

sqrt10pi commented Oct 9, 2023

Describe the bug

The formatting_your_dataset doc recommends writing your dataset in this format with the note that it'll be compatible with the LJSpeech formatter:

# metadata.txt

audio1|This is my sentence.
audio2|This is maybe my sentence.
audio3|This is certainly my sentence.
audio4|Let this be your sentence.
...

If you create a dataset in that format and try to follow the instructions in tutorial_for_nervous_beginners, you'll get an error:

root@937a34667dbe:~# CUDA_VISIBLE_DEVICES="0" python3 TTS/bin/train_tts.py --config_path config.json
Traceback (most recent call last):
  File "/root/TTS/bin/train_tts.py", line 71, in <module>
    main()
  File "/root/TTS/bin/train_tts.py", line 47, in main
    train_samples, eval_samples = load_tts_samples(
  File "/root/TTS/tts/datasets/__init__.py", line 120, in load_tts_samples
    meta_data_train = formatter(root_path, meta_file_train, ignored_speakers=ignored_speakers)
  File "/root/TTS/tts/datasets/formatters.py", line 201, in ljspeech
    text = cols[2]
IndexError: list index out of range

Looking at the code for that formatter, it expects 3 columns (it reads cols[2]), but the documented format only has 2:

for line in ttf:
    cols = line.split("|")
    wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
    text = cols[2]
    items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
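
To make the failure concrete, here is a minimal sketch that mirrors only the column lookup from the excerpt above. The helper name `parse_ljspeech_line` is made up for illustration and is not part of TTS; the sample lines come from the documented format plus a hypothetical third column.

```python
def parse_ljspeech_line(line: str) -> tuple[str, str]:
    """Illustrative stand-in for the lookup in the formatter excerpt (not TTS code)."""
    cols = line.split("|")
    # Same indices as the excerpt: column 0 is the file id, column 2 is the text.
    return cols[0], cols[2]


two_col = "audio1|This is my sentence."
three_col = "audio1|This is my sentence.|This is my sentence."

# Three columns parse without trouble.
parsed = parse_ljspeech_line(three_col)

# Two columns fail exactly like the traceback: there is no index 2.
try:
    parse_ljspeech_line(two_col)
    two_col_error = None
except IndexError as exc:
    two_col_error = str(exc)

print(parsed)
print(two_col_error)
```

A two-column line splits into a list of length 2, so any access to `cols[2]` raises `IndexError` before the sample is ever loaded.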

To Reproduce

  1. Create transcript.txt
audio1|This is my sentence.
audio2|This is maybe my sentence.
audio3|This is certainly my sentence.
audio4|Let this be your sentence.
  2. Create config.json:
{
    "run_name": "my_run",
    "model": "glow_tts",
    "batch_size": 32,
    "eval_batch_size": 16,
    "num_loader_workers": 4,
    "num_eval_loader_workers": 4,
    "run_eval": true,
    "test_delay_epochs": -1,
    "epochs": 1000,
    "text_cleaner": "english_cleaners",
    "use_phonemes": false,
    "phoneme_language": "en-us",
    "phoneme_cache_path": "phoneme_cache",
    "print_step": 25,
    "print_eval": true,
    "mixed_precision": false,
    "output_path": "recipes/ljspeech/glow_tts/",
    "datasets":[{"formatter": "ljspeech", "meta_file_train":"transcript.csv", "path": "/dataset"}]
}
  3. Run CUDA_VISIBLE_DEVICES="0" python3 TTS/bin/train_tts.py --config_path config.json
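
Before launching a training run, a quick pre-flight check of the metadata file can surface this problem without waiting for the trainer to crash. The sketch below is an assumption-laden helper, not something TTS ships: `find_short_lines` is a hypothetical name, and the 3-column minimum is taken from the `cols[2]` lookup shown in the traceback.

```python
import os
import tempfile


def find_short_lines(meta_path, min_cols=3, sep="|"):
    """Return (line_number, column_count) for every line with too few columns.

    Hypothetical helper for illustration; min_cols=3 reflects the cols[2]
    lookup in the ljspeech formatter excerpt.
    """
    problems = []
    with open(meta_path, "r", encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines rather than reporting them
            n_cols = len(line.split(sep))
            if n_cols < min_cols:
                problems.append((lineno, n_cols))
    return problems


# Demo on a throwaway file: line 1 uses the documented 2-column format,
# line 2 has three columns.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "metadata.txt")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.write("audio1|This is my sentence.\n")
        fh.write("audio2|This is maybe my sentence.|This is maybe my sentence.\n")
    demo_problems = find_short_lines(demo_path)

for lineno, n_cols in demo_problems:
    print(f"line {lineno}: found {n_cols} column(s), expected at least 3")
```

Pointing the helper at the real metadata file (the path in the config's `datasets` entry) would list every line the formatter would choke on.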

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.0+cu118",
        "TTS": "0.17.8",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep  7 13:12:03 UTC 2"
    }
}

This is a docker image based on the coqui-ai/tts docker image:

FROM ghcr.io/coqui-ai/tts

EXPOSE 5002

RUN mkdir -p /dataset
COPY wavs /dataset

RUN apt-get update && \
    apt-get install --yes \
      nano

COPY config.json config.json

ENTRYPOINT bash

Additional context

No response

@sqrt10pi sqrt10pi added the bug Something isn't working label Oct 9, 2023
@erogol
Member

erogol commented Oct 16, 2023

ljspeech formatter expects 3 columns.
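
One possible workaround for an existing 2-column file, sketched under assumptions: the original LJSpeech `metadata.csv` uses the layout `id|transcription|normalized transcription`, so duplicating the text into a third column yields a line the formatter can index. The function name `to_ljspeech_row` is hypothetical, and whether duplicating the text is appropriate depends on whether your transcriptions are already normalized.

```python
def to_ljspeech_row(line: str, sep: str = "|") -> str:
    """Pad a 2-column `id|text` line to 3 columns by repeating the text.

    Hypothetical helper, not part of TTS. Lines that already have 3 or more
    columns are returned unchanged (minus the trailing newline).
    """
    cols = line.rstrip("\n").split(sep)
    if len(cols) >= 3:
        return sep.join(cols)
    if len(cols) != 2:
        raise ValueError(f"expected 'id{sep}text', got: {line!r}")
    file_id, text = cols
    # Assumption: the text is reused as the "normalized" third column.
    return sep.join([file_id, text, text])


converted = [
    to_ljspeech_row(row)
    for row in [
        "audio1|This is my sentence.",
        "audio2|This is maybe my sentence.",
    ]
]
print("\n".join(converted))
```

Writing the converted rows back to the metadata file gives every line an index 2, which is what the formatter reads as the text.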

@WeberJulian
Contributor

Thanks for flagging, @sqrt10pi. It looks like the doc was misleading; fixed with #3070.
