[Bug] formatting_your_dataset doc doesn't match the LJSpeech formatter #3050

sqrt10pi · 2023-10-09T14:44:22Z

Describe the bug

The formatting_your_dataset doc recommends writing your dataset in this format with the note that it'll be compatible with the LJSpeech formatter:

# metadata.txt

audio1|This is my sentence.
audio2|This is maybe my sentence.
audio3|This is certainly my sentence.
audio4|Let this be your sentence.
...

If you create a dataset in that format and try to follow the instructions in tutorial_for_nervous_beginners, you'll get an error:

root@937a34667dbe:~# CUDA_VISIBLE_DEVICES="0" python3 TTS/bin/train_tts.py --config_path config.json
Traceback (most recent call last):
  File "/root/TTS/bin/train_tts.py", line 71, in <module>
    main()
  File "/root/TTS/bin/train_tts.py", line 47, in main
    train_samples, eval_samples = load_tts_samples(
  File "/root/TTS/tts/datasets/__init__.py", line 120, in load_tts_samples
    meta_data_train = formatter(root_path, meta_file_train, ignored_speakers=ignored_speakers)
  File "/root/TTS/tts/datasets/formatters.py", line 201, in ljspeech
    text = cols[2]
IndexError: list index out of range

The code for that formatter expects three columns: it reads the text from cols[2].

TTS/TTS/tts/datasets/formatters.py

Lines 198 to 202 in 9963519

    
    for line in ttf:
        cols = line.split("|")
        wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
        text = cols[2]
        items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
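The failure can be reproduced without starting a training run. The sketch below mimics only the split-and-index step from the snippet above; `parse_ljspeech_line` is a hypothetical helper written for illustration, not a function in TTS, and the `strip()` call is added here only to keep the printed output tidy.

```python
def parse_ljspeech_line(line):
    """Mimic the column lookup done by the ljspeech formatter (illustrative only)."""
    cols = line.strip().split("|")
    # The formatter takes the file ID from column 0 and the text from column 2,
    # so a line with only two columns has no cols[2].
    return cols[0], cols[2]


two_col = "audio1|This is my sentence."
three_col = "audio1|This is my sentence.|This is my sentence."

try:
    parse_ljspeech_line(two_col)
except IndexError as err:
    print("two columns fail:", err)

print("three columns parse:", parse_ljspeech_line(three_col))
```

A two-column line raises the same IndexError as in the traceback, while a line with a third column parses.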

To Reproduce

Create transcript.csv (the name referenced by meta_file_train in the config below):

audio1|This is my sentence.
audio2|This is maybe my sentence.
audio3|This is certainly my sentence.
audio4|Let this be your sentence.

Create config.json:

{
    "run_name": "my_run",
    "model": "glow_tts",
    "batch_size": 32,
    "eval_batch_size": 16,
    "num_loader_workers": 4,
    "num_eval_loader_workers": 4,
    "run_eval": true,
    "test_delay_epochs": -1,
    "epochs": 1000,
    "text_cleaner": "english_cleaners",
    "use_phonemes": false,
    "phoneme_language": "en-us",
    "phoneme_cache_path": "phoneme_cache",
    "print_step": 25,
    "print_eval": true,
    "mixed_precision": false,
    "output_path": "recipes/ljspeech/glow_tts/",
    "datasets":[{"formatter": "ljspeech", "meta_file_train":"transcript.csv", "path": "/dataset"}]
}

Run CUDA_VISIBLE_DEVICES="0" python3 TTS/bin/train_tts.py --config_path config.json

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.0+cu118",
        "TTS": "0.17.8",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep  7 13:12:03 UTC 2"
    }
}

This is a docker image based on the coqui-ai/tts docker image:

FROM ghcr.io/coqui-ai/tts

EXPOSE 5002

RUN mkdir -p /dataset
COPY wavs /dataset

RUN apt-get update && \
    apt-get install --yes \
      nano

COPY config.json config.json

ENTRYPOINT bash

Additional context

No response

erogol · 2023-10-16T10:09:51Z

ljspeech formatter expects 3 columns.
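For anyone who already has a two-column file, one possible workaround is to add a third column by repeating the text. This is a minimal sketch, not the fix applied in the repository: it assumes the original LJSpeech layout of `id|transcription|normalized transcription`, assumes no separate normalized text is available, and uses a hypothetical helper name (`to_three_columns`) and made-up file names.

```python
import tempfile
from pathlib import Path


def to_three_columns(src, dst):
    """Rewrite `id|text` lines as `id|text|text`; return how many lines were changed."""
    changed = 0
    out_lines = []
    for line in Path(src).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("|")
        if len(cols) == 2:
            # Assumption: reuse the raw text as the normalized text column.
            cols.append(cols[1])
            changed += 1
        out_lines.append("|".join(cols))
    Path(dst).write_text("\n".join(out_lines) + "\n", encoding="utf-8")
    return changed


# Small demo on a temporary directory; the file names are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "metadata_2col.txt"
    dst = Path(tmp) / "metadata.csv"
    src.write_text(
        "audio1|This is my sentence.\naudio2|This is maybe my sentence.\n",
        encoding="utf-8",
    )
    print(to_three_columns(src, dst), "lines rewritten")
    print(dst.read_text(encoding="utf-8"), end="")
```

Lines that already have three or more columns are written back unchanged, so the helper can be rerun on a partly converted file.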

WeberJulian · 2023-10-16T10:25:50Z

Thanks for flagging @sqrt10pi, looks like the doc was misleading, fixed with #3070

sqrt10pi added the bug label Oct 9, 2023

erogol closed this as completed Oct 16, 2023

WeberJulian mentioned this issue Oct 16, 2023

Fix doc dataset #3070

Merged
