Releases: TartuNLP/text-to-speech-worker
Version 3.0.0
A new major version compatible with API version 3.0.0 or newer.
Compatible models (multispeaker, vctk and ljspeech) are attached below. Ensure they are downloaded, unzipped and structured as follows:
models
├── hifigan
│   ├── ljspeech
│   │   ├── config.json
│   │   └── model.pt
│   └── vctk
│       ├── config.json
│       └── model.pt
└── tts
    └── multispeaker
        ├── config.yaml
        └── model_weights.hdf5
The following commands should be sufficient to achieve this:
wget -P models/tts/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/multispeaker.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/ljspeech.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/vctk.zip
unzip -d models/tts/ models/tts/multispeaker.zip
unzip -d models/hifigan/ models/hifigan/ljspeech.zip
unzip -d models/hifigan/ models/hifigan/vctk.zip
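Once the archives are unzipped, the layout can be checked with a short script. This is a minimal sketch and not part of the worker: the function name check_models_layout is hypothetical, and it only confirms that the six files shown in the tree exist under the given root directory (default: models). It does not validate the model contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of text-to-speech-worker): checks that the
# v3.0.0 model files exist in the expected directory structure.
check_models_layout() {
  local root="${1:-models}"   # root directory, defaults to ./models
  local missing=0
  local f
  for f in \
    hifigan/ljspeech/config.json \
    hifigan/ljspeech/model.pt \
    hifigan/vctk/config.json \
    hifigan/vctk/model.pt \
    tts/multispeaker/config.yaml \
    tts/multispeaker/model_weights.hdf5
  do
    # Report every missing file on stderr instead of stopping at the first one.
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 means all files were found, 1 means at least one is missing.
  return "$missing"
}
```

For example, running check_models_layout models from the directory where the download commands were executed returns status 0 and prints nothing if every file is in place; otherwise it lists each missing path.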
Additionally, the code is still compatible with older single-speaker models.
Changes:
- Added multispeaker model support (attached below)
- Added a workaround to synthesize longer sentences in multiple parts
- More information is sent to the API (predicted durations, normalized text, etc.)
- Minor bug fixes
Known issues:
- TF_VRAM_LIMIT does not reflect actual VRAM usage, only the amount used by the TTS model (the vocoder is not included).
- A "WARNING: git_hash mismatch" message is shown upon startup; this warning can be ignored.
Disclaimer: the LJSpeech and VCTK HiFiGAN vocoder models attached below are from this HiFiGAN repository.
Version 2.0.0
A TransformerTTS version with separate models per speaker. The included config file refers to the 6 models attached to this release.
Known issues:
- TF_VRAM_LIMIT does not reflect actual VRAM usage, only the amount used by the TTS model (the vocoder is not included).
- Limited VRAM can impose unpredictable limits on sentence length.
Version 1.0.0
Deep Voice 3 worker for Estonian. The default model supports 6 different speakers.
- Mari (news)
- Kalev (news)
- Albert (news)
- Vesta (news)
- Külli (literature)
- Meelis (literature)
Based on this.