Releases: TartuNLP/text-to-speech-worker

Version 3.0.0

04 Jan 17:47

A new major version compatible with API version 3.0.0 or newer.

Compatible models (multispeaker, vctk and ljspeech) are attached below. Ensure they are downloaded, unzipped and structured as follows:

models
├── hifigan
│   ├── ljspeech
│   │   ├── config.json
│   │   └── model.pt
│   └── vctk
│       ├── config.json
│       └── model.pt
└── tts
    └── multispeaker
        ├── config.yaml
        └── model_weights.hdf5

The following commands should be sufficient to achieve this:

wget -P models/tts/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/multispeaker.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/ljspeech.zip
wget -P models/hifigan/ https://github.com/TartuNLP/text-to-speech-worker/releases/download/v3.0.0/vctk.zip
unzip -d models/tts/ models/tts/multispeaker.zip
unzip -d models/hifigan/ models/hifigan/ljspeech.zip
unzip -d models/hifigan/ models/hifigan/vctk.zip
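
Once the commands above have run, it may be worth confirming that the files ended up where the directory tree says they should be. The following is a minimal sketch, not part of the worker itself: `check_models` is a hypothetical helper name, and the file list is copied from the tree shown earlier.

```shell
# check_models is a hypothetical helper, not shipped with the worker.
# It takes the models directory as its first argument (default: models),
# prints one "missing: <path>" line per absent file, and returns non-zero
# if any file from the expected layout is missing.
check_models() {
    root="${1:-models}"
    missing=0
    for f in \
        hifigan/ljspeech/config.json \
        hifigan/ljspeech/model.pt \
        hifigan/vctk/config.json \
        hifigan/vctk/model.pt \
        tts/multispeaker/config.yaml \
        tts/multispeaker/model_weights.hdf5
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}
```

After defining the function, it could be run as `check_models models && echo "layout OK"` from the directory that contains `models`.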

Additionally, the code is still compatible with older single-speaker models.

Changes:

  • Added multispeaker model support (attached below)
  • Added a workaround to synthesize longer sentences in multiple parts
  • More information is sent to the API (predicted durations, normalized text, etc.)
  • Minor bug fixes

Known issues:

  • TF_VRAM_LIMIT does not reflect actual VRAM usage but just the amount used by the TTS model (not including the vocoder).
  • A "WARNING: git_hash mismatch" message may appear at startup; it can be ignored.

Disclaimer - the LJSpeech and VCTK HiFiGAN vocoder models below are from this HiFiGAN repository.

Version 2.0.0

14 Sep 12:59

A TransformerTTS version with separate models per speaker. The included config file refers to the 6 models attached to this release.

Known issues:

  • TF_VRAM_LIMIT does not reflect actual VRAM usage but just the amount used by the TTS model (not including the vocoder).
  • Limited VRAM can impose unpredictable limits on sentence length.

Version 1.0.0

14 Sep 09:16

Deep Voice 3 worker for Estonian. The default model supports 6 different speakers.

  1. Mari (news)
  2. Kalev (news)
  3. Albert (news)
  4. Vesta (news)
  5. Külli (literature)
  6. Meelis (literature)

Based on this.