Release v25.2.0 #264

Open: wants to merge 117 commits into base: main


ROBERT-MCDOWELL
Collaborator

CHANGELOG

version 25.2.0:

  • versioning now follows YEAR.MONTH.PATCH_NUMBER
  • admin privileges are no longer needed on Windows to install the ebook2audiobook packages (Chocolatey replaced with Scoop)
  • added MPS (Apple Silicon GPU) processor support
  • added a custom models dropdown list
  • added a voices dropdown list with a play button to listen to each voice
  • added a voice extractor for uploaded voices (separates vocals from background noise and music)
  • added delete buttons for the voices, custom models and audiobooks lists
  • added builtin voices to the voices list; they can be used with all TTS models
  • added --output_dir to set a custom output folder in headless mode
  • added a directory option for batch ebook uploads in gradio/GUI mode
  • added new output audio formats: ['m4b', 'm4a', 'mp4', 'webm', 'mov', 'mp3', 'flac', 'wav', 'ogg', 'aac'].
    More can be added on demand.
  • added cancellation of a running conversion via the ebook upload gradio component (by clicking its "X")
  • new global config settings:
    tmp_expire: days of inactivity before a session is cleaned up
    max_custom_model: maximum number of custom models in the list (per session id)
    max_custom_voices: maximum number of custom voices in the list (per session id)
    tts_default_settings: fine-tuned XTTS default parameters
    (refer to ./lib/conf.py for all new configuration settings)
  • gradio GUI settings are now saved and restored on page refresh and browser exit
  • conversions can now be resumed in headless and gradio GUI mode when the client page/connection is lost or reloaded
    (however, the user needs to restart the process manually with the same session id)
  • math symbols and numbers are now converted to phonemes on all TTS engines
    (languages not yet covered are pronounced using the default_language_code set in ./lib/conf.py;
    PRs to fill in missing translations are welcome)
  • audio filtering, normalization and enhancement of all uploaded voices and of the final audiobook
    for the best sound presence and clarity
  • fixed custom model upload
  • fixed missing pages in conversion
  • fixed modules and libraries missing during installation (regex, mecab, etc.)
  • various gradio design improvements
  • optimized multi language sentence splitting to minimize hallucinations and unnatural pauses
  • numbers and math symbols are now spoken by fairseq and XTTSv2
  • the TTS model is now loaded once and shared by all users of the same model
  • added coqui-tts builtin voices for all TTS engines and as standard in all languages
  • added new modal alerts for info, errors, exceptions and warnings
  • removed docker_utils, which was a Docker image containing only ffmpeg and calibre
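
The tmp_expire setting listed above is described only as the number of days an inactive session is kept before cleanup. The sketch below illustrates that rule in Python. It is not the project's cleanup code and does not reproduce ./lib/conf.py. It assumes each session lives in its own folder under a temp root and that the folder's modification time reflects its last activity. `TMP_EXPIRE_DAYS` (with a placeholder value) and `cleanup_expired_sessions` are hypothetical names.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical placeholder for the tmp_expire setting (days of inactivity);
# the real default lives in ./lib/conf.py and is not reproduced here.
TMP_EXPIRE_DAYS = 7


def cleanup_expired_sessions(tmp_root: Path,
                             expire_days: float = TMP_EXPIRE_DAYS,
                             now: Optional[float] = None) -> List[str]:
    """Delete session folders under tmp_root that were idle longer than expire_days.

    Assumption: one folder per session id, and the folder's mtime is treated
    as the session's last-activity time. Returns the removed session ids.
    """
    now = time.time() if now is None else now
    max_age_seconds = expire_days * 24 * 60 * 60
    removed = []
    for session_dir in sorted(tmp_root.iterdir()):
        if not session_dir.is_dir():
            continue  # ignore stray files; only session folders are candidates
        idle_seconds = now - session_dir.stat().st_mtime
        if idle_seconds > max_age_seconds:
            shutil.rmtree(session_dir)
            removed.append(session_dir.name)
    return removed
```

A directory's mtime only changes when entries are added or removed, so a real implementation would more likely record a last-activity timestamp per session id rather than rely on the filesystem.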

Many more fixes and new features, but I don't remember them all... see for yourself ;)

Currently in development:

  • add a terminal output console to the gradio/GUI
  • implement more TTS engines (list not decided yet)
  • Apprise notifications
  • implement chapter summarizing to create background music and sounds
  • add per-sentence indices to the metadata of the final file
    so that a sentence's pronunciation can later be improved by replacing it with a new sentence
  • add the XTTSv2 builtin voice list
  • add Czech, Croatian and other languages with cv/vits
  • add music interludes between chapters
  • use chapter names (when chapters are detected correctly) instead of numbers in the final metadata
  • split the output into multiple files if it exceeds 12 hours, using chapters as the split points
  • install the right torch and CUDA versions when a GPU is available so DeepSpeed can be used
  • automatic crash bug reports from users by email via a URL request
  • create a legends.py file for all gradio/GUI legends to support multiple languages
  • mark each sentence number in the metadata with its timecode so
    the user can re-convert a single sentence before exporting the audiobook
    (this requires keeping the ebook temp folder)
  • use websocat in the cmd and sh scripts to connect to gradio in headless mode and avoid reloading the TTS model for each command
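
Splitting the output once it exceeds 12 hours is still only planned. Below is a minimal Python sketch of one possible approach, not ebook2audiobook code. It assumes chapter durations are already known in seconds and that a part may only break between chapters, so a single chapter longer than the limit ends up alone in its own part. `split_chapters` and `MAX_PART_SECONDS` are hypothetical names.

```python
from typing import List

# Hypothetical limit matching the planned "> 12 hours" split.
MAX_PART_SECONDS = 12 * 60 * 60


def split_chapters(durations: List[float],
                   max_seconds: float = MAX_PART_SECONDS) -> List[List[int]]:
    """Greedily group consecutive chapter indices into parts of at most max_seconds.

    Parts only break between chapters, so chapter order is preserved and no
    chapter is cut; a chapter longer than the limit becomes its own part.
    """
    parts: List[List[int]] = []
    current: List[int] = []
    current_total = 0.0
    for index, duration in enumerate(durations):
        # Close the current part if adding this chapter would exceed the limit.
        if current and current_total + duration > max_seconds:
            parts.append(current)
            current, current_total = [], 0.0
        current.append(index)
        current_total += duration
    if current:
        parts.append(current)
    return parts
```

For example, `split_chapters([5*3600, 5*3600, 5*3600, 13*3600, 3600])` returns `[[0, 1], [2], [3], [4]]`. Greedy packing keeps each part as full as possible without reordering chapters. It does not try to balance part lengths.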
