Speech Jokey

This project is about building an application which synthesizes speech from user-provided text. The application is written in Python and uses the Kivy framework for the user interface.

Conveying intonation and emotion is still a significant challenge for Text-To-Speech in Assistive Technology applications, yet it would greatly enhance the communication experience for people with speech impairments. The aim of Speech Jokey is therefore to let people with communication difficulties interact with more intonation, emotion and emphasis pauses. In addition, the application is specifically designed for use with eye tracking systems, making it easier to position the cursor between the lines and words of a text.

Like a DJ, this application lets you turn your own text into voice output with impressive emotions, varied intonation and meaningful pauses, just the way you like it. This explains the name Speech Jokey.

The current logo of the application:

Demo video

Demo video of the Speech Jokey program, showing the generation of synthetic speech using the ElevenLabs API.

Speech synthesis

The application currently supports the following speech synthesis engines:

ElevenLabs API

Project setup

The project is based on Python 3.11, but it also supports lower versions down to 3.9. To install Python, follow the instructions on the Python website.
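If you are unsure whether your interpreter meets this requirement, the following minimal sketch checks it. The 3.9 minimum is taken from the sentence above; the script and its function name are hypothetical and not part of the repository.

```python
import sys

# Minimum Python version stated in this README (3.11 is the primary target).
MIN_VERSION = (3, 9)


def is_supported(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """Return True if the given (major, minor) version is at least MIN_VERSION."""
    return tuple(version[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported():
        print(f"Python {current} is supported.")
    else:
        sys.exit(f"Python {current} is too old; Speech Jokey needs 3.9 or newer.")
```

Saved as, for example, check_python.py, it can be run with python check_python.py before setting up poetry.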

Install dependencies

We use poetry for dependency management. To install poetry, please check their installation instructions. Alternatively, simply install it by running:

pip install poetry

Then make sure to configure poetry to install the virtual environment in the project root. This can be done by running:

poetry config virtualenvs.in-project true

Linux specific dependencies

Please install the following packages first:

sudo apt-get install xsel xclip

Python dependency installation

Installing the virtual environment is done by running:

poetry install --no-root

Managing Dependencies

The dependencies are listed in the pyproject.toml file. To add a new dependency, run:

poetry add <dependency>

Project building

The following procedures assume that you have installed the dependencies and that you are working inside the virtual environment.

Running the application (Any OS / Development)

To run the application, execute the following command in the root of the project:

poetry run python src/main.py

Building the application executable (Windows / Local Development)

To build the application, execute the following command in the root of the project:

(You might want to grab a coffee while this runs.)

poetry run pyinstaller src/main.py --onefile --name SpeechJokey

The generated build specification file, SpeechJokey.spec, can now be found in the root of the project. Modify this file as follows:

Import the Kivy dependencies at the top of the file: from kivy_deps import sdl2, glew
Add the source tree directly after COLLECT(exe,: Tree('src\\'),
Add the SDL2 and GLEW dependencies directly after a.datas,: *[Tree(p) for p in (sdl2.dep_bins + glew.dep_bins)],
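Because these three edits are purely textual, they can also be scripted. The following is a hypothetical helper (not part of the repository) that applies them to the spec file with plain string handling. It assumes the generated spec contains a COLLECT(exe, ...) block and an a.datas, entry, and it stops with a ValueError if either anchor is missing instead of guessing where the edits belong.

```python
import re
from pathlib import Path

# The three edits listed above, exactly as they should appear in SpeechJokey.spec.
KIVY_IMPORT = "from kivy_deps import sdl2, glew"
SOURCE_TREE = r"Tree('src\\'),"  # two backslashes in the file, i.e. the string 'src\'
DEP_TREES = "*[Tree(p) for p in (sdl2.dep_bins + glew.dep_bins)],"


def _insert_after(text: str, pattern: str, addition: str) -> str:
    """Insert `addition` right after the first match of the regex `pattern`."""
    if addition in text:
        return text  # already patched, so running the helper twice is harmless
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"anchor {pattern!r} not found in spec file")
    return text[: match.end()] + " " + addition + text[match.end():]


def patch_spec(text: str) -> str:
    """Apply the Kivy-specific edits to the text of a PyInstaller spec file."""
    if KIVY_IMPORT not in text:
        text = KIVY_IMPORT + "\n" + text
    # Newer PyInstaller versions may split "COLLECT(exe," over several lines.
    text = _insert_after(text, r"COLLECT\(\s*exe,", SOURCE_TREE)
    text = _insert_after(text, r"a\.datas,", DEP_TREES)
    return text


if __name__ == "__main__":
    spec = Path("SpeechJokey.spec")
    if spec.exists():
        spec.write_text(patch_spec(spec.read_text()))
        print(f"Patched {spec}")
    else:
        print(f"{spec} not found; run the pyinstaller command first.")
```

Saved in the project root (for example as patch_spec.py), it would be run with poetry run python patch_spec.py before the second pyinstaller call. Editing the file by hand as described above works just as well.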

After these modifications, the application can be finalized by running:

(Should be very quick after the initial build)

poetry run pyinstaller SpeechJokey.spec

Inside the dist output folder, a folder named SpeechJokey can be found. This folder contains the final .exe build of the application.

For a detailed step-by-step guide on how to build a Kivy application, see this written tutorial. (Keep in mind that the tutorial doesn't use poetry, so every command should be preceded by poetry run.)

Building the application executable (Windows / CI)

To build the application similar to how it would be built by the CI, copy the SpeechJokey.spec from .github\static to the project root and then execute the following command in the root of the project:

poetry run pyinstaller SpeechJokey.spec

Intended features

This is what the application currently looks like.

The settings page looks like this:

The settings specific to the ElevenLabs speech engine look like this:

Loading the text

Using the loading button, the user can select a saved text file and load it into the text input. The loaded text can still be edited.

Editing the text

To simplify editing the text, the cursor set via the eye tracker is always placed at the end of a word. To move the cursor one position to the left or right, the user can use the arrow buttons at the bottom left of the application.

The editing feature is designed especially for people who rely on eye tracking devices to move the cursor.
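The cursor behaviour described above can be illustrated with a small sketch. This is not the project's implementation: it works on plain character indices rather than Kivy's TextInput, assumes words are separated by whitespace, and both function names are hypothetical.

```python
def snap_to_word_end(text: str, index: int) -> int:
    """Move a cursor index forward to the end of the word it falls in.

    Indices inside a word (or at its first character) jump to the position
    just after the word's last character. Indices on whitespace are kept.
    """
    index = max(0, min(index, len(text)))  # clamp to a valid cursor position
    while index < len(text) and not text[index].isspace():
        index += 1
    return index


def move_cursor(text: str, index: int, step: int) -> int:
    """Shift the cursor by `step` characters (-1 = left, +1 = right), clamped to the text."""
    return max(0, min(index + step, len(text)))
```

For example, with the text "Hello world", a gaze landing on the "l" at index 2 places the cursor at index 5, right after "Hello"; the arrow buttons then correspond to move_cursor with a step of -1 or +1.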

Selecting voice

The voice can be selected using the voice selection button or in the settings. All available voices are listed. When a voice is selected, a popup appears and the selected voice is displayed.

The currently selected voice is always displayed on the voice selection button.
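This README does not show how the list of voices is retrieved. As a hedged sketch, the public ElevenLabs REST API exposes GET https://api.elevenlabs.io/v1/voices, authenticated with an xi-api-key header, and returns a JSON object with a "voices" array whose entries carry "name" and "voice_id". The helper names below are hypothetical, and the project may use the official elevenlabs Python package instead.

```python
import json
import urllib.request

# ElevenLabs endpoint that lists the voices available to an account.
VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists the available voices."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def parse_voices(payload: str) -> dict:
    """Map voice names to voice IDs from the JSON body returned by /v1/voices."""
    data = json.loads(payload)
    return {voice["name"]: voice["voice_id"] for voice in data.get("voices", [])}


def fetch_voices(api_key: str) -> dict:
    """Send the request and return {name: voice_id}; requires network access."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=10) as response:
        return parse_voices(response.read().decode("utf-8"))
```

Separating request building and parsing from the network call keeps the part that fills the voice list testable without an API key.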

Using the voice selection button:

Choosing the voice directly in the settings:

Selecting model

The model can be selected in the settings. All available models are listed and the selected model is displayed.

Entering API Key

To use the ElevenLabs API, your generated API key must be entered in the settings.

SSML features for encoding intonation

To change the intonation by adding breaks to the text, the following shortcuts are implemented. The break durations can be adjusted in the code:

, adds a break of 0.0s
. adds a break of 0.5s
; adds a break of 0.5s
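The shortcut mapping above can be sketched as a simple text transformation. This is a hypothetical illustration, not the project's code: it assumes each shortcut character is kept and followed by an SSML-style break tag in the form `<break time="0.5s" />`, which is the syntax ElevenLabs documents for pauses. The durations mirror the list above.

```python
import re
from typing import Mapping

# Break durations per shortcut character, mirroring the list above.
BREAK_TIMES = {",": "0.0s", ".": "0.5s", ";": "0.5s"}


def add_breaks(text: str, break_times: Mapping[str, str] = BREAK_TIMES) -> str:
    """Append a <break> tag after every shortcut character in `text`."""
    chars = re.escape("".join(break_times))
    # Only match a shortcut followed by whitespace or the end of the text,
    # so decimal points such as "3.14" are left alone.
    pattern = re.compile(f"[{chars}](?=\\s|$)")
    return pattern.sub(
        lambda m: f'{m.group(0)} <break time="{break_times[m.group(0)]}" />', text
    )
```

For example, add_breaks("Hello, world.") returns 'Hello, <break time="0.0s" /> world. <break time="0.5s" />'. Passing a different mapping is one way to adjust the durations without touching the function.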

Synthesizing an audio file

An audio file is generated using the synthesizing button.
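Under the hood, synthesis with ElevenLabs boils down to one HTTP request. The sketch below is an assumption about how such a call could look, not the project's code: it targets the public endpoint POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id} with the text and model ID as a JSON body and writes the returned audio bytes to a file. The function names are hypothetical, and the voice and model IDs come from the selections described above.

```python
import json
import urllib.request
from pathlib import Path

# ElevenLabs text-to-speech endpoint; {voice_id} is filled in per request.
TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key: str, voice_id: str, text: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that synthesizes `text`."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio in the response body
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, voice_id: str, text: str, model_id: str, path: str) -> None:
    """Send the request and save the audio to `path`; requires network access."""
    request = build_tts_request(api_key, voice_id, text, model_id)
    with urllib.request.urlopen(request, timeout=60) as response:
        Path(path).write_bytes(response.read())
```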

Playing the audio file

Before an audio file can be played, it has to be synthesized using the synthesizing button.

The file can then be played with the play button.

Saving the text file

The final version of the edited text can be saved as a text file.

Demonstration of SpeechJokey

Here is a video demonstrating the intended features of the application.

Tutorials for beginner contributors

How to use Git

Git is a version control system. It allows you to keep track of changes made to your code and to collaborate with others. To learn more about Git, see this fundamental beginner tutorial.

Alternatively, you can play the Git game to learn git interactively.

How to use GitHub

GitHub is a platform for hosting Git repositories. It allows you to collaborate with others on your code. To learn more about GitHub, see this crash course.

How to use VS Code

VS Code is a code editor. It allows you to write code and to collaborate with others. To learn more about VS Code, see this crash course.

How to use Kivy

Kivy is an open-source Python framework for building cross-platform user interfaces. To learn more about Kivy, watch this playlist for a beginner-friendly introduction to the framework.

How to use poetry

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. For a short introduction to poetry, see this tutorial.
