This project is about building an application which synthesizes speech from user-provided text. The application is written in Python and uses the Kivy framework for the user interface.
Implementing intonation and emotion remains a significant challenge in assistive technology applications of text-to-speech, yet it would clearly enhance communication for people with speech impairments. The aim of Speech Jokey is therefore to allow people with communication difficulties to interact with more intonation, emotion and emphatic pauses. In addition, the application is specifically designed to be used with eye tracking systems, making it easier to position the cursor between the lines and words of a text.
Like a DJ, this application allows you to create your own text with impressive emotions, different intonations and meaningful pauses as voice output just the way you like it, which explains the name Speech Jokey.
The designed logo for the application is currently:
Demo video of the Speech Jokey program showing the generation of synthetic speech using the ElevenLabs API.
The application currently supports the following speech synthesis engines:
The project is based on Python 3.11, but it also supports lower versions down to 3.9. To install Python, follow the instructions on the Python website.
We use poetry for dependency management. To install poetry, check their installation instructions, or simply install it by running:
pip install poetry
Then make sure to configure poetry to install the virtual environment in the project root. This can be done by running:
poetry config virtualenvs.in-project true
On Debian-based Linux systems (where apt-get is available), please install the following packages first:
sudo apt-get install xsel xclip
Installing the virtual environment is done by running:
poetry install --no-root
The dependencies are listed in the pyproject.toml file. To add a new dependency, run:
poetry add <dependency>
The following procedures assume that you have installed the dependencies and that you are working inside the virtual environment.
To run the application, execute the following command in the root of the project:
poetry run python src/main.py
To build the application, execute the following command in the root of the project:
(You might wanna grab a coffee while running this)
poetry run pyinstaller src/main.py --onefile --name SpeechJokey
The generated build specification SpeechJokey.spec can now be found in the root of the project.
This file needs to be modified according to the following steps:
- Import the Kivy dependencies at the top of the file:
from kivy_deps import sdl2, glew
- Add the source tree on a new line after the line containing COLLECT(exe,
Tree('src\\'),
- Add the source dependencies on a new line after the line containing a.datas,
*[Tree(p) for p in (sdl2.dep_bins + glew.dep_bins)],
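The three manual steps above could also be scripted. The following is only a sketch, not part of the project: the helper name patch_spec is hypothetical, and it assumes the generated spec file contains a line with COLLECT(exe, and a line ending in a.datas, as described in the steps. PyInstaller versions that lay out the spec differently would need a different match.

```python
# Lines to add, copied from the manual steps above.
KIVY_IMPORT = "from kivy_deps import sdl2, glew"
SOURCE_TREE = r"Tree('src\\'),"
DEP_TREES = "*[Tree(p) for p in (sdl2.dep_bins + glew.dep_bins)],"


def patch_spec(spec_text: str) -> str:
    """Return the spec text with the three modifications applied (hypothetical helper)."""
    if KIVY_IMPORT in spec_text:
        return spec_text  # assume the file was already patched
    patched = [KIVY_IMPORT]  # step 1: import at the top of the file
    for line in spec_text.splitlines():
        patched.append(line)
        indent = " " * (len(line) - len(line.lstrip()))
        if "COLLECT(exe," in line:
            # step 2: source tree right after the COLLECT(exe, line
            patched.append(indent + "    " + SOURCE_TREE)
        elif line.rstrip().endswith("a.datas,"):
            # step 3: Kivy binary dependencies right after a.datas,
            patched.append(indent + DEP_TREES)
    return "\n".join(patched) + "\n"
```

To use it, read SpeechJokey.spec from the project root, pass its contents to patch_spec, and write the result back before running the final build command.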
After these modifications, the application can be finalized by running:
(Should be very quick after the initial build)
poetry run pyinstaller SpeechJokey.spec
Inside the dist output folder, a folder named SpeechJokey can be found. This folder contains the final .exe build of the application.
For a detailed step-by-step guide on how to build a Kivy application, see this written tutorial.
(Keep in mind that the tutorial doesn't use poetry, so any command should be preceded by poetry run)
To build the application the same way the CI builds it, copy SpeechJokey.spec from .github\static to the project root and then execute the following command in the root of the project:
poetry run pyinstaller SpeechJokey.spec
This is what the application currently looks like.
The settings page looks like this:
The settings specific to the ElevenLabs speech engine look like this:
Using the loading button, the user can select a saved text file and load it into the text input. The text can still be edited afterwards.
To simplify editing the text, the cursor set via eye tracker is always placed at the end of a word. To move the cursor one position to the left or right, the user can use the arrow buttons at the bottom left of the application.
The editing feature is aimed especially at people who need eye tracking devices to move the cursor.
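The word-end placement described above can be illustrated with a small sketch. This is not the application's actual code: the function name snap_to_word_end is hypothetical, and it assumes that words are separated by whitespace.

```python
def snap_to_word_end(text: str, index: int) -> int:
    """Move a cursor index to the end of the word it points at (hypothetical helper)."""
    # Clamp the index into the valid cursor range of the text.
    index = max(0, min(index, len(text)))
    # Inside a word: move right until the word ends.
    while index < len(text) and not text[index].isspace():
        index += 1
    # Inside a run of whitespace: move back to the end of the previous word.
    while index > 0 and text[index - 1].isspace():
        index -= 1
    return index
```

For the text "hello world", a cursor placed at index 2 (inside "hello") would move to index 5, directly after the word.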
The voice can be selected using the voice selection button or in the settings. All available voices are listed. When a voice is selected, a popup appears that displays the selected voice.
The currently selected voice is always displayed on the voice selection button.
Using the voice selection button:
Choosing the voice directly in the settings:
The model can be selected in the settings. All available models are listed and the selected model is displayed.
To use the ElevenLabs API, the generated API key must be entered in the settings.
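To show where the API key ends up, here is a minimal sketch of how a text-to-speech request to ElevenLabs could be assembled with the standard library. It is not taken from the Speech Jokey source: the function name build_tts_request is hypothetical, the voice ID and model ID are placeholders, and the endpoint and the xi-api-key header reflect the public ElevenLabs REST API as we understand it. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed base URL of the public ElevenLabs REST API.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text: str, api_key: str, voice_id: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request (hypothetical helper)."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,  # the key entered in the settings
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; with a valid key, voice and model, the response body should contain the synthesized audio.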
To change the intonation by adding breaks to the text, shortcuts are implemented. The break time can be adjusted in the code:
- , adds a break of 0.0s
- . adds a break of 0.5s
- ; adds a break of 0.5s
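The shortcut list above can be pictured as a lookup table from punctuation to pause length. The sketch below is an illustration only: the names BREAKS and insert_breaks are hypothetical, and the break tag format is an assumption about what the speech engine accepts, not a description of how Speech Jokey inserts pauses internally.

```python
# Break durations in seconds, taken from the shortcut list above.
BREAKS = {",": 0.0, ".": 0.5, ";": 0.5}


def insert_breaks(text: str, breaks: dict = BREAKS) -> str:
    """Append a break tag after each shortcut character with a non-zero pause."""
    parts = []
    for char in text:
        parts.append(char)
        pause = breaks.get(char, 0.0)
        if pause > 0:
            # Assumed tag format; adjust to whatever the engine expects.
            parts.append(f' <break time="{pause}s" />')
    return "".join(parts)
```

Changing a value in BREAKS is the equivalent of adjusting the break time in the code, for example setting the comma to 0.2 to add a short pause after each comma.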
An audio file is generated using the synthesizing button.
Before playing the audio file, an audio file has to be synthesized using the synthesizing button.
The file can then be played with the play button.
The final version of the edited text can be saved as a text file.
Here is a video of the intended features of the application.
Git is a version control system. It allows you to keep track of changes made to your code and to collaborate with others. To learn more about Git, see this fundamental beginner tutorial.
Alternatively, you can play the Git game to learn git interactively.
GitHub is a platform for hosting Git repositories. It allows you to collaborate with others on your code. To learn more about GitHub, see this crash course.
VS Code is a code editor. It allows you to write code and to collaborate with others. To learn more about VS Code, see this crash course.
Kivy is a framework for building user interfaces. It allows you to build user interfaces for your application. To learn more about Kivy, watch this playlist for a beginner friendly introduction to the framework.
Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. For a short introduction to poetry, see this tutorial.