This project provides a Python script that converts PDF documents into audio files. The script extracts text from a PDF, cleans the text, converts it to speech, and saves it as an audio file in MP3 format.
- Extracts text from large PDF files.
- Cleans unnecessary characters from the extracted text.
- Converts text to speech using pyttsx3.
- Splits the text into chunks and processes them to avoid memory overload.
- Combines the generated audio chunks into a single MP3 file.
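The cleaning and chunking steps listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `clean_text` and `split_into_chunks`, the cleaning rules, and the 1000-character default are all assumptions made for the example.

```python
import re


def clean_text(raw: str) -> str:
    """Remove characters that tend to trip up a speech engine (illustrative rules)."""
    # Re-join words hyphenated across line breaks: "conver-\nsion" -> "conversion".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Drop control characters other than tab, newline and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # Collapse every run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of up to max_chars, breaking only between words.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The next word would overflow the chunk, so start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Breaking between words rather than at a fixed character offset keeps the speech engine from reading half a word at the end of one chunk and the rest at the start of the next.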
This project uses Poetry for dependency management. The required Python libraries are:
- PyPDF2
- pyttsx3
- pydub
macOS (using Homebrew)
You need to install ffmpeg to handle audio processing. This can be installed via Homebrew:
brew install ffmpeg
Windows
For Windows, you will need to download and install FFmpeg manually:
- Download the latest build of FFmpeg from the official website.
- Extract the files and add the bin directory to your system's PATH.
Linux
On Linux, FFmpeg can usually be installed via the package manager.
For Debian-based distributions (like Ubuntu):
sudo apt-get update
sudo apt-get install ffmpeg
For Red Hat-based distributions (like Fedora or CentOS):
sudo dnf install ffmpeg
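After installing, it can help to confirm that FFmpeg is actually reachable from Python before running a long conversion. The check below is a small sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of this project.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it and make sure it is on your PATH")
```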
- Clone the repository:
git clone https://github.com/sairoko12/pdf-2-audio.git
cd pdf-2-audio
- Install Python dependencies using Poetry:
- Ensure you have Poetry installed. If not, you can install it via pip:
pip install poetry
- Then, install the project dependencies:
poetry install
- Install System Dependencies:
- Follow the instructions in the “System Dependencies” section to install FFmpeg on your system.
To convert a PDF file to an audio file, run the following command:
poetry run python script.py
You can adjust the following audio output parameters:
- rate
- The rate parameter in pyttsx3 controls the speed of speech, measured in words per minute (WPM). Lower values make the speech slower, while higher values increase the speed. Adjusting this allows you to tailor the speech pace to your needs.
- volume
- The volume parameter in pyttsx3 controls the loudness of the speech. It is a float value between 0.0 (mute) and 1.0 (maximum volume). Adjusting this allows you to set the desired loudness level for the speech output.
- language
- The language of the speech, given as a locale code such as fr-FR. The list of available languages applies only to macOS.
# Example: French speech at a fast rate
convert_pdf_to_audio(
    pdf_file='path/to/pdf_file.pdf',
    output_file_name='path/to/output_audio_file.mp3',
    language='fr-FR',
    rate=250,
)
This will convert the specified PDF file to an MP3 audio file.
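For readers curious how rate, volume, and language might map onto pyttsx3, here is a hedged sketch. It relies on the `setProperty` and `getProperty` calls that pyttsx3 exposes for `rate`, `volume`, `voice`, and `voices`; the helper name `configure_engine` and the voice-matching rule are assumptions for illustration, and the script itself may do this differently. Voice metadata varies by platform, which is one reason language selection may not behave the same everywhere.

```python
def configure_engine(engine, rate=200, volume=1.0, language=None):
    """Apply rate, volume and, optionally, a voice matching a language tag.

    `engine` is expected to behave like the object returned by pyttsx3.init().
    Returns the id of the selected voice, or None if no voice was selected.
    """
    engine.setProperty("rate", int(rate))
    # pyttsx3 expects volume in the range 0.0 to 1.0, so clamp the input.
    engine.setProperty("volume", max(0.0, min(1.0, float(volume))))

    if not language:
        return None

    wanted = language.replace("-", "_").lower()
    for voice in engine.getProperty("voices"):
        # Depending on the platform driver, language entries may be str or bytes.
        for lang in getattr(voice, "languages", None) or []:
            if isinstance(lang, bytes):
                lang = lang.decode("utf-8", errors="ignore")
            if wanted in str(lang).replace("-", "_").lower():
                engine.setProperty("voice", voice.id)
                return voice.id
    return None


# Possible usage (requires pyttsx3 to be installed):
#   import pyttsx3
#   engine = pyttsx3.init()
#   configure_engine(engine, rate=250, volume=0.9, language="fr-FR")
```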
Temporary audio files generated during the conversion process are automatically removed after the final audio file is created.
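The combine-then-clean-up step could look roughly like the sketch below. It uses pydub's `AudioSegment.empty()`, `AudioSegment.from_file()`, and `export()`; the function names `combine_chunks` and `remove_temp_files` are hypothetical, and the script's real implementation may differ.

```python
import os


def combine_chunks(chunk_paths, output_path):
    """Concatenate audio chunk files, in order, into one MP3 file."""
    # Imported lazily so the cleanup helper below works without pydub.
    from pydub import AudioSegment  # MP3 export needs ffmpeg on the PATH

    combined = AudioSegment.empty()
    for path in chunk_paths:
        combined += AudioSegment.from_file(path)
    combined.export(output_path, format="mp3")


def remove_temp_files(paths):
    """Delete temporary files, skipping any that are already gone.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            # Nothing to clean up for this path.
            pass
    return removed
```

Removing the chunks only after the export has finished means a failed export leaves the intermediate files in place for inspection.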
If you encounter any issues or have suggestions for improvements, please feel free to submit an issue or a pull request.
This project is licensed under the MIT License. See the MIT License for more details.