# PDF to JSON Transcriber User Manual
Welcome to the PDF to JSON Transcriber user manual. This software transcribes the text of PDF documents, such as guides and manuals, into JSON files. The exported JSON stays faithful to the exact text of the PDF and is intended for use as training data for GPT models.
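This manual does not document the structure of the exported JSON. For illustration only, the sketch below assumes a hypothetical schema with one record per page (the field names `source`, `pages`, `page`, and `text` and the function `pages_to_json` are assumptions, not the tool's documented output). It shows how extracted page text could be serialized with Python's standard `json` module.

```python
import json
from typing import List


def pages_to_json(source: str, pages: List[str]) -> str:
    """Serialize per-page text into a JSON string.

    The schema used here (a "source" field plus a list of
    {"page", "text"} records) is a hypothetical example; the
    transcriber's actual output format may differ.
    """
    document = {
        "source": source,
        "pages": [
            # Page numbers are 1-based to match how PDF viewers count pages.
            {"page": number, "text": text}
            for number, text in enumerate(pages, start=1)
        ],
    }
    # ensure_ascii=False keeps non-ASCII characters readable instead of \u escapes.
    return json.dumps(document, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(pages_to_json("guide.pdf", ["Introduction", "Chapter 1"]))
```

Keeping each page as its own record preserves page boundaries, which is convenient if the text is later split into training examples.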
## System Requirements
- Python 3.6 or higher
- PyMuPDF 1.18.19
## Installation
Before running the application, you need to install the required dependencies. You can do this by running the following command in your terminal:
```bash
pip install -r requirements.txt
```

This installs PyMuPDF, which the application uses to read PDF files.
## Usage

To start the application, navigate to the directory containing the `main.py` file and run:

```bash
python main.py
```

This opens the graphical user interface (GUI) of the PDF to JSON Transcriber.
### Importing a PDF

- Click the "Import PDF" button.
- Navigate to the location of the PDF file you want to transcribe.
- Select the file and click "Open".

The path of the imported PDF is then displayed in the application window.
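The application uses PyMuPDF to read the imported PDF. The sketch below shows one way the page text could be extracted; it is not the application's own code. It assumes PyMuPDF is importable as `fitz` and provides `Page.get_text()`, and the helper name `extract_pages` is hypothetical.

```python
import os
from typing import List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the plain text of each page in the PDF at pdf_path.

    Requires PyMuPDF, which is imported under the module name "fitz".
    """
    # Fail early with a clear error if the selected path is not a file.
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(f"No such PDF file: {pdf_path}")

    import fitz  # PyMuPDF; imported here so the path check works without it

    doc = fitz.open(pdf_path)
    try:
        # get_text() with no arguments returns the page's plain text.
        return [page.get_text() for page in doc]
    finally:
        # Release the file handle even if extraction fails part-way.
        doc.close()
```

For example, `extract_pages("guide.pdf")` would return a list with one string per page, ready to be serialized to JSON.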
### Exporting to JSON

Once a PDF is imported, the "Export JSON" button becomes active.

- Click the "Export JSON" button.
- Choose where to save the JSON file and enter a file name.
- Click "Save".

A success message appears when the JSON file has been exported. If an error occurs during text extraction, an error message is displayed instead.
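The success and error messages suggest that the export step writes the file and reports any failure to the user. A minimal sketch of such a step, assuming a hypothetical `export_json` helper and a hypothetical one-record-per-page layout (neither is documented by this manual):

```python
import json
from typing import List, Tuple


def export_json(pages: List[str], out_path: str) -> Tuple[bool, str]:
    """Write page texts to out_path as JSON and report the outcome.

    Returns (True, success message) or (False, error message), similar
    to the success and error dialogs described in the manual. The JSON
    layout used here is a hypothetical example.
    """
    payload = [{"page": n, "text": t} for n, t in enumerate(pages, start=1)]
    try:
        with open(out_path, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False, indent=2)
    except OSError as exc:
        # Covers unwritable locations, missing folders, full disks, and similar.
        return False, f"Export failed: {exc}"
    return True, f"JSON exported to {out_path}"
```

Returning a status and message instead of showing a dialog directly keeps the file logic separate from the GUI, so it can be tested without a window.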
## Features

- Import PDF: Select and import a PDF file from your local storage.
- Export JSON: Once a PDF is imported, export the transcribed text to a JSON file.
- File Path Display: Shows the path of the currently imported PDF file.
## Troubleshooting

If you encounter any issues with the software, make sure that a supported version of Python (3.6 or higher) is installed and that all dependencies listed in `requirements.txt` have been installed correctly.
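To check both conditions, a small script like the following can be run with the same Python interpreter you use to launch `main.py`. It is a hypothetical helper, not part of the application. It assumes PyMuPDF is importable as `fitz` and reads the version from the `fitz.VersionBind` attribute, falling back to "unknown" if that attribute is absent.

```python
import sys
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 6)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def pymupdf_version() -> Optional[str]:
    """Return the installed PyMuPDF version string, or None if it is missing."""
    try:
        import fitz  # PyMuPDF's import name
    except ImportError:
        return None
    # VersionBind is assumed to hold the PyMuPDF version, e.g. "1.18.19".
    return getattr(fitz, "VersionBind", "unknown")


if __name__ == "__main__":
    status = "OK" if python_ok() else "(3.6 or higher required)"
    print("Python", sys.version.split()[0], status)
    version = pymupdf_version()
    if version is None:
        print("PyMuPDF is not installed; run: pip install -r requirements.txt")
    else:
        print("PyMuPDF", version)
```

Compare the reported PyMuPDF version with the one listed under System Requirements.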