# PDF to JSON Transcriber User Manual

Welcome to the PDF to JSON Transcriber user manual. This software transcribes the text of PDF documents, such as guides and manuals, into JSON. The resulting JSON files preserve the text of the PDF as written and are intended for use as training data for GPT models.
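The manual does not document the JSON layout the tool produces. As a minimal sketch, assuming one object per document with the text kept verbatim per page (the function name `pages_to_record` and all field names are hypothetical), the output might be structured like this:

```python
import json
from typing import Dict, List


def pages_to_record(source_name: str, pages: List[str]) -> Dict:
    """Bundle per-page text into one JSON-serializable record.

    Hypothetical schema: "source", "page_count", "pages", "page" and "text"
    are illustrative names, not the tool's documented format.
    """
    return {
        "source": source_name,
        "page_count": len(pages),
        "pages": [
            # Page numbers are 1-based to match how PDF viewers label pages.
            {"page": number, "text": text}
            for number, text in enumerate(pages, start=1)
        ],
    }


record = pages_to_record("guide.pdf", ["Chapter 1\nIntro", "Chapter 2\nSetup"])
# ensure_ascii=False keeps non-ASCII characters as-is instead of \u escapes.
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Keeping each page's text unmodified in its own entry is one way to stay "true to the text" while still recording where each passage came from.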

## System Requirements

- Python 3.6 or higher
- PyMuPDF 1.18.19

## Installation

Before running the application, you need to install the required dependencies. You can do this by running the following command in your terminal:

```bash
pip install -r requirements.txt
```

This will install PyMuPDF, which is necessary for reading PDF files.
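PyMuPDF is imported in Python under the module name `fitz`. As a rough sketch of what reading a PDF involves (not taken from this project's source; the function name `extract_pages` is hypothetical), the text of each page can be pulled out like this:

```python
from typing import List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the plain text of every page in the PDF, in page order.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module can be loaded even before
    # `pip install -r requirements.txt` has been run.
    import fitz  # PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        # Iterating a Document yields Page objects; get_text() returns
        # the page's plain text as a single string.
        return [page.get_text() for page in doc]
    finally:
        # Release the file handle even if extraction fails part-way.
        doc.close()
```

`get_text()` returns text in the order it is stored in the PDF, which is not always the visual reading order, and scanned pages without a text layer come back as empty strings.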

## Starting the Application

To start the application, navigate to the directory containing the `main.py` file and run:

```bash
python main.py
```

This will open the graphical user interface (GUI) of the PDF to JSON Transcriber.

## Using the Software

### Importing a PDF

1. Click the "Import PDF" button.
2. Navigate to the location of the PDF file you wish to transcribe.
3. Select the file and click "Open".

The path of the imported PDF will be displayed in the application window.
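The manual does not say what checks happen when a file is imported. As a hypothetical sketch (the helper `validate_pdf_path` is illustrative, not part of the application), an import step could confirm the file exists and carries a PDF header before showing its absolute path:

```python
import os
import tempfile


def validate_pdf_path(path: str) -> str:
    """Check that `path` looks like a PDF and return its absolute path.

    Hypothetical helper: the real "Import PDF" button may check less or more.
    """
    if not path.lower().endswith(".pdf"):
        raise ValueError("Not a .pdf file: {}".format(path))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # Well-formed PDFs begin with the header bytes "%PDF-". Some readers
    # tolerate leading junk before the header; this strict check does not.
    with open(path, "rb") as handle:
        if handle.read(5) != b"%PDF-":
            raise ValueError("Missing PDF header: {}".format(path))
    # The absolute path is what a GUI would show in its path display.
    return os.path.abspath(path)


# Usage: create a tiny stand-in file (just a header, not a real PDF).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "guide.pdf")
    with open(sample, "wb") as stub:
        stub.write(b"%PDF-1.4\n")
    shown_path = validate_pdf_path(sample)
print(shown_path)
```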

### Exporting to JSON

Once a PDF is imported, the "Export JSON" button will become active.

1. Click the "Export JSON" button.
2. Choose the desired location to save the JSON file and provide a file name.
3. Click "Save".

A success message will appear if the JSON file has been exported successfully. If there is an error during the text extraction process, an error message will be displayed.
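A minimal sketch of the export step that reports success or failure the way the paragraph above describes. The function `export_json`, the record layout, and the message wording are assumptions, not the application's actual code:

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def export_json(record: Dict[str, Any], out_path: str) -> Tuple[bool, str]:
    """Write `record` to `out_path` as UTF-8 JSON; return (ok, message).

    Hypothetical helper; the real application's messages may differ.
    """
    try:
        with open(out_path, "w", encoding="utf-8") as handle:
            # ensure_ascii=False keeps accented and non-Latin text readable.
            json.dump(record, handle, ensure_ascii=False, indent=2)
    except (OSError, TypeError) as exc:
        # OSError: unwritable location; TypeError: data json cannot serialize.
        return False, "Error: could not export {}: {}".format(out_path, exc)
    return True, "JSON exported successfully to {}".format(out_path)


record = {"pages": [{"page": 1, "text": "Page one"}]}
with tempfile.TemporaryDirectory() as tmp:
    ok, message = export_json(record, os.path.join(tmp, "out.json"))
    # Writing into a folder that does not exist triggers the error path.
    bad_ok, bad_message = export_json(record, os.path.join(tmp, "no_such_dir", "out.json"))
print(message)
print(bad_message)
```

Returning a flag and a message, rather than showing a dialog directly, lets a GUI decide how to present the result.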

## Main Functions

- **Import PDF**: Allows you to select and import a PDF file from your local storage.
- **Export JSON**: Once a PDF is imported, you can export the transcribed text to a JSON file.
- **File Path Display**: Shows the path of the currently imported PDF file.
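The interplay of these three functions can be modelled without any GUI code. In this sketch the class `TranscriberState`, its attributes, and the "No PDF imported" placeholder are hypothetical; only the behaviour (export stays inactive until a PDF is imported, and the path is displayed) comes from this manual:

```python
from typing import Optional


class TranscriberState:
    """GUI-independent model of Import PDF, Export JSON and the path display.

    Hypothetical names; the manual describes the behaviour, not the code.
    """

    def __init__(self) -> None:
        self.pdf_path: Optional[str] = None

    def import_pdf(self, path: str) -> None:
        # "Import PDF": remember the selected file.
        self.pdf_path = path

    @property
    def export_enabled(self) -> bool:
        # "Export JSON" only becomes active once a PDF has been imported.
        return self.pdf_path is not None

    @property
    def path_label(self) -> str:
        # "File Path Display": text shown in the application window.
        # The placeholder text is an illustrative assumption.
        return self.pdf_path or "No PDF imported"


state = TranscriberState()
before = state.export_enabled
state.import_pdf("/home/user/guide.pdf")
after = state.export_enabled
print(before, after, state.path_label)
```

Keeping this state separate from the widgets makes the button-enabling rule testable without opening a window; a Tkinter front end, for example, could set the button's `state` option to `"normal"` or `"disabled"` from `export_enabled`.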

## Troubleshooting

If you encounter any issues with the software, please ensure that you have the correct version of Python installed and that all dependencies from the `requirements.txt` file have been installed properly.
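To check both points quickly, a small script along these lines can help. It is a sketch, not a tool shipped with the project; it reads PyMuPDF's version from the `fitz.VersionBind` attribute and falls back to "unknown" if that attribute is absent:

```python
import sys
from typing import List


def check_environment() -> List[str]:
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    # The manual requires Python 3.6 or higher.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ required, found {}.{}".format(*sys.version_info[:2]))
    try:
        import fitz  # PyMuPDF
    except ImportError:
        problems.append("PyMuPDF is not installed; run: pip install -r requirements.txt")
    else:
        # The manual lists PyMuPDF 1.18.19; other versions may behave differently.
        version = getattr(fitz, "VersionBind", "unknown")
        if version != "1.18.19":
            problems.append("PyMuPDF {} found; the manual lists 1.18.19".format(version))
    return problems


issues = check_environment()
for issue in issues:
    print(issue)
if not issues:
    print("Environment looks OK")
```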
