Retrieval-Augmented Generation for Question-Answering on PDFs
This project provides a Retrieval-Augmented Generation (RAG) solution for answering questions from PDF documents. It uses LangChain for the document-processing and answer-generation pipelines, LangChain vector stores for embeddings, and Streamlit for the user interface.
- Python 3
- python-dotenv Documentation
- PyPDF2 Documentation
- pypdf Documentation
- Faiss CPU Documentation
- langchain-openai faiss-cpu Documentation
- LangChain Documentation
- LangChain OpenAI Documentation
- Streamlit Documentation
Install all the dependencies:
pip install -r requirements.txt
Alternatively, install only the packages you need:
- PyPDF2
pip install PyPDF2
- pypdf
pip install pypdf
- python-dotenv
pip install python-dotenv
- faiss-cpu
pip install faiss-cpu
- langchain-openai and faiss-cpu (upgrade both)
pip install --upgrade langchain-openai faiss-cpu
- LangChain
pip install langchain
- langchain-openai
pip install langchain-openai
- PDF Upload and Indexing
- PDF Deletion
- Vector-Based Retrieval
- Question Answering
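The vector-based retrieval step can be illustrated with a small, dependency-free sketch. It is not the project's code: the dependency list suggests the project relies on OpenAI embeddings and a FAISS index, whereas this stand-in uses word-count vectors and cosine similarity. The names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Illustrative chunks, as if split from an uploaded PDF.
chunks = [
    "Heat exchangers transfer heat between two fluids.",
    "The warranty covers parts for two years.",
    "Shell and tube heat exchangers are common in refineries.",
]
top = retrieve("What are heat exchangers?", chunks, k=2)
```

In a RAG pipeline, the retrieved chunks would then be passed to the language model as context for answering the question.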
- Create a .env file.
- Add the following fields:
- OPENAI_API_KEY = 'your_api_key'
- LANG_CHAIN_API_KEY = 'your_api_key'
- Add the .env file to your .gitignore.
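As a quick check that the keys above are visible to Python, something like the following sketch may help. It assumes python-dotenv's `load_dotenv` is how the project reads the .env file, which the dependency list suggests but does not confirm. `REQUIRED_KEYS`, `missing_keys`, and `load_env_file` are hypothetical names.

```python
import os

# Key names taken from the .env fields listed above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LANG_CHAIN_API_KEY")


def missing_keys(env):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_env_file():
    """Load .env into os.environ if python-dotenv is installed; return whether it ran."""
    try:
        from dotenv import load_dotenv  # third-party: python-dotenv
    except ImportError:
        return False
    load_dotenv()  # by default does not override variables that are already set
    return True


if __name__ == "__main__":
    load_env_file()
    missing = missing_keys(os.environ)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```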
- list.py: Returns a list of all uploaded PDFs.
  - Usage: python3 list.py
  - Response:
    - On success: Returns a list of PDF files and the count.
    - On failure: Error: No PDF files present.
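The listing behaviour could be sketched roughly as follows. This is an illustration, not the contents of list.py: the `document` directory name comes from the upload example in this README, while `list_pdfs`, `describe`, and the wording of the success message are assumptions.

```python
from pathlib import Path


def list_pdfs(directory="document"):
    """Return the sorted names of PDF files in `directory` (empty list if it is missing)."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def describe(names):
    """Build a message for the success and failure cases."""
    if not names:
        return "Error: No PDF files present."
    # The exact success format is an assumption.
    return f"{len(names)} PDF file(s): {', '.join(names)}"


if __name__ == "__main__":
    print(describe(list_pdfs()))
```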
- upload.py: Uploads a PDF to the document directory.
  - Usage: python3 upload.py --pdf_file=sample.pdf
  - Response:
    - On success: Upload successful! Uploaded path and filename: document/file_sample.pdf
    - On failure: [Errno 2] No such file or directory.
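A minimal sketch of an upload step with the same `--pdf_file` flag is shown below. It is an assumption about how such a script might work, not the actual upload.py. It keeps the original filename, whereas the example output above shows a `file_` prefix. `upload_pdf` and `parse_args` are hypothetical names.

```python
import argparse
import shutil
from pathlib import Path


def upload_pdf(source, directory="document"):
    """Copy `source` into `directory` and return the destination path."""
    src = Path(source)
    dest_dir = Path(directory)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Raises FileNotFoundError ([Errno 2]) when the source file does not exist.
    shutil.copy2(src, dest)
    return dest


def parse_args(argv=None):
    """Parse the command line; `--pdf_file` matches the usage shown above."""
    parser = argparse.ArgumentParser(description="Upload a PDF to the document directory.")
    parser.add_argument("--pdf_file", required=True, help="path of the PDF to upload")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    try:
        print(f"Upload successful! Uploaded path and filename: {upload_pdf(args.pdf_file)}")
    except OSError as exc:
        print(exc)
```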
- retrieve.py: Retrieves a PDF by name along with its content.
  - Usage: python3 retrieve.py --pdf_file=sample.pdf
  - Response:
    - On success: {'filename': 'sample.pdf', 'content': "lifelong"}
    - On failure: Error: PDF file 'sample.pdf' not found in 'document'.
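Retrieval of a PDF's text might look like the sketch below, which assumes pypdf's `PdfReader` is used for extraction (pypdf is in the dependency list, but the actual retrieve.py may differ). `retrieve_pdf` is a hypothetical name. The returned dictionary and the error text mirror the responses shown above.

```python
from pathlib import Path


def retrieve_pdf(name, directory="document"):
    """Return {'filename': ..., 'content': ...} for a PDF stored in `directory`."""
    path = Path(directory) / name
    if not path.is_file():
        raise FileNotFoundError(f"Error: PDF file '{name}' not found in '{directory}'.")
    # Imported here so the not-found check works even without pypdf installed.
    from pypdf import PdfReader  # third-party: pypdf

    reader = PdfReader(path)
    # Concatenate the extracted text of each page, one page per line group.
    content = "\n".join(page.extract_text() or "" for page in reader.pages)
    return {"filename": name, "content": content}
```

Text extraction quality depends on how the PDF was produced; scanned documents without a text layer typically yield empty content.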
- delete.py: Deletes a PDF by name and returns the current directory count.
  - Usage: python3 delete.py --pdf_file=sample.pdf
  - Response:
    - On success: PDF count before deletion: 2 PDF count after deletion: 1 sample.pdf deleted successfully.
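The before and after counts could be produced along these lines. This is a sketch under assumptions, not the actual delete.py: `count_pdfs` and `delete_pdf` are hypothetical names, and the error text is borrowed from the retrieve.py example because this README does not show delete.py's failure message.

```python
from pathlib import Path


def count_pdfs(directory):
    """Count files ending in .pdf directly inside `directory`."""
    return sum(1 for p in Path(directory).glob("*.pdf") if p.is_file())


def delete_pdf(name, directory="document"):
    """Delete `name` from `directory` and return (count_before, count_after)."""
    path = Path(directory) / name
    if not path.is_file():
        # Assumed wording, modelled on the retrieve.py failure message.
        raise FileNotFoundError(f"Error: PDF file '{name}' not found in '{directory}'.")
    before = count_pdfs(directory)
    path.unlink()
    return before, count_pdfs(directory)
```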
- query.py: Queries a PDF and returns an answer with a character limit of 300.
  - Usage: python3 query.py --question="What are heat exchangers?"
  - Response:
    - On success: {'page_number': 'answer'}
    - On failure: An error has occurred.
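The 300-character limit and the response shape can be sketched as below. This reads `{'page_number': 'answer'}` as a mapping from the source page number to the answer text, which is an assumption, and it truncates by plain slicing, which may not match how query.py enforces the limit. `format_response` is a hypothetical name.

```python
def format_response(page_number, answer, limit=300):
    """Map the page number to the answer, trimmed to at most `limit` characters."""
    # Plain slicing may cut mid-word; the real script may truncate differently.
    return {str(page_number): answer.strip()[:limit]}
```

For example, `format_response(12, long_text)` would return a one-entry dictionary whose value is never longer than 300 characters.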