A Streamlit-based application that lets users have interactive conversations with their PDF documents using OpenAI GPT and Retrieval-Augmented Generation (RAG).
- Upload multiple PDF documents
- Interactive chat interface
- Retrieval-Augmented Generation (RAG) to ground responses in the uploaded documents
- Document chunking for better context handling
- Conversation memory to maintain context
- Clean and intuitive user interface
- Streamlit - Web application framework
- LangChain - LLM application framework
- OpenAI GPT - Large Language Model
- FAISS - Vector store for embeddings
- PyPDF - PDF processing
- Python 3.7+
- OpenAI API key
- Clone the repository:
git clone https://github.com/rajeshai/Streamlit-RAG-chatwith-PDF.git
cd Streamlit-RAG-chatwith-PDF
- Create a requirements.txt file with the following dependencies, if the repository does not already contain one:
streamlit
langchain
openai
pypdf
tiktoken
langchain-community
faiss-cpu
- Install required packages:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Open your web browser and navigate to the provided local URL (typically http://localhost:8501)
- Enter your OpenAI API key in the sidebar
- Upload one or more PDF files
- Click "Process PDFs" to initialize the chat system
- Start asking questions about your documents!
- Document Processing:
  - PDFs are loaded and split into smaller chunks
  - Text chunks are converted into embeddings using OpenAI's embedding model
  - Embeddings are stored in a FAISS vector store for efficient retrieval
- Query Processing:
  - User questions are embedded and matched against the vector store to find the most relevant document chunks
  - Retrieved context is combined with the question
  - GPT generates a response grounded in the retrieved context
- Conversation Management:
  - Maintains chat history for context
  - Handles multiple PDF documents simultaneously
  - Provides a clean chat interface for interaction
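
The three stages above can be sketched end to end in plain Python. This is an illustrative sketch only, not the app's code: the names `split_text`, `embed`, `TinyVectorStore`, and `build_prompt` are hypothetical, a bag-of-words counter stands in for OpenAI's embedding model, and a sorted in-memory list stands in for FAISS. The app itself presumably delegates these steps to LangChain.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts. ASSUMPTION: a real app calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store; FAISS plays this role in the app."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query, k=2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks, history):
    """Combine retrieved context, prior turns, and the new question into one prompt."""
    history_text = "\n".join(f"{role}: {msg}" for role, msg in history)
    context_text = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Question: {question}"
    )


document = (
    "FAISS stores embedding vectors and supports fast similarity search. "
    "Streamlit renders the chat interface in the browser. "
    "Chunk overlap keeps sentences that straddle a boundary retrievable."
)
chunks = split_text(document, chunk_size=80, overlap=20)
store = TinyVectorStore(chunks)
history = [
    ("user", "What is this app built with?"),
    ("assistant", "Streamlit and LangChain."),
]
question = "Which component stores embedding vectors?"
top_chunks = store.search(question, k=1)
prompt = build_prompt(question, top_chunks, history)
print(prompt)
```

The retrieved context here is the chunk mentioning FAISS, because it shares the most words with the question. In the real pipeline the assembled prompt would be sent to GPT, and the answer would be appended to `history` so the next question is asked with that context.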
- Requires an OpenAI API key (paid service)
- Large PDF files may take longer to process
- Processing time depends on the number and size of uploaded PDFs
Feel free to open issues or submit pull requests if you have suggestions for improvements!
- Built using OpenAI's GPT model
- Inspired by the LangChain framework
- Made possible by the Streamlit community