Welcome to the Intelligent Document Q&A System, a tool that processes PDF documents and gives concise answers to user queries based on their content. The application combines a language model with document loading, chunking, and embedding-based retrieval so that answers stay grounded in the uploaded documents.
- PDF Document Processing: Upload and process PDF documents to extract and store content for querying.
- Contextual Q&A: Ask questions and receive answers based solely on the content of the uploaded documents.
- Chat History: Maintain a history of questions and answers for easy reference.
- Streamlit Interface: User-friendly web interface built with Streamlit for seamless interaction.
- LangChain: For document processing and language model integration.
- OpenAI GPT-4o: Generates responses to user queries. The model is selected through the OPENAI_MODEL environment variable.
- Streamlit: Provides a simple and interactive web interface.
- Chroma: Used for storing and retrieving document embeddings.
- PyPDFLoader: For loading and parsing PDF documents.
- RecursiveCharacterTextSplitter: Splits documents into manageable chunks for processing.
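To make the chunking step concrete, here is a minimal, dependency-free sketch of recursive character splitting. It is a simplified stand-in for the idea behind RecursiveCharacterTextSplitter, not LangChain's implementation: it has no chunk overlap, and the function name, separator list, and chunk size are illustrative assumptions.

```python
def split_text(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Separators are tried from coarsest to finest; a piece that is still
    too long is split again with the next separator in the list.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph breaks first and falling back to lines, words, and finally raw characters tends to keep related sentences in the same chunk, which helps retrieval return coherent context.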
To set up the application locally, follow these steps:
- Clone the Repository:
    git clone https://github.com/vinayjain18/document-qa-system.git
    cd document-qa-system
- Install Dependencies: Ensure you have Python 3.10+ installed. Then, install the required packages:
    pip install -r requirements.txt
- Set Environment Variables: Set your OpenAI API key and model name in the environment:
    export OPENAI_API_KEY='your-openai-api-key'
    export OPENAI_MODEL='gpt-4o-mini'
- Run the Application: Start the Streamlit server:
    streamlit run app.py
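The environment variables from the setup steps could be read at startup roughly as follows. This is a sketch under assumptions, not the application's actual code: the function name load_settings is hypothetical, and falling back to gpt-4o-mini when OPENAI_MODEL is unset is an assumed default.

```python
import os


def load_settings(env=None):
    """Read the OpenAI settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumed default: use gpt-4o-mini when OPENAI_MODEL is missing.
    model = env.get("OPENAI_MODEL", "gpt-4o-mini")
    return {"api_key": api_key, "model": model}
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.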
- Upload Documents: Use the sidebar to upload PDF documents. Ensure each file is under 10MB.
- Process Documents: Click the "Process Documents" button to extract and store document content.
- Ask Questions: Once documents are processed, enter your question in the text input field.
- View Answers: The system will provide answers based on the document content. Review the chat history for past interactions.
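The 10MB upload limit mentioned above could be enforced with a check along these lines. This is only a sketch: the function name validate_upload is hypothetical, and it assumes that 10MB means 10 * 1024 * 1024 bytes.

```python
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def validate_upload(filename, size_bytes):
    """Return (ok, message) for an uploaded file (hypothetical helper)."""
    if not filename.lower().endswith(".pdf"):
        return False, f"{filename}: only PDF files are supported"
    if size_bytes >= MAX_FILE_BYTES:
        # The usage notes ask for files *under* 10MB, so the limit is exclusive.
        return False, f"{filename}: file must be under 10MB"
    return True, f"{filename}: OK"
```

Returning a message alongside the flag lets the interface show the reason for a rejected file next to the uploader.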
- CHROMA_PATH: Directory path for storing document embeddings.
- MAX_HISTORY_LENGTH: Maximum number of interactions stored in chat history.
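As an illustration of how these two settings might be used, the sketch below caps the chat history with a bounded deque. The values shown for CHROMA_PATH and MAX_HISTORY_LENGTH are placeholders (the real defaults are not stated here), and the helper names are hypothetical.

```python
from collections import deque

CHROMA_PATH = "chroma_db"  # placeholder value, not the project's actual default
MAX_HISTORY_LENGTH = 5     # placeholder value, not the project's actual default


def new_history(max_length=MAX_HISTORY_LENGTH):
    """Create a chat history that keeps only the newest interactions."""
    return deque(maxlen=max_length)


def record(history, question, answer):
    """Append one interaction; the oldest is dropped once the deque is full."""
    history.append({"question": question, "answer": answer})
```

A deque with maxlen discards the oldest entry automatically, so no explicit trimming logic is needed.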
The application uses Python's built-in logging module to log information and errors. Logs are output to the console for easy monitoring.
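A console logging setup of this kind might look like the following. It is a minimal sketch: the logger name document_qa, the INFO level, and the message format are assumptions rather than the application's actual configuration.

```python
import logging
import sys


def get_logger(name="document_qa"):
    """Return a logger that writes INFO and above to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach the console handler only once, even if called repeatedly.
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Guarding on existing handlers matters in Streamlit, where the script is re-run on each interaction and repeated setup would otherwise duplicate log lines.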
- Integration with other NLP models: Support additional models, such as BERT.
- Support for multiple languages: Handle documents and questions in languages other than English.
- Enhanced user interface: Improve the user interface for better user experience.
- Document categorization: Implement document categorization based on content.
- Other document types: Support additional formats such as Word documents, HTML, and XML.
- Customer support: Use the system to build chatbots that answer precisely from your own documents.
This project is licensed under the MIT License.
For questions or support, please contact vinayjain449@gmail.com.
Thank you for using the Intelligent Document Q&A System! We hope it enhances your document processing and querying experience.