This project lets you chat with your PDF documents using LangChain, ChromaDB, and OpenAI's API. You ask questions in a conversational interface, and the answers draw on text retrieved from your PDFs.
The architecture of this project involves several components working together:
- LangChain: It serves as the interface for communication with OpenAI's API. LangChain handles rephrasing, retrieves relevant text chunks, and manages the conversation flow.
- ChromaDB: A vector database used to store and query high-dimensional vectors. It helps in efficiently searching for and retrieving relevant text chunks during conversations.
- OpenAI's API: The API provides access to OpenAI's language models, such as GPT-3.5 Turbo. It processes prompts, generates responses, and incorporates retrieved text chunks to ensure accurate and context-aware conversations.
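The retrieval flow described above can be illustrated with a small, self-contained sketch. It is not the project's code: a toy word-count "embedding" and an in-memory list stand in for OpenAI embeddings and ChromaDB, and every function name here (`embed`, `cosine`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (the vector store's role)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt for the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Made-up chunks standing in for text extracted from PDFs.
chunks = [
    "Chroma stores embedding vectors for each text chunk.",
    "The report covers AI investment trends in 2023.",
    "Pipenv manages the Python virtual environment.",
]
question = "What does the report say about AI investment?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In the real pipeline, the prompt would then be sent to the language model. The point of the sketch is only the division of labor: embed, search by similarity, then assemble a context-aware prompt.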
To get started with this project, make sure the following are installed, then follow the steps below:
- Python
- Pipenv
- Clone the repository:
  git clone https://github.com/yash9439/chat-with-multiple-pdf
- Navigate to the project directory:
  cd chat-with-multiple-pdf
- Install the required dependencies using Pipenv:
  pipenv install
- Activate the Pipenv shell:
  pipenv shell
- Create a .env file, replacing sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX with your OpenAI API key:
  echo 'OPENAI_API_KEY="sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"' > .env
- Run the merge script to combine all the PDFs so you can chat with them simultaneously:
  python src/merge.py
- Run the ingestion script to parse and extract text from the PDF:
  python src/ingest.py
- Start the conversation script to interact with the PDF:
  python src/chat-with-multiple-pdf.py
- The PDFs used here are an AI Development Index report 2023 and a research paper: https://www.researchgate.net/publication/323498156_Artificial_Intelligence
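The ingestion step above extracts text so that relevant chunks can be retrieved later. How `src/ingest.py` splits the text is not shown here, so the following is only a minimal sketch of one common approach, fixed-size character windows with overlap. The function name `split_into_chunks` and the default sizes are assumptions for illustration, not the project's settings.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters standing in for extracted PDF text
pieces = split_into_chunks(sample, chunk_size=12, overlap=4)
```

Overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.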
- OpenAI: OpenAI's platform provides access to powerful language models and APIs.
- LangChain: LangChain is the library used for communication and interaction with OpenAI's API.
- ChromaDB: ChromaDB is a vector database used to store and query high-dimensional vectors efficiently.
Feel free to explore this project and enhance it further to suit your needs. Enjoy chatting with your PDFs and extracting valuable insights!