Welcome to the CDP Knowledge Assistant! This chatbot uses modern Natural Language Processing (NLP) techniques to answer questions from the documentation of Customer Data Platforms (CDPs) such as Segment, mParticle, Lytics, and Zeotap.
The CDP Knowledge Assistant is built using:
- Web scraping to collect documentation data.
- Sentence Transformers for embedding generation.
- HDBSCAN clustering to organize documentation into topics.
- Nearest Neighbors for efficient retrieval.
- Qwen 2.5B-Instruct for contextual, accurate answer generation.
Key features:
- Automated Web Scraping: Extracts and organizes documentation text.
- Contextual Query Answering: Retrieves relevant text chunks based on input queries.
- Clustered Data Organization: Groups similar topics for efficient information retrieval.
- Interactive Chatbot Experience: Real-time interaction using NLP-powered models.
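The retrieval feature can be sketched with a nearest-neighbor lookup. This minimal example uses toy 3-d vectors in place of real sentence embeddings (the project uses `all-MiniLM-L6-v2`, which produces 384-d vectors); all names and values here are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Pretend each row is the embedding of one documentation chunk.
chunk_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
])

# Build the index once, then query it per user question.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(chunk_embeddings)

# A query embedding pointing roughly toward chunk 3.
query = np.array([[0.6, 0.8, 0.0]])
distances, indices = index.kneighbors(query)
print(indices[0])  # ids of the most similar chunks
```

The retrieved chunks are then passed to the language model as context for answer generation.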
Follow these instructions to set up and run the CDP Knowledge Assistant locally.
Ensure you have the following installed:
- Python 3.8 or higher
- pip (Python package manager)
1. Clone the repository:

   git clone https://github.com/Tirthraj1605/CDP-Knowledge-Assistant.git
   cd CDP-Knowledge-Assistant
2. Install the required dependencies:

   pip install -r requirements.txt
3. Download the Qwen 2.5B-Instruct model and place it in the appropriate directory. You can use Hugging Face's `transformers` library to automate this.
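One way to automate the download is sketched below. Note the Hugging Face repo id and local directory layout are assumptions — the README only says "Qwen 2.5B-Instruct" — so adjust `MODEL_ID` and `local_model_dir` to match your setup:

```python
from pathlib import Path

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed repo id; verify on Hugging Face

def local_model_dir(base: str = "models") -> str:
    """Directory where the chatbot is assumed to look for the model files."""
    return str(Path(base) / MODEL_ID.split("/")[-1])

def download_model() -> None:
    """Fetch tokenizer and weights (requires network and several GB of disk)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(local_model_dir())
    AutoModelForCausalLM.from_pretrained(MODEL_ID).save_pretrained(local_model_dir())
```

Call `download_model()` once before the first run; afterwards the chatbot can load the model from the local directory.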
Repository files:
- `requirements.txt`: All required Python libraries.
- `chatbot_notebook.ipynb`: Jupyter notebook version of the chatbot.
- `streamlit_chatbot.py`: The chatbot with a Streamlit frontend.
- Data directory: Pre-scraped or clustered data, if applicable.
4. Run the chatbot:

   streamlit run streamlit_chatbot.py
5. Interact with the chatbot by typing your queries. Example:

   Your Query: How do I set up a new source in Segment?
6. Type `exit` to quit the chatbot.
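The interaction loop can be sketched as below. This is a minimal illustration, not the project's actual code; `answer` stands in for the (assumed) retrieval-plus-LLM call:

```python
def is_exit(user_input: str) -> bool:
    """True when the user wants to quit (case-insensitive 'exit')."""
    return user_input.strip().lower() == "exit"

def chat(answer, read_input=input):
    """Minimal REPL sketch: read a query, answer it, stop on 'exit'."""
    while True:
        query = read_input("Your Query: ")
        if is_exit(query):
            break
        print(answer(query))
```

In the Streamlit app the loop is handled by the web UI instead, but the exit check works the same way.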
CDP-Knowledge-Assistant/
|-- streamlit_chatbot.py # Main chatbot script
|-- chatbot_notebook.ipynb # Chatbot Notebook
|-- requirements.txt # Python dependencies
|-- README.md # Project description
|-- data/                 # Directory for data and embeddings, if applicable
Tech stack:
- Web Scraping: `BeautifulSoup`, `requests`
- Clustering: `HDBSCAN`
- Nearest Neighbor Search: `sklearn`
- Sentence Transformers: `all-MiniLM-L6-v2`
- Language Model: Qwen 2.5B-Instruct
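The scraping step in the stack above can be sketched with `BeautifulSoup` alone. In the real pipeline `requests` would fetch each documentation page first; the HTML snippet here is made up for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in for a page fetched with requests.get(url).text.
HTML = """<html><body>
<h1>Add a Source</h1>
<p>Open the Connections page.</p>
<p>Click Add Source and follow the prompts.</p>
</body></html>"""

soup = BeautifulSoup(HTML, "html.parser")
title = soup.h1.get_text(strip=True)
# Each paragraph becomes one text chunk for embedding and retrieval.
chunks = [p.get_text(strip=True) for p in soup.find_all("p")]
print(title, chunks)
```

Each extracted chunk is then embedded with the sentence transformer and indexed for nearest-neighbor retrieval.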