Welcome to the CDP Knowledge Assistant! This chatbot uses modern Natural Language Processing (NLP) techniques to answer questions from the documentation of Customer Data Platforms (CDPs) such as Segment, mParticle, Lytics, and Zeotap.
The CDP Knowledge Assistant is built using:
- Web scraping to collect documentation data.
- Sentence Transformers for embedding generation.
- HDBSCAN clustering to organize documentation into topics.
- Nearest Neighbors for efficient retrieval.
- Qwen 2.5B-Instruct for contextual, accurate answer generation.
Key features:
- Automated Web Scraping: Extracts and organizes documentation text.
- Contextual Query Answering: Retrieves relevant text chunks based on input queries.
- Clustered Data Organization: Groups similar topics for efficient information retrieval.
- Interactive Chatbot Experience: Real-time interaction using NLP-powered models.
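The retrieval feature can be sketched with a nearest-neighbor lookup. This minimal example uses toy 3-d vectors in place of real sentence embeddings (the project uses `all-MiniLM-L6-v2`, which produces 384-d vectors); all names and values here are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Pretend each row is the embedding of one documentation chunk.
chunk_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
])

# Build the index once, then query it per user question.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(chunk_embeddings)

# A query embedding pointing roughly toward chunk 3.
query = np.array([[0.6, 0.8, 0.0]])
distances, indices = index.kneighbors(query)
print(indices[0])  # ids of the most similar chunks
```

The retrieved chunks are then passed to the language model as context for answer generation.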
Follow these instructions to set up and run the CDP Knowledge Assistant locally.
Ensure you have the following installed:
- Python 3.8 or higher
- pip (Python package manager)
1. Clone the repository:

   git clone https://github.com/Tirthraj1605/CDP-Knowledge-Assistant.git
   cd CDP-Knowledge-Assistant
2. Install the required dependencies:

   pip install -r requirements.txt
3. Download the Qwen 2.5B-Instruct model and place it in the appropriate directory. You can use Hugging Face's `transformers` library to automate this.
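One way to automate the download is sketched below. Note the Hugging Face repo id and local directory layout are assumptions — the README only says "Qwen 2.5B-Instruct" — so adjust `MODEL_ID` and `local_model_dir` to match your setup:

```python
from pathlib import Path

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed repo id; verify on Hugging Face

def local_model_dir(base: str = "models") -> str:
    """Directory where the chatbot is assumed to look for the model files."""
    return str(Path(base) / MODEL_ID.split("/")[-1])

def download_model() -> None:
    """Fetch tokenizer and weights (requires network and several GB of disk)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(local_model_dir())
    AutoModelForCausalLM.from_pretrained(MODEL_ID).save_pretrained(local_model_dir())
```

Call `download_model()` once before the first run; afterwards the chatbot can load the model from the local directory.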
Repository files:
- `requirements.txt`: All required Python libraries.
- `chatbot_notebook.ipynb`: Jupyter notebook version of the chatbot.
- `streamlit_chatbot.py`: The chatbot with a Streamlit frontend.
- Data directory: Pre-scraped or clustered data, if applicable.
4. Run the chatbot:

   streamlit run streamlit_chatbot.py
5. Interact with the chatbot by typing your queries. Example:

   Your Query: How do I set up a new source in Segment?
6. Type `exit` to quit the chatbot.
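The interaction loop can be sketched as below. This is a minimal illustration, not the project's actual code; `answer` stands in for the (assumed) retrieval-plus-LLM call:

```python
def is_exit(user_input: str) -> bool:
    """True when the user wants to quit (case-insensitive 'exit')."""
    return user_input.strip().lower() == "exit"

def chat(answer, read_input=input):
    """Minimal REPL sketch: read a query, answer it, stop on 'exit'."""
    while True:
        query = read_input("Your Query: ")
        if is_exit(query):
            break
        print(answer(query))
```

In the Streamlit app the loop is handled by the web UI instead, but the exit check works the same way.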
CDP-Knowledge-Assistant/
|-- streamlit_chatbot.py # Main chatbot script
|-- chatbot_notebook.ipynb # Chatbot Notebook
|-- requirements.txt # Python dependencies
|-- README.md # Project description
|-- data/                 # Directory for data and embeddings, if applicable
Tech stack:
- Web Scraping: `BeautifulSoup`, `requests`
- Clustering: `HDBSCAN`
- Nearest Neighbor Search: `sklearn`
- Sentence Transformers: `all-MiniLM-L6-v2`
- Language Model: Qwen 2.5B-Instruct
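The scraping step in the stack above can be sketched with `BeautifulSoup` alone. In the real pipeline `requests` would fetch each documentation page first; the HTML snippet here is made up for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in for a page fetched with requests.get(url).text.
HTML = """<html><body>
<h1>Add a Source</h1>
<p>Open the Connections page.</p>
<p>Click Add Source and follow the prompts.</p>
</body></html>"""

soup = BeautifulSoup(HTML, "html.parser")
title = soup.h1.get_text(strip=True)
# Each paragraph becomes one text chunk for embedding and retrieval.
chunks = [p.get_text(strip=True) for p in soup.find_all("p")]
print(title, chunks)
```

Each extracted chunk is then embedded with the sentence transformer and indexed for nearest-neighbor retrieval.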