RAG Structured Output Generation ChatBot

🌟. ❓ Problem Statement:

  1. Large Language Models (LLMs): Utilize quantized versions of LLMs like llama2, open-llama, or falcon that can run on Google Colab.

  2. Fine-tuning/Prompt Engineering:

    • Weather Service:

      • Identify location from user queries (e.g., "weather in Chennai").

      • Respond with "Service: Weather, Location: Chennai".

      • If the location is missing, prompt the user for it and then respond (e.g., "Service: Weather, Location: Chennai").

      • Support stateful interaction with multiple query-response exchanges.

    • News Service:

      • Extract location from queries about news (e.g., "news for India").

      • Respond with "Location: India" or "Location: USA" depending on the location found.

      • If the location is missing, prompt the user for it and then respond (e.g., "Service: News, Location: India").

  3. Generalization: Design the system to work with any service and identify the relevant service and location from user prompts.

Specifically:

  • Develop a generic response format: "Service: <service>, Location: <location>".

  • Identify weather-related prompts (e.g., "weather in Bangalore", "climate in Bangalore") and respond with "Service: Weather, Location: Bangalore".

  • Identify news-related prompts (e.g., "news for India") and respond with "Service: News, Location: India".

  • Handle missing location information with follow-up prompts to the user.

Desired Outcome:

  • A system that can dynamically identify services and locations from user prompts and provide appropriate responses.

  • The system should be generalizable to work with any service and location.

  • The system should be lightweight and run efficiently on Google Colab.

🌟. πŸ” Model Used:

  • We are using the GPTQ-quantized version of the openchat_3.5 model, published by "TheBloke".

    • GPTQ is a post-training quantization (PTQ) method, used here for 4-bit quantization.
  • The original openchat_3.5 model is itself a fine-tuned Mistral model.

    • πŸ”₯ The first 7B model to achieve results comparable with ChatGPT (March)! πŸ”₯

    • πŸ€– An open-source model scoring 7.81 on MT-Bench, outperforming 70B models πŸ€–

  • The original openchat_3.5 model requires a consumer GPU with 24GB of RAM, while the quantized version consumes only ~6GB of GPU VRAM.

  • This is a 4-bit quantized, 7-billion-parameter model with a sequence length of 4096.
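
As a minimal sketch of loading this model (assuming the package versions listed under Requirements below; the model id is TheBloke's published GPTQ build):

```python
# Minimal sketch: load the 4-bit GPTQ build of openchat_3.5 via transformers.
# Assumes transformers + optimum + auto-gptq are installed (see Requirements).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/openchat_3.5-GPTQ"  # TheBloke's published GPTQ build
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# OpenChat prompt template (see the Conversation-Flow Chart section below):
prompt = "GPT4 Correct User: what is the weather in Chennai?<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```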

🏷️ Model Benchmarks:

| Model | # Params | Average | MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenChat-3.5 | 7B | 61.6 | 7.81 | 47.4 | 47.6 | 59.1 | 64.3 | 55.5 | 63.5 | 77.3 |
| ChatGPT (March)* | ? | 61.5 | 7.94 | 47.1 | 47.6 | 57.7 | 67.3 | 48.1 | 70.1 | 74.9 |
| Mistral | 7B | - | 6.84 | 38.0 | 39.0 | - | 60.1 | 30.5 | - | 52.2 |
| Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 41.7 | 49.7 | 62.3 | 63.7 | 73.2 | 41.4 | 82.3 |
| | | | WizardLM 70B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | WizardCoder 34B | Flan-T5 11B | MetaMath 70B |

🌟. πŸ’» Requirements

  1. System Requirements

    • ~6GB of system RAM

    • ~6GB of GPU VRAM

Important

πŸ“Œ A GPU with 6GB of VRAM is mandatory for running the inference. It works fine on Google Colab with a T4 GPU enabled.


Fig.1 - Resource usage in a Google Colab notebook environment with a T4 GPU
  2. Software Requirements

    • python-version: 3.10

    • CUDA-version: 11.8

⬇️ python packages

```bash
pip install accelerate==0.25.0
pip install auto-gptq --extra-index-url "https://huggingface.github.io/autogptq-index/whl/cu118/"
pip install bitsandbytes==0.41.3.post2
pip install einops==0.7.0
pip install langchain==0.0.349
pip install optimum==1.15.0
pip install tiktoken==0.5.2
pip install "torch @ https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=a81b554184492005543ddc32e96469f9369d778dedd195d73bda9bed407d6589"
pip install "torchaudio @ https://download.pytorch.org/whl/cu118/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=cdfd0a129406155eee595f408cafbb92589652da4090d1d2040f5453d4cae71f"
pip install "torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=033712f65d45afe806676c4129dfe601ad1321d9e092df62b15847c02d4061dc"
pip install transformers==4.35.2
```
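
After installation, a quick sanity check (a minimal sketch, not part of the repo) confirms that the CUDA 11.8 build of torch can see the GPU:

```python
# Environment sanity check: verify the CUDA build of torch and GPU visibility.
import torch

print(torch.__version__)            # expect 2.1.0+cu118
print(torch.cuda.is_available())    # expect True on a T4-enabled Colab runtime
print(torch.cuda.get_device_name(0))
```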

🌟. πŸ“– Explained:

  1. πŸ› οΈ Tools used:

  • transformers: "for importing and using the LLM from πŸ€—Huggingface."
  • auto-gptq: "for working with quantized models."
  • langchain: πŸ¦œπŸ”— "for advanced prompting."
  2. πŸ’¬ Conversation-Flow Chart:

Fig.2 - conversation flowchart of BOT interaction
  • This section describes the process of handling user input and generating structured output.

    I] User Input and Prompt:

    • When the bot receives user input, it is combined with a specific prompt.

    • This prompt instructs the LLM to generate a structured output.

    • For example, if the user input is: USER: what is the weather in Chennai?

    • πŸ“š Prompt template: OpenChat

       GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:
      
    • The {prompt} combined with the user input will be:

      GPT4 Correct User: As a Named Entity Recognition and Intent Classification Expert, your task is to analyze questions like the following '###user_input': 
      ###user_input: what is the weather in Chennai? 
      You are required to perform two main tasks based on the '###user_input': 
      #Task 1: Classify the '###user_input' into one of the predefined intents ['Weather', 'News']. IMPORTANT: If no clear match to given intents is found, categorize the intent as "Other". 
      #Task 2: Extract any geographical or location-related entities present in the '###user_input'. IMPORTANT: If no specific location is mentioned, label the location as "Other". 
      ###INSTRUCTIONS: Do NOT add any clarifying information. Output MUST follow the schema below. The output should be formatted as a JSON instance that conforms to the JSON schema below. 
      As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} 
      the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted. 
      Here is the output schema: ''' {"properties": {"service_intent": {"title": "Service Intent", "description": "This field stores the intent classified from the user input.", "type": "string"}, "location": {"title": "Location", "description": "This field stores the location value extracted from the user input.", "type": "string"}}, "required": ["service_intent", "location"]} '''
      <|end_of_turn|>GPT4 Correct Assistant:
      • This prompt is generated with πŸ¦œπŸ”—langchain combined with some custom instructions; a minimal sketch follows below.
      • The required services can be passed in dynamically; here they are ['Weather', 'News'].
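
      The schema block in the prompt above matches what langchain's PydanticOutputParser emits from get_format_instructions(). A minimal sketch of assembling such a prompt, assuming langchain==0.0.349 (the class name and template wording here are illustrative, not the repo's exact code):

```python
# Illustrative sketch: build the structured-output prompt with langchain.
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field

class ServiceQuery(BaseModel):  # hypothetical model name
    service_intent: str = Field(description="This field stores the intent classified from the user input.")
    location: str = Field(description="This field stores the location value extracted from the user input.")

parser = PydanticOutputParser(pydantic_object=ServiceQuery)

template = PromptTemplate(
    template=(
        "As a Named Entity Recognition and Intent Classification Expert, "
        "classify '###user_input' into one of the intents {services} "
        "(use \"Other\" if none match) and extract any location entity "
        "(use \"Other\" if none is mentioned).\n"
        "###user_input: {user_input}\n"
        "{format_instructions}"
    ),
    input_variables=["user_input", "services"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# The services list is passed dynamically, e.g. ['Weather', 'News']:
print(template.format(user_input="what is the weather in Chennai?", services="['Weather', 'News']"))
```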

      II] LLM Output Generation: πŸ€–

      • The combined user input and prompt are passed to the LLM for processing.
      • The LLM attempts to generate structured output, aiming for JSON format.

      III] Output Validation: πŸ‘€

      • checking for structured output format:

        • If the output is not valid JSON, an iterative process begins.
        • The user input is passed again to the LLM with the same prompt.
        • This loop continues for a pre-defined number of iterations (N).
        • This iterative process helps filter out rare glitches in the LLM output.
      • checking for missing key values:

        • Following the generation of well-structured JSON data by the LLM, an additional validation step is performed. This step ensures that the essential keys, "service_intent" and "location", contain valid non-null values.
        • Service Key: The value of the "service_intent" key is checked for null or emptiness.
        • Location Key: Similarly, the value of the "location" key is checked for null or emptiness.
      • πŸ§‘β€βš•οΈ Follow-up Questioning:

        If either key (service or location) lacks a valid value:

        • βš™οΈ Service

          • The bot initiates a follow-up questioning flow specifically designed to elicit the missing service information from the user.
          • This may involve asking the user directly what service they are seeking information about.
        • πŸ“ Location

          • A similar follow-up questioning flow is initiated if the "location" key lacks a valid value.
          • The bot prompts the user for the desired location information.
      • 🚦 Resuming the Process:

        • Once the user provides the missing information, the original process resumes.
        • The combined user input (including service and location information) is again passed to the LLM with the specific prompt for structured output generation.
        • The validation and follow-up questioning steps repeat as needed until all essential key-value pairs are obtained and a valid structured output is generated.
      • πŸ”₯ This iterative approach helps ensure that the bot generates a complete and accurate JSON response every time, even if the user initially forgets to provide all the necessary information. A sketch of the loop follows below.

      • πŸ”₯ We also pass the previous chat turn along with the user input, in order to steer the LLM toward better-quality results.
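
Putting steps II] and III] together, a minimal sketch of the validate-and-retry loop (function names and the retry limit are illustrative assumptions, not the repo's exact code):

```python
# Illustrative sketch of the validate-and-retry loop (not the repo's exact code).
import json

MAX_RETRIES = 3  # assumed value for the pre-defined iteration count N

def ask_llm(user_input: str, history: str = "") -> str:
    """Placeholder: formats the prompt shown above (plus the previous
    chat turn as history) and calls the LLM."""
    raise NotImplementedError

def get_structured_response(user_input: str, history: str = "") -> dict | None:
    # Step III: retry until the LLM emits valid JSON, at most MAX_RETRIES times.
    for _ in range(MAX_RETRIES):
        raw = ask_llm(user_input, history)
        try:
            data = json.loads(raw)
            break
        except json.JSONDecodeError:
            continue  # rare glitch in LLM output: re-ask with the same prompt
    else:
        return None  # gave up after N attempts

    # Follow-up questioning for missing key values.
    if data.get("service_intent") in (None, "", "Other"):
        followup = input("Which service do you need (e.g. Weather or News)? ")
        return get_structured_response(f"{user_input} {followup}", history=raw)
    if data.get("location") in (None, "", "Other"):
        followup = input("Which location are you interested in? ")
        return get_structured_response(f"{user_input} {followup}", history=raw)
    return data
```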

  3. ⚑User interactions: ⚑

    • Direct query:-


    Fig.3 - user directly asking the complete query in the first try itself

  • conversation with follow-up questioning when service information is missing:-

    Fig.4 - user gives query without specifying the service they need

  • conversation with follow-up questioning when location information is missing:-

    Fig.5 - user gives query without specifying the location details

  • conversation with follow-up questioning when both location & service information are missing:-

    Fig.6 - user starts the conversation with a greeting only

🌟. πŸš€ Hands On:

CLI

  • Clone the repo, change into it, and install the requirements:

```bash
git clone https://github.com/Ribin-Baby/RAG-json-responderV1.git
cd ./RAG-json-responderV1
pip install -r requirements.txt
```

  • πŸ’₯ Run

Note

Needs a GPU with 6GB of VRAM and CUDA 11.8 installed. Best run on Colab with a T4 GPU.

```bash
python bot.py --s '["News", "Weather"]'
```
  • We are not limited to ["News", "Weather"]: any services we need can be passed dynamically, e.g. ["Game", "Law"] or ["Economics", "Law", "Weather"].
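
For a query such as "what is the weather in Chennai?", the bot aims to return JSON conforming to the schema in the prompt above, along these lines (an illustrative example, not a captured log):

```json
{"service_intent": "Weather", "location": "Chennai"}
```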

UI

  • Open the bot_notebook.ipynb file in the Google Colab environment, change the runtime to GPU, and run it cell-by-cell.
  • It may be necessary to restart the Colab kernel after installing the packages. To do that, run the cell below.

```python
# Restarts the Colab kernel so the newly installed packages are picked up.
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
```

  • Upon running the last cell, you will get an interactive UI to chat with the BoT.

πŸ‘ The END

About

A RAG-based chatbot that responds in a structured JSON format: the output is always JSON with the expected key-value pairs.
