Mistral 7B is a 7.3B parameter model that:
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code while remaining good at English tasks
- Uses Grouped-query attention (GQA) for faster inference
- Uses Sliding Window Attention (SWA) to handle longer sequences at a lower cost
Download it from Hugging Face here: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main
Mistral AI, on comparing Mistral 7B to the Llama 2 family:
An interesting metric for comparing how models fare in the cost/performance plane is to compute "equivalent model sizes". On reasoning, comprehension, and STEM reasoning (MMLU), Mistral 7B performs equivalently to a Llama 2 model more than three times its size.
For more details on the model and how to fine-tune it for your requirements, see: https://mistral.ai/news/announcing-mistral-7b/. Credit to Mistral AI for the facts above.
- Creating a Python environment and activating it
- Installing and importing required libraries
- Setting up a GitHub repository
- Loading the PDF files (a sketch for each of these steps follows after this list)
- Splitting the text data into chunks using a TextSplitter
- Downloading the embeddings (I used Hugging Face sentence-transformers since it's open source) (To know more about text embeddings: https://medium.com/gopenai/text-embeddings-fa6e265312ce)
- Storing the embeddings in a vector database (I used FAISS, since ChromaDB keeps updating its docs and any walkthrough based on them would quickly go stale) (To know more about vector databases: https://medium.com/@ariondasad/vector-databases-777606ea437f)
- Importing the model and configuring it according to our requirements
- Using a retriever to fetch the relevant chunks and build a prompt for our query
- Finally, creating a Streamlit application as a proof of concept for our model
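To make these steps concrete, here are minimal sketches, assuming the classic LangChain APIs from around the time of writing; names like the pdfs/ folder are placeholders, not the exact paths used. First, loading the PDF files:

```python
# pip install langchain pypdf
from langchain.document_loaders import PyPDFDirectoryLoader

# Load every PDF found in the folder; "pdfs/" is a placeholder path.
loader = PyPDFDirectoryLoader("pdfs/")
documents = loader.load()  # one Document per page, with source metadata
print(f"Loaded {len(documents)} pages")
```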
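Next, splitting the pages into chunks. A sketch using RecursiveCharacterTextSplitter; the chunk_size and chunk_overlap values are reasonable starting points rather than tuned settings:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Overlapping chunks preserve context across chunk boundaries.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
```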
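For the embeddings, a sketch with LangChain's HuggingFaceEmbeddings wrapper around sentence-transformers; all-MiniLM-L6-v2 is a common open-source choice and an assumption here, not necessarily the exact model used:

```python
# pip install sentence-transformers
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # assumed model; any sentence-transformers model works
)
```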
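Storing the embeddings in FAISS: embed the chunks, build the index in memory, and save it locally so it can be reloaded later without re-embedding everything.

```python
# pip install faiss-cpu
from langchain.vectorstores import FAISS

db = FAISS.from_documents(chunks, embeddings)  # embeds every chunk and indexes it
db.save_local("faiss_index")                   # "faiss_index" is a placeholder folder name
```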
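For importing the model, one sketch is to load a quantized GGUF file from the Hugging Face repo linked above through the ctransformers wrapper; the specific quantization file and config values are assumptions you should adjust to your hardware:

```python
# pip install ctransformers
from langchain.llms import CTransformers

llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",     # repo linked above
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # assumed quantization file
    model_type="mistral",
    config={"max_new_tokens": 512, "temperature": 0.1},  # example settings
)
```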
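Wiring the retriever and the LLM together, a sketch with RetrievalQA: the retriever pulls the top-k most similar chunks for the query and "stuffs" them into the prompt sent to the model. k=2 is an example value:

```python
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                                # put retrieved chunks directly into the prompt
    retriever=db.as_retriever(search_kwargs={"k": 2}),  # k is an example value
)
print(qa_chain.run("What is this document about?"))
```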
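Finally, a minimal Streamlit proof of concept. build_qa_chain() is a hypothetical helper name standing in for the steps above assembled into one function:

```python
# pip install streamlit   (run with: streamlit run app.py)
import streamlit as st

st.title("Chat with your PDFs")

query = st.text_input("Ask a question about the documents")
if st.button("Submit") and query:
    # build_qa_chain() is a hypothetical helper wrapping the steps above
    qa_chain = build_qa_chain()
    st.write(qa_chain.run(query))
```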