support top 10 embedding models on the huggingface leaderboard (#571)
* add supported models

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* add doc

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

* polish doc

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>

---------

Signed-off-by: XuhuiRen <xuhui.ren@intel.com>
Signed-off-by: XuhuiRen <44249229+XuhuiRen@users.noreply.github.com>
XuhuiRen authored Oct 29, 2023
1 parent 1447e6f commit 3b52e7b
Showing 3 changed files with 40 additions and 9 deletions.
@@ -11,6 +11,20 @@ The Neural Chat API offers an easy way to create and utilize chatbot models whil
1. Dense Retrieval: This method is based on document embeddings, enhancing the accuracy of retrieval. Learn more about it [here](https://medium.com/@aikho/deep-learning-in-information-retrieval-part-ii-dense-retrieval-1f9fecb47de9).
2. Sparse Retrieval: Using TF-IDF, this method efficiently retrieves relevant information. Explore this approach in detail [here](https://medium.com/itnext/deep-learning-in-information-retrieval-part-i-introduction-and-sparse-retrieval-12de0423a0b9). A minimal TF-IDF sketch follows this list.
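
To make the sparse style concrete, here is a minimal, self-contained sketch using scikit-learn's TF-IDF; it is illustrative only and is not the plugin's internal implementation:

```python
# Illustrative only: sparse retrieval scored with scikit-learn's TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Dense retrieval compares learned document embeddings.",
    "Sparse retrieval scores TF-IDF term overlap between query and documents.",
]
query = "How does TF-IDF retrieval work?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # one sparse vector per document
query_vector = vectorizer.transform([query])   # project the query into the same space
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(docs[scores.argmax()])                   # highest-scoring document is retrieved
```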

We provide support for a wide range of pre-trained embedding models featured on the [HuggingFace text embedding leaderboard](https://huggingface.co/spaces/mteb/leaderboard). Users can conveniently choose an embedding model in two ways: either specify the model by its HuggingFace name, or download a model and save it under the default name. Below is a list of some supported embedding models available in our plugin; users can select their preferred model based on factors such as model size, embedding dimensions, maximum sequence length, and average ranking score. A short selection sketch follows the table.
| Model | Model Size (GB) | Embedding Dimensions | Max Sequence Length | Average Ranking Score |
| :----: | :----: | :----: | :----: | :----: |
| [bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1.34 | 1024 | 512 | 64.23 |
| [bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 0.44 | 768 | 512 | 63.55 |
| [gte-large](https://huggingface.co/thenlper/gte-large) | 0.67 | 1024 | 512 | 63.13 |
| [stella-base-en-v2](https://huggingface.co/infgrad/stella-base-en-v2) | 0.22 | 768 | 512 | 62.61 |
| [gte-base](https://huggingface.co/thenlper/gte-base) | 0.44 | 768 | 512 | 62.39 |
| [e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | 1.34 | 1024 | 512 | 62.25 |
| [instructor-xl](https://huggingface.co/hkunlp/instructor-xl) | 4.96 | 768 | 512 | 61.79 |
| [instructor-large](https://huggingface.co/hkunlp/instructor-large) | 1.34 | 768 | 512 | 61.59 |
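
For instance, a minimal sketch of loading a `bge` model by its HuggingFace name, mirroring the dense-retrieval setup added in the code changes below:

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Same arguments as the plugin's "bge" branch in the diff below.
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
    query_instruction="Represent this sentence for searching relevant passages:",
)
vector = embeddings.embed_query("What is dense retrieval?")
print(len(vector))  # 768, matching the table's embedding dimensions
```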

In addition, our plugin seamlessly integrates Google's online PaLM 2 embedding model. To set up this feature, please follow the [Google official guideline](https://developers.generativeai.google/tutorials/embeddings_quickstart) to obtain your API key. Once you have your API key, you can activate the PaLM 2 embedding service by setting the `embedding_model` parameter to 'Google', as sketched below.
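
A minimal sketch, assuming the key is supplied through the `GOOGLE_API_KEY` environment variable that langchain's `GooglePalmEmbeddings` falls back to when no key is passed explicitly:

```python
import os
from langchain.embeddings import GooglePalmEmbeddings

# Assumption: the key obtained from the Google guideline is exported as GOOGLE_API_KEY.
os.environ["GOOGLE_API_KEY"] = "<your-api-key>"

# Inside the plugin, embedding_model="Google" routes to this same class
# (see the "Google" branch in the code changes below).
embeddings = GooglePalmEmbeddings()
vector = embeddings.embed_query("What is dense retrieval?")
```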

The workflow of this plugin consists of three main operations: document indexing, intent detection, and retrieval. The `Agent_QA` initializes itself using the provided `input_path` to construct a local database. During a conversation, the user's query is first passed to the `IntentDetector` to determine whether the user intends to engage in chitchat or seek answers to specific questions. If the `IntentDetector` determines that the user's query requires an answer, the retriever is activated to search the database using the user's query. The documents retrieved from the database serve as reference context in the input prompt, assisting in generating responses using the Large Language Models (LLMs).
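
Putting the workflow together, a hedged usage sketch based on the `Agent_QA` signature shown in the last diff hunk below (the import path is hypothetical and will differ in the actual package):

```python
# Hypothetical import path; adjust to the plugin's real module layout.
from retrieval_plugin import Agent_QA

agent = Agent_QA(
    input_path="./docs",                      # local folder (or single file) to index
    persist_dir="./output",                   # where the local database is persisted
    embedding_model="BAAI/bge-base-en-v1.5",  # the new default from this commit
    retrieval_type="dense",
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 5},
)
```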

# Usage
@@ -51,7 +65,7 @@
process [bool]: Select whether to split overly long documents into small chunks. Default to True.

input_path [str]: The user's local path to a file folder or a specific file. The code checks whether the path is a folder or a file: if it is a folder, all files in the folder are processed; if it is a file, only that single file is processed.

-embedding_model [str]: The user-specified document embedding model for dense retrieval. The user can select a specific embedding model from "https://huggingface.co/spaces/mteb/leaderboard". Default to "hkunlp/instructor-large".
+embedding_model [str]: The user-specified document embedding model for dense retrieval. The user can select a specific embedding model from "https://huggingface.co/spaces/mteb/leaderboard". Default to "BAAI/bge-base-en-v1.5".

max_length [int]: The max context length in the processed chunks. Should be combined with "process". Default to "512".

@@ -20,14 +20,15 @@
from haystack.document_stores import InMemoryDocumentStore, ElasticsearchDocumentStore
from langchain.vectorstores.chroma import Chroma
from langchain.docstore.document import Document
-from langchain.embeddings import HuggingFaceInstructEmbeddings
+from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings, \
+    HuggingFaceBgeEmbeddings, GooglePalmEmbeddings
from haystack.schema import Document as SDocument
from .context_utils import load_unstructured_data, laod_structured_data, get_chuck_data


class DocumentIndexing:
    def __init__(self, retrieval_type="dense", document_store=None, persist_dir="./output",
-                process=True, embedding_model="hkunlp/instructor-large", max_length=512,
+                process=True, embedding_model="BAAI/bge-base-en-v1.5", max_length=512,
                 index_name=None):
"""
Wrapper for document indexing. Support dense and sparse indexing method.
Expand All @@ -36,10 +37,28 @@ def __init__(self, retrieval_type="dense", document_store=None, persist_dir="./o
        self.document_store = document_store
        self.process = process
        self.persist_dir = persist_dir
        self.embedding_model = embedding_model
        self.max_length = max_length
        self.index_name = index_name

+        # Choose the embedding backend from the model name; "Google" selects
+        # the online PaLM 2 service, everything else loads a local HF model.
+        try:
+            if "instruct" in embedding_model:
+                self.embeddings = HuggingFaceInstructEmbeddings(model_name=embedding_model)
+            elif "bge" in embedding_model:
+                self.embeddings = HuggingFaceBgeEmbeddings(
+                    model_name=embedding_model,
+                    encode_kwargs={"normalize_embeddings": True},
+                    query_instruction="Represent this sentence for searching relevant passages:")
+            elif embedding_model == "Google":
+                self.embeddings = GooglePalmEmbeddings()
+            else:
+                self.embeddings = HuggingFaceEmbeddings(
+                    model_name=embedding_model,
+                    encode_kwargs={"normalize_embeddings": True},
+                )
+        except Exception:
+            print("Please select a proper embedding model")

    def parse_document(self, input):
        """
@@ -83,8 +102,7 @@ def batch_parse_document(self, input):

    def load(self, input):
        if self.retrieval_type == "dense":
-            embedding = HuggingFaceInstructEmbeddings(model_name=self.embedding_model)
-            vectordb = Chroma(persist_directory=self.persist_dir, embedding_function=embedding)
+            vectordb = Chroma(persist_directory=self.persist_dir, embedding_function=self.embeddings)
        else:
            if self.document_store == "inmemory":
                vectordb = self.KB_construct(input)
Expand Down Expand Up @@ -114,8 +132,7 @@ def KB_construct(self, input):
            new_doc = Document(page_content=data, metadata=metadata)
            documents.append(new_doc)
        assert documents != [], "The given file/files cannot be loaded."
-        embedding = HuggingFaceInstructEmbeddings(model_name=self.embedding_model)
-        vectordb = Chroma.from_documents(documents=documents, embedding=embedding,
+        vectordb = Chroma.from_documents(documents=documents, embedding=self.embeddings,
                                         persist_directory=self.persist_dir)
        vectordb.persist()
        print("The local knowledge base has been successfully built!")
@@ -24,7 +24,7 @@

class Agent_QA():
    def __init__(self, persist_dir="./output", process=True, input_path=None,
-                embedding_model="hkunlp/instructor-large", max_length=2048, retrieval_type="dense",
+                embedding_model="BAAI/bge-base-en-v1.5", max_length=2048, retrieval_type="dense",
                 document_store=None, top_k=1, search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5},
                 append=True, index_name="elastic_index_1", append_path=None,
                 response_template="Please reformat your query to regenerate the answer.",
