
v3.3.0 - Massive CPU speedup with OpenVINO int8 quantization; Training with Prompts for stronger models; NanoBEIR IR evaluation; PEFT compatibility; Transformers v4.46.0 compatibility

Released by @tomaarsen on 11 Nov 12:02

This release brings a 4x CPU speedup with OpenVINO int8 static quantization, training with prompts for a free performance boost, convenient evaluation on NanoBEIR (a subset of a strong Information Retrieval benchmark), PEFT compatibility by easily adding/loading adapters, Transformers v4.46.0 compatibility, and Python 3.8 deprecation.

Install this version with:

# Training + Inference
pip install sentence-transformers[train]==3.3.0

# Inference only, use one of:
pip install sentence-transformers==3.3.0
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0

OpenVINO int8 static quantization (#3025)

We introduce int8 static quantization using OpenVINO, a highly performant solution that outperforms all other current backends by a wide margin in inference speed, at a minimal loss in accuracy. Updated benchmarks are included in the efficiency documentation linked at the end of this section.

Quantizing directly to the Hugging Face Hub

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

# 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)

You can immediately use the model, even before it's merged, by using the revision argument:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}"
)

And once it's merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)

Quantizing locally

You can also quantize a model and save it locally:

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig() # <- You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")

And after quantizing, you can load it like so:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)

All original Sentence Transformer models already include these new openvino_model_qint8_quantized.xml files, so you can load them directly without exporting them yourself! I would recommend opening pull requests for other models on Hugging Face that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

Training with Prompts (#2964)

Many modern embedding models are trained with "instructions" or "prompts" following the INSTRUCTOR paper. These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the mixedbread-ai/mxbai-embed-large-v1 model was trained with "Represent this sentence for searching relevant passages: " as the prompt for all queries. This prompt is stored in the model configuration under the prompt name "query", so users can specify that prompt_name in model.encode:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")
# or
# query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
# => tensor([[0.7594, 0.7560, 0.4674]])

Various papers (INSTRUCTOR, BGE) show that including prompts or instructions both during training and inference results in stronger performance. As of this release, it's now possible to easily train with prompts in Sentence Transformers with just one extra training argument: prompts. There are 4 accepted formats for it (a minimal training sketch follows the list):

  1. str: A single prompt to use for all columns in all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts="text: ",
        ...,
    )
  2. Dict[str, str]: A dictionary mapping column names to prompts, applied to all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "query": "query: ",
            "answer": "document: ",
        },
        ...,
    )
  3. Dict[str, str]: A dictionary mapping dataset names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": "Represent this text for semantic similarity search: ",
            "nq": "Represent this text for retrieval: ",
        },
        ...,
    )
  4. Dict[str, Dict[str, str]]: A dictionary mapping dataset names to dictionaries mapping column names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": {
                "sentence1": "sts: ",
                "sentence2": "sts: ",
            },
            "nq": {
                "query": "query: ",
                "document": "document: ",
            },
        },
        ...,
    )
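
As a minimal end-to-end sketch, here is how the second format (column-name prompts) plugs into a training run. The base model, tiny dataset, loss, and output directory below are illustrative assumptions, not the setup used for the experiments described next:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# 1. Load a base model and a tiny illustrative (query, answer) dataset
model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "query": ["What are Pandas?"],
    "answer": ["Pandas is a software library for data manipulation and analysis in Python."],
})

# 2. Map column names to prompts (format 2 above); they are prepended to each column's texts
args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-prompts-demo",
    prompts={
        "query": "query: ",
        "answer": "document: ",
    },
)

# 3. Train as usual
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()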

I've trained models with and without prompts for 2 base models: mpnet-base and bert-base-uncased.

For both base models, the model with prompts consistently outperformed the baseline. After training, the models with prompts showed a 0.66% and 0.90% relative improvement on NDCG@10, respectively, at no extra cost.

(Figures: NanoBEIR results for the mpnet-base and bert-base-uncased test runs.)

NanoBEIR Evaluator integration (#2966)

This update introduces the new NanoBEIREvaluator, a simple evaluator that assesses your model against NanoBEIR, a collection of subsets of 13 BEIR datasets. BEIR corresponds to the retrieval tab of MTEB and is commonly seen as a valuable indicator of general-purpose information retrieval performance.

With the NanoBEIREvaluator, you can easily evaluate your models on a much faster benchmark that should give similar insights into performance as the full BEIR. You can use it like so:

from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers import SentenceTransformer
import logging

# Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

# 1. Load a model
model = SentenceTransformer("all-mpnet-base-v2", backend="onnx")

# 2. Initialize the evaluator
evaluator = NanoBEIREvaluator()

# 3. Call the evaluator to get a dictionary of metric names to values
results = evaluator(model)
"""
NanoBEIR Evaluation of the model on ['climatefever', 'dbpedia', 'fever', 'fiqa2018', 'hotpotqa', 'msmarco', 'nfcorpus', 'nq', 'quoraretrieval', 'scidocs', 'arguana', 'scifact', 'touche2020'] dataset:
Evaluating NanoClimateFEVER
Information Retrieval Evaluation of the model on the NanoClimateFEVER dataset:
Queries: 50
Corpus: 3408

Score-Function: cosine
Accuracy@1: 24.00%
Accuracy@3: 36.00%
Accuracy@5: 44.00%
Accuracy@10: 66.00%
Precision@1: 24.00%
Precision@3: 14.00%
Precision@5: 10.40%
Precision@10: 9.00%
Recall@1: 9.50%
Recall@3: 17.33%
Recall@5: 22.90%
Recall@10: 36.07%
MRR@10: 0.3311
NDCG@10: 0.2618
MAP@100: 0.1982
Evaluating NanoDBPedia
Information Retrieval Evaluation of the model on the NanoDBPedia dataset:
Queries: 50
Corpus: 6045

Score-Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 88.00%
Accuracy@5: 88.00%
Accuracy@10: 88.00%
Precision@1: 66.00%
Precision@3: 58.00%
Precision@5: 52.00%
Precision@10: 43.60%
Recall@1: 6.87%
Recall@3: 14.70%
Recall@5: 20.30%
Recall@10: 27.62%
MRR@10: 0.7533
NDCG@10: 0.5384
MAP@100: 0.3796
Evaluating NanoFEVER
Information Retrieval Evaluation of the model on the NanoFEVER dataset:
Queries: 50
Corpus: 4996

... (truncated for brevity)

Aggregated for Score Function: cosine
Accuracy@1: 52.87%
Accuracy@3: 71.35%
Accuracy@5: 78.45%
Accuracy@10: 85.07%
Precision@1: 52.87%
Recall@1: 30.28%
Precision@3: 33.78%
Recall@3: 47.93%
Precision@5: 26.23%
Recall@5: 55.04%
Precision@10: 18.07%
Recall@10: 62.54%
MRR@10: 0.6334
NDCG@10: 0.5758
"""

# 4. Print the results
print(evaluator.primary_metric)
# => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])
# => 0.5758124378869705

Advanced Usage

You can also specify a subset of datasets, and you can specify query and/or corpus prompts, if your model uses them. For example:

import logging
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

model = SentenceTransformer('intfloat/multilingual-e5-large-instruct')

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\\nQuery: "
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoQuoraRetrieval
Information Retrieval Evaluation of the model on the NanoQuoraRetrieval dataset:
Queries: 50
Corpus: 5046

Score-Function: cosine
Accuracy@1: 92.00%
Accuracy@3: 98.00%
Accuracy@5: 100.00%
Accuracy@10: 100.00%
Precision@1: 92.00%
Precision@3: 40.67%
Precision@5: 26.00%
Precision@10: 14.00%
Recall@1: 81.73%
Recall@3: 94.20%
Recall@5: 97.93%
Recall@10: 100.00%
MRR@10: 0.9540
NDCG@10: 0.9597
MAP@100: 0.9395

Evaluating NanoMSMARCO
Information Retrieval Evaluation of the model on the NanoMSMARCO dataset:
Queries: 50
Corpus: 5043

Score-Function: cosine
Accuracy@1: 40.00%
Accuracy@3: 74.00%
Accuracy@5: 78.00%
Accuracy@10: 88.00%
Precision@1: 40.00%
Precision@3: 24.67%
Precision@5: 15.60%
Precision@10: 8.80%
Recall@1: 40.00%
Recall@3: 74.00%
Recall@5: 78.00%
Recall@10: 88.00%
MRR@10: 0.5849
NDCG@10: 0.6572
MAP@100: 0.5892
Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 86.00%
Accuracy@5: 89.00%
Accuracy@10: 94.00%
Precision@1: 66.00%
Recall@1: 60.87%
Precision@3: 32.67%
Recall@3: 84.10%
Precision@5: 20.80%
Recall@5: 87.97%
Precision@10: 11.40%
Recall@10: 94.00%
MRR@10: 0.7694
NDCG@10: 0.8085
'''
print(evaluator.primary_metric)
# => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])
# => 0.8084508771660436

PEFT compatibility (#3000, #2980, #3046)

Sentence Transformers has been integrated much more closely with PEFT. Notably, we introduce new methods, such as add_adapter and load_adapter, that allow you to add new PEFT adapters or load pretrained ones. For example:

Adding an adapter

from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData
from peft import LoraConfig, TaskType

# 1. Load a model to finetune, with (optional) model card data
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="all-MiniLM-L6-v2 adapter finetuned on GooAQ pairs",
    ),
)

# 2. Create a LoRA adapter for the model & add it
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

# Proceed as usual... See https://sbert.net/docs/sentence_transformer/training_overview.html
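
As a quick sanity check before training (a minimal sketch, assuming the adapter from the snippet above has been added), you can verify that only the LoRA parameters are trainable:

# With the adapter added, the base weights are frozen and only the LoRA
# weights should require gradients
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")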

Loading a pretrained adapter

Given sentence-transformers-testing/stsb-bert-tiny-lora as a small adapter model (the adapter_model.safetensors file is only 33.8kB!) on top of sentence-transformers-testing/stsb-bert-tiny-safetensors, you can either load this adapter directly:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)
# (2, 128)

Or you can load the original model and load the adapter into it:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")
model.load_adapter("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)
# (2, 128)

Transformers v4.46.0 compatibility (#3026, #3035, #3037, #3038)

The recent transformers v4.46.0 update introduced a few changes that were incompatible with Sentence Transformers. For example:

  • Use "processing_class" argument instead of "tokenizers"
  • Add a num_items_in_batch argument to the compute_loss method in the Trainer
  • Adding a ValueError if eval_dataset is None while eval_strategy is not "no" (this should be possible in Sentence Transformers, as we accept evaluating with just an evaluator as well)

These issues and deprecation warnings have been resolved.
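
Concretely, training with only an evaluator and no eval_dataset keeps working. A minimal sketch (the model, dataset, loss, and output directory here are illustrative assumptions):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("all-MiniLM-L6-v2")
train_dataset = Dataset.from_dict({
    "query": ["What are Pandas?"],
    "answer": ["Pandas is a software library for data analysis in Python."],
})

# No eval_dataset is passed: evaluation is driven entirely by the evaluator
args = SentenceTransformerTrainingArguments(
    output_dir="models/minilm-evaluator-only-demo",
    eval_strategy="steps",
    eval_steps=100,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
    evaluator=NanoBEIREvaluator(dataset_names=["MSMARCO"]),
)
trainer.train()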

Drop Python 3.8 support (#3033)

Given that Python 3.8 has now reached its end of life, Sentence Transformers will no longer support it.

All Changes

  • [peft] If AutoModel is wrapped with PEFT for prompt learning, then extend the attention mask by @tomaarsen in #3000
  • [integration] Add support for Transformers v4.46.0 by @tomaarsen in #3026
  • add an ImportError to tell the user that datasets must be installed to fit a model by @h4c5 in #3020
  • [feat] Integrate NanoBeIR datasets; use model.similarity by default in evaluators by @ArthurCamara in #2966
  • Fix model name typo in example by @programmer-ke in #3028
  • Support OpenVINO int8 static quantization by @l-bat in #3025
  • [fix] Avoid passing eval_dataset=None to transformers due to >=v4.46.0 crash by @tomaarsen in #3035
  • [docs] Update the dated example in the NanoBEIREvaluator by @tomaarsen in #3034
  • [deprecate] Drop Python 3.8 support due to EOL by @tomaarsen in #3033
  • [tests] Remove evaluation_steps from model.fit test without evaluator by @tomaarsen in #3037
  • [fix] Fix loading pre-exported OV/ONNX model if export=False by @tomaarsen in #3036
  • [chore] If Transformers 4.46.0, use processing_class instead of tokenizer when saving by @tomaarsen in #3038
  • [docs] Add some missing docs for include_prompt in Pooling by @tomaarsen in #3042
  • [feat] Trainer with prompts and prompt masking by @ArthurCamara in #2964
  • [fix] Fix model loading inconsistency after Peft training by using PeftModel by @pesuchin in #2980
  • [enh] Add Support for multiple adapters on Transformers-based models by @carlesonielfa in #3046 & #2993
  • Moved Model Card Callback init in Trainer to a separate function by @tRosenflanz in #3047

New Contributors

Special Thanks

Big thanks to @ArthurCamara for leading the work on both 1) training with prompts and 2) NanoBEIR.

Full Changelog: v3.2.1...v3.3.0