[Inference] Add SentenceTransformers support to pipeline for feature-extraction #583
Conversation
@tomaarsen can you also do a review?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Several pipeline tests are failing. It is unclear to me if they should be triggered or not.
return preprocess_params, {}, postprocess_params
def preprocess(self, inputs, **tokenize_kwargs) -> Dict[str, GenericTensor]:
My primary concern at this time is that the Sentence Transformer tokenizer uses this max_seq_length as the "correct" maximum length, as opposed to the value defined in the tokenizer_config.json. Here, we are relying on the tokenizer defined in the Pipeline, which won't use the max_seq_length. As a result, I think this ST pipeline component will perform differently (worse, to be precise) for longer input texts. A solution is to use model_inputs = self.model.tokenize(inputs) instead.
Do note that the ST tokenize method does not allow for extra tokenize kwargs such as truncation, return_tensors, or padding. These are unfortunately hardcoded at the moment.
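For reference, a minimal sketch of what that suggestion could look like, assuming self.model were a plain SentenceTransformer instance (the thread below clarifies it is actually a traced Neuron model):

```python
from typing import Dict

from transformers.pipelines.base import GenericTensor


def preprocess(self, inputs, **tokenize_kwargs) -> Dict[str, GenericTensor]:
    # SentenceTransformer.tokenize truncates to the model's own max_seq_length,
    # unlike the pipeline's tokenizer; it ignores extra kwargs such as
    # truncation, return_tensors, or padding (those are hardcoded in ST).
    model_inputs = self.model.tokenize(inputs)
    return model_inputs
```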
Yet another solution is to rely exclusively on self.model.encode(...) in def _forward, but I recognize that this might clash with some requirements of the Pipeline.
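As a rough illustration of what relying on encode would mean (a sketch only, assuming a regular SentenceTransformer checkpoint; the replies below explain why this does not apply to traced Neuron models):

```python
from sentence_transformers import SentenceTransformer

# encode handles tokenization (respecting max_seq_length), batching, and pooling
# itself, which would bypass the pipeline's preprocess/_forward split entirely.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(["An example sentence."], convert_to_tensor=True)
print(embeddings.shape)  # torch.Size([1, 384]) for this checkpoint
```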
@tomaarsen for Inferentia, models need to be traced to a sequence length before running inference, since we have static shapes. You always need to specify a sequence_length and a batch_size before you can compile a model, which is then used.
This is abstracted away from the user in the NeuronModelForSentenceTransformers class.
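For context, a hedged sketch of how those static shapes are typically specified at export time (the checkpoint name and keyword arguments follow the usual optimum-neuron from_pretrained export interface and are assumptions, not taken from this PR):

```python
from optimum.neuron import NeuronModelForSentenceTransformers

# The model is traced/compiled once for fixed input shapes; inference then
# always runs with this batch_size and sequence_length.
model = NeuronModelForSentenceTransformers.from_pretrained(
    "sentence-transformers/all-MiniLM-L6-v2",
    export=True,
    batch_size=1,
    sequence_length=128,
)
```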
In the end, no "sentence-transformers" code is used at all, since the model is traced and that happens with transformers. We should be good here.
See here:
optimum-neuron/optimum/neuron/modeling.py, line 201 in 18460aa:
class NeuronModelForSentenceTransformers(NeuronBaseModel):
Ahh, I see! Thanks for the heads up. I figured it was more like Intel Gaudi, which does just work with regular Sentence Transformers (as long as the padding is "max_length" to also get static shapes & the device is "hpu").
Then my concern still stands: I think the max_seq_length might not be taken into account correctly.
The sentence_bert_config.json is not taken into consideration during the export, and maybe it should be. @tomaarsen where can we usually find the max_seq_length? Is there a specific name/path where it's stored, and if so, could we add it to config.json?
FYI, max_seq_length is not taken into account at all by the Neuron model export. Shall we prevent users from setting a static sequence length higher than this value?
For Sentence Transformer models, max_seq_length should have priority. In 99% of cases, this is stored in the sentence_bert_config.json file in the root of the model directory/repository. You might indeed want to store it in a config.json when exporting for Neuron, or override model_max_length in tokenizer_config.json as that should work in a more "expected" fashion.
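A small sketch of what either option could look like at export time (illustrative only; the file layout and key name follow the standard Sentence Transformers format, and the path is a placeholder):

```python
import json
from pathlib import Path

from transformers import AutoTokenizer

model_dir = Path("path/to/sentence-transformers-model")  # placeholder path

# max_seq_length lives in sentence_bert_config.json at the root of the repo
with open(model_dir / "sentence_bert_config.json") as f:
    max_seq_length = json.load(f)["max_seq_length"]

# Option: override model_max_length so the pipeline tokenizer truncates consistently
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.model_max_length = min(tokenizer.model_max_length, max_seq_length)
tokenizer.save_pretrained(model_dir)
```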
Disclaimer: I don't know sentence transformers, so this review is on the general outlook of the code and test, which looks good to me. @JingyaHuang and @tomaarsen reviews will be more relevant.
Thanks @philschmid, left some small nits.
LGTM, thanks @philschmid for adding it!
And thanks @tomaarsen for raising the concern about max_seq_length; it's on the backlog, and we will improve it in a coming PR.
What does this PR do?
This PR adds a new, slightly modified FeatureExtractionPipeline from Transformers that allows us to use it with sentence-transformers models. When using the pipeline object from optimum, the library checks if the requested model for feature-extraction is a sentence-transformers model and, if so, it returns the sentence_embeddings instead of the first hidden state.
That is done by adding a new is_sentence_transformer_model check that determines whether the requested model is a transformers or sentence-transformers model. If it is a sentence-transformers model, it uses NeuronModelForSentenceTransformers, and the FeatureExtractionPipeline returns model_outputs.sentence_embedding[0] instead of model_outputs[0].
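A rough sketch of that postprocess behavior (not the literal PR code; names follow the description above):

```python
def postprocess(self, model_outputs, return_tensors=False):
    if hasattr(model_outputs, "sentence_embedding"):
        # sentence-transformers model served by NeuronModelForSentenceTransformers:
        # return the pooled sentence embedding
        outputs = model_outputs.sentence_embedding[0]
    else:
        # regular transformers model: return the first hidden state
        outputs = model_outputs[0]
    return outputs if return_tensors else outputs.tolist()
```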
Example (validated with torch.allclose):
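A hedged sketch of such a check (checkpoint name, exact pipeline output shape, and tolerance are illustrative assumptions, not taken from the PR):

```python
import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

from optimum.neuron import NeuronModelForSentenceTransformers, pipeline

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # example checkpoint
sentence = "This is an example sentence."

# Compile once with static shapes, then run through the feature-extraction pipeline.
model = NeuronModelForSentenceTransformers.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=128
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
embed = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
neuron_embedding = torch.tensor(embed(sentence)).squeeze()

# Reference embedding from Sentence Transformers on CPU.
reference = SentenceTransformer(model_id).encode(sentence, convert_to_tensor=True)

print(torch.allclose(neuron_embedding, reference, atol=1e-3))
```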
Implications:
sentence-transformers models will now always return the sentence_embeddings when initialized with the FeatureExtractionPipeline.
Alternative options:
Instead of modifying the feature-extraction pipeline, we could also introduce a new task, sentence-embeddings, to optimum, but that might hinder more general adoption since it is unique to optimum-neuron.