[Inference] Add SentenceTransformers support to pipeline for feature-extraction #583
Changes from 2 commits
My primary concern at this time is that the Sentence Transformer tokenizer uses this `max_seq_length` as the "correct" maximum length, as opposed to the value defined in the `tokenizer_config.json`. Here, we are relying on the tokenizer defined in the `Pipeline`, which won't use the `max_seq_length`. As a result, I think this ST pipeline component will perform differently (worse, to be precise) for longer input texts. A solution is to use `model_inputs = self.model.tokenize(inputs)` instead. Do note that the ST tokenize method does not allow for extra tokenize kwargs such as truncation, return_tensors, or padding; these are unfortunately hardcoded at the moment.
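A minimal sketch of what that swap could look like in the pipeline's preprocessing step (the `preprocess` signature and surrounding pipeline class are illustrative assumptions, not the exact code in this PR):

```python
def preprocess(self, inputs, **tokenize_kwargs):
    # Sketch only: assumes `self.model` is a SentenceTransformer-style model
    # exposing `.tokenize()`, as suggested above.
    # Letting the model tokenize means its max_seq_length (truncation length)
    # is respected, unlike a generic `self.tokenizer(inputs, ...)` call.
    model_inputs = self.model.tokenize(inputs)
    return model_inputs
```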
Yet another solution is to rely exclusively on `self.model.encode(...)` in `def _forward`, but I recognize that this might clash with some requirements of the `Pipeline`.
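A sketch of that alternative (the `_forward` signature mirrors the generic `Pipeline._forward` contract and is an assumption, not code from this PR):

```python
def _forward(self, model_inputs):
    # Sketch only: bypass the pipeline tokenizer entirely and delegate to
    # SentenceTransformer.encode(), which handles tokenization, truncation
    # to max_seq_length, batching and pooling internally. That is also why
    # it may clash with the Pipeline's preprocess/postprocess split.
    embeddings = self.model.encode(model_inputs, convert_to_tensor=True)
    return {"embeddings": embeddings}
```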
@tomaarsen for Inferentia, models need to be traced to a fixed sequence length before running inference, since we have static shapes. You always need to specify a `sequence_length` and `batch_size` before you can compile a model, which is then used. This is abstracted away from the user in the `NeuronModelForSentenceTransformers` class.
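For context, compilation with static shapes roughly follows the usual optimum-neuron export pattern shown below (the model id and shape values are illustrative, and the exact shape kwargs accepted are an assumption, not this PR's code):

```python
from optimum.neuron import NeuronModelForSentenceTransformers

# Sketch: export/compile to a fixed batch_size and sequence_length up front,
# since Inferentia requires static input shapes.
model = NeuronModelForSentenceTransformers.from_pretrained(
    "sentence-transformers/all-MiniLM-L6-v2",
    export=True,
    batch_size=1,
    sequence_length=384,
)
```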
There is no "sentence-transformers" code used at all in the end, since the model is traced and everything happens with transformers. We should be good here.
See here: optimum-neuron/optimum/neuron/modeling.py, line 201 in 18460aa
Ahh, I see! Thanks for the heads up. I figured it was more like Intel Gaudi, which does just work with regular Sentence Transformers (as long as the padding is "max_length" to also get static shapes, and the device is "hpu"). Then my concern still stands: I think the `max_seq_length` might not be taken into account correctly.
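For reference, the "static shapes via max_length padding" trick mentioned above boils down to the standard tokenizer call below (a generic transformers sketch, not Gaudi- or Neuron-specific code from this PR; the model id and length are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# padding="max_length" pads every batch to the same fixed length, so the
# resulting tensors always have static shapes regardless of the input texts.
batch = tokenizer(
    ["a short sentence", "another example input"],
    padding="max_length",
    truncation=True,
    max_length=256,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # e.g. torch.Size([2, 256])
```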
The `sentence_bert_config.json` is not taken into consideration during the export, and maybe it should be. @tomaarsen where can we usually find the `max_seq_length`? Is there a specific name / path where it's stored? If so, could we add it to `config.json`?
FYI, `max_seq_length` is not taken into account at all by the Neuron model export. Shall we prevent users from setting a static sequence length higher than this value?
For Sentence Transformer models, `max_seq_length` should have priority. In 99% of cases, this is stored in the `sentence_bert_config.json` file in the root of the model directory/repository. You might indeed want to store it in a `config.json` when exporting for Neuron, or override `model_max_length` in `tokenizer_config.json`, as that should work in a more "expected" fashion.
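As a rough illustration of the suggested override (the JSON keys shown are the typical `sentence_bert_config.json` layout; the export-side code and the local path are a sketch, not part of this PR):

```python
import json
from pathlib import Path

from transformers import AutoTokenizer

# Typical sentence_bert_config.json contents:
#   {"max_seq_length": 256, "do_lower_case": false}
model_dir = Path("path/to/model")  # hypothetical local checkout of the model repo
st_config = json.loads((model_dir / "sentence_bert_config.json").read_text())

tokenizer = AutoTokenizer.from_pretrained(model_dir)
# Override model_max_length so downstream tokenization truncates at the
# Sentence Transformers limit rather than the (often larger) transformers default.
tokenizer.model_max_length = st_config["max_seq_length"]
tokenizer.save_pretrained(model_dir)  # persists the value into tokenizer_config.json
```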