Upgrading from static vectors to contextual vectors #12915
Gitclop started this conversation in Help: Best practices
Replies: 1 comment 2 replies
-
Hi @Gitclop, if you just want to extract transformer embeddings and copy them into your dataframe, you are pretty close already:
As long as the underlying model has been trained to reflect document similarity in its embeddings, yes. Note that the pretrained transformer pipelines offered by spaCy have not been trained that way. For the purpose of comparing documents by their embeddings we recommend |
-
Hey, I have built a semantic-similarity pipeline using the following steps:
Load a set of n documents. For each document do the following:
The static vectors have been trained with word2vec, and I want to compare the accuracy of my similarity task between those static vectors and different transformer models.
For the static vectors, I use a trained spaCy model and this code:
data.loc[:, ('Vektoren')] = data['bag_of_words'].map(lambda s: nlp(s).vector)
Changing my vectors within the pipeline is probably not as easy as:
So how would I upgrade to transformer models? Is it still useful to extract the document vector, or are there better ways of finding the x nearest neighbours/documents?
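To make the "x nearest neighbours" part concrete, here is a minimal sketch of what that lookup could look like once each document has a vector, whether static or transformer-based. It assumes the vectors are plain lists of floats (for example the `Vektoren` column converted to lists) and ranks by cosine similarity. The names `cosine_similarity`, `nearest_documents` and `doc_vectors` are made up for illustration and are not part of spaCy or pandas.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_documents(query_index, vectors, x=3):
    """Return indices of the x documents most similar to vectors[query_index]."""
    query = vectors[query_index]
    # Score every other document against the query document.
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:x]]


# Toy vectors standing in for real document embeddings (hypothetical data).
doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.1, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
neighbours = nearest_documents(0, doc_vectors, x=2)
```

This brute-force comparison is quadratic in the number of documents, so it is only meant for small collections. The ranking logic stays the same regardless of which model produced the vectors, which makes it easy to compare static and transformer embeddings on the same task.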