A project to analyze the robustness of word embeddings from different transfer learning models and to align words across sentences semantically according to each model.
BERT (Bidirectional Encoder Representations from Transformers) and ELMo (Embeddings from Language Models) are two transfer learning models pretrained on huge corpora of text. Their word embeddings can be reused for various downstream tasks by fine-tuning the final layers. Here we analyze the embeddings of these models and their role in the semantics of words. For each word in the first sentence, we also list the most similar words from the second sentence.
- Download the word embeddings of the transfer learning models (currently BERT and ELMo).
- Input two different sentences.
- Tokenize the sentences into words.
- Assign each word the corresponding word embedding of a transfer learning model.
- Calculate the cosine similarity between the corresponding word embeddings.
- Plot the similarity matrix (see the sketch below).
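
A minimal end-to-end sketch of these steps, assuming the bert-embedding package's default uncased BERT model and the packages listed under the requirements below; the example sentences and variable names are illustrative only.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.spatial.distance import cosine
from bert_embedding import BertEmbedding

sentence_a = "The bank approved the loan"
sentence_b = "She sat on the river bank"

# bert-embedding tokenizes internally and returns (tokens, vectors) per sentence
bert = BertEmbedding()
(tokens_a, vecs_a), (tokens_b, vecs_b) = bert([sentence_a, sentence_b])

# similarity[i, j] = cosine similarity between word i of sentence A and word j of sentence B
similarity = np.array([[1 - cosine(u, v) for v in vecs_b] for u in vecs_a])

# plot the similarity matrix; lighter cells mean more similar word pairs
sns.heatmap(similarity, xticklabels=tokens_b, yticklabels=tokens_a, cmap="viridis")
plt.show()

# for each word of sentence A, list the most similar word of sentence B
aligned = [tokens_b[j] for j in similarity.argmax(axis=1)]
print(list(zip(tokens_a, aligned)))
```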
- NumPy
- SciPy
- NLTK
- Seaborn
- Matplotlib
- bert-embedding
- allennlp
- Note: The lighter the cell, the more similar the words are!!
- ELMo is case sensitive. 'A' and 'a' are not equal in ELMo.
- As explained in the ELMo paper, the higher-level LSTM embeddings of ELMo are highly sensitive to context. This means that two identical words need not be similar when they are surrounded by different neighbouring words in different sentences (see the sketch after this list).
- These high-level features work better with polysemy.
- BERT (as used here, via the uncased bert-embedding default model) is not case sensitive, and in these experiments its embeddings did not appear to be context sensitive either.
- GPT is both case sensitive and context sensitive.
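
A hedged sketch of the ELMo context-sensitivity observation, assuming allennlp's ElmoEmbedder (available in allennlp versions before 1.0); the sentences and the choice of the word "bank" are illustrative only.

```python
from scipy.spatial.distance import cosine
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # downloads the default pretrained ELMo weights

# the same word "bank" appears in two different contexts
sent_1 = ["I", "deposited", "cash", "at", "the", "bank"]
sent_2 = ["We", "walked", "along", "the", "river", "bank"]

# embed_sentence returns an array of shape (3 layers, num_tokens, 1024)
layers_1 = elmo.embed_sentence(sent_1)
layers_2 = elmo.embed_sentence(sent_2)

i, j = sent_1.index("bank"), sent_2.index("bank")

# top LSTM layer: highly context sensitive, so the two "bank" vectors differ
print("top-layer similarity:", 1 - cosine(layers_1[2][i], layers_2[2][j]))

# bottom character-CNN layer: context independent, so similarity is ~1.0
print("bottom-layer similarity:", 1 - cosine(layers_1[0][i], layers_2[0][j]))
```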
- Analyzing the word embeddings of ULMFiT, GPT-2 and XLNet.
- Analyzing the sentence embeddings.