Keras External Embeddings

Efficiently use frozen word embeddings outside Keras models

Problem

Keras supports using pretrained word embeddings in your models. In many cases, it makes sense to freeze these embeddings at training time, and Keras provides an easy option for that in its Embedding layer: set the trainable argument to False (see the Keras FAQ).
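For reference, here is a minimal sketch of such a frozen layer. The zero-filled embedding_matrix is a placeholder; in practice each row holds the GloVe vector for the word with that tokenizer index:

```python
import numpy as np
from keras.layers import Embedding

NUM_WORDS, EMBEDDING_DIM = 20000, 100

# Placeholder: in practice this matrix is filled from the GloVe file.
embedding_matrix = np.zeros((NUM_WORDS, EMBEDDING_DIM))

embedding_layer = Embedding(
    NUM_WORDS,
    EMBEDDING_DIM,
    weights=[embedding_matrix],  # initialize with the pretrained vectors
    trainable=False,             # freeze them at training time
)
```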

However, adding the Embedding layer to your model means the word embeddings get saved alongside it. This is fine if you are dealing with a couple of models. In production environments, though, you might have several models, all using the same frozen pretrained embeddings; in that case, every model carries its own copy. The result is an orders-of-magnitude increase in disk storage and much higher RAM usage.

Solution

It is more efficient to share the embeddings across the models and to perform the mapping from words to vectors only once for all of them. This repository shows how this can be done, building on the Keras example that uses GloVe embeddings and the 20 Newsgroups dataset.

The first file, pretrained_word_embeddings.py, is the original example from Keras. The second file, pretrained_external_word_embeddings.py, keeps the embeddings external to the model. The main changes are in how the data is loaded and in the first layer of the model, as sketched below.
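To illustrate the idea, here is a minimal sketch, not the repository's exact code: the layer sizes, NUM_WORDS, and the zero-filled placeholders are illustrative assumptions.

```python
import numpy as np
from keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from keras.models import Model

MAX_SEQUENCE_LENGTH, EMBEDDING_DIM, NUM_WORDS = 1000, 100, 20000

# Placeholders: in practice, embedding_matrix is filled from the GloVe
# file and sequences comes from tokenizing the 20 Newsgroups texts.
embedding_matrix = np.zeros((NUM_WORDS, EMBEDDING_DIM))
sequences = np.random.randint(0, NUM_WORDS, size=(32, MAX_SEQUENCE_LENGTH))

# The word-to-vector mapping happens once, outside any model, so the
# same embedding_matrix can be shared by all your models.
embedded_data = embedding_matrix[sequences]  # (samples, seq_len, dim)

# The model's first layer consumes the vectors directly: there is no
# Embedding layer, so no embedding weights are saved with the model.
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, EMBEDDING_DIM))
x = Conv1D(128, 5, activation='relu')(sequence_input)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
preds = Dense(20, activation='softmax')(x)
model = Model(sequence_input, preds)
```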

To run it yourself, head to the files and adjust the GLOVE_DIR and TEXT_DATA_DIR directories, along with the other parameters, to your setup. Then simply run:

```
python pretrained_external_word_embeddings.py
```

You can compare the two files with your favorite diff tool to see exactly what changed between them.

Prerequisites:

Developer

Hamza Harkous

License

MIT
