This repository has been archived by the owner on Mar 19, 2021. It is now read-only.
Reduce model load time with quantized embeddings
This release contains one large change: quantized models now load faster, because the unknown-word embedding is computed as an average of the subquantizers rather than as an average of all in-vocabulary word embeddings.
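The sketch below compares the old and new ways of building the unknown-word embedding. It assumes plain product quantization: each subquantizer holds a list of centroids, a word's embedding is the concatenation of one centroid per subquantizer, and "average of the subquantizers" means the unweighted mean of each subquantizer's centroids. It ignores any projection or rotation a quantizer may apply. All function names are hypothetical and are not this project's API.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def reconstruct(codes: List[int], subquantizers: List[List[Vector]]) -> Vector:
    """Rebuild one word embedding by concatenating the centroid chosen in each subquantizer."""
    out: Vector = []
    for code, centroids in zip(codes, subquantizers):
        out.extend(centroids[code])
    return out


def unknown_from_vocab(all_codes: List[List[int]], subquantizers: List[List[Vector]]) -> Vector:
    """Old approach (hypothetical): reconstruct every in-vocabulary word, then average.
    Cost grows with the vocabulary size."""
    return mean([reconstruct(codes, subquantizers) for codes in all_codes])


def unknown_from_subquantizers(subquantizers: List[List[Vector]]) -> Vector:
    """New approach (hypothetical): average each subquantizer's centroids and concatenate.
    Cost depends only on the number of centroids, not on the vocabulary."""
    out: Vector = []
    for centroids in subquantizers:
        out.extend(mean(centroids))
    return out


# Two subquantizers, each with two 2-dimensional centroids.
subquantizers = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[1.0, 1.0], [3.0, 5.0]],
]

# Vocabulary in which every centroid is used equally often.
balanced_codes = [[0, 0], [1, 1]]

print(unknown_from_subquantizers(subquantizers))           # [2.0, 4.0, 2.0, 3.0]
print(unknown_from_vocab(balanced_codes, subquantizers))   # [2.0, 4.0, 2.0, 3.0]
```

Under these assumptions the two vectors coincide only when every centroid is used equally often across the vocabulary; otherwise the subquantizer average is a cheaper approximation of the vocabulary average.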