This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Reduce model load time with quantized embeddings

@danieldk danieldk released this 09 Oct 13:51
· 67 commits to master since this release

This release contains one large change: loading of quantized models is sped up by computing the unknown word embedding as an average of the subquantizer centroids, rather than as an average of all in-vocabulary word embeddings.
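The speedup can be illustrated with a small sketch (the shapes, variable names, and random data below are illustrative assumptions, not the library's actual API): averaging the centroids of each subquantizer costs work proportional to the number of centroids, while averaging all in-vocabulary embeddings requires reconstructing every word first.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical product-quantized embeddings: each word is stored as one
# centroid index per subquantizer.
n_words, n_sub, n_centroids, sub_dim = 10_000, 4, 256, 16
subquantizers = rng.normal(size=(n_sub, n_centroids, sub_dim))
codes = rng.integers(0, n_centroids, size=(n_words, n_sub))

# Old approach: reconstruct every in-vocabulary embedding, then average.
# Cost grows with the vocabulary size.
reconstructed = np.concatenate(
    [subquantizers[s, codes[:, s]] for s in range(n_sub)], axis=1
)
unk_slow = reconstructed.mean(axis=0)

# New approach: average each subquantizer's centroids and concatenate the
# per-subquantizer means. Cost is independent of the vocabulary size.
unk_fast = np.concatenate(
    [subquantizers[s].mean(axis=0) for s in range(n_sub)]
)

print(unk_slow.shape, unk_fast.shape)
```

The two vectors are not identical: the centroid average weights every centroid equally, while the vocabulary average weights centroids by how often words use them. When centroid usage is roughly uniform, the results are close, which is what makes the cheaper computation a reasonable substitute.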