Reduce memory consumption of PAV backend #339

Closed
osma opened this issue Oct 24, 2019 · 1 comment
Comments


osma commented Oct 24, 2019

Currently the PAV backend collects the subject vectors for each document into memory. This takes up a lot of RAM and limits the number of training documents it can handle. Several improvements could be made in this area:

  • limit the precision of the subject vectors (float32 instead of float64)
  • use sparse arrays for the subject vectors
  • when collecting the subject vectors while training the PAV backend, store them on disk (e.g. in a temporary file or an LMDB database) instead of in RAM; see the sketch after this list
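A minimal sketch of these ideas, assuming a hypothetical `documents` iterable whose items expose their subject indices as `doc.subject_ids` and a known `vocab_size`; this is not Annif's actual API, only an illustration of float32 + sparse storage and a disk-backed LMDB alternative.

```python
import numpy as np
import lmdb
from scipy.sparse import csr_matrix, vstack

# NOTE: documents, doc.subject_ids and vocab_size are hypothetical
# placeholders for illustration; they are not Annif internals.

def collect_sparse_vectors(documents, vocab_size):
    """Collect per-document subject vectors as a sparse float32 matrix
    instead of a dense float64 array."""
    rows = []
    for doc in documents:
        vec = np.zeros(vocab_size, dtype=np.float32)  # float32 halves the per-value cost
        vec[list(doc.subject_ids)] = 1.0
        rows.append(csr_matrix(vec))                  # keep only the nonzero entries
    return vstack(rows)                               # shape: (n_docs, vocab_size), sparse

def store_vectors_on_disk(documents, vocab_size, path="subject-vectors.lmdb"):
    """Write each vector to an LMDB database so the full set never
    has to be held in RAM at once."""
    env = lmdb.open(path, map_size=1 << 30)           # 1 GiB memory map
    with env.begin(write=True) as txn:
        for i, doc in enumerate(documents):
            vec = np.zeros(vocab_size, dtype=np.float32)
            vec[list(doc.subject_ids)] = 1.0
            txn.put(str(i).encode(), vec.tobytes())   # keyed by document index
    return env
```

Either approach trades a small amount of lookup overhead for a much smaller resident memory footprint, which is what limits the number of training documents today.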

osma commented Jan 27, 2020

The float32 part was implemented in #340; other ideas have been split up into individual issues. Closing this one.
