Reduce memory consumption of PAV backend #339

Closed
osma opened this issue Oct 24, 2019 · 1 comment
Comments


osma commented Oct 24, 2019

Currently the PAV backend collects the subject vectors for each document into memory. This takes up a lot of RAM and limits the number of training documents it can handle. Several improvements could be made in this area:

  • limit the precision of the subject vectors (float32 instead of float64)
  • use sparse arrays for the subject vectors
  • when collecting the subject vectors while training the PAV backend, store them on disk (e.g. in a temporary file or an LMDB database) instead of in RAM; see the sketch after this list
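A minimal sketch of these ideas, assuming a hypothetical `documents` iterable whose items expose their subject indices as `doc.subject_ids` and a known `vocab_size`; this is not Annif's actual API, only an illustration of float32 + sparse storage and a disk-backed LMDB alternative.

```python
import numpy as np
import lmdb
from scipy.sparse import csr_matrix, vstack

# NOTE: documents, doc.subject_ids and vocab_size are hypothetical
# placeholders for illustration; they are not Annif internals.

def collect_sparse_vectors(documents, vocab_size):
    """Collect per-document subject vectors as a sparse float32 matrix
    instead of a dense float64 array."""
    rows = []
    for doc in documents:
        vec = np.zeros(vocab_size, dtype=np.float32)  # float32 halves the per-value cost
        vec[list(doc.subject_ids)] = 1.0
        rows.append(csr_matrix(vec))                  # keep only the nonzero entries
    return vstack(rows)                               # shape: (n_docs, vocab_size), sparse

def store_vectors_on_disk(documents, vocab_size, path="subject-vectors.lmdb"):
    """Write each vector to an LMDB database so the full set never
    has to be held in RAM at once."""
    env = lmdb.open(path, map_size=1 << 30)           # 1 GiB memory map
    with env.begin(write=True) as txn:
        for i, doc in enumerate(documents):
            vec = np.zeros(vocab_size, dtype=np.float32)
            vec[list(doc.subject_ids)] = 1.0
            txn.put(str(i).encode(), vec.tobytes())   # keyed by document index
    return env
```

Either approach trades a small amount of lookup overhead for a much smaller resident memory footprint, which is what limits the number of training documents today.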

osma commented Jan 27, 2020

The float32 part was implemented in #340; other ideas have been split up into individual issues. Closing this one.
