Use sparse arrays for subject vectors #377

osma · 2020-01-27T09:03:15Z

Currently subject vectors (wrapped by VectorSuggestionResult and ListSuggestionResult objects) are basic, dense NumPy vectors. These take up a lot of RAM: for example, a single YSO vector takes about 40,000 subjects * 4 bytes (float32) = 160 kB, and these add up, especially for ensemble backends (pav and nn_ensemble) that need to keep many such vectors in memory. Since typically only a small fraction of subjects get a nonzero score for a given document, SciPy sparse vectors would likely be much more space-efficient, so we should (try to) switch to them.

(related to #339)
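A quick way to see the difference is to compare a dense float32 vector with the same data stored as a SciPy CSR row. This is a sketch under assumptions: the 40,000-subject size is from the estimate above, but `N_NONZERO = 100` (how many subjects get a nonzero score) and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

N_SUBJECTS = 40_000  # approximate YSO vocabulary size, as estimated above
N_NONZERO = 100      # assumption: only ~100 subjects get a nonzero score


def dense_subject_vector(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical dense float32 score vector with N_NONZERO nonzero entries."""
    vec = np.zeros(N_SUBJECTS, dtype=np.float32)
    ids = rng.choice(N_SUBJECTS, size=N_NONZERO, replace=False)
    vec[ids] = rng.uniform(0.01, 1.0, size=N_NONZERO)  # strictly positive scores
    return vec


def csr_nbytes(m: csr_matrix) -> int:
    """Bytes used by the three arrays (data, indices, indptr) backing a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


rng = np.random.default_rng(0)
dense = dense_subject_vector(rng)
sparse = csr_matrix(dense.reshape(1, -1))  # 1 x N_SUBJECTS row vector

print(dense.nbytes)        # 160000 (40,000 * 4 bytes)
print(csr_nbytes(sparse))  # on the order of 1 kB: values + column indices + row pointers
```

The sparse size scales with the number of nonzero scores rather than the vocabulary size, so the saving depends entirely on how many subjects actually get a score.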

osma · 2020-01-27T13:55:42Z

On second thought (and after some not-very-successful experimentation), it might make sense to stick to dense NumPy arrays within the VectorSuggestionResult and ListSuggestionResult classes, and only use sparse arrays when large numbers of subject vectors need to be aggregated, for example within the PAV and NN ensemble backends and possibly in the evaluation functionality. There are several kinds of sparse arrays, and each kind has its own pros and cons depending on the usage scenario.
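A minimal sketch of the "dense per result, sparse when aggregating" idea, using SciPy: each dense vector is converted to a CSR row as it arrives and the rows are stacked into one documents x subjects matrix. The `fake_results` generator, the 10-scores-per-document figure and the CSR-then-CSC choice are assumptions for illustration only, not how the PAV or nn_ensemble backends actually do it.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

N_SUBJECTS = 40_000  # approximate YSO vocabulary size, from the issue text


def to_sparse_row(vec: np.ndarray) -> csr_matrix:
    """Convert one dense subject vector into a 1 x N_SUBJECTS CSR row."""
    return csr_matrix(vec.reshape(1, -1))


def aggregate(dense_vectors) -> csr_matrix:
    """Stack per-document vectors into a documents x subjects CSR matrix.

    Each dense vector is converted as soon as it arrives, so when the input
    is a generator only one dense vector needs to be alive at a time.
    """
    return vstack([to_sparse_row(v) for v in dense_vectors], format="csr")


def fake_results(n_docs: int, rng: np.random.Generator):
    """Hypothetical stand-in for a backend: yields dense vectors, 10 scores each."""
    for _ in range(n_docs):
        vec = np.zeros(N_SUBJECTS, dtype=np.float32)
        ids = rng.choice(N_SUBJECTS, size=10, replace=False)
        vec[ids] = rng.uniform(0.01, 1.0, size=10)
        yield vec


rng = np.random.default_rng(42)
scores = aggregate(fake_results(1000, rng))  # CSR: fast row slicing and arithmetic

# Per-subject work (one column at a time) is cheaper after a single CSC conversion.
by_subject = scores.tocsc()
column = by_subject[:, 123].toarray().ravel()  # scores for subject 123 across documents
```

CSR fits the row-per-document layout; if a step needs to walk the data one subject at a time, converting once to CSC avoids repeated column slicing on CSR, which is comparatively slow. LIL or DOK are other options for incremental construction, but they are usually converted to CSR/CSC before doing any arithmetic.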


osma · 2020-02-04T12:31:05Z

PR #381 switches to sparse vectors within the nn_ensemble backend.

osma · 2020-02-04T12:32:53Z

I think the most important uses for sparse vectors are already covered in #379 and #381. Sparse vectors could potentially be useful for evaluation functionality (saving RAM) but I don't think that's crucial, and might be problematic for performance. If it seems necessary to do so then we can open a new issue. Closing this one.

osma added the enhancement label Jan 27, 2020

osma added this to the Short term milestone Jan 27, 2020

osma added a commit that referenced this issue Jan 28, 2020

Store returned scores as sparse vectors in PAV backend, saving RAM (part of #377)

7008316

osma added a commit that referenced this issue Jan 28, 2020

Store also true labels as sparse vectors in PAV backend, saving RAM (part of #377)

e2dc157

osma mentioned this issue Jan 28, 2020

Use sparse vectors in PAV backend #379

Merged

osma closed this as completed Feb 4, 2020

osma modified the milestones: Short term, Blue Sky, No further action needed Jul 23, 2020
