Use sparse arrays for subject vectors #377
Comments
On second thought (and after some not-very-successful experimentation), it might make sense to stick to dense NumPy arrays within the VectorSuggestionResult and ListSuggestionResult classes, and to use sparse arrays only when large numbers of subject vectors need to be aggregated, for example within the PAV and NN ensemble backends and possibly in the evaluation functionality. There are several kinds of sparse arrays, each with its own pros and cons in specific usage scenarios.
PR #381 switches to sparse vectors within the nn_ensemble backend.
I think the most important uses for sparse vectors are already covered in #379 and #381. Sparse vectors could potentially be useful in the evaluation functionality (to save RAM), but I don't think that's crucial, and it might be problematic for performance. If it turns out to be necessary, we can open a new issue. Closing this one.
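As a hedged illustration of the aggregation scenario discussed above (the data and sizes here are made up, not Annif code): stacking many sparse suggestion vectors keeps memory roughly proportional to the number of nonzero scores, and an aggregate such as a mean over ensemble sources can be computed directly on the sparse stack.

```python
import numpy as np
from scipy.sparse import csr_array, vstack

n_subjects = 40_000  # approximate YSO vocabulary size
rng = np.random.default_rng(42)

# Build many sparse per-document suggestion vectors (synthetic data).
rows = []
for _ in range(1000):
    dense = np.zeros(n_subjects, dtype=np.float32)
    hits = rng.choice(n_subjects, size=50, replace=False)  # few nonzero scores
    dense[hits] = rng.random(50, dtype=np.float32)
    rows.append(csr_array(dense.reshape(1, -1)))

stacked = vstack(rows)              # sparse (1000, 40000) matrix, <= 50k stored values
mean_scores = stacked.mean(axis=0)  # dense result, but only a single row
print(stacked.shape, stacked.nnz)
```

Keeping the same 1000 vectors as dense float32 rows would need roughly 1000 × 160 kB ≈ 160 MB, while the sparse stack stores only the nonzero values and their indices, a few hundred kB here.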
Currently, subject vectors (wrapped by VectorSuggestionResult and ListSuggestionResult objects) are basic, dense NumPy vectors. These take up a lot of RAM: for example, a single YSO vector occupies about 40,000 subjects * 4 bytes (float32) = 160 kB, and these add up, especially for ensemble backends (pav and nn_ensemble) that need to keep many such vectors in memory. SciPy sparse vectors would likely be much more space-efficient, so we should (try to) switch to them.
(related to #339)
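As a rough sketch of the memory arithmetic above (hypothetical numbers, not Annif code): a dense float32 vector for ~40,000 subjects costs 160 kB no matter how many entries are nonzero, while a SciPy CSR representation scales with the number of nonzero scores.

```python
import numpy as np
from scipy.sparse import csr_array

n_subjects = 40_000  # approximate YSO vocabulary size
rng = np.random.default_rng(0)

# Dense subject vector: one float32 score per subject.
dense = np.zeros(n_subjects, dtype=np.float32)
hits = rng.choice(n_subjects, size=100, replace=False)  # only a few nonzero scores
dense[hits] = rng.random(100, dtype=np.float32)
print(dense.nbytes)  # 160000 bytes (~160 kB), independent of sparsity

# CSR representation: stores only the nonzero values plus index arrays.
sparse = csr_array(dense.reshape(1, -1))
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes
print(sparse_bytes)  # on the order of 1 kB for ~100 nonzeros
```

The trade-off matching the comment above: CSR is compact and cheap to sum or stack, but random element access and in-place updates are slower than with a dense array, so the best format depends on the usage scenario.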