Reduce vector memory usage #340

Merged (6 commits) on Oct 25, 2019

@osma osma commented Oct 25, 2019

This PR changes the data type of many NumPy arrays used to encode subject and suggestion vectors. Suggestions are now encoded using float32 (instead of float64) and known subjects are encoded using boolean arrays (instead of int8).

The end result is reduced memory usage and, in some cases, faster execution. For example, the eval command on a tfidf backend is now noticeably faster (by around a third in one test).

Among other improvements, this should reduce the memory consumption of the PAV backend during training (#339).
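
To make the dtype change described above concrete, here is a minimal sketch (not the code from this PR) that compares the storage NumPy needs for one vector under each data type. The helper name `vector_sizes` and the vocabulary size of 100,000 subjects are assumptions made up for the illustration.

```python
import numpy as np


def vector_sizes(n_subjects):
    """Return the byte size of one vector of length n_subjects per dtype."""
    return {
        name: np.zeros(n_subjects, dtype=dtype).nbytes
        for name, dtype in [
            ("float64", np.float64),  # previous suggestion encoding
            ("float32", np.float32),  # new suggestion encoding
            ("int8", np.int8),        # previous known-subject encoding
            ("bool", np.bool_),       # new known-subject encoding
        ]
    }


# Hypothetical vocabulary size, chosen only for illustration.
sizes = vector_sizes(100_000)
for name, nbytes in sizes.items():
    print(f"{name:8s}{nbytes:>10,d} bytes")
```

In this sketch the float32 vector takes half the space of the float64 one. NumPy stores both int8 and bool at one byte per element, so the sketch shows no size difference for that change on its own.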

@osma osma added this to the 0.43 milestone Oct 25, 2019
@osma osma marked this pull request as ready for review October 25, 2019 12:02

osma commented Oct 25, 2019

Drone builds are failing and BCH is complaining about an extra line in a method, but I'm ignoring those as the important tests are all clear.

@osma osma merged commit e1b27ae into master Oct 25, 2019
juhoinkinen commented

> Drone builds are failing and BCH is complaining about an extra line in a method, but I'm ignoring those as the important tests are all clear.

Drone builds fixed by #341.

@osma osma deleted the reduce-vector-memory-usage branch December 13, 2019 10:39