Optimize memory footprint of resources #151
Merged
Description
HashSetGazetteer, HashMapStemmer and HashMapWordClusterer: replace raw string keys by i32 keys
HashMapWordClusterer: try to load word clusters as u16, with a fallback to String for all values as soon as one value can't be converted to u16
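The u16-with-fallback loading can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `Clusters` enum and the `load_clusters` function are hypothetical names, and the sketch keeps `String` keys for readability even though the PR also changes the key type.

```rust
use std::collections::HashMap;

/// Hypothetical storage: compact u16 clusters when every value parses,
/// raw strings otherwise.
#[derive(Debug, PartialEq)]
enum Clusters {
    Compact(HashMap<String, u16>),
    Raw(HashMap<String, String>),
}

/// Hypothetical loader taking (word, cluster) pairs.
fn load_clusters(entries: &[(&str, &str)]) -> Clusters {
    // Collecting into a Result yields Err as soon as one cluster fails to parse as u16.
    let compact: Result<HashMap<String, u16>, _> = entries
        .iter()
        .map(|(word, cluster)| cluster.parse::<u16>().map(|c| (word.to_string(), c)))
        .collect();

    match compact {
        Ok(map) => Clusters::Compact(map),
        // One unparsable value is enough: keep all values as strings.
        Err(_) => Clusters::Raw(
            entries
                .iter()
                .map(|(word, cluster)| (word.to_string(), cluster.to_string()))
                .collect(),
        ),
    }
}
```

With this shape, a value such as "10001011001" overflows u16, so the whole table falls back to the string representation.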
In English, with the current word clusters included in the resources, this results in a constant 25MB reduction in memory usage.
For other languages without word clusters, the expected reduction is between 0.5MB and 1MB.
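For resources without word clusters, the saving presumably comes from the i32 keys. The description does not say how those keys are derived; the sketch below assumes they are hashes of the original strings. `HashedGazetteer` and `hash_key` are hypothetical names, and FNV-1a is used only because it is short and deterministic.

```rust
use std::collections::HashSet;

/// 32-bit FNV-1a hash, reinterpreted as i32 (illustrative choice of hash).
fn hash_key(s: &str) -> i32 {
    let mut h: u32 = 0x811c_9dc5; // FNV offset basis
    for b in s.bytes() {
        h ^= u32::from(b);
        h = h.wrapping_mul(0x0100_0193); // FNV prime
    }
    h as i32
}

/// Hypothetical gazetteer that stores 4-byte keys instead of heap-allocated strings.
struct HashedGazetteer {
    keys: HashSet<i32>,
}

impl HashedGazetteer {
    fn from_words(words: &[&str]) -> Self {
        HashedGazetteer {
            keys: words.iter().map(|w| hash_key(w)).collect(),
        }
    }

    /// Membership test on the hashed key; the original string is never stored.
    fn contains(&self, word: &str) -> bool {
        self.keys.contains(&hash_key(word))
    }
}
```

The trade-off of such a design is that two distinct strings can share a key, so a lookup may occasionally return a false positive.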
Backward compatibility
The new implementation is backward compatible. Old word clusters, which are typically stored as hierarchical binary paths of the form "10001011001", can still be loaded. In this case, all clusters are loaded as strings.
New word clusters, introduced in snipsco/snips-nlu-language-resources#33, will benefit from this improved implementation, as all clusters are u16-like.