perf: compact lexicon entries to take less RAM
Store the lexicon in joined groups of 16 entries to reduce the per-str-object memory overhead.

Experimented with various block sizes to see the memory impact, measured by running `import g2p; g2p.get_arpabet_lang()`:

- original: 71MB
- blocks of 4: 59MB
- blocks of 16: 56MB
- blocks of 256: 55MB

I decided the 15MB RAM savings were worth it for blocks of 16, but the gain beyond that is trivial and not worth it.

In terms of speed, the original code and blocks of 16 are the same, at least within the margin of measurement error. Speed was measured by running `g2p convert --file en.txt eng eng-ipa`, where en.txt is a file containing all the words in the cmudict lexicon: the original and blocks of 16 both took 20-21 seconds depending on the run. At blocks of 256 I was getting 23 seconds; not a big difference, but measurable, for no significant memory gain.
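A minimal sketch of the block-compaction idea described above (the names `compact`, `lookup`, `BLOCK_SIZE`, and `SEP` are hypothetical, not the actual g2p implementation): instead of keeping one Python `str` object per lexicon entry, join every 16 entries into a single string and split a block back apart only on lookup, trading a small amount of access time for the saved per-object overhead.

```python
BLOCK_SIZE = 16  # the commit's chosen trade-off between RAM and speed
SEP = "\n"       # assumed separator; any character absent from entries works

def compact(entries):
    """Join every BLOCK_SIZE entries into a single str object."""
    return [
        SEP.join(entries[i:i + BLOCK_SIZE])
        for i in range(0, len(entries), BLOCK_SIZE)
    ]

def lookup(blocks, index):
    """Recover entry `index` by splitting only its containing block."""
    block = blocks[index // BLOCK_SIZE]
    return block.split(SEP)[index % BLOCK_SIZE]

words = [f"word{i}" for i in range(100)]
blocks = compact(words)
assert lookup(blocks, 42) == "word42"  # entry is recovered intact
```

Each `str` carries a fixed object header, so packing 16 entries into one string amortizes that header across the block; larger blocks save little more RAM but make each lookup split a longer string, matching the measurements above.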