Java implementation of Neural Machine Translation of Rare Words with Subword Units
fastbpe implements byte pair encoding tokenization in java. The library is supposed to be
- simple,
- fast,
- handle large volumes of text - minimum 1 gb,
- flawless with tests,
- ready for alpha testing soon.
References,