An implementation of Unicode Text Segmentation (UAX #29, also known as TR29). Text is split into words by a fast DFA (deterministic finite automaton).
See nim-graphemes for grapheme cluster segmentation.
nimble install segmentation
Nim 0.19, 0.20, and 1.0.4 or later
import sequtils
import segmentation
assert toSeq("The (“brown”) fox can’t jump 32.3 feet, right?".words) ==
  @["The", " ", "(", "“", "brown", "”", ")", " ", "fox", " ",
    "can’t", " ", "jump", " ", "32.3", " ", "feet", ",", " ",
    "right", "?"]
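The `words` iterator yields every segment, including whitespace and punctuation. If only the word-like tokens are wanted, they can be filtered afterwards. The following is a minimal sketch, assuming that `words` yields `string` values as in the assertion above. `isWordLike` and `wordTokens` are hypothetical helpers written for this example and are not part of the library.

```nim
import unicode
import segmentation

# Hypothetical helper (not part of the library): true when a token starts
# with an ASCII digit or a Unicode letter, so whitespace and punctuation
# segments are rejected.
proc isWordLike(tok: string): bool =
  tok.len > 0 and (tok[0] in {'0'..'9'} or tok.runeAt(0).isAlpha)

# Hypothetical helper: collect only the word-like segments of `s`.
proc wordTokens(s: string): seq[string] =
  result = @[]
  for tok in s.words:
    if isWordLike(tok):
      result.add tok

echo wordTokens("The (“brown”) fox can’t jump 32.3 feet, right?")
```

Checking only the first character is a simplification that is good enough for this input. A stricter filter could inspect every rune in the token.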
nimble test
MIT