New features:
- Retrain the tokenization model on a much larger dataset (F1 score = 0.985)
- Add training data and training code
- Better integration with spacy.io: redundant spaces between tokens are removed after tokenization, e.g. Việt Nam , 12 / 22 / 2020 => Việt Nam, 12/22/2020 (a minimal sketch of this cleanup is shown below)
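
The following is only an illustrative sketch of the space-cleanup behavior described above, not the library's actual implementation or API; the `detokenize` helper and its regular expressions are hypothetical and simply reproduce the example from this changelog.

```python
import re

def detokenize(tokens):
    """Join tokens and strip redundant spaces (illustrative sketch only).

    Mimics the changelog example: the space before punctuation and the
    spaces around slashes in date-like strings are removed.
    """
    text = " ".join(tokens)
    # Drop the space before common punctuation, e.g. "Việt Nam ," -> "Việt Nam,"
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Collapse spaces around slashes between digits, e.g. "12 / 22 / 2020" -> "12/22/2020"
    text = re.sub(r"(?<=\d)\s*/\s*(?=\d)", "/", text)
    return text

if __name__ == "__main__":
    print(detokenize(["Việt", "Nam", ",", "12", "/", "22", "/", "2020"]))
    # -> "Việt Nam, 12/22/2020"
```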