Releases: mideind/Tokenizer
Version 3.4.5
- Compatibility with Python 3.13
- Now requires Python 3.9 or later
Full Changelog: 3.4.4...3.4.5
Version 3.4.4
- Better handling of abbreviations
Full Changelog: 3.4.3...3.4.4
Version 3.4.3
- Various minor fixes
- Now requires Python 3.8 or later
Full Changelog: 3.4.2...3.4.3
Version 3.4.2
- Some abbreviations and phrases added
- META_BEGIN token added to help users distinguish between metatokens and regular tokens
Version 3.4.1
- Improved performance on large input chunks
Version 3.4.0
- Improved handling and normalization of punctuation
Version 3.3.3
- Better support for token-level errors
Version 3.3.2
- Internal refactoring
- Fixes in paragraph handling
Version 3.3.0
- Fixed bug where opening quotes following beginning-of-paragraph markers were incorrectly recognized and normalized.
Version 3.2.0
- Numbers and amounts that consist exclusively of alphabetic words (sjö hundruð) are now returned as the original TOK.WORD tokens (sjö and hundruð), not coalesced into TOK.NUMBER/TOK.AMOUNT/etc. tokens as before.