Releases: Jyonn/UnifiedTokenizer
3.0.12 Released
Fantastic features for UniTok 3.0!
UniDep Cache (from 2.4.3.2)
UniDep can be inefficient when unioned with other depots. The depot cache generates all samples at once.
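The idea behind caching can be sketched as follows. This is only an illustration under stated assumptions: `SketchDepot`, its `start_caching` method, and the column layout are hypothetical stand-ins, not UniTok's actual `UniDep` implementation or API. Without a cache, every access re-assembles a sample from each unioned part; with a cache, all samples are assembled once up front.

```python
class SketchDepot:
    """Hypothetical stand-in for a unioned depot (not UniTok's UniDep)."""

    def __init__(self, *parts):
        # Each part is a dict mapping a column name to a list of values.
        # All parts are assumed to have the same number of rows.
        self.parts = parts
        self.size = len(next(iter(parts[0].values())))
        self.cache = None

    def _build(self, index):
        # Assemble one sample by visiting every column of every part.
        sample = {}
        for part in self.parts:
            for col, values in part.items():
                sample[col] = values[index]
        return sample

    def start_caching(self):
        # Hypothetical method name: build every sample once, up front.
        self.cache = [self._build(i) for i in range(self.size)]

    def __getitem__(self, index):
        if self.cache is not None:
            return self.cache[index]   # cached: a plain list lookup
        return self._build(index)      # uncached: re-assembled on each access


news = {"nid": [0, 1], "title": [[3, 4], [5]]}
extra = {"category": [7, 8]}

depot = SketchDepot(news, extra)
depot.start_caching()
sample = depot[1]
```

The trade-off in this sketch is memory for speed: the cache holds every assembled sample, so repeated access avoids the per-part loop.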
UniDep Export (from 3.0.11)
Easily export a unioned or filtered depot.
Easier-to-use Vocab
- support `len(vocab)` to get the vocab size
- support iterating over the vocab with `for obj in vocab`
- support `list(vocab)` to get the token list
- support `vocab.i2o(index)` to get the object by index, and `vocab.o2i(obj)` to get the index by object
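The operations listed above follow Python's standard container protocols. A minimal sketch of how such an interface behaves, assuming a hypothetical stand-in class `SketchVocab` (this is not UniTok's `Vocab` class, and the `append` helper is an assumption added for illustration):

```python
class SketchVocab:
    """Hypothetical stand-in illustrating the interface (not UniTok's Vocab)."""

    def __init__(self, tokens=()):
        self._objects = []   # index -> object
        self._indices = {}   # object -> index
        for tok in tokens:
            self.append(tok)

    def append(self, obj):
        """Add obj if it is unseen and return its index (assumed helper)."""
        if obj not in self._indices:
            self._indices[obj] = len(self._objects)
            self._objects.append(obj)
        return self._indices[obj]

    def __len__(self):
        # Enables len(vocab)
        return len(self._objects)

    def __iter__(self):
        # Enables `for obj in vocab` and list(vocab)
        return iter(self._objects)

    def i2o(self, index):
        # Index to object
        return self._objects[index]

    def o2i(self, obj):
        # Object to index
        return self._indices[obj]


vocab = SketchVocab(["[PAD]", "hello", "world"])
size = len(vocab)
tokens = list(vocab)
second = vocab.i2o(1)
world_index = vocab.o2i("world")
```

Defining `__len__` and `__iter__` is enough for `len(vocab)`, `for obj in vocab`, and `list(vocab)` to work, since `list()` consumes any iterable.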
Two New Tokenizers
- NumberTok
- SeqTok
Compatible Meta
- support `print(depot)` to get a detailed description of the depot
- support meta upgrading
2.3.1.2 LTS Released
New features for UniTok 2.3.x Series:
- optimize the `Classify` class, which returns `NoneClassify` when the target dict path does not exist
- provide the pre-handler for tokenizers
- provide `GlobalSetting` for silent mode (only for now)