Releases · Kuuuube/japanese_text_analyzer · GitHub

10 Nov 23:14

0.1.3 Latest

Changelog:

Fixed --any allowing the bytes of some multi-byte characters to be split, which caused invalid Unicode
Optimized chunking of large strings for --any
Optimized tokenization speed for --any by removing ASCII before tokenization
Added word_list_raw.csv output


10 Nov 01:36

0.1.2

Changelog:

Fixed a case where the replacement character inserted by lossy string conversion could push a string over the tokenizer's maximum length
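
To see why a lossy conversion can lengthen a string, note that the Unicode replacement character U+FFFD occupies 3 bytes in UTF-8, while the invalid byte it stands in for may be only 1 byte. A small sketch (the function name `lossy_growth` is made up for this example, and Python's `errors="replace"` is used here as a stand-in for whatever lossy conversion the tool performs):

```python
def lossy_growth(raw: bytes) -> int:
    """Return how many bytes longer the data becomes after a lossy UTF-8 decode."""
    # Invalid sequences are replaced with U+FFFD instead of raising an error.
    decoded = raw.decode("utf-8", errors="replace")
    return len(decoded.encode("utf-8")) - len(raw)


# "日" (3 bytes) followed by one invalid byte: 4 bytes in, 6 bytes out.
growth = lossy_growth(b"\xe6\x97\xa5\xff")
```

A chunk sized exactly at the tokenizer limit before conversion can therefore exceed it afterwards, so the length check has to account for the converted text.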


10 Nov 01:18

0.1.1

Changelog:

Added option to filter by extension when using --any
Added option to parse .mokuro files
Fixed --any overflowing Sudachi's max byte length
Removed redundant --txt option
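
For readers unfamiliar with the two additions above, here is a minimal sketch of extension filtering and .mokuro text extraction. It assumes a .mokuro file is JSON with a top-level `pages` list, each page holding `blocks`, and each block holding a `lines` list of strings; that layout is an assumption about the format, and both function names are hypothetical rather than taken from the tool.

```python
import json
from pathlib import Path


def has_extension(path: str, extensions: list) -> bool:
    """Case-insensitive check of a file's extension against an allow-list."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return Path(path).suffix.lower().lstrip(".") in wanted


def mokuro_textboxes(document: str) -> list:
    """Return one list of textbox strings per page.

    Assumed layout: {"pages": [{"blocks": [{"lines": ["...", "..."]}]}]}
    """
    data = json.loads(document)
    pages = []
    for page in data.get("pages", []):
        # Join the lines of each block so one block becomes one textbox.
        boxes = ["".join(block.get("lines", [])) for block in page.get("blocks", [])]
        pages.append(boxes)
    return pages
```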


09 Nov 23:19

0.1.0

Changelog:

Added options for other analysis formats (--any, --txt)


31 Jul 22:20

0.0.4

Changelog:

Added average page length
Added total page count
Added average volume length
Added total volume count
Added total textbox count
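
The statistics listed above could be computed along the following lines. This is only a sketch: it assumes lengths are measured in characters and that the input is a nested list of volumes, pages, and textbox strings, neither of which is stated in the changelog. The function and key names are illustrative.

```python
def volume_stats(volumes: list) -> dict:
    """Compute counts and average lengths for volumes -> pages -> textboxes."""
    # Length of a page or volume is taken as the total characters it contains.
    page_lengths = [sum(len(box) for box in page) for volume in volumes for page in volume]
    volume_lengths = [sum(len(box) for page in volume for box in page) for volume in volumes]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "total_volume_count": len(volumes),
        "total_page_count": len(page_lengths),
        "total_textbox_count": sum(len(page) for volume in volumes for page in volume),
        "average_volume_length": mean(volume_lengths),
        "average_page_length": mean(page_lengths),
    }
```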


16 Jun 21:16

0.0.3

Changelog:

Add average (and shortest/longest) textbox length to stats
Improve prints to help see what's going on
Show time each part took to complete in ms


16 Jun 16:31

0.0.2

Changelog:

Uses Sudachi Mode B to avoid the edge case of tokenizing ridiculously long compound words that are impossible to look up in most dictionaries
Print usage message and panic when run without args instead of only panicking
Continue and print error on unreadable files instead of panicking
Continue and print error on tokenization failures instead of panicking


15 Jun 23:27

0.0.1

Initial release
