Releases: Kuuuube/japanese_text_analyzer

0.1.3

10 Nov 23:14
cee3ad3

Changelog:

  • Fixed --any splitting the bytes of multi-byte characters across chunks, which produced invalid Unicode
  • Optimized chunking of large strings for --any
  • Optimized tokenization speed for --any by removing ASCII before tokenization
  • Added word_list_raw.csv output
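The first three items can be illustrated with a minimal Rust sketch. It rests on two assumptions: ASCII is simply filtered out, and text is chunked by a byte limit. The function names and the `max_bytes` parameter are hypothetical and are not taken from the project's source. Chunks are only cut on UTF-8 character boundaries, so no multi-byte character is ever split.

```rust
/// Hypothetical pre-processing step: drop every ASCII character so the
/// tokenizer only sees the (mostly Japanese) non-ASCII text.
fn strip_ascii(text: &str) -> String {
    text.chars().filter(|c| !c.is_ascii()).collect()
}

/// Hypothetical chunker: split `text` into pieces of at most `max_bytes`
/// bytes, cutting only on UTF-8 character boundaries.
fn chunk_on_char_boundaries(text: &str, max_bytes: usize) -> Vec<&str> {
    // A UTF-8 character is at most 4 bytes, so this guarantees progress.
    assert!(max_bytes >= 4, "max_bytes must fit any single UTF-8 character");
    let mut chunks = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        if rest.len() <= max_bytes {
            chunks.push(rest);
            break;
        }
        // Walk back from the byte limit to the nearest character boundary.
        let mut end = max_bytes;
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = rest.split_at(end);
        chunks.push(head);
        rest = tail;
    }
    chunks
}
```

Each kana/kanji below is 3 bytes, so with an 8-byte limit every chunk holds two characters. One caveat applies to the filtering step: removing ASCII also removes spaces and newlines, which joins the text on either side. A real implementation might replace ASCII runs with a separator instead.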

0.1.2

10 Nov 01:36
8042a71

Changelog:

  • Fixed a case where a lossy-conversion replacement character could push a string over the tokenizer's maximum length
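The following minimal Rust sketch shows how this kind of overflow can happen. It assumes the lossy conversion is `String::from_utf8_lossy`; the helper name `lossy_prefix` is hypothetical. When a byte buffer is cut in the middle of a character, the leftover bytes are decoded as U+FFFD. That character is 3 bytes in UTF-8, so the decoded string can end up longer than the byte limit used to cut it.

```rust
use std::borrow::Cow;

/// Hypothetical helper: take at most `max_bytes` bytes and decode them lossily.
/// If the cut lands mid-character, the partial bytes become U+FFFD (3 bytes),
/// so the result may be *longer* than `max_bytes`.
fn lossy_prefix(bytes: &[u8], max_bytes: usize) -> Cow<'_, str> {
    let end = max_bytes.min(bytes.len());
    String::from_utf8_lossy(&bytes[..end])
}
```

Cutting the 6-byte string "日本" after 4 bytes yields "日" plus one stray byte. That stray byte becomes U+FFFD, and the decoded result is 6 bytes. One way to avoid this is to cut only on character boundaries of already-decoded text, so no replacement character is ever introduced.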

0.1.1

10 Nov 01:18
2aaa5f2

Changelog:

  • Added option to filter by extension when using --any
  • Added option to parse .mokuro files
  • Fixed --any input exceeding sudachi's maximum input length in bytes
  • Removed redundant --txt option
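Extension filtering for --any could look roughly like the Rust sketch below. The helper name, its case-insensitive matching, and the convention of passing extensions without the leading dot are all assumptions; the project's actual flag syntax is not described here.

```rust
use std::path::Path;

/// Hypothetical helper: keep a path only if its extension matches one of
/// `allowed` (compared case-insensitively, given without the leading dot).
fn has_allowed_extension(path: &Path, allowed: &[&str]) -> bool {
    match path.extension().and_then(|ext| ext.to_str()) {
        Some(ext) => allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)),
        // Files with no (or non-UTF-8) extension never match.
        None => false,
    }
}
```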

0.1.0

09 Nov 23:19
f1db6a5

Changelog:

  • Added options for other analysis formats (--any, --txt)

0.0.4

31 Jul 22:20
5d565a7

Changelog:

  • Added average page length
  • Added total page count
  • Added average volume length
  • Added total volume count
  • Added total textbox count

0.0.3

16 Jun 21:16

Changelog:

  • Added average (and shortest/longest) textbox length to stats
  • Improved console output to make progress easier to follow
  • Added the time each stage took to complete, in milliseconds
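A minimal Rust sketch of the textbox length stats is shown below. It assumes that length is counted in characters (`char`s) rather than bytes. The `LengthStats` struct and `length_stats` function are hypothetical names.

```rust
/// Hypothetical summary of textbox lengths, measured in characters.
#[derive(Debug)]
struct LengthStats {
    count: usize,
    shortest: usize,
    longest: usize,
    average: f64,
}

/// Returns `None` for an empty input, since min/max/average are undefined.
fn length_stats(textboxes: &[&str]) -> Option<LengthStats> {
    let lengths: Vec<usize> = textboxes.iter().map(|t| t.chars().count()).collect();
    let shortest = *lengths.iter().min()?;
    let longest = *lengths.iter().max()?;
    let total: usize = lengths.iter().sum();
    Some(LengthStats {
        count: lengths.len(),
        shortest,
        longest,
        average: total as f64 / lengths.len() as f64,
    })
}
```

Counting `char`s rather than bytes matters here: every kana/kanji is 3 bytes in UTF-8, so a byte count would triple the apparent length of Japanese text.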

0.0.2

16 Jun 16:31

Changelog:

  • Uses Sudachi Mode B to avoid the edge case of tokenizing extremely long compound words that most dictionaries cannot look up
  • Print a usage message before panicking when run without arguments, instead of only panicking
  • Print an error and continue on unreadable files instead of panicking
  • Print an error and continue when tokenization fails instead of panicking
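The "print an error and continue" behaviour for unreadable files can be sketched in Rust as follows. The `read_all` function is hypothetical. It reads each input with `std::fs::read_to_string` and reports failures on stderr instead of unwrapping them. Note that `read_to_string` also fails on files that are not valid UTF-8, so those files would be skipped as well.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical sketch: read each input file, printing an error for any file
/// that cannot be read and moving on to the next one instead of panicking.
fn read_all(paths: &[PathBuf]) -> Vec<(PathBuf, String)> {
    let mut contents = Vec::new();
    for path in paths {
        match fs::read_to_string(path) {
            Ok(text) => contents.push((path.clone(), text)),
            // Report the failure on stderr; the loop then continues with the next path.
            Err(err) => eprintln!("Skipping {}: {}", path.display(), err),
        }
    }
    contents
}
```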

0.0.1

15 Jun 23:27

Initial release