A fork of forsakeninfinity’s script, extended to support converting CSJ and NWJC. See the original repo for more information.
The frequency list goes up to 31,605.
“The Corpus of Spontaneous Japanese” (or CSJ) is a database containing a large collection of Japanese spoken language data and information for use in linguistic research; jointly developed by NINJAL, NICT and the Tokyo Institute of Technology, the CSJ is world-class in both the quantity and quality of the available data.
The corpus is split into different domains, which you can download from the CSJ Releases folder.
More information can be found here
The frequency list goes up to 106,762.
More information can be found here (in Japanese)