Skip to content

A tool for reading in manga/comic pages and generating useful JSON files, using Microsoft's Read API.

License

Notifications You must be signed in to change notification settings

Pyreko/KaguyaOCR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KaguyaOCR

Note that this tool is likely not getting any more changes and is in a stable state; if I make improvements it will be for a new tool from scratch

A tool for reading in Kaguyamanga pages and generating a resulting OCR JSON file for a chapter, in addition to a master dictionary, using MS's Reader tool from their Cognitive Services. In particular, it uses the Read API, not the OCR or Recognize Text API (see here for the differences).

Note this is probably not the cleanest code or anything, and was kinda cobbled together to just work. Improvements to usability and less jankiness will come in the future (I swear). Furthermore, quality of the OCR is entirely dependant on MS's OCR. This just formats the results into useful forms.

Despite the name and the fact that it was written for https://guya.moe/, this will work fine for any image/manga/comic.

What it does

The main use case is to generate a JSON file containing all words found and their locations for a chapter (I'll refer to this as the "chapter" JSON) and a dictionary containing all words for that series and their respective locations, chapter and page wise (henceforth refered to as the "master" dictionary).

When fed a directory containing image files for a chapter, it'll output a chapter JSON (ie: chapter 1 gives 1.json) and a master dictionary JSON (defaults to master_dictionary.json in the same directory). See below for an example of the JSON format.

To use

To build from source, clone and open the solution, then Publish. If you're using VS Studio, it should be fairly straightforward. If not, you can publish from command line using dotnet:

dotnet publish MangaReader.sln

To just use the latest release, download from Releases. Note that the most up-to-date version will always be from the repo, so cloning from that is probably preferable.

In both cases, you'll need to include a config.json file with the following structure in the same directory as the built program (where MangaReader.dll is located):

{
    "subscriptionKey": "your-microsoft-cognitive-key",
    "personalEndpoint": "your-endpoint"
}

To run the program, you will need .NET Core, as well as an internet connection (or if you're running Cognitive off a Docker container, I suppose not).

Then, open a prompt in the release directory. To see a list of flags:

dotnet MangaReader.dll --help

To OCR a folder (representing a chapter), run the following:

dotnet MangaReader.dll -i "your/input/directory/folder" -c chapter_number -o "optional/output/json/file/path.json" -m "optional/output/master/dictionary/path.json"

Note that your files in the directory must be in order, namewise! That is, page 1 should be the first file, page 2 should be the second file, etc. To be safe, just pad your filenames with 0's if needed (01, 02, 03... 15, for example).

To just add a chapter JSON file to a master dictionary (or to just create a master dictionary from a chapter JSON file):

dotnet MangaReader.dll -i "your/input/json/file.json" -m "optional/output/master/dictionary/path.json"

To add multiple chapter JSON files in a directory to a master dictionary:

dotnet MangaReader.dll -b "chapter/json/directory/path" -m "optional/output/master/dictionary/path"

To add verbosity or logging (logs to diagnostics.log in the program directory), use -v or -l tags respectively.

JSON format

Two JSON files are generated at the end (at most). A chapter JSON file and a master dictionary JSON.

The chapter JSON file has the following structure:

{
  "Chapter": 1.0,
  "Pages": [
    {
      "Words": [
        {
          "BoundingBox": [
            {
              "X": 156,
              "Y": 1196
            },
            {
              "X": 455,
              "Y": 1194
            },
            {
              "X": 456,
              "Y": 1225
            },
            {
              "X": 157,
              "Y": 1227
            }
          ],
          "Text": "BOOK DESIGN IN TSUKANO HIROTAKA"
        }
      ],
      "Page": 1,
      "Height": 1250,
      "Width": 3095
        }
  ],
  "MentionedWordChapterLocation": {
    "CANDIDACY": [
      30
    ],
    "HIS": [
      13,
      16,
      17,
      23,
      25
    ]
  }
}

(note this is a heavily truncated result). The bounding box coordinates go top left, top right, bottom right, bottom left, in that order. Each bounding box represents a phrase or a group of words that MS Reader detected.

The master dictionary has a slightly similar format to the MentionedWordChapterLocation field, but includes a reference to the chapter:

{
    "MentionedWordLocation": {
      "KAGUYA": {
          "1": [
            11,
            12,
            13
          ],
          "10": [
            5,
            11,
            15,
            17,
            20
          ],
          "100": [
            1,
            3,
            4,
            5
          ],
          "101": [
            6,
            7,
            11,
            13,
            15,
            18
          ],
          "101-1": [
            7
          ]
        }
    }
}

where each word maps to a (chapter, page array) dictionary.

Contributions

Well, this was originally a private repo, but if anyone wants to contribute, feel free to.

About

A tool for reading in manga/comic pages and generating useful JSON files, using Microsoft's Read API.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages