Request: Improve black and white output quality #169

Open
rudolphos opened this issue Mar 13, 2021 · 2 comments

Comments

@rudolphos

rudolphos commented Mar 13, 2021

I only use ScanTailor for deskewing and margins, as those are the best features this software has to offer. Most of the time, though, the black-and-white text output is poor and too noisy: it removes too much from the original and leaves parts of characters missing.

Original
image

This is how the output looks in ABBYY FineReader using just the 'Whiten background' feature with no other options. I suspect it just converts to grayscale and posterizes, since a one-page TIFF export is around the same size (80 KB) as ScanTailor's.
image
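
To make the guess above concrete, here is a minimal sketch of what "grayscale and posterize" would mean. It is only an illustration of that guess, not ABBYY's actual method, which I don't know. The function names, the luma weights (the common Rec. 601 ones) and the choice of 4 levels are all assumptions.

```python
def to_gray(rgb_img):
    """Convert rows of (R, G, B) tuples to 8-bit gray using Rec. 601 luma weights (an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_img
    ]


def posterize(gray_img, levels=4):
    """Snap each gray value to the nearest of `levels` evenly spaced values in 0..255."""
    step = 255 / (levels - 1)
    return [[round(round(p / step) * step) for p in row] for row in gray_img]


# Tiny example: one row of a "scan" with paper, smudge and ink pixels.
scan = [[(250, 248, 240), (180, 178, 170), (30, 30, 35)]]
flattened = posterize(to_gray(scan), levels=4)
```

With few levels, near-white paper snaps to pure white while anti-aliased character edges keep a couple of gray steps, which would explain both the smooth look and the small file size.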

Here's ScanTailor output with default settings
image

With Savitzky-Golay and morphological smoothing disabled:
image

ABBYY had the result most similar to the original text.

@mara004

mara004 commented Mar 17, 2021

You're right that the ScanTailor output is not quite as precise and smooth as the ABBYY output, but overall it seems fairly similar to me.

@ftrebien

ftrebien commented Apr 22, 2024

I've always wanted to delve deeper into how DjVu does this (see the sixth example here). Meanwhile, when I want to remove the background from a dirty scanned page, I do the following in GIMP:

  1. Open an image and duplicate its single layer
  2. Estimate the background color by applying a median blur filter to the new layer with a percentile above 50% (usually 70%), adjusting the radius to the content: usually 80 px at 300 dpi, or more if there are large illustrations, until the preview shows no blobs related to actual content
  3. Remove the estimated background by inverting the colors of the new layer, setting its layer mode to Addition (Legacy) and merging it with the original layer
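
The three steps above can be sketched in plain Python on a small grayscale image (a list of rows of 0..255 values). This is only an illustration under some assumptions: the function names are made up, the window is square rather than whatever shape GIMP's median blur uses, and the addition is modeled as a simple clipped sum. For real pages a library routine such as SciPy's `scipy.ndimage.percentile_filter` would be far faster.

```python
def percentile_filter(img, radius, percentile):
    """Replace each pixel by the given percentile of a square window around it.

    The window is (2 * radius + 1) pixels wide and is cropped at the image borders.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            # Nearest-rank index for the requested percentile.
            idx = min(len(window) - 1, int(len(window) * percentile / 100))
            out[y][x] = window[idx]
    return out


def remove_background(img, radius=2, percentile=70):
    """Steps 1-3: estimate the background, invert it, add it to the original."""
    background = percentile_filter(img, radius, percentile)
    # Inverted background plus original, clipped at white (255).
    return [
        [min(255, p + (255 - b)) for p, b in zip(row, bg_row)]
        for row, bg_row in zip(img, background)
    ]


# Toy page: a left-to-right paper gradient with one dark "ink" pixel.
page = [[200 + x for x in range(7)] for _ in range(7)]
page[3][3] = 40
cleaned = remove_background(page, radius=2, percentile=70)
```

On the toy page the uneven paper ends up within a couple of levels of pure white while the ink pixel stays dark. The radius plays the same role as in step 2: if it is too small relative to the stroke width, the filter starts treating ink as background and erases it.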

As this requires previewing to determine the best blur radius, it cannot be applied automatically without risking the destruction of some content, though a manual adjustment may be compatible with ScanTailor's UI. The "default" settings above work great for text-only documents with no graphics other than thin lines.

Sometimes even a very large radius still leaves some content-related blobs. In these cases, between steps 2 and 3, I select the remaining blobs and repeat the median blur to reduce them further, which fills them with the surrounding background color, and then I apply a bit of Gaussian blur to soften the edges of the selection. But all this manual work may not suit ScanTailor's UI.

After removing the background, I apply GIMP White Balance to make the text very readable.
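
As a rough idea of what that last step does on a grayscale page, here is a small sketch that clips a tiny fraction of pixels at each end of the histogram and stretches the rest to the full 0..255 range. It only approximates GIMP's White Balance, which works per color channel; the function name and the `clip_fraction` default are assumptions for illustration.

```python
def stretch_levels(img, clip_fraction=0.0005):
    """Linearly stretch gray values so the clipped darkest/lightest values map to 0/255."""
    values = sorted(p for row in img for p in row)
    n = len(values)
    k = int(n * clip_fraction)  # number of pixels ignored at each end
    lo, hi = values[k], values[n - 1 - k]
    if hi == lo:
        # Flat image: nothing to stretch, return a copy.
        return [row[:] for row in img]
    scale = 255 / (hi - lo)
    return [
        [max(0, min(255, round((p - lo) * scale))) for p in row]
        for row in img
    ]


# After background removal the ink is often dark gray rather than black.
washed_out = [[91, 253], [254, 255]]
readable = stretch_levels(washed_out)
```

The darkest remaining value is pulled down to black and the lightest to white, so the text gains contrast without changing the background that was already flattened.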
