Request: Improve black and white output quality #169
Comments
You're right that the ScanTailor output is not quite as precise and smooth as the ABBYY output, but overall it seems fairly similar to me.
I've always wanted to delve deeper into how DjVu does this (see the sixth example here). Meanwhile, when I want to remove the background from a dirty scanned page, I do the following in GIMP:
As this requires previewing to determine the best blur radius, it cannot be applied automatically without risking destroying some content, but a manual adjustment may be compatible with ScanTailor's UI. This works great with these "default" settings for text-only documents with no graphics other than thin lines. Sometimes even a very large radius still leaves some content-related blobs. In these cases, between steps 2 and 3, I select the remaining blobs and repeat the median blur to reduce them further, which fills them with the surrounding background color, and then I apply a bit of Gaussian blur to soften the edges of the selection. All this extra manual work may not suit ScanTailor's UI, though. After removing the background, I apply GIMP's White Balance to make the text very readable.
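For readers who want to experiment outside GIMP, here is a minimal sketch of one common way to use a median blur for background removal. The exact GIMP steps are not listed in this thread, so this is an assumption rather than the procedure described above: it estimates the background with a median filter whose radius is large relative to the stroke width, then divides the page by that estimate. The function names `estimate_background` and `remove_background` are made up for illustration, and the image is a plain list of rows of 0-255 gray values.

```python
from statistics import median


def estimate_background(img, radius):
    """Median-filter a grayscale image (list of rows of 0-255 values).

    With a radius larger than the text strokes, the median in each
    window is dominated by paper pixels, so the text disappears and
    only the (possibly dirty) background remains.
    """
    h, w = len(img), len(img[0])
    background = []
    for y in range(h):
        row = []
        for x in range(w):
            # Collect the window around (x, y), clipped at the borders.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(median(window))
        background.append(row)
    return background


def remove_background(img, radius):
    """Divide each pixel by the estimated background (assumed approach).

    Pixels equal to the background become white (255); darker pixels
    keep their contrast relative to the local background.
    """
    background = estimate_background(img, radius)
    return [
        [min(255, round(255 * p / max(b, 1))) for p, b in zip(row, bg_row)]
        for row, bg_row in zip(img, background)
    ]


# Tiny example: a gray (200) page with one dark (40) "ink" pixel.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 40
cleaned = remove_background(page, radius=2)
```

As noted above, the radius still has to be chosen by eye: too small and thick strokes leak into the background estimate, too large and uneven staining is no longer followed.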
I'm only using ScanTailor for deskewing and margins, as those are the best features this software has to offer. Most of the time, though, the black-and-white text output is poor and too noisy: it removes too much of the original and leaves parts of characters missing.
Original
This is how the output looks in ABBYY FineReader using just the 'Whiten background' feature with no other options. I suspect it just converts to grayscale and posterizes, since a one-page TIFF export is around the same size (80 KB) as in ScanTailor.
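To make the "grayscale and posterize" guess concrete, here is a rough sketch of what such a pipeline would do. It only illustrates the guess and is not ABBYY's actual algorithm, which is not documented in this thread. The helper names (`to_gray`, `posterize`, `whiten_page`) and the choice of 4 levels are assumptions for illustration; the luma weights are the usual Rec. 601 ones.

```python
def to_gray(rgb):
    """Convert an (r, g, b) pixel to a 0-255 gray value (Rec. 601 weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def posterize(value, levels):
    """Snap a 0-255 gray value to the nearest of `levels` evenly spaced tones."""
    step = 255 / (levels - 1)
    return round(round(value / step) * step)


def whiten_page(pixels, levels=4):
    """Grayscale then posterize every pixel of an image (list of rows of RGB tuples)."""
    return [[posterize(to_gray(p), levels) for p in row] for row in pixels]


# A light-gray paper pixel and a dark ink pixel, reduced to 4 tones.
sample = [[(200, 200, 200), (30, 30, 30)]]
result = whiten_page(sample, levels=4)
```

Reducing the page to a handful of gray tones keeps some anti-aliasing on the glyph edges while still compressing well, which would be consistent with the similar file size mentioned above.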
Here's the ScanTailor output with default settings:
Savitzky, Morph. disabled:
ABBYY had the result most similar to the original text.