My attempt for Kaggle's Denoising Dirty Document competition.

nhatsmrt/DenoisingDirtyDocuments

Denoising Dirty Documents

Authors

Nhat Pham (https://github.com/nhatsmrt) & Hoang Phan (https://github.com/petrpan26)

Introduction

This project is based on Kaggle's competition: https://www.kaggle.com/c/denoising-dirty-documents
The challenge is to remove different types of synthetic noise from scanned text.
NOTE: This project is written in TensorFlow 1.9.

Approach

Small fixed-size windows of the scanned text are passed through an autoencoder-like neural network.
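The windowing step can be sketched in plain Python as follows. This is only an illustration, not the repository's code: the function names `extract_windows` and `stitch_windows`, the window size of 4, and the stride of 2 are all assumptions chosen for the example, and the network itself is replaced by an identity step.

```python
def extract_windows(image, size, stride):
    """Cut a 2-D image (list of rows) into square windows.

    Returns a list of (top, left, window) tuples. `size` and `stride`
    are illustrative parameters, not values taken from the project.
    """
    height, width = len(image), len(image[0])
    windows = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            window = [row[left:left + size] for row in image[top:top + size]]
            windows.append((top, left, window))
    return windows


def stitch_windows(windows, height, width, size):
    """Rebuild a full image by averaging overlapping windows."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, window in windows:
        for dy in range(size):
            for dx in range(size):
                total[top + dy][left + dx] += window[dy][dx]
                count[top + dy][left + dx] += 1
    # Pixels never covered by a window (none in this example) default to 0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


# A tiny 6x8 "scan" with integer pixel values.
image = [[(y * 8 + x) % 256 for x in range(8)] for y in range(6)]

windows = extract_windows(image, size=4, stride=2)

# In the real pipeline each window would be denoised by the network here;
# this sketch passes the windows through unchanged.
restored = stitch_windows(windows, height=6, width=8, size=4)
```

Averaging the overlapping predictions is one common way to reassemble a page from windows; whether the project does this is not stated here.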
The network has a convolutional encoder with residual connections. For the decoder, a simple feedforward layer would be sufficient. However, a deconvolutional (transposed convolution) layer is used instead because it has fewer parameters, which speeds up training.
The detailed architecture can be found in the code and the project report.
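The parameter-count argument for the decoder can be checked with a little arithmetic. The shapes below are hypothetical (an 8x8x32 encoder output decoded to a 16x16 single-channel window with a 3x3 kernel); they are not taken from the repository and only illustrate the order of magnitude of the difference.

```python
def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_transpose_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a 2-D transposed convolution layer.

    The count does not depend on the spatial size of the input,
    because the same kernel is shared across all positions.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical shapes, chosen only for illustration.
encoded_h, encoded_w, encoded_c = 8, 8, 32   # encoder output feature map
window_h, window_w = 16, 16                  # decoded grayscale window

dense_count = dense_params(encoded_h * encoded_w * encoded_c, window_h * window_w)
deconv_count = conv_transpose_params(3, 3, encoded_c, 1)

print(dense_count)   # 524544
print(deconv_count)  # 289
```

Under these assumed shapes the feedforward decoder needs roughly 1,800 times as many parameters as the transposed convolution.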

Some demos (from the competition's test files)

[Three image pairs: each shows a noisy scan (Before) followed by the denoised output (After)]
