Denoising Dirty Documents

Authors

Nhat Pham (https://github.com/nhatsmrt) & Hoang Phan (https://github.com/petrpan26)

Introduction

This project is based on a Kaggle competition: https://www.kaggle.com/c/denoising-dirty-documents
The challenge is to remove different types of synthetic noise from scanned text.
NOTE: This project is written in TensorFlow 1.9.

Approach

Small windows (e.g. of size ) of the scanned text are passed through an autoencoder-like neural network.
The network has a convolutional encoder with residual connections. For the decoder, a simple feedforward (fully connected) layer would be sufficient. However, a deconvolutional (transposed convolution) layer is used instead because it has fewer parameters, which speeds up training.
The detailed architecture can be found in the code and in the project report (Denoising_Dirty_Documents.pdf).
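
To give a feel for the parameter saving, the sketch below counts the weights of the two decoder options. All sizes here are hypothetical (an 8x8x64 encoder output, a 32x32 single-channel output window, a 4x4 kernel); the project's actual dimensions are in the code and the report.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_transpose_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a transposed convolution layer.

    The count does not depend on the spatial size of the input,
    because the same kernel is shared across all positions.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical sizes, chosen only for illustration.
encoded_h, encoded_w, encoded_c = 8, 8, 64
out_h, out_w = 32, 32

dense = dense_params(encoded_h * encoded_w * encoded_c, out_h * out_w)
deconv = conv_transpose_params(4, 4, encoded_c, 1)

print(dense)   # 4195328
print(deconv)  # 1025
```

Under these assumed sizes the fully connected decoder needs several thousand times more parameters than the transposed convolution, which is the kind of gap that motivates the choice described above.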

Folders and files

Predictions
Source
Denoising_Dirty_Documents.pdf
README.md
main.py
main_bm3d.py
main_sliding.py
main_smaller_images.py
post_processing.py
test_slicing.py

Some demos (from the competition's test files)

Before:

After:

Before:

After:

Before:

After:
