Denoising Noisy Documents

License: CC BY-SA 4.0 Python PRs Welcome

Buy Me A Coffee

Cover-pic

Numerous scientific papers, historical documents and artifacts, recipes, and books are stored on paper, whether handwritten or typewritten. Over time these pages accumulate noise and dirt from fingerprints, weakening paper fibers, coffee and tea stains, abrasions, wrinkling, and so on. Several surface-cleaning methods exist for both preservation and cleaning, but they have limits, the major one being that the original document may be altered in the process. The purpose of this project is a comparative study of traditional computer vision techniques versus deep learning networks for denoising dirty documents.

Check out the complete analysis published in Towards Data Science: https://towardsdatascience.com/denoising-noisy-documents-6807c34730c4

Autoencoder architecture

Autoencoder architecture

The network is composed of 5 convolutional layers that extract meaningful features from the images. Each convolution uses a bank of kernels (32 or 64, as listed in the model summary below). Each kernel has different weights, performs a different convolution on its input, and produces a different feature map, so the output of each convolution has as many channels as there are kernels.

The encoder uses max-pooling for compression. A sliding filter runs over the input image, to construct a smaller image where each pixel is the max of a region represented by the filter in the original image. The decoder uses up-sampling to restore the image to its original dimensions, by simply repeating the rows and columns of the layer input before feeding it to a convolutional layer.

Batch normalization reduces internal covariate shift, that is, the difference in the distribution of activations between layers, and allows each layer of the model to learn more independently of the other layers.

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
image_input (InputLayer)     (None, 420, 540, 1)       0         
_________________________________________________________________
Conv1 (Conv2D)               (None, 420, 540, 32)      320       
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 210, 270, 32)      0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 210, 270, 64)      18496     
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 105, 135, 64)      0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 105, 135, 64)      36928     
_________________________________________________________________
upsample1 (UpSampling2D)     (None, 210, 270, 64)      0         
_________________________________________________________________
Conv4 (Conv2D)               (None, 210, 270, 32)      18464     
_________________________________________________________________
upsample2 (UpSampling2D)     (None, 420, 540, 32)      0         
_________________________________________________________________
Conv5 (Conv2D)               (None, 420, 540, 1)       289       
=================================================================
Total params: 74,497
Trainable params: 74,497
Non-trainable params: 0
_________________________________________________________________
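The summary above lists only layer names, output shapes, and parameter counts, so the following is a minimal Keras (functional API) sketch that reproduces the same architecture. The 3x3 kernel size follows from the parameter counts and the sigmoid output is mentioned later in this README; the ReLU activations and the Adam/binary cross-entropy training setup are assumptions, not values taken from the repository.

# Minimal sketch of the autoencoder above (assumed hyperparameters noted inline).
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

image_input = Input(shape=(420, 540, 1), name='image_input')
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='Conv1')(image_input)
x = MaxPooling2D((2, 2), padding='same', name='pool1')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv2')(x)
x = MaxPooling2D((2, 2), padding='same', name='pool2')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='Conv3')(x)
x = UpSampling2D((2, 2), name='upsample1')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='Conv4')(x)
x = UpSampling2D((2, 2), name='upsample2')(x)
output = Conv2D(1, (3, 3), activation='sigmoid', padding='same', name='Conv5')(x)

model = Model(image_input, output)
model.compile(optimizer='adam', loss='binary_crossentropy')   # assumed training setup
model.summary()   # matches the table above: 74,497 parameters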

Regression

Along with the autoencoder, another machine learning technique we used is linear regression. Instead of modelling the entire image at once, we predict the cleaned-up intensity of each pixel and reconstruct a cleaned image by combining the predicted pixel intensities. Except at the extremes, there is a roughly linear relationship between the brightness of the dirty images and that of the cleaned images. There is a broad spread of dirty-pixel (x) values where the clean intensity (y) approaches 1; these pixels probably represent stains that need to be removed.

Regression-result
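A minimal sketch of that per-pixel regression is shown below, assuming each pixel's cleaned intensity is predicted from its dirty intensity alone (the actual notebook may also use neighbourhood features); the function and variable names are illustrative.

# Hypothetical per-pixel linear regression baseline.
import numpy as np
from sklearn.linear_model import LinearRegression

def denoise_with_regression(train_dirty, train_clean, test_dirty):
    """Inputs are lists of grayscale images scaled to [0, 1]."""
    # One training sample per pixel: dirty intensity -> clean intensity.
    X = np.concatenate([img.reshape(-1, 1) for img in train_dirty])
    y = np.concatenate([img.reshape(-1) for img in train_clean])
    reg = LinearRegression().fit(X, y)
    # Predict a cleaned intensity for every test pixel, then reshape back to images.
    return [np.clip(reg.predict(img.reshape(-1, 1)).reshape(img.shape), 0, 1)
            for img in test_dirty]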

AWS architecture

AWS Architecture

Analysis Approach

  1. Use a median filter to estimate the “background” of the image, with the text being the “foreground” (the noise occupies more area than the text within large neighbourhoods). Then subtract this “background” from the original image.
  2. Apply Canny edge detection to extract edges, then perform dilation (making text strokes thicker) followed by erosion, which preserves the thicker lines and removes the thinner, noisy edges.
  3. Use adaptive thresholding (this works well because the text is usually darker than the noise): preserve the pixels that are darkest “locally” as foreground and threshold the rest to the background. An illustrative OpenCV sketch of steps 1–3 is given after this list.
  4. CNN autoencoder: the network is composed of 5 convolutional layers that extract meaningful features from the images.
  • During convolutions, “same” padding is used: the input matrix is padded with zeros so that the image dimensions are preserved after each convolution.
  • The encoder uses max-pooling for compression: a sliding filter runs over the input, constructing a smaller image where each pixel is the max of the region covered by the filter in the original image.
  • The decoder uses up-sampling to restore the image to its original dimensions, simply repeating the rows and columns of the layer input before feeding it to a convolutional layer.
  • Batch normalization is performed as required. For the output, a sigmoid activation predicts pixel intensities between 0 and 1.
  5. Compare the results of {1, 2, 3, 4} using the following metrics: RMSE, PSNR, SSIM, UQI (a small evaluation helper is also sketched below).
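For concreteness, here is a hypothetical OpenCV sketch of steps 1–3; the file name, kernel sizes, Canny thresholds, and adaptive-threshold block size are illustrative assumptions rather than values taken from the repository.

# Hypothetical sketch of the classical steps (median filter, Canny + morphology,
# adaptive thresholding) using OpenCV; parameter values are assumptions.
import cv2
import numpy as np

img = cv2.imread('noisy_doc.png', cv2.IMREAD_GRAYSCALE)

# 1. Median-filter background estimation, then subtract it from the original.
background = cv2.medianBlur(img, 21)        # large kernel keeps stains, drops thin text
foreground = cv2.absdiff(img, background)   # text and edges remain after subtraction
median_result = 255 - foreground            # back to dark text on a white background

# 2. Canny edges, then dilation followed by erosion to thicken text strokes
#    and drop the thin, noisy edges.
edges = cv2.Canny(img, 50, 150)
kernel = np.ones((2, 2), np.uint8)
edges = cv2.erode(cv2.dilate(edges, kernel, iterations=1), kernel, iterations=1)

# 3. Adaptive thresholding: keep the locally darkest pixels as text and push
#    everything else to the white background.
adaptive_result = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                        cv2.THRESH_BINARY, 25, 15)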
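And a small evaluation helper for step 5, assuming grayscale float images scaled to [0, 1]; RMSE and a global (non-windowed) UQI are computed by hand, while PSNR and SSIM come from scikit-image.

# Hypothetical evaluation helper comparing a denoised image with its ground truth.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(clean, denoised):
    rmse = np.sqrt(np.mean((clean - denoised) ** 2))
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=1.0)
    ssim = structural_similarity(clean, denoised, data_range=1.0)
    # Global universal quality index (Wang & Bovik); the original formulation is windowed.
    mu_x, mu_y = clean.mean(), denoised.mean()
    var_x, var_y = clean.var(), denoised.var()
    cov_xy = ((clean - mu_x) * (denoised - mu_y)).mean()
    uqi = (4 * cov_xy * mu_x * mu_y) / ((var_x + var_y) * (mu_x ** 2 + mu_y ** 2))
    return {'RMSE': rmse, 'PSNR': psnr, 'SSIM': ssim, 'UQI': uqi}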

Results:

Median Filtering

[Result images: clean, dirty, extracted background, final]

Adaptive Thresholding

[Result images: clean, dirty, thresholded result]

Canny Edge Detection

[Result images: clean, dirty, after dilation, after erosion]

Autoencoder

[Result images: noisy input, trained clean output]

Linear Regression

[Result images: noisy input, trained clean output]

Screens

Login Screen

[Screenshots: login screen, autoencoder page, median filtering page, median filtering results]

Project Schema

Directory structure for Denoizer repository:

|-- dataset
	|-- test.zip → test image dataset
	|-- train.zip → training image dataset containing the noisy images
	|-- train_cleaned.zip → cleaned images for the respective noisy images in train.zip
|-- frontend
	|-- static → CSS, JS for the Flask web app
	|-- templates → HTML pages for the Flask web app
|-- reports → collection of reports submitted on this project
|-- results → resultant images for each technique
	|-- adaptive-results
	|-- autoencoder-results
	|-- edge-detection-results
	|-- median-results
	|-- regression-results
|-- screens → screenshots/snippets for each tab/page of the web app

Data:

Our data is collected from the UCI Machine Learning Repository [1]; a related Kaggle version is listed in [2]. The dataset comprises train and test images of noisy documents containing noise from various sources such as accidental spills, creases, ink spots, and so on.

[1] https://archive.ics.uci.edu/ml/datasets/NoisyOffice
[2] https://www.kaggle.com/sthabile/noisy-and-rotated-scanned-documents

Team

Contributors

gandalf1819, kart2k15
