Waifu2x

A re-implementation of the original waifu2x in PyTorch, with additional super-resolution models.

Dependencies

  • Python 3.x
  • PyTorch 0.4

Optional: Nvidia GPU. Model inference can run on CPU only.

Demos

Examples can be found in the "example" folder, but users may need to tweak a few lines to load their own images. This project is under development, so it might not be user-friendly.

Models

Models Comparison

I cannot visually distinguish the output of DCSCN from that of Upconv, which is the main model in waifu2x. Note that DCSCN runs 5x slower than Upconv.

2x upscale

Images are from Key: サマボケ(Summer Pocket).

models_comparison (Ensembling is NOT used.)

Memory usage

To save memory and reduce runtime, the image is cropped into overlapping 48x48 patches, which are processed and then merged back.
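The patch-and-merge idea can be sketched in plain Python. This is only an illustration, not the project's code: the function names, the overlap of 8 pixels, and the choice to let later patches overwrite the overlapping region are all assumptions, and the real implementation works on tensors rather than nested lists.

```python
def tile_starts(length, patch=48, overlap=8):
    """Start offsets of patches of size `patch` that cover [0, length)."""
    if length <= patch:
        return [0]
    stride = patch - overlap  # overlap=8 is an assumed value
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch sits flush with the border
    return starts


def tile_boxes(width, height, patch=48, overlap=8):
    """(left, top, right, bottom) boxes of overlapping patches."""
    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in tile_starts(height, patch, overlap)
            for x in tile_starts(width, patch, overlap)]


def process_in_patches(image, fn, patch=48, overlap=8, scale=1):
    """Apply `fn` to each patch and merge the results back.

    `image` is a list of rows; `fn` maps a patch to a patch that is
    `scale` times larger. Later patches overwrite the overlap region.
    """
    height, width = len(image), len(image[0])
    out = [[None] * (width * scale) for _ in range(height * scale)]
    for left, top, right, bottom in tile_boxes(width, height, patch, overlap):
        tile = [row[left:right] for row in image[top:bottom]]
        result = fn(tile)
        for dy, row in enumerate(result):
            x0 = left * scale
            out[top * scale + dy][x0:x0 + len(row)] = row
    return out
```

Only one patch at a time has to pass through the model, so peak memory depends on the patch size rather than on the full image size.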

Another Example

The image is downscaled 2x with Image.BICUBIC and then upscaled.

Scores

The list will be updated after I add more models.

Images are Twitter icons (PNG) from Key: サマボケ(Summer Pocket). They are cropped into non-overlapping 96x96 patches and downscaled by a factor of 2. The downscaled images are then re-encoded as JPEG with quality drawn from [75, 95]. Scores are PSNR, with MS-SSIM in parentheses.

| Model | Total Parameters | BICUBIC | Random* |
|---|---|---|---|
| DCSCN 12 | 1,889,974 | 31.5358 (0.9851) | 31.1457 (0.9834) |
| Upconv 7 | 552,480 | 31.4566 (0.9788) | 30.9492 (0.9772) |

*The downscaling method is selected uniformly at random from Image.BICUBIC, Image.BILINEAR, and Image.LANCZOS.
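For reference, PSNR can be computed as in the minimal sketch below. It assumes 8-bit pixel values (peak 255) passed as flat sequences. Whether the scores above were computed on RGB or on a luminance channel is not stated here, and MS-SSIM is not shown.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the upscaled patch is closer to the high-resolution original.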

DCSCN

Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network

DCSCN is very interesting because its forward computation is relatively quick, and both the shallow model (8 layers) and the deep model (12 layers) are quick to train. The settings differ from the paper:

  • I use exponential decay to decrease the number of feature filters in each layer. Here is the original filter decay method.

  • I also increase the reconstruction filters from 48 to 128.

  • All activations are replaced by SELU. Dropout and weight decay are not used either, because they significantly increase the training time.

  • The loss function is changed from MSE to L1. According to Loss Functions for Image Restoration with Neural Networks, L1 seems to be more robust and to converge faster than MSE. However, the authors find that the results from L1 and MSE are similar.
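One way to picture an exponentially decaying filter count is a geometric schedule from the first layer to the last. The sketch below is hypothetical: the function name, the last-layer value of 32, and the rounding are assumptions for illustration, and the exact formula used in this project is not given here.

```python
def exponential_filter_schedule(layers, first_filters, last_filters):
    """Filter count per layer, shrinking by a constant ratio (hypothetical sketch)."""
    if layers < 1:
        raise ValueError("layers must be at least 1")
    if layers == 1:
        return [first_filters]
    # Constant ratio so that layer 0 has first_filters and the last layer has last_filters.
    ratio = (last_filters / first_filters) ** (1.0 / (layers - 1))
    return [max(1, round(first_filters * ratio ** i)) for i in range(layers)]


# Example: 8 layers starting at 96 filters; the end value 32 is an assumed number.
schedule = exponential_filter_schedule(8, 96, 32)
```

Compared with a linear decrease, a geometric schedule drops quickly in the early layers and flattens out toward the end.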

I would like to thank jiny2001 (one of the paper's authors) for testing the difference between SELU and PReLU. SELU seems more stable and has fewer parameters to train. It is a good drop-in replacement.
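The "fewer parameters" point follows from the definitions: SELU uses two fixed constants, while PReLU carries a learnable negative slope. A scalar sketch of both, for illustration only (the models themselves would use the PyTorch modules):

```python
import math

# Fixed constants from the SELU definition; nothing here is trained.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit for a single value."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def prelu(x, slope):
    """Parametric ReLU; `slope` is a learnable parameter in a real network."""
    return x if x > 0 else slope * x
```

For large negative inputs SELU saturates near -SELU_SCALE * SELU_ALPHA, whereas PReLU keeps decreasing linearly with its learned slope.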

layers=8, filters=96, and dataset=yang91+bsd200. The details can be found here.

A pre-trained 12-layer model and its model parameters are available. Its run time is around 3-5 times that of waifu2x. The output quality is usually visually indistinguishable, but its PSNR and SSIM are a bit higher. However, this comparison is not fair, since the 12-layer model has 1,889,974 parameters, about 3.4 times as many as waifu2x's Upconv_7 model (552,480).

Waifu2x Original Models

The models can load waifu2x's pre-trained weights. The function forward_checkpoint sets nn.LeakyReLU to compute in place.

Upconv_7

The original waifu2x model. On CPU only, the PyTorch implementation takes around 5 times longer for large images. The output images have PSNR and SSIM scores very close to those of images generated by the Caffe version, though they are not identical.

Vgg_7

Not tested yet, but it is ready to use.

ESPCN_7

  • Needs more configuration and testing.

Modified from Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. Computations are done on the low-resolution images, and the last layer is a Pixel Shuffle that scales up the input image.
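To make the Pixel Shuffle step concrete, here is a plain-Python sketch of the rearrangement, using the same channel ordering as PyTorch's nn.PixelShuffle. It uses nested lists for readability and is not the code used in this project.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size (H*r) x (W*r).

    `channels` is a list of feature maps, each a list of rows.
    Output pixel (c, y, x) is read from input channel
    c*r*r + (y % r)*r + (x % r) at position (y // r, x // r).
    """
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = len(channels) // (r * r)
    h, w = len(channels[0]), len(channels[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_channels)]
    for c in range(out_channels):
        for y in range(h * r):
            for x in range(w * r):
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = channels[src][y // r][x // r]
    return out
```

Because every convolution before this step runs at low resolution, a 2x upscale only pays the high-resolution cost in this final rearrangement.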

A selection unit is added between the convolutional filters. Details on the selection unit can be found in A Deep Convolutional Neural Network with Selection Units for Super-Resolution, but the activation function is changed to SELU. It seems quite powerful.

ESPCN_7 Loss

Image Processing

High-resolution PNGs are cropped into non-overlapping 96x96 patches, so some parts of the images are dropped. A lesson I learned: DON'T save processed patches. I saved over 2x80 thousand small patches (both high and low resolution) and then could not open the folder.

High-resolution images are loaded into memory all at once and cut into patches. Low-resolution patches are also kept in memory. They are then passed to PyTorch's dataloader and fed into the neural net.
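The non-overlapping cropping can be sketched as a list of crop boxes. The function name is hypothetical; each box is in the (left, upper, right, lower) form that PIL's Image.crop accepts, and any remainder at the right and bottom edges is dropped, as described above.

```python
def crop_boxes(width, height, patch=96):
    """Non-overlapping patch boxes; edge remainders smaller than `patch` are dropped."""
    return [(x, y, x + patch, y + patch)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


# A 250x200 image yields a 2x2 grid; the last 58 columns and 8 rows are discarded.
boxes = crop_boxes(250, 200)
```

Since the boxes are computed on the fly, the patches can be cut in memory without writing thousands of small files to disk.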

Re-sampling methods are chosen uniformly among [PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS], so different patches from the same image might be downscaled in different ways.

Image noise comes from JPEG compression only. It is added by re-encoding PNG images as JPEG data with PIL at various quality settings. Noise level 1 means the quality is drawn uniformly from [75, 95]; level 2 means the quality is drawn uniformly from [50, 75].
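The per-patch random choices described above can be sketched with the standard library alone. The function name and the returned dictionary are hypothetical, integer quality values are an assumption, and the actual resizing and JPEG re-encoding with PIL are left out.

```python
import random

# Names of the PIL.Image resampling filters listed above.
RESAMPLING_METHODS = ("BILINEAR", "BICUBIC", "LANCZOS")
# Inclusive JPEG quality range for each noise level.
JPEG_QUALITY_RANGES = {1: (75, 95), 2: (50, 75)}


def sample_degradation(noise_level, rng=random):
    """Pick a resampling method and a JPEG quality for one patch."""
    low, high = JPEG_QUALITY_RANGES[noise_level]
    return {
        "resample": rng.choice(RESAMPLING_METHODS),  # uniform over the three filters
        "quality": rng.randint(low, high),           # uniform integer, both ends included
    }
```

Calling sample_degradation once per patch, rather than once per image, is what lets patches from the same image be degraded in different ways.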

TODO:

  • ESPCN for real time?
  • DRRN (planned) (Note: DRRN is not realistic for CPU-only usage. A modified version might be used.)
  • Find some interesting papers

waifu2x's People

Contributors

yu45020
