
wct2's Introduction

WCT2 (ICCV 2019 accepted)

Photorealistic Style Transfer via Wavelet Transforms | paper | supplementary materials | video stylization results

Jaejun Yoo*, Youngjung Uh*, Sanghyuk Chun*, Byeongkyu Kang, Jung-Woo Ha

Clova AI Research, NAVER (* equal contributions)

PyTorch implementation of photorealistic style transfer that does not need any further post-processing steps, e.g. from day to sunset or from summer to winter. This is the first end-to-end model that can stylize a 1024×1024 resolution image in 4.7 seconds, giving pleasing, photorealistic quality without any post-processing.

The code was written by Jaejun Yoo and Byeongkyu Kang.

Getting Started

Dependency

  • PyTorch >= 0.4.1
  • Check requirements.txt and install the dependencies:
pip install -r requirements.txt

Installation

  • Clone this repo:
git clone https://github.com/clovaai/WCT2.git
cd WCT2
  • Pretrained models can be found in the ./model_checkpoints folder
  • Prepare image dataset
    • Images can be found in the DPST repo
      • You can find all the content and style images (with paired segmentation label maps) at the following link: DPST images. The input folder has the content images, the style folder has the style images, and every segmentation map can be found in the segmentation folder.
    • To make a new dataset with label pairs, please follow the instructions in the PhotoWCT repo
    • Put the content and style images with their segmentation label pairs (if available) into the examples folder accordingly.
      • Currently there are several example images so that you can execute the code as soon as you clone this repo.
  • Finally, test the model:
python transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512 

The test results will be saved to ./outputs by default.

Arguments

  • --content: FOLDER-PATH-TO-CONTENT-IMAGES
  • --content_segment: FOLDER-PATH-TO-CONTENT-SEGMENT-LABEL-IMAGES
  • --style: FOLDER-PATH-TO-STYLE-IMAGES
  • --style_segment: FOLDER-PATH-TO-STYLE-SEGMENT-LABEL-IMAGES
  • --output: FOLDER-PATH-TO-OUTPUT-IMAGES
  • --image_size: output image size
  • --alpha: blending ratio between the content and stylized features
  • --option_unpool: selects between the two versions of our model (sum, cat5)
  • -e, --transfer_at_encoder: stylize at the encoder module
  • -d, --transfer_at_decoder: stylize at the decoder module
  • -s, --transfer_at_skip: stylize at the skipped high frequency components
  • -a, --transfer_all: stylize and save for every composition, i.e. the power set of {-e, -d, -s} (see the sketch after this list)
  • --cpu: run on CPU
  • --verbose
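
For reference, a minimal Python sketch (an illustration, not the repo's code) of the compositions --transfer_all enumerates, assuming it covers every non-empty subset of the three stylization points:

from itertools import chain, combinations

points = ('encoder', 'decoder', 'skip')  # the -e, -d, -s options
compositions = list(chain.from_iterable(
    combinations(points, r) for r in range(1, len(points) + 1)))
print(len(compositions))  # 7 non-empty subsets, one saved output each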

Photorealistic Style Transfer

  • DPST: "Deep Photo Style Transfer" | Paper | Code
  • PhotoWCT: "A Closed-form Solution to Photorealistic Image Stylization" | Paper | Code
  • PhotoWCT (full): PhotoWCT + post processing

Schematic illustration of our wavelet module

Component-wise Stylization

  • Only available for the option_unpool = sum version
  • Full stylization
python transfer.py --option_unpool sum -e -s --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512
  • Low-frequency-only stylization
python transfer.py --option_unpool sum -e --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

Results

  • option_unpool = cat5 version
python transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

Acknowledgement

  • Our implementation is heavily inspired by NVIDIA's PhotoWCT code.

Citation

If you find this work useful for your research, please cite:

@inproceedings{yoo2019photorealistic,
  title={Photorealistic Style Transfer via Wavelet Transforms},
  author={Yoo, Jaejun and Uh, Youngjung and Chun, Sanghyuk and Kang, Byeongkyu and Ha, Jung-Woo},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2019}
}

Contact

Feel free to contact me if you have any questions (Jaejun Yoo, [email protected]).

License

Copyright (c) 2019 NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


wct2's Issues

Without Seg

Hi,

Thank you for this nice work! Is there a way to run the code without providing the segmentation masks? Thanks!

Cheers,

Fixed VGG encoder weights query

In your paper you mention that 'We use the encoder-decoder architecture with fixed VGG encoder weights' and you only train the decoder on the COCO dataset. However, since you have changed the max pooling/unpooling layers of the original VGG-19 architecture to wavelet pooling/unpooling layers, don't you need to train the entire encoder-decoder VGG-19 architecture? Since these layers are changed, the VGG-19 weights will also be affected, so we cannot use the original ImageNet weights. Can you please provide some clarification on this?
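
For intuition, here is a minimal sketch of the training setup the paper describes, with stand-in modules rather than the repo's actual VGG-19 encoder/decoder: the encoder weights are frozen and only the decoder receives gradient updates.

import torch
from torch import nn

# stand-ins for the VGG-19-based encoder and decoder
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1))

for p in encoder.parameters():
    p.requires_grad_(False)  # encoder stays fixed (e.g. at ImageNet weights)

# only the decoder parameters are optimized (learning rate is hypothetical)
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)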

Trying to port WCT transfer to numpy

torch and numpy SVD results differ:

import numpy as np
import torch

# conv: a square feature-covariance tensor from the WCT code
u, e, v = torch.svd(conv, some=False)
ut, et, vt = [x.data.cpu().numpy() for x in (u, e, v)]
un, en, vn = np.linalg.svd(conv.data.cpu().numpy())
print("diff=", np.sum(np.abs(ut - un)))
print("diff=", np.sum(np.abs(et - en)))
print("diff=", np.sum(np.abs(vt - vn)))

result:

diff= 10380.845
diff= 0.007629454
diff= 12652.448

scipy.linalg.svd gives the same mismatched result.

Diff with TensorFlow:

import tensorflow as tf

tf_sess = tf.Session(config=tf.ConfigProto(device_count={'GPU': 0}))
tf_inp = tf.placeholder(tf.float32, shape=conv.shape)

tfe, tfu, tfv = tf.linalg.svd(tf_inp, full_matrices=True)
tfe, tfu, tfv = tf_sess.run([tfe, tfu, tfv], feed_dict={tf_inp: conv.data.cpu().numpy()})

print("diff=", np.sum(np.abs(ut - tfu)))
print("diff=", np.sum(np.abs(et - tfe)))
print("diff=", np.sum(np.abs(vt - tfv)))
diff= 10259.318
diff= 0.03329885
diff= 10259.33

So is your net trained with torch.svd, and is there no way to use your method in other frameworks?

I found a similar issue, pytorch/pytorch#16076, but it is closed.
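
For what it's worth, the gap above is largely a convention mismatch rather than a numerical error: np.linalg.svd returns V^T where torch.svd returns V, and singular vectors are only unique up to a per-column sign. A small self-contained sketch (using a random matrix, not the repo's tensors):

import numpy as np
import torch

a = torch.randn(8, 8, dtype=torch.float64)
u, s, v = torch.svd(a, some=False)
un, sn, vnh = np.linalg.svd(a.numpy())

# singular values agree directly; vectors agree up to transpose and sign
print(np.allclose(s.numpy(), sn))                     # True
print(np.allclose(np.abs(v.numpy()), np.abs(vnh.T)))  # True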

How to generate video stylisation results?

Is there a recommended way to perform video stylisation? For now, I'm converting the original video and the style video into frames, performing photorealistic style transfer, and then making a video out of the output frames. However, there seems to be a problem while encoding the frames to video, leading to blurry output. Any suggestions are welcome!

QS about the depth map of transfered images

Hi @jaejun-yoo, thank you for your great work!

I have some original images and their corresponding depth maps. After getting the style-transferred images, will the original depth maps still match them? In other words, does the style-transfer process induce a minor depth offset?

Big thanks!

How does it work for an indoor dataset?

I have an indoor dataset where the goal is to take the texture of one piece of furniture and overlay it on a similar piece in another room setup. When I run this code directly, the color gets transferred but not the exact texture. Is there any parameter I should tweak to get the desired output? Can you shed some light on this?

Question about the loss functions

Hi, thank you for your great work. I have a question about the loss function in the training phase.
In section 5.1, you mention:

minimizing the L2 reconstruction loss and the additional feature Gram matching loss with the encoder.

So you input a single image x (from MS-COCO) to the network and calculate the loss with the output image x' as

loss = reconstruction_loss + gram_matching_loss
with:
+ reconstruction_loss = L2(x - x')
+ gram_matching_loss = gram_matrix(encoder(x)) - gram_matrix(encoder(x'))

Am I missing something? Thank you.
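
For illustration, a minimal PyTorch sketch of the objective as described above (an interpretation of the paper's wording, not the authors' released training code; encoder is assumed to return a list of intermediate feature maps, and gram_weight is a hypothetical balancing factor):

import torch.nn.functional as F

def gram_matrix(feat):
    # (N, C, H, W) -> (N, C, C), normalized by the number of entries
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def training_loss(x, x_rec, encoder, gram_weight=1.0):
    recon_loss = F.mse_loss(x_rec, x)  # L2(x - x')
    gram_loss = sum(F.mse_loss(gram_matrix(fr), gram_matrix(fx))
                    for fr, fx in zip(encoder(x_rec), encoder(x)))
    return recon_loss + gram_weight * gram_loss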

Can I train with my data?

Hello. Thank you for sharing your good research.
Can I train the model with my own data? If so, please explain the command and method.
Thank you in advance.

How to add temporal consistency

Hi, I want to transfer color for video data. I was wondering how to add a temporal consistency constraint, such as optical flow, to the model.

Details about training decoder

Did you train the decoder with the cat5 setting, using only the COCO dataset? And did you calculate the feature Gram loss between reconstructed images and input images?

I'm surprised that the training process involves no style images, only COCO images as content, and that the Gram loss is not between stylized images and style images. Still, the network works well for style transfer.

Error with installing requirements

On Google Colab, it fails to install when I run pip3 install -r requirements.txt:

Could not find a version that satisfies the requirement torch==0.4.1.post2 (from -r requirements.txt (line 1)) 
(from versions: 0.1.2, 0.1.2.post1, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2)
No matching distribution found for torch==0.4.1.post2 (from -r requirements.txt (line 1))
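
A likely workaround (a guess based on the versions listed in the error, not an official fix) is to relax the pin in requirements.txt to a release that the index actually hosts, e.g.:

pip3 install torch==0.4.1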

RuntimeError: got 4 channels instead

Hi!

I followed your instructions step by step, and after running

!python3 transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

I got

Namespace(alpha=1, content='./examples/content', content_segment='./examples/content_segment', cpu=False, image_size=512, option_unpool='cat5', output='./outputs/', style='./examples/style', style_segment='./examples/style_segment/', transfer_all=True, transfer_at_decoder=False, transfer_at_encoder=False, transfer_at_skip=False, verbose=True)
0% 0/1 [00:00<?, ?it/s]------ transfer: 1.png
Elapsed time in whole WCT: 0:00:02.926612

Traceback (most recent call last):
  File "transfer.py", line 205, in <module>
    run_bulk(config)
  File "transfer.py", line 175, in run_bulk
    img = wct2.transfer(content, style, content_segment, style_segment, alpha=config.alpha)
  File "transfer.py", line 79, in transfer
    style_feats, style_skips = self.get_all_feature(style)
  File "transfer.py", line 64, in get_all_feature
    x = self.encode(x, skips, level)
  File "transfer.py", line 55, in encode
    return self.encoder.encode(x, skips, level)
  File "/content/drive/WCT2/model.py", line 163, in encode
    out = self.conv0(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 3 3 1 1, expected input[1, 4, 512, 512] to have 3 channels, but got 4 channels instead

Could you help me please? Thank you!

Sincerely,

Amber
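
The error says the first conv layer received a 4-channel input, i.e. a PNG with an alpha channel. One workaround (a sketch using the example paths, not an official fix) is to flatten the content and style images to 3-channel RGB before running transfer.py:

import glob
from PIL import Image

# convert RGBA (or paletted) content/style images to plain RGB in place;
# segmentation label maps are deliberately left untouched
for folder in ('./examples/content', './examples/style'):
    for path in glob.glob(folder + '/*.png'):
        img = Image.open(path)
        if img.mode != 'RGB':
            img.convert('RGB').save(path)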
@jaejun-yoo

Could you please explain the idea of passing the information captured by the low-frequency filters of wavelet pooling to the next encoder layer, while the high-frequency components are skip-connected directly to the decoder module?

In section 3 of your paper, where you discuss the model architecture, there is this paragraph -

"The max-pooling layers are replaced with wavelet pooling where high frequency components (LH,HL, HH) are skipped to the decoder directly. Thus, only the low frequency component (LL) is passed to the next encoding layer"

Is there a particular reason for this?
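
For intuition, here is a minimal PyTorch sketch of Haar wavelet pooling (an illustration of the idea, not the repo's model.py implementation). The stride-2 depthwise convolutions split a feature map into LL/LH/HL/HH sub-bands; a WCT2-style encoder passes only LL onward while LH/HL/HH are handed to the decoder, so pooling plus unpooling can remain exactly invertible:

import torch
import torch.nn.functional as F

# 1-D orthonormal Haar filters
LOW = torch.tensor([1.0, 1.0]) / 2 ** 0.5
HIGH = torch.tensor([-1.0, 1.0]) / 2 ** 0.5

def subband(x, row, col):
    # depthwise stride-2 conv with the 2x2 outer-product kernel
    c = x.shape[1]
    kernel = torch.outer(row, col).repeat(c, 1, 1, 1)  # (C, 1, 2, 2)
    return F.conv2d(x, kernel, stride=2, groups=c)

def haar_pool(x):
    # split (N, C, H, W) into four (N, C, H/2, W/2) sub-bands
    return (subband(x, LOW, LOW),    # LL: passed to the next encoder layer
            subband(x, LOW, HIGH),   # LH: skipped to the decoder
            subband(x, HIGH, LOW),   # HL: skipped to the decoder
            subband(x, HIGH, HIGH))  # HH: skipped to the decoder

Because the four filters form an orthonormal basis, no information is discarded at pooling time, which is what allows the decoder to reconstruct sharp, artifact-free images.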

Qs about the Dataset

Hello, I'm very glad that you've published your source code and training data, but I can't find the video test data. Could you publish the video data, or tell me where to find it?
Thank you very much!

LL components vs. avg. pooling

Hi,

In the original paper, equivalence between average pooling and the LL component of the decomposition is claimed. However, in your code the L component evaluates to [[0.5, 0.5], [0.5, 0.5]], which contradicts this statement. Am I missing something?
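
For concreteness, a small numpy sketch of the standard orthonormal Haar construction that yields the [[0.5, 0.5], [0.5, 0.5]] kernel in question; note that it differs from a 2x2 average-pooling kernel ([[0.25, 0.25], [0.25, 0.25]]) only by a constant factor of 2:

import numpy as np

low = np.ones((1, 2)) / np.sqrt(2)  # 1-D orthonormal low-pass Haar filter
ll = low.T @ low                    # [[0.5, 0.5], [0.5, 0.5]]
avg = np.full((2, 2), 0.25)         # 2x2 average-pooling kernel
print(ll / avg)                     # constant 2.0 everywhere

So the claimed equivalence presumably holds up to scale: the LL output is average pooling multiplied by 2.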

Some questions about the network

Thank you for your good work! I noticed that the PhotoWCT network does not use the raw picture and yields some error points. What would happen if we added some raw information, as in a U-Net structure? Is it the 'U-Net' idea that succeeds, or the Haar wavelets that do the work?

Query regarding "conv0" layer

Could you please explain the reason for using the "conv0" layer, as it is not part of the original VGG-19? Also, during training, are the weights of this particular layer trainable or fixed?

Segmentation maps precision

I'm trying to use the net to transfer the style of one mountain landscape photo to another.
I tried building a coarse segmentation map for the two photos, but the result is off (see attachment).
I would like to know if there is a way to make the net use the segmentation map as a "hint" rather than a strict constraint (from the attachment you can clearly tell where the bounds of the segmentation map are).

Thank you.


In "cat5" skip network, only the last layer matters

Thank you for your great work. In the skip decoder network, I try to test just one layer with wct and find only the last layer matters. The other layers seem to contribute little to the final result. It seems the skip structure just bypass the normal flow, am I right?
