
wct2's Introduction

WCT2 (ICCV 2019 accepted)

Photorealistic Style Transfer via Wavelet Transforms | paper | supplementary materials | video stylization results

Jaejun Yoo*, Youngjung Uh*, Sanghyuk Chun*, Byeongkyu Kang, Jung-Woo Ha

Clova AI Research, NAVER (* equal contributions)

PyTorch implementation of photorealistic style transfer that does not need any further post-processing steps, e.g. from day to sunset or from summer to winter. This is the first end-to-end model that can stylize a 1024×1024 resolution image in 4.7 seconds, giving pleasing, photorealistic quality without any post-processing.

The code was written by Jaejun Yoo and Byeongkyu Kang.

Getting Started

Dependency

  • PyTorch >= 0.4.1
  • Check requirements.txt and install the dependencies:
pip install -r requirements.txt

Installation

  • Clone this repo:
git clone https://github.com/clovaai/WCT2.git
cd WCT2
  • Pretrained models can be found in the ./model_checkpoints folder
  • Prepare image dataset
    • Images can be found in the DPST repo
      • You can find all the content and style images (with paired segmentation label maps) at the following link: DPST images. The input folder has the content images, the style folder has the style images, and every segmentation map can be found in the segmentation folder.
    • To make a new dataset with label pairs, please follow the instructions in the PhotoWCT repo
    • Put the content and style images with their segmentation label pairs (if available) into the examples folder accordingly.
      • Currently there are several example images so that you can execute the code as soon as you clone this repo.
  • Finally, test the model:
python transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512 

The test results will be saved to ./outputs by default.

Arguments

  • --content: FOLDER-PATH-TO-CONTENT-IMAGES
  • --content_segment: FOLDER-PATH-TO-CONTENT-SEGMENT-LABEL-IMAGES
  • --style: FOLDER-PATH-TO-STYLE-IMAGES
  • --style_segment: FOLDER-PATH-TO-STYLE-SEGMENT-LABEL-IMAGES
  • --output: FOLDER-PATH-TO-OUTPUT-IMAGES
  • --image_size: output image size
  • --alpha: blending ratio between the content and stylized features
  • --option_unpool: selects between the two versions of our model (sum, cat5)
  • -e, --transfer_at_encoder: stylize at the encoder module
  • -d, --transfer_at_decoder: stylize at the decoder module
  • -s, --transfer_at_skip: stylize at the skipped high frequency components
  • -a, --transfer_all: stylize and save for every composition, i.e. the power set of {-e, -d, -s} (see the sketch after this list)
  • --cpu: run on CPU
  • --verbose
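
For reference, a minimal Python sketch (an illustration, not the repo's code) of the compositions --transfer_all enumerates, assuming it covers every non-empty subset of the three stylization points:

from itertools import chain, combinations

points = ('encoder', 'decoder', 'skip')  # the -e, -d, -s options
compositions = list(chain.from_iterable(
    combinations(points, r) for r in range(1, len(points) + 1)))
print(len(compositions))  # 7 non-empty subsets, one saved output each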

Photorealistic Style Transfer

  • DPST: "Deep Photo Style Transfer" | Paper | Code
  • PhotoWCT: "A Closed-form Solution to Photorealistic Image Stylization" | Paper | Code
  • PhotoWCT (full): PhotoWCT + post processing

Schematic illustration of our wavelet module

Component-wise Stylization

  • Only available for the option_unpool = sum version
  • Full stylization
python transfer.py --option_unpool sum -e -s --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512
  • Low-frequency-only stylization
python transfer.py --option_unpool sum -e --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

Results

  • option_unpool = cat5 version
python transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

Acknowledgement

  • Our implementation is heavily inspired by NVIDIA's PhotoWCT code.

Citation

If you find this work useful for your research, please cite:

@inproceedings{yoo2019photorealistic,
  title={Photorealistic Style Transfer via Wavelet Transforms},
  author={Yoo, Jaejun and Uh, Youngjung and Chun, Sanghyuk and Kang, Byeongkyu and Ha, Jung-Woo},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2019}
}

Contact

Feel free to contact me if you have any questions (Jaejun Yoo, [email protected]).

License

Copyright (c) 2019 NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


wct2's Issues

Without Seg

Hi,

Thank you for this nice work! Is there a way to run the code without providing the segmentation masks? Thanks!

Cheers,

Fixed VGG encoder weights query

In your paper you mention that 'We use the encoder-decoder architecture with fixed VGG encoder weights' and you only train the decoder on the COCO dataset. However, since you have changed the max pooling/unpooling layers of the original VGG-19 architecture to wavelet pooling/unpooling layers, don't you need to train the entire encoder-decoder VGG-19 architecture? Since these layers are changed, the VGG-19 weights will also be affected, so we cannot use the original ImageNet weights. Can you please provide some clarification on this?
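
For intuition, here is a minimal sketch of the training setup the paper describes, with stand-in modules rather than the repo's actual VGG-19 encoder/decoder: the encoder weights are frozen and only the decoder receives gradient updates.

import torch
from torch import nn

# stand-ins for the VGG-19-based encoder and decoder
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1))

for p in encoder.parameters():
    p.requires_grad_(False)  # encoder stays fixed (e.g. at ImageNet weights)

# only the decoder parameters are optimized (learning rate is hypothetical)
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)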

Trying to port WCT transfer to numpy

torch and numpy SVD results differ:

import numpy as np
import torch

# conv: a square feature-covariance tensor from the WCT code
u, e, v = torch.svd(conv, some=False)
ut, et, vt = [x.data.cpu().numpy() for x in (u, e, v)]
un, en, vn = np.linalg.svd(conv.data.cpu().numpy())
print("diff=", np.sum(np.abs(ut - un)))
print("diff=", np.sum(np.abs(et - en)))
print("diff=", np.sum(np.abs(vt - vn)))

result:

diff= 10380.845
diff= 0.007629454
diff= 12652.448

scipy.linalg.svd gives the same mismatched result.

Diff with TensorFlow:

import tensorflow as tf

tf_sess = tf.Session(config=tf.ConfigProto(device_count={'GPU': 0}))
tf_inp = tf.placeholder(tf.float32, shape=conv.shape)

tfe, tfu, tfv = tf.linalg.svd(tf_inp, full_matrices=True)
tfe, tfu, tfv = tf_sess.run([tfe, tfu, tfv], feed_dict={tf_inp: conv.data.cpu().numpy()})

print("diff=", np.sum(np.abs(ut - tfu)))
print("diff=", np.sum(np.abs(et - tfe)))
print("diff=", np.sum(np.abs(vt - tfv)))
diff= 10259.318
diff= 0.03329885
diff= 10259.33

So is your net trained with torch.svd, and is there no way to use your method in other frameworks?

I found a similar issue, pytorch/pytorch#16076, but it is closed.
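
For what it's worth, the gap above is largely a convention mismatch rather than a numerical error: np.linalg.svd returns V^T where torch.svd returns V, and singular vectors are only unique up to a per-column sign. A small self-contained sketch (using a random matrix, not the repo's tensors):

import numpy as np
import torch

a = torch.randn(8, 8, dtype=torch.float64)
u, s, v = torch.svd(a, some=False)
un, sn, vnh = np.linalg.svd(a.numpy())

# singular values agree directly; vectors agree up to transpose and sign
print(np.allclose(s.numpy(), sn))                     # True
print(np.allclose(np.abs(v.numpy()), np.abs(vnh.T)))  # True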

How to generate video stylisation results?

Is there a recommended way to perform video stylisation? For now, I'm converting the original video and the style video into frames, performing photorealistic style transfer, and then making a video out of the output frames. However, there seems to be a problem while encoding the frames to video, leading to blurry output. Any suggestions are welcome!

QS about the depth map of transfered images

Hi @jaejun-yoo, thank you for your great work!

I have some original images and their corresponding depth maps. After getting the style-transferred images, will the original depth maps still match them? In other words, does the style-transfer process induce a minor depth offset?

Big thanks!

How does it work for an indoor dataset?

I have an indoor dataset where the goal is to take the texture of one piece of furniture and overlay it on a similar piece in another room setup. When I run this code directly, the color gets transferred but not the exact texture. Is there any parameter I should tweak to get the desired output? Can you shed some light on this?

Question about the loss functions

Hi, thank you for your great work. I have a question about the loss function in the training phase.
In section 5.1, you mention:

minimizing the L2 reconstruction loss and the additional feature Gram matching loss with the encoder.

So you input a single image x (from MS-COCO) to the network and calculate the loss with the output image x' as

loss = reconstruction_loss + gram_matching_loss
with:
+ reconstruction_loss = L2(x - x')
+ gram_matching_loss = gram_matrix(encoder(x)) - gram_matrix(encoder(x'))

Am I missing something? Thank you.
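
For illustration, a minimal PyTorch sketch of the objective as described above (an interpretation of the paper's wording, not the authors' released training code; encoder is assumed to return a list of intermediate feature maps, and gram_weight is a hypothetical balancing factor):

import torch.nn.functional as F

def gram_matrix(feat):
    # (N, C, H, W) -> (N, C, C), normalized by the number of entries
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def training_loss(x, x_rec, encoder, gram_weight=1.0):
    recon_loss = F.mse_loss(x_rec, x)  # L2(x - x')
    gram_loss = sum(F.mse_loss(gram_matrix(fr), gram_matrix(fx))
                    for fr, fx in zip(encoder(x_rec), encoder(x)))
    return recon_loss + gram_weight * gram_loss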

Can I train with my data?

Hello. Thank you for sharing your good research.
Can I train the model with my own data? If so, please explain the command and method.
Thank you in advance.

How to add temporal consistency

Hi, I want to transfer color for video data. I was wondering how to add a temporal consistency constraint, such as optical flow, to the model.

Details about training decoder

Did you train the decoder with the cat5 setting, using only the COCO dataset? And did you calculate the feature Gram loss between reconstructed images and input images?

I'm surprised that the training process involves no style images, only COCO images as content, and that the Gram loss is not between stylized images and style images. Still, the network works well for style transfer.

Error with installing requirements

On Google Colab, it fails to install when I run pip3 install -r requirements.txt:

Could not find a version that satisfies the requirement torch==0.4.1.post2 (from -r requirements.txt (line 1)) 
(from versions: 0.1.2, 0.1.2.post1, 0.3.1, 0.4.0, 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2)
No matching distribution found for torch==0.4.1.post2 (from -r requirements.txt (line 1))
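
A likely workaround (a guess based on the versions listed in the error, not an official fix) is to relax the pin in requirements.txt to a release that the index actually hosts, e.g.:

pip3 install torch==0.4.1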

RuntimeError: got 4 channels instead

Hi!

I followed your instructions step by step, and after running

!python3 transfer.py --option_unpool cat5 -a --content ./examples/content --style ./examples/style --content_segment ./examples/content_segment --style_segment ./examples/style_segment/ --output ./outputs/ --verbose --image_size 512

I got

Namespace(alpha=1, content='./examples/content', content_segment='./examples/content_segment', cpu=False, image_size=512, option_unpool='cat5', output='./outputs/', style='./examples/style', style_segment='./examples/style_segment/', transfer_all=True, transfer_at_decoder=False, transfer_at_encoder=False, transfer_at_skip=False, verbose=True)
0% 0/1 [00:00<?, ?it/s]------ transfer: 1.png
Elapsed time in whole WCT: 0:00:02.926612

Traceback (most recent call last):
  File "transfer.py", line 205, in <module>
    run_bulk(config)
  File "transfer.py", line 175, in run_bulk
    img = wct2.transfer(content, style, content_segment, style_segment, alpha=config.alpha)
  File "transfer.py", line 79, in transfer
    style_feats, style_skips = self.get_all_feature(style)
  File "transfer.py", line 64, in get_all_feature
    x = self.encode(x, skips, level)
  File "transfer.py", line 55, in encode
    return self.encoder.encode(x, skips, level)
  File "/content/drive/WCT2/model.py", line 163, in encode
    out = self.conv0(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 3 3 1 1, expected input[1, 4, 512, 512] to have 3 channels, but got 4 channels instead

Could you help me please? Thank you!

Sincerely,

Amber
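
The error says the first conv layer received a 4-channel input, i.e. a PNG with an alpha channel. One workaround (a sketch using the example paths, not an official fix) is to flatten the content and style images to 3-channel RGB before running transfer.py:

import glob
from PIL import Image

# convert RGBA (or paletted) content/style images to plain RGB in place;
# segmentation label maps are deliberately left untouched
for folder in ('./examples/content', './examples/style'):
    for path in glob.glob(folder + '/*.png'):
        img = Image.open(path)
        if img.mode != 'RGB':
            img.convert('RGB').save(path)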
@jaejun-yoo

Could you please explain the idea of passing the information captured by the low-frequency filters of wavelet pooling to the next encoder layer, while the high-frequency components are skip-connected directly to the decoder module?

In section 3 of your paper, where you discuss the model architecture, there is this paragraph -

"The max-pooling layers are replaced with wavelet pooling where high frequency components (LH,HL, HH) are skipped to the decoder directly. Thus, only the low frequency component (LL) is passed to the next encoding layer"

Is there a particular reason for this?
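
For intuition, here is a minimal PyTorch sketch of Haar wavelet pooling (an illustration of the idea, not the repo's model.py implementation). The stride-2 depthwise convolutions split a feature map into LL/LH/HL/HH sub-bands; a WCT2-style encoder passes only LL onward while LH/HL/HH are handed to the decoder, so pooling plus unpooling can remain exactly invertible:

import torch
import torch.nn.functional as F

# 1-D orthonormal Haar filters
LOW = torch.tensor([1.0, 1.0]) / 2 ** 0.5
HIGH = torch.tensor([-1.0, 1.0]) / 2 ** 0.5

def subband(x, row, col):
    # depthwise stride-2 conv with the 2x2 outer-product kernel
    c = x.shape[1]
    kernel = torch.outer(row, col).repeat(c, 1, 1, 1)  # (C, 1, 2, 2)
    return F.conv2d(x, kernel, stride=2, groups=c)

def haar_pool(x):
    # split (N, C, H, W) into four (N, C, H/2, W/2) sub-bands
    return (subband(x, LOW, LOW),    # LL: passed to the next encoder layer
            subband(x, LOW, HIGH),   # LH: skipped to the decoder
            subband(x, HIGH, LOW),   # HL: skipped to the decoder
            subband(x, HIGH, HIGH))  # HH: skipped to the decoder

Because the four filters form an orthonormal basis, no information is discarded at pooling time, which is what allows the decoder to reconstruct sharp, artifact-free images.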

Qs about the Dataset

Hello, I'm very glad that you've published your source code and training data, but I can't find the video test data. Could you publish the video data, or tell me where to find it?
Thank you very much!

LL components vs. avg. pooling

Hi,

In the original paper, equivalence between average pooling and the LL component of the decomposition is claimed. However, in your code the L component evaluates to [[0.5, 0.5], [0.5, 0.5]], which contradicts this statement. Am I missing something?
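
For concreteness, a small numpy sketch of the standard orthonormal Haar construction that yields the [[0.5, 0.5], [0.5, 0.5]] kernel in question; note that it differs from a 2x2 average-pooling kernel ([[0.25, 0.25], [0.25, 0.25]]) only by a constant factor of 2:

import numpy as np

low = np.ones((1, 2)) / np.sqrt(2)  # 1-D orthonormal low-pass Haar filter
ll = low.T @ low                    # [[0.5, 0.5], [0.5, 0.5]]
avg = np.full((2, 2), 0.25)         # 2x2 average-pooling kernel
print(ll / avg)                     # constant 2.0 everywhere

So the claimed equivalence presumably holds up to scale: the LL output is average pooling multiplied by 2.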

Some questions about the network

Thank you for your good work! I noticed that the PhotoWCT network does not use the raw picture and yields some error points. What would happen if we added some raw information, as in a U-Net structure? Is it the 'U-Net' idea that succeeds, or the Haar wavelets that do the work?

Query regarding "conv0" layer

Could you please explain the reason for using the "conv0" layer, as it is not part of the original VGG-19? Also, during training, are the weights of this particular layer trainable or fixed?

Segmentation maps precision

I'm trying to use the net to transfer the style of one mountain landscape photo to another.
I tried building a coarse segmentation map for the two photos, but the result is off (see attachment).
I would like to know if there is a way to make the net use the segmentation map as a "hint" rather than a strict constraint (from the attachment you can clearly tell where the bounds of the segmentation map are).

Thank you.


In "cat5" skip network, only the last layer matters

Thank you for your great work. In the skip decoder network, I try to test just one layer with wct and find only the last layer matters. The other layers seem to contribute little to the final result. It seems the skip structure just bypass the normal flow, am I right?
