cdtnet-high-resolution-image-harmonization's Introduction

CDTNet-High-Resolution-Image-Harmonization

This is the official repository for the following paper:

High-Resolution Image Harmonization via Collaborative Dual Transformations [arXiv]

Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang
Accepted by CVPR 2022.

Our CDTNet(sim) has been integrated into our image composition toolbox libcom https://github.com/bcmi/libcom. You are welcome to visit and try it \(^▽^)/

This is the first paper focusing on high-resolution image harmonization. We divide image harmonization methods into pixel-to-pixel transformation and color-to-color transformation, and propose CDTNet to combine the two coherently in an end-to-end framework. As shown in the figure below, our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module that takes advantage of both. For efficiency, you can use CDTNet(sim), which contains only the color-to-color transformation.

Note that CDTNet(sim) only supports global color transformation. To achieve local (spatially-variant) color transformation, you can refer to more recent works like PCTNet.
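
For intuition, the sketch below shows how the three components described above could be wired together at inference time. It is a minimal, hypothetical PyTorch-style sketch: the function name cdtnet_forward, the injected modules (lowres_generator, color_mapping, refinement), and their interfaces are illustrative assumptions, not this repository's actual API.

# Hypothetical sketch of CDTNet's inference flow; names and signatures are
# illustrative assumptions, not this repository's actual API.
import torch
import torch.nn.functional as F

def cdtnet_forward(composite_hr, mask_hr, lowres_generator, color_mapping, refinement, lr_size=256):
    """composite_hr: (1, 3, H, W) high-resolution composite; mask_hr: (1, 1, H, W) foreground mask."""
    # 1) Pixel-to-pixel transformation runs on a downsampled copy of the input
    #    (e.g., an iSSAM-style encoder-decoder).
    composite_lr = F.interpolate(composite_hr, size=(lr_size, lr_size), mode='bilinear', align_corners=False)
    mask_lr = F.interpolate(mask_hr, size=(lr_size, lr_size), mode='bilinear', align_corners=False)
    harmonized_lr = lowres_generator(composite_lr, mask_lr)

    # 2) Color-to-color (RGB-to-RGB) transformation is applied to the full-resolution
    #    input, conditioned on the low-resolution result.
    harmonized_rgb = color_mapping(composite_hr, harmonized_lr)

    # 3) The refinement module fuses the upsampled pixel-to-pixel result with the
    #    color-mapped result to produce the final high-resolution output.
    harmonized_lr_up = F.interpolate(harmonized_lr, size=composite_hr.shape[-2:], mode='bilinear', align_corners=False)
    return refinement(torch.cat([composite_hr, mask_hr, harmonized_lr_up, harmonized_rgb], dim=1))

In the CDTNet(sim) configuration, only the color-mapping branch would be used, which is why it supports only global color transformation.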

Getting Started

Prerequisites

Please refer to iSSAM and 3D LUT for guidance on setting up the environment.
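
Both referenced projects assume a PyTorch environment with CUDA. Once the environment is set up, a quick sanity check along the following lines may help; this is a generic check, not part of this repository:

# Generic environment sanity check; assumes PyTorch has been installed per the guides above.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))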

Installation

  • Clone this repo:
git clone https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization
cd ./CDTNet-High-Resolution-Image-Harmonization

Training

If you want to train CDTNet-256 on the 1024×1024 HAdobe5k training set with 4 LUTs and the pre-trained pixel-to-pixel transformation model "issam256.pth", you can run this command:

python3 train.py models/CDTNet.py --gpus=0 --workers=10 --exp_name=CDTNet_1024 --datasets HAdobe5k --batch_size=4 --hr_w 1024 --hr_h 1024 --lr 256 --weights ./issam256.pth --n_lut 4

We have also provided some example commands in "train.sh" for your convenience.

Testing

If you want to test CDTNet-512 on the 2048×2048 HAdobe5k test set with the "HAdobe5k_2048.pth" checkpoint and save the results in "CDTNet_2048_result", you can run this command:

python3 evaluate_model.py CDTNet ./HAdobe5k_2048.pth --gpu 0 --datasets HAdobe5k --hr_w 2048 --hr_h 2048 --lr 512 --save_dir ./CDTNet_2048_result

We have also provided some example commands in "test.sh" for your convenience.

Prediction

If you want to make predictions on your own data, where the composite images are in ./predict_images and the masks are in ./predict_masks, using CDTNet-512 at 2048×2048 resolution with the "HAdobe5k_2048.pth" checkpoint, and save the results in "CDTNet_2048_generate", you can run this command:

python3 scripts/predict_for_dir.py CDTNet ./HAdobe5k_2048.pth --images ./predict_images --masks ./predict_masks --gpu 0 --hr_h 2048 --hr_w 2048 --lr 512 --results-path ./CDTNet_2048_generate

We have also provided some example commands in "predict.sh" for your convenience.
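
Before running the prediction script, it can be useful to check that every composite image has a matching mask. The sketch below assumes that a mask shares its file name stem with its composite image; that naming convention is a guess for illustration, not something documented by this repository.

# Hypothetical pre-flight check for prediction inputs; the assumption that each mask
# shares its file name stem with its composite image is illustrative only.
from pathlib import Path

images_dir = Path("./predict_images")
masks_dir = Path("./predict_masks")

mask_stems = {p.stem for p in masks_dir.iterdir() if p.is_file()}
for image_path in sorted(images_dir.iterdir()):
    if image_path.is_file() and image_path.stem not in mask_stems:
        print(f"Missing mask for {image_path.name}")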

Datasets

1. HAdobe5k

HAdobe5k is one of the four synthesized sub-datasets in the iHarmony4 dataset, which is the benchmark dataset for image harmonization. Specifically, HAdobe5k is generated based on the MIT-Adobe FiveK dataset and contains 21597 image triplets (composite image, real image, mask) as shown below, of which 19437 triplets are used for training and 2160 triplets for testing. The official training/test split can be found on Baidu Cloud (Alternative_address).

MIT-Adobe FiveK provides 6 retouched versions of each image, so we manually segment the foreground region and exchange foregrounds between 2 versions to generate composite images. The high-resolution images in the HAdobe5k sub-dataset have random resolutions ranging from 1296 to 6048, and can be downloaded from Baidu Cloud (Alternative_address).

2. 100 High-Resolution Real Composite Images

Considering that the composite images in HAdobe5k are synthetic composites, we additionally provide 100 high-resolution real composite images, given as image pairs (composite image, mask), for qualitative comparison in real scenarios. They are generated based on the Open Image Dataset V6 and Flickr.

Open Image Dataset V6 contains about 9M images with 28M instance segmentation annotations covering 350 categories, and a large number of its images are collected from Flickr at high resolution. The foreground images are therefore collected from the whole Open Image Dataset V6, where the provided instance segmentations are used to crop the foregrounds. The background images are collected from both Open Image Dataset V6 and Flickr, considering both resolution and semantics. The cropped foregrounds and background images are then combined using Photoshop, leading to obviously inharmonious composite images.
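
The real composites above were assembled manually in Photoshop, but conceptually the cut-and-paste step amounts to blending a foreground into a background with its mask. A minimal sketch is shown below; the file names are placeholders and the images are assumed to be pre-aligned.

# Minimal cut-and-paste compositing sketch; file names are placeholders, and the
# actual dataset was assembled manually in Photoshop.
import numpy as np
from PIL import Image

foreground = np.asarray(Image.open("foreground.png").convert("RGB"), dtype=np.float32)
background = np.asarray(Image.open("background.png").convert("RGB"), dtype=np.float32)
mask = np.asarray(Image.open("mask.png").convert("L"), dtype=np.float32)[..., None] / 255.0

# Assumes the foreground, background, and mask have already been aligned to the same size.
composite = mask * foreground + (1.0 - mask) * background
Image.fromarray(composite.astype(np.uint8)).save("composite.png")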

The 100 high-resolution real composite images have random resolutions ranging from 1024 to 6016, and can be downloaded from Baidu Cloud (access code: vnrp) (Alternative_address).

Results

1. High-resolution (1024×1024 and 2048×2048) results on HAdobe5k test set

We test our CDTNet on 1024×1024 and 2048×2048 images from the HAdobe5k dataset and report the harmonization performance in terms of MSE, PSNR, fMSE, and SSIM. Here we also release all harmonized results at both resolutions. Due to JPEG compression, the performance measured on our released results is, unsurprisingly, slightly worse than the reported performance.

Image Size | Model | MSE | PSNR | fMSE | SSIM | Test Images Download
1024×1024 | CDTNet-256 | 21.24 | 38.77 | 152.13 | 0.9868 | Baidu Cloud (access code: i8l1)
2048×2048 | CDTNet-512 | 20.82 | 38.34 | 155.24 | 0.9847 | Baidu Cloud (access code: rj9p)

We show several results at 1024×1024 resolution below, where yellow boxes zoom in on particular regions for better observation.
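
For reference, the sketch below follows the standard definitions of these metrics in the harmonization literature (MSE over the whole image, PSNR derived from it, and fMSE as the MSE restricted to the foreground mask); it is not this repository's exact evaluation code, the function name and interface are illustrative, and SSIM is omitted since it is usually computed with an off-the-shelf implementation.

# Sketch of standard harmonization metrics; not this repository's exact evaluation code.
import numpy as np

def harmonization_metrics(pred, target, mask, max_value=255.0):
    """pred, target: (H, W, 3) arrays in [0, 255]; mask: (H, W) foreground mask in [0, 1]."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    squared_error = (pred - target) ** 2

    mse = squared_error.mean()                                # mean squared error over the whole image
    psnr = 10.0 * np.log10(max_value ** 2 / max(mse, 1e-10))  # peak signal-to-noise ratio
    fg = mask.astype(np.float64)[..., None]
    fmse = (squared_error * fg).sum() / max(fg.sum() * pred.shape[-1], 1.0)  # MSE over foreground pixels only
    return mse, psnr, fmse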

2. High-resolution (1024×1024) results on 100 real composite images

We test our CDTNet on 100 high-resolution real composite images as mentioned above, and provide the results on Baidu Cloud (access code: lr7k).

3. Low-resolution (256×256) results on iHarmony4 test set

We also test our CDTNet on 256×256 images from the iHarmony4 dataset and provide all harmonized results on Baidu Cloud (access code: l7gh).

4. Low-resolution (256×256) results on 99 real composite images

We also test our CDTNet on another 99 real composite images used in previous works, and provide the results on Baidu Cloud (access code: i6e8).

Other Resources

Acknowledgement

Our code is heavily borrowed from iSSAM and 3D LUT.

cdtnet-high-resolution-image-harmonization's People

Contributors

mia-cong, taoxinhao13, ustcnewly

cdtnet-high-resolution-image-harmonization's Issues

Request test results

Dear authors, I am currently running a comparative test and need the test results of your model, but I noticed that the cloud-drive link for the 256×256 results on the iHarmony4 test set is broken. Could you update it? Thank you very much, and I look forward to your reply!

About the training results

Following iSSAM, I checked the visualized training results and I do not see any change. I set 120 epochs; these are the results at epoch 12:
(screenshots: 249000_reconstruction, 195000_reconstruction, 158000_reconstruction, 197000_reconstruction)
My loss function is L = L_pix + L_rgb + L_ref, all L1 losses. I did not add the 3D LUT tv_cons and mn_cons losses (computed as below in the 3D LUT code):
loss = mse + opt.lambda_smooth * (weights_norm + tv_cons) + opt.lambda_monotonicity * mn_cons
The final output of the refinement module follows the design of the low-resolution output:
output = attention_map * image + (1.0 - attention_map) * self.to_rgb(conv_2)

I am not sure whether the problem is in my design or somewhere else. These are the training curves of the three L1 losses:
(screenshot of the loss curves, 2022-03-26 17:48:36)

Where should the commands be entered?

Where should the commands given in the Read Me be entered? I mean the python3 ... commands; should they be typed into Git Bash?

About the memory cost described in the Introduction.

Hello there,

I have noticed that the introduction of your paper says that iDIH costs more than 20GB of memory when harmonizing a 2048*2048 image. However, in our test it seems to cost only about 2.5GB. We conducted the test as follows:

        with torch.no_grad():
            # random 2048x2048 composite and mask as dummy inputs
            input_tmp = torch.randn(1, 3, 2048, 2048).cuda()
            mask_tmp = torch.randn(1, 1, 2048, 2048).cuda()
            start = torch.cuda.memory_allocated() / 1024 / 1024        # MB allocated before the forward pass
            self.output = self.net(input_tmp, mask_tmp)
            end_max = torch.cuda.max_memory_allocated() / 1024 / 1024  # peak MB during the forward pass

            print("Max_memory:", (end_max - start))

Is there anything wrong with the above? I also found that if I enable gradients, the memory cost is about 20GB. So should I test without the line "with torch.no_grad():"?

Looking forward to your reply, many thanks.

The results reproduced with HAdobe5k_2048.pth look strange

Hello, great work!

I want to run inference on custom images with this model, so I followed

python3 evaluate_model.py CDTNet ./HAdobe5k_2048.pth --gpu 0 --datasets HAdobe5k --hr 2048 --lr 512 --save_dir ./CDTNet_2048_result

to run the test.

I only tested one image; that is, HAdobe5k_test.txt contains only:

a3630_1_5.jpg
a3630_1_1.jpg
a3630_1_2.jpg
a3630_1_3.jpg
a3630_1_4.jpg

The metric results after testing are shown in the screenshot below:
(screenshot of the metrics)
The visual quality of the output images is also rather poor.

I feel I may have set something up incorrectly; could it be that I am loading the wrong model?

There is no pre-trained model on 1024×1024 HAdobe5k

Or should I also use the model HAdobe5k_2048.pth? I want to look at data such as the running time and memory cost. I think it should be tested on the same device to make a fair comparison; is that right?

The download link for the Results seems to be broken

Hi,

This is nice work. I hope to use your results for visual comparisons, but the current download link seems to be broken. Would you mind re-opening the download links for the test results on both HAdobe5k and the 100 real composite images?

About the network details

Our paper: (screenshot, 2022-03-16 11:03:54)
iSSAM paper: (screenshot, 2022-03-16 11:04:22)

I would like to ask about the implementation of the pixel-to-pixel transformation: does it contain only the encoder and decoder from the iSSAM project (as in the second image), or does it also include the earlier part (HRNet + OCR) that is simply not drawn? In your paper (the first image) I only see the encoder and decoder, so is it just the encoder-decoder part, without the HRNet + OCR part?

About the training settings

During training, what were the LUT-related parameter settings and the number of epochs, and roughly how long did training take? Also, for the lightweight refinement module, I implemented a simple two-layer convolution, but at test time the output images no longer look like normal photographs: the content is unchanged, but the colors and overall appearance are distorted. Could you share more details about this module? Thanks.
