
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

1Tianjin University   2Tencent LightSpeed Studio

[Paper] [Project]

Abstract

Image-based virtual try-on enables users to virtually try on different garments by altering the original clothes in their photographs. Generative Adversarial Networks (GANs) dominate the research field of image-based virtual try-on, but have not resolved problems such as unnatural deformation of garments and blurry generation quality. Recently, diffusion models have emerged with surprising performance across various image generation tasks. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying them to virtual try-on tasks, and their many denoising iterations limit their potential for real-time applications. In this paper, we propose a Controllable Accelerated virtual Try-on with Diffusion Model, called CAT-DM. To enhance controllability, a basic diffusion-based virtual try-on network is designed, which utilizes ControlNet to introduce additional control conditions and improves the feature extraction of garment images. In terms of acceleration, CAT-DM initiates the reverse denoising process with an implicit distribution generated by a pre-trained GAN-based model. Compared with previous try-on methods based on diffusion models, CAT-DM not only retains the pattern and texture details of the in-shop garment but also reduces the number of sampling steps without compromising generation quality. Extensive experiments demonstrate the superiority of CAT-DM over both GAN-based and diffusion-based methods in producing more realistic images and accurately reproducing garment patterns.

Hardware Requirement

Our experiments were conducted on two NVIDIA GeForce RTX 4090 graphics cards, each with 24GB of video memory. Please note that our model cannot be trained on graphics cards with less than 24GB of video memory.

Environment Requirement

  1. Clone the repository
git clone https://github.com/zengjianhao/CAT-DM
  2. A suitable conda environment named CAT-DM can be created and activated with:
cd CAT-DM
conda env create -f environment.yaml
conda activate CAT-DM
  • If you want to change the name of the environment you created, you need to modify the name in both environment.yaml and setup.py.
  • You need to make sure that conda is installed on your computer.
  • If there is a network error, try updating the environment using conda env update -f environment.yaml.
  3. Install xFormers:
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -U xformers
cd ..
rm -rf xformers
  4. Open src/taming-transformers/taming/data/utils.py, delete the line from torch._six import string_classes, and change elif isinstance(elem, string_classes): to elif isinstance(elem, str): (a before/after sketch follows).
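
This edit removes a private PyTorch import that was dropped in PyTorch 2.x. For reference, a before/after sketch of the affected lines in utils.py (exact line positions may differ in your copy):

# before: fails on PyTorch >= 2.0, where torch._six was removed
from torch._six import string_classes
...
elif isinstance(elem, string_classes):

# after: drop the import and test against the built-in str type
elif isinstance(elem, str):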

Dataset Preparing

VITON-HD

  1. Download the VITON-HD dataset
  2. Create a folder datasets
  3. Put the VITON-HD dataset into this folder and rename it to vitonhd
  4. Generate the mask images
# Generate the train dataset mask images
python tools/mask_vitonhd.py datasets/vitonhd/train datasets/vitonhd/train/mask
# Generate the test dataset mask images
python tools/mask_vitonhd.py datasets/vitonhd/test datasets/vitonhd/test/mask

DressCode

  1. Download the DressCode dataset
  2. Create a folder datasets
  3. Put the DressCode dataset into this folder and rename it to dresscode
  4. Generate the mask images and the agnostic images (a sanity check covering both datasets follows these commands):
# Generate the dresses dataset mask images and the agnostic images
python tools/mask_dresscode.py datasets/dresscode/dresses datasets/dresscode/dresses/mask
# Generate the lower_body dataset mask images and the agnostic images
python tools/mask_dresscode.py datasets/dresscode/lower_body datasets/dresscode/lower_body/mask
# Generate the upper_body dataset mask images and the agnostic images
python tools/mask_dresscode.py datasets/dresscode/upper_body datasets/dresscode/upper_body/mask
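
After running the scripts above, it is worth verifying that the mask folders were actually populated. A small hypothetical helper, assuming the folder layout shown under Details below (adjust the paths if yours differ):

import os

def count_files(folder):
    return len(os.listdir(folder)) if os.path.isdir(folder) else 0

# compare image and mask counts per split; counts that differ wildly
# usually mean a generation step failed or a path is wrong
for image_dir, mask_dir in [
    ("datasets/vitonhd/train/image", "datasets/vitonhd/train/mask"),
    ("datasets/vitonhd/test/image", "datasets/vitonhd/test/mask"),
    ("datasets/dresscode/dresses/images", "datasets/dresscode/dresses/mask"),
    ("datasets/dresscode/lower_body/images", "datasets/dresscode/lower_body/mask"),
    ("datasets/dresscode/upper_body/images", "datasets/dresscode/upper_body/mask"),
]:
    print(f"{image_dir}: {count_files(image_dir)} images, {count_files(mask_dir)} masks")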

Details

The datasets folder should be organized as follows:

datasets
├── vitonhd
│   ├── test
│   │   ├── agnostic-mask
│   │   ├── mask
│   │   ├── cloth
│   │   ├── image
│   │   ├── image-densepose
│   │   ├── ...
│   ├── test_pairs.txt
│   ├── train
│   │   ├── agnostic-mask
│   │   ├── mask
│   │   ├── cloth
│   │   ├── image
│   │   ├── image-densepose
│   │   ├── ...
│   └── train_pairs.txt
├── dresscode
│   ├── dresses
│   │   ├── dense
│   │   ├── images
│   │   ├── mask
│   │   ├── ...
│   ├── lower_body
│   │   ├── dense
│   │   ├── images
│   │   ├── mask
│   │   ├── ...
│   ├── upper_body
│   │   ├── dense
│   │   ├── images
│   │   ├── mask
│   │   ├── ...
│   ├── test_pairs_paired.txt
│   ├── test_pairs_unpaired.txt
│   ├── train_pairs.txt
│   └── ...

PS: When we conducted our experiments, VITON-HD had not yet released the agnostic-mask, so we used our own mask implementation. If you use VITON-HD's agnostic-mask, the generated results may therefore differ.

Required Model

  1. Download the Paint-by-Example model
  2. Create a folder checkpoints
  3. Put the Paint-by-Example model into this folder and rename it to pbe.ckpt
  4. Make the ControlNet model (a sketch of the underlying idea follows this list):
  • VITON-HD:
python tools/add_control.py checkpoints/pbe.ckpt checkpoints/pbe_dim6.ckpt configs/train_vitonhd.yaml
  • DressCode:
python tools/add_control.py checkpoints/pbe.ckpt checkpoints/pbe_dim5.ckpt configs/train_dresscode.yaml
  5. The checkpoints folder should be as follows:
checkpoints
├── pbe.ckpt
├── pbe_dim5.ckpt
└── pbe_dim6.ckpt
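
add_control.py presumably follows the standard recipe of initializing a ControlNet branch from the base model's weights; the pbe_dim5/pbe_dim6 names suggest the control hint input has 5 channels (DressCode) or 6 channels (VITON-HD). A minimal sketch of the channel-expansion part of such a recipe, not the repository's actual script; the checkpoint key name below is illustrative:

import torch

def expand_conv_in_channels(ckpt_in, ckpt_out, key, new_in_channels):
    """Zero-pad a conv weight along its input-channel axis so a pre-trained
    model accepts extra conditioning channels without changing its output."""
    ckpt = torch.load(ckpt_in, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    w = state[key]                                    # [out_ch, in_ch, k, k]
    out_ch, in_ch, kh, kw = w.shape
    pad = torch.zeros(out_ch, new_in_channels - in_ch, kh, kw, dtype=w.dtype)
    state[key] = torch.cat([w, pad], dim=1)           # new channels start as no-ops
    torch.save(ckpt, ckpt_out)

# hypothetical usage; the real key and channel count depend on the configs
# expand_conv_in_channels("checkpoints/pbe.ckpt", "checkpoints/pbe_dim6.ckpt",
#                         "control_model.input_hint_block.0.weight", 6)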

Training

VITON-HD

bash scripts/train_vitonhd.sh

DressCode

bash scripts/train_dresscode.sh

Testing

VITON-HD

  1. Download the checkpoint for the VITON-HD dataset and put it into the checkpoints folder.

  2. Directly generate the try-on results:

bash scripts/test_vitonhd.sh
  3. Apply Poisson blending:
python tools/poisson_vitonhd.py

DressCode

  1. Download the checkpoint for the DressCode dataset and put it into the checkpoints folder.

  2. Directly generate the try-on results:

bash scripts/test_dresscode.sh
  3. Apply Poisson blending (a sketch of the operation follows):
python tools/poisson_dresscode.py
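
Both poisson_vitonhd.py and poisson_dresscode.py implement Poisson image editing. A minimal sketch of the core operation using OpenCV, with hypothetical paths for one generated image, the original person photo, and the try-on mask:

import cv2
import numpy as np

# hypothetical paths; the actual scripts iterate over the whole result folder
generated = cv2.imread("results/00001_00.png")                   # diffusion output
person = cv2.imread("datasets/vitonhd/test/image/00001_00.jpg")  # original photo
mask = cv2.imread("datasets/vitonhd/test/mask/00001_00.png", cv2.IMREAD_GRAYSCALE)

h, w = generated.shape[:2]
person = cv2.resize(person, (w, h))
mask = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)

# paste the generated try-on region into the original photo; Poisson blending
# reconciles colors and lighting along the mask boundary
ys, xs = np.nonzero(mask)
center = (int(xs.mean()), int(ys.mean()))
blended = cv2.seamlessClone(generated, person, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("results/00001_00_blended.png", blended)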

Evaluation
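
The repository page does not include the evaluation code (an issue below requests its release). A minimal sketch of the four metrics reported in the paper (FID, KID, SSIM, LPIPS) using torchmetrics, assuming the generated images share file names and resolution with the ground-truth test images; the folder paths are illustrative:

import os
import torch
from torchvision.io import read_image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

gen_dir = "results/paired"                 # illustrative paths
gt_dir = "datasets/vitonhd/test/image"

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)

for name in sorted(os.listdir(gen_dir)):
    gen = read_image(os.path.join(gen_dir, name)).unsqueeze(0)  # uint8 [1,3,H,W]
    gt = read_image(os.path.join(gt_dir, name)).unsqueeze(0)
    fid.update(gt, real=True); fid.update(gen, real=False)      # FID/KID take uint8
    kid.update(gt, real=True); kid.update(gen, real=False)
    gen_f, gt_f = gen.float() / 255.0, gt.float() / 255.0       # SSIM/LPIPS take [0,1]
    ssim.update(gen_f, gt_f)
    lpips.update(gen_f, gt_f)

kid_mean, kid_std = kid.compute()
print(f"FID {float(fid.compute()):.3f}  KID {float(kid_mean):.4f}  "
      f"SSIM {float(ssim.compute()):.4f}  LPIPS {float(lpips.compute()):.4f}")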

Citing

@article{zeng2023cat,
  title={CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model},
  author={Zeng, Jianhao and Song, Dan and Nie, Weizhi and Tian, Hongshuo and Wang, Tongtong and Liu, Anan},
  journal={arXiv preprint arXiv:2311.18405},
  year={2023}
}


Issues

Image Size

First: Great job! It works quite well and achieves very similar results.

I think the limitation I'm encountering is the output size: it might be too small to generate fine details. Maybe I'm doing something wrong. Is there a way to produce output larger than 384x512 px natively, i.e. not through a later upscaler?

Which facebookresearch_dinov2 is the code looking for?

In CAT-DM\ldm\models\diffusion\control.py, I see the lines below. They aren't correct. Which DINOv2 files is the code looking for?

self.dinov2_vits14 = torch.hub.load('/home/sd/.cache/torch/hub/facebookresearch_dinov2_main', 'dinov2_vitl14', source='local', pretrained=False)
state_dict = torch.load('/home/sd/Harddisk/zjh/Teacher/checkpoints/dinov2_vitl14_pretrain.pth')
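
For reference, the hard-coded local paths can usually be replaced with the public torch.hub entry point, which downloads the pretrained DINOv2 weights automatically; this is the standard loading recipe from the DINOv2 repository, not CAT-DM's own code:

import torch

# downloads the dinov2_vitl14 weights from facebookresearch/dinov2 on first use
dinov2_vitl14 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")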

training

Hi, there

Great job! Regarding training, can you tell us how many epochs were actually trained?

What is "gan" in dataset loaded

In test.py, I see
gan = batch["gan"].to(torch.float16).to(device)

However, the dataset loader gives:

return {"GT": img,                  # [3, 512, 512]
                "inpaint_image": inpaint,   # [3, 512, 512]
                "inpaint_mask": mask,       # [1, 512, 512]
                "ref_imgs": refernce,       # [3, 224, 224]
                "hint": hint                # [5, 512, 512]
                }

What is the gan here?
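
Judging from the paper, batch["gan"] should be the pre-generated GAN try-on image that initializes the reverse denoising process; the published dataloader does not return it. A hypothetical patch to the dataset's __getitem__, assuming a gan folder of pre-computed GAN results and reusing the loader's existing names:

# hypothetical addition inside __getitem__; the "gan" folder and file naming
# are assumptions based on the paper's GAN-initialized sampling
gan_img = Image.open(os.path.join(self.data_dir, "gan", img_name)).convert("RGB")
gan = self.transform(gan_img)     # same preprocessing as the GT image

return {"GT": img,
        "inpaint_image": inpaint,
        "inpaint_mask": mask,
        "ref_imgs": refernce,
        "hint": hint,
        "gan": gan}               # [3, 512, 512] GAN initialization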

Error encountered when testing vitonhd dataset

Hi Jianhao, this work looks very valuable and interesting, and I can't wait to follow up on your research. However, in my testing based on your code, I encountered the following problem:

Traceback (most recent call last):
  File "test.py", line 198, in <module>
    gan = batch["gan"].to(torch.float16).to(device)
KeyError: 'gan'

I checked the Dataloader and it doesn't seem to provide 'gan' there. Can you tell me how to fix this?

Code?

Where is the code?

About implement of LaDI-VTON

Hi, your work is very interesting and I admire your experiments very much. I would like to ask how you completed the experiments of LaDI-VTON on two NVIDIA GeForce RTX 4090 GPUs. When I do this, I run into an OutOfMemoryError. Many thanks.

`gan = batch["gan"].to(torch.float16).to(device)`

Hi, thanks very much for the model code and checkpoints!

I have a problem with the following line in test.py:

gan = batch["gan"].to(torch.float16).to(device)

Dataloader does not seem to provide this data, though it is used in prediction. What should I do?

Thank you!

PS: In order to load the generated checkpoint for the VITON-HD dataset, I needed to change this line:

model.load_state_dict(torch.load(opt.ckpt, map_location="cpu"), strict=False)

Gradio Demo

Great work, is it possible to get a gradio demo of this? Also when will the full code release?

Evaluation code

Thank you for your amazing work.
Can you release your evaluation code for all of the metrics ? (FID, KID, SSIM, LPIPS)

Insufficient graphics memory and training failure

Hello author, I am very pleased that you have done such great work. I used an NVIDIA A10 24GB GPU for reproduction, and when executing bash scripts/train_vitonhd.sh according to your README, it reported insufficient graphics memory. Is this due to my configuration, or is 24GB of graphics memory itself insufficient to support training?

About the Poisson blending process

I have noticed that the blending scripts poisson_dresscode.py and poisson_vitonhd.py are currently identical.

Moreover, is it correct to say that Poisson blending is applied with the generated images and their masks as inputs?
If so, in order to generate the correct result, should I use the images from the "Unpaired_Concatenation" output or the "Unpaired_Direst" output?
Because in the current poisson_vitonhd.py, I see that you perform blending with the test image (ground truth), which seems incorrect. Please correct me if I am wrong.

Thank you for your amazing work.
