
zheng-chong / catvton


CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters totally), 2) Parameter-Efficient Training (49.57M parameters trainable) and 3) Simplified Inference (< 8G VRAM for 1024X768 resolution).

License: Other

Python 39.29% Jupyter Notebook 57.25% Cuda 0.40% C++ 0.40% Shell 0.27% Dockerfile 0.04% Makefile 0.01% C 0.02% HTML 0.29% CSS 0.04% JavaScript 1.99%
diffusion-models fashion try-on

catvton's People

Contributors

eltociear, zheng-chong

catvton's Issues

Host the demo on Huggingface Spaces ZeroGPU

Hi @Zheng-Chong Congratulations on CatVTON release! Would be great to have the demo up on Huggingface Spaces. We provide GPU grants for interesting projects and paper-implementations, and would be happy to support CatVTON with ZeroGPU (A100s) sponsorship!

You might need to modify the current gradio code for ZeroGPU Spaces usage, actually. To understand this better, please refer to the usage section of the organization: https://huggingface.co/zero-gpu-explorers.

We also have a step-by-step guide for using the gradio sdk on Spaces: https://huggingface.co/docs/hub/en/spaces-sdks-gradio.

Applying for grants on Spaces is fairly easy using the Settings tab of your Space. For more information on how to apply for GPU grants on Spaces, please visit: https://huggingface.co/docs/hub/en/spaces-gpus#community-gpu-grants.

ComfyUI cannot load the nodes

When loading the graph, the following node types were not found:
LoadAutoMasker
CatVTON
AutoMasker
LoadCatVTONPipeline

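Anomalies in training results at different guidance scales (gs)

From left to right, the images show the garment, followed by the images generated with cloth guidance scale 1.0, 1.5, 2.0, and 2.5.
ddpm_result
ddim_result_c1

As gs increases, the garment details become progressively more controllable, but the repainted region gets darker until it is nearly black. At a gs where the overall lighting of the image looks normal, the garment details can no longer be controlled. Did you run into this during your own training, or do you know what might cause it?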
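For readers unfamiliar with what the cloth guidance scale does, here is a minimal sketch assuming the standard classifier-free guidance combination. Whether CatVTON's pipeline applies guidance in exactly this form is an assumption, and the function name and the toy numbers are illustrative only. It shows that a scale of 1.0 returns the garment-conditioned prediction unchanged, while larger values extrapolate past it.

```python
def apply_guidance(uncond, cond, gs):
    """Combine unconditional and conditional noise predictions.

    Standard classifier-free guidance: uncond + gs * (cond - uncond).
    """
    return [u + gs * (c - u) for u, c in zip(uncond, cond)]


# Toy per-element noise predictions (made-up numbers for illustration).
uncond = [0.0, 0.5, -1.0]
cond = [1.0, 0.5, -2.0]

for gs in (1.0, 1.5, 2.0, 2.5):
    print(gs, apply_guidance(uncond, cond, gs))
```

Under this formulation, every step above 1.0 moves the prediction further away from both model outputs, so a shift in overall image statistics at high scales would not be surprising. This is only a possible explanation, not a diagnosis of the issue.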

limitation of CatVTON & training code request

Dear authors,

It is great that you have made diffusion-based VTON models much simpler and more lightweight, and using only self-attention is quite intuitive. I noticed that your model mostly preserves the structure of the garment, but for some examples it cannot model even simple textures, and it can change the color of the garment considerably. I think these limitations mostly come from the lack of training samples in the input space. It would therefore be very useful if you could share the training code so that this limitation of CatVTON can be addressed.

Screenshot 2024-08-07 at 14 08 26
Screenshot 2024-08-07 at 14 08 38

SCHP unable to run on CPU only environment

Thanks for the great work.
I am encountering a problem when running the script in a CPU-only environment (Colab with no GPU). Below are the error details:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-22-144066cfdfe5> in <cell line: 5>()
      3 
      4 from utils import resize_and_crop
----> 5 from model.cloth_masker import AutoMasker as AM

9 frames
/content/CatVTON/model/cloth_masker.py in <module>
      7 import torch
      8 
----> 9 from model.SCHP import SCHP  # type: ignore
     10 from model.DensePose import DensePose  # type: ignore
     11 

/content/CatVTON/model/SCHP/__init__.py in <module>
----> 1 from model.SCHP import networks
      2 from model.SCHP.utils.transforms import get_affine_transform, transform_logits
      3 
      4 from collections import OrderedDict
      5 import torch

/content/CatVTON/model/SCHP/networks/__init__.py in <module>
      1 from __future__ import absolute_import
      2 
----> 3 from model.SCHP.networks.AugmentCE2P import resnet101
      4 
      5 __factory = {

/content/CatVTON/model/SCHP/networks/AugmentCE2P.py in <module>
     19 # Note here we adopt the InplaceABNSync implementation from https://github.com/mapillary/inplace_abn
     20 # By default, the InplaceABNSync module contains a BatchNorm Layer and a LeakyReLu layer
---> 21 from model.SCHP.modules import InPlaceABNSync
     22 
     23 BatchNorm2d = functools.partial(InPlaceABNSync, activation='none')

/content/CatVTON/model/SCHP/modules/__init__.py in <module>
----> 1 from .bn import ABN, InPlaceABN, InPlaceABNSync
      2 from .functions import ACT_RELU, ACT_LEAKY_RELU, ACT_ELU, ACT_NONE
      3 from .misc import GlobalAvgPool2d, SingleGPU
      4 from .residual import IdentityResidualBlock
      5 from .dense import DenseModule

/content/CatVTON/model/SCHP/modules/bn.py in <module>
      8     from Queue import Queue
      9 
---> 10 from .functions import *
     11 
     12 

/content/CatVTON/model/SCHP/modules/functions.py in <module>
      8 
      9 _src_path = path.join(path.dirname(path.abspath(__file__)), "src")
---> 10 _backend = load(name="inplace_abn",
     11                 extra_cflags=["-O3"],
     12                 sources=[path.join(_src_path, f) for f in [

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1307         ...     verbose=True)
   1308     """
-> 1309     return _jit_compile(
   1310         name,
   1311         [sources] if isinstance(sources, str) else sources,

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
   1743         return _get_exec_path(name, build_directory)
   1744 
-> 1745     return _import_module_from_library(name, build_directory, is_python_module)
   1746 
   1747 

/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py in _import_module_from_library(module_name, path, is_python_module)
   2141         spec = importlib.util.spec_from_file_location(module_name, filepath)
   2142         assert spec is not None
-> 2143         module = importlib.util.module_from_spec(spec)
   2144         assert isinstance(spec.loader, importlib.abc.Loader)
   2145         spec.loader.exec_module(module)

ImportError: /tmp/inplace_abn/inplace_abn.so: cannot open shared object file: No such file or directory

I believe the issue arises because some dependencies of SCHP require CUDA and are only available in a CUDA environment.
By the way, I have set export TORCH_EXTENSIONS_DIR=/tmp to overcome another issue, so you might see import errors from /tmp.
Do you have a solution to run SCHP in a CPU-only environment?
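The traceback shows the failure happens when PyTorch's JIT extension loader tries to import a compiled `inplace_abn.so` that was never produced. A small diagnostic like the sketch below can confirm the situation before importing SCHP. It assumes the loader places the library at `<TORCH_EXTENSIONS_DIR>/<name>/<name>.so`, which matches the path in the error message, and that building the extension needs the CUDA compiler `nvcc` if its sources include CUDA files. The helper name `diagnose_jit_extension` is hypothetical.

```python
import os
import shutil


def diagnose_jit_extension(name="inplace_abn", env=None):
    """Report facts relevant to a failed torch JIT extension build."""
    env = os.environ if env is None else env
    build_root = env.get("TORCH_EXTENSIONS_DIR")
    so_path = None
    if build_root:
        # Assumed layout: <build root>/<name>/<name>.so, as in the error above.
        so_path = os.path.join(build_root, name, name + ".so")
    return {
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "build_root": build_root,
        "expected_so": so_path,
        "so_exists": bool(so_path) and os.path.exists(so_path),
    }


for key, value in diagnose_jit_extension().items():
    print(f"{key}: {value}")
```

If `nvcc_on_path` is false and `so_exists` is false, the extension was most likely never built, which is consistent with the CUDA-dependency explanation given in the issue. This only diagnoses the problem; it does not make SCHP run on CPU.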

Love your work!

Thanks for sharing this work! I just wanted to let you know that I really love the simplicity and effectiveness of this model! Cheers!

ModuleNotFoundError: No module named 'cv2'

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\run_nvidia_gpu.bat

(Catvton) C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
[START] Security scan
[DONE] Security scan

ComfyUI-Manager: installing dependencies done.

** ComfyUI startup time: 2024-08-02 18:17:57.433249
** Platform: Windows
** Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
** Python executable: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI
** Log path: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\comfyui.log

Prestartup times for custom nodes:
1.1 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8192 MB, total RAM 32632 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2070 SUPER : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\web
Traceback (most recent call last):
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1941, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\__init__.py", line 3, in <module>
    from .model.cloth_masker import AutoMasker as AM
  File "C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON\model\cloth_masker.py", line 5, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'

Cannot import C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON module for custom nodes: No module named 'cv2'

Loading: ComfyUI-Manager (V2.48.4)

ComfyUI Revision: 2445 [369f459b] | Released on '2024-08-01'

Import times for custom nodes:
0.0 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
0.0 seconds (IMPORT FAILED): C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CatVTON
0.3 seconds: C:\ComfyUI_windows_portable_nvidia\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

There is a problem with the cv2 import, but I can import cv2 correctly in my virtual environment. Could you help me with this issue?
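The log above reports that ComfyUI is running `python_embeded\python.exe`, not the interpreter of the `(Catvton)` virtual environment, so a package installed in the virtual environment is likely not visible to it. The following sketch, meant to be run with the same interpreter that launches ComfyUI, checks where that interpreter lives and whether it can find `cv2`. The helper name `module_status` is hypothetical, and `opencv-python` is named as the usual PyPI package that provides `cv2`.

```python
import importlib.util
import sys


def module_status(name):
    """Return (interpreter path, whether `name` is importable from it)."""
    found = importlib.util.find_spec(name) is not None
    return sys.executable, found


exe, found = module_status("cv2")
print(f"interpreter: {exe}")
print(f"cv2 importable: {found}")
if not found:
    # Install into this exact interpreter rather than the active environment.
    print(f'try: "{exe}" -m pip install opencv-python')
```

Installing through `<interpreter> -m pip` rather than a bare `pip` command keeps the package tied to the interpreter that will actually import it.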

Evaluation

Thanks for open-sourcing this work! I have a concern about the quantitative results reported in the paper. I used the vitonhd-16k-512 checkpoint to evaluate on VITON-HD, but the results did not match those reported in the paper. Specifically, I got LPIPS=0.1019, SSIM=0.8649, FID=13.5417 (unpaired), and KID=6.748 (unpaired), which falls short of the results reported in the paper.

VITON-HD results

Thank you for your great work on CatVTON!

I tested the VITON-HD model and generated 512x384 images.
I resized the ground truth to 512x384 and tested SSIM and FID, and found that the metrics are SSIM=0.856 and FID=8.63. This does not match the metrics in the paper.
So, were the metrics in the paper obtained using a "mix model", rather than just the model trained on VITON-HD?

Details about training setting

Good work on the design of such a simple VTON pipeline.

I have tried to train CatVTON on the VITON-HD dataset, but the results are a little blurry, as shown below (38k iterations, batch size 8x32, 512x384 input resolution, only attention parameters trained).
image

I'm wondering whether there is any specific setting or trick in the loss, for example in how it is computed (i.e. is the loss computed on the latents of the person image only, or on the concatenated latents?).

I also noticed that the training loss is relatively small right from the beginning of training. Is this normal?

Epoch 0, step 0, step_loss: 0.06322, data_time: 2.104, time: 4.421
Epoch 0, step 1, step_loss: 0.04681, data_time: 0.058, time: 2.126
Epoch 0, step 2, step_loss: 0.06814, data_time: 0.058, time: 2.124
Epoch 0, step 3, step_loss: 0.03120, data_time: 0.064, time: 2.139
Epoch 0, step 4, step_loss: 0.02966, data_time: 0.059, time: 2.132
Epoch 0, step 5, step_loss: 0.03977, data_time: 0.059, time: 2.132
Epoch 0, step 6, step_loss: 0.05645, data_time: 0.059, time: 2.133
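To make the question about where the loss is computed concrete, here is a small sketch of the two options. It does not claim to be the authors' training objective. The layout (person latent values first, garment latent values second, flattened into one list) and all names are assumptions for illustration.

```python
def mse(pred, target, mask=None):
    """Mean squared error over the positions where mask is 1 (all if None)."""
    if mask is None:
        mask = [1] * len(pred)
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / sum(mask)


# Toy "concatenated latent": first 3 values person, last 3 garment.
pred = [0.5, 0.0, 1.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

person_mask = [1, 1, 1, 0, 0, 0]

loss_concat = mse(pred, target)               # averaged over both halves
loss_person = mse(pred, target, person_mask)  # averaged over the person half only

print(loss_concat, loss_person)
```

In this toy case the garment half has zero error, so averaging over both halves gives half the value obtained from the person half alone. Loss values are therefore only comparable when they are averaged over the same region.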

xformers is not compatible with MacOS

Hey, I'm wondering how to fix the compatibility issue with macOS. I can't install from the requirements file because xformers is not compatible with macOS.
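One possible workaround, sketched below, is to drop xformers from the requirement list when installing on macOS. This assumes the rest of the code can run without xformers, which the issue does not confirm. `filter_requirements` and its parameters are hypothetical helper names, and the example lines are placeholders rather than the project's actual requirements file.

```python
import re
import sys


def filter_requirements(lines, skip=("xformers",), platform=None):
    """Drop the packages in `skip` when the platform is macOS ("darwin")."""
    platform = sys.platform if platform is None else platform
    if platform != "darwin":
        return list(lines)
    kept = []
    for line in lines:
        # The package name is everything before a version specifier or marker.
        name = re.split(r"[<>=!~;\[\s]", line.strip(), maxsplit=1)[0].lower()
        if name not in skip:
            kept.append(line)
    return kept


# Placeholder requirement lines for illustration only.
example = ["torch", "xformers>=0.0.1", "opencv-python"]
print(filter_requirements(example, platform="darwin"))
```

The same effect can be expressed directly in a requirements file with a pip environment marker, for example `xformers; sys_platform != "darwin"`.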

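Tutorial for local deployment on Windows using Gradio!

Many thanks to [Zheng-Chong] for this work; the project's results are excellent. While deploying it locally on Windows with Gradio, I ran into quite a few problems, including the one described in #12. After reading up on them and trying repeatedly, I finally got the deployment working.
I have made a tutorial and hope it helps others. Thanks again to [Zheng-Chong] for the work and the open-source spirit. Thumbs up!!

Windows local deployment tutorial: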
https://www.bilibili.com/video/BV173YueAEdi/?vd_source=6c8b8679b818b05d24c65f49a65eb994

agnostic masks

Very good work. If I want to test my own models, how can I create agnostic masks?

Request for Training Code

Hello! This is great work. Hats off to you and your team. I would love to re-implement the results with training on my personal machine. I was wondering if there are plans to release the training code?

Dependencies issues

Hi, it would be great to try out this project. However, the requirements.txt is a bit messed up, with lots of broken or missing dependencies.
For example, the densepose module is nowhere to be found as a pip package, and neither is detectron2 (I installed that one from its git repo).
Could you please do a clean check of your requirements.txt and maybe update the README with an installation section?

Thanks.
