UPGPT

This is the official GitHub repo of "UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer" (https://arxiv.org/abs/2304.08870). It is the first model that combines all the person image generation functions, with conditioning on pose, text and visual prompts.

Simultaneous pose and camera view interpolation via SMPL parameter linear interpolation.
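
As a rough illustration of what linear interpolation of SMPL parameters means, here is a minimal sketch that blends two flat parameter vectors element by element. The function name lerp_params and the flat-list layout of the parameters are assumptions made for this example; they are not taken from the repo's code.

```python
from typing import List, Sequence


def lerp_params(start: Sequence[float], end: Sequence[float],
                num_steps: int) -> List[List[float]]:
    """Linearly interpolate between two parameter vectors.

    Returns `num_steps` vectors; the first equals `start` and the last
    equals `end`. The vectors stand in for SMPL pose/camera parameters
    (hypothetical layout, for illustration only).
    """
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    frames = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # weight goes from 0.0 to 1.0
        frames.append([(1.0 - t) * a + t * b for a, b in zip(start, end)])
    return frames
```

Each intermediate vector would then be used as the pose condition for one generated frame, which is how a single interpolation can change pose and camera view together.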

The code was adapted from https://github.com/Stability-AI/stablediffusion/.

Featured in RSIP Vision Newsletter September 2023.

Video Demo (HD)

Click the icon to view a demonstration of an earlier version of our app on YouTube.

BibTeX:

@misc{upgpt23,
      title={UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer},
      author={Soon Yau Cheong and Armin Mustafa and Andrew Gilbert},
      year={2023},
      journal={IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
      primaryClass={cs.CV}
}

News

[2023.09.25] Selected for live demo at ICCV 2023 on Friday Oct 6th. https://iccv2023.thecvf.com/demos-111.php

[2023.09.05] Featured in RSIP Vision Newsletter September 2023.

[2023.08.16] Accepted at 2023 IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.

[2023.07.27] Updated arXiv paper.

[2023.06.05] Training data and script released - pose transfer with bounding box as RPM. This concludes all the planned releases.

[2023.06.01] I have updated the code for pose interpolation. However, you will need to download the new model file interp_256.zip (previously pt_256.zip). The app now also comes with pre-loaded style images and generated examples.

Paper's Result

The ground truth and generated images used in the paper can be downloaded from the repo release.

Requirements

A suitable conda environment named upgpt can be created and activated with:

conda env create -f environment.yaml
conda activate upgpt

Files

Model checkpoints and dataset can be downloaded from HuggingFace.

App Demo

This demonstration uses pre-segmented style images from the DeepFashion Multimodal dataset and does not support arbitrary images that you upload. We provide a few samples in the app for you to play with. If you want to try more style images, follow the instructions in "Additional data".

  • Download the models interp_256.zip and upscale.zip (optional) and unzip them into ./models/upgpt
  • Start the app by running the following in a terminal: streamlit run app.py --server.fileWatcherType none
  • Click "Image Styles->Browse files" to select images from ./fashion. Then choose "select styles" and click "Show/Get Styles" to extract style images. The model is trained for pose transfer, so providing a face style image is advised for good results.
  • Entering "style text" overrides the corresponding style images, so remove the style text if you want to use a style image.
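
The first two steps above can be sketched in Python as follows. This is only an illustration: it assumes the downloaded zip files sit in the current directory, and the helper names extract_archives and app_command are made up for this example. How the archives are laid out internally is not stated here, so check the contents of ./models/upgpt after extraction.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List

MODEL_DIR = Path("models/upgpt")  # target folder named in the steps above


def extract_archives(archives: Iterable[Path],
                     target: Path = MODEL_DIR) -> List[Path]:
    """Unzip every archive that exists into `target`.

    Missing archives are skipped (upscale.zip is optional). Returns the
    list of archives that were actually extracted.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        archive = Path(archive)
        if not archive.is_file():
            continue  # not downloaded, e.g. the optional upscale.zip
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive)
    return extracted


def app_command() -> List[str]:
    """Return the launch command from the steps above as an argument list."""
    return ["streamlit", "run", "app.py", "--server.fileWatcherType", "none"]
```

A typical use would be extract_archives([Path("interp_256.zip"), Path("upscale.zip")]) followed by subprocess.run(app_command(), check=True) from the repo root.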

Additional data

  1. Download and unzip deepfashion_inshop.zip into datasets/deepfashion_inshop.
  2. You can try more style images by downloading and unzipping images.zip from the DeepFashion Multimodal dataset. Use the unzipped folder in place of ./fashion when selecting fashion images. Also, run rm -r app_cache/styles && ln -s deepfashion_inshop/styles app_cache/styles to link to the full dataset style images.
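
For reference, the relinking step can also be done from Python. The sketch below is a hedged equivalent of the rm/ln command, not the repo's own tooling: the function name relink_styles is hypothetical, and the example path datasets/deepfashion_inshop/styles in the usage note is an assumption based on where step 1 unzips the dataset. It resolves the target to an absolute path so the link does not depend on the directory it is created in.

```python
import shutil
from pathlib import Path


def relink_styles(link: Path, target: Path) -> None:
    """Replace `link` (a directory, file or old symlink) with a symlink to `target`."""
    target = Path(target).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"style folder not found: {target}")

    link = Path(link)
    if link.is_symlink() or link.is_file():
        link.unlink()            # drop an old link or stray file
    elif link.is_dir():
        shutil.rmtree(link)      # same effect as `rm -r app_cache/styles`

    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
```

For example, relink_styles(Path("app_cache/styles"), Path("datasets/deepfashion_inshop/styles")) would point the app cache at the full set of style images, assuming that is where they were unzipped.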

Training

Several configurations are proposed in the paper, but for simplicity we provide only one config (bounding box as RPM), which can perform both pose transfer and pose interpolation. If you want to compare against our results (silhouette mask as RPM), we suggest downloading the generated images (see section "Paper's Result" above).

  1. Download and unzip deepfashion_inshop.zip into datasets/deepfashion_inshop.
  2. Download deepfashion_256_v2.ckpt and place it in models/first_stage_models/kl-f8-deepfashion.
  3. Run train.sh, or

python main.py -t --base configs/deepfashion/bbox.yaml --gpus 0, --scale_lr False --num_nodes 1

Checkpoints and generated images will be saved in ./logs.
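
Before starting a long training run, it can help to confirm that the files from the steps above are in place. The following is a small optional sketch, not part of the repo: the names REQUIRED_INPUTS, TRAIN_COMMAND and missing_training_inputs are made up for this example, while the paths and the command are copied from the instructions above.

```python
from pathlib import Path
from typing import List

# Paths mentioned in the training steps, relative to the repo root.
REQUIRED_INPUTS = [
    Path("datasets/deepfashion_inshop"),
    Path("models/first_stage_models/kl-f8-deepfashion/deepfashion_256_v2.ckpt"),
    Path("configs/deepfashion/bbox.yaml"),
]

# The training command from above, split into arguments.
TRAIN_COMMAND = [
    "python", "main.py", "-t",
    "--base", "configs/deepfashion/bbox.yaml",
    "--gpus", "0,",
    "--scale_lr", "False",
    "--num_nodes", "1",
]


def missing_training_inputs(root: Path = Path(".")) -> List[Path]:
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in REQUIRED_INPUTS if not (root / p).exists()]
```

If missing_training_inputs() returns an empty list, the command can be launched with subprocess.run(TRAIN_COMMAND, check=True) from the repo root; otherwise the returned paths show what still needs to be downloaded.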
