nashihikari / ctrl-adapter

This project forked from hl-hanlin/ctrl-adapter

Official implementation of Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Home Page: https://ctrl-adapter.github.io/

License: Apache License 2.0

Shell 2.08% Ruby 0.05% C++ 1.20% Python 74.90% Java 13.75% Swift 7.10% CMake 0.80% Dockerfile 0.11%


Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Official implementation of Ctrl-Adapter, an efficient and versatile framework that adds diverse controls to any image/video diffusion model by adapting pretrained ControlNets.


Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal



Ctrl-Adapter is an efficient and versatile framework for adding diverse spatial controls to any image or video diffusion model. It supports a variety of useful applications, including video control, video control with multiple conditions, video control with sparse frame conditions, image control, zero-shot transfer to unseen conditions, and video editing.

🔥 News

  • Apr. 30, 2024. Training code is now released! It's time to train Ctrl-Adapter on your desired backbone! 🚀🚀
  • Apr. 29, 2024. SDXL, I2VGen-XL, and SVD inference code and checkpoints are all released!

🔧 Setup

Environment Setup

If you only need to perform inference with our code, please install from requirements_inference.txt. To make our codebase easy to use, the primary libraries that need to be installed are Torch, Diffusers, and Transformers. Specific versions of these libraries are not required; the default versions should work fine :)

If you plan to conduct training, please install from requirements_train.txt instead, which includes the additional dependencies required for training.

conda create -n ctrl-adapter python==3.10
conda activate ctrl-adapter
pip install -r requirements_inference.txt # install from this if you only need to perform inference
pip install -r requirements_train.txt # install from this if you plan to do some training
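Since the primary libraries are Torch, Diffusers, and Transformers and no specific versions are required, a quick post-install check can confirm they are present. The helper below is a minimal sketch of our own (`installed_versions` is a hypothetical name, not part of the repository); it relies only on the standard-library `importlib.metadata` module and reads package metadata without importing the libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for checking a Ctrl-Adapter environment; it only reads
    package metadata and does not import the libraries themselves.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # The three primary libraries named in the setup instructions.
    for pkg, ver in installed_versions(["torch", "diffusers", "transformers"]).items():
        print(f"{pkg}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Run it inside the activated ctrl-adapter environment; any line reporting NOT INSTALLED points to a missing dependency.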

Here we list several questions that we believe are important when you start using this codebase.

🔮 Inference

We provide model checkpoints and inference scripts for Ctrl-Adapter trained on SDXL, I2VGen-XL, and SVD. All inference scripts are located under ./inference_scripts.

📌 Notice Before You Begin

Please note that there is usually no single model that excels at generating images/videos for all motion styles across various control conditions.

Different image/video generation backbones may perform better with specific types of motion. For instance, we have observed that SVD excels at slide motions, while it generally performs worse than I2VGen-XL with complex motions (this is consistent with the findings in DynamiCrafter). Additionally, using different control conditions can lead to significantly different results in the generated images/videos, and some control conditions may be more informative than others for certain types of motion.

📌 Inference Data Structure

We put some sample images/frames for inference under the folder ./assets/evaluation. You can add your custom examples following the same file structure illustrated below.

For model inference, we support two options:

  • If you have already extracted condition images/frames from an image/video, you can use inference (w/ extracted condition).
./assets/evaluation/images
    ├── depth
    │   ├── anime_corgi.png
    ├── raw_input
    │   ├── anime_corgi.png
    ├── captions.json

./assets/evaluation/frames
    ├── depth
    │   ├── newspaper_cat
    │   │   ├── 00000.png
    │   │   ├── 00001.png
    │   │   ...
    │   │   ├── 00015.png
    ├── raw_input
    │   ├── newspaper_cat
    │   │   ├── 00000.png # only the 1st frame is needed for I2V models
    ├── captions.json
  • If you have not extracted the control conditions and only have the raw images/frames, you can use inference (w/o extracted condition). In this mode, our code automatically extracts the control conditions from the input images/frames and then generates the corresponding image/video.
./assets/evaluation/images
    ├── raw_input
    │   ├── anime_corgi.png
    ├── captions.json

./assets/evaluation/frames
    ├── raw_input
    │   ├── newspaper_cat
    │   │   ├── 00000.png
    │   │   ├── 00001.png
    │   │   ...
    │   │   ├── 00015.png
    ├── captions.json
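Before running inference on a custom video example, the layout above can be checked programmatically. The sketch below is our own illustration, not part of the codebase: `check_frames_example` is a hypothetical name, and it only tests the file structure shown above (frames named `00000.png`, `00001.png`, ... and a top-level `captions.json`). It does not inspect the contents of `captions.json`, whose schema is not described here.

```python
from pathlib import Path


def check_frames_example(root, name, condition=None, num_frames=16):
    """Return a list of problems found for one video example (empty if OK).

    Hypothetical checker for the ./assets/evaluation/frames layout:
      root/raw_input/<name>/00000.png        first frame (enough for I2V models)
      root/<condition>/<name>/00000.png ...  one file per frame, 5-digit zero padding
      root/captions.json
    Pass condition=None for the "w/o extracted condition" layout.
    """
    root = Path(root)
    problems = []
    if not (root / "captions.json").is_file():
        problems.append("missing captions.json")
    if not (root / "raw_input" / name / "00000.png").is_file():
        problems.append(f"missing raw_input/{name}/00000.png")
    if condition is not None:
        for i in range(num_frames):
            frame = root / condition / name / f"{i:05d}.png"
            if not frame.is_file():
                problems.append(f"missing {condition}/{name}/{frame.name}")
    return problems
```

For example, `check_frames_example("./assets/evaluation/frames", "newspaper_cat", condition="depth")` checks the sample clip; the 16-frame default matches its 00000.png to 00015.png frames, so adjust `num_frames` for other clip lengths.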

📌 Run Inference Scripts

Here is a sample command to run inference on SDXL with depth map as control (w/ extracted condition).

sh inference_scripts/sdxl/sdxl_inference_depth.sh
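The inference scripts are plain shell scripts, so they can also be launched from Python, for example when sweeping several conditions. The sketch below assumes that other scripts follow the same naming pattern as the SDXL depth example (`inference_scripts/<backbone>/<backbone>_inference_<condition>.sh`); only the SDXL depth path is confirmed here, so check the actual file names under ./inference_scripts before relying on it. `script_path` and `run_inference` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def script_path(backbone, condition, root="inference_scripts"):
    """Build a script path, ASSUMING the naming pattern of the SDXL depth example."""
    return Path(root) / backbone / f"{backbone}_inference_{condition}.sh"


def run_inference(backbone, condition, dry_run=True):
    """Run one inference script with `sh`, as in the command above.

    With dry_run=True the command is only built and returned, not executed.
    """
    path = script_path(backbone, condition)
    cmd = ["sh", str(path)]
    if not dry_run:
        if not path.is_file():
            raise FileNotFoundError(f"no such script: {path}")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    print(run_inference("sdxl", "depth"))  # dry run: prints the command only
```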

โš ๏ธ --control_guidance_end: this is the most important parameter that balances generated image/video quality with control strength. If you notice the generated image/video does not follow the spatial control well, you can increase this value; and if you notice the generated image/video quality is not good because the spatial control is too strong, you can decrease this value. Detailed discussion of control strength via this parameter is shown in our paper.

We list the inference scripts for the different tasks mentioned in our paper as follows ⬇️

Controllable Image Generation



SDXL

| Control Conditions | Checkpoints | Inference (w/ extracted condition) | Inference (w/o extracted condition) |
|---|---|---|---|
| Depth Map | HF link | command | command |
| Canny Edge | HF link | command | command |
| Soft Edge | HF link | command | command |
| Normal Map | HF link | command | command |
| Segmentation | HF link | command | command |
| Scribble | HF link | command | command |
| Lineart | HF link | command | command |

Controllable Video Generation



I2VGen-XL

| Control Conditions | Checkpoints | Inference (w/ extracted condition) | Inference (w/o extracted condition) |
|---|---|---|---|
| Depth Map | HF link | command | command |
| Canny Edge | HF link | command | command |
| Soft Edge | HF link | command | command |

SVD

| Control Conditions | Checkpoints | Inference (w/ extracted condition) | Inference (w/o extracted condition) |
|---|---|---|---|
| Depth Map | HF link | command | command |
| Canny Edge | HF link | command | command |
| Soft Edge | HF link | command | command |

Video Generation with Multi-Condition Control



We currently implement multi-condition control on I2VGen-XL. The following checkpoints are trained on 7 control conditions: depth, canny, normal, softedge, segmentation, lineart, and openpose. The sample inference scripts below use depth, canny, segmentation, and openpose as control conditions.

| Adapter Checkpoint | Router Checkpoint | Inference (w/ extracted condition) | Inference (w/o extracted condition) |
|---|---|---|---|
| HF link | HF link | command | command |
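Multi-condition control needs some way to merge per-condition features into one control signal, and the release pairs the adapter with a separate router checkpoint for this setting. As rough intuition only, the sketch below combines per-condition feature vectors with softmax weights derived from router scores. This is a simplified illustration under our own assumptions, not the router architecture from the paper; `softmax` and `combine_conditions` are hypothetical helpers, and features are plain Python lists to keep the example dependency-free.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_conditions(features, router_logits):
    """Softmax-weighted sum of per-condition feature vectors (illustrative only).

    features: dict mapping condition name -> list of floats (all the same length)
    router_logits: dict mapping condition name -> score for that condition
    """
    names = list(features)
    weights = softmax([router_logits[n] for n in names])
    combined = [0.0] * len(features[names[0]])
    for w, n in zip(weights, names):
        for j, value in enumerate(features[n]):
            combined[j] += w * value
    return combined


feats = {"depth": [1.0, 0.0], "canny": [0.0, 1.0]}
print(combine_conditions(feats, {"depth": 0.0, "canny": 0.0}))  # [0.5, 0.5]
```

With equal scores every condition contributes equally; a higher score for one condition shifts the combined signal toward it.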

Video Generation with Sparse Control



Here we provide a sample inference script that uses user scribbles as the condition, with sparse control applied to 4 out of 16 frames.

| Control Conditions | Checkpoint | Inference (w/ extracted condition) |
|---|---|---|
| Scribbles | HF link | command |
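For the sparse setting (4 of 16 frames conditioned), one simple way to choose which frames receive the scribble condition is to space them evenly, including the first and last frames. This is an assumption for illustration: the sample script may select frames differently, and `sparse_frame_indices` and `sparse_mask` are hypothetical helpers.

```python
def sparse_frame_indices(num_frames, num_conditioned):
    """Evenly spaced frame indices, always including the first and last frame."""
    if num_conditioned < 1 or num_conditioned > num_frames:
        raise ValueError("num_conditioned must be between 1 and num_frames")
    if num_conditioned == 1:
        return [0]
    step = (num_frames - 1) / (num_conditioned - 1)
    return [round(k * step) for k in range(num_conditioned)]


def sparse_mask(num_frames, indices):
    """Per-frame mask: True where a control condition is provided."""
    chosen = set(indices)
    return [i in chosen for i in range(num_frames)]


idx = sparse_frame_indices(16, 4)
print(idx)  # [0, 5, 10, 15]
print(sum(sparse_mask(16, idx)))  # 4
```

Frames whose mask entry is False receive no condition, so the model must infer their content from the conditioned frames around them.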

🚅 How To Train

🎉 To make our method reproducible and adaptable to new backbones, we have released all of our training code :)

You can find detailed training guidelines for Ctrl-Adapter here!

๐Ÿ“ TODO List

  • Release environment setup, inference code, and model checkpoints.
  • Release training code.
  • Training guideline to adapt our Ctrl-Adapter to new image/video diffusion models.
  • Ctrl-Adapter + DiT-based image/video generation backbones. (WIP)
  • Release evaluation code.

💗 Please let us know in the issues or PRs if you're interested in any relevant backbones or downstream tasks that could be implemented with our Ctrl-Adapter framework! Collaborations and contributions are welcome!

📚 BibTeX

🌟 If you find our project useful in your research or application development, citing our paper would be the best support for us!

@misc{lin2024ctrladapter,
      title={Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model}, 
      author={Han Lin and Jaemin Cho and Abhay Zala and Mohit Bansal},
      year={2024},
      eprint={2404.09967},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

๐Ÿ™ Acknowledgements

The development of Ctrl-Adapter has been greatly inspired by the following amazing works and teams:

We hope that releasing this model/codebase helps the community to continue pushing these creative tools forward in an open and responsible way.
