BoxDiff 🎨 (ICCV 2023)

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Jinheng Xie1  Yuexiang Li2  Yawen Huang2  Haozhe Liu2,3  Wentian Zhang2 Yefeng Zheng2  Mike Zheng Shou1

1 National University of Singapore  2 Tencent Jarvis Lab  3 KAUST

Setup

Note that the code has only been tested with PyTorch==1.12.0. You can build the environment via pip as follows:

pip3 install -r requirements.txt

To apply BoxDiff to the GLIGEN pipeline, please install diffusers as follows:

git clone git@github.com:gligen/diffusers.git
cd diffusers
pip3 install -e .

Usage

To add spatial control to the Stable Diffusion model, use run_sd_boxdiff.py. For example:

CUDA_VISIBLE_DEVICES=0 python3 run_sd_boxdiff.py --prompt "as the aurora lights up the sky, a herd of reindeer leisurely wanders on the grassy meadow, admiring the breathtaking view, a serene lake quietly reflects the magnificent display, and in the distance, a snow-capped mountain stands majestically, fantasy, 8k, highly detailed" --P 0.2 --L 1 --seeds [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,29,30] --token_indices [3,12,21,30,46] --bbox [[1,3,512,202],[75,344,421,495],[1,327,508,507],[2,217,507,341],[1,135,509,242]] --refine False

or another example:

CUDA_VISIBLE_DEVICES=0 python3 run_sd_boxdiff.py --prompt "A rabbit wearing sunglasses looks very proud"  --P 0.2 --L 1 --seeds [1,2,3,4,5,6,7,8,9] --token_indices [2,4] --bbox [[67,87,366,512],[66,130,364,262]]

Note that the token indices are the indices of the words you want to control in the text prompt, and each token index has one corresponding conditioning box. P and L are hyper-parameters of the proposed constraints.

When --bbox is not specified, an interface is provided for drawing bounding boxes as conditions:

CUDA_VISIBLE_DEVICES=0 python3 run_sd_boxdiff.py --prompt "A rabbit wearing sunglasses looks very proud"  --P 0.2 --L 1 --seeds [1,2,3,4,5,6,7,8,9] --token_indices [2,4]
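When launching many runs from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the repository: `build_sd_boxdiff_args` is a hypothetical helper that uses only the flags shown in the commands above, writes lists without spaces as in those commands, and omits `--bbox` when no boxes are given. The four-numbers-per-box check is inferred from the examples rather than from the script itself.

```python
import json


def build_sd_boxdiff_args(prompt, token_indices, seeds, bboxes=None, P=0.2, L=1):
    """Build an argument list for run_sd_boxdiff.py (hypothetical helper)."""

    def compact(value):
        # Serialize lists without spaces, e.g. [1,2,3], as in the README commands.
        return json.dumps(value, separators=(",", ":"))

    args = [
        "python3", "run_sd_boxdiff.py",
        "--prompt", prompt,
        "--P", str(P),
        "--L", str(L),
        "--seeds", compact(seeds),
        "--token_indices", compact(token_indices),
    ]
    if bboxes is not None:
        # Each token index needs exactly one conditioning box.
        if len(bboxes) != len(token_indices):
            raise ValueError("expected one box per token index")
        if any(len(box) != 4 for box in bboxes):
            raise ValueError("each box must contain four numbers")
        args += ["--bbox", compact(bboxes)]
    return args


args = build_sd_boxdiff_args(
    "A rabbit wearing sunglasses looks very proud",
    token_indices=[2, 4],
    seeds=[1, 2, 3],
    bboxes=[[67, 87, 366, 512], [66, 130, 364, 262]],
)
print(" ".join(args[2:]))
```

The returned list can be passed to `subprocess.run(args, check=True)`, which avoids shell quoting issues with the prompt text.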

To add spatial control to the GLIGEN model, use run_gligen_boxdiff.py. For example:

CUDA_VISIBLE_DEVICES=0 python3 run_gligen_boxdiff.py --prompt "A rabbit wearing sunglasses looks very proud" --gligen_phrases ["a rabbit","sunglasses"] --P 0.2 --L 1 --seeds [1,2,3,4,5,6,7,8,9] --token_indices [2,4] --bbox [[67,87,366,512],[66,130,364,262]] --refine False

The directory structure of the synthetic results is as follows:

outputs/
|-- text prompt/
|   |-- 0.png 
|   |-- 0_canvas.png 
|   |-- 1.png
|   |-- 1_canvas.png 
|   |-- ...
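To browse the results in a script, one possible approach is to pair each `N.png` with its companion `N_canvas.png`. This sketch assumes the layout shown above; `list_result_pairs` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def list_result_pairs(prompt_dir):
    """Return (image, canvas) path pairs from one prompt folder, sorted by index."""
    prompt_dir = Path(prompt_dir)
    suffix = "_canvas.png"
    pairs = []
    for canvas in prompt_dir.glob("*" + suffix):
        stem = canvas.name[: -len(suffix)]
        image = prompt_dir / f"{stem}.png"
        # Keep only numbered files that have both the image and the canvas.
        if stem.isdigit() and image.exists():
            pairs.append((int(stem), image, canvas))
    return [(image, canvas) for _, image, canvas in sorted(pairs)]


if __name__ == "__main__":
    for image, canvas in list_result_pairs(Path("outputs") / "text prompt"):
        print(image.name, canvas.name)
```

Sorting by the integer index keeps `10.png` after `9.png`, which a plain string sort would not.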

Customize Your Layout

VisorGPT can customize layouts as spatial conditions for image synthesis using BoxDiff.

Citation

@article{xie2023boxdiff,
  title={BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion},
  author={Xie, Jinheng and Li, Yuexiang and Huang, Yawen and Liu, Haozhe and Zhang, Wentian and Zheng, Yefeng and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2307.10816},
  year={2023}
}

Acknowledgment - the code is heavily based on the repositories of diffusers, google, and yuval-alaluf.
