Prompt: "A cloudy blue sky AND A mountain in the horizon AND Cherry Blossoms in front of the mountain." (Samples generated by Stable-Diffusion using our compositional generation operator.)
This is the official codebase for Compositional Visual Generation with Composable Diffusion Models.
Compositional Visual Generation with Composable Diffusion Models
Nan Liu 1*,
Shuang Li 2*,
Yilun Du 2*,
Antonio Torralba 2,
Joshua B. Tenenbaum 2
* Equal Contribution
1UIUC, 2MIT CSAIL
ECCV 2022
- You can now compose Stable-Diffusion models with this codebase to sample 512x512 images.
- The codebase is built upon GLIDE and Improved-Diffusion.
- This codebase provides both training and inference code.
- The codebase can be used to train a text-conditioned diffusion model in a similar manner to GLIDE.
Run the following to create a conda environment and activate it:
conda create -n compose_diff python=3.8
conda activate compose_diff
To install this package, clone this repository and then run:
pip install -e .
The demo notebook shows how to compose natural language descriptions and CLEVR objects for image generation.
Compose natural language descriptions using Stable-Diffusion:
python scripts/image_sample_compose_stable_diffusion.py --prompt "a camel | a forest" --scale 10 --steps 50
Compose natural language descriptions using pretrained GLIDE:
python scripts/image_sample_compose_glide.py --prompt "a camel | a forest" --scale 10 --steps 100
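The two commands above combine several text concepts into one image. As a rough illustration of the idea (not the repository's actual implementation), the conjunction (AND) operator can be sketched as adding each concept's guidance direction to the unconditional noise prediction. The function names `parse_prompt` and `compose_noise` are hypothetical, and the sketch assumes that `|` separates concepts and that `--scale` acts as a single weight shared by all concepts.

```python
from typing import List, Sequence

import numpy as np


def parse_prompt(prompt: str) -> List[str]:
    """Split a composed prompt such as "a camel | a forest" into its concepts."""
    return [part.strip() for part in prompt.split("|") if part.strip()]


def compose_noise(eps_uncond: np.ndarray,
                  eps_conds: Sequence[np.ndarray],
                  scale: float) -> np.ndarray:
    """Conjunction of concepts (assumed form):

    eps = eps_uncond + scale * sum_i (eps_cond_i - eps_uncond)
    """
    stacked = np.stack(list(eps_conds), axis=0)
    return eps_uncond + scale * (stacked - eps_uncond).sum(axis=0)


# Toy example: constant arrays stand in for real model noise predictions.
concepts = parse_prompt("a camel | a forest")
eps_uncond = np.zeros((2, 2))
eps_conds = [np.ones((2, 2)), 2.0 * np.ones((2, 2))]
composed = compose_noise(eps_uncond, eps_conds, scale=10.0)
```

In a real sampler, `eps_uncond` and each entry of `eps_conds` would come from the diffusion model evaluated at the current timestep, once with an empty condition and once per concept.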
Compose objects:
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma False --use_scale_shift_norm False --num_classes 2 --dataset clevr_pos --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
python scripts/image_sample_compose_clevr_pos.py $MODEL_FLAGS $DIFFUSION_FLAGS --ckpt_path $YOUR_CHECKPOINT_PATH
Compose object relational descriptions:
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 4,3,9,3,3,7 --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
python scripts/image_sample_compose_clevr_rel.py $MODEL_FLAGS $DIFFUSION_FLAGS --ckpt_path $YOUR_CHECKPOINT_PATH
- Training follows the same procedure as Improved-Diffusion.
To train a model on CLEVR Objects, first set the hyperparameters as follows:
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 2 --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"
Then run the training script:
python scripts/image_train.py --data_dir ./dataset/ --dataset clevr_pos $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
Similarly, use the following commands to train a model on CLEVR Relations:
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False --num_classes 4,3,9,3,3,7 --raw_unet True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"
python scripts/image_train.py --data_dir ./dataset/ --dataset clevr_rel $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
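The `DIFFUSION_FLAGS` in the commands above select the `squaredcos_cap_v2` noise schedule with 1000 diffusion steps. For readers unfamiliar with it, the following is a minimal sketch of the cosine schedule as it is commonly defined in Improved-Diffusion style code; the function name `cosine_betas` is hypothetical and the sketch is not copied from this repository.

```python
import math

import numpy as np


def cosine_betas(num_steps: int, max_beta: float = 0.999) -> np.ndarray:
    """Per-step betas derived from a squared-cosine cumulative alpha schedule.

    alpha_bar(t) = cos((t + 0.008) / 1.008 * pi / 2) ** 2 for t in [0, 1];
    each beta is 1 - alpha_bar(t2) / alpha_bar(t1), capped at max_beta.
    """
    def alpha_bar(t: float) -> float:
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)


# Matches --diffusion_steps 1000 in the flags above.
betas = cosine_betas(1000)
```

The cap keeps the final betas from reaching 1, where `alpha_bar` approaches zero at the end of the schedule.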
We also provide code for training a text-conditioned GLIDE model on the MS-COCO dataset.
First, specify the image root directory path and the corresponding JSON caption file
in the image_dataset file.
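As an illustration of what the image root and caption file provide, here is a minimal sketch that pairs image paths with their captions. It assumes the standard MS-COCO captions JSON layout (`images` entries with `id` and `file_name`, `annotations` entries with `image_id` and `caption`); the function name `load_coco_captions` is hypothetical, and the repository's own loader may be organized differently.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_coco_captions(image_root: str, caption_json: str) -> Dict[str, List[str]]:
    """Map each image path under image_root to its list of captions."""
    with open(caption_json, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Look up file names by image id (standard COCO layout, assumed here).
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}

    captions = defaultdict(list)
    for ann in data["annotations"]:
        image_path = str(Path(image_root) / id_to_file[ann["image_id"]])
        captions[image_path].append(ann["caption"])
    return dict(captions)
```

Calling `load_coco_captions(image_root, caption_json)` with your own paths returns a dictionary that can be flattened into (image, caption) training pairs.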
Then use the following example commands to train a model on MS-COCO captions:
MODEL_FLAGS="--image_size 128 --num_channels 192 --num_res_blocks 2 --learn_sigma True --use_scale_shift_norm False"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule squaredcos_cap_v2 --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 1e-5 --batch_size 16 --use_kl False --schedule_sampler loss-second-moment --microbatch -1"
python scripts/image_train.py --data_dir ./dataset/ --dataset coco $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
Training datasets for both CLEVR Objects and CLEVR Relations will be downloaded automatically when running the script above.
If you need to download them manually, the datasets used for training our models can be found at:
Dataset | Link |
---|---|
CLEVR Objects | https://www.dropbox.com/s/5zj9ci24ofo949l/clevr_pos_data_128_30000.npz?dl=0 |
CLEVR Relations | https://www.dropbox.com/s/urd3zgimz72aofo/clevr_training_data_128.npz?dl=0 |
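Both datasets are distributed as NumPy `.npz` archives. Since the array names inside them are not documented here, a quick way to check a manual download is to list what each archive contains. The helper name `summarize_npz` is hypothetical, and the sketch makes no assumption about the stored keys.

```python
from typing import Dict, Tuple

import numpy as np


def summarize_npz(path: str) -> Dict[str, Tuple[tuple, str]]:
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]  # arrays are loaded one at a time, on access
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, `summarize_npz("clevr_pos_data_128_30000.npz")` would report the shape and dtype of each stored array, which helps confirm the file downloaded completely before starting training.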
If you find our code useful for your research, please consider citing:
@article{liu2022compositional,
title={Compositional Visual Generation with Composable Diffusion Models},
author={Liu, Nan and Li, Shuang and Du, Yilun and Torralba, Antonio and Tenenbaum, Joshua B},
journal={arXiv preprint arXiv:2206.01714},
year={2022}
}