

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan, Yap Peng Tan, Weipeng Hu

Project Page | Paper | WebUI | Demo | Video | Diffuser | Colab


Teaser figure

  • Existing methods lack the ability to control the interactions between objects in the generated content.
  • We propose InteractDiffusion, a pluggable interaction control model that extends existing pre-trained text-to-image (T2I) diffusion models so that they can be better conditioned on interactions.

News

  • [2024.3.13] Diffusers code is available here.
  • [2024.3.8] Demo is available on Hugging Face Spaces.
  • [2024.3.6] Code is released.
  • [2024.2.27] InteractDiffusion paper is accepted at CVPR 2024.
  • [2023.12.12] InteractDiffusion paper is released. An alpha version of the InteractDiffusion WebUI is available.

Results

Model | Interaction Controllability (Swin-Tiny) | Interaction Controllability (Swin-Large) | FID | KID
v1.0 | 29.53 | 31.56 | 18.69 | 0.00676
v1.1 | 30.20 | 31.96 | 17.90 | 0.00635
v1.2 | 30.73 | 33.10 | 17.32 | 0.00585

Interaction Controllability is measured with the FGAHOI detection score (higher is better); lower FID and KID indicate better image quality. The table reports the Full subset in the Default setting, using Swin-Tiny and Swin-Large backbones. More details on the protocol are in the paper.

Download InteractDiffusion models

We provide three checkpoints with different training strategies.

Version | Dataset | SD version | Download
v1.0 | HICO-DET | v1.4 | HF Hub
v1.1 | HICO-DET | v1.5 | HF Hub
v1.2 | HICO-DET + VisualGenome | v1.5 | HF Hub

Note that the experimental results in our paper refer to v1.0.

  • v1.0 is based on Stable Diffusion v1.4 and GLIGEN, trained with a batch size of 16 for 250k steps on HICO-DET. Our paper is based on this version.
  • v1.1 is based on Stable Diffusion v1.5 and GLIGEN, trained with a batch size of 32 for 250k steps on HICO-DET.
  • v1.2 is based on InteractDiffusion v1.1 and trained further with a batch size of 32 for 172.5k steps on HICO-DET and VisualGenome.
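As a rough comparison of the three training budgets, the number of training samples processed is batch size × steps. The sketch below assumes the quoted batch sizes are effective (global) batch sizes and counts v1.1's stage toward v1.2, since v1.2 continues training from v1.1. Any samples seen while training the GLIGEN starting weights are not counted. These are back-of-the-envelope numbers, not figures reported by the authors.

```python
# Back-of-the-envelope training budget per checkpoint.
# Assumption: the batch sizes above are effective (global) batch sizes.
SCHEDULES = {
    # version: list of (batch_size, steps) stages, including inherited ones
    "v1.0": [(16, 250_000)],
    "v1.1": [(32, 250_000)],
    # v1.2 continues from v1.1, so it inherits v1.1's stage
    "v1.2": [(32, 250_000), (32, 172_500)],
}


def samples_seen(stages):
    """Total number of training samples processed across all stages."""
    return sum(batch_size * steps for batch_size, steps in stages)


if __name__ == "__main__":
    for version, stages in SCHEDULES.items():
        print(f"{version}: {samples_seen(stages):,} samples")
    # v1.0: 4,000,000 / v1.1: 8,000,000 / v1.2: 13,520,000
```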

Extension for AUTOMATIC1111's Stable Diffusion WebUI

We developed an extension for AUTOMATIC1111's Stable Diffusion WebUI that allows InteractDiffusion to be used with existing SD models. Check out the plugin at sd-webui-interactdiffusion. Note that it is still in alpha.

Gallery

Some examples generated with InteractDiffusion combined with various DreamBooth and LoRA models.

[Gallery images, including samples made with the cuteyukimix, darksushimix, toonyou and rcnzcartoon models.]

Diffusers

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")

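images = pipeline(
    prompt="a person is feeding a cat",
    # one list entry per interaction: subject, object and action phrases
    interactdiffusion_subject_phrases=["person"],
    interactdiffusion_object_phrases=["cat"],
    interactdiffusion_action_phrases=["feeding"],
    # boxes are [x_min, y_min, x_max, y_max], normalized to [0, 1]
    interactdiffusion_subject_boxes=[[0.0332, 0.1660, 0.3359, 0.7305]],
    interactdiffusion_object_boxes=[[0.2891, 0.4766, 0.6680, 0.7930]],
    # GLIGEN-style scheduled sampling: fraction of denoising steps that use
    # the interaction conditioning (1 = all steps)
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images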

images[0].save('out.jpg')
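The pipeline call above takes parallel lists of phrases and boxes, one entry per interaction. A small helper can assemble these lists from pixel coordinates. The names Interaction, normalize_box and interaction_kwargs are hypothetical and are not part of the released code. The sketch assumes boxes are [x_min, y_min, x_max, y_max] divided by the image width and height, as in GLIGEN's Diffusers pipeline. On a 512x512 canvas, the pixel boxes below round to the exact values used in the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Interaction:
    """One <subject, action, object> triplet with pixel-space boxes (hypothetical helper)."""
    subject: str
    action: str
    obj: str
    subject_box: Box
    object_box: Box


def normalize_box(box: Box, width: int, height: int, ndigits: int = 4) -> List[float]:
    """Convert a pixel box to [x_min, y_min, x_max, y_max] scaled to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
        raise ValueError(f"invalid box {box} for a {width}x{height} image")
    scaled = (x_min / width, y_min / height, x_max / width, y_max / height)
    # Rounding only mirrors the 4-decimal precision of the example above.
    return [round(v, ndigits) for v in scaled]


def interaction_kwargs(interactions: Sequence[Interaction],
                       width: int = 512, height: int = 512) -> dict:
    """Build the interactdiffusion_* keyword arguments, one list entry per interaction."""
    return {
        "interactdiffusion_subject_phrases": [i.subject for i in interactions],
        "interactdiffusion_object_phrases": [i.obj for i in interactions],
        "interactdiffusion_action_phrases": [i.action for i in interactions],
        "interactdiffusion_subject_boxes": [normalize_box(i.subject_box, width, height) for i in interactions],
        "interactdiffusion_object_boxes": [normalize_box(i.object_box, width, height) for i in interactions],
    }


if __name__ == "__main__":
    kwargs = interaction_kwargs(
        [Interaction("person", "feeding", "cat", (17, 85, 172, 374), (148, 244, 342, 406))]
    )
    print(kwargs["interactdiffusion_subject_boxes"])  # [[0.0332, 0.166, 0.3359, 0.7305]]
    # With `pipeline` loaded as in the example above:
    # images = pipeline(prompt="a person is feeding a cat", **kwargs,
    #                   interactdiffusion_scheduled_sampling_beta=1,
    #                   output_type="pil", num_inference_steps=50).images
```

Passing several Interaction objects yields one list entry per interaction. This is how the example's single-element lists extend to multiple interactions.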

Reproduce & Evaluate

  1. Change ckpt.pth in inference_batch.py to the selected checkpoint.

  2. Run inference with InteractDiffusion to synthesize the HICO-DET test set from its ground-truth annotations.

    python inference_batch.py --batch_size 1 --folder generated_output --seed 489 --scheduled-sampling 1.0 --half
  3. Set up FGAHOI at ../FGAHOI. See the FGAHOI repository for how to set up FGAHOI and place the HICO-DET dataset in data/hico_20160224_det.

  4. Prepare the generated images for evaluation with FGAHOI. See id_prepare_inference.ipynb.

  5. Evaluate on FGAHOI.

    python main.py --backbone swin_tiny --dataset_file hico --resume weights/FGAHOI_Tiny.pth --num_verb_classes 117 --num_obj_classes 80 --merge --hierarchical_merge --task_merge --eval --hoi_path data/id_generated_output --pretrain_model_path "" --output_dir logs/id-generated-output-t
  6. Evaluate FID and KID. For a fair comparison, we recommend resizing the HICO-DET test images to 512x512 before performing image-quality evaluation. We use torch-fidelity.

    fidelity --gpu 0 --fid --isc --kid --input2 ~/data/hico_det_test_resize  --input1 ~/FGAHOI/data/data/id_generated_output/images/test2015
These steps give a brief overview of how the evaluation process works.
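Step 6 recommends resizing the HICO-DET test images to 512x512 before running torch-fidelity. Below is a minimal sketch using Pillow. The README does not say how the resize was done, so the plain bicubic resize without cropping is an assumption, and so is the source path. The destination folder matches the --input2 path in the fidelity command.

```python
from pathlib import Path

from PIL import Image

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def resize_folder(src_dir, dst_dir, size=(512, 512)):
    """Resize every image in src_dir to `size` and save it under dst_dir.

    Assumption: a plain (non-aspect-preserving) bicubic resize; the authors
    may have cropped or padded first. Returns the number of images written.
    """
    src, dst = Path(src_dir).expanduser(), Path(dst_dir).expanduser()
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip annotations and other non-image files
        with Image.open(path) as img:
            img.convert("RGB").resize(size, Image.BICUBIC).save(dst / path.name)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical source path; the destination matches --input2 in step 6.
    n = resize_folder("~/data/hico_20160224_det/images/test2015",
                      "~/data/hico_det_test_resize")
    print(f"resized {n} images")
```

A plain resize distorts non-square images. If you crop instead, apply the same preprocessing to any baseline you compare against.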

Training

  1. Prepare the necessary dataset and pretrained models; see DATA.

  2. Run the following command:

    CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 main.py --yaml_file configs/hoi_hico_text.yaml --ckpt <existing_gligen_checkpoint> --name test --batch_size=4 --gradient_accumulation_step 2 --total_iters 500000 --amp true --disable_inference_in_training true --official_ckpt_name <existing SD v1.4/v1.5 checkpoint>
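With data-parallel training, the number of samples behind each optimizer update is the per-GPU batch size × the number of processes × the gradient accumulation steps. The sketch below assumes --batch_size is per GPU (per torchrun process). Under that assumption, the command above gives an effective batch size of 16, which matches the value quoted for v1.0. effective_batch_size is a hypothetical helper, not part of the codebase.

```python
def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps):
    """Samples contributing to each optimizer update under data-parallel training."""
    return per_gpu_batch * num_gpus * grad_accum_steps


# Values from the command above: --batch_size=4, --nproc_per_node=2,
# --gradient_accumulation_step 2 (assuming --batch_size is per GPU).
print(effective_batch_size(4, 2, 2))  # 16
```

Under the same assumption, doubling either --nproc_per_node or --gradient_accumulation_step would give 32, the batch size quoted for v1.1 and v1.2.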

TODO

  • Code Release
  • HuggingFace demo
  • WebUI extension
  • Diffusers

Citation

@inproceedings{hoe2023interactdiffusion,
      title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models}, 
      author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
      year={2024},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}

Acknowledgement

This work builds on the GLIGEN and LDM codebases.
