MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion

Stable-Diffusion SDXL GPT4 GPT4V Gemini-Pro LLaVA Generation

Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou

ARC-AIGC Research Collaboration

HKUST, UCLA, PSU, UMD

Paper, Project website, Code

[Figures: Main Framework; Main Visualization]

TODO

  • MuLan with SD v1.4
  • MuLan with SDXL

More visualization results

More results

Progressive multi-object diffusion

Installation

git clone https://github.com/measure-infinity/mulan-code
cd mulan-code
conda create -n mulan python=3.10 -y
conda activate mulan
pip install -r ./requirements.txt
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip

Configuring LLaVA (default VLM in the code)

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .

Demo

Please set your own GPT-4 API key in query.py. The key is used for planning during the generation process. We recommend GPT-4 for planning; it is the default model in the code.
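
One way to avoid hard-coding the key is to read it from an environment variable. The sketch below is an assumption, not the repository's code: the variable name OPENAI_API_KEY and the helper load_api_key are hypothetical, and the returned value would still need to be assigned wherever query.py expects the key.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: query.py in the repository may store the key differently.
    """
    # Strip whitespace so a stray newline from a shell export does not break requests.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your GPT-4 API key.")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of generation.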

SD-v1.4

Please download the weights of Stable Diffusion v1.4 here and put them into the folder sd-models.
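
Before running the demo, it may help to confirm that the weights are where the code will look for them. This is a minimal sketch: the file name sd-v1-4-full-ema.ckpt is taken from the demo call in this README, the helper find_checkpoint is hypothetical, and the default relative folder assumes the working directory is scripts.

```python
from pathlib import Path


def find_checkpoint(folder: str = "../sd-models", name: str = "sd-v1-4-full-ema.ckpt") -> Path:
    """Return the checkpoint path, or raise if the file is missing.

    Hypothetical helper; the defaults mirror the path used in the demo call.
    """
    path = Path(folder) / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected Stable Diffusion weights at {path}")
    return path
```

The returned path can be converted with str() and passed as the sd_model argument.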

To generate an image with a complex prompt, first cd scripts, and then run

from pipeline_sd import mulan_sd

mulan_sd(prompt="a black headphone is on the left of a green phone", seed=42, sd_model="../sd-models/sd-v1-4-full-ema.ckpt")

seed: Random seed, prompt: User prompt, sd_model: Path to the Stable Diffusion checkpoint

The results will be saved in outputs by default. The backward guidance has two hyper-parameters that you can adjust to see how the results change: weight (110 by default) and thresh (0.15 by default).
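
To compare several settings of the two hyper-parameters, a small grid sweep can be sketched as follows. Everything here is an assumption to be checked against pipeline_sd.py: the helper sweep_guidance is hypothetical, the grid values other than the defaults (110 and 0.15) are purely illustrative, and it is not confirmed that mulan_sd accepts weight and thresh as keyword arguments.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def sweep_guidance(
    generate: Callable[..., object],
    prompt: str,
    weights: Iterable[float] = (90.0, 110.0, 130.0),
    threshes: Iterable[float] = (0.10, 0.15, 0.20),
    seed: int = 42,
) -> List[Tuple[float, float]]:
    """Call generate once per (weight, thresh) pair and return the pairs tried."""
    tried = []
    for weight, thresh in product(weights, threshes):
        # Assumption: generate accepts weight and thresh as keyword arguments.
        generate(prompt=prompt, seed=seed, weight=weight, thresh=thresh)
        tried.append((weight, thresh))
    return tried
```

In practice, generate could be a thin wrapper around mulan_sd that also supplies sd_model. Keeping the seed fixed across the grid makes differences between images attributable to the guidance settings rather than to sampling noise.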

SDXL

Please download the weights of SDXL here and put them into the folder sd-models. We currently use the DDIM sampler for generation instead of the original one. Please replace the corresponding config files in the downloaded models with the files in sdxl_configs.

If the library diffusers is already installed in the current environment, please uninstall it. The code contains its own modified copy of diffusers.
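
A quick way to see whether a pip-installed diffusers is present is to query the package metadata. This is a minimal sketch using the standard library; the helper installed_version is hypothetical, and it only reports distributions registered with the environment's package metadata, which may not cover every way the library could end up on the import path.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "diffusers") -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("No pip-installed diffusers found.")
    else:
        print(f"diffusers {found} is installed; run `pip uninstall diffusers` first.")
```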

To generate an image with a complex prompt, first cd scripts, and then run

from pipeline_sdxl import mulan_sdxl

mulan_sdxl(prompt="a black headphone is on the left of a green phone", seed=42)

seed: Random seed, prompt: User prompt

The results will be saved in sdxl_outputs by default.

Bibtex

@misc{li2024mulan,
    title={MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion},
    author={Li, Sen and Wang, Ruochen and Hsieh, Cho-jui and Cheng, Minhao and Zhou, Tianyi},
    publisher={arXiv:2402.12741},
    year={2024},
}

Acknowledgements

  1. Stable Diffusion
  2. Backward Guidance
  3. LLaVA
