🚀 Dimba: Transformer-Mamba Diffusion Models


This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper Dimba: Transformer-Mamba Diffusion Models. You can find more visualizations on our project page.

TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba blocks, capitalizing on the advantages of both architectural paradigms.


Some generated samples.


1. Environments

  • Python 3.10

    • conda create -n your_env_name python=3.10
  • Requirements file

    • pip install -r requirements.txt
  • Install causal_conv1d and mamba (a quick import check is sketched after this list)

    • pip install -e causal_conv1d
    • pip install -e mamba
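
After installation, a quick import check can confirm that the packages built correctly. This is only a sketch; it assumes the bundled packages keep their upstream module names (causal_conv1d and mamba_ssm):

# Sanity-check sketch: assumes the bundled packages expose their upstream
# module names (causal_conv1d, mamba_ssm); adjust if the local copies differ.
import torch
import causal_conv1d
import mamba_ssm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("causal_conv1d and mamba_ssm imported successfully")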

2. Download Models

Models reported in the paper can be downloaded directly as follows (uploads in progress):

Model          #Params   URL
T5             4.3B      huggingface
VAE            80M       huggingface
Dimba-L-512    0.9B      huggingface
Dimba-L-1024   0.9B      -
Dimba-L-2048   0.9B      -
Dimba-G-512    1.8B      -
Dimba-G-1024   1.8B      -

The dataset used for quality tuning (to enhance aesthetic performance) can be downloaded as follows:

Dataset          Size   URL
Quality tuning   600k   huggingface
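
If you prefer to fetch the checkpoints programmatically, the snapshot_download helper from huggingface_hub is one option. The repository IDs below are placeholders, not the actual ones; substitute the links from the tables above.

# Sketch using huggingface_hub; the repo IDs are placeholders -- replace them
# with the actual repositories linked in the tables above.
from huggingface_hub import snapshot_download

t5_dir    = snapshot_download(repo_id="<t5-repo-id>")           # text encoder, 4.3B
vae_dir   = snapshot_download(repo_id="<vae-repo-id>")          # VAE, 80M
dimba_dir = snapshot_download(repo_id="<dimba-l-512-repo-id>")  # Dimba-L-512, 0.9B

print(t5_dir, vae_dir, dimba_dir)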

3. Inference

We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run the script as:

python scripts/inference.py \
    --image_size 512 \
    --model_version dimba-l \
    --model_path /path/to/model \
    --txt_file asset/examples.txt \
    --save_path /path/to/save/results
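
The --txt_file argument points at a plain-text prompt file. Assuming the format of asset/examples.txt is one prompt per line, a custom prompt file can be prepared like this (the prompts and file path below are just examples):

# Assumes one prompt per line, following asset/examples.txt; the file name is arbitrary.
prompts = [
    "a photo of an astronaut riding a horse on the moon",
    "an oil painting of a fox in a snowy forest",
]
with open("asset/my_prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")

Then pass --txt_file asset/my_prompts.txt to scripts/inference.py.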

4. Training

We provide a training script for Dimba in scripts/train.py, which can be used for fine-tuning with different settings. You can run it as:

python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs
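
The launcher above spawns nnodes x nproc_per_node = 4 x 8 = 32 processes and passes rank information to each of them through environment variables. The snippet below is not the repo's training code, just a generic sketch of the DDP initialization such a launch implies:

# Generic DDP initialization sketch (not scripts/train.py itself).
# torch.distributed.launch / torchrun set RANK, WORLD_SIZE and (on recent
# PyTorch) LOCAL_RANK for every spawned process.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} on GPU {local_rank}")

On recent PyTorch versions, torchrun can be used in place of python -m torch.distributed.launch with the same arguments.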

5. BibTeX

@misc{fei2024dimba,
    title={Dimba: Transformer-Mamba Diffusion Models}, 
    author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},
    year={2024},
    eprint={2406.01159},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

6. Acknowledgments

The codebase is based on the awesome PixArt, Vim, and DiS repos.

The writing of the Dimba paper was polished with the help of ChatGPT.
