🚀 Dimba: Transformer-Mamba Diffusion Models


This repo contains PyTorch model definitions, pre-trained weights, and inference/sampling code for our paper Dimba: Transformer-Mamba Diffusion Models. You can find more visualizations on our project page.

TL;DR: Dimba is a new text-to-image diffusion model that employs a hybrid architecture combining Transformer and Mamba blocks, capitalizing on the advantages of both architectural paradigms.


Some generated samples.


1. Environments

  • Python 3.10

    • conda create -n your_env_name python=3.10
  • Requirements file

    • pip install -r requirements.txt
  • Install causal_conv1d and mamba (a quick import check is sketched after this list)

    • pip install -e causal_conv1d
    • pip install -e mamba
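
After installation, a quick import check can confirm that the packages built correctly. This is only a sketch; it assumes the bundled packages keep their upstream module names (causal_conv1d and mamba_ssm):

# Sanity-check sketch: assumes the bundled packages expose their upstream
# module names (causal_conv1d, mamba_ssm); adjust if the local copies differ.
import torch
import causal_conv1d
import mamba_ssm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("causal_conv1d and mamba_ssm imported successfully")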

2. Download Models

Models reported in the paper can be downloaded directly as follows (uploads in progress):

Model          #Params   URL
T5             4.3B      huggingface
VAE            80M       huggingface
Dimba-L-512    0.9B      huggingface
Dimba-L-1024   0.9B      -
Dimba-L-2048   0.9B      -
Dimba-G-512    1.8B      -
Dimba-G-1024   1.8B      -

The dataset used for quality tuning (to enhance aesthetic performance) can be downloaded as follows:

Dataset          Size   URL
Quality tuning   600k   huggingface
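
If you prefer to fetch the checkpoints programmatically, the snapshot_download helper from huggingface_hub is one option. The repository IDs below are placeholders, not the actual ones; substitute the links from the tables above.

# Sketch using huggingface_hub; the repo IDs are placeholders -- replace them
# with the actual repositories linked in the tables above.
from huggingface_hub import snapshot_download

t5_dir    = snapshot_download(repo_id="<t5-repo-id>")           # text encoder, 4.3B
vae_dir   = snapshot_download(repo_id="<vae-repo-id>")          # VAE, 80M
dimba_dir = snapshot_download(repo_id="<dimba-l-512-repo-id>")  # Dimba-L-512, 0.9B

print(t5_dir, vae_dir, dimba_dir)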

3. Inference

We include an inference script that samples images from a Dimba model according to textual prompts. It supports the DDIM and DPM-Solver sampling algorithms. You can run the script as:

python scripts/inference.py \
    --image_size 512 \
    --model_version dimba-l \
    --model_path /path/to/model \
    --txt_file asset/examples.txt \
    --save_path /path/to/save/results
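
The --txt_file argument points at a plain-text prompt file. Assuming the format of asset/examples.txt is one prompt per line, a custom prompt file can be prepared like this (the prompts and file path below are just examples):

# Assumes one prompt per line, following asset/examples.txt; the file name is arbitrary.
prompts = [
    "a photo of an astronaut riding a horse on the moon",
    "an oil painting of a fox in a snowy forest",
]
with open("asset/my_prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")

Then pass --txt_file asset/my_prompts.txt to scripts/inference.py.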

4. Training

We provide a training script for Dimba in scripts/train.py, which can be used for fine-tuning with different settings. You can run it as:

python -m torch.distributed.launch --nnodes=4 --nproc_per_node=8 \
    --master_port=1234 scripts/train.py \
    configs/dimba_xl2_img512.py \
    --work-dir outputs
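
The launcher above spawns nnodes x nproc_per_node = 4 x 8 = 32 processes and passes rank information to each of them through environment variables. The snippet below is not the repo's training code, just a generic sketch of the DDP initialization such a launch implies:

# Generic DDP initialization sketch (not scripts/train.py itself).
# torch.distributed.launch / torchrun set RANK, WORLD_SIZE and (on recent
# PyTorch) LOCAL_RANK for every spawned process.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} on GPU {local_rank}")

On recent PyTorch versions, torchrun can be used in place of python -m torch.distributed.launch with the same arguments.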

5. BibTeX

@misc{fei2024dimba,
    title={Dimba: Transformer-Mamba Diffusion Models}, 
    author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Debang Li and Youqiang Zhang and Junshi Huang},
    year={2024},
    eprint={2406.01159},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

6. Acknowledgments

The codebase is based on the awesome PixArt, Vim, and DiS repos.

The writing of the Dimba paper was polished with the help of ChatGPT.
