Unofficial implementation of 2D ProlificDreamer

This is a third-party implementation of the 2D demos in the paper: ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.

This code does NOT serve as a faithful re-implementation, and please feel free to raise issues for potential bugs.

Feel free to try the colab notebook https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/prolific_dreamer_2D.ipynb

Progress of generation with VSD sampling (1 particle, CFG=7.5)

Final results of 16 particles (VSD)

Comparison between SDS, VSD and Text2Image with different CFG

TODO List

SGD optimizer does not work for sds
make mlp particle work
make DeepFloyd IF (SDS & VSD) work

Commands to Reproduce Results

VSD command line

python prolific_dreamer2d.py \
        --num_steps 500 --log_steps 50 \
        --lr 0.03 --phi_lr 0.0001 --use_t_phi true \
        --model_path 'runwayml/stable-diffusion-v1-5' \
        --loss_weight '1m_alphas_cumprod' \
        --t_schedule random --generation_mode 'vsd' \
        --phi_model 'lora' --lora_scale 1. \
        --prompt "a photograph of an astronaut riding a horse" \
        --height 512 --width 512 --batch_size 1 --guidance_scale 7.5 \
        --log_progress true --save_x0 true --save_phi_model true \

VSD command line multiple particles

python prolific_dreamer2d.py \
        --num_steps 1500 --log_steps 50 \
        --seed 1024 --lr 0.03 --phi_lr 0.0001 --use_t_phi true \
        --model_path 'stabilityai/stable-diffusion-2-1-base' \
        --loss_weight_type '1m_alphas_cumprod' --t_schedule 'random' \
        --generation_mode 'vsd' \
        --phi_model 'lora' --lora_scale 1. --lora_vprediction false \
        --prompt "a photograph of an astronaut riding a horse" \
        --height 512 --width 512 --batch_size 16 --guidance_scale 7.5 \
        --particle_num_vsd 2 --particle_num_phi 2 \
        --log_progress false --save_x0 false --save_phi_model true

SDS command line

python prolific_dreamer2d.py \
        --num_steps 500 --log_steps 50 \
        --lr 0.03 --phi_lr 0.0001 --use_t_phi true \
        --model_path 'runwayml/stable-diffusion-v1-5' \
        --loss_weight '1m_alphas_cumprod' \
        --t_schedule random --generation_mode 'sds' \
        --prompt "a photograph of an astronaut riding a horse" \
        --height 512 --width 512 --batch_size 1 --guidance_scale 7.5 \
        --log_progress true --save_x0 true \

Short Intro to Important Parameters

generation_mode: 'sds' or 'vsd' or 't2i' (just text-to-image sampling)

guidance_scale: CFG scale

phi_model: a lora model or simple unet model to track the (particles) distribution

num_steps: overall sampling steps

use_t_phi: use different t to train phi model

loss_weight: which weight to use for SDS/VSD loss, see https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L94

t_schedule: generate a sequence of timesteps, see https://github.com/yuanzhi-zhu/prolific_dreamer2d/blob/main/model_utils.py#L17; by default we use 'random', to use the 2 stage time schedule $U[0.02,0.98] \rightarrow U[0.5,0.98]$ as in the paper, we can use 't_stages2'

lora_vprediction: use v-prediction for lora model training

batch_size: batch_size or total particle numbers

particle_num_vsd: batch size (particle numbers) for VSD training

particle_num_phi: number of particles to train phi model

rgb_as_latents: initialize particles in latent space

use_mlp_particle: use siren mlp as the 2d representation of image

half_inference: half-precision inference, requires under 6 GB GPU memory, is faster but has worse performance on vsd

deepshwang / prolific_dreamer2d Goto Github PK

prolific_dreamer2d's Introduction

Unofficial implementation of 2D ProlificDreamer

TODO List

Commands to Reproduce Results

VSD command line

VSD command line multiple particles

SDS command line

Short Intro to Important Parameters

prolific_dreamer2d's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent