AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.

Home Page: https://align-prop.github.io/

License: MIT License


Aligning Text-to-Image Diffusion Models with Reward Backpropagation

AlignProp

arXiv | Website

This is the official implementation of our paper Aligning Text-to-Image Diffusion Models with Reward Backpropagation by Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, and Katerina Fragkiadaki.

Abstract

Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to the weakly supervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, which is notorious for the high variance of its gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While a naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility, and controllability of the number of objects present, as well as their combinations. We show that AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest.
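The mechanics are easy to see in miniature. Below is a minimal, self-contained PyTorch sketch of the idea, reward backpropagation through a full denoising chain with gradient checkpointing; the tiny denoiser, toy reward, and simplified update rule are illustrative stand-ins for the LoRA-adapted UNet, the real reward models, and the actual sampler used in this repository:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyDenoiser(nn.Module):
    # Stand-in for the (LoRA-adapted) UNet noise predictor.
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        t_emb = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_emb], dim=-1))

def toy_reward(x):
    # Stand-in for a differentiable reward (e.g. an aesthetic scorer);
    # here it simply rewards samples close to the origin.
    return -(x ** 2).mean(dim=-1)

denoiser = TinyDenoiser()
optimizer = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(8, 64)                 # start from pure noise
    for t in reversed(range(20)):          # keep the whole chain on the graph
        # Gradient checkpointing: activations are recomputed in backward,
        # trading compute for memory across the denoising steps.
        eps = checkpoint(denoiser, x, t, use_reentrant=False)
        x = x - 0.05 * eps                 # simplified update (not a real DDIM step)
    loss = -toy_reward(x).mean()           # maximize reward end-to-end
    optimizer.zero_grad()
    loss.backward()                        # reward gradient flows through every step
    optimizer.step()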

Code

Coming Soon

  • Model Checkpoint
  • Other Reward functions
  • Shift to Stable Diffusion 2.1

Installation

Create a conda environment and install the dependencies with the following commands:

conda create -n alignprop python=3.10
conda activate alignprop
pip install -r requirements.txt

Please use accelerate==0.17.0; the versions of the other library dependencies might be flexible.
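For example, to pin the tested version explicitly:

pip install accelerate==0.17.0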

Training Code

Accelerate automatically handles the multi-GPU setting. The code also works on a single GPU, since we adjust gradient accumulation to the GPUs listed in the CUDA_VISIBLE_DEVICES environment variable. For our experiments, we used 4 A100 GPUs with 40 GB of memory each. If you are using a GPU with less memory, please edit the per_gpu_capacity variable accordingly.
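For example, to restrict training to specific GPUs (using the aesthetic configuration from the next section):

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch main.py --config config/align_prop.py:aesthetic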

Aesthetic Reward model.

Currently, we early-stop training to prevent overfitting; feel free to adjust the num_epochs variable to suit your needs.

accelerate launch main.py --config config/align_prop.py:aesthetic
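The config/align_prop.py:aesthetic syntax suggests an ml_collections-style config file, in which case individual fields can usually be overridden from the command line. The flag below is an assumption based on that convention; if it is not accepted, edit num_epochs directly in config/align_prop.py:

accelerate launch main.py --config config/align_prop.py:aesthetic --config.num_epochs=10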

HPSv2 Reward model.

accelerate launch main.py --config config/align_prop.py:hps

Evaluation

Evaluates the model checkpoint specified by the resume_from variable in the config file. Evaluation includes computing the reward and storing/uploading the images locally or to wandb.

Normal evaluation.

accelerate launch main.py --config config/align_prop.py:evaluate

With mixing.

Update the resume_from and resume_from_2 variables to specify the checkpoints to mix. Set resume_from_2 to stablediffusion to interpolate between resume_from and the Stable Diffusion weights. The mixing coefficient is controlled by the mixing_coef_1 variable, which can be edited in the config file.

accelerate launch main.py --config config/align_prop.py:evaluate_soup
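Conceptually, mixing linearly interpolates two sets of model weights ("model soup"). A minimal sketch of the idea; the function and file names below are illustrative, not the repo's actual API:

import torch

def mix_checkpoints(state_dict_a, state_dict_b, mixing_coef):
    # Linear interpolation: mixing_coef * A + (1 - mixing_coef) * B
    return {k: mixing_coef * state_dict_a[k] + (1.0 - mixing_coef) * state_dict_b[k]
            for k in state_dict_a}

# Hypothetical usage (file names are placeholders):
# sd_a = torch.load("finetuned.pt")         # checkpoint from resume_from
# sd_b = torch.load("stablediffusion.pt")   # checkpoint from resume_from_2
# mixed = mix_checkpoints(sd_a, sd_b, mixing_coef_1)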

Acknowledgement

Our codebase is built directly on top of DDPO. We would like to thank Kevin Black and his team for open-sourcing their code.

Citation

If you find this work useful in your research, please cite:

@misc{prabhudesai2023aligning,
      title={Aligning Text-to-Image Diffusion Models with Reward Backpropagation}, 
      author={Mihir Prabhudesai and Anirudh Goyal and Deepak Pathak and Katerina Fragkiadaki},
      year={2023},
      eprint={2310.03739},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
