StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation

PyTorch code for the ECCV 2022 paper "StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation".

[Paper] [Model Card] [Spaces Demo] [Replicate Demo]

Training

Prepare Repository:

Download the PororoSV dataset and associated files from here (updated) and save them as ./data/pororo/.
Download the FlintstonesSV dataset and associated files from here and save them as ./data/flintstones/.
Download the DiDeMoSV dataset and associated files from here and save them as ./data/didemo/.
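
As a quick sanity check after downloading, the layout described above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and the folder names are simply those listed in the download instructions. Only the dataset you intend to train on needs to be present.

```python
from pathlib import Path

# Folder names taken from the download instructions above.
EXPECTED_DATASET_DIRS = ("pororo", "flintstones", "didemo")


def missing_dataset_dirs(data_root):
    """Return the expected dataset folders that are absent under data_root.

    Hypothetical helper for illustration; it only checks that the folders
    exist, not that their contents are complete.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_DATASET_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("./data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```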

This repository contains separate folders for training StoryDALL-E based on the minDALL-E and DALL-E Mega models: ./story-dalle/ and ./mega-story-dalle/, respectively.

Training StoryDALL-E based on minDALL-E:

  1. To finetune the minDALL-E model for story continuation, first change to the corresponding folder:
    cd story-dalle
  2. Set the environment variables in train_story.sh to point to the right locations on your system. Specifically, change $DATA_DIR, $OUTPUT_ROOT and $LOG_DIR if they differ from the default locations.
  3. Download the pretrained checkpoint from here and save it in ./1.3B.
  4. Run the following command: bash train_story.sh <dataset_name>
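
The steps above can be wrapped in a small pre-flight check before launching a long training run. The sketch below is an illustration under stated assumptions, not code from the repository: `preflight_problems` and `train_command` are hypothetical helpers, and the script only verifies that the files and folders named in the steps exist. It does not read or modify the variables inside train_story.sh.

```python
from pathlib import Path


def preflight_problems(repo_dir, data_dir):
    """List what is missing before launching train_story.sh.

    Hypothetical helper: an empty list means the script, the checkpoint
    folder from step 3, and the data directory were all found.
    """
    repo = Path(repo_dir)
    problems = []
    if not (repo / "train_story.sh").is_file():
        problems.append("train_story.sh not found")
    if not (repo / "1.3B").is_dir():
        problems.append("pretrained checkpoint folder 1.3B not found")
    if not Path(data_dir).is_dir():
        problems.append("data directory not found")
    return problems


def train_command(dataset_name):
    """Build the launch command from step 4; the name is passed through unchanged."""
    if not dataset_name:
        raise ValueError("dataset_name must be a non-empty string")
    return ["bash", "train_story.sh", dataset_name]
```

If `preflight_problems("story-dalle", "data")` returns an empty list, the command can be launched with `subprocess.run(train_command(name), cwd="story-dalle", check=True)`, where `name` is whichever dataset name train_story.sh accepts. The accepted values are not listed here, so check the script before running.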

Training StoryDALL-E based on DALL-E Mega:

  1. To finetune the DALL-E Mega model for story continuation, first change to the corresponding folder:
    cd mega-story-dalle
  2. Set the environment variables in train_story.sh to point to the right locations on your system. Specifically, change $DATA_DIR, $OUTPUT_ROOT and $LOG_DIR if they differ from the default locations.
  3. Pretrained checkpoints for the generative model and the VQGAN detokenizer are downloaded automatically upon initialization. Download the pretrained weights for the VQGAN tokenizer from here and place them in the same folder as the VQGAN detokenizer.
  4. Run the following command: bash train_story.sh <dataset_name>

Inference

Pretrained checkpoints for the minDALL-E-based StoryDALL-E can be downloaded from here: PororoSV

For a demo of inference using cog, check out this repo.

Inferring from StoryDALL-E based on minDALL-E:

  1. To infer from the minDALL-E model for story continuation, first change to the corresponding folder:
    cd story-dalle
  2. Set the environment variables in infer_story.sh to point to the right locations on your system. Specifically, change $DATA_DIR, $OUTPUT_ROOT and $MODEL_CKPT if they differ from the default locations.
  3. Run the following command: bash infer_story.sh <dataset_name>

Memory Requirements for Inference:

For double-precision inference, the StoryDALL-E model requires nearly 40 GB of memory. This can be reduced to 20 GB by performing mixed-precision inference with the autoregressive decoder (included in the codebase; see line 1095 in story-dalle/dalle/models/__init__.py). Note that the VQGAN model needs to operate at full precision to preserve the quality of the generated images.
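
The halving reported above is consistent with how tensor storage scales with element width: each step down in precision halves the bytes per value. The sketch below is back-of-the-envelope arithmetic for model weights only. The parameter count is an assumption inferred from the ./1.3B checkpoint folder name, and the result excludes activations, caches and the VQGAN, so it will not match the totals quoted above.

```python
# Bytes needed to store one value at each floating-point width.
BYTES_PER_ELEMENT = {"float64": 8, "float32": 4, "float16": 2}

# Assumption: parameter count inferred from the "1.3B" checkpoint folder name.
ASSUMED_NUM_PARAMS = 1_300_000_000


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float64", "float32", "float16"):
        gb = weight_memory_gb(ASSUMED_NUM_PARAMS, dtype)
        print(f"{dtype}: {gb:.1f} GB for weights only")
```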

Acknowledgements

Thanks to the fantastic folks at Kakao Brain and HuggingFace for their work on the open-source versions of minDALL-E and DALL-E Mega. Much of this codebase has been adapted from here and here.
