
CogVideo

This is the official repo for the paper: CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers.

News! The demo for CogVideo is available!

News! The code and model for text-to-video generation are now available! Currently we only support simplified Chinese input.

CogVideo_samples.mp4
@article{hong2022cogvideo,
  title={CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers},
  author={Hong, Wenyi and Ding, Ming and Zheng, Wendi and Liu, Xinghan and Tang, Jie},
  journal={arXiv preprint arXiv:2205.15868},
  year={2022}
}

Web Demo

The demo for CogVideo is available at https://wudao.aminer.cn/cogvideo/, where you can try text-to-video generation hands-on. Its original input language is Chinese.

Generated Samples

Video samples generated by CogVideo. The actual text inputs are in Chinese. Each sample is a 4-second clip of 32 frames, and here we sample 9 frames uniformly for display purposes.
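
For readers who want to reproduce a similar display, the sketch below picks 9 evenly spaced frame indices from a 32-frame clip and computes the clip's frame rate (32 frames over 4 seconds). The README does not say exactly which 9 frames were shown, so the floor-based spacing and the helper name uniform_frame_indices are assumptions for illustration only.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_shown: int) -> List[int]:
    """Hypothetical helper: evenly spaced frame indices, first and last included."""
    if num_shown < 2 or num_shown > num_frames:
        raise ValueError("need 2 <= num_shown <= num_frames")
    # Integer (floor) spacing: index i maps to i * (n - 1) // (k - 1).
    return [i * (num_frames - 1) // (num_shown - 1) for i in range(num_shown)]


NUM_FRAMES = 32    # frames per generated clip, as stated above
CLIP_SECONDS = 4   # clip duration in seconds, as stated above

print(uniform_frame_indices(NUM_FRAMES, 9))  # [0, 3, 7, 11, 15, 19, 23, 27, 31]
print(NUM_FRAMES / CLIP_SECONDS)             # 8.0 frames per second
```

Floor spacing was chosen because it always keeps the first and last frames and avoids rounding ties.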

Intro images

More samples

CogVideo can generate videos at a relatively high frame rate. A 4-second clip of 32 frames (8 frames per second) is shown below.

High-frame-rate sample

Getting Started

Setup

  • Hardware: Linux servers with Nvidia A100s are recommended. The pretrained models can also run on less powerful GPUs with smaller --max-inference-batch-size and --batch-size values, and smaller models can be trained on such GPUs.
  • Environment: install dependencies via pip install -r requirements.txt.
  • LocalAttention: make sure CUDA is installed, then compile the local attention kernel:
git clone https://github.com/Sleepychord/Image-Local-Attention
cd Image-Local-Attention && python setup.py install

Download

Our code automatically downloads the models to, or detects them in, the path defined by the environment variable SAT_HOME. You can also manually download CogVideo-Stage1 and CogVideo-Stage2 and place them under SAT_HOME (in folders named cogvideo-stage1 and cogvideo-stage2).
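
If you place the checkpoints manually, a quick check of the folder layout can save a failed run. The sketch below only inspects the layout described above (SAT_HOME/cogvideo-stage1 and SAT_HOME/cogvideo-stage2); it is not the repository's download logic, and the helper find_stage_checkpoints is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Folder names as given in this README; the helper below is hypothetical.
STAGE_FOLDERS = ("cogvideo-stage1", "cogvideo-stage2")


def find_stage_checkpoints(sat_home: Optional[str] = None) -> Dict[str, bool]:
    """Report which stage folders already exist under SAT_HOME."""
    # Fall back to the environment variable when no path is passed in.
    root = sat_home if sat_home is not None else os.environ.get("SAT_HOME")
    if not root:
        raise RuntimeError("SAT_HOME is not set; export it before running inference")
    base = Path(root)
    return {name: (base / name).is_dir() for name in STAGE_FOLDERS}


# Demo: a temporary SAT_HOME that holds only the stage-1 folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cogvideo-stage1").mkdir()
    demo_result = find_stage_checkpoints(tmp)

print(demo_result)  # {'cogvideo-stage1': True, 'cogvideo-stage2': False}
```

A False entry simply means that stage's folder is absent; per the description above, the code would then try to download it.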

Text-to-Video Generation

./script/inference_cogvideo_pipeline.sh

The main arguments for inference are:

  • --input-source [path or "interactive"]. The path of the input file, with one query per line. Passing "interactive" launches a CLI instead.
  • --output-path [path]. The folder that will contain the results.
  • --batch-size [int]. The number of samples to generate per query.
  • --max-inference-batch-size [int]. Maximum batch size per forward pass. Reduce it if you run out of memory (OOM).
  • --stage1-max-inference-batch-size [int]. Maximum batch size per forward pass in Stage 1. Reduce it if you run out of memory (OOM).
  • --both-stages. Run Stage 1 and Stage 2 sequentially.
  • --use-guidance-stage1. Use classifier-free guidance in Stage 1; this is strongly recommended for better results.
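
The flag descriptions suggest that the --batch-size samples for a query are produced in forward passes of at most --max-inference-batch-size each, which is why lowering the latter helps with OOM: peak memory drops while the number of passes grows. The sketch below illustrates that relationship under this assumption; forward_chunks is a hypothetical helper, not code from the repository.

```python
from typing import List


def forward_chunks(batch_size: int, max_inference_batch_size: int) -> List[int]:
    """Hypothetical: split batch_size samples into per-forward chunks no larger than the cap."""
    if batch_size < 0 or max_inference_batch_size < 1:
        raise ValueError("invalid batch sizes")
    full, rest = divmod(batch_size, max_inference_batch_size)
    # Full-size chunks first, then one smaller chunk for any remainder.
    return [max_inference_batch_size] * full + ([rest] if rest else [])


print(forward_chunks(16, 4))  # [4, 4, 4, 4]
print(forward_chunks(10, 4))  # [4, 4, 2]
```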

We recommend setting the environment variable SAT_HOME to the path where the downloaded models should be stored.

Currently only Chinese input is supported.
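
As an illustration of the input format, the sketch below writes a one-query-per-line file and assembles the documented flags into an argument list. It does not run anything: the README does not say whether ./script/inference_cogvideo_pipeline.sh forwards extra arguments, so check the script to see where these flags are actually set. The helper build_inference_args and the example query (Chinese for "A lion is drinking water.") are illustrative assumptions.

```python
import shlex
import tempfile
from pathlib import Path
from typing import List


def build_inference_args(input_file: str, output_dir: str, batch_size: int = 4,
                         max_inference_batch_size: int = 2) -> List[str]:
    """Hypothetical helper: assemble the flags documented above into a list."""
    return [
        "--input-source", input_file,
        "--output-path", output_dir,
        "--batch-size", str(batch_size),
        "--max-inference-batch-size", str(max_inference_batch_size),
        "--both-stages",
        "--use-guidance-stage1",
    ]


with tempfile.TemporaryDirectory() as tmp:
    queries = Path(tmp) / "queries.txt"
    # One query per line; the model currently expects simplified Chinese input.
    queries.write_text("一只狮子在喝水。\n", encoding="utf-8")
    args = build_inference_args(str(queries), str(Path(tmp) / "out"))
    print(shlex.join(args))
```

Keeping the arguments as a list (rather than one string) avoids quoting problems with paths that contain spaces.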
