Git Product home page Git Product logo

var's Introduction

VAR: a new visual generation method elevates GPT-style models beyond diffusion🚀 & Scaling laws observed📈

demo platform  arXiv  huggingface weights  SOTA

This is the official PyTorch implementation of Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.


🕹️ Try and Play with VAR!

We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!

We also provide demo_sample.ipynb for you to see more technical details about VAR.

What's New?

🔥 Introducing VAR: a new paradigm in autoregressive visual generation✨:

Visual Autoregressive Modeling (VAR) redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

🔥 For the first time, GPT-style autoregressive models surpass diffusion models🚀:

🔥 Discovering power-law Scaling Laws in VAR transformers📈:

🔥 Zero-shot generalizability🛠️:

For a deep dive into our analyses, discussions, and evaluations, check out our paper.

VAR zoo

We provide VAR models for you to play with, which are on or can be downloaded from the following links:

model reso. FID rel. cost #params HF weights🤗
VAR-d16 256 3.55 0.4 310M var_d16.pth
VAR-d20 256 2.95 0.5 600M var_d20.pth
VAR-d24 256 2.33 0.6 1.0B var_d24.pth
VAR-d30 256 1.97 1 2.0B var_d30.pth
VAR-d30-re 256 1.80 1 2.0B var_d30.pth

You can load these models to generate images via the codes in demo_sample.ipynb. Note: you need to download vae_ch160v4096z32.pth first.

Installation

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@Article{VAR,
      title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction}, 
      author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
      year={2024},
      eprint={2404.02905},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

var's People

Contributors

keyu-tian avatar nielsrogge avatar ifighting avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.