awesome-video-diffusion-models's Introduction

(Source: Make-A-Video, SimDA, PYoCo, SVD, Video LDM and Tune-A-Video)

  • [News] We are planning to update the survey soon to encompass the latest work. If you have any suggestions, please feel free to contact us.
  • [News] The Chinese translation is available on Zhihu. Special thanks to Dai-Wenxun for this.

Open-source Toolboxes and Foundation Models

Methods Task Github
VideoPoet T2V Generation & Editing -
Stable Video Diffusion T2V Generation Star
NeverEnds T2V Generation -
Pika T2V Generation -
EMU-Video T2V Generation -
GEN-2 T2V Generation & Editing -
ModelScope T2V Generation Star
ZeroScope T2V Generation -
T2V Synthesis Colab T2V Generation Star
VideoCraft T2V Generation & Editing Star
Diffusers (T2V synthesis) T2V Generation -
AnimateDiff Personalized T2V Generation Star
Text2Video-Zero T2V Generation Star
HotShot-XL T2V Generation Star
Genmo T2V Generation -
Fliki T2V Generation -

Table of Contents

Video Generation

Data

Caption-level

Title arXiv Github WebSite Pub. & Date
CelebV-Text: A Large-Scale Facial Text-Video Dataset arXiv Star - CVPR, 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation arXiv Star - May, 2023
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation arXiv - - May, 2023
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions arXiv - - Nov, 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval arXiv - - ICCV, 2021
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language arXiv - - CVPR, 2016

Category-level

Title arXiv Github WebSite Pub. & Date
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild arXiv - - Dec., 2012
First Order Motion Model for Image Animation arXiv - - NeurIPS, 2019
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks arXiv - - CVPR, 2018

Metric and Benchmark

Title arXiv Github WebSite Pub. & Date
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation arXiv - - NeurIPS, 2023
CVPR 2023 Text Guided Video Editing Competition arXiv - - Oct., 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models arXiv - - Oct., 2023
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset arXiv - - Sep., 2023

Text-to-Video Generation

Training-based

Title arXiv Github WebSite Pub. & Date
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos arXiv Star Website Dec, 2023
InstructVideo: Instructing Video Diffusion Models with Human Feedback arXiv Star Website Dec, 2023
VideoLCM: Video Latent Consistency Model arXiv - - Dec, 2023
Photorealistic Video Generation with Diffusion Models arXiv - - Dec, 2023
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation arXiv Star Website Dec, 2023
Delving Deep into Diffusion Transformers for Image and Video Generation arXiv - Website Dec, 2023
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter arXiv Star Website Nov, 2023
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation arXiv - Website Nov, 2023
ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models arXiv - Website Nov, 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets arXiv Star Website Nov, 2023
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline arXiv - Website Nov, 2023
MoVideo: Motion-Aware Video Generation with Diffusion Models arXiv - Website Nov, 2023
Make Pixels Dance: High-Dynamic Video Generation arXiv - Website Nov, 2023
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning arXiv - Website Nov, 2023
Optimal Noise pursuit for Augmenting Text-to-Video Generation arXiv - - Nov, 2023
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning arXiv - Website Nov, 2023
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation arXiv Star Website Oct, 2023
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction arXiv Star Website Oct, 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors arXiv Star - Oct., 2023
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation arXiv Star Website Oct., 2023
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model arXiv - - Oct, 2023
MotionDirector: Motion Customization of Text-to-Video Diffusion Models arXiv Star Website Oct, 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning arXiv - Website Sep., 2023
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation arXiv Star Website Sep., 2023
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models arXiv - Website Sep., 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation arXiv - Website Sep., 2023
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation arXiv - - Sep., 2023
Text2Performer: Text-Driven Human Video Generation arXiv Star Website Apr., 2023
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning arXiv Star Website Jul., 2023
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models arXiv - Website Aug., 2023
SimDA: Simple Diffusion Adapter for Efficient Video Generation arXiv Star Website Aug., 2023
Dual-Stream Diffusion Net for Text-to-Video Generation arXiv - - Aug., 2023
ModelScope Text-to-Video Technical Report arXiv - Website Aug., 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation arXiv Star - Jul., 2023
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation arXiv - - May, 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models arXiv - Website May, 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models arXiv - Website -
Latent-Shift: Latent Diffusion with Temporal Shift arXiv - Website -
Probabilistic Adaptation of Text-to-Video Models arXiv - Website Jun., 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation arXiv - Website Mar., 2023
ED-T2V: An Efficient Training Framework for Diffusion-based Text-to-Video Generation - - - IJCNN, 2023
MagicVideo: Efficient Video Generation With Latent Diffusion Models arXiv - Website -
Imagen Video: High Definition Video Generation With Diffusion Models arXiv - Website -
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation arXiv - Website -
Make-A-Video: Text-to-Video Generation without Text-Video Data arXiv - Website -
Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths arXiv Star Website Nov., 2022
Video Diffusion Models arXiv - Website -

Training-free

Title arXiv Github WebSite Pub. & Date
FreeInit: Bridging Initialization Gap in Video Diffusion Models arXiv Star - Dec, 2023
MTVG: Multi-text Video Generation with Text-to-Video Models arXiv - Website Dec, 2023
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis arXiv - - Nov, 2023
AdaDiff: Adaptive Step Selection for Fast Diffusion arXiv - - Nov, 2023
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax arXiv Star Website Nov, 2023
🏀GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning arXiv - Website Nov, 2023
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling arXiv Star Website Oct, 2023
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation arXiv Star Website Oct, 2023
LLM-grounded Video Diffusion Models arXiv - - Oct, 2023
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator arXiv Star - NeurIPS, 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis arXiv - - Aug, 2023
Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation arXiv Star - May, 2023
Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators arXiv Star Website Mar., 2023

Video Generation with other conditions

Pose-guided Video Generation

Title arXiv Github WebSite Pub. & Date
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models arXiv - Website Dec., 2023
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model arXiv Star Website Nov., 2023
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation arXiv Star Website Nov., 2023
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer arXiv Star Website Nov., 2023
DisCo: Disentangled Control for Referring Human Dance Generation in Real World arXiv Star Website Jul., 2023
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model arXiv - - Aug., 2023
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion arXiv Star Website Apr., 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos arXiv Star Website Apr., 2023

Motion-guided Video Generation

Title arXiv Github WebSite Pub. & Date
Customizing Motion in Text-to-Video Diffusion Models arXiv - Website Dec., 2023
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models arXiv Star Website Nov., 2023
Motion-Conditioned Diffusion Model for Controllable Video Synthesis arXiv - Website Apr., 2023
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory arXiv - - Aug., 2023

Sound-guided Video Generation

Title arXiv Github WebSite Pub. & Date
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion arXiv - - ICCV, 2023
Generative Disco: Text-to-Video Generation for Music Visualization arXiv - - Apr., 2023
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion arXiv - - CVPRW, 2023

Image-guided Video Generation

Title arXiv Github WebSite Pub. & Date
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models arXiv - Website Nov., 2023
DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance arXiv - Website Nov., 2023
LivePhoto: Real Image Animation with Text-guided Motion Control arXiv Star Website Nov., 2023
VideoBooth: Diffusion-based Video Generation with Image Prompts arXiv Star Website Nov., 2023
Decouple Content and Motion for Conditional Image-to-Video Generation arXiv - - Nov, 2023
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models arXiv - - Nov, 2023
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image arXiv - - MM, 2023
Generative Image Dynamics arXiv - Website Sep., 2023
LaMD: Latent Motion Diffusion for Video Generation arXiv - - Apr., 2023
Conditional Image-to-Video Generation with Latent Flow Diffusion Models arXiv Star - CVPR 2023

Brain-guided Video Generation

Title arXiv Github WebSite Pub. & Date
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity arXiv Star Website May, 2023

Depth-guided Video Generation

Title arXiv Github WebSite Pub. & Date
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation arXiv Star Website Jul., 2023
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance arXiv Star Website Jun., 2023

Multi-modal guided Video Generation

Title arXiv Github WebSite Pub. & Date
PEEKABOO: Interactive Video Generation via Masked-Diffusion arXiv - Website Dec., 2023
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling arXiv - - Dec., 2023
Fine-grained Controllable Video Generation via Object Appearance and Context arXiv - Website Nov., 2023
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation arXiv - Website Nov., 2023
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving arXiv - Website Nov., 2023
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models arXiv - Website Nov., 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability arXiv Star Website Jun., 2023
NExT-GPT: Any-to-Any Multimodal LLM arXiv - - Sep, 2023
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images arXiv - Website Jun, 2023
Any-to-Any Generation via Composable Diffusion arXiv Star Website May, 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation arXiv Star - CVPR 2023

Unconditional Video Generation

U-Net based

Title arXiv Github WebSite Pub. & Date
Video Probabilistic Diffusion Models in Projected Latent Space arXiv Star Website CVPR 2023
VIDM: Video Implicit Diffusion Models arXiv Star Website AAAI 2023
GD-VDM: Generated Depth for better Diffusion-based Video Generation arXiv Star - Jun., 2023
LEO: Generative Latent Image Animator for Human Video Synthesis arXiv Star Website May, 2023

Transformer based

Title arXiv Github WebSite Pub. & Date
VDT: An Empirical Study on Video Diffusion with Transformers arXiv Star - May, 2023

Video Completion

Video Enhancement and Restoration

Title arXiv Github WebSite Pub. & Date
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution - - - WACW, 2023
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution arXiv Star Website Dec., 2023
AVID: Any-Length Video Inpainting with Diffusion Model arXiv Star Website Dec., 2023
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution arXiv Star - CVPR 2023
LDMVFI: Video Frame Interpolation with Latent Diffusion Models arXiv - - Mar., 2023
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming arXiv - - Nov., 2022
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos arXiv - - May, 2023

Video Prediction

Title arXiv Github Website Pub. & Date
STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction arXiv Star - Dec, 2023
Video Diffusion Models with Local-Global Context Guidance arXiv Star - IJCAI, 2023
Seer: Language Instructed Video Prediction with Latent Diffusion Models arXiv - Website Mar., 2023
Diffusion Models for Video Prediction and Infilling arXiv Star Website TMLR 2022
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation arXiv Star Website NeurIPS 2022
Diffusion Probabilistic Modeling for Video Generation arXiv Star - Mar., 2022
Flexible Diffusion Modeling of Long Videos arXiv Star Website May, 2022
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models arXiv Star Website May, 2023

Video Editing

General Editing Model

Title arXiv Github Website Pub. Date
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers arXiv - Website Dec, 2023
Neutral Editing Framework for Diffusion-based Video Editing arXiv - Website Dec, 2023
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence arXiv - Website Nov, 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models arXiv Star Website Nov, 2023
Motion-Conditioned Image Animation for Video Editing arXiv - Website Nov, 2023
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation arXiv - - Sep, 2023
MagicEdit: High-Fidelity and Temporally Coherent Video Editing arXiv - - Aug, 2023
Edit Temporal-Consistent Videos with Image Diffusion Model arXiv - - Aug, 2023
Structure and Content-Guided Video Synthesis With Diffusion Models arXiv - Website ICCV, 2023
Dreamix: Video Diffusion Models Are General Video Editors arXiv - Website Feb, 2023

Training-free Editing Model

Title arXiv Github Website Pub. Date
RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing arXiv - - Dec, 2023
VidToMe: Video Token Merging for Zero-Shot Video Editing arXiv Star Website Dec, 2023
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing arXiv Star Website Dec, 2023
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators arXiv Star - Dec, 2023
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models arXiv Star Website Dec, 2023
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models arXiv - Website Nov., 2023
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion arXiv - - Nov., 2023
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier arXiv Star - Oct., 2023
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation arXiv - - Nov., 2023
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models arXiv - - Oct., 2023
LOVECon: Text-driven Training-Free Long Video Editing with ControlNet arXiv Star - Oct., 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing arXiv - Website Oct., 2023
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models arXiv - Website Oct., 2023
MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance arXiv - - Aug., 2023
EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints arXiv - - Aug., 2023
ControlVideo: Training-free Controllable Text-to-Video Generation arXiv Star - May, 2023
TokenFlow: Consistent Diffusion Features for Consistent Video Editing arXiv Star Website Jul., 2023
VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing arXiv - Website Jun., 2023
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation arXiv - Website Jun., 2023
Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models arXiv Star Website Mar., 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing arXiv Star Website Mar., 2023
Pix2Video: Video Editing Using Image Diffusion arXiv - Website Mar., 2023
InFusion: Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing arXiv - Website Aug., 2023
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising arXiv Star Website May, 2023

One-shot Editing Model

Title arXiv Github Website Pub. & Date
MotionCrafter: One-Shot Motion Customization of Diffusion Models arXiv Star - Dec., 2023
DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing arXiv - Website Dec., 2023
MotionEditor: Editing Video Motion via Content-Aware Diffusion arXiv Star Website Nov., 2023
Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning arXiv - Website Nov., 2023
Cut-and-Paste: Subject-Driven Video Editing with Attention Control arXiv - - Nov, 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing arXiv Star Website ICCV, 2023
Shape-aware Text-driven Layered Video Editing arXiv - - CVPR, 2023
SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-guided Video Editing arXiv Star - May, 2023
Towards Consistent Video Editing with Text-to-Image Diffusion Models arXiv - - Mar., 2023
Edit-A-Video: Single Video Editing with Object-Aware Consistency arXiv - Website Mar., 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation arXiv Star Website ICCV, 2023
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing arXiv Star Website May, 2023
Video-P2P: Video Editing with Cross-attention Control arXiv Star Website Mar., 2023
SinFusion: Training Diffusion Models on a Single Image or Video arXiv Star Website Nov., 2022

Instruct-guided Video Editing

Title arXiv Github Website Pub. Date
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis arXiv - Website Dec, 2023
Neural Video Fields Editing arXiv Star Website Dec, 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models arXiv Star Website Nov, 2023
Consistent Video-to-Video Transfer Using Synthetic Dataset arXiv - - Nov., 2023
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions arXiv - - May, 2023
Collaborative Score Distillation for Consistent Visual Synthesis arXiv - - July, 2023

Motion-guided Video Editing

Title arXiv Github Website Pub. Date
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation arXiv Star Website Nov, 2023
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction arXiv - Website Nov, 2023
DragVideo: Interactive Drag-style Video Editing arXiv Star - Nov, 2023
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet arXiv - Website July, 2023

Sound-guided Video Editing

Title arXiv Github Website Pub. Date
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model arXiv - - May, 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing arXiv Star Website Apr., 2023

Multi-modal Control Editing Model

Title arXiv Github Website Pub. Date
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion arXiv Star Website Dec, 2023
MagicStick: Controllable Video Editing via Control Handle Transformations arXiv Star Website Nov, 2023
SAVE: Protagonist Diversification with Structure Agnostic Video Editing arXiv - Website Nov, 2023
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation arXiv - - May, 2023
CCEdit: Creative and Controllable Video Editing via Diffusion Models arXiv - - Sep, 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts arXiv Star Website May, 2023

Domain-specific Editing Model

Title arXiv Github Website Pub. Date
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models arXiv - Website CVPR 2023
Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator arXiv - - May, 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis arXiv - - Aug, 2023
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer arXiv Star - May, 2023
Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions arXiv Star - Jun, 2023
Video Colorization with Pre-trained Text-to-Image Diffusion Models arXiv Star Website Jun, 2023
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding arXiv Star Website CVPR 2023

Non-diffusion Editing model

Title arXiv Github Website Pub. Date
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing arXiv - Website Oct., 2023
INVE: Interactive Neural Video Editing arXiv - Website Jul., 2023
Shape-Aware Text-Driven Layered Video Editing arXiv - Website Jan., 2023

Video Understanding

Title arXiv Github Website Pub. Date
Diffusion Reward: Learning Rewards via Conditional Video Diffusion arXiv Star Website Dec., 2023
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models arXiv - Website Nov., 2023
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models arXiv Star - Nov., 2023
Flow-Guided Diffusion for Video Inpainting arXiv Star - Nov., 2023
Breathing Life Into Sketches Using Text-to-Video Priors arXiv - - Nov., 2023
Infusion: Internal Diffusion for Video Inpainting arXiv - - Nov., 2023
DiffusionVMR: Diffusion Model for Video Moment Retrieval arXiv - - Aug., 2023
DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation arXiv - - Aug., 2023
Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations arXiv - - ICIAP, 2023
Exploring Diffusion Models for Unsupervised Video Anomaly Detection arXiv - - Apr., 2023
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection arXiv - - ICCV, 2023
Diffusion Action Segmentation arXiv - - Mar., 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion arXiv Star Website Mar., 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model arXiv Star - ICCV, 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real arXiv Star Website Jul., 2023
Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning arXiv - - Nov., 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos arXiv Star Website Oct., 2022
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models arXiv - - Jul., 2023
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming arXiv - - Mar., 2023
Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition arXiv - - Jul., 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos arXiv Star - CVPR 2023

Contact

If you have any suggestions or find our work helpful, feel free to contact us.

Homepage: Zhen Xing

Email: [email protected]

If you find our work useful, please consider citing it:

@article{vdmsurvey,
  title={A Survey on Video Diffusion Models},
  author={Zhen Xing and Qijun Feng and Haoran Chen and Qi Dai and Han Hu and Hang Xu and Zuxuan Wu and Yu-Gang Jiang}, 
  journal={arXiv preprint arXiv:2310.10647},
  year={2023}
}
