EchoReel: Enhancing Action Generation of Existing Video Diffusion Models

     

University of Electronic Science and Technology of China

EchoReel is a method for augmenting existing video diffusion models. It can:
1️⃣ use multiple reference videos to imitate a broader range of actions and generate novel actions without fine-tuning;
2️⃣ distill effective, relevant visual motion features instead of replicating the referenced content.

"Imitation is the sincerest form of flattery that mediocrity can pay to greatness." — Oscar Wilde

✌️ Results

Columns: input text | Original VideoCrafter2 | + EchoReel

  • "A man is studying in the library"
  • "A man is skiing"
  • "A man is running"
  • "Couple walking on the beach"
  • "A man is carving a stone statue"

📝 Changelog

  • [2024.4.21] Released pretrained weights
  • [2024.3.18] Released training and inference code

⏳ TODO

  • Release code of LVDM text-to-video with EchoReel
  • Release training code
  • Release pretrained weight
  • Release image-to-video VideoCrafter code with EchoReel

⚙️ Setup

Please prepare .json data in the following format:

[
	{
		"input_text": ...,
		"gt_video_path": ...,
		"reference_text": ...,
		"reference_video_path": ...
	},
    ...
]
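The format above can be checked before training with a small script. The following is a minimal sketch, not part of the EchoReel code base: the four key names come from the format above, while the helper names (`validate_entries`, `write_dataset`), the example file paths, and the assumption that every value is a non-empty string are illustrative only.

```python
import json
from pathlib import Path

# The four keys come from the data format above; everything else is illustrative.
REQUIRED_KEYS = ("input_text", "gt_video_path", "reference_text", "reference_video_path")


def validate_entries(entries):
    """Return a list of problems; an empty list means the data looks well-formed."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            # Assumption: each field is a non-empty string (text or file path).
            if not isinstance(value, str) or not value:
                problems.append(f"entry {i}: '{key}' must be a non-empty string")
    return problems


def write_dataset(entries, out_path):
    """Validate the entries, then write them as a JSON list to out_path."""
    problems = validate_entries(entries)
    if problems:
        raise ValueError("; ".join(problems))
    text = json.dumps(entries, indent=2, ensure_ascii=False)
    Path(out_path).write_text(text, encoding="utf-8")


# Hypothetical entry; the paths are placeholders, not files shipped with the repo.
example = [
    {
        "input_text": "A man is skiing",
        "gt_video_path": "videos/skiing_0001.mp4",
        "reference_text": "A person skiing down a slope",
        "reference_video_path": "videos/skiing_ref_0001.mp4",
    }
]
```

Calling `write_dataset(example, "train.json")` would produce a file in the expected layout; the output file name is likewise an assumption.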

Install Environment via Anaconda

conda create -n EchoReel python=3.10.13
conda activate EchoReel
pip install -r requirements.txt

💫 For Try

Download the pretrained weights from our Hugging Face repository and place them in the 'checkpoint' folder. We also strongly recommend downloading the WebVid .csv file into the 'dataset' directory, which enables automatic reference video selection.

mkdir -p checkpoint
cd checkpoint
wget https://huggingface.co/cscrisp/EchoReel/resolve/main/checkpoint/checkpoint.pt
cd ..
mkdir -p dataset
cd dataset
wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv
cd ..
python gr.py
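Before launching the demo, it can help to confirm that both downloads landed where the commands above put them. This is a small sketch under stated assumptions: the two relative paths are taken from the commands above, the `missing_files` helper is hypothetical and not part of the repository, and the script is assumed to run from the repository root.

```python
from pathlib import Path

# Paths taken from the download commands above, relative to the repository root.
EXPECTED_FILES = (
    Path("checkpoint") / "checkpoint.pt",
    Path("dataset") / "results_10M_train.csv",
)


def missing_files(root="."):
    """Return the expected files that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in EXPECTED_FILES:
        target = root / rel
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_files()
    for path in missing:
        print(f"missing or empty: {path}")
    if not missing:
        print("all files in place; run `python gr.py`")
```

Per the text above, the WebVid .csv is recommended rather than required, so a missing CSV is more of a warning than an error.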

💫 For Train

# Use the original LVDM pretrained weights to initialize the model
mkdir -p models/t2v
wget -O models/t2v/model.ckpt https://huggingface.co/Yingqing/LVDM/resolve/main/lvdm_short/t2v.ckpt
bash train_EchoReel.sh

💫 For Sample

bash sample_EchoReel.sh

🔮 Pipeline

😉 Citation

@article{Liu2024EchoReel,
      title={EchoReel: Enhancing Action Generation of Existing Video Diffusion Models},
      author={Jianzhi Liu and Junchen Zhu and Lianli Gao and Jingkuan Song},
      year={2024},
      eprint={2403.11535},
      archivePrefix={arXiv},
}

🤗 Acknowledgements

Our code is partially based on Latent Video Diffusion Models (LVDM). Thanks to the authors for their wonderful work!
