
animatediff's Introduction

AnimateDiff

This repository is the official implementation of AnimateDiff [ICLR 2024 Spotlight]. It is a plug-and-play module that turns most community text-to-image models into animation generators, without the need for additional training.

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)
arXiv Project Page Open in OpenXLab Hugging Face Spaces

Note: The main branch is for Stable Diffusion V1.5; for Stable Diffusion XL, please refer to the sdxl-beta branch.

Quick Demos

More results can be found in the Gallery. Some of them are contributed by the community.

Model: ToonYou

Model: Realistic Vision V2.0

Quick Start

Note: AnimateDiff is also officially supported by Diffusers. Visit the AnimateDiff Diffusers Tutorial for more details. The instructions below are for working with this repository; a minimal Diffusers usage sketch also follows this note.
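
For reference, a minimal text-to-animation sketch using the Diffusers integration. The pipeline classes below are part of Diffusers; the motion-adapter and base-model IDs are illustrative choices, not the only options:

import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the AnimateDiff motion module as a MotionAdapter and attach it to an SD1.5-based pipeline.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear", clip_sample=False, timestep_spacing="linspace", steps_offset=1)
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="a corgi walking on the beach, best quality, highly detailed",
    negative_prompt="low quality, worst quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "animation.gif")  # output.frames[0] is the list of PIL frames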

Note: For all scripts, checkpoint downloading is handled automatically, so a script may take longer to run when first executed.
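
If you prefer to fetch checkpoints ahead of time (e.g., on a machine without interactive access at run time), here is a hedged sketch using huggingface_hub; the repo and file names are assumptions based on the model zoo tables below and may need adjusting:

from huggingface_hub import hf_hub_download

# Pre-download the v3 motion module into the local models folder (repo_id/filename assumed; adjust as needed).
hf_hub_download(
    repo_id="guoyww/animatediff",
    filename="v3_sd15_mm.ckpt",
    local_dir="models/Motion_Module",
)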

1. Set up the repository and environment

git clone https://github.com/guoyww/AnimateDiff.git
cd AnimateDiff

pip install -r requirements.txt

2. Launch the sampling script!

The generated samples can be found in the samples/ folder.

2.1 Generate animations with community models

python -m scripts.animate --config configs/prompts/1_animate/1_1_animate_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_2_animate_FilmVelvia.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_3_animate_ToonYou.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_4_animate_MajicMix.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_5_animate_RcnzCartoon.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_6_animate_Lyriel.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_7_animate_Tusun.yaml

2.2 Generate animation with MotionLoRA control

python -m scripts.animate --config configs/prompts/2_motionlora/2_motionlora_RealisticVision.yaml

2.3 More control with SparseCtrl RGB and sketch

python -m scripts.animate --config configs/prompts/3_sparsectrl/3_1_sparsectrl_i2v.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_2_sparsectrl_rgb_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_3_sparsectrl_sketch_RealisticVision.yaml

2.4 Gradio app

We created a Gradio demo to make AnimateDiff easier to use. By default, the demo will run at localhost:7860.

python -u app.py
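
If port 7860 is already taken, Gradio honors its standard GRADIO_SERVER_NAME / GRADIO_SERVER_PORT environment variables, so (assuming app.py does not override them) the demo can be moved without editing the code:

GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7861 python -u app.py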

Technical Explanation

AnimateDiff

AnimateDiff aims to learn transferable motion priors that can be applied to other variants of the Stable Diffusion family. To this end, we design a training pipeline consisting of three stages.

  • In the 1. Alleviate Negative Effects stage, we train the domain adapter, e.g., v3_sd15_adapter.ckpt, to fit defective visual artifacts (e.g., watermarks) in the training dataset. This also benefits the disentangled learning of motion and spatial appearance. By default, the adapter can be removed at inference; it can also be kept in the model, with its effect adjusted by a LoRA scale (see the sketch after this list).

  • In the 2. Learn Motion Priors stage, we train the motion module, e.g., v3_sd15_mm.ckpt, to learn real-world motion patterns from videos.

  • In the 3. (optional) Adapt to New Patterns stage, we train MotionLoRA, e.g., v2_lora_ZoomIn.ckpt, to efficiently adapt the motion module to specific motion patterns (camera zooming, rolling, etc.).
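
As noted in the first stage above, the adapter's influence can be scaled when it is merged into the base weights. Below is a minimal, generic sketch of that idea; the function and tensor names are illustrative and do not reflect the actual key layout of v3_sd15_adapter.ckpt:

import torch

def merge_lora_pair(base_weight, lora_down, lora_up, alpha, scale=1.0):
    """Merge one LoRA up/down pair into a base weight: W <- W + scale * (alpha / rank) * up @ down."""
    rank = lora_down.shape[0]
    delta = (alpha / rank) * (lora_up.float() @ lora_down.float())
    return base_weight + scale * delta.to(base_weight.dtype)

# Illustrative example: a 320x320 projection with a rank-8 LoRA.
W = torch.randn(320, 320)
down, up, alpha = torch.randn(8, 320), torch.randn(320, 8), 8.0
W_half = merge_lora_pair(W, down, up, alpha, scale=0.5)  # adapter at half strength
W_off  = merge_lora_pair(W, down, up, alpha, scale=0.0)  # adapter effectively removed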

SparseCtrl

SparseCtrl aims to add more control to text-to-video models by accepting sparse inputs (e.g., a few RGB images or sketches). Its technical details can be found in the following paper:

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)
arXiv Project Page

Model Versions

AnimateDiff v3 and SparseCtrl (2023.12)

In this version, we use Domain Adapter LoRA for image model finetuning, which provides more flexibility at inference. We also implement two SparseCtrl encoders (RGB image and scribble), which can take an arbitrary number of condition maps to control the animation contents.

AnimateDiff v3 Model Zoo
Name HuggingFace Type Storage Description
v3_adapter_sd_v15.ckpt Link Domain Adapter 97.4 MB
v3_sd15_mm.ckpt Link Motion Module 1.56 GB
v3_sd15_sparsectrl_scribble.ckpt Link SparseCtrl Encoder 1.86 GB scribble condition
v3_sd15_sparsectrl_rgb.ckpt Link SparseCtrl Encoder 1.85 GB RGB image condition

Limitations

  1. Small flickering is noticeable;
  2. To stay compatible with community models, there are no specific optimizations for general T2V, leading to limited visual quality under this setting;
  3. (Style Alignment) For usage such as image animation/interpolation, it is recommended to use images generated by the same community model.

Demos

(Demo grids omitted: input images by RealisticVision with their animations, and input scribbles with their output animations.)

AnimateDiff SDXL-Beta (2023.11)

We release the Motion Module (beta version) on SDXL, available at Google Drive / HuggingFace / CivitAI. High-resolution videos (i.e., 1024x1024x16 frames with various aspect ratios) can be produced with or without personalized models. Inference usually requires ~13GB VRAM and tuned hyperparameters (e.g., sampling steps), depending on the chosen personalized models.
Check out the sdxl branch for more details on inference.

AnimateDiff SDXL-Beta Model Zoo
Name HuggingFace Type Storage Space
mm_sdxl_v10_beta.ckpt Link Motion Module 950 MB

Demos

(Demo grid omitted: original SDXL vs. community SDXL models.)

AnimateDiff v2 (2023.09)

In this version, the motion module mm_sd_v15_v2.ckpt (Google Drive / HuggingFace / CivitAI) is trained with larger resolution and batch size. We found that this scaled-up training significantly improves motion quality and diversity.
We also support MotionLoRA for eight basic camera movements. MotionLoRA checkpoints take up only 77 MB of storage per model and are available at Google Drive / HuggingFace / CivitAI.
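
If you use the Diffusers integration mentioned in Quick Start, these MotionLoRA checkpoints can also be loaded like ordinary LoRA weights on top of an AnimateDiffPipeline; the repository ID below is an illustrative Hugging Face mirror, not the only source:

# pipe is an AnimateDiffPipeline built as in the Quick Start sketch above.
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", adapter_name="zoom-in")
pipe.set_adapters(["zoom-in"], adapter_weights=[1.0])  # lower the weight to soften the camera motion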

AnimateDiff v2 Model Zoo
Name HuggingFace Type Parameter Storage
mm_sd_v15_v2.ckpt Link Motion Module 453 M 1.7 GB
v2_lora_ZoomIn.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_ZoomOut.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_PanLeft.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_PanRight.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_TiltUp.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_TiltDown.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_RollingClockwise.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_RollingAnticlockwise.ckpt Link MotionLoRA 19 M 74 MB

Demos (MotionLoRA)

(Demo grid omitted: Zoom In, Zoom Out, Zoom Pan Left, Zoom Pan Right, Tilt Up, Tilt Down, Rolling Anti-Clockwise, Rolling Clockwise.)

Demos (Improved Motions)

Here's a comparison between mm_sd_v15.ckpt (left) and improved mm_sd_v15_v2.ckpt (right).

AnimateDiff v1 (2023.07)

The first version of AnimateDiff!

AnimateDiff v1 Model Zoo
Name HuggingFace Parameter Storage Space
mm_sd_v14.ckpt Link 417 M 1.6 GB
mm_sd_v15.ckpt Link 417 M 1.6 GB

Training

Please check Steps for Training for details.

Related Resources

AnimateDiff for Stable Diffusion WebUI: sd-webui-animatediff (by @continue-revolution)
AnimateDiff for ComfyUI: ComfyUI-AnimateDiff-Evolved (by @Kosinkadink)
Google Colab: Colab (by @camenduru)

Disclaimer

This project is released for academic use. We disclaim responsibility for user-generated content. Also, please be advised that our only official websites are https://github.com/guoyww/AnimateDiff and https://animatediff.github.io; any other website is NOT associated with us.

Contact Us

Yuwei Guo: [email protected]
Ceyuan Yang: [email protected]
Bo Dai: [email protected]

BibTeX

@article{guo2023animatediff,
  title={AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Liang, Zhengyang and Wang, Yaohui and Qiao, Yu and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={International Conference on Learning Representations},
  year={2024}
}

@article{guo2023sparsectrl,
  title={SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={arXiv preprint arXiv:2311.16933},
  year={2023}
}

Acknowledgements

Codebase built upon Tune-a-Video.

animatediff's Issues

Generating a different number of frames

Hi! Thanks for a very interesting paper. I wonder if you've tried generating shorter/longer clips? I see that temporal_position_encoding_max_len=24 limits the length to 24 frames, but what about shorter clips?

Also, I'm struggling to understand the shape of the attention in the Temporal Transformer. Here you rearrange (b f) d c -> (b d) f c, where the batch (b) is probably 1, the number of frames (f) is probably 16, and (d) corresponds to the reshaped spatial features, right? So each "super-pixel" is processed separately and the shape of the attention maps should be (B * D) x F x F, which isn't really big. Why does the inference then take 60 GB?
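
For reference, a minimal sketch (using einops, with illustrative tensor sizes) of the rearrangement described above and the resulting temporal self-attention shape:

import torch
from einops import rearrange

b, f, h, w, c = 1, 16, 64, 64, 320            # batch, frames, latent height/width, channels (illustrative)
d = h * w                                     # number of spatial positions ("super-pixels")
hidden = torch.randn(b * f, d, c)             # (b f) d c, as produced by the spatial layers
hidden = rearrange(hidden, "(b f) d c -> (b d) f c", f=f)
# Temporal self-attention now attends over the frame axis only:
attn = torch.softmax(hidden @ hidden.transpose(1, 2) / c ** 0.5, dim=-1)
print(attn.shape)                             # torch.Size([4096, 16, 16]), i.e. (b*d, f, f) -- indeed small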

What's the difference between base and path?

I'm not very sure about this:
NewModel:
path: "[path to your DreamBooth/LoRA model .safetensors file]"
base: "[path to LoRA base model .safetensors file, leave it empty string if not needed]"

In Ghibli example case
GhibliBackground:
base: "models/DreamBooth_LoRA/CounterfeitV30_25.safetensors"
path: "models/DreamBooth_LoRA/lora_Ghibli_n3.safetensors"
It seems like base means the base model and path means the LoRA, but in other examples like
ToonYou:
base: ""
path: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"

So I tried changing it to this, and something strange happened:
base: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
path: ""
I got the attached sample, which is not in the ToonYou style; I don't know why.

I did another try and set it like this:
base: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
path: "models/DreamBooth_LoRA/smv1-10.safetensors"
(smv1-10.safetensors is a LoRA downloaded from CivitAI.) Then I got this error:
omegaconf.errors.ConfigAttributeError: Missing key lora_alpha
full_key: ToonYou.lora_alpha
object_type=dict

So what's the difference between base and path, and how do I set them correctly?

[Suggestion from the animation industry] NICE! I can see the potential of AI rendering in this

For professional use, I suggest integration with production software like Blender; relying only on a Gradio interface is not well suited to production.

As shown in the attached image, this is ComfyUI hooked into Blender. The 3D software can provide much more precise input data (rather than the unstable computations of preprocessors), which gives better results and better matches the needs of current animation production.

Integrate with webui as an extension?

Wow, this is really cool.
Is there a way to integrate it with AUTOMATIC1111's WebUI? I mean, to use external VAEs, advanced prompting with brackets and weights, ControlNet, etc.?

UPDATE, oh I see it's a duplicate question.

After the version update, I got an error.

With the first version, which required 60 GB of GPU memory, I couldn't run it because I don't have that much memory, but otherwise everything looked fine.
Now I have updated the code and followed the new install guide, and I get this error:

RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.8. Please reinstall the torchvision that matches your PyTorch install.

conda list

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
accelerate 0.21.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
arrow 1.2.3 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
blas 1.0 mkl
brotli-python 1.0.9 py310hd8f1fbe_9 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.5.7 hbcca054_0 conda-forge
certifi 2023.5.7 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.2.0 pyhd8ed1ab_0 conda-forge
cudatoolkit 11.3.1 ha36c431_9 nvidia
diffusers 0.11.1 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
fqdn 1.5.1 pypi_0 pypi
freetype 2.12.1 hca18f0e_1 conda-forge
fsspec 2023.6.0 pypi_0 pypi
gdown 4.7.1 pypi_0 pypi
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
huggingface-hub 0.16.4 pypi_0 pypi
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
imageio 2.27.0 pypi_0 pypi
importlib-metadata 6.8.0 pypi_0 pypi
isoduration 20.11.0 pypi_0 pypi
jpeg 9e h0b41bf4_3 conda-forge
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.18.3 pypi_0 pypi
jsonschema-specifications 2023.6.1 pypi_0 pypi
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libdeflate 1.17 h0b41bf4_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libhwloc 2.9.1 nocuda_h7313eea_6 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libtiff 4.5.0 h6adf6a1_2 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.3.1 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxml2 2.11.4 h0d562d8_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
llvm-openmp 16.0.6 h4dfa4b3_0 conda-forge
mkl 2021.4.0 h8d4b97c_729 conda-forge
mkl-service 2.4.0 py310ha2c4b55_0 conda-forge
mkl_fft 1.3.1 py310h2b4bcf5_1 conda-forge
mkl_random 1.2.2 py310h00e6091_0
ncurses 6.4 hcb278e6_0 conda-forge
nettle 3.6 he412f7d_0 conda-forge
numpy 1.24.3 py310hd5efca6_0
numpy-base 1.24.3 py310h8e6c178_0
omegaconf 2.3.0 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 3.1.1 hd590300_1 conda-forge
pillow 9.4.0 py310h023d228_1 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.12 hd12c33a_0_cpython conda-forge
python_abi 3.10 3_cp310 conda-forge
pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
referencing 0.29.1 pypi_0 pypi
requests 2.31.0 pyhd8ed1ab_0 conda-forge
rpds-py 0.8.10 pypi_0 pypi
safetensors 0.3.1 pypi_0 pypi
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
tbb 2021.9.0 hf52228f_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
torchaudio 0.12.1 py310_cu113 pytorch
torchvision 0.13.1 py310_cu113 pytorch
tqdm 4.65.0 pypi_0 pypi
transformers 4.25.1 pypi_0 pypi
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
uri-template 1.3.0 pypi_0 pypi
urllib3 2.0.3 pyhd8ed1ab_1 conda-forge
webcolors 1.13 pypi_0 pypi
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xformers 0.0.20 py310_cu11.6.2_pyt1.12.1 xformers
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.2 hfc55251_7 conda-forge

ModuleNotFoundError: No module named 'animatediff'

Hi everyone,
Running Windows 10 x64; I followed the Python instructions and am getting:

(animatediff) C:\Apps\AnimateDiff\scripts>python -m animate --config configs\prompts\10-InitImageYoimiya.yml
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Apps\AnimateDiff\scripts\animate.py", line 15, in <module>
    from animatediff.models.unet import UNet3DConditionModel
ModuleNotFoundError: No module named 'animatediff'


Any help will be greatly appreciated! Thank you

A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'

(animatediff) ubuntu@104-171-203-67:~/AnimateDiff$ python -m scripts.animate --config configs/prompts/2-Lyriel.yaml
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/AnimateDiff/scripts/animate.py", line 159, in
main(args)
File "/home/ubuntu/AnimateDiff/scripts/animate.py", line 51, in main
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder")
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 386, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
OSError: No such device (os error 19)

Image2Anim?

Are you planning to implement image-to-animation pipeline?

GIF with diffusion noise only

Got any ideas where I should start looking to figure out why the resulting GIF is just jittery diffusion noise and nothing else? Much appreciated!

Could this issue be related to the sampler being used? Is there a possibility to choose a different sampler?

cc @talesofai, that's related to this fork - https://github.com/talesofai/AnimateDiff
P.S. I'm facing this with the original repo too.

"AssertionError"

(animatediff) PS F:\animediff\AnimateDiff> python -m scripts.animate --config configs/prompts/1-ToonYou.yaml
C:\Users\dbodbo\miniconda3\envs\animatediff\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: Could not find module 'C:\Users\dbodbo\miniconda3\envs\animatediff\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
warn(f"Failed to load image Python extension: {e}")
loaded temporal unet's pretrained weights from models/StableDiffusion\unet ...

missing keys: 560;

unexpected keys: 0;

Temporal Module Parameters: 417.1376 M

Traceback (most recent call last):
File "C:\Users\dbodbo\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\dbodbo\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\animediff\AnimateDiff\scripts\animate.py", line 159, in
main(args)
File "F:\animediff\AnimateDiff\scripts\animate.py", line 56, in main
else: assert False
AssertionError

First Run and got a lot of errors!

I can't run it; when I try to run it I get these errors. Does anyone know how to solve them?

(animatediff) F:\AnimateDiff>python -m scripts.animate --config configs/prompts/5-RealisticVision.yaml
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
File "C:\Users\Ryzen_Reaper\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Ryzen_Reaper\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\AnimateDiff\scripts\animate.py", line 15, in
from animatediff.models.unet import UNet3DConditionModel
ModuleNotFoundError: No module named 'animatediff'

(animatediff) F:\AnimateDiff>

Google Colab

Hello, I really want to try AnimateDiff,
but sadly I only have an 8 GB RTX 3070.

Do you think you will ever release a notebook?
Thanks

Save as frames?

I wanted to see if there was a way to get the frames for upscaling.

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'models/StableDiffusion/stable-diffusion-v1-5'. Use `repo_type` argument if needed.

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'models/StableDiffusion/stable-diffusion-v1-5'. Use repo_type argument if needed.

(animatediff) G:\AI\AnimateDiff>

Windows 10, conda3, PyTorch 2.0.1, CUDA 11.8.
The models are all downloaded and placed in G:\AI\AnimateDiff\models\StableDiffusion, with the filename stable-diffusion-v1-5 .safetensors, but I still get this error.

Hand + face Pose Guide to generate

Hi,
Is it possible to generate a single character from a pose for about 5 seconds?

I have a pose video (OpenPose + hands + face) and I was wondering if it is possible to generate an output video about 5 seconds long with a consistent character/avatar that dances, etc., driven by the controlled (pose) input.

I want to generate human-like animation (no matter what, just a consistent character/avatar).
Sample Video

Thanks
Best regards

Generate only 1 gif

I saw that it generates a sample.gif containing two GIFs. How do I generate only one? I would like to keep only 0-1.gif to make the generation faster.

Crash when generating longer video

When passing the CLI arg --L to change the video length, the program crashes if the length is set to more than 24.

for example, when passing --L 26:

RuntimeError: The size of tensor a (26) must match the size of tensor b (24) at non-singleton dimension 1


horrible results when using sd 1.5

I'm trying to run the base model without any DreamBooth/LoRA, but the results are random or meaningless.

.yaml config

NewModel:
  path: ""
  base: ""

  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"
    
  steps:          25
  guidance_scale: 7.5

  prompt:
    - "a girl"

  n_prompt:
    - ""

If something is wrong, can you tell me how to configure it properly for SD 1.5?

safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Traceback (most recent call last):
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/mnt/f/project/stable_animation/AnimateDiff/scripts/animate.py", line 159, in
main(args)
File "/mnt/f/project/stable_animation/AnimateDiff/scripts/animate.py", line 51, in main
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder")
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 386, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I did everything according to the instructions; what could be the problem? Perhaps I missed something?

Proposal for clip skip in the WebUI/Gradio demo

Hello, could you also add a clip skip option to the Gradio demo? Instead of always using 1, users could set clip skip to 2, 3, or more; that would save a lot of trouble with clip skip. Thanks! :)

Any way to control the parameters of animation?

I love the results generated with AnimateDiff so far, but most of the animations are fast and a lot of content changes across frames, which makes them a little incoherent. Is there any way to reduce the speed of the animation or tweak other animation settings?

ResolvePackageNotFound: xformers

(base) H:\Stablediff\aniumateddiff\AnimateDiff\AnimateDiff>conda env create -f environment.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • xformers

I also tried using Anaconda Navigator with a premade venv; then I got:

ModuleNotFoundError: No module named 'diffusers.modeling_utils'

Nor can I download the other configs, since you used Google Drive and the download quota was reached.

scripts.animate cannot continue

Hello, I have run into an error and cannot continue. I set everything up on Google Colab, but it shows this error:

[Errno 2] No such file or directory: '/content/animatediff'
/content
/usr/bin/python3: No module named scripts.animate

I've tried everything but it just doesn't work. Can you fix this?
I have already run the Jupyter code:

!python -m scripts.animate --config /content/AnimateDiff/configs/prompts/9-yiffymix31.yaml --pretrained_model_path /content/AnimateDiff/models/StableDiffusion --L 16 --W 512 --H 512

Is it possible to run only for one frame and set the 'frame history' ourselves?

I am pretty sure this is not currently supported by the API but I'm wondering if this would be possible with the current model architecture? The use case is for psychedelic animation (loopback, deforum, etc.), for example an animation which is always zooming or using some programmed camera movement with img2img - it would be truly spectacular if we could use AnimateDiff to inject lifelike movement! We currently have a lesser version of this with a model called FlowR, which predicts a flow map given the last 4 images. The result is very abstract, but it provides a subtle grounding.

I understand that the AnimateDiff model is trained on buffers of 16 frames, but in theory since it has been trained on random samples without start/end it should be possible to simply keep the buffer rolling and have infinitely long animations.

Are Textual Inversion embeddings supported?

In the examples in configs/prompts/, the prompts contain references to textual inversion embeddings like badhandv4, easynegative, ng_deepnegative_v1_75t:

  n_prompt:
    - "easynegative,bad_construction,bad_structure,bad_wail,bad_windows,blurry,cloned_window,cropped,deformed,disfigured,error,extra_windows,extra_chimney,extra_door,extra_structure,extra_frame,fewer_digits,fused_structure,gross_proportions,jpeg_artifacts,long_roof,low_quality,structure_limbs,missing_windows,missing_doors,missing_roofs,mutated_structure,mutation,normal_quality,out_of_frame,owres,poorly_drawn_structure,poorly_drawn_house,signature,text,too_many_windows,ugly,username,uta,watermark,worst_quality"

From looking at the source code I'm not sure where the support for textual inversion was implemented - is this functionality implemented in AnimateDiff?

AssertionError

After banging my head against other issues, I'm now stuck here:
(animatediff) PS I:\AnimateDiff\AnimateDiff> python -m scripts.animate --config configs/prompts/1-ToonYou.yaml
loaded temporal unet's pretrained weights from models/StableDiffusion/stable-diffusion-v1-5\unet ...

missing keys: 560;

unexpected keys: 0;

Temporal Module Parameters: 417.1376 M

Traceback (most recent call last):
File "C:\Users\Lucas\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Lucas\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "I:\AnimateDiff\AnimateDiff\scripts\animate.py", line 159, in
main(args)
File "I:\AnimateDiff\AnimateDiff\scripts\animate.py", line 56, in main
else: assert False
AssertionError

Good job! How could this be made into an Auto1111 extension? Is this similar to how MasaCtrl locks foreground attention?

Hi everyone! The stability is already very good. In the codebase I couldn't find how frame-to-frame stability is improved; the prompt parameters only generate single frames, so how do you achieve such strong similarity between frames?

I previously followed the project below; their idea is to give keywords for the foreground in order to lock fixed content:
https://github.com/ashen-sensored/sd_webui_masactrl
https://arxiv.org/abs/2304.08465

Keep it up!

Save as mp4

Can you save as video? I've noticed that GIF animations are lower quality and have banding.

Extension for Auto1111?

May we hope that you will offer this great functionality as an extension for the Stable Diffusion WebUI?

How to use a LoRA to generate a GIF

lyriel_v16:
base: "models/DreamBooth_LoRA/lyriel_v16.safetensors"
path: "models/DreamBooth_LoRA/DANWGFWL.safetensors"
motion_module:
- "models/Motion_Module/mm_sd_v14.ckpt"
- "models/Motion_Module/mm_sd_v15.ckpt"

seed: [10788741199826055526, 6520604954829636163, 6519455744612555650]
steps: 35
guidance_scale: 7

prompt:
- "1 girl solo, perfect_hand, (8k, RAW photo, best quality, masterpiece:1.2), (realistic, photo-realistic:1.4), (extremely detailed CG unity 8k wallpaper),(full body), (neon lights), machop, mechanical arms, hanfu, lora:DANWGFWL:0.6,Chinese clothes, dress, pretty face,(dark shot:1.1), epic realistic, RAW, analog, sharp focus, volumetric fog"
- "(masterpiece, best quality), 1girl, nude, closed eyes, upper body, splashing, abstract, psychedelic,"
- "(masterpiece, best quality), 1boy, muscular, beard, cyberpunk, (blurry, bokeh, fisheye lens), night, looking at viewer, contrast, contrapposto, neon oversized jacket,"

n_prompt:
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
