
animatediff's Introduction

AnimateDiff

This repository is the official implementation of AnimateDiff [ICLR 2024 Spotlight]. It is a plug-and-play module that turns most community text-to-image models into animation generators, without the need for additional training.

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)
arXiv Project Page Open in OpenXLab Hugging Face Spaces

Note: The main branch is for Stable Diffusion V1.5; for Stable Diffusion XL, please refer to the sdxl-beta branch.

Quick Demos

More results can be found in the Gallery. Some of them are contributed by the community.

Model: ToonYou

Model: Realistic Vision V2.0

Quick Start

Note: AnimateDiff is also officially supported by Diffusers. Visit the AnimateDiff Diffusers Tutorial for more details. The instructions below are for working with this repository; a minimal Diffusers usage sketch also follows this note.
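
For reference, a minimal text-to-animation sketch using the Diffusers integration. The pipeline classes below are part of Diffusers; the motion-adapter and base-model IDs are illustrative choices, not the only options:

import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the AnimateDiff motion module as a MotionAdapter and attach it to an SD1.5-based pipeline.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear", clip_sample=False, timestep_spacing="linspace", steps_offset=1)
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="a corgi walking on the beach, best quality, highly detailed",
    negative_prompt="low quality, worst quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "animation.gif")  # output.frames[0] is the list of PIL frames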

Note: For all scripts, checkpoint downloading is handled automatically, so a script may take longer to run when first executed.
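
If you prefer to fetch checkpoints ahead of time (e.g., on a machine without interactive access at run time), here is a hedged sketch using huggingface_hub; the repo and file names are assumptions based on the model zoo tables below and may need adjusting:

from huggingface_hub import hf_hub_download

# Pre-download the v3 motion module into the local models folder (repo_id/filename assumed; adjust as needed).
hf_hub_download(
    repo_id="guoyww/animatediff",
    filename="v3_sd15_mm.ckpt",
    local_dir="models/Motion_Module",
)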

1. Set up the repository and environment

git clone https://github.com/guoyww/AnimateDiff.git
cd AnimateDiff

pip install -r requirements.txt

2. Launch the sampling script!

The generated samples can be found in the samples/ folder.

2.1 Generate animations with community models

python -m scripts.animate --config configs/prompts/1_animate/1_1_animate_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_2_animate_FilmVelvia.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_3_animate_ToonYou.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_4_animate_MajicMix.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_5_animate_RcnzCartoon.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_6_animate_Lyriel.yaml
python -m scripts.animate --config configs/prompts/1_animate/1_7_animate_Tusun.yaml

2.2 Generate animation with MotionLoRA control

python -m scripts.animate --config configs/prompts/2_motionlora/2_motionlora_RealisticVision.yaml

2.3 More control with SparseCtrl RGB and sketch

python -m scripts.animate --config configs/prompts/3_sparsectrl/3_1_sparsectrl_i2v.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_2_sparsectrl_rgb_RealisticVision.yaml
python -m scripts.animate --config configs/prompts/3_sparsectrl/3_3_sparsectrl_sketch_RealisticVision.yaml

2.4 Gradio app

We created a Gradio demo to make AnimateDiff easier to use. By default, the demo will run at localhost:7860.

python -u app.py
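
If port 7860 is already taken, Gradio honors its standard GRADIO_SERVER_NAME / GRADIO_SERVER_PORT environment variables, so (assuming app.py does not override them) the demo can be moved without editing the code:

GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7861 python -u app.py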

Technical Explanation

AnimateDiff

AnimateDiff aims to learn transferable motion priors that can be applied to other variants of the Stable Diffusion family. To this end, we design a training pipeline consisting of three stages.

  • In the 1. Alleviate Negative Effects stage, we train the domain adapter, e.g., v3_sd15_adapter.ckpt, to fit defective visual artifacts (e.g., watermarks) in the training dataset. This also benefits the disentangled learning of motion and spatial appearance. By default, the adapter can be removed at inference; it can also be kept in the model, with its effect adjusted by a LoRA scale (see the sketch after this list).

  • In the 2. Learn Motion Priors stage, we train the motion module, e.g., v3_sd15_mm.ckpt, to learn real-world motion patterns from videos.

  • In the 3. (optional) Adapt to New Patterns stage, we train MotionLoRA, e.g., v2_lora_ZoomIn.ckpt, to efficiently adapt the motion module to specific motion patterns (camera zooming, rolling, etc.).
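
As noted in the first stage above, the adapter's influence can be scaled when it is merged into the base weights. Below is a minimal, generic sketch of that idea; the function and tensor names are illustrative and do not reflect the actual key layout of v3_sd15_adapter.ckpt:

import torch

def merge_lora_pair(base_weight, lora_down, lora_up, alpha, scale=1.0):
    """Merge one LoRA up/down pair into a base weight: W <- W + scale * (alpha / rank) * up @ down."""
    rank = lora_down.shape[0]
    delta = (alpha / rank) * (lora_up.float() @ lora_down.float())
    return base_weight + scale * delta.to(base_weight.dtype)

# Illustrative example: a 320x320 projection with a rank-8 LoRA.
W = torch.randn(320, 320)
down, up, alpha = torch.randn(8, 320), torch.randn(320, 8), 8.0
W_half = merge_lora_pair(W, down, up, alpha, scale=0.5)  # adapter at half strength
W_off  = merge_lora_pair(W, down, up, alpha, scale=0.0)  # adapter effectively removed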

SparseCtrl

SparseCtrl aims to add more control to text-to-video models by accepting sparse inputs (e.g., a few RGB images or sketches). Its technical details can be found in the following paper:

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang✝, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai (✝Corresponding Author)
arXiv Project Page

Model Versions

AnimateDiff v3 and SparseCtrl (2023.12)

In this version, we use Domain Adapter LoRA for image model finetuning, which provides more flexibility at inference. We also implement two SparseCtrl encoders (RGB image and scribble), which can take an arbitrary number of condition maps to control the animation contents.

AnimateDiff v3 Model Zoo
Name HuggingFace Type Storage Description
v3_adapter_sd_v15.ckpt Link Domain Adapter 97.4 MB
v3_sd15_mm.ckpt Link Motion Module 1.56 GB
v3_sd15_sparsectrl_scribble.ckpt Link SparseCtrl Encoder 1.86 GB scribble condition
v3_sd15_sparsectrl_rgb.ckpt Link SparseCtrl Encoder 1.85 GB RGB image condition

Limitations

  1. Small flickering is noticeable;
  2. To stay compatible with community models, there are no specific optimizations for general T2V, leading to limited visual quality under this setting;
  3. (Style Alignment) For usage such as image animation/interpolation, it is recommended to use images generated by the same community model.

Demos

(Demo grids omitted: input images by RealisticVision with their animations, and input scribbles with their output animations.)

AnimateDiff SDXL-Beta (2023.11)

We release the Motion Module (beta version) on SDXL, available at Google Drive / HuggingFace / CivitAI. High-resolution videos (i.e., 1024x1024x16 frames with various aspect ratios) can be produced with or without personalized models. Inference usually requires ~13GB VRAM and tuned hyperparameters (e.g., sampling steps), depending on the chosen personalized models.
Check out the sdxl branch for more details on inference.

AnimateDiff SDXL-Beta Model Zoo
Name HuggingFace Type Storage Space
mm_sdxl_v10_beta.ckpt Link Motion Module 950 MB

Demos

(Demo grid omitted: original SDXL vs. community SDXL models.)

AnimateDiff v2 (2023.09)

In this version, the motion module mm_sd_v15_v2.ckpt (Google Drive / HuggingFace / CivitAI) is trained with larger resolution and batch size. We found that this scaled-up training significantly improves motion quality and diversity.
We also support MotionLoRA for eight basic camera movements. MotionLoRA checkpoints take up only 77 MB of storage per model and are available at Google Drive / HuggingFace / CivitAI.
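
If you use the Diffusers integration mentioned in Quick Start, these MotionLoRA checkpoints can also be loaded like ordinary LoRA weights on top of an AnimateDiffPipeline; the repository ID below is an illustrative Hugging Face mirror, not the only source:

# pipe is an AnimateDiffPipeline built as in the Quick Start sketch above.
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-in", adapter_name="zoom-in")
pipe.set_adapters(["zoom-in"], adapter_weights=[1.0])  # lower the weight to soften the camera motion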

AnimateDiff v2 Model Zoo
Name HuggingFace Type Parameter Storage
mm_sd_v15_v2.ckpt Link Motion Module 453 M 1.7 GB
v2_lora_ZoomIn.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_ZoomOut.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_PanLeft.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_PanRight.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_TiltUp.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_TiltDown.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_RollingClockwise.ckpt Link MotionLoRA 19 M 74 MB
v2_lora_RollingAnticlockwise.ckpt Link MotionLoRA 19 M 74 MB

Demos (MotionLoRA)

(Demo grid omitted: Zoom In, Zoom Out, Zoom Pan Left, Zoom Pan Right, Tilt Up, Tilt Down, Rolling Anti-Clockwise, Rolling Clockwise.)

Demos (Improved Motions)

Here's a comparison between mm_sd_v15.ckpt (left) and improved mm_sd_v15_v2.ckpt (right).

AnimateDiff v1 (2023.07)

The first version of AnimateDiff!

AnimateDiff v1 Model Zoo
Name HuggingFace Parameter Storage Space
mm_sd_v14.ckpt Link 417 M 1.6 GB
mm_sd_v15.ckpt Link 417 M 1.6 GB

Training

Please check Steps for Training for details.

Related Resources

AnimateDiff for Stable Diffusion WebUI: sd-webui-animatediff (by @continue-revolution)
AnimateDiff for ComfyUI: ComfyUI-AnimateDiff-Evolved (by @Kosinkadink)
Google Colab: Colab (by @camenduru)

Disclaimer

This project is released for academic use. We disclaim responsibility for user-generated content. Also, please be advised that our only official websites are https://github.com/guoyww/AnimateDiff and https://animatediff.github.io; any other website is NOT associated with us.

Contact Us

Yuwei Guo: [email protected]
Ceyuan Yang: [email protected]
Bo Dai: [email protected]

BibTeX

@article{guo2023animatediff,
  title={AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Liang, Zhengyang and Wang, Yaohui and Qiao, Yu and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={International Conference on Learning Representations},
  year={2024}
}

@article{guo2023sparsectrl,
  title={SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models},
  author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Agrawala, Maneesh and Lin, Dahua and Dai, Bo},
  journal={arXiv preprint arXiv:2311.16933},
  year={2023}
}

Acknowledgements

Codebase built upon Tune-a-Video.

animatediff's Issues

Generating a different number of frames

Hi! Thanks for a very interesting paper. I wonder if you've tried generating shorter/longer clips? I see that temporal_position_encoding_max_len=24 limits the length to 24 frames, but what about shorter clips?

Also, I'm struggling to understand the shape of the attention in the Temporal Transformer. Here you rearrange (b f) d c -> (b d) f c, where the batch (b) is probably 1, the number of frames (f) is probably 16, and (d) corresponds to the reshaped spatial features, right? So each "super-pixel" is processed separately and the shape of the attention maps should be (B * D) x F x F, which isn't really big. Why does the inference then take 60 GB?
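
For reference, a minimal sketch (using einops, with illustrative tensor sizes) of the rearrangement described above and the resulting temporal self-attention shape:

import torch
from einops import rearrange

b, f, h, w, c = 1, 16, 64, 64, 320            # batch, frames, latent height/width, channels (illustrative)
d = h * w                                     # number of spatial positions ("super-pixels")
hidden = torch.randn(b * f, d, c)             # (b f) d c, as produced by the spatial layers
hidden = rearrange(hidden, "(b f) d c -> (b d) f c", f=f)
# Temporal self-attention now attends over the frame axis only:
attn = torch.softmax(hidden @ hidden.transpose(1, 2) / c ** 0.5, dim=-1)
print(attn.shape)                             # torch.Size([4096, 16, 16]), i.e. (b*d, f, f) -- indeed small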

What's the difference between base and path?

I'm not very sure about this:
NewModel:
path: "[path to your DreamBooth/LoRA model .safetensors file]"
base: "[path to LoRA base model .safetensors file, leave it empty string if not needed]"

In Ghibli example case
GhibliBackground:
base: "models/DreamBooth_LoRA/CounterfeitV30_25.safetensors"
path: "models/DreamBooth_LoRA/lora_Ghibli_n3.safetensors"
It seems like base means the base model and path means the LoRA, but in other examples like
ToonYou:
base: ""
path: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"

So I tried changing it to this, and something strange happened:
base: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
path: ""
I got the attached sample, which is not in the ToonYou style; I don't know why.

I did another try and set it like this:
base: "models/DreamBooth_LoRA/toonyou_beta3.safetensors"
path: "models/DreamBooth_LoRA/smv1-10.safetensors"
(smv1-10.safetensors is a LoRA downloaded from CivitAI.) Then I got this error:
omegaconf.errors.ConfigAttributeError: Missing key lora_alpha
full_key: ToonYou.lora_alpha
object_type=dict

So what's the difference between base and path, and how do I set them correctly?

[Suggestion from the animation industry] NICE! I can see the potential of AI rendering in this

For professional use, I suggest integration with production software like Blender; relying only on a Gradio interface is not well suited to production.

As shown in the attached image, this is ComfyUI hooked into Blender. The 3D software can provide much more precise input data (rather than the unstable computations of preprocessors), which gives better results and better matches the needs of current animation production.

Integrate with webui as an extension?

Wow, this is really cool.
Is there a way to integrate it with AUTOMATIC1111's WebUI? I mean, to use external VAEs, advanced prompting with brackets and weights, ControlNet, etc.?

UPDATE, oh I see it's a duplicate question.

After the version update, I got an error.

With the first version, which required 60 GB of GPU memory, I couldn't run it because I don't have that much memory, but otherwise everything looked fine.
Now I have updated the code and followed the new install guide, and I get this error:

RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA versions. PyTorch has CUDA Version=11.7 and torchvision has CUDA Version=11.8. Please reinstall the torchvision that matches your PyTorch install.

conda list

Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
accelerate 0.21.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
arrow 1.2.3 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
blas 1.0 mkl
brotli-python 1.0.9 py310hd8f1fbe_9 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.5.7 hbcca054_0 conda-forge
certifi 2023.5.7 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.2.0 pyhd8ed1ab_0 conda-forge
cudatoolkit 11.3.1 ha36c431_9 nvidia
diffusers 0.11.1 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
fqdn 1.5.1 pypi_0 pypi
freetype 2.12.1 hca18f0e_1 conda-forge
fsspec 2023.6.0 pypi_0 pypi
gdown 4.7.1 pypi_0 pypi
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
huggingface-hub 0.16.4 pypi_0 pypi
icu 72.1 hcb278e6_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
imageio 2.27.0 pypi_0 pypi
importlib-metadata 6.8.0 pypi_0 pypi
isoduration 20.11.0 pypi_0 pypi
jpeg 9e h0b41bf4_3 conda-forge
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.18.3 pypi_0 pypi
jsonschema-specifications 2023.6.1 pypi_0 pypi
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libdeflate 1.17 h0b41bf4_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libhwloc 2.9.1 nocuda_h7313eea_6 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libstdcxx-ng 13.1.0 hfd8a6a1_0 conda-forge
libtiff 4.5.0 h6adf6a1_2 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.3.1 hd590300_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxml2 2.11.4 h0d562d8_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
llvm-openmp 16.0.6 h4dfa4b3_0 conda-forge
mkl 2021.4.0 h8d4b97c_729 conda-forge
mkl-service 2.4.0 py310ha2c4b55_0 conda-forge
mkl_fft 1.3.1 py310h2b4bcf5_1 conda-forge
mkl_random 1.2.2 py310h00e6091_0
ncurses 6.4 hcb278e6_0 conda-forge
nettle 3.6 he412f7d_0 conda-forge
numpy 1.24.3 py310hd5efca6_0
numpy-base 1.24.3 py310h8e6c178_0
omegaconf 2.3.0 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 3.1.1 hd590300_1 conda-forge
pillow 9.4.0 py310h023d228_1 conda-forge
pip 23.1.2 pyhd8ed1ab_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.12 hd12c33a_0_cpython conda-forge
python_abi 3.10 3_cp310 conda-forge
pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
referencing 0.29.1 pypi_0 pypi
requests 2.31.0 pyhd8ed1ab_0 conda-forge
rpds-py 0.8.10 pypi_0 pypi
safetensors 0.3.1 pypi_0 pypi
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
tbb 2021.9.0 hf52228f_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
torchaudio 0.12.1 py310_cu113 pytorch
torchvision 0.13.1 py310_cu113 pytorch
tqdm 4.65.0 pypi_0 pypi
transformers 4.25.1 pypi_0 pypi
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
uri-template 1.3.0 pypi_0 pypi
urllib3 2.0.3 pyhd8ed1ab_1 conda-forge
webcolors 1.13 pypi_0 pypi
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xformers 0.0.20 py310_cu11.6.2_pyt1.12.1 xformers
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.2 hfc55251_7 conda-forge

ModuleNotFoundError: No module named 'animatediff'

Hi everyone,
Running Windows 10 x64; I followed the Python instructions and am getting:

(animatediff) C:\Apps\AnimateDiff\scripts>python -m animate --config configs\prompts\10-InitImageYoimiya.yml
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Apps\AnimateDiff\scripts\animate.py", line 15, in <module>
    from animatediff.models.unet import UNet3DConditionModel
ModuleNotFoundError: No module named 'animatediff'


Any help will be greatly appreciated! Thank you

A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'

(animatediff) ubuntu@104-171-203-67:~/AnimateDiff$ python -m scripts.animate --config configs/prompts/2-Lyriel.yaml
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ubuntu/AnimateDiff/scripts/animate.py", line 159, in
main(args)
File "/home/ubuntu/AnimateDiff/scripts/animate.py", line 51, in main
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder")
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/ubuntu/anaconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 386, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
OSError: No such device (os error 19)

Image2Anim?

Are you planning to implement image-to-animation pipeline?

GIF with diffusion noise only

Got any ideas where I should start looking to figure out why the resulting GIF is just jittery diffusion noise and nothing else? Much appreciated!

Could this issue be related to the sampler being used? Is there a possibility to choose a different sampler?

cc @talesofai, that's related to this fork - https://github.com/talesofai/AnimateDiff
P.S. I'm facing this with the original repo too.

"AssertionError"

(animatediff) PS F:\animediff\AnimateDiff> python -m scripts.animate --config configs/prompts/1-ToonYou.yaml
C:\Users\dbodbo\miniconda3\envs\animatediff\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: Could not find module 'C:\Users\dbodbo\miniconda3\envs\animatediff\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
warn(f"Failed to load image Python extension: {e}")
loaded temporal unet's pretrained weights from models/StableDiffusion\unet ...

missing keys: 560;

unexpected keys: 0;

Temporal Module Parameters: 417.1376 M

Traceback (most recent call last):
File "C:\Users\dbodbo\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\dbodbo\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\animediff\AnimateDiff\scripts\animate.py", line 159, in
main(args)
File "F:\animediff\AnimateDiff\scripts\animate.py", line 56, in main
else: assert False
AssertionError

First Run and got a lot of errors!

I can't run it; when I try to run it I get these errors. Does anyone know how to solve them?

(animatediff) F:\AnimateDiff>python -m scripts.animate --config configs/prompts/5-RealisticVision.yaml
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
File "C:\Users\Ryzen_Reaper\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Ryzen_Reaper\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "F:\AnimateDiff\scripts\animate.py", line 15, in
from animatediff.models.unet import UNet3DConditionModel
ModuleNotFoundError: No module named 'animatediff'

(animatediff) F:\AnimateDiff>

Google Colab

Hello, I really want to try AnimateDiff,
but sadly I only have an 8 GB RTX 3070.

Do you think you will ever release a notebook?
Thanks

Save as frames?

I wanted to see if there was a way to get the frames for upscaling.

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'models/StableDiffusion/stable-diffusion-v1-5'. Use `repo_type` argument if needed.

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 'models/StableDiffusion/stable-diffusion-v1-5'. Use repo_type argument if needed.

(animatediff) G:\AI\AnimateDiff>

Windows 10, conda3, PyTorch 2.0.1, CUDA 11.8.
The models are all downloaded and placed in G:\AI\AnimateDiff\models\StableDiffusion, with the filename stable-diffusion-v1-5 .safetensors, but I still get this error.

Hand + face Pose Guide to generate

Hi,
Is it possible to generate a single character from a pose for about 5 seconds?

I have a pose video (OpenPose + hands + face) and I was wondering if it is possible to generate an output video about 5 seconds long with a consistent character/avatar that dances, etc., driven by the controlled (pose) input.

I want to generate human-like animation (no matter what, just a consistent character/avatar).
Sample Video

Thanks
Best regards

Generate only 1 gif

I saw that it generates a sample.gif containing two GIFs. How do I generate only one? I would like to keep only 0-1.gif to make the generation faster.

Crash when generating longer video

When passing the CLI arg --L to change the video length, the program crashes if the length is set to more than 24.

for example, when passing --L 26:

RuntimeError: The size of tensor a (26) must match the size of tensor b (24) at non-singleton dimension 1


horrible results when using sd 1.5

I'm trying to run the base model without any DreamBooth/LoRA, but the results are random or meaningless.

.yaml config

NewModel:
  path: ""
  base: ""

  motion_module:
    - "models/Motion_Module/mm_sd_v14.ckpt"
    - "models/Motion_Module/mm_sd_v15.ckpt"
    
  steps:          25
  guidance_scale: 7.5

  prompt:
    - "a girl"

  n_prompt:
    - ""

If something is wrong, can you tell me how to configure it properly for SD 1.5?

safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Traceback (most recent call last):
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/mnt/f/project/stable_animation/AnimateDiff/scripts/animate.py", line 159, in
main(args)
File "/mnt/f/project/stable_animation/AnimateDiff/scripts/animate.py", line 51, in main
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder")
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
File "/home/dmitry/miniconda3/envs/animatediff/lib/python3.10/site-packages/transformers/modeling_utils.py", line 386, in load_state_dict
with safe_open(checkpoint_file, framework="pt") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

I did everything according to the instructions; what could be the problem? Perhaps I missed something?

Proposal for clip skip in the WebUI/Gradio demo

Hello, could you also add a clip skip option to the Gradio demo? Instead of always using 1, users could set clip skip to 2, 3, or more; that would save a lot of trouble with clip skip. Thanks! :)

Any way to control the parameters of animation?

I love the results generated with AnimateDiff so far, but most of the animations are fast and a lot of content changes across frames, which makes them a little incoherent. Is there any way to reduce the speed of the animation or tweak other animation settings?

ResolvePackageNotFound: xformers

(base) H:\Stablediff\aniumateddiff\AnimateDiff\AnimateDiff>conda env create -f environment.yaml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • xformers

I also tried using Anaconda Navigator with a premade venv; then I got:

ModuleNotFoundError: No module named 'diffusers.modeling_utils'

Nor can I download the other configs, since you used Google Drive and the download quota was reached.

scripts.animate cannot continue

Hello, I have run into an error and cannot continue. I set everything up on Google Colab, but it shows this error:

[Errno 2] No such file or directory: '/content/animatediff'
/content
/usr/bin/python3: No module named scripts.animate

I've tried everything but it just doesn't work. Can you fix this?
I have already run the Jupyter code:

!python -m scripts.animate --config /content/AnimateDiff/configs/prompts/9-yiffymix31.yaml --pretrained_model_path /content/AnimateDiff/models/StableDiffusion --L 16 --W 512 --H 512

Is it possible to run only for one frame and set the 'frame history' ourselves?

I am pretty sure this is not currently supported by the API but I'm wondering if this would be possible with the current model architecture? The use case is for psychedelic animation (loopback, deforum, etc.), for example an animation which is always zooming or using some programmed camera movement with img2img - it would be truly spectacular if we could use AnimateDiff to inject lifelike movement! We currently have a lesser version of this with a model called FlowR, which predicts a flow map given the last 4 images. The result is very abstract, but it provides a subtle grounding.

I understand that the AnimateDiff model is trained on buffers of 16 frames, but in theory since it has been trained on random samples without start/end it should be possible to simply keep the buffer rolling and have infinitely long animations.

Are Textual Inversion embeddings supported?

In the examples in configs/prompts/, the prompts contain references to textual inversion embeddings like badhandv4, easynegative, ng_deepnegative_v1_75t:

  n_prompt:
    - "easynegative,bad_construction,bad_structure,bad_wail,bad_windows,blurry,cloned_window,cropped,deformed,disfigured,error,extra_windows,extra_chimney,extra_door,extra_structure,extra_frame,fewer_digits,fused_structure,gross_proportions,jpeg_artifacts,long_roof,low_quality,structure_limbs,missing_windows,missing_doors,missing_roofs,mutated_structure,mutation,normal_quality,out_of_frame,owres,poorly_drawn_structure,poorly_drawn_house,signature,text,too_many_windows,ugly,username,uta,watermark,worst_quality"

From looking at the source code I'm not sure where the support for textual inversion was implemented - is this functionality implemented in AnimateDiff?

AssertionError

After banging my head against other issues, I'm now stuck here:
(animatediff) PS I:\AnimateDiff\AnimateDiff> python -m scripts.animate --config configs/prompts/1-ToonYou.yaml
loaded temporal unet's pretrained weights from models/StableDiffusion/stable-diffusion-v1-5\unet ...

missing keys: 560;

unexpected keys: 0;

Temporal Module Parameters: 417.1376 M

Traceback (most recent call last):
File "C:\Users\Lucas\miniconda3\envs\animatediff\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Lucas\miniconda3\envs\animatediff\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "I:\AnimateDiff\AnimateDiff\scripts\animate.py", line 159, in
main(args)
File "I:\AnimateDiff\AnimateDiff\scripts\animate.py", line 56, in main
else: assert False
AssertionError

Good job! How could this be made into an Auto1111 extension? Is this similar to how MasaCtrl locks foreground attention?

Hi everyone! The stability is already very good. In the codebase I couldn't find how frame-to-frame stability is improved; the prompt parameters only generate single frames, so how do you achieve such strong similarity between frames?

I previously followed the project below; their idea is to give keywords for the foreground in order to lock fixed content:
https://github.com/ashen-sensored/sd_webui_masactrl
https://arxiv.org/abs/2304.08465

Keep it up!

Save as mp4

Can you save as video? I've noticed that GIF animations are lower quality and have banding.

Extension for Auto1111?

May we hope that you will offer this great functionality as an extension for the Stable Diffusion WebUI?

How to use a LoRA to generate a GIF

lyriel_v16:
base: "models/DreamBooth_LoRA/lyriel_v16.safetensors"
path: "models/DreamBooth_LoRA/DANWGFWL.safetensors"
motion_module:
- "models/Motion_Module/mm_sd_v14.ckpt"
- "models/Motion_Module/mm_sd_v15.ckpt"

seed: [10788741199826055526, 6520604954829636163, 6519455744612555650]
steps: 35
guidance_scale: 7

prompt:
- "1 girl solo, perfect_hand, (8k, RAW photo, best quality, masterpiece:1.2), (realistic, photo-realistic:1.4), (extremely detailed CG unity 8k wallpaper),(full body), (neon lights), machop, mechanical arms, hanfu, lora:DANWGFWL:0.6,Chinese clothes, dress, pretty face,(dark shot:1.1), epic realistic, RAW, analog, sharp focus, volumetric fog"
- "(masterpiece, best quality), 1girl, nude, closed eyes, upper body, splashing, abstract, psychedelic,"
- "(masterpiece, best quality), 1boy, muscular, beard, cyberpunk, (blurry, bokeh, fisheye lens), night, looking at viewer, contrast, contrapposto, neon oversized jacket,"

n_prompt:
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
- "(worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)"
