
tune-a-video's People

Contributors

hysts, stanlei52, zhangjiewu


tune-a-video's Issues

More flexible/powerful attention mechanism for longer videos?

The proposed SCAttention is very computationally efficient, but it intuitively seems insufficient when generating slightly longer videos. For example, attending only to the previous frame is not enough to establish the direction of an object's movement.
I guess this works in the provided examples because the videos only have 8 frames, so the first frame combined with the previous frame might be sufficient. But in longer videos, the first frame starts to become largely irrelevant.

Have you considered or tried using other attention masks such as Local Attention (sliding window) or other non-quadratic sparse attention mechanisms? Any thoughts on this?
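
For reference, here is a minimal sketch of the kind of sliding-window frame mask I have in mind (purely illustrative, not the attention used in this repo):

import torch

# Illustrative only: a boolean frame-level mask where frame t may attend to
# frames in [t - window, t], plus (optionally) frame 0 as a global anchor.
def sliding_window_frame_mask(num_frames: int, window: int = 2, keep_first: bool = True) -> torch.Tensor:
    idx = torch.arange(num_frames)
    offset = idx[:, None] - idx[None, :]       # offset[t, s] = t - s
    mask = (offset >= 0) & (offset <= window)  # causal, within the window
    if keep_first:
        mask[:, 0] = True                      # always attend to frame 0
    return mask                                # shape: [num_frames, num_frames]

print(sliding_window_frame_mask(8, window=2).int())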

Thanks and congrats for the great work!

Attention weight in CrossAttnDownBlock3D is not trained?

Hi, Congrats on this awesome work!

One question: for this line, it seems the gradient does not flow through.

I'm not familiar with torch.utils.checkpoint.checkpoint. Is this something that recomputes the forward pass during backward while running the initial forward under no_grad()?

However, even so, I didn't observe any weight change after one iteration of training. Specifically, unet.down_blocks[0].attentions[0].transformer_blocks[0].attn1.to_q.weight is not changed.

Is this correct, or am I missing something?
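
For what it's worth, this is roughly how one can check whether a given weight is updated by one optimizer step (a hypothetical helper, not code from the repo):

import torch

# Hypothetical helper: snapshot a named parameter, run one training step, and
# report whether the weight actually changed.
def param_changed(module: torch.nn.Module, name: str, step_fn) -> bool:
    param = dict(module.named_parameters())[name]
    before = param.detach().clone()
    step_fn()  # e.g. forward pass + accelerator.backward(loss) + optimizer.step()
    return not torch.equal(before, param.detach())

# Example (names are illustrative):
# changed = param_changed(unet, "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", one_step)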

FileNotFoundError: [WinError 3] The system cannot find the path specified: ''

Traceback (most recent call last):
  File "C:\Users\coron\Desktop\Tune-A-Video\inference.py", line 15, in <module>
    save_videos_grid(video, f"{prompt}.gif")
  File "C:\Users\coron\Desktop\Tune-A-Video\tuneavideo\util.py", line 21, in save_videos_grid
    os.makedirs(os.path.dirname(path), exist_ok=True)
  File "C:\Users\coron\AppData\Local\Programs\Python\Python310\lib\os.py", line 225, in makedirs
    mkdir(name, mode)
FileNotFoundError: [WinError 3] The system cannot find the path specified: ''


The inference.py I executed:

from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.util import save_videos_grid
import torch

model_id = "outputs/man-surfing_lr3e-5_seed33/2023-01-30T23-03-19"
unet = UNet3DConditionModel.from_pretrained(model_id, subfolder='unet', torch_dtype=torch.float16).to('cuda')
unet.enable_xformers_memory_efficient_attention()
pipe = TuneAVideoPipeline.from_pretrained("checkpoints/stable-diffusion-v1-4", unet=unet, torch_dtype=torch.float16).to("cuda")

torch.cuda.manual_seed(0)
prompt = "a panda is surfing"
video = pipe(prompt, video_length=8, height=512, width=512, num_inference_steps=50, guidance_scale=7.5).videos

save_videos_grid(video, f"{prompt}.gif")

I don't know if this is relevant, but here is Tune-A-Video\tuneavideo\util.py:

import os
import imageio
import numpy as np
import torch
import torchvision

from einops import rearrange

def save_videos_grid(videos: torch.Tensor, path: str, rescale=False, n_rows=4, fps=3):
    videos = rearrange(videos, "b c t h w -> t b c h w")
    outputs = []
    for x in videos:
        x = torchvision.utils.make_grid(x, nrow=n_rows)
        x = x.transpose(0, 1).transpose(1, 2).squeeze(-1)
        if rescale:
            x = (x + 1.0) / 2.0  # -1,1 -> 0,1
        x = (x * 255).numpy().astype(np.uint8)
        outputs.append(x)

    os.makedirs(os.path.dirname(path), exist_ok=True)
    imageio.mimsave(path, outputs, fps=fps)
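
A likely cause: os.path.dirname(f"{prompt}.gif") is the empty string when the path has no directory component, and os.makedirs('') raises FileNotFoundError. A minimal workaround sketch (not code from the repo):

import os

# Only create the parent directory when the path actually has one, since
# os.path.dirname("a panda is surfing.gif") == "" and os.makedirs("") fails.
def ensure_parent_dir(path: str) -> None:
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)

Alternatively, pass a path with an explicit directory, e.g. f"./outputs/{prompt}.gif".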

Colab

Is it possible that you could make your code available as a Google Colab? I find that the most accessible interface.

Model download error: how do I fix it?

We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like ./checkpoints/stable-diffusion-v1-4 is not the path to a directory containing a scheduler_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.
The URL above is accessible from my machine, yet the error above still occurs. How can I fix this?
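
One possible fix (assuming huggingface.co is reachable and a recent huggingface_hub is installed): download the diffusers-format repository, which contains the scheduler/scheduler_config.json file the error refers to, into the local directory the configs expect.

from huggingface_hub import snapshot_download

# Fetch the diffusers-format Stable Diffusion v1-4 weights into the path used
# by the training configs.
snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    local_dir="./checkpoints/stable-diffusion-v1-4",
)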

How to train with video size 256*256

Hi there! Due to limited GPU memory, training triggers OOM, so I switched to training with video size 256*256 and batch_size=1. However, this leads to an error like this:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss.
You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 1: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

I'm not quite sure what to do about this. Any advice on this issue? Thanks!
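
One possible direction, as the error message itself suggests (an assumption, not necessarily the intended fix), is to pass find_unused_parameters=True through Accelerate's DDP kwargs handler in the training script:

from accelerate import Accelerator
from accelerate.utils import DistributedDataParallelKwargs

# Let DDP tolerate parameters that receive no gradient in a given step.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])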

Model compatibility.

I tried about 4 different SD models and none of them worked, but it works with the standard SD 1.4 model. Any help? How can I use different custom SD models?

Python version

Thanks for the great work. May I know which version of Python you are using?

video prediction

Hello, your work is very appealing. Can this method be used for video frame prediction?

attention_mask is None during training

The paper mentions that the ST-Attn layer only attends to the previous frame and the first frame. But in the code it seems that during training, the attention_mask for the SC-Attn layer is None. Could you explain why this setting is not the same as in the paper?
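
For illustration, here is a minimal sketch of how a sparse-causal constraint can be enforced by gathering keys/values from the first and previous frames instead of applying an attention_mask (a sketch only, not the repo's exact code):

import torch
from einops import rearrange

# If keys/values come only from frame 0 and frame t-1, the restriction is built
# into the key/value tensors themselves, so no attention_mask is required.
def sparse_causal_kv(key: torch.Tensor, value: torch.Tensor, video_length: int):
    # key/value: [(batch * frames), tokens, dim]
    key = rearrange(key, "(b f) n d -> b f n d", f=video_length)
    value = rearrange(value, "(b f) n d -> b f n d", f=video_length)
    first = [0] * video_length                  # every frame sees frame 0
    prev = [0] + list(range(video_length - 1))  # ...and its previous frame
    key = torch.cat([key[:, first], key[:, prev]], dim=2)
    value = torch.cat([value[:, first], value[:, prev]], dim=2)
    key = rearrange(key, "b f n d -> (b f) n d")
    value = rearrange(value, "b f n d -> (b f) n d")
    return key, value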

Pretraining model download

I can only download the .ckpt weight files, and then I get:
raise EnvironmentError(f"It looks like the config file at '{config_file}' is not a valid JSON file.")
OSError: It looks like the config file at './checkpoints/stable-diffusion-v1-4/sd-v1-4.ckpt' is not a valid JSON file.

In the training step, do you shuffle clips?

Hi, thank you first of all for the amazing work.

I have a few questions about your training process.
(1) Did you fix the number of frames (clips) at 24? In every config file, the clip length is consistently 24.
Does that imply that any number larger or smaller than 24 doesn't perform as well as 24?

(2) In the training step, do you shuffle the order of frames (clips)? I have a feeling that it isn't proper to shuffle the frames, because the frame-related attention parts learn the order of frames as well.

Thank you again.

set_use_memory_efficient_attention_xformers()

Hello. Thank you for sharing this great work!

When I run this project, I encountered the following error.

TypeError: set_use_memory_efficient_attention_xformers() takes 2 positional arguments but 3 were given

It seems to be an error from diffusers, but I'm asking for help in this project since I installed the latest version of diffusers (0.13.0), and the previous version (0.11.1) also raises the error.

Any help would be appreciated. Thank you!

OSError: Error

Hello,

When I run on Colab I get this error:

OSError: Error no file named diffusion_pytorch_model.bin found in directory

Do you have any tutorial on how to use the Colab?

Best Regards

Improved Consistency using DDIM inversion (?)

Hi, thank you so much for the impressive work!

I noticed there was a 'News' update about 'Improved consistency through DDIM inversion'.
Can you explain a bit more about this update? What I understood is: before, DDPM inversion (DDPM forward and reverse); after, DDIM inversion (DDIM forward and reverse).
Am I right?
Also, is the DDIM sampler then used in both fine-tuning and inference?

Thank you again for the nice work.
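
For context, a minimal sketch of what a DDIM inversion loop typically looks like (illustrative only; the repo's implementation may differ):

import torch

# Deterministically map the clean source latents back to noise by applying the
# DDIM update in reverse timestep order; sampling from the inverted noise then
# preserves the source video's structure better than random DDPM noise.
@torch.no_grad()
def ddim_invert(unet, scheduler, latents, text_embeddings, num_steps=50):
    scheduler.set_timesteps(num_steps)
    alphas = scheduler.alphas_cumprod.to(latents.device)
    timesteps = scheduler.timesteps.flip(0)  # small t -> large t
    x = latents
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        eps = unet(x, t, encoder_hidden_states=text_embeddings).sample
        a_t, a_next = alphas[t], alphas[t_next]
        pred_x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * pred_x0 + (1 - a_next).sqrt() * eps
    return x  # approximately the DDIM noise corresponding to the input latents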

ask for help

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.70 GiB total capacity; 8.31 GiB already allocated; 254.06 MiB free; 8.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My server environment is shown in the attached screenshot.
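
A few possible mitigations (assumptions, not official guidance) are sketched below; the allocator hint comes straight from the error message and must be set before CUDA is initialized:

import os

# Set at the very top of train_tuneavideo.py or in the shell environment,
# before torch initializes CUDA.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# Lowering n_sample_frames / width / height in the config, enabling gradient
# checkpointing, and installing xformers also reduce peak GPU memory.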

About Dataset

The paper mentions 1024 video samples. If it is convenient, could you tell me where the dataset can be obtained?

Pose Control Implementation

Hello,
I was wondering how exactly you guys managed to perform "pose control" with Tune-A-Video? To my knowledge, the process hasn't been outlined in the Tune-A-Video paper.


What is the license for this code?

Hi, I'm thinking of making a gradio demo app for this repo and I'd like to know the code license of this repo. Could you add a LICENSE file?

CUDA error: invalid argument

I have run into some problems when running the code. I think there may be something wrong with "accelerator.backward(loss)" at line 294 of train_tuneavideo.py. I would appreciate your valuable opinions. Thank you.
(screenshot of the error attached)
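
One generic debugging step (not specific to this repo): CUDA errors are reported asynchronously, so forcing synchronous kernel launches usually reveals the operation that actually failed.

import os

# Set before torch initializes CUDA (or export it in the shell); the traceback
# will then point at the kernel that raised the error rather than a later line.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"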

Question about the results

I followed your README, modifying nothing in your code. Here are the final results, which are not that promising. What's more, I find that sample-100, sample-200, and sample-300 did not change much. Can you share how to improve this?

(attached: sample-300)

iKUN

My comment is: Lou Chu Ji Jiao.

Support Stable Diffusion 2

Your work is great! I have easily generated anime with my model, Cool Japan Diffusion 1.x.
(sample images attached)

I would like to generate anime with Stable Diffusion 2.1, because I am developing Cool Japan Diffusion 2.x, which is based on Stable Diffusion 2.1.
Do you plan to support Stable Diffusion 2.1?

Totally bad results, cannot reproduce!

Why can't I reproduce your results? The results are actually quite bad...
The only differences are that I have not installed the xformers package and I reduced n_sample_frames to 12.
(screenshot of the results attached)

I get an error when I try to learn Dreambooth model

Thank you for the great work.
I'm having trouble training a DreamBooth model (and with a normal model as well). The Mr Potato Head example doesn't work either, so I'd like to identify the cause.


$ accelerate launch train_tuneavideo.py --config="configs/mr-potato-head.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:24:30 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16

{'variance_type', 'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
{'use_linear_projection', 'resnet_time_scale_shift', 'num_class_embeds', 'class_embed_type', 'mid_block_type', 'only_cross_attention', 'dual_cross_attention', 'upcast_attention'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
/home/ubuntu/Tune-A-Video/tuneavideo/pipelines/pipeline_tuneavideo.py:82: FutureWarning: The configuration file of this scheduler: DDIMScheduler {
"_class_name": "DDIMScheduler",
"_diffusers_version": "0.12.1",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": true,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"set_alpha_to_one": false,
"skip_prk_steps": true,
"steps_offset": 1,
"trained_betas": null
}
has not set the configuration clip_sample. clip_sample should be set to False in the configuration file. Please make sure to update the config accordingly as not setting clip_sample in the config might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("clip_sample not set", "1.0.0", deprecation_message, standard_warn=False)
02/01/2023 10:24:42 - INFO - main - ***** Running training *****
02/01/2023 10:24:42 - INFO - main - Num examples = 1
02/01/2023 10:24:42 - INFO - main - Num Epochs = 500
02/01/2023 10:24:42 - INFO - main - Instantaneous batch size per device = 1
02/01/2023 10:24:42 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 1
02/01/2023 10:24:42 - INFO - main - Gradient Accumulation steps = 1
02/01/2023 10:24:42 - INFO - main - Total optimization steps = 500
Steps: 0%| | 0/500 [00:00<?, ?it/s]/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 284, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
sample, res_samples = downsample_block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
hidden_states = torch.utils.checkpoint.checkpoint(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
return module(*inputs, return_dict=return_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
hidden_states = block(
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
query = self.reshape_heads_to_batch_dim(query)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'


This environment is as follows:

Tesla V100(32GB)
Python 3.10.9
torch 1.13.1
torchaudio 0.13.1
torchtext 0.14.1
torchvision 0.14.1
transformers 4.26.0

Also, when I tried to train Tune-A-Video with a model I trained myself using the Diffusers examples, I got a different error.

$ accelerate launch train_tuneavideo.py --config="configs/man-surfing.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:07:58 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16

{'variance_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in
main(**OmegaConf.load(args.config))
File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 107, in main
unet = UNet3DConditionModel.from_pretrained_2d(pretrained_model_path, subfolder="unet")
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 440, in from_pretrained_2d
model = cls.from_config(config)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 210, in from_config
model = cls(**init_dict)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 567, in inner_init
init(self, *args, **init_kwargs)
File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 158, in init
raise ValueError(f"unknown mid_block_type : {mid_block_type}")
ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/accelerate", line 8, in
sys.exit(main())
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/python', 'train_tuneavideo.py', '--config=configs/man-surfing.yaml']' returned non-zero exit status 1.


Any hint would be appreciated.

Error no file named scheduler_config.json found

I downloaded the stable-diffusion-v1-4 checkpoints "sd-v1-4-full-ema.ckpt" and "sd-v1-4.ckpt" and put them into the folder "./checkpoints/stable-diffusion-v1-4". However, when I run accelerate launch train_tuneavideo.py --config="configs/man-skiing.yaml", I get the following error:

Traceback (most recent call last):
  File "/home/ubuntu/project/Tune-A-Video/train_tuneavideo.py", line 367, in <module>
    main(**OmegaConf.load(args.config))
  File "/home/ubuntu/project/Tune-A-Video/train_tuneavideo.py", line 105, in main
    noise_scheduler = DDPMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")
  File "/opt/conda/envs/makevideo/lib/python3.9/site-packages/diffusers/schedulers/scheduling_utils.py", line 118, in from_pretrained
    config, kwargs = cls.load_config(
  File "/opt/conda/envs/makevideo/lib/python3.9/site-packages/diffusers/configuration_utils.py", line 320, in load_config
    raise EnvironmentError(
OSError: Error no file named scheduler_config.json found in directory ./checkpoints/stable-diffusion-v1-4.

Traceback (most recent call last):
  File "/opt/conda/envs/makevideo/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/makevideo/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/opt/conda/envs/makevideo/lib/python3.9/site-packages/accelerate/commands/launch.py", line 915, in launch_command
    simple_launcher(args)
  File "/opt/conda/envs/makevideo/lib/python3.9/site-packages/accelerate/commands/launch.py", line 578, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/makevideo/bin/python3.9', 'train_tuneavideo.py', '--config=configs/man-skiing.yaml']' returned non-zero exit status 1.

Where can I get scheduler_config.json?

ValueError: CrossAttnDownBlock2D does not exist.

Code:

from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.util import save_videos_grid
import torch

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet3DConditionModel.from_pretrained(model_id, subfolder='unet', torch_dtype=torch.float16).to('cuda')
pipe = TuneAVideoPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", unet=unet, torch_dtype=torch.float16).to("cuda")

prompt = "a panda is surfing"
video = pipe(prompt, video_length=8, height=512, width=512, num_inference_steps=50, guidance_scale=7.5).videos

save_videos_grid(video, f"{prompt}.gif")

Output:

ValueError: CrossAttnDownBlock2D does not exist.
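
One possible explanation (an assumption based on the training script): UNet3DConditionModel.from_pretrained expects a config saved by this repo, with 3D block names such as CrossAttnDownBlock3D, while the original Stable Diffusion unet config uses the 2D block names. The training script loads the base model with from_pretrained_2d instead, e.g.:

from tuneavideo.models.unet import UNet3DConditionModel

# from_pretrained_2d inflates the 2D Stable Diffusion blocks into their 3D
# counterparts; the path below is the local checkpoint directory from the docs.
unet = UNet3DConditionModel.from_pretrained_2d(
    "./checkpoints/stable-diffusion-v1-4", subfolder="unet"
)

from_pretrained (without _2d) appears to be intended for models fine-tuned and saved by this repo, such as the outputs/... directory used in the inference snippet earlier in this list.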

Diffusers issue

I have installed the dev version of diffusers from Hugging Face and I get this error: "ModuleNotFoundError: No module named 'diffusers.modeling_utils'"

How should I solve it?
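
The module path moved between diffusers releases. Pinning diffusers to the version the repo was developed against is probably the simpler fix; alternatively, a version-tolerant import (illustrative) would be:

# Version-tolerant import for the places that use ModelMixin,
# e.g. tuneavideo/models/unet.py.
try:
    from diffusers.modeling_utils import ModelMixin  # older diffusers
except ImportError:
    from diffusers.models.modeling_utils import ModelMixin  # newer diffusers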

Help: how do I solve this?

(video) D:\pythonProject\video>accelerate launch train_tuneavideo.py --config="configs/man-skiing.yaml"
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 0
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
04/11/2023 13:50:51 - INFO - main - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: fp16

Traceback (most recent call last):
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\diffusers\configuration_utils.py", line 326, in load_config
config_file = hf_hub_download(
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\huggingface_hub\utils_validators.py", line 112, in _inner_fn
validate_repo_id(arg_value)
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\huggingface_hub\utils_validators.py", line 160, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './checkpoints/stable-diffusion-v1-4'. Use repo_type argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train_tuneavideo.py", line 367, in
main(**OmegaConf.load(args.config))
File "train_tuneavideo.py", line 105, in main
noise_scheduler = DDPMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\diffusers\schedulers\scheduling_utils.py", line 118, in from_pretrained
config, kwargs = cls.load_config(
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\diffusers\configuration_utils.py", line 363, in load_config
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like ./checkpoints/stable-diffusion-v1-4 is not the path to a directory containing a scheduler_config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.
Traceback (most recent call last):
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\runpy.py", line 87, in run_code
exec(code, run_globals)
File "C:\Users\YFT-IT-002\anaconda3\envs\video\Scripts\accelerate.exe_main
.py", line 7, in
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\accelerate\commands\launch.py", line 923, in launch_command
simple_launcher(args)
File "C:\Users\YFT-IT-002\anaconda3\envs\video\lib\site-packages\accelerate\commands\launch.py", line 579, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\YFT-IT-002\anaconda3\envs\video\python.exe', 'train_tuneavideo.py', '--config=configs/man-skiing.yaml']' returned non-zero exit status 1.
