rehglab / rave

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models - CVPR 2024 - Official Repo

Home Page: https://rave-video.github.io/

License: MIT License

Shell 0.09% Python 94.94% C++ 1.64% Cuda 3.01% Dockerfile 0.02% CMake 0.12% Jupyter Notebook 0.19%

diffusion stable-diffusion video-editing

rave's Issues

What is the minimum memory requirement for this to work?

Hi,

I tried this on Windows with the following hardware:

RTX 4060 8 GB VRAM
16 GB RAM

Input video info

General
Complete name                            : C:\stable_diffusion\input\2.mp4
Format                                   : MPEG-4
Format profile                           : Base Media
Codec ID                                 : isom (isom/iso2/avc1/mp41)
File size                                : 613 KiB
Duration                                 : 1 s 22 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 4 913 kb/s
Frame rate                               : 30.000 FPS
Movie name                               : 1
Recorded date                            : 2024-01-23
Writing application                      : vsdc

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : [email protected]
Format settings                          : CABAC / 3 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 3 frames
Codec ID                                 : avc1
Codec ID/Info                            : Advanced Video Coding
Duration                                 : 1 s 0 ms
Bit rate                                 : 4 843 kb/s
Width                                    : 1 080 pixels
Height                                   : 1 920 pixels
Display aspect ratio                     : 0.562
Frame rate mode                          : Constant
Frame rate                               : 30.000 FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.078
Stream size                              : 591 KiB (96%)
Codec configuration box                  : avcC

Audio
ID                                       : 2
Format                                   : AAC LC
Format/Info                              : Advanced Audio Codec Low Complexity
Codec ID                                 : mp4a-40-2
Duration                                 : 1 s 22 ms
Bit rate mode                            : Variable
Bit rate                                 : 154 kb/s
Maximum bit rate                         : 320 kb/s
Channel(s)                               : 2 channels
Channel layout                           : L R
Sampling rate                            : 44.1 kHz
Frame rate                               : 43.066 FPS (1024 SPF)
Compression mode                         : Lossy
Stream size                              : 19.3 KiB (3%)
Default                                  : Yes
Alternate group                          : 1

LOG

C:\Users\nitin\AppData\Local\Temp\gradio\9209523c0d5ddf7a61c02fd85739a512b19ca994\2.mp4
Frame count: 3
unet\diffusion_pytorch_model.safetensors not found
{   'batch_size': 4,
    'batch_size_vae': 1,
    'cond_step_start': 0,
    'control_path': 'C:\\RAVE\\generated\\data\\controls\\2\\depth_zoe_3x3_1',
    'controlnet_conditioning_scale': 1,
    'controlnet_guidance_end': 1,
    'controlnet_guidance_start': 0,
    'give_control_inversion': True,
    'grid_size': 3,
    'guidance_scale': 7.5,
    'hf_cn_path': 'lllyasviel/control_v11f1p_sd15_depth',
    'hf_path': 'runwayml/stable-diffusion-v1-5',
    'image_pil_list': [   <PIL.Image.Image image mode=RGB size=1152x2040 at 0x196FCA057E0>,
                          <PIL.Image.Image image mode=RGB size=1152x2040 at 0x196FCA07D90>,
                          <PIL.Image.Image image mode=RGB size=1152x2040 at 0x196FCA054B0>],
    'inverse_path': 'C:\\RAVE\\generated\\data\\inverses\\2\\depth_zoe_None_3x3_1',
    'inversion_prompt': '',
    'is_ddim_inversion': True,
    'is_shuffle': True,
    'model_id': 'None',
    'negative_prompts': '(worst quality, low quality:2), monochrome, '
                        'zombie,overexposure, watermark,text,bad anatomy,bad '
                        'hand,extra hands,extra fingers,too many fingers,fused '
                        'fingers,bad arm,distorted arm,extra arms,fused '
                        'arms,extra legs,missing leg,disembodied leg,extra '
                        'nipples, detached arm, liquid hand,inverted '
                        'hand,disembodied limb, small breasts, loli, oversized '
                        'head,extra body,completely nude, extra '
                        'navel,easynegative,(hair between eyes),sketch, '
                        'duplicate, ugly, huge eyes, text, logo, worst face, '
                        '(bad and mutated hands:1.3),  (blurry:2.0), horror, '
                        'geometry, bad_prompt, (bad hands), (missing fingers), '
                        'multiple limbs, bad anatomy, (interlocked '
                        'fingers:1.2), Ugly Fingers, (extra digit and hands '
                        'and fingers and legs and arms:1.4), ((2girl)), '
                        '(deformed fingers:1.2), (long '
                        'fingers:1.2),(bad-artist-anime), bad-artist, bad '
                        'hand, extra legs ,(ng_deepnegative_v1_75t)',
    'num_inference_steps': 20,
    'num_inversion_step': 20,
    'pad': 1,
    'positive_prompts': '1girl dancing',
    'preprocess_name': 'depth_zoe',
    'sample_size': 3,
    'save_folder': '2',
    'save_path': 'C:\\RAVE\\results\\03-14-2024\\2\\1girl dancing-00001',
    'seed': 212805578,
    'video_name': '2',
    'video_path': 'C:\\Users\\nitin\\AppData\\Local\\Temp\\gradio\\9209523c0d5ddf7a61c02fd85739a512b19ca994\\2.mp4'}
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
img_size [384, 512]
Traceback (most recent call last):
  File "C:\RAVE\venv\lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
  File "C:\RAVE\venv\lib\site-packages\gradio\route_utils.py", line 253, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\RAVE\venv\lib\site-packages\gradio\blocks.py", line 1695, in process_api
    result = await self.call_function(
  File "C:\RAVE\venv\lib\site-packages\gradio\blocks.py", line 1235, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\RAVE\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\RAVE\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "C:\RAVE\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "C:\RAVE\venv\lib\site-packages\gradio\utils.py", line 692, in wrapper
    response = f(*args, **kwargs)
  File "C:\RAVE\webui.py", line 143, in run
    res_vid, control_vid = CN(input_dict)
  File "C:\RAVE\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\RAVE\pipelines\sd_controlnet_rave.py", line 400, in __call__
    init_latents_pre = self.encode_imgs(img_batch)
  File "C:\RAVE\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\RAVE\pipelines\sd_controlnet_rave.py", line 208, in encode_imgs
    posterior = self.vae.encode(image).latent_dist
  File "C:\RAVE\venv\lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "C:\RAVE\venv\lib\site-packages\diffusers\models\autoencoder_kl.py", line 236, in encode
    h = self.encoder(x)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RAVE\venv\lib\site-packages\diffusers\models\vae.py", line 139, in forward
    sample = down_block(sample)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RAVE\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1150, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RAVE\venv\lib\site-packages\diffusers\models\resnet.py", line 598, in forward
    hidden_states = self.nonlinearity(hidden_states)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\modules\activation.py", line 396, in forward
    return F.silu(input, inplace=self.inplace)
  File "C:\RAVE\venv\lib\site-packages\torch\nn\functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 8.00 GiB total capacity; 13.57 GiB already allocated; 0 bytes free; 13.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
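
The error message above suggests setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of doing that from Python follows. The helper name configure_cuda_allocator and the value 128 are illustrative assumptions, not part of RAVE. The variable has to be set before the first CUDA allocation. Note that the log reports more allocated memory (13.57 GiB) than the card's 8 GiB capacity, so reducing fragmentation alone may not be enough, and a smaller batch size or resolution may still be needed.

```python
import os


def configure_cuda_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF so the allocator avoids splitting large blocks.

    Hypothetical helper: call it before anything allocates CUDA memory.
    The default of 128 MB is an arbitrary starting point, not a recommendation.
    """
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Apply the setting at the top of the launch script, before the pipeline loads.
print(configure_cuda_allocator(128))
```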

Windows issue

When I click RunAll on Windows, I get:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'D:\\Tests\\RAVE/results/12-10-2023/woman_test/C:'
It looks as if C: is always appended to the path, which causes the script to fail.
Full error:

Traceback (most recent call last):
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
  File "D:\Tests\RAVE\webui.py", line 114, in run
    input_ns = init_paths(input_ns)
  File "D:\Tests\RAVE\webui.py", line 34, in init_paths
    os.makedirs(save_dir, exist_ok=True)
  File "D:\Python\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "D:\Python\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "D:\Python\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 5 more times]
  File "D:\Python\lib\os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'D:\\Tests\\RAVE/results/12-10-2023/woman_test/C:'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
  File "D:\Tests\RAVE\voc_rave\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: None
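
The failing path ends in a C: component, which suggests (this is an inference from the error text, not something the source confirms) that a full Windows path was used where only a file name was expected. A minimal sketch of deriving the folder name with pathlib instead of splitting on "/" follows. The helpers video_stem and build_save_dir and the example paths are hypothetical, not RAVE functions.

```python
from pathlib import PureWindowsPath


def video_stem(video_path: str) -> str:
    """Return the file name without directory, drive, or extension.

    PureWindowsPath understands both '\\' and '/' separators and drive
    letters, and it works on any operating system.
    """
    return PureWindowsPath(video_path).stem


def build_save_dir(root: str, date: str, save_folder: str, video_path: str) -> str:
    """Join the output directory from components that contain no drive letter."""
    return str(PureWindowsPath(root, "results", date, save_folder, video_stem(video_path)))


# Hypothetical inputs shaped like the paths in the traceback.
print(build_save_dir(r"D:\Tests\RAVE", "12-10-2023", "woman_test", r"C:\Users\me\clip.mp4"))
```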

Combine grid and shuffled latents with the TokenFlow pipeline

I see a batched_denoise_step method in your sd_controlnet_rave pipeline, similar to the one in the TokenFlow pipeline. Can I use TokenFlow's keyframe-based interpolation mechanism with latents produced by the grid and shuffle method?

local edit

Hi,
Thank you for your great work.
Which control are you using for the local editing?

Thanks,
Rotem

too slow to process a short video

Why does RAVE never converge? It took more than 4 hours to process a short 10-second video of 300 frames (720 x 1280). It is simply not usable in the real world. Are there any optimization options? Thanks.

Where can I find the code for WarpSSIM metric?

Thank you for the awesome work!

As noted in the title, I cannot find the implementation of the WarpSSIM evaluation. Could you point me to where it is implemented?

Thanks

Support for longer video length

How many frames does this method support?

Does not work on Windows 11. It creates the folder name from the prompt.

Hello,

I have added the source video

Prompt: (masterpiece, top quality, best quality, official art, beautiful and aesthetic:1.2), (1girl), extreme detailed,(fractal art:1.3),colorful,highest detailed, dancing

Negative prompt: (worst quality, low quality:2), monochrome, zombie,overexposure, watermark,text,bad anatomy,bad hand,extra hands,extra fingers,too many fingers,fused fingers,bad arm,distorted arm,extra arms,fused arms,extra legs,missing leg,disembodied leg,extra nipples, detached arm, liquid hand,inverted hand,disembodied limb, small breasts, loli, oversized head,extra body,completely nude, extra navel,easynegative,(hair between eyes),sketch, duplicate, ugly, huge eyes, text, logo, worst face, (bad and mutated hands:1.3), (blurry:2.0), horror, geometry, bad_prompt, (bad hands), (missing fingers), multiple limbs, bad anatomy, (interlocked fingers:1.2), Ugly Fingers, (extra digit and hands and fingers and legs and arms:1.4), ((2girl)), (deformed fingers:1.2), (long fingers:1.2),(bad-artist-anime), bad-artist, bad hand, extra legs ,(ng_deepnegative_v1_75t)

Model id: SD 1.5

C:\RAVE>.\venv\scripts\activate

(venv) C:\RAVE>python webui.py
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
ControlNet preprocessor location: C:\RAVE\pretrained_models
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://9f0b9b6176b4a3f6c2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from Terminal to deploy to Spaces (https://huggingface.co/spaces)

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\RAVE\results\03-14-2024\2\(masterpiece, top quality, best quality, official art, beautiful and aesthetic:1.2), (1girl), extreme detailed,(fractal art:1.3),colorful,highest detailed, dancing-00001'

Why does it create the folder name from the prompt?
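
The path in the error contains a colon from the prompt (for example "aesthetic:1.2"), and a colon is not allowed in Windows file or directory names. A minimal sketch of sanitizing a prompt before using it as a folder name follows. The helper safe_dir_name and the 60-character limit are illustrative assumptions, not RAVE code, and reserved device names such as CON are not handled here.

```python
import re

# Characters Windows rejects in file and directory names, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dir_name(prompt: str, max_len: int = 60) -> str:
    """Turn free text into a name that is valid as a single path component."""
    cleaned = _INVALID_CHARS.sub("_", prompt)
    # Windows also rejects names that end with a space or a dot.
    cleaned = cleaned[:max_len].rstrip(" .")
    return cleaned or "untitled"


print(safe_dir_name("(fractal art:1.3), dancing"))
```

Truncating the name also keeps the full path well below the length limits that long prompts could otherwise approach.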

zoedepth/KeyError: 'version_name'

File "F:\rave\RAVE\annotator\zoe\zoedepth\utils\config.py", line 384, in get_config
version_name = overwrite_kwargs.get("version_name", config["version_name"])
KeyError: 'version_name'

I tried switching to the lineart ControlNet. It works, but then it runs out of memory.
It looks like my cheap GPU can't handle it.
Also, you list xformers in the requirements, but I don't see it being used in any command line.
........................

A little feedback, since the other thread is closed

You seem to have forgotten to put gradio in the requirements.

If anyone gets a bash error, don't forget to add git to the PATH environment variable.

For me, when converting a model, I had to manually create the "diffusers_models" folder and the "model ID" folder inside the "CIVIT_AI" folder;
otherwise the script does not work.

In the web UI, I don't know how to change the model; it just shows sd1.5. When I tried to run it blindly, it gave an error like:
mkdir(name, mode)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'F:\rave\RAVE/results/12-09-2023/F:'

Maybe I need to change the root folder.
........................
Too bad, it looks like I have to wait for a mid-VRAM version.
I hope this helps. Keep it up!

Many "not found" or "No matching" errors in requirements.txt

I'm not sure what I did wrong, but I get many errors when installing the requirements, for example for:
matplotlib==3.8.2
caffe2==0.8.1
scipy==1.11.4
mc==0.0
skimage==0.0
tensorflow==2.15.0.post1
and maybe more

What is the minimum VRAM usage requirement of RAVE?

RuntimeError: torch.cat(): expected a non-empty list of Tensors

After running the demo notebook with lineart_realistic on a custom video, I get this error:

RuntimeError Traceback (most recent call last)

in <cell line: 1>()
----> 1 res = run(input_ns)
2 save_dir_name = 'animation.16'
3 save_dir = f'assets/notebook-generated/{save_dir_name}'
4 os.makedirs(save_dir, exist_ok=True)
5 if len(res) == 3:

3 frames

/content/RAVE/pipelines/sd_controlnet_rave.py in process_image_batch(self, image_pil_list)
313 control_torch_list.append(control_image)
314 image_torch_list.append(ipu.pil_img_to_torch_tensor(image_pil))
--> 315 control_torch = torch.cat(control_torch_list, dim=0).to(self.device)
316 img_torch = torch.cat(image_torch_list, dim=0).to(self.device)
317 torch.save(control_torch, os.path.join(self.controls_path, 'control.pt'))

RuntimeError: torch.cat(): expected a non-empty list of Tensors
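
torch.cat() raises this error when the list it receives is empty, which here would mean that no frames reached process_image_batch. A minimal sketch of a guard that fails earlier with a clearer message follows. The helper require_non_empty is hypothetical and not part of RAVE, and why the list is empty for this particular video is not known from the report.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def require_non_empty(items: Sequence[T], what: str) -> Sequence[T]:
    """Return items unchanged, or raise a descriptive error if there are none."""
    if len(items) == 0:
        raise ValueError(
            f"{what} is empty; check that the input video could be read "
            "and that frames were extracted from it"
        )
    return items


# Example: validate the frame list before building tensors from it.
frames = require_non_empty(["frame0", "frame1"], "image_pil_list")
print(len(frames))
```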

Windows: How to save models within RAVE installation directory instead of huggingface cache

Hello,

I managed to install successfully on Windows. The installation steps are easy to follow, thanks for that.

During launch, it downloads a lot of models, configs, etc. and puts them in C:\Users\<username>\.cache\huggingface\hub

I have a lot of installed tools related to Stable Diffusion that use the same models, such as:
diffusion_pytorch_model.safetensors:1.45G
vae\diffusion_pytorch_model.safetensors not found
tokenizer/merges.txt:525k
tokenizer/vocab.json:1.06M
vae/diffusion_pytorch_model.bin:335M
text_encoder/pytorch_model.bin:492M
safety_checker/pytorch_model.bin:1.22G
unet/diffusion_pytorch_model.bin:3.44G

To save space, I use the Windows hard link feature. Is there a way to put these inside the RAVE installation directory instead of the cache folder?
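
One possible approach, sketched under assumptions: the Hugging Face libraries read the HF_HOME environment variable to decide where the cache lives, so pointing it at a folder inside the installation directory before those libraries are imported should move the downloads there. The helper use_local_hf_cache and the folder name hf_cache are hypothetical. Whether RAVE's own scripts override the cache location is not confirmed by the source.

```python
import os
from pathlib import Path


def use_local_hf_cache(install_dir: str, subdir: str = "hf_cache") -> Path:
    """Create a cache folder under install_dir and point HF_HOME at it.

    Call this before importing diffusers, transformers, or huggingface_hub,
    because they read the variable when they are first imported.
    """
    cache_dir = Path(install_dir) / subdir
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(cache_dir)
    return cache_dir


# Example: keep downloads next to the script that launches the web UI.
print(use_local_hf_cache(os.getcwd()))
```

The from_pretrained methods in diffusers and transformers also accept a cache_dir argument, which is an alternative when only some models should be stored locally.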