Hi, thank you for your excellent work.
I tried to run referring-expression inference with LGVI using the checkpoint from https://huggingface.co/jianzongwu/lgvi.
I cloned that repo into `./checkpoints` and ran the following command as described in the README:
```
python -m inference_referring \
    --video_path videos/two-birds \
    --ckpt_path checkpoints/lgvi \
    --expr "remove the bird on left"
```
But I encountered an error like this:
```
The config attributes {'st_attn': False} were passed to RoviModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/hboh/neurips24/Language_Driven_Video_Inpainting/inference_referring.py", line 86, in <module>
    main(args)
  File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/hboh/neurips24/Language_Driven_Video_Inpainting/inference_referring.py", line 45, in main
    unet = RoviModel.from_pretrained(args.ckpt_path, subfolder='unet')
  File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/diffusers/models/modeling_utils.py", line 660, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'rovi.models.unet.RoviModel'> from /home/hboh/neurips24/Language_Driven_Video_Inpainting/checkpoints/lgvi because the following keys are missing:
 condition_conv_in.bias, condition_conv_in.weight.
 Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
```
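For reference, here is a sketch of the fallback the error message itself suggests: passing `low_cpu_mem_usage=False` and `device_map=None` should let diffusers randomly initialize the missing weights so the load succeeds. The helper function below is my own, and with random `condition_conv_in` weights the output would of course not be meaningful, so this can only serve as a diagnostic:

```python
# Sketch of the fallback named in the error message (helper name is my own):
# diffusers randomly initializes missing keys when low_cpu_mem_usage is False
# and device_map is None, so the load itself should go through.

def fallback_load_kwargs(ckpt_path="checkpoints/lgvi"):
    """Keyword arguments for RoviModel.from_pretrained, per the error message."""
    return dict(
        pretrained_model_name_or_path=ckpt_path,
        subfolder="unet",
        low_cpu_mem_usage=False,  # allow missing keys to be randomly initialized
        device_map=None,
    )

# Usage (requires the repo on PYTHONPATH; outputs would use random weights):
# from rovi.models.unet import RoviModel
# unet = RoviModel.from_pretrained(**fallback_load_kwargs())
```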
I think something may be wrong with the lgvi checkpoint itself.
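To check this, I compared the keys named in the error against what the downloaded checkpoint actually contains. The snippet below is a hypothetical diagnostic (the function name and the safetensors file path are my assumptions, not from the repo):

```python
# Hypothetical diagnostic: check whether the keys reported missing are
# actually absent from the downloaded checkpoint's parameter names.

MISSING_IN_ERROR = ("condition_conv_in.bias", "condition_conv_in.weight")

def absent_keys(ckpt_keys, expected=MISSING_IN_ERROR):
    """Return the expected parameter names that the checkpoint does not contain."""
    present = set(ckpt_keys)
    return sorted(k for k in expected if k not in present)

# Usage, assuming the UNet weights are stored as safetensors under
# checkpoints/lgvi/unet (the exact file name may differ):
# from safetensors import safe_open
# with safe_open("checkpoints/lgvi/unet/diffusion_pytorch_model.safetensors",
#                framework="pt") as f:
#     print(absent_keys(f.keys()))
```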
When I run the same referring-expression inference with the lgvi-i checkpoint from https://huggingface.co/jianzongwu/lgvi-i, it works fine on the provided videos (two-birds and city-bird).
Thank you in advance for your reply! :)
P.S. I would also appreciate your insights on how well the model generalizes to other in-the-wild videos. Your honest feedback would be very helpful.