language-driven-video-inpainting's People

Contributors: jianzongwu
language-driven-video-inpainting's Issues

Code and dataset release

Thank you for your excellent work. Do you have a specific plan to release the code and datasets?

Questions about find_prompt function

Hello, I'm new to MLLMs and would like to ask you some questions. Sorry to bother you.

I don't understand the specific function of find_prompt. Is there any difference between and ?

Why didn't you choose to directly encode the text description generated by LLaVA and then feed it to the inpainting network?

Thank you.

LGVI checkpoint missing keys

Hi, Thank you for your excellent work.

I tried to run LGVI inference for referring expressions using the checkpoint from https://huggingface.co/jianzongwu/lgvi.
I cloned that repo into ./checkpoints and ran the following command as described in the README:

python -m inference_referring \
    --video_path videos/two-birds \
    --ckpt_path checkpoints/lgvi \
    --expr "remove the bird on left"

But I encountered an error like this:
The config attributes {'st_attn': False} were passed to RoviModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/hboh/neurips24/Language_Driven_Video_Inpainting/inference_referring.py", line 86, in
main(args)
File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/hboh/neurips24/Language_Driven_Video_Inpainting/inference_referring.py", line 45, in main
unet = RoviModel.from_pretrained(args.ckpt_path, subfolder='unet')
File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 119, in _inner_fn
return fn(*args, **kwargs)
File "/home/hboh/anaconda3/envs/rovi/lib/python3.9/site-packages/diffusers/models/modeling_utils.py", line 660, in from_pretrained
raise ValueError(
ValueError: Cannot load <class 'rovi.models.unet.RoviModel'> from /home/hboh/neurips24/Language_Driven_Video_Inpainting/checkpoints/lgvi because the following keys are missing:
condition_conv_in.bias, condition_conv_in.weight.
Please make sure to pass low_cpu_mem_usage=False and device_map=None if you want to randomly initialize those weights or else make sure your checkpoint file is correct.

I think something is wrong with the lgvi checkpoint.
When I run the same referring inference with the lgvi-i checkpoint from https://huggingface.co/jianzongwu/lgvi-i, it works fine with the provided videos (two-birds and city-bird).

Thank you for replying in advance! :)

P.S. I would also appreciate your insights on how well the model generalizes to other in-the-wild videos. Your honest feedback would be very helpful.
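The ValueError itself hints at a possible workaround: a minimal sketch, assuming diffusers' `from_pretrained` keyword arguments apply to the repo's `RoviModel`, is to disable the low-memory fast path so the missing `condition_conv_in.{weight,bias}` are randomly initialized instead of aborting the load (whether the resulting outputs are usable depends on the checkpoint actually being correct):

```python
# Workaround sketch, per the ValueError's own hint: with low_cpu_mem_usage
# disabled and no device_map, diffusers randomly initializes missing keys
# rather than raising.
load_kwargs = {
    "subfolder": "unet",
    "low_cpu_mem_usage": False,  # allow random init of missing keys
    "device_map": None,
}
# In inference_referring.py this would become (hypothetical, untested):
# unet = RoviModel.from_pretrained(args.ckpt_path, **load_kwargs)
```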

Missing Pre-Trained VAE

The paper mentions a pre-trained VAE used to extract image features.
Where can we find that model?


Thank You

Finetune help

I want to fine-tune the model for my application. Is there any code I can refer to?

error: metadata-generation-failed

Hi, when installing flash-attn we get the error below. Any input would be a great help.

(myenv) root@statefulset-0:/app/rovi/llm# pip install flash-attn --no-build-isolation
Collecting flash-attn
Using cached flash_attn-2.5.9.post1.tar.gz (2.6 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
fatal: not a git repository (or any of the parent directories): .git
/tmp/pip-install-2qz6v5nx/flash-attn_1a190833f5a84a41b3d68c829c524e4f/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-2qz6v5nx/flash-attn_1a190833f5a84a41b3d68c829c524e4f/setup.py", line 134, in
CUDAExtension(
File "/root/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/root/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/root/miniconda3/envs/myenv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

torch.__version__ = 2.0.1

[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.
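The traceback bottoms out at `CUDA_HOME` being unset: torch's `cpp_extension` needs the CUDA toolkit (including `nvcc`) to compile flash-attn. A minimal sketch of the usual fix, assuming the toolkit is installed at the default `/usr/local/cuda` (adjust the path for your system):

```shell
# Point torch's build helpers at the CUDA toolkit root (assumed default path).
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
# Sanity-check that nvcc is now visible, then retry the install:
#   nvcc --version
#   pip install flash-attn --no-build-isolation
```

Note that, as the UserWarning in the log says, if you are inside a pytorch/pytorch container only the `-devel` image variants ship `nvcc` at all; a `-runtime` image cannot build flash-attn.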

Training Code and Training Cost

Great work! Do you have any plans to open-source the training code for the project? How long does it take to train the model on 8 A100 GPUs?

ModuleNotFoundError: No module named 'llava'

While running the interactive inference I get the error below saying it cannot find the llava module; I'm not sure why it's complaining. Can you please provide any input on this?

(myenv) root@statefulset-0:/app# python -m inference_interactive \
    --video_path videos/city-bird \
    --ckpt_path checkpoints/lgvi-i \
    --request "I have this incredible shot of a pelican gliding in the sky, but there's another bird also captured in the frame. Can you help me make the picture solely about the pelican?"
/root/miniconda3/envs/myenv/lib/python3.9/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Traceback (most recent call last):
File "/root/miniconda3/envs/myenv/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/myenv/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/app/inference_interactive.py", line 14, in
from rovi.pipelines.pipeline_rovi_mllm import RoviPipelineMLLM
File "/app/rovi/pipelines/pipeline_rovi_mllm.py", line 31, in
from rovi.llm.llava.conversation import conv_templates, SeparatorStyle
File "/app/rovi/llm/llava/__init__.py", line 1, in
from .model import LlavaLlamaForCausalLM
File "/app/rovi/llm/llava/model/__init__.py", line 1, in
from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
File "/app/rovi/llm/llava/model/language_model/llava_llama.py", line 27, in
from ..llava_arch import LlavaMetaModel, LlavaMetaForCausalLM
File "/app/rovi/llm/llava/model/llava_arch.py", line 24, in
from llava.constants import IGNORE_INDEX, IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_PATCH_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN
ModuleNotFoundError: No module named 'llava'
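The failing frame does a bare `from llava.constants import ...`, i.e. it assumes `llava` is importable as a top-level package, while the traceback shows it actually lives under `rovi/llm/`. A sketch of one workaround (the `rovi/llm` layout is inferred from the traceback; run from the repo root):

```python
import os
import sys

# Make rovi/llm visible on the module search path so a bare `import llava`
# resolves to rovi/llm/llava instead of raising ModuleNotFoundError.
repo_root = os.path.abspath(".")  # assumption: launched from the repo root
sys.path.insert(0, os.path.join(repo_root, "rovi", "llm"))
```

Equivalently, `export PYTHONPATH="$PWD/rovi/llm:$PYTHONPATH"` before running `python -m inference_interactive` achieves the same thing without editing code.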

Missing requirements.txt?

Hello!
I would like to try your work as part of a potential new paper. However, I'm stuck setting up the project.

Everything is fine until I try to install the dependencies from the requirements.txt file, which does not exist in the repo.
Am I doing something wrong, or are you referring to another requirements.txt file?
For now I have skipped this step, installed the subsequent packages you suggest, and am trying to reverse-engineer the missing packages.

Thanks! :)

(On Ubuntu 22.04 LTS)

Which CUDA version are you using?

I can't install the environment you provide; I think it's because I'm using the wrong CUDA version. Could you tell me which CUDA version you are using?

Many thanks
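For questions like this, it usually helps to post the versions your own environment resolves to alongside the question. A small sketch (`cuda_report` is a hypothetical helper; `torch.version.cuda` is `None` on CPU-only wheels):

```python
import sys

def cuda_report():
    """Collect the version strings maintainers usually ask for."""
    info = {"python": sys.version.split()[0], "torch": None, "torch_cuda": None}
    try:
        import torch  # optional dependency: leave Nones if torch is absent
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA the wheel was built for
    except ImportError:
        pass
    return info

print(cuda_report())
```

The system-side counterpart is `nvcc --version` (toolkit) and `nvidia-smi` (driver), which together with the report above pin down most version mismatches.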
