
ema-vfi's People

Contributors

guozhenzhang1999, wanglimin


ema-vfi's Issues

Thank you! Very nice interpolation

Hi, thank you for your great video frame interpolation model!

I wanted to try it out, and compare it to VFIformer. I had previously created a web-based tool: "VFIformer-WebUI". It made sense to me to make a version of my tool that uses your model instead.

I've created "EMA-VFI-WebUI", a version of my application that uses your 2x interpolation model:

https://github.com/jhogsett/EMA-VFI-WebUI

It makes no changes to your code, and just needs to be overlaid on top. The readme file has details on how to install and run it. I think this will make it easy and fun to use your model and also compare it with VFIformer. My original application is at https://github.com/jhogsett/VFIformer-WebUI.

Multi-frame context

Hi, thanks for the great research.

At the end of the paper, in 'Limitations and Future Work', you write:

Second, the input of our methods is restricted to two consecutive frames, which results in the inability to leverage information from multiple consecutive frames. In future work, we will attempt to extend our approach to multi-frame inputs without introducing excessive overhead

Would you be able to share any general thoughts on how you would approach this?

Thank you!

ROCm support

Hello,

Is it possible to use EMA-VFI with ROCm and an AMD GPU?

python3.9 demo_2x.py 
Traceback (most recent call last):
  File "/home/tyra/Downloads/EMA-VFI/demo_2x.py", line 36, in <module>
    model = Model(-1)
  File "/home/tyra/Downloads/EMA-VFI/Trainer.py", line 17, in __init__
    self.device()
  File "/home/tyra/Downloads/EMA-VFI/Trainer.py", line 32, in device
    self.net.to(torch.device("cuda"))
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
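
In principle, yes: PyTorch's ROCm builds expose AMD GPUs through the same torch.cuda / "cuda" device API, so the hard-coded torch.device("cuda") in Trainer.py can work unchanged; the traceback above is what a CUDA-only build prints when no NVIDIA driver is present. A quick sanity check of the installed build (plain PyTorch calls, nothing EMA-VFI-specific):

import torch

# On ROCm wheels, torch.version.hip is set and "cuda" devices map to HIP,
# so torch.cuda.is_available() should return True on a supported AMD GPU.
print(torch.__version__)                    # ROCm wheels carry a +rocm tag
print(getattr(torch.version, "hip", None))  # None on CUDA- or CPU-only builds
print(torch.cuda.is_available())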

This error occurs when I reach the last batch of epoch 2 during training. Is there a problem with the dataset? Should this path (/sequences//im3.png) contain images?

[ WARN:...] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im1.png'): can't open/read file: check file path/integrity
[ WARN:...] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im2.png'): can't open/read file: check file path/integrity
[ WARN:...] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im3.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    train(model, args.local_rank, args.batch_size, args.data_path)
  File "train.py", line 62, in train
    evaluate(model, val_data, nr_eval, local_rank)
  File "train.py", line 74, in evaluate
    for _, imgs in enumerate(val_data):
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wan/EMA-VFI-main/dataset.py", line 89, in __getitem__
    img0 = torch.from_numpy(img0.copy()).permute(2, 0, 1)
AttributeError: 'NoneType' object has no attribute 'copy'

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 894631) of binary: /home/wan/anaconda3/envs/vif/bin/python
Traceback (most recent call last):
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in <module>
    main()
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2023-09-01_10:31:16
  host      : auto
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 894631)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
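
The doubled slash in sequences//imX.png suggests an empty sequence name, typically a trailing blank line in the Vimeo90K list file (tri_trainlist.txt / tri_testlist.txt), which makes cv2.imread return None. A minimal guard, assuming dataset.py reads the list line by line (the variable names here are illustrative):

# Illustrative fix: skip blank lines when reading the Vimeo90K triplet list
# so that no 'sequences//imX.png' path is ever constructed.
with open('./vimeo_triplet/tri_trainlist.txt') as f:
    trainlist = [line.strip() for line in f if line.strip()]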

Arbitrary time training

Hi, congratulations on your work! I wonder whether, or when, the training code supporting arbitrary-time interpolation will be open-sourced. Thanks a lot!

Running out of RAM on large 4K images

Hi, and congratulations on this awesome work!

I have been experimenting with this a bit, and I run out of memory when using the large model on 4K images, even on a 24 GB GPU.

Any ideas on how to process large images on 24 GB?

I am going to try splitting each frame into four pieces and reassembling them, but I fear there will be seams.
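
A common workaround (not specific to EMA-VFI) is to interpolate overlapping tiles and feather-blend them, which hides seams far better than a hard four-way split. A rough sketch, where interpolate(a, b) stands in for whatever call produces the middle frame for a tile pair:

import numpy as np

def tiled_interpolate(img0, img1, interpolate, tile=1024, overlap=128):
    """Interpolate two HxWx3 frames tile by tile with feathered blending."""
    h, w, _ = img0.shape
    out = np.zeros((h, w, 3), dtype=np.float32)
    weight = np.zeros((h, w, 1), dtype=np.float32)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            pred = interpolate(img0[y:y1, x:x1], img1[y:y1, x:x1])
            # Triangular weights: highest at the tile center, so overlapping
            # tiles cross-fade instead of producing hard seams.
            wy = np.minimum(np.arange(y1 - y) + 1, np.arange(y1 - y)[::-1] + 1)
            wx = np.minimum(np.arange(x1 - x) + 1, np.arange(x1 - x)[::-1] + 1)
            mask = np.minimum.outer(wy, wx)[..., None].astype(np.float32)
            out[y:y1, x:x1] += pred.astype(np.float32) * mask
            weight[y:y1, x:x1] += mask
    return (out / weight).astype(img0.dtype)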

About motion features

The first 3 elements of the motion feature list are empty ([]). Could you explain the reason for this design? The first 3 entries of motion_dims are also 0, as shown below:

def init_model_config(F=32, W=7, depth=[2, 2, 2, 4, 4]):
    '''This function should not be modified'''
    return {
        'embed_dims': [F, 2*F, 4*F, 8*F, 16*F],
        'motion_dims': [0, 0, 0, 8*F//depth[-2], 16*F//depth[-1]],
        'num_heads': [8*F//32, 16*F//32],
        'mlp_ratios': [4, 4],

Looking forward to your reply. Thanks.
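
For reference, with the defaults F=32 and depth=[2, 2, 2, 4, 4], those expressions evaluate as follows, so only the last two stages (presumably the attention stages) carry motion features at all:

F, depth = 32, [2, 2, 2, 4, 4]
motion_dims = [0, 0, 0, 8 * F // depth[-2], 16 * F // depth[-1]]
print(motion_dims)  # -> [0, 0, 0, 64, 128]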

How can I generate ours_t?

Hello, thank you for your great work.

I'm reaching out with a question about interpolating multiple frames during an experiment. I look forward to your helpful response.

I successfully interpolated n images during inference using ours_t. Now I want to interpolate n images using my own trained pkl file. However, there seems to be a difference between ours_t and ours. How can I generate ours_t?

About arbitrary time training

Did you use a different model for arbitrary-time training? I think you are referring to RIFE, but RIFE uses a separate model (IFNet_m) for arbitrary-time training. I would like to know.

Question about the number of training GPUs

Dear Authors,

Thank you very much for presenting such a great video interpolation approach!

We are implementing your work and have a question about the training process: how many GPUs did you use when training the model with a batch size of 32, and what GPU type were they?

We would greatly appreciate your clarification.

Best regards

How to run the MotionFormer locally?

Dear author, I tried to run your MotionFormer in my local environment, just as a simple test to see the workflow of the IFA and its outputs. But I ran into the errors below and don't know how to fix them. Could you please help? Thanks.

[error screenshots attached in the original issue]

Output optical flow

Hello author, I tried to output the optical flow features and visualize them, but I ran into problems. Which part of the code holds the final optical flow that I want to output? In general, optical flow has horizontal and vertical components with dimensions (2, H, W), but the flow this model outputs is a list whose elements have dimensions (4, H, W). So which part is the final representation of the optical flow information?
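
If the model follows the RIFE-style convention of stacking two bidirectional flows, the (4, H, W) array splits into flow t->0 (channels 0:2) and flow t->1 (channels 2:4), each a standard 2-channel flow that can be color-coded on its own. A sketch under that assumption:

import numpy as np
import cv2

def visualize_flow(flow):
    """flow: (4, H, W) float32 array, assumed = [flow t->0 (2ch), flow t->1 (2ch)]."""
    images = []
    for f in (flow[0:2], flow[2:4]):
        mag, ang = cv2.cartToPolar(f[0], f[1])       # per-pixel magnitude/angle
        hsv = np.zeros((*f.shape[1:], 3), dtype=np.uint8)
        hsv[..., 0] = ang * 180 / np.pi / 2          # hue encodes direction
        hsv[..., 1] = 255                            # full saturation
        hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
        images.append(cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    return images                                    # two color-coded flow maps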

Help for a newbie

Hello! I am not an expert in Python or programming, but I am interested in video interpolation.
I really don't understand what needs to be done to start interpolating. Did I understand correctly that with this program you can increase the number of frames in a video, or in a series of frames in image format?
I need help with installation, and with moving on to the neighboring topic, "EMA-VFI-WebUI".

Be faster and better

Try not using timm; then it would be simple to port it to C/C++. I find initialization very slow, and I don't know why. In the example it seems the CPU is used; is there a special reason for not using the GPU? GPUs are usually used more in video processing.

I've tried the demo: --n=8 took 30 s and --n=32 took 2 min (excluding initialization time) on my GTX 1650. The average of 3.75 s per frame is much higher than RIFE (VapourSynth-RIFE-ncnn-Vulkan, rife-v4.6 with ensemble=True; I used it on a 2 h 1080p film going from 24 fps to 60 fps, the whole process took about 14 h, an average of 0.12 s per frame). Why is that?
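
To separate the one-time start-up cost from the steady-state per-frame cost, a small timing harness helps; torch.cuda.synchronize() matters because GPU kernels launch asynchronously. The model-building and inference lines below are placeholders for whatever demo_2x.py does:

import time
import torch

t0 = time.perf_counter()
# ... build the EMA-VFI model and load the weights here (the slow start-up) ...
t1 = time.perf_counter()
# ... run a single interpolation here ...
if torch.cuda.is_available():
    torch.cuda.synchronize()   # wait for queued GPU work before timing
t2 = time.perf_counter()
print(f"start-up: {t1 - t0:.2f} s, one frame: {t2 - t1:.2f} s")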

Made a one-click installer and a new bat file to run it

Hey man, I made an installer for it; it's for the WebUI at https://github.com/jhogsett/EMA-VFI-WebUI. It creates its own venv in the folder, and then you can just run the WebUI with that venv and have fun. One issue I see: I can only interpolate one in-between frame; with more frames it just takes forever, and there is no progress bar, so I have no clue what's happening.
OK, it finally did interpolate 7 frames in the WebUI, but it's so slow, snail speed compared to RIFE, which is very fast. Is that normal?

Feel free to include it, no credit needed.
Contents

@echo off

REM Create the virtual environment
python -m venv venv

REM Activate the virtual environment
call venv\Scripts\activate.bat

REM Install dependencies
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install -U scikit-image
pip install numpy
pip install opencv-python
pip install timm
pip install tqdm
pip install gradio
pip install basicsr
pip install realesrgan

echo Virtual environment set up and dependencies installed!

REM Ask if the user wants to run the webui.py script
set /p run_webui="Do you want to run the webui.py script (Y/N)? "
if /i "%run_webui%"=="Y" (
python webui.py
)

pause

EMA-VFI-WebUI install.zip

Encountering RuntimeError: unmatched '}' in format string when training for fixed-timestep interpolation

As the title says, I get RuntimeError: unmatched '}' in format string when I try to train the model myself. I wonder if there is a way to bypass this error.
After typing the command python -m torch.distributed.launch --nproc_per_node=4 train.py --world_size 4 --batch_size 8 --data_path **YOUR_VIMEO_DATASET_PATH** (I did edit the dataset path), I get the following output in my terminal:

Traceback (most recent call last):
  ...
  File "C:\anaconda3\envs\tf2.5\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 538, in _rendezvous
    store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
  File "C:\anaconda3\envs\tf2.5\lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 55, in next_rendezvous
    self._store = TCPStore(  # type: ignore[call-arg]
RuntimeError: unmatched '}' in format string

My environment: PyTorch 1.13.1+cu117 / Python 3.8.15 / Windows 10
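
The error is raised while the TCPStore rendezvous fails on Windows. Independent of that bug, torch.distributed on Windows supports only the gloo backend (NCCL is Linux-only), so if train.py initializes NCCL, the distributed launch cannot work on Windows 10 as-is. A hedged sketch of the backend selection such a port would need, assuming train.py calls init_process_group directly:

import platform
import torch.distributed as dist

# Assumption: the original script hard-codes backend="nccl".
# NCCL is Linux-only; gloo is the only backend supported on Windows.
backend = "gloo" if platform.system() == "Windows" else "nccl"
dist.init_process_group(backend=backend)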
