mcg-nju / ema-vfi
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
License: Apache License 2.0
Hi, thank you for your great video frame interpolation model!
I wanted to try it out, and compare it to VFIformer. I had previously created a web-based tool: "VFIformer-WebUI". It made sense to me to make a version of my tool that uses your model instead.
I've created "EMA-VFI-WebUI", a version of my application that uses your 2x interpolation model:
https://github.com/jhogsett/EMA-VFI-WebUI
It makes no changes to your code, and just needs to be overlaid on top. The readme file has details on how to install and run it. I think this will make it easy and fun to use your model and also compare it with VFIformer. My original application is at https://github.com/jhogsett/VFIformer-WebUI.
Hi, thanks for the great research.
At the end of the paper, in 'Limitations and Future Work', you write:
Second, the input of our methods is restricted to two consecutive frames, which results in the inability to leverage information from multiple consecutive frames. In future work, we will attempt to extend our approach to multi-frame inputs without introducing excessive overhead
Would you be able to share any general thoughts on how you would approach this?
Thank you!
Hello,
Is it possible to use EMA-VFI with ROCm and an AMD GPU?
python3.9 demo_2x.py
Traceback (most recent call last):
File "/home/tyra/Downloads/EMA-VFI/demo_2x.py", line 36, in <module>
model = Model(-1)
File "/home/tyra/Downloads/EMA-VFI/Trainer.py", line 17, in __init__
self.device()
File "/home/tyra/Downloads/EMA-VFI/Trainer.py", line 32, in device
self.net.to(torch.device("cuda"))
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 673, in to
return self._apply(convert)
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 387, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 409, in _apply
param_applied = fn(param)
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 671, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/tyra/Downloads/EMA-VFI/env/lib/python3.9/site-packages/torch/cuda/__init__.py", line 170, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
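The traceback above shows Trainer.py moving the network to torch.device("cuda") unconditionally. A minimal sketch of a guarded choice follows; pick_device_name is a hypothetical helper, not part of EMA-VFI. ROCm builds of PyTorch report AMD GPUs through the same torch.cuda interface, so the "no NVIDIA driver" message may mean that a CUDA build of PyTorch is installed instead of a ROCm build. That is an assumption about this setup, not something the traceback confirms.

```python
def pick_device_name():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper, not part of EMA-VFI. On ROCm builds of PyTorch
    the "cuda" device name also addresses AMD GPUs.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all; nothing but the CPU is available.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device_name())
```

The returned name could then be passed to torch.device(...) in place of the hard-coded "cuda" string.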
[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im1.png'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im2.png'): can't open/read file: check file path/integrity
[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('./vimeo_triplet/sequences//im3.png'): can't open/read file: check file path/integrity
Traceback (most recent call last):
File "train.py", line 105, in <module>
train(model, args.local_rank, args.batch_size, args.data_path)
File "train.py", line 62, in train
evaluate(model, val_data, nr_eval, local_rank)
File "train.py", line 74, in evaluate
for _, imgs in enumerate(val_data):
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/_utils.py", line 461, in reraise
raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/wan/anaconda3/envs/vif/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/wan/EMA-VFI-main/dataset.py", line 89, in __getitem__
img0 = torch.from_numpy(img0.copy()).permute(2, 0, 1)
AttributeError: 'NoneType' object has no attribute 'copy'
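The OpenCV warnings above show paths with an empty sequence name ('./vimeo_triplet/sequences//im1.png'), and cv2.imread returns None for a file it cannot read, which is what later fails at img0.copy(). A minimal sketch of a pre-flight check follows, assuming the dataset is laid out as root/sequences/<entry>/im1.png, im2.png, im3.png as the warnings suggest. The name check_triplets is hypothetical and not part of the repository.

```python
import os

FRAME_NAMES = ("im1.png", "im2.png", "im3.png")


def check_triplets(root, entries):
    """Return (blank_indices, missing_paths) for a list of sequence entries.

    `entries` is the list of lines read from a triplet list file.
    Blank lines are reported separately because they produce paths
    such as 'sequences//im1.png'.
    """
    blank, missing = [], []
    for index, entry in enumerate(entries):
        name = entry.strip()
        if not name:
            blank.append(index)
            continue
        for frame in FRAME_NAMES:
            path = os.path.join(root, "sequences", name, frame)
            if not os.path.isfile(path):
                missing.append(path)
    return blank, missing
```

Running this on the list file before training shows whether the problem is a blank line in the list or a wrong --data_path.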
How do I use the pre-trained model to continue training on my dataset?
Hi, congrats on your work! I wonder whether, or when, the training code supporting arbitrary-time interpolation will be open-sourced. Thanks a lot!
Hi, and congratulations on this awesome work!
I have been experimenting with this a bit, and I run out of memory when using the large model on 4K images on a 24GB GPU.
Any ideas on how to process large images with 24GB?
I am going to test splitting the image into four pieces and reassembling them, but I fear there will be seams.
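One common way to reduce seams when tiling is to let the tiles overlap and blend the shared region when reassembling. The sketch below only computes the overlapping tile boxes; the blending step and the model call are left out. The function names, tile size and overlap are illustrative assumptions, and nothing here has been verified against EMA-VFI itself.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels.
    """
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(height, width, tile, overlap):
    """Return (top, left, bottom, right) boxes covering the whole image."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    # A 3840x2160 frame split into four overlapping 2048x2048 tiles.
    for box in tile_boxes(2160, 3840, tile=2048, overlap=128):
        print(box)
```

Each box would be cropped from both input frames, interpolated separately, and the outputs averaged or feathered where boxes overlap. Whether the result is seam-free depends on how consistent the estimated motion is across tiles.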
The first 3 elements of the motion feature list are empty []. Could you explain the reason for such a design? And the first 3 dims of motion features are 0, as shown below:
def init_model_config(F=32, W=7, depth=[2, 2, 2, 4, 4]):
    '''This function should not be modified'''
    return {
        'embed_dims':[F, 2*F, 4*F, 8*F, 16*F],
        'motion_dims':[0, 0, 0, 8*F//depth[-2], 16*F//depth[-1]],
        'num_heads':[8*F//32, 16*F//32],
        'mlp_ratios':[4, 4],
Looking forward to your reply. Thanks.
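To make the question concrete, the sketch below evaluates the expressions visible in the quoted snippet for the default F=32. It reproduces only the arithmetic of the fields shown; the function name visible_config is hypothetical, and the authors' rationale for the zeros is not stated here.

```python
def visible_config(F=32, depth=(2, 2, 2, 4, 4)):
    """Evaluate the fields shown in the quoted init_model_config snippet."""
    return {
        "embed_dims": [F, 2 * F, 4 * F, 8 * F, 16 * F],
        "motion_dims": [0, 0, 0, 8 * F // depth[-2], 16 * F // depth[-1]],
        "num_heads": [8 * F // 32, 16 * F // 32],
    }


if __name__ == "__main__":
    for key, value in visible_config().items():
        print(key, value)
```

With the defaults, only the last two of the five stages get a nonzero motion dimension (64 and 128), which matches the two entries in num_heads and mlp_ratios.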
timm==0.6.11 requires a torch version above 1.7.0.
Can you tell me which version you actually used?
Hello, thank you for your kind words.
I'm reaching out with a question regarding interpolating multiple frames during an experiment. I look forward to your helpful response.
I successfully interpolated n images during inference using ours_t. Now I want to interpolate n images using my own trained pkl file. However, there seems to be a difference between ours_t and ours. How can I generate ours_t?
Did you use a different model for arbitrary-time training? I think you refer to RIFE, but RIFE has a different model (IFNet_m) for arbitrary-time training. I would like to know.
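For context, n-fold interpolation with a model that accepts a timestep usually asks for the n-1 evenly spaced timesteps strictly between the two input frames. A minimal sketch under that assumption follows. The name interpolation_timesteps is hypothetical, and this does not describe how the ours_t checkpoint was trained.

```python
def interpolation_timesteps(n):
    """Timesteps in (0, 1) for the n-1 frames inserted by n-fold interpolation."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [i / n for i in range(1, n)]


if __name__ == "__main__":
    print(interpolation_timesteps(4))
```

A fixed-timestep model corresponds to the single case n=2, where the only timestep is 0.5.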
Dear Authors,
Thank you very much for presenting such a great video interpolation approach!
We are implementing your work and have a question about the training process -- how many GPUs did you use when training your model with the batch size of 32? What's your GPU type?
We would greatly appreciate it if you could clarify.
Best regards
The existing version only includes the fixed-timestep training process. I want to learn the arbitrary-time frame interpolation process, thanks.
Hello author, I tried to output the optical flow and visualize it, but I ran into problems. Which part of the code produces the final optical flow? Optical flow generally has a horizontal and a vertical component, with dimensions (2, H, W), but the flow I can output is a list with dimensions (4, H, W). Which part is the final representation of the optical flow?
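A 4-channel flow in two-frame interpolation code is often two 2-channel flows stacked together, one per input frame. The sketch below splits it on that assumption: channels 0-1 for the first frame and 2-3 for the second. The assumption should be checked against the warp calls in the repository, and the function name is hypothetical.

```python
def split_bidirectional_flow(flow):
    """Split a flow laid out as (4, H, W) into two (2, H, W) halves.

    Works on anything that supports len() and slicing along the first
    axis, for example nested lists, NumPy arrays or tensors.
    """
    if len(flow) != 4:
        raise ValueError("expected 4 channels, got %d" % len(flow))
    return flow[:2], flow[2:4]


if __name__ == "__main__":
    # Toy 4-channel flow of size 2x3 where each channel holds its own index.
    toy = [[[c] * 3 for _ in range(2)] for c in range(4)]
    first, second = split_bidirectional_flow(toy)
    print(len(first), len(second))
```

For a batched tensor of shape (N, 4, H, W), the equivalent slices would be flow[:, :2] and flow[:, 2:4]. Each half then has the usual (x, y) components and can be visualized like any 2-channel flow.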
Hello! I am not an expert in Python or programming, but I am interested in the issue of video interpolation.
I don't understand at all what needs to be done to start interpolation. Did I understand correctly that this program can increase the number of frames in a video, or in a series of frames stored as pictures?
I need help with installation and with moving over to the related project, "EMA-VFI-WebUI".
Try not using timm; then it will be simple to port to C/C++. I find initialization very slow, and I don't know why. In the example it seems that the CPU is used. Is there a special reason for not using the GPU? GPUs are usually used more in video processing.
I've tried the demo: --n=8 took 30s and --n=32 took 2min (excluding initialization time) on my 1650 GPU. The average of 3.75s per frame is much higher than RIFE (VapourSynth-RIFE-ncnn-Vulkan: rife-v4.6, ensemble=True; I used it on a 2h 1080p film from 24 to 60 fps, and the whole process took about 14h, an average of 0.12s per frame). Why is that?
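The per-frame figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the EMA-VFI time was divided by n, and the RIFE time by the number of output frames at 60 fps. Both are assumptions about how the numbers were derived.

```python
def seconds_per_frame(total_seconds, frames):
    """Average processing time per frame."""
    return total_seconds / frames


# EMA-VFI demo runs: 30 s for --n=8 and 2 min for --n=32.
ema_n8 = seconds_per_frame(30, 8)
ema_n32 = seconds_per_frame(2 * 60, 32)

# RIFE run: 14 h for a 2 h film rendered at 60 fps.
film_output_frames = 2 * 3600 * 60
rife = seconds_per_frame(14 * 3600, film_output_frames)

if __name__ == "__main__":
    print(ema_n8, ema_n32, round(rife, 3))  # prints: 3.75 3.75 0.117
```

The two measurements are not directly comparable: the resolution of the demo images is not stated, and the RIFE run used a different runtime (ncnn/Vulkan).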
Hey man, I made an installer for the WebUI, https://github.com/jhogsett/EMA-VFI-WebUI. It creates its own venv in the folder, and then you can just run the WebUI to activate the venv and have fun. I see one issue: I can interpolate just one frame in between. With more frames it takes forever, and there is no progress bar, so I have no clue what's happening.
OK, it finally did interpolate 7 frames in the WebUI, but it's so slow, like snail speed compared to RIFE, which is very fast. Is that normal?
Feel free to include it, no credit needed.
Contents
@echo off
REM Create the virtual environment
python -m venv venv
REM Activate the virtual environment
call venv\Scripts\activate.bat
REM Install dependencies
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install -U scikit-image
pip install numpy
pip install opencv-python
pip install timm
pip install tqdm
pip install gradio
pip install basicsr
pip install realesrgan
echo Virtual environment set up and dependencies installed!
REM Ask if the user wants to run the webui.py script
set /p run_webui="Do you want to run the webui.py script (Y/N)? "
if /i "%run_webui%"=="Y" (
python webui.py
)
pause
As the title says, I get RuntimeError: unmatched '}' in format string
when I try to train the model on my own. I wonder if there is a way to bypass this error.
After typing the command python -m torch.distributed.launch --nproc_per_node=4 train.py --world_size 4 --batch_size 8 --data_path **YOUR_VIMEO_DATASET_PATH**
(I did edit the dataset path), I get the following output in my terminal
Traceback (most recent call last):
...
File "C:\anaconda3\envs\tf2.5\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 538, in _rendezvous
store, group_rank, group_world_size = spec.rdzv_handler.next_rendezvous()
File "C:\anaconda3\envs\tf2.5\lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 55, in next_rendezvous
self._store = TCPStore( # type: ignore[call-arg]
RuntimeError: unmatched '}' in format string
My environment: PyTorch 1.13.1+cu117 / Python 3.8.15 / Windows 10
Is it optimized for generating frames that are more than 2x?