videoflow's People

Contributors

xiaoyushi97

videoflow's Issues

Domain adaptability

Hi, thank you for the inspiring work!

I am wondering how well the pre-trained model handles domain adaptation (e.g. optical flow estimation on arbitrary real-world video).

Since performance improves a lot when the model is fine-tuned on Sintel and KITTI, how well do you think the pre-trained model would perform on unseen datasets?

If I would like to apply VideoFlow to different video sets, is it possible to leverage the model trained on "C+T+S+K+H"? Otherwise, what method would you suggest for such a scenario?

Question on training code

Great work. I am wondering how you get the bidirectional ground truth from datasets like Sintel, and where in the code you use it, since I see "oneside=True" everywhere. Thanks!

nan value

Good job!

I have a question. When I train the MOF model, I often encounter 'nan' values. Is this normal?

Reproduce results T+S+K+H

Hi, thanks for your code. I wonder which released model can reproduce the T+S+K+H results in Table 1.

Missing of inference results

Thanks for your nice work!
image
Above is my inference result. The first three numbers of the file name are the camera ID. Why is there no forward flow from time 0001 to 0002, and no backward flow from the last frame to the penultimate frame?
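
One possible explanation, stated as an assumption rather than confirmed behaviour: the multi-frame model may estimate flow only for the interior frames of the input sequence (the traceback in a later issue shows the context encoder applied to `images[:, 1:N-1, ...]`, i.e. frames 2 to N-1). Under that assumption the first frame has no forward flow and the last frame has no backward flow. The helper below, `interior_flow_pairs`, is a hypothetical name used only to enumerate which pairs would then exist.

```python
def interior_flow_pairs(num_frames):
    """List 1-indexed (source, target) frame pairs, ASSUMING flow is
    estimated only for interior frames 2 .. N-1 of an N-frame input."""
    if num_frames < 3:
        raise ValueError("need at least 3 frames to have an interior frame")
    interior = range(2, num_frames)  # frames 2 .. N-1
    forward = [(t, t + 1) for t in interior]   # flow from t to t+1
    backward = [(t, t - 1) for t in interior]  # flow from t to t-1
    return forward, backward


forward, backward = interior_flow_pairs(5)
print(forward)   # [(2, 3), (3, 4), (4, 5)]
print(backward)  # [(2, 1), (3, 2), (4, 3)]
```

With five frames this gives no 1-to-2 forward flow and no 5-to-4 backward flow, which matches the pattern described above.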

About evaluating on the Sintel dataset

Hello, I think my question is a bit basic, and I'm very sorry if it bothers you. I made changes to the network and hoped to upload the evaluation results on the Sintel dataset to the official website. However, I was unable to register an account: the webpage kept prompting me with "Domain not enabled." I don't know what caused this. I wrote an email to the maintainers of the Sintel dataset website but received no response. Please forgive my ignorance.

CUDA out of memory

I trained the MOF model on 2x 32GB V100 GPUs with the following parameters:
_CN.batch_size = 4 or 8,
_CN.image_size = [432, 960],
_CN.MOFNetStack.corr_fn = 'default' or 'efficient',
_CN.MOFNetStack.decoder_depth = 6 or 12,
with the other parameters left at their defaults in sintel_multiframes.py.
But I get this error:

Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/Flow3D/VideoFlow/core/Networks/MOFNetStack/network.py", line 172, in forward
cnet = self.cnet(images[:, 1:N-1, ...].reshape(B*(N-2), 3, H, W))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/Flow3D/VideoFlow/core/Networks/encoders.py", line 31, in forward
x = blk(x, size)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/timm/models/twins.py", line 217, in forward
x = x + self.drop_path(self.attn(self.norm1(x), size))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/timm/models/twins.py", line 188, in forward
attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 926.00 MiB (GPU 0; 31.75 GiB total capacity; 29.04 GiB already allocated; 863.69 MiB free; 29.68 GiB reserved in total by PyTorch)

While debugging the script, I found that the error occurs at this line:
'''
with autocast(enabled=self.cfg.mixed_precision):
net, motion_hidden_state, up_mask, delta_flow = self.update_block(net, motion_hidden_state, inp, forward_corr, backward_corr, forward_flow, backward_flow, forward_coords0, attention, bs=B)
'''

How should I resolve this issue?


Training the BOF model gives the same error:
File "train_BOFNet.py", line 83, in train
flow_predictions = model(images, output)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/Flow3D/VideoFlow/core/Networks/BOFNet/network.py", line 103, in forward
corr_fn_23 = CorrBlock(fmap2, fmap3, num_levels=self.cfg.corr_levels, radius=self.cfg.corr_radius)
File "/workspace/Flow3D/VideoFlow/core/Networks/BOFNet/corr.py", line 82, in __init__
corr = CorrBlock.corr(fmap1, fmap2)
File "/workspace/Flow3D/VideoFlow/core/Networks/BOFNet/corr.py", line 121, in corr
corr = torch.matmul(fmap1.transpose(1, 2), fmap2)
RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 31.75 GiB total capacity; 29.41 GiB already allocated; 225.69 MiB free; 30.30 GiB reserved in total by PyTorch)
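
A rough back-of-the-envelope estimate may help explain the failing allocation. The `torch.matmul(fmap1.transpose(1, 2), fmap2)` call in the traceback builds an all-pairs correlation volume whose size grows with the square of the number of feature positions. The sketch below assumes float32 values and feature maps at 1/8 of the input resolution (a common choice in RAFT-style models, not confirmed here for VideoFlow); `corr_volume_bytes` is a hypothetical helper name.

```python
def corr_volume_bytes(height, width, batch=1, stride=8, bytes_per_value=4):
    """Bytes for one all-pairs correlation volume of shape (batch, HW, HW).

    ASSUMPTIONS: features at 1/stride of the input resolution, float32 values.
    """
    positions = (height // stride) * (width // stride)
    return batch * positions * positions * bytes_per_value


one_sample = corr_volume_bytes(432, 960)
batch_of_8 = corr_volume_bytes(432, 960, batch=8)
print(f"{one_sample / 2**20:.2f} MiB")  # 160.18 MiB
print(f"{batch_of_8 / 2**30:.2f} GiB")  # 1.25 GiB
```

Under these assumptions, a batch of 8 at 432x960 needs about 1.25 GiB for a single volume, which is consistent with the size of the failed allocation in the BOF traceback. Halving the batch size halves this figure, while halving both image dimensions reduces it by a factor of 16.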

Strange pattern in smooth areas (e.g., sky)

Great work! I'm using VideoFlow to extract optical flow for video generation. However, when estimating optical flow in static regions, I find that strange patterns appear in smooth areas (e.g., sky).
The 5 input key frames:
https://github.com/XiaoyuShi97/VideoFlow/assets/30406852/7c150287-eddc-408b-9f62-82612c03c19e

Then the strange pattern will appear in the sky:
zflow_4_to_5
zflow_2_to_1

How can I avoid this phenomenon?
Can I change some hyperparameters to make the motion in the sky smoother?
Or should I retrain the model?

Looking forward to your reply

Generalization error on KITTI

Hi, thanks for sharing your impressive work. I have downloaded MOF_things.pth and changed _CN.model in configs/multiframes_sintel_submission.py accordingly. But after running the command python -u evaluate_MOFNet.py --dataset=kitti, I get Validation KITTI: 4.284974, 15.392534, which is much worse than the results reported in the paper. How can I reproduce the generalization results on KITTI (train) from Table 1 in the paper?
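
For readers comparing these numbers with the paper: in RAFT-style evaluation scripts, the two values printed after "Validation KITTI" are typically the average end-point error (EPE) and the Fl-all outlier percentage. That this script follows the same convention is an assumption. Below is a minimal sketch of both metrics using the standard KITTI outlier rule (error above 3 px and above 5% of the ground-truth magnitude); `kitti_metrics` is a hypothetical name and operates on plain lists of valid-pixel flow vectors.

```python
import math


def kitti_metrics(pred, gt):
    """Return (EPE, Fl-all in percent) for lists of (u, v) flow vectors.

    A pixel counts as an outlier when its end-point error exceeds both
    3 px and 5% of the ground-truth flow magnitude (KITTI 2015 rule).
    """
    errors = []
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        err = math.hypot(pu - gu, pv - gv)  # end-point error for this pixel
        mag = math.hypot(gu, gv)            # ground-truth flow magnitude
        errors.append(err)
        if err > 3.0 and err > 0.05 * mag:
            outliers += 1
    return sum(errors) / len(errors), 100.0 * outliers / len(errors)


pred = [(1, 0), (10, 0), (104, 0), (0, 0)]
gt = [(0, 0), (5, 0), (100, 0), (0, 2)]
print(kitti_metrics(pred, gt))  # (3.0, 25.0)
```

In the example, the third pixel has a 4 px error but is not an outlier because 4 px is below 5% of its 100 px ground-truth motion.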

Computing resource

Good job! What kind of hardware did you use, and how many days does training take with your published training strategy? The answer in a closed issue is not very detailed, so please allow me to ask once more.

CUDA out of memory when running inference.py

I'm very sorry to bother you. I have followed all the memory-reduction settings suggested in previous issues, but I still get CUDA out of memory. Is there a way to run inference.py on multiple GPUs to avoid this? Any response would be of great help to me, and I look forward to your reply.
image

future flow prediction

Amazing work, thanks for sharing. What would happen if a model was trained with the same 3 input frames but with the output flow from the 3rd to the 4th frame? Could it potentially do future flow prediction?

pretrained model of KITTI

Hi, thanks for sharing. I wanted to evaluate on KITTI, will you provide the pretrained model of KITTI? Thanks!

KITTI multi-view set

Thank you for your impressive VideoFlow!

Since the original KITTI is based on two-frame only, could you share your multi-frame/view KITTI data? Thanks!

Unable to find dataset.py

Hi, I am trying to run training, but every time I am getting PATH_TO_FINAL/final error. Though I have created a datasets folder and inside that folder I placed Sintel and KITTI datasets.

Guide me please.

about Compute resource

Great work! I'm curious about the GPUs you used to finish the experiment. Can you tell me more about it?

Consultation on color coding of output images

Hello,

I'm new to optical flow and using inference.py for my image processing. I'm struggling to interpret the colors in the output images, specifically regarding motion direction and speed. Could you please provide some guidance or documentation on how to read these color mappings? If you could give me some help, I would be extremely grateful.
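
As general background (not a description of what inference.py does): flow visualizations usually encode motion direction as hue and speed as colour intensity, so zero motion appears white. The sketch below uses a plain HSV mapping to illustrate the idea for a single flow vector. The exact palette in this repository's output may differ, since many RAFT-derived codebases use the Middlebury colour wheel instead. `flow_vector_to_rgb` and `max_speed` are hypothetical names.

```python
import colorsys
import math


def flow_vector_to_rgb(u, v, max_speed):
    """Map one flow vector to an 8-bit RGB colour with a simple HSV scheme.

    Hue encodes direction, saturation encodes speed relative to max_speed.
    This is an illustrative convention, not the repository's exact palette.
    """
    angle = math.atan2(v, u)                 # direction in radians
    hue = (angle / (2 * math.pi)) % 1.0      # wrap direction into [0, 1)
    saturation = min(math.hypot(u, v) / max_speed, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))


print(flow_vector_to_rgb(0, 0, 10))   # (255, 255, 255): no motion is white
print(flow_vector_to_rgb(10, 0, 10))  # (255, 0, 0): fast rightward motion
```

Reading an image under such a scheme, the colour family tells you which way a pixel moves and the vividness tells you how fast.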

Thank you!

video test

Could you please explain how you conduct video testing?

long sequences

First of all thanks for this excellent work! I am a physics researcher (fluid mechanics) and am currently investigating how the measure of optical flow can help extract measurements on experimental videos. Thus, I am only interested in the inference module, using the Sintel pretrained model.

I managed to configure the code so that computations can run on my images, but I run into a memory problem whenever the image sequences are too long. If I understand correctly, the flow is computed over 5 images, but the whole image sequence is stacked and then fed to the model. Is there a clever way to send the images in groups of 5 and collect the results progressively instead of in one huge variable?
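
One way to approach this, sketched under assumptions: split the sequence into overlapping windows of 5 frames and run the model once per window, keeping only per-frame results. The sketch assumes each window yields one result per interior frame (so consecutive windows advance by `window - 2` frames), and `run_model` is a hypothetical callable standing in for the actual inference call, which is not shown in this thread.

```python
def sliding_windows(num_frames, window=5):
    """Return (start, stop) index ranges whose interior frames cover
    every interior frame of the full sequence."""
    if num_frames < window:
        raise ValueError("sequence is shorter than one window")
    stride = window - 2  # each window contributes window - 2 interior frames
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)  # final window flush with the end
    return [(s, s + window) for s in starts]


def process_in_windows(frames, run_model, window=5):
    """Run a hypothetical run_model on each window and collect results by
    absolute frame index, skipping frames already covered by an earlier window."""
    results = {}
    for start, stop in sliding_windows(len(frames), window):
        outputs = run_model(frames[start:stop])  # one output per interior frame
        for offset, out in enumerate(outputs, start=1):
            results.setdefault(start + offset, out)
    return results


print(sliding_windows(12))  # [(0, 5), (3, 8), (6, 11), (7, 12)]
```

In practice each window's output could be moved off the GPU or written to disk inside the loop, so memory use stays bounded by one window rather than by the sequence length.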

higher cuda and torch version

Hi,
I have another problem. I'm using CUDA 11.6 + torch 1.12. When I run BOF mode with 3 images, I get this output:

[Using twins as context encoder]
[Using twins as feature encoder]
[Using GMA-SK2]
[Using corr_fn default]
VideoFlow_ckpt/BOF_kitti.pth
Parameter Count: 12659389
preparing image...
Input image sequence dir = /home/sti/Downloads/VideoFlow/test/images
/home/sti/anaconda3/envs/limap/lib/python3.9/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1656352657443/work/aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

Do I have to use the exact torch/torchvision/cudatoolkit versions? I think that's pretty difficult, since these version combinations are a little old.
My configs/kitti.py file looks like this:

from yacs.config import CfgNode as CN
_CN = CN()

_CN.name = ''
_CN.suffix =''
_CN.gamma = 0.85
_CN.max_flow = 400
_CN.batch_size = 8
_CN.sum_freq = 100
_CN.val_freq = 100000000
_CN.image_size = [1080, 720]
_CN.add_noise = False
_CN.use_smoothl1 = False
_CN.critical_params = []

_CN.network = 'BOFNet'
_CN.mixed_precision = False
_CN.filter_epe = False


_CN.restore_ckpt = None

_CN.model = "VideoFlow_ckpt/BOF_kitti.pth"



_CN.BOFNet = CN()
_CN.BOFNet.pretrain = True
_CN.BOFNet.cnet = 'twins'
_CN.BOFNet.fnet = 'twins'
_CN.BOFNet.gma = 'GMA-SK2'
_CN.BOFNet.corr_fn = "default"
_CN.BOFNet.mixed_precision = False

_CN.BOFNet.decoder_depth = 12
_CN.BOFNet.critical_params = ["cnet", "fnet", "pretrain", "corr_fn", "mixed_precision"]

### TRAINER
_CN.trainer = CN()
_CN.trainer.scheduler = 'OneCycleLR'
_CN.trainer.optimizer = 'adamw'
_CN.trainer.canonical_lr = 12.5e-5
_CN.trainer.adamw_decay = 1e-4
_CN.trainer.clip = 1.0
_CN.trainer.num_steps = 80000
_CN.trainer.epsilon = 1e-8
_CN.trainer.anneal_strategy = 'linear'
def get_cfg():
    return _CN.clone()

I really want to see the performance of this repo, but I haven't been able to get it working no matter what I try 😂

minimum GPU memory

Hi,
I'm using an RTX 2080 with 8 GB of memory, and I get a "CUDA out of memory" error even after reducing the number of images to 5 in MOF mode. What is the minimum GPU memory needed for inference to work?
Thanks~
