
Comments (16)

lelimite4444 commented on July 18, 2024

Thank you for asking. I trained the Monodepth model for 80 epochs, but 40 epochs already give comparable results, and that takes about 2 days.
For PWC-Net, because of the larger input size (832 x 256, compared to 512 x 256 for Monodepth) and more parameters, it takes about 4 days.

from bridgedepthflow.

zmlshiwo commented on July 18, 2024

Thank you for your reply. I trained on a 1080 Ti, and it was slow. Does this code support CUDA 10?

lelimite4444 commented on July 18, 2024

I've just tried it, and it works on CUDA 10.
One error you may hit is: ModuleNotFoundError: No module named 'correlation_cuda'
Solution: under models/networks/correlation_package, run

python3 setup.py build
python3 setup.py install

zmlshiwo commented on July 18, 2024

Ok, I will try this code on a new TITAN RTX GPU with CUDA 10. Is the PyTorch version still 1.0.0 on CUDA 10? Also, I trained the network and got an error in the data pipeline. After training for a while, around 8,000 iterations, I got the following error:
"OSError: unrecognized data stream contents when reading image file."
My Python version is 3.5, and I changed 'jpg' to 'png' in kitti_train_files_png_4frames.txt because my data is in PNG format.
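That OSError usually means the image reader hit a truncated or corrupted file mid-training. As a rough pre-flight check (a minimal sketch; `find_bad_images` and the path layout are my assumptions, not part of this repo), you could verify every PNG listed in the split file before training, using only cheap structural checks:

```python
import os

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def is_valid_png(path):
    """Cheap structural check: PNG signature at the start, IEND chunk at
    the end. Catches truncated downloads, a common cause of
    'unrecognized data stream contents' errors during training."""
    try:
        with open(path, "rb") as f:
            head = f.read(8)
            f.seek(-12, os.SEEK_END)       # last chunk: length + 'IEND' + CRC
            tail = f.read(12)
    except OSError:
        return False
    return head == PNG_SIG and tail[4:8] == b"IEND"

def find_bad_images(filenames_file, data_root):
    """Scan every path listed in the split file (e.g.
    kitti_train_files_png_4frames.txt) and return the broken ones."""
    bad = []
    with open(filenames_file) as f:
        for line in f:
            for rel in line.split():
                path = os.path.join(data_root, rel)
                if not is_valid_png(path):
                    bad.append(path)
    return bad
```

Re-downloading whatever this reports should stop the crash without touching the training code.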

lelimite4444 commented on July 18, 2024

Yes, still PyTorch 1.0.0, with CUDA 10.
I have tested it for 2 epochs and no error occurred. Maybe you could paste the full error message here, or refer to https://github.com/mrharicot/monodepth to convert the PNGs to JPEGs.

zmlshiwo commented on July 18, 2024

Thank you. I will try it.

zmlshiwo commented on July 18, 2024

@lelimite4444 Hi, I am training the Monodepth network on a TITAN RTX GPU with a batch size of 2 for 80 epochs. The input resolution is 512 x 256. I find that training one epoch takes 2.25 hours, so 80 epochs will take 2.25 * 80 / 24 = 7.5 days. GPU memory usage is only about 6 GB. So, why not increase the batch size and reduce the number of epochs? Thank you.
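The wall-clock estimate above is just epochs x hours-per-epoch / 24; a throwaway helper (purely illustrative, not part of the repo) makes the arithmetic explicit:

```python
def training_days(hours_per_epoch, num_epochs):
    """Estimated wall-clock training time in days."""
    return hours_per_epoch * num_epochs / 24

print(training_days(2.25, 80))  # 80 epochs at 2.25 h/epoch -> 7.5 days
print(training_days(2.25, 40))  # halving the epochs -> 3.75 days
```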

lelimite4444 commented on July 18, 2024

The input resolution of PWC-Net is 832 x 256, which takes about 10 GB, so I just use the same batch size for both. But you could also try a batch size of 3 to reduce the training time. Thanks for the suggestion.

zmlshiwo commented on July 18, 2024

@lelimite4444 Hi,
I have trained the Monodepth network for 80 epochs and get the following results.

For depth, on the KITTI 2015 stereo dataset:

abs_rel, sq_rel, rms, log_rms, d1_all, a1, a2, a3
0.0686, 0.8439, 4.372, 0.150, 9.455, 0.941, 0.978, 0.989

For flow:

On KITTI 2012: EPE-all 2.7403, EPE-noc 1.5549
On KITTI 2015: EPE-all 8.1966, Fl-all 0.3024, EPE-noc 5.4894, Fl-noc 0.2451

Both the depth and flow results are worse than in your paper, so maybe I am missing some details? I used this command to start training:

python3 train.py --data_path /home/ubuntu/Data/KITTI_raw_data/ --filenames_file ./utils/filenames/kitti_train_files_png_4frames.txt --batch_size 2 --num_epochs 80 --checkpoint_path /home/ubuntu/Data/Bridge_depth_flow_model/init/ --type_of_2warp 2

Can you spot any problems with my training setup?

Best,
Zhai

zmlshiwo commented on July 18, 2024

@lelimite4444 I also noticed that in your paper you set the five hyper-parameters (alpha, beta, Lsm, Lr, L2warp) to (0.85, 10, 10, 0.5, 0.2). In train.py, alpha is set to 0.85, beta to 10, Lr to 0.5, and Lsm to 10, so these four values match the paper.
However, L2warp is set to 0.1 in this line:
"loss += 0.1 * sum([warp_2(warp2_est_4[i], left_pyramid[i][[6,7]], mask_4[i], args) for i in range(4)])"
So this L2warp value differs from your paper. Could the different L2warp setting be the reason I get worse results?
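For what it's worth, matching the paper's 0.2 would presumably just mean changing the coefficient in that quoted line of train.py (a fragment, not standalone code; the surrounding names come from the repo):

```python
# 2warp weight raised from 0.1 to the paper's 0.2; the rest of the
# expression is unchanged from train.py.
loss += 0.2 * sum([warp_2(warp2_est_4[i], left_pyramid[i][[6,7]], mask_4[i], args)
                   for i in range(4)])
```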

lelimite4444 commented on July 18, 2024

@zmlshiwo Actually, I first trained on stereo and flow without the 2warp modules as a pretrained model. Starting from this better initialization, adding 2warp may improve the performance.

I've tried both 0.1 and 0.2 for L2warp; it doesn't make much difference.

zmlshiwo commented on July 18, 2024

@lelimite4444 Thank you. So you mean that you first train a flow + stereo only model for 80 epochs, then use that pretrained model as the initialization and train for another 80 epochs with 2warp, so the whole process is about 160 epochs?

lelimite4444 commented on July 18, 2024

I trained the pretrained model for 40 epochs, so the total is 120 epochs. But I think 40 + 40 is enough; maybe you can use TensorBoard to check how well your model performs now.

zmlshiwo commented on July 18, 2024

@lelimite4444 Ok, thank you, I understand. Last time I did not use the pretrained model and just trained for 80 epochs with 2warp, without an initialization model.

zmlshiwo commented on July 18, 2024

@lelimite4444 Hi, one more question. Are the results of the Ours (flow + stereo) model in Tables 1, 2, and 3 from training for only 40 epochs (i.e., training flow + stereo without 2warp for 40 epochs)?

wanghao14 commented on July 18, 2024

@lelimite4444 Hi, I would like to know the hyper-parameter settings you use when fine-tuning from the model pretrained on stereo and flow without 2warp. Are they the same as when training from scratch?
