
cfnet's People

Contributors

gallenszl


cfnet's Issues

Seeking advice on the Middlebury dataset

Could you share a download link for the Middlebury dataset? The 2014 version seems to contain 31 pairs, but the paper mentions 28 pairs. I would appreciate an explanation, many thanks.

Problem to export model from PyTorch to TensorFlow

Hi,
Thank you for your work, I find it really useful and I am trying to embed it for a test in a real-time environment.
In order to do that, I want to export the model to TensorFlow Lite, so that I can make small changes (like quantization), which is more efficient than doing them in PyTorch.
To export the model to TFlite, I first exported it to ONNX and now I'm trying to export it from ONNX to TF with onnx-tf library.
I am using opset_version=11, the lowest version compatible with all the PyTorch operations in CFNet.

However, I faced many problems along the way. First I had a dimension problem with the conversion to ONNX, so I decided to use a fixed input size for the images (w×h = 512×768). I tested the results with the Middlebury SDK (I resized the input images, ran my ONNX model, and then resized the resulting disparity maps back), and these results are quite good.
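
For reference, a minimal sketch of the fixed-size export call I mean; load_cfnet is a placeholder for building the model and loading the checkpoint (not a repo function), while the 512×768 size and opset 11 match what I described above:

import torch

model = load_cfnet("finetuning_model")        # placeholder: build CFNet and load weights
model.eval()

# fixed input size (batch, channels, H=512, W=768) to avoid dynamic-shape issues
left = torch.randn(1, 3, 512, 768)
right = torch.randn(1, 3, 512, 768)

torch.onnx.export(
    model, (left, right), "cfnet_512x768.onnx",
    opset_version=11,                         # lowest opset covering all CFNet ops
    input_names=["left", "right"],
    output_names=["disparity"])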

Then, to export to TF, I first had an issue with an unsupported operation:

RuntimeError: Resize coordinate_transformation_mode=pytorch_half_pixel is not supported in Tensorflow.

I tried adding "align_corners=True" inside the upsample functions in the model code, and it solved the problem.
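
Concretely, the change was of this form in the model's upsample calls (tensor names and sizes vary):

# before: exports to an ONNX Resize with coordinate_transformation_mode=pytorch_half_pixel
x = F.upsample(x, [h, w], mode='bilinear')
# after: exports with a coordinate transform that onnx-tf supports
x = F.upsample(x, [h, w], mode='bilinear', align_corners=True)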

Right now I am facing another issue, and I haven't found any way to solve it. Here are the logs:

Traceback (most recent call last):
  File "/path/scripts/../export_TF.py", line 17, in <module>
    tf_rep.export_graph("%s/%s.pb" % (args.outdir, input_model_name))
  File "/path/onnx-tensorflow/onnx_tf/backend_rep.py", line 143, in export_graph
    signatures=self.tf_module.__call__.get_concrete_function(
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1264, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1244, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 785, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2983, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3292, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3130, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1161, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3831, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/path/venv/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1147, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    File "/path/onnx-tensorflow/onnx_tf/backend_tf_module.py", line 99, in __call__  *
        output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
    File "/path/onnx-tensorflow/onnx_tf/backend.py", line 347, in _onnx_node_to_tensorflow_op  *
        return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
    File "/path/onnx-tensorflow/onnx_tf/handlers/handler.py", line 58, in handle  *
        cls.args_check(node, **kwargs)
    File "/path/onnx-tensorflow/onnx_tf/handlers/backend/resize.py", line 68, in args_check  *
        x_shape = x.get_shape().as_list()

    ValueError: as_list() is not defined on an unknown TensorShape.

Using netron.app, I found that node 10454 seemed to have a dimension problem (it corresponds to the upsample operation at line 660 of cfnet.py), so I tried to hardcode the dimensions with my input size:

pred1_s2 = F.upsample(pred1_s2 * 2, [512, 768], mode='bilinear', align_corners=True)

but it didn't resolve my problem at all, and I have no idea how to solve it.
My TF version is 2.8.0.

Have you already tried (and succeeded) to export the model to TensorFlow, and if so, how did you do it?
If not, do you have any idea how I could solve this problem?

Thank you.

About HITNet

Your paper is great and the method is efficient. I want to make further improvements on top of your work, but I have a doubt: in your paper, HITNet's inference time is 0.015 s, yet I could not find official code for HITNet. How can I test its inference time on my GPUs? Can you give some guidance? Thanks!
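
(Not an official answer, but a common way to time any stereo model on a GPU is a sketch like the following; the model and input size are placeholders:)

import time
import torch

model = model.cuda().eval()                   # any PyTorch stereo model
left = torch.randn(1, 3, 384, 1248).cuda()    # example KITTI-sized input
right = torch.randn(1, 3, 384, 1248).cuda()

with torch.no_grad():
    for _ in range(10):                       # warm-up so timings exclude CUDA init
        model(left, right)
    torch.cuda.synchronize()                  # flush queued GPU work before timing
    begin = time.time()
    for _ in range(100):
        model(left, right)
    torch.cuda.synchronize()
print("mean inference time: {:.4f}s".format((time.time() - begin) / 100))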

how to obtain the same performance as the given pretrained model

I tried to train the CFNet model using the code from GitHub, simply replacing the Mish activation function with ReLU for the first 20 epochs and then switching back to Mish for another 15 epochs, just as the paper describes. But the performance of my trained model is far from that of the pretrained model provided in the repo. So what is wrong with my training? Is there any parameter that should be modified? I used ./scripts/sceneflow.sh on two V100 GPUs.
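
(For reference, this is how I read the schedule from the paper, as a rough sketch; build_model and train_one_epoch are hypothetical helpers, not repo functions:)

# stage 1: pre-train on SceneFlow with ReLU for 20 epochs
model = build_model(activation='relu')           # hypothetical helper
for epoch in range(20):
    train_one_epoch(model, sceneflow_loader)     # hypothetical helper

# stage 2: switch the activation to Mish, keep the stage-1 weights,
# and pre-train on SceneFlow for another 15 epochs
mish_model = build_model(activation='mish')
mish_model.load_state_dict(model.state_dict())
for epoch in range(15):
    train_one_epoch(mish_model, sceneflow_loader)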

inference is wrong

I used your weights (finetuning_model) to test, but the results contain a lot of fragmented regions and I do not know what causes this.
This is the original image; the original resolution is 1920×1080, resized to 675×380.
[original image]

This is the estimated disparity image.
[disparity image]

This is the point cloud (ply) image.

[point-cloud screenshot]

Weird warping results from pre-trained model disparity map

Hi, I am getting weird output images when warping the right image with the disparity map obtained from the pre-trained model. I learnt from the code that the disparity map is with respect to the left image, hence I tried warping the right image with it. Below is the warping code I used:

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from skimage import io

def depth_read(filename):
    # loads disparity map D from png file and returns it as a numpy array

    depth_png = np.array(Image.open(filename), dtype=np.int64)
    # make sure we have a proper 16bit depth map here.. not 8bit!
    #assert(np.max(depth_png) > 255)

    depth = depth_png.astype(np.float64) / 256.0   # np.float is deprecated in recent numpy
    depth = depth / depth.shape[1]                 # normalize by width for grid_sample coords
    #depth[depth_png == 0] = -1.
    return depth

img = io.imread(<rightimg_filepath>)  # right image
disp = depth_read(<disparity_filepath>) # disparity map with respect to left image

print(img.shape, disp.shape) # (375,1242,3), (375,1242)

img = torch.from_numpy(img.transpose(2,0,1)).float().unsqueeze(0) / 255.0 # img
disp = torch.from_numpy(disp).float().unsqueeze(0).unsqueeze(0) # disp

print(img.shape, disp.shape) # (1, 3, 375, 1242), (1, 1, 375, 1242)


def apply_disparity(img,disp): # gets a warped output
  batch_size, _, height, width = img.size()

  # Original coordinates of pixels
  x_base = torch.linspace(0, 1, width).repeat(batch_size, height, 1).type_as(img)
  y_base = torch.linspace(0, 1, height).repeat(batch_size, width, 1).transpose(1, 2).type_as(img)

  # Apply shift in X direction
  x_shifts = disp[:, 0, :, :]  # Disparity is passed in NCHW format with 1 channel
  flow_field = torch.stack((x_base + x_shifts, y_base), dim=3)
  # In grid_sample coordinates are assumed to be between -1 and 1
  output = F.grid_sample(img, 2*flow_field - 1, mode='bilinear',
                         padding_mode='zeros', align_corners=True)

  return output


output = (apply_disparity(img, -disp)*255.0).detach()[0,:,:,:].cpu().numpy().transpose(1,2,0)
output.shape # (375, 1242, 3)

The disparity maps are obtained from both the sceneflow checkpoint and the finetuned_model checkpoint. I warped the same image with these two disparity maps but get the same irregular output either way. I have used the above warping code many times and I don't think there is any problem with it; I believe the problem is with the disparity map itself. Can someone help me figure out what could possibly have gone wrong?

Below is the input right image -
https://i.stack.imgur.com/aZia5.jpg

Below is the output warped right (also the estimated left) image I got -
https://i.stack.imgur.com/tHCGo.jpg


model training

Can this model be trained in an unsupervised or a supervised manner?

Thank you

Some errors in the code

For cfnet.py, first error:

def generate_search_range(self, sample_count, input_min_disparity, input_max_disparity):
    """
    Description: Generates the disparity search range.
    Returns:
        :min_disparity: Lower bound of disparity search range
        :max_disparity: Upper bound of disparity search range.
    """
    min_disparity = torch.clamp(input_min_disparity - torch.clamp((
            sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp)
    max_disparity = torch.clamp(input_max_disparity + torch.clamp(
            sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp)

    return min_disparity, max_disparity

It should be

    min_disparity = torch.clamp(input_min_disparity - torch.clamp((
            sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp//4-1)
    max_disparity = torch.clamp(input_max_disparity + torch.clamp(
            sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp//4-1)

or

    min_disparity = torch.clamp(input_min_disparity - torch.clamp((
            sample_count - input_max_disparity + input_min_disparity), min=0) / 2.0, min=0, max=self.maxdisp//2-1)
    max_disparity = torch.clamp(input_max_disparity + torch.clamp(
            sample_count - input_max_disparity + input_min_disparity, min=0) / 2.0, min=0, max=self.maxdisp//2-1)

presumably because the search range is computed on a downsampled cost volume, where the valid disparity range shrinks by the same factor.

Second error: in line 643 of cfnet.py, it should be "predmid_s2 = F.upsample(predmid_s2 * 2, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)", not "predmid_s2 = F.upsample(predmid_s2 * 4, [left.size()[2], left.size()[3]], mode='bilinear', align_corners=True)"

About the three stage strategy

In your paper, you mentioned that "we switch the activation function to Mish and prolong the pre-training process in the SceneFlow dataset for another 15 epochs". So, should the learning rate schedule change during those additional 15 epochs?

Inference time

Hello, I saw that CFNet's inference time is 0.18 s on the KITTI benchmark, but when I tested on the KITTI dataset with a GTX 1080 Ti I got 0.3 s. What hardware did you test on?

About the performance on the Middlebury dataset

[three result images attached]

Hello, thanks for your nice work. I tested the finetuning_model on some Middlebury images; however, in some cases the performance is not satisfying. Do you know the reason?

Below is the code I used for testing.

from __future__ import print_function, division
import argparse
import os
import glob
from PIL import Image
from matplotlib import pyplot as plt
import cv2
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable
import torchvision.utils as vutils
import torch.nn.functional as F
import numpy as np
import time
from datasets import __datasets__
from models import __models__
from utils import *
import PIL.Image
from torch.utils.data import DataLoader
from datasets import listfiles as ls
from datasets import MiddleburyLoader as DA
import sys
import gc
import skimage

cudnn.benchmark = False

parser = argparse.ArgumentParser(description='Cascade and Fused Cost Volume for Robust Stereo Matching (CFNet)')
parser.add_argument('--model', default='cfnet', help='select a model structure', choices=__models__.keys())
parser.add_argument('--maxdisp', type=int, default=256, help='maximum disparity')

parser.add_argument('--dataset', default='kitti', help='dataset name', choices=__datasets__.keys())
parser.add_argument('--loadckpt', default='/home/jucic/my_code/CFNet/finetuning_model', help='load the weights from a specific checkpoint')

# parse arguments
args = parser.parse_args()

# model, optimizer
model = __models__[args.model](args.maxdisp)
model = nn.DataParallel(model)
model.cuda()
model.eval()

# load parameters
print("loading model {}".format(args.loadckpt))
state_dict = torch.load(args.loadckpt)
model.load_state_dict(state_dict['model'])

def save_pfm(file, image, scale=1):
    color = None

    if image.dtype.name != 'float32':
        raise Exception('Image dtype must be float32.')

    if len(image.shape) == 3 and image.shape[2] == 3:  # color image
        color = True
    elif len(image.shape) == 2 or (len(image.shape) == 3 and image.shape[2] == 1):  # greyscale
        color = False
    else:
        raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')

    file.write('PF\n' if color else 'Pf\n')
    file.write('%d %d\n' % (image.shape[1], image.shape[0]))

    endian = image.dtype.byteorder

    if endian == '<' or (endian == '=' and sys.byteorder == 'little'):
        scale = -scale

    file.write('%f\n' % scale)

    image.tofile(file)

def test():
    data_path = '/home/jucic/my_code/RAFT_Opti/topdownshelfframe'
    with torch.no_grad():
        left_images = glob.glob(os.path.join(data_path, 'left/*.png')) + \
                      glob.glob(os.path.join(data_path, 'left/*.jpg'))
        right_images = glob.glob(os.path.join(data_path, 'right/*.png')) + \
                       glob.glob(os.path.join(data_path, 'right/*.jpg'))

        left_images = sorted(left_images)
        right_images = sorted(right_images)

        count = 1
        for imfile1, imfile2 in zip(left_images, right_images):
            image1 = np.array(Image.open(imfile1).convert('RGB'))
            image2 = np.array(Image.open(imfile2).convert('RGB'))

            # downscale by 2, then pad so height and width are multiples of 32
            height = image1.shape[0] / 2
            width = image1.shape[1] / 2
            height = int(height + (((height // 32) + 1) * 32 - height) % 32)
            width = int(width + (((width // 32) + 1) * 32 - width) % 32)

            image1 = cv2.resize(image1, (width, height))
            image2 = cv2.resize(image2, (width, height))

            image1 = image1 / 255.0
            image2 = image2 / 255.0

            image1 = torch.from_numpy(image1).permute(2, 0, 1)[None].float()
            image2 = torch.from_numpy(image2).permute(2, 0, 1)[None].float()

            print(image1.shape)

            begin = time.time()
            disp_ests, pred3_s3, pred_s4 = model(image1.cuda(), image2.cuda())
            print("{}ms elapsed by cfnet".format((time.time() - begin) * 1000))

            result_folder = os.path.join('/home/jucic/my_code/CFNet', "result_topdownstereo")
            if not os.path.isdir(result_folder):
                os.mkdir(result_folder)

            plt.imsave("{}/{}.png".format(result_folder, str(count).zfill(7)), disp_ests[-1].cpu().numpy().squeeze())
            count += 1

if __name__ == '__main__':
    test()
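
(One thing worth checking, not a confirmed cause: the script above only divides the images by 255, while the repo's dataset loaders preprocess inputs through get_transform(). If that transform applies mean/std normalization, feeding un-normalized images could explain poor results. A sketch of the alternative preprocessing, assuming torchvision-style ImageNet statistics; verify against the repo's actual get_transform():)

from PIL import Image
from torchvision import transforms

# assumption: get_transform() normalizes with ImageNet statistics,
# as many GwcNet-derived stereo codebases do
preprocess = transforms.Compose([
    transforms.ToTensor(),                       # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image1 = preprocess(Image.open(imfile1).convert('RGB'))[None]  # add batch dim
image2 = preprocess(Image.open(imfile2).convert('RGB'))[None]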

about Table 3 in the paper

Hello, thanks for the good work.
Just about the Cross-domain generalization evaluation of PSMNet in Table 3.
In Table 3, the KITTI 2015 D1_all of PSMNet trained on SceneFlow is 16.3, while we got 28.7, which is far from the number reported in your paper. The pre-trained model from GitHub also gives about 28.
We are wondering about the reason.
Thanks.

Datasets download and placement can be clearer

Sorry to bother you. Your algorithm is very impressive and helpful, but "Data Preparation" does not clearly describe how to place the datasets. Could you make the download links clearer, or put the datasets you use on Google Drive? Thanks a lot for your kind help!

Problem about pretrained models

Hello, I heard that there was a problem with this code before. May I ask whether the two pretrained models on the web page have been updated?

Evaluation on Middlebury dataset

Hi,

Thank you for the fantastic work. I am just wondering if the results reported for Middlebury in Table 3 cover the non-occluded or the occluded regions?

Thank you.

Data Augmentation

Hi,

Thank you for sharing this interesting work.

I am just wondering if you have cross-domain generalization results of CFNet trained without the asymmetrical chromatic augmentation and asymmetrical occlusion?

Thank you :)

Error when robust_test

I want to evaluate the accuracy of my self-trained checkpoints on KITTI by running robust_test.py, but I get the following error:

loading model /home/rc/20220410StereoMatching/CFNet/checkpoints/sceneflow/pretrained/checkpoint_000009.ckpt
start at epoch 0
downscale epochs: [300], downscale rate: 10.0
setting learning rate to 0.001
Traceback (most recent call last):
  File "robust_test.py", line 335, in <module>
    train()
  File "robust_test.py", line 163, in train
    loss, scalar_outputs, image_outputs = test_sample(sample, compute_metrics=do_summary)
  File "/home/rc/20220410StereoMatching/CFNet/utils/experiment.py", line 30, in wrapper
    ret = func(*f_args, **f_kwargs)
  File "robust_test.py", line 280, in test_sample
    imgL, imgR, disp_gt = sample['left'], sample['right'], sample['disparity']
KeyError: 'disparity'
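
(The traceback suggests the chosen test split provides no ground-truth disparity, so sample['disparity'] does not exist. A minimal guard in the caller, as a sketch of one possible workaround:)

imgL, imgR = sample['left'], sample['right']
disp_gt = sample.get('disparity')                # None for splits without ground truth
if disp_gt is None:
    # inference only; accuracy metrics cannot be computed without ground truth
    disp_est = model(imgL.cuda(), imgR.cuda())[0][-1]
else:
    loss, scalar_outputs, image_outputs = test_sample(sample, compute_metrics=do_summary)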

Gamma, Beta in the model weight

Why are the values of gamma_s3, gamma_s2, beta_s3, and beta_s2 all zeros in your provided model weights?
If they are all zeros, does that mean they are not functional?

How to inference with data?

Hi, it looks like your model has multiple levels of output; may I ask which one I should use for inference?
Thanks!
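
(Judging from the test scripts quoted elsewhere on this page, the final full-resolution prediction is the last element of the first output; a sketch:)

with torch.no_grad():
    disp_ests, pred3_s3, pred_s4 = model(left.cuda(), right.cuda())
    disparity = disp_ests[-1]    # final refined disparity, as used in the repo's test code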

When will you release the code?

Hello, thank you very much for your excellent work. May I ask when you will open-source the code? We would like to run ablation experiments on your work to verify the generality of our method on an excellent work like yours.

preprocess for predicting Custom dataset

Hi.
I am thinking of applying your method to my own custom dataset, so I added the following code to save_disp.py's main, with reference to datasets/sceneflow_dataset.py.

# test one sample
# @make_nograd_func
# def test_sample(sample):
#     model.eval()
#     disp_ests, pred1_s3_up, pred2_s4 = model(sample['left'].cuda(), sample['right'].cuda())
#     return disp_ests[-1]
@make_nograd_func
def test_sample(left, right):
    model.eval()
    disp_ests, pred1_s3_up, pred2_s4 = model(left.cuda(), right.cuda())
    return disp_ests[-1]


if __name__ == '__main__':
    left_img = Image.open("/media/A/left/0.png").convert("RGB")
    right_img = Image.open("/media/A/right/0.png").convert("RGB")

    w, h = left_img.size
    crop_w, crop_h = 950, 512
    left_img = left_img.crop((w-crop_w, h-crop_h, w, h))
    right_img = right_img.crop((w-crop_w, h-crop_h, w, h))

    processed = get_transform()
    left_img = processed(left_img)
    right_img = processed(right_img)
    test_sample(left_img, right_img)

Then I get the following error.

Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
Mish activation loaded...
  File "/home/ubuntu/Apps/CFNet/models/cfnet.py", line 136, in forward
    x = self.firstconv(x)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 3, 3], but got 3-dimensional input of size [3, 512, 950] instead

Probably this is due to a wrongly shaped input to the network.

How can I generate a disparity image with a custom dataset?
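
(For what it's worth: the traceback says the network received a 3-D tensor [3, 512, 950] where it expects a 4-D batch, so a minimal sketch of a fix is to add a batch dimension before calling the model:)

left_img = processed(left_img).unsqueeze(0)      # [3, H, W] -> [1, 3, H, W]
right_img = processed(right_img).unsqueeze(0)
test_sample(left_img, right_img)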
