End2End Multi-View Feature Matching with Differentiable Pose Optimization

This repository contains the implementation of the ICCV 2023 paper: End2End Multi-View Feature Matching with Differentiable Pose Optimization.

arXiv | Video | Project Page

Cloning the repository

The multi-view matching model is implemented in this fork, which is included as a submodule; therefore, please clone the repository with --recursive:

git clone https://github.com/barbararoessle/e2e_multi_view_matching --recursive

The required Python packages are listed in requirements.txt.
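
They can be installed, for example, with pip (assuming a suitable Python 3 environment):

python3 -m pip install -r requirements.txt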

Preparing datasets

ScanNet

Extract the ScanNet dataset, e.g., using SensReader, and place the files scannetv2_test.txt, scannetv2_train.txt, scannetv2_val.txt from the ScanNet Benchmark as well as the preprocessed image overlap information (overlap range [0.4, 0.8]) into the same directory <data_dir>/scannet. As a result, we have:

<data_dir>/scannet
└───scans
|   └───scene0000_00
|   |   |   color
|   |   |   depth
|   |   |   intrinsic
|   |   |   pose
|   ...
└───scans_test
|   └───scene0707_00
|   |   |   color
|   |   |   depth
|   |   |   intrinsic
|   |   |   pose
|   ...
└───overlap
|   └───scans
|   |   |   scene0000_00.json
|   |   |   ...
|   └───scans_test
|   |   |   scene0707_00.json
|   |   |   ...
|    scannetv2_train.txt
|    scannetv2_val.txt
|    scannetv2_test.txt
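
As a quick, optional sanity check of this layout (illustrative shell commands; substitute <data_dir> accordingly):

ls <data_dir>/scannet/scans/scene0000_00            # expects: color depth intrinsic pose
ls <data_dir>/scannet/overlap/scans | head -n 3     # expects: scene0000_00.json ...
wc -l <data_dir>/scannet/scannetv2_train.txt        # number of training scenes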

MegaDepth

We follow the preprocessing done by LoFTR: the depth maps come from the original MegaDepth dataset; download and extract MegaDepth_v1 to <data_dir>/megadepth. The undistorted images and camera parameters follow the preprocessing of D2-Net; download and extract them to <data_dir>/megadepth. As a result, we have:

<data_dir>/megadepth
|    MegaDepth_v1
|    Undistorted_SfM
|    scene_info
|    megadepth_train.txt
|    megadepth_val.txt
|    megadepth_test.txt
|    megadepth_valid_list.json

Downloading pretrained models

Pretrained models are available here.

Evaluating on image pairs

Download the test pair descriptions scannet_test_1500 and megadepth_test_1500_scene_info from LoFTR into assets/. The option eval_mode specifies the relative pose estimation method, e.g., weighted eight-point with bundle adjustment (w8pt_ba) or RANSAC (ransac).

ScanNet

python3 eval_pairs.py --eval_mode w8pt_ba --dataset scannet --exp_name two_view_scannet --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>

MegaDepth

python3 eval_pairs.py --eval_mode w8pt_ba --dataset megadepth --exp_name two_view_megadepth --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>

Evaluating on multi-view

To run the multi-view evaluation, the bundle adjustment using the Ceres solver needs to be built:

cd pose_optimization/multi_view/bundle_adjustment
mkdir build
cd build
cmake ..
make -j

Building it requires the Ceres solver and its dependencies (e.g., Eigen) to be installed.
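
On Ubuntu, for example, these can typically be obtained from the system package manager (package names vary by distribution; building Ceres from source works as well):

sudo apt-get install libceres-dev libeigen3-dev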

ScanNet

python3 eval_multi_view.py --dataset scannet --exp_name multi_view_scannet --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>

MegaDepth

To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. The converted data will be written to <path to datasets>/megadepth_640:

python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 640
python3 eval_multi_view.py --dataset megadepth_640 --exp_name multi_view_megadepth --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>

Training

Training stage 1 trains without the pose loss, stage 2 with the pose loss. Checkpoints are written into a subdirectory of the provided checkpoint directory. The subdirectory is named after the training start time of stage 1 or 2 in the format yyyymmdd_hhmmss, which serves as the experiment name. The experiment name can be specified to resume a training run, to initialize stage 2, or to run evaluation.
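
For example (the timestamp below is made up), a stage 1 training started on January 1, 2023 at 12:00:00 writes its checkpoints to:

<path to write checkpoints>/20230101_120000

and 20230101_120000 is the experiment name that is passed as --init_exp_name when starting stage 2 or as --exp_name when running evaluation.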

Training on image pairs

ScanNet

Stage 1

python3 -u -m torch.distributed.launch --nproc_per_node=2 --rdzv_endpoint=127.0.0.1:29109 train.py --tuple_size 2 --dataset scannet --batch_size 32 --n_workers 12 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints>

Stage 1 is trained until the validation matching loss has converged.

Stage 2

Training stage 2 adds the pose loss and loads the checkpoint from stage 1; therefore, the following options are added to the stage 1 command:

--init_exp_name <experiment name from stage 1> --pose_loss
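
For illustration, the full stage 2 command then reads (the experiment name is a placeholder):

python3 -u -m torch.distributed.launch --nproc_per_node=2 --rdzv_endpoint=127.0.0.1:29109 train.py --tuple_size 2 --dataset scannet --batch_size 32 --n_workers 12 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints> --init_exp_name <experiment name from stage 1> --pose_loss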

Stage 2 is trained until the validation rotation and translation losses have converged.

MegaDepth

To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. Note that image_size=720 is used for image pairs (following SuperGlue), whereas image_size=640 is used for multi-view for computational reasons (following LoFTR). The converted data will be written to <path to datasets>/megadepth_720:

python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 720

Stage 1

Training is initialized with the provided pretrained weights of stage 1 on ScanNet.

python3 -u -m torch.distributed.launch --nproc_per_node=1 --rdzv_endpoint=127.0.0.1:29110 train.py --tuple_size 2 --dataset megadepth_720 --batch_size 16 --n_workers 6 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints> --init_exp_name pretrained_on_scannet_two_view_stage_1

Stage 1 is trained until the validation matching loss has converged.

Stage 2

Training stage 2 adds the pose loss and loads the checkpoint from stage 1; therefore, the option pose_loss is added and init_exp_name needs to be adjusted as follows:

--init_exp_name <experiment name from stage 1> --pose_loss

Training on multi-view

ScanNet

Stage 1

python3 -u -m torch.distributed.launch --nproc_per_node=3 --rdzv_endpoint=127.0.0.1:29111 train.py --tuple_size 5 --dataset scannet --batch_size 8 --n_workers 5 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints>

Stage 1 is trained until the validation matching loss has converged.

Stage 2

Training stage 2 adds the pose loss and loads the checkpoint from stage 1; therefore, the following options are added to the stage 1 command:

--init_exp_name <experiment name from stage 1> --pose_loss

MegaDepth

To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. Note that image_size=720 is used for image pairs (following SuperGlue), whereas image_size=640 is used for multi-view for computational reasons (following LoFTR). The converted data will be written to <path to datasets>/megadepth_640:

python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 640

Stage 1

Training is initialized with the provided pretrained weights of stage 1 on ScanNet.

python3 -u -m torch.distributed.launch --nproc_per_node=2 --rdzv_endpoint=127.0.0.1:29112 train.py --tuple_size 5 --dataset megadepth_640 --batch_size 2 --n_workers 4 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints> --init_exp_name pretrained_on_scannet_multi_view_stage_1

Stage 1 is trained until the validation matching loss has converged.

Stage 2

Training stage 2 adds the pose loss and loads the checkpoint from stage 1; therefore, the option pose_loss is added and init_exp_name needs to be adjusted as follows:

--init_exp_name <experiment name from stage 1> --pose_loss

Citation

If you find this repository useful, please cite:

@inproceedings{roessle2023e2emultiviewmatching,
      title={End2End Multi-View Feature Matching with Differentiable Pose Optimization}, 
      author={Barbara Roessle and Matthias Nie{\ss}ner},
      booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
      month={October},
      year={2023}
}

Acknowledgements

We thank SuperGluePretrainedNetwork, kornia, ceres-solver and NeuralRecon, from which this repository borrows code.


e2e_multi_view_matching's Issues

Inquiry on Custom Data Input for Feature Matching and Pose Estimation

Hello,
First off, I'd like to express my admiration for the work done on the e2e_multi_view_matching project. It's truly impressive, and I'm very interested in exploring its capabilities further.

I've noticed that the evaluation functionality is currently tailored to large datasets such as ScanNet, YFCC100M, and MegaDepth. This brings me to my question: is it possible to use our own, smaller datasets as input for this project? Specifically, I'm looking to input two or three RGB images and obtain the feature matching relationships and pose estimates between them. Could you please advise on whether the system can directly accept such inputs? If so, how might one go about this? Alternatively, is it necessary to first train the feature matching weights on a large volume of our own data for each specific scene before applying the model to our scenarios?

I believe this information would be greatly beneficial not only to me but also to others in the community who might be interested in applying this work to more varied and potentially smaller datasets.

Thank you for your time and consideration.

Can you provide a demo showing how to obtain the camera pose through image matching?

Hello, I have been following your paper for some time and was delighted to see that you released the code yesterday.
My goal is to obtain the camera pose. For example, if I have a template dataset that includes RGB, depth, camera intrinsics, and camera poses, is it possible to predict the camera pose of an RGB image of the same scene (e.g., when the same object appears in the picture)? Could you provide an example of how this might be achieved, or what would be the best approach in this situation?
Looking forward to your answer; I wish you success in your research!
