
supergluepretrainednetwork's Introduction

Research @ Magic Leap (CVPR 2020, Oral)

SuperGlue Inference and Evaluation Demo Script

Introduction

SuperGlue is a CVPR 2020 research project done at Magic Leap. The SuperGlue network is a Graph Neural Network combined with an Optimal Matching layer that is trained to perform matching on two sets of sparse image features. This repo includes PyTorch code and pretrained weights for running the SuperGlue matching network on top of SuperPoint keypoints and descriptors. Given a pair of images, you can use this repo to extract matching features across the image pair.

SuperGlue operates as a "middle-end," performing context aggregation, matching, and filtering in a single end-to-end architecture. For more details, please see the paper: SuperGlue: Learning Feature Matching with Graph Neural Networks (CVPR 2020, https://arxiv.org/abs/1911.11763).

We provide two pre-trained weights files: an indoor model trained on ScanNet data, and an outdoor model trained on MegaDepth data. Both models are inside the weights directory. By default, the demo will run the indoor model.

Dependencies

  • Python 3 >= 3.5
  • PyTorch >= 1.1
  • OpenCV >= 3.4 (4.1.2.30 recommended for best GUI keyboard interaction, see this note)
  • Matplotlib >= 3.1
  • NumPy >= 1.18

Simply run the following command: pip3 install numpy opencv-python torch matplotlib
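To confirm that the installed versions meet the requirements above, a quick check from the Python prompt (purely illustrative, not part of the repo) is:

>>> import cv2, torch, numpy, matplotlib
>>> print(cv2.__version__, torch.__version__, numpy.__version__, matplotlib.__version__)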

Contents

There are two main top-level scripts in this repo:

  1. demo_superglue.py : runs a live demo on a webcam, IP camera, image directory or movie file
  2. match_pairs.py: reads image pairs from files and dumps matches to disk (also runs evaluation if ground truth relative poses are provided)

Live Matching Demo Script (demo_superglue.py)

This demo runs SuperPoint + SuperGlue feature matching on an anchor image and live image. You can update the anchor image by pressing the n key. The demo can read image streams from a USB or IP camera, a directory containing images, or a video file. You can pass all of these inputs using the --input flag.

Run the demo on a live webcam

Run the demo on the default USB webcam (ID #0), running on a CUDA GPU if one is found:

./demo_superglue.py

Keyboard control:

  • n: select the current frame as the anchor
  • e/r: increase/decrease the keypoint confidence threshold
  • d/f: increase/decrease the match filtering threshold
  • k: toggle the visualization of keypoints
  • q: quit

Run the demo on 320x240 images running on the CPU:

./demo_superglue.py --resize 320 240 --force_cpu

The --resize flag can be used to resize the input image in three ways:

  1. --resize width height : will resize to exact width x height dimensions
  2. --resize max_dimension : will resize largest input image dimension to max_dimension
  3. --resize -1 : will not resize (i.e. use original image dimensions)

The default will resize images to 640x480.
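For reference, the three resize modes behave roughly like the sketch below. This is an illustrative re-implementation under the assumptions stated in the comments, not necessarily the exact code used by the repo:

import cv2

def resize_image(image, resize):
    # resize mirrors the --resize argument: [w, h], [max_dim], or [-1]
    h, w = image.shape[:2]
    if len(resize) == 2:
        new_w, new_h = resize                      # exact width x height
    elif len(resize) == 1 and resize[0] > 0:
        scale = resize[0] / max(h, w)              # scale the largest dimension to max_dim
        new_w, new_h = int(round(w * scale)), int(round(h * scale))
    else:
        return image                               # --resize -1: keep the original size
    return cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)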

Run the demo on a directory of images

The --input flag also accepts a path to a directory. We provide a directory of sample images from a sequence. To run the demo on the directory of images in freiburg_sequence/ on a headless server (will not display to the screen) and write the output visualization images to dump_demo_sequence/:

./demo_superglue.py --input assets/freiburg_sequence/ --output_dir dump_demo_sequence --resize 320 240 --no_display

You should see this output on the sample Freiburg-TUM RGBD sequence:

The matches are colored by their predicted confidence in a jet colormap (Red: more confident, Blue: less confident).

Additional useful command line parameters

  • Use --image_glob to change the image file extension (default: *.png, *.jpg, *.jpeg).
  • Use --skip to skip intermediate frames (default: 1).
  • Use --max_length to cap the total number of frames processed (default: 1000000).
  • Use --show_keypoints to visualize the detected keypoints (default: False).

Run Matching+Evaluation (match_pairs.py)

This repo also contains a script match_pairs.py that runs the matching from a list of image pairs. With this script, you can:

  • Run the matcher on a set of image pairs (no ground truth needed)
  • Visualize the keypoints and matches, based on their confidence
  • Evaluate and visualize the match correctness, if the ground truth relative poses and intrinsics are provided
  • Save the keypoints, matches, and evaluation results for further processing
  • Collate evaluation results over many pairs and generate result tables

Matches only mode

The simplest usage of this script will process the image pairs listed in a given text file and dump the keypoints and matches to compressed numpy npz files. We provide the challenging ScanNet pairs from the main paper in assets/example_indoor_pairs/. Running the following will run SuperPoint + SuperGlue on each image pair, and dump the results to dump_match_pairs/:

./match_pairs.py

The resulting .npz files can be read from Python as follows:

>>> import numpy as np
>>> path = 'dump_match_pairs/scene0711_00_frame-001680_scene0711_00_frame-001995_matches.npz'
>>> npz = np.load(path)
>>> npz.files
['keypoints0', 'keypoints1', 'matches', 'match_confidence']
>>> npz['keypoints0'].shape
(382, 2)
>>> npz['keypoints1'].shape
(391, 2)
>>> npz['matches'].shape
(382,)
>>> np.sum(npz['matches']>-1)
115
>>> npz['match_confidence'].shape
(382,)

For each keypoint in keypoints0, the matches array indicates the index of the matching keypoint in keypoints1, or -1 if the keypoint is unmatched.
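Continuing the example above, the matched coordinates can be paired up with a few lines of post-processing (not part of the repo, but a common pattern):

>>> valid = npz['matches'] > -1
>>> mkpts0 = npz['keypoints0'][valid]                  # matched keypoints in image0
>>> mkpts1 = npz['keypoints1'][npz['matches'][valid]]  # corresponding keypoints in image1
>>> mconf = npz['match_confidence'][valid]             # per-match confidence in [0, 1]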

Visualization mode

You can add the flag --viz to dump image outputs which visualize the matches:

./match_pairs.py --viz

You should see images like this inside of dump_match_pairs/ (or something very close to it, see this note):

The matches are colored by their predicted confidence in a jet colormap (Red: more confident, Blue: less confident).
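If you want to reproduce this coloring in your own plots, confidences in [0, 1] can be passed through Matplotlib's jet colormap. A small illustrative snippet, reusing the mconf array from the npz example above (any array of confidences in [0, 1] works):

from matplotlib import cm
colors = cm.jet(mconf)   # RGBA per match: high confidence maps to red, low to blue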

Evaluation mode

You can also estimate the pose using RANSAC + Essential Matrix decomposition and evaluate it if the ground truth relative poses and intrinsics are provided in the input .txt files. Each .txt file contains three key ground truth matrices: a 3x3 intrinsics matrix of image0: K0, a 3x3 intrinsics matrix of image1: K1, and a 4x4 matrix of the relative pose extrinsics T_0to1.
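For reference, this pose estimation step can be sketched with OpenCV roughly as follows. This is a minimal illustration using matched keypoint arrays (like those extracted in the previous section) and the K0/K1 intrinsics described above; the function name and RANSAC threshold are placeholders, not the repo's exact implementation:

import numpy as np
import cv2

def estimate_pose_sketch(mkpts0, mkpts1, K0, K1, ransac_thresh_px=1.0):
    # Normalize pixel coordinates with each image's intrinsics so that a single
    # identity camera matrix can be used when fitting the essential matrix.
    p0 = (mkpts0 - K0[[0, 1], [2, 2]]) / K0[[0, 1], [0, 1]]
    p1 = (mkpts1 - K1[[0, 1], [2, 2]]) / K1[[0, 1], [0, 1]]
    # Convert the pixel threshold into normalized image coordinates.
    thresh = ransac_thresh_px / np.mean([K0[0, 0], K0[1, 1], K1[0, 0], K1[1, 1]])
    E, mask = cv2.findEssentialMat(p0, p1, np.eye(3), method=cv2.RANSAC,
                                   prob=0.999, threshold=thresh)
    _, R, t, mask = cv2.recoverPose(E, p0, p1, np.eye(3), mask=mask)
    return R, t, mask.ravel() > 0   # relative rotation, translation direction, inlier mask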

To run the evaluation on the sample set of images (by default reading assets/scannet_sample_pairs_with_gt.txt), you can run:

./match_pairs.py --eval

Since you enabled --eval, you should see collated results printed to the terminal. For the example images provided, you should get the following numbers (or something very close to it, see this note):

Evaluation Results (mean over 15 pairs):
AUC@5    AUC@10  AUC@20  Prec    MScore
26.99    48.40   64.47   73.52   19.60

The resulting .npz files in dump_match_pairs/ will now contain scalar values related to the evaluation, computed on the sample images provided. Here is what you should find in one of the generated evaluation files:

>>> import numpy as np
>>> path = 'dump_match_pairs/scene0711_00_frame-001680_scene0711_00_frame-001995_evaluation.npz'
>>> npz = np.load(path)
>>> print(npz.files)
['error_t', 'error_R', 'precision', 'matching_score', 'num_correct', 'epipolar_errors']
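For reference, one common way to turn per-pair error_t / error_R values into the AUC numbers shown above is to take the maximum of the two angular errors for each pair and integrate the cumulative recall curve up to each threshold. The sketch below illustrates that idea; it is not necessarily identical to the evaluation code in this repo:

import numpy as np

def pose_auc_sketch(errors, thresholds=(5, 10, 20)):
    # errors: per-pair pose error in degrees, e.g. np.maximum(error_t, error_R)
    errors = np.sort(np.asarray(errors, dtype=float))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)
        e = np.concatenate((errors[:last], [t]))
        r = np.concatenate((recall[:last], [recall[last - 1]]))
        aucs.append(np.trapz(r, x=e) / t)   # area under the recall curve, normalized by t
    return aucs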

You can also visualize the evaluation metrics by running the following command:

./match_pairs.py --eval --viz

You should also now see additional images in dump_match_pairs/ which visualize the evaluation numbers (or something very close to it, see this note):

The top left corner of the image shows the pose error and number of inliers, while the lines are colored by their epipolar error computed with the ground truth relative pose (red: higher error, green: lower error).

Running on sample outdoor pairs


In this repo, we also provide a few challenging Phototourism pairs, so that you can re-create some of the figures from the paper. Run the following command to perform matching and visualization (no ground truth is provided, see this note) on the provided pairs:

./match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3  --resize_float --input_dir assets/phototourism_sample_images/ --input_pairs assets/phototourism_sample_pairs.txt --output_dir dump_match_pairs_outdoor --viz

You should now see image pairs such as these in dump_match_pairs_outdoor/ (or something very close to it, see this note):

Recommended settings for indoor / outdoor


For indoor images, we recommend the following settings (these are the defaults):

./match_pairs.py --resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4

For outdoor images, we recommend the following settings:

./match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float

You can provide your own list of pairs --input_pairs for images contained in --input_dir. Images can be resized before network inference with --resize. If you are re-running the same evaluation many times, you can use the --cache flag to reuse old computation.

Test set pair file format explained


We provide the list of ScanNet test pairs in assets/scannet_test_pairs_with_gt.txt (with ground truth) and Phototourism test pairs assets/phototourism_test_pairs.txt (without ground truth) used to evaluate the matching from the paper. Each line corresponds to one pair and is structured as follows:

path_image_A path_image_B exif_rotationA exif_rotationB [KA_0 ... KA_8] [KB_0 ... KB_8] [T_AB_0 ... T_AB_15]

The path_image_A and path_image_B entries are paths to image A and B, respectively. The exif_rotation is an integer in the range [0, 3] that comes from the original EXIF metadata associated with the image, where 0: no rotation, 1: 90 degrees clockwise, 2: 180 degrees clockwise, 3: 270 degrees clockwise. If the EXIF data is not known, you can just provide a zero here and no rotation will be performed. KA and KB are the flattened 3x3 intrinsics matrices of image A and image B. T_AB is a flattened 4x4 matrix of the extrinsics between the pair.
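For reference, a line in this format (with ground truth) can be parsed in Python roughly as follows. The helper name is illustrative and not part of the repo:

import numpy as np

def parse_pair_line(line):
    fields = line.split()
    name0, name1 = fields[0], fields[1]
    rot0, rot1 = int(fields[2]), int(fields[3])   # EXIF rotations in [0, 3]
    vals = np.array(fields[4:], dtype=float)      # 9 + 9 + 16 ground-truth values
    K0 = vals[0:9].reshape(3, 3)                  # intrinsics of image A
    K1 = vals[9:18].reshape(3, 3)                 # intrinsics of image B
    T_0to1 = vals[18:34].reshape(4, 4)            # relative pose from A to B
    return name0, name1, rot0, rot1, K0, K1, T_0to1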

Reproducing the indoor evaluation on ScanNet


We provide the ground truth for ScanNet in our format in the file assets/scannet_test_pairs_with_gt.txt for convenience. In order to reproduce similar tables to those in the paper, you will need to download the dataset (we do not provide the raw test images). To download the ScanNet dataset, do the following:

  1. Head to the ScanNet github repo to download the ScanNet test set (100 scenes).
  2. You will need to extract the raw sensor data from the 100 .sens files in each scene in the test set using the SensReader tool.

Once the ScanNet dataset is downloaded in ~/data/scannet, you can run the following:

./match_pairs.py --input_dir ~/data/scannet --input_pairs assets/scannet_test_pairs_with_gt.txt --output_dir dump_scannet_test_results --eval

You should get the following table for ScanNet (or something very close to it, see this note):

Evaluation Results (mean over 1500 pairs):
AUC@5    AUC@10  AUC@20  Prec    MScore
16.12    33.76   51.79   84.37   31.14

Reproducing the outdoor evaluation on YFCC


We provide the ground truth for YFCC in our format in the file assets/yfcc_test_pairs_with_gt.txt for convenience. In order to reproduce similar tables to those in the paper, you will need to download the dataset (we do not provide the raw test images). To download the YFCC dataset, you can use the OANet repo:

git clone https://github.com/zjhthu/OANet
cd OANet
bash download_data.sh raw_data raw_data_yfcc.tar.gz 0 8
tar -xvf raw_data_yfcc.tar.gz
mv raw_data/yfcc100m ~/data

Once the YFCC dataset is downloaded in ~/data/yfcc100m, you can run the following:

./match_pairs.py --input_dir ~/data/yfcc100m --input_pairs assets/yfcc_test_pairs_with_gt.txt --output_dir dump_yfcc_test_results --eval --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float

You should get the following table for YFCC (or something very close to it, see this note):

Evaluation Results (mean over 4000 pairs):
AUC@5    AUC@10  AUC@20  Prec    MScore
39.02    59.51   75.72   98.72   23.61  

Reproducing outdoor evaluation on Phototourism


The Phototourism results shown in the paper were produced using similar data as the test set from the Image Matching Challenge 2020, which keeps the ground truth data for the test set private. We list the pairs we used in assets/phototourism_test_pairs.txt. To reproduce similar numbers on this test set, please submit to the challenge benchmark. While the challenge is still live, we cannot share the test set publicly since we want to help maintain the integrity of the challenge.

Correcting EXIF rotation data in YFCC and Phototourism


In this repo, we provide manually corrected EXIF rotation data for the outdoor evaluations on YFCC and Phototourism. For the YFCC dataset, we found 7 images with incorrect EXIF rotation flags, resulting in 148 pairs out of 4000 being corrected. For Phototourism, we found 36 images with incorrect EXIF rotation flags, resulting in 212 out of 2200 pairs being corrected.

The SuperGlue paper reports the results of SuperGlue without the corrected rotations, while the numbers in this README are reported with the corrected rotations. We found that our final conclusions from the evaluation still hold with or without the corrected rotations. For backward compatibility, we include the original, uncorrected EXIF rotation data in assets/phototourism_test_pairs_original.txt and assets/yfcc_test_pairs_with_gt_original.txt, respectively.

Outdoor training / validation scene splits of MegaDepth


For training and validation of the outdoor model, we used scenes from the MegaDepth dataset. We provide the list of scenes used to train the outdoor model in the assets/ directory:

  • Training set: assets/megadepth_train_scenes.txt
  • Validation set: assets/megadepth_validation_scenes.txt

A note on reproducibility


After simplifying the model code and evaluation code and preparing it for release, we made some improvements and tweaks that result in slightly different numbers than what was reported in the paper. The numbers and figures reported in the README were done using Ubuntu 16.04, OpenCV 3.4.5, and PyTorch 1.1.0. Even with matching the library versions, we observed some slight differences across Mac and Ubuntu, which we believe are due to differences in OpenCV's image resize function implementation and randomization of RANSAC.

Creating high-quality PDF visualizations and faster visualization with --fast_viz


When generating output images with match_pairs.py, the default --viz flag uses a Matplotlib renderer which allows for the generation of camera-ready PDF visualizations if you additionally use --viz_extension pdf instead of the default png extension.

./match_pairs.py --viz --viz_extension pdf

Alternatively, you might want to save visualization images but have the generation be much faster. You can use the --fast_viz flag to use an OpenCV-based image renderer as follows:

./match_pairs.py --viz --fast_viz

If you would also like an OpenCV display window to preview the results (you must use non-pdf output and --fast_viz), simply run:

./match_pairs.py --viz --fast_viz --opencv_display

BibTeX Citation

If you use any ideas from the paper or code from this repo, please consider citing:

@inproceedings{sarlin20superglue,
  author    = {Paul-Edouard Sarlin and
               Daniel DeTone and
               Tomasz Malisiewicz and
               Andrew Rabinovich},
  title     = {{SuperGlue}: Learning Feature Matching with Graph Neural Networks},
  booktitle = {CVPR},
  year      = {2020},
  url       = {https://arxiv.org/abs/1911.11763}
}

Additional Notes

  • For the demo, we found that the keyboard interaction works well with OpenCV 4.1.2.30; older versions were less responsive and the newest version had an OpenCV bug on Mac.
  • We generally do not recommend running SuperPoint+SuperGlue below 160x120 resolution (QQVGA) or above 2000x1500.
  • We do not intend to release the SuperGlue training code.
  • We do not intend to release the SIFT-based or homography SuperGlue models.

Legal Disclaimer

Magic Leap is proud to provide its latest samples, toolkits, and research projects on Github to foster development and gather feedback from the spatial computing community. Use of the resources within this repo is subject to (a) the license(s) included herein, or (b) if no license is included, Magic Leap's Developer Agreement, which is available on our Developer Portal. If you need more, just ask on the forums! We're thrilled to be part of a well-meaning, friendly and welcoming community of millions.

supergluepretrainednetwork's People

Contributors

dependabot[bot], johnwlambert, kanishkanarch, romachalm, sarlinpe, skydes, wesleyliwei, zmurez-ml


supergluepretrainednetwork's Issues

Problem of getting a match in the signage picture

Thank you for your work, and I wish you continued success.

I selected a signage photo downloaded from the internet as image0 using the load_image function in ./demo_superglue.py. When I show the same picture on my phone to the computer camera, I have trouble detecting keypoints. Matching only occurs when I hold the picture very close to the camera; when I show it from a distance at which the picture is still clear and obvious, I can't get any matches!

One problem I have noticed is that the picture is slightly brighter than what is visible from the camera.

I wonder if I should capture the picture from real life instead?

How should I adjust the picture dimensions, and how should I set the other parameters?

Thank you in advance for the answer.

About Sinkhorn algorithm use

Hello,

I would like to ask about this since I do not completely understand how the Sinkhorn algorithm is used on a non-square N-by-M matrix. I thought it could only be applied to square matrices.

Thanks in advance

parameter z and alpha

I cannot find the learnable parameter z referred to in the paper anywhere in the code. According to the paper, z is used to fill the dustbins.

1. Where is z in the code?
2. Is z fixed at test time for any input pair?
3. What is the parameter alpha in def log_optimal_transport(scores, alpha, iters: int)?

It would be great if you could give me a reply.

Some details about homography trained models

Thanks for the excellent work! I have some questions about the homography trained models.
1) What is the threshold for correct matches when evaluating the recall and precision on the HPatches dataset with the synthetic-homography-trained model?
2) Have you tested the MMA on HPatches with the synthetic-homography-trained model?
3) Does the size of the homography dataset matter during training? Could I train the model with less data?
I would appreciate it if you could give me a reply :)

Details for reproducing the indoor pose estimation experiment

Hi, thanks for the great work!

I'm reproducing SuperGlue on the ScanNet dataset and having some questions about the details of data preprocessing and ground-truth generation.

  1. About data preprocessing. It is mentioned in Appendix E that you select image pairs that have an overlap score in [0.4, 0.8]; however, it is time-consuming to exhaustively search all possible image pairs and calculate the overlap score. Did you use any heuristic to prune out some of the pairs before calculating the overlap score? I think this is not trivial, since different implementations might lead to a huge difference in the portion of pairs with a small overlap score, and thus different portions of hard training samples for the network. Furthermore, could you share what depth threshold is used when checking depth consistency?
  2. About determining the ground-truth correspondences. It is mentioned in Appendix E that correspondences are determined by the reprojection error. After extracting the correspondences with minimum reprojection error along rows and columns with a threshold, did you check their depth consistency? If so, what threshold is used?

SuperPoint model sample descriptor

Hi, I've seen the SuperPoint repo, and now that I'm comparing it to this one, I noticed a slightly different way of interpolating the descriptors at the keypoint locations. I don't understand these two lines:

keypoints = keypoints - s / 2 + 0.5
keypoints /= torch.tensor([(ws - s/2 - 0.5), (hs - s/2 - 0.5)]).to(keypoints)[None]

Could you explain to me the difference between this implementation and simply dividing by the dimensions of the image?

Thank you for this great work!

Question about superPoint keypoint count for Indoor/Outdoor image sets

Thank you for your sharing of this art work.

I know you have no plan to provide training part of superGLUE, but I have one question.

According to your paper, you used 400/512 keypoints (detected by SuperPoint) for the indoor/outdoor image sets.

But, as you know, fewer or more keypoints could surely be detected than that.

What if only 50 keypoints are detected in the query image? Did you zero-pad with 350 entries to fit the tensor shape to batch size x 400 x 256 (D)? If so, did you ignore the zero-padded graph nodes when calculating the loss?

I hope to get any hint.

Thank you!

run in jetson nano

Is it possible to run this package on the Jetson Nano? I saw that the Jetson Nano only supports pytorch instead of torch, which is what the demo script of this package uses.

Which dataset was used for the released outdoor models?

Thanks for your great work.

The presented results in YFCC are so good:

Evaluation Results (mean over 4000 pairs):
AUC@5 AUC@10 AUC@20 Prec MScore
39.02 59.51 75.72 98.72 23.61

However, I'm confused about which dataset was used for your released outdoor model that produces such great results.
Thank you so much.

Question about optimal transport training

Hi,

Thank you so much for sharing your pretrained network! This is really helpful 👍

I am trying to reproduce the results with my own dataset. However, I face a problem where optimal transport seems to prohibit learning in the earlier layers. The loss increases first and after a few iterations gets stuck in one place. If I train the early layers without optimal transport (keeping everything else the same), I do not see this problem anymore. In the figure below, the blue curve is the training loss with the optimal transport layer and the red curve is the training loss without optimal transport.

[Figure: training loss with the optimal transport layer (blue) vs. without it (red)]

Have you encountered similar issues? Do you have any recommendation for training with optimal transport (such as learning rate etc)?

Thank you again!

Notice in the boilerplate

The notice in the boilerplate makes it seem like something intended for internal code and not for something that is on GitHub. See below:

# NOTICE:  All information contained herein is, and remains the property
# of COMPANY. The intellectual and technical concepts contained herein
# are proprietary to COMPANY and may be covered by U.S. and Foreign
# Patents, patents in process, and are protected by trade secret or
# copyright law.  Dissemination of this information or reproduction of
# this material is strictly forbidden unless prior written permission is
# obtained from COMPANY.  Access to the source code contained herein is
# hereby forbidden to anyone except current COMPANY employees, managers
# or contractors who have executed Confidentiality and Non-disclosure
# agreements explicitly covering such access.
#
# The copyright notice above does not evidence any actual or intended
# publication or disclosure  of  this source code, which includes
# information that is confidential and/or proprietary, and is a trade
# secret, of  COMPANY.   ANY REPRODUCTION, MODIFICATION, DISTRIBUTION,
# PUBLIC  PERFORMANCE, OR PUBLIC DISPLAY OF OR THROUGH USE  OF THIS
# SOURCE CODE  WITHOUT THE EXPRESS WRITTEN CONSENT OF COMPANY IS
# STRICTLY PROHIBITED, AND IN VIOLATION OF APPLICABLE LAWS AND
# INTERNATIONAL TREATIES.  THE RECEIPT OR POSSESSION OF  THIS SOURCE
# CODE AND/OR RELATED INFORMATION DOES NOT CONVEY OR IMPLY ANY RIGHTS
# TO REPRODUCE, DISCLOSE OR DISTRIBUTE ITS CONTENTS, OR TO MANUFACTURE,
# USE, OR SELL ANYTHING THAT IT  MAY DESCRIBE, IN WHOLE OR IN PART.

What value should bin_score converge to ?

Thanks for your great work. When I trained SuperGlue on my own dataset, I found that the bin_score keeps increasing at a relatively stable pace instead of converging to a dataset-specific value. Is this normal? If I understand correctly, the larger bin_score is, the more features will be considered non-matching. The extreme case is a bin_score much larger than all the values in the scores matrix, which will lead to no matches surviving the mscores0 > self.config['match_threshold'] check. Could you provide some explanation?

Best,
Xuyang

About homography pretraining for outdoor scenes

Hi, thank you for your great work!

I'm trying to train SuperGlue+SuperPoint on the MegaDepth dataset. As described in the paper, the weights for outdoor scenes are initialized with the homography model due to the limited number of scenes. I'm wondering how much this homography pretraining affects the final results. Is it possible to obtain similar results if I train the model from scratch on MegaDepth, or on a larger outdoor dataset with more scenes?

[Discussion] Grey-scale image as an input?

Hey there thank you so much for sharing your brilliant work!

I have one question: why do we read images as grayscale in the demo? Is that all the current algorithm supports for now? I am curious because it seems to me that color information could greatly help feature detection and matching.

About the constraint on P̄ in the paper

Thanks for your work, it's so cool. About the paper: I think it should be P̄ 1_{N+1} ≤ a, not P̄ 1_{N+1} = a. Would you please explain why it is an equality? I can understand P 1_N ≤ 1_M and P^T 1_M ≤ 1_N, because you said that each point in set A has at most one corresponding point in set B.
By the way, the partial assignment matrix is a float-valued matrix, not a binary 0/1 matrix. Is that right?

output of log_optimal_transport

Hello,

The output of the log_optimal_transport function is confusing to me. I want to train a model, but I am unsure how to interpret it.

I get negative numbers as the output of log_optimal_transport. It is a differentiable version of the Hungarian algorithm, so shouldn't the output be positive?

Does a higher value in the output of log_optimal_transport indicate an assignment, or a lower one?

I want to use it in a loss function, but I don't understand the output of log_optimal_transport.

A question on the attention mechanism.

Your work totally impressed me, both in its results and its ideas. However, I still have some questions about the attention mechanism. In my understanding, self-attention was originally used in sequence-to-sequence models. For example, the input sequence is A = {a1, a2, a3, a4} and the output sequence is B = {b1, b2, b3, b4}. Using self-attention, a1 can be expressed as three vectors q1, k1, v1, and similarly for a2, a3, a4. Using q1, k1, and a softmax, we can generate α11, and by repeating this we can generate α12, α13, and α14. Finally, we can use the α values as weights to sum v1, v2, v3, and v4 and generate b1. That gives Equation (4). But I cannot understand Equation (5), combined with the equation αij = softmax_j(qi^T kj), because I think qi and kj are generated from the same sequence, whereas in Equation (5) qi and kj are generated from keypoints of different images, i.e., different sequences. This really confuses me; can you explain why?
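For reference, a minimal sketch of the distinction under discussion: in self-attention the queries, keys, and values all come from the keypoints of the same image, whereas in cross-attention (Equation (5)) the queries come from one image and the keys/values from the other. This assumes descriptors of shape (batch, dim, num_keypoints) and ignores the multiple heads used in the actual model:

import torch

def attention_sketch(query, key, value):
    # query: (b, d, n) from image i; key/value: (b, d, m) from image j.
    # Self-attention: i == j. Cross-attention: i != j.
    dim = query.shape[1]
    scores = torch.einsum('bdn,bdm->bnm', query, key) / dim ** 0.5
    alpha = torch.softmax(scores, dim=-1)   # alpha_ij = softmax_j(q_i . k_j)
    return torch.einsum('bnm,bdm->bdn', alpha, value)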

About the pose of ScanNet

Hello Sarlin, thanks for the solid work! I have reproduced the results on MegaDepth :) But I encountered a problem when preprocessing the ScanNet dataset.
I reprojected one image onto the other using the pose (Tcw), intrinsics, and depth, but failed (I cannot align them). When using the SUN3D dataset (preprocessed by OANet) and MegaDepth (preprocessed by D2-Net), the code works. Did I miss something? Does ScanNet have a different definition of the world coordinate frame?
I would appreciate it if you could give me a reply :)

mismatch problem in c++

Hi, I used the SuperGlue model in C++ via LibTorch, but the matching results contain some mismatches.

I converted the file "superglue.pth" into "superglue.pt" and read the match points according to "'matches0': indices0".

How should I deal with the mismatches? In the Python code, matches are also taken directly from "matches0".

Slack variables issue

Hello,

I am trying to apply the Sinkhorn algorithm to a similar problem, but I have an issue when introducing the slack variable. I am doing some dummy testing with two points plus the extra slack variable (dustbin), and the network seems unable to reduce the loss and learn to do the assignment properly. Any thoughts?

Image fusion after Superglue Matching with Homography Transform

I am trying to perform a homography transformation to fuse an image pair with the help of your work.
I am wondering about the values under the key "keypoints0" in your match result npz file.
Are they the x, y pixel coordinates of the keypoints in image 1, with "keypoints1" referring to image 2?
Just as the SIFT implementation in OpenCV does with queryIdx.pt and trainIdx.pt?
Is that right?

I combined the "keypoints0", "keypoints1", and "match_confidence" data into a list with:

# npz: ['keypoints0', 'keypoints1', 'matches', 'match_confidence']
# matches_list.shape: (npz['matches'].shape[0], 5)

for m in range(npz['matches'].shape[0]):
    matches_list[m][0] = npz['match_confidence'][m]
    matches_list[m][1] = npz['keypoints0'][m][0]
    matches_list[m][2] = npz['keypoints0'][m][1]
    matches_list[m][3] = npz['keypoints1'][npz['matches'][m]][0]
    matches_list[m][4] = npz['keypoints1'][npz['matches'][m]][1]

Then I sorted by match_confidence from high to low and selected the first 50 or 100 matches. Printing the list, it looks like I have the correspondences right.
Next I fed the selected points into OpenCV's findHomography for image fusion, but the result does not look good.

To my surprise, the match visualization looks quite right, but the fusion does not.
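For reference, a minimal sketch of the fusion step described above, assuming mkpts0 / mkpts1 are (N, 2) arrays of matched pixel coordinates and image0 / image1 are the loaded images (all hypothetical names):

import cv2

H, inliers = cv2.findHomography(mkpts0, mkpts1, cv2.RANSAC, ransacReprojThreshold=3.0)
warped = cv2.warpPerspective(image0, H, (image1.shape[1], image1.shape[0]))
blended = cv2.addWeighted(warped, 0.5, image1, 0.5, 0)   # simple overlay to check alignment

Note that a single homography only models a planar scene or a rotation-only camera motion, so correct matches can still produce a poor fusion when the scene has depth parallax.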

Question regarding recoverPose

In the code, the focal length is manually set to 1e9. I just wanted to see if there is any clarification or justification for this, as it typically defaults to 1. I couldn't find anything explicitly stated in the paper regarding this.

Edit: Another question on top of this: why is bias set to True on the MLP layers if they use BatchNorm by default?

CUDA out of memory when sinkhorn_iterations == 100?

Hi,
I am trying to write a simple script that runs the SuperGlue model on fake input with batch size == 4 (batch sizes below 4 seem to work fine), however it crashes with a CUDA out-of-memory error. Is there anything wrong with it, or do I have to change the superglue.py code?
PS: Could SuperGlue be trained on one 2020 GPU card?

My code is here:

import torch
import torch.nn as nn

import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# from common.utils.cuda import print_gpu_memory

# from config import GetConfig
# from opt import GetOpt

from models.superglue import SuperGlue

torch.set_grad_enabled(True)

torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = False

import torch
from loguru import logger

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print('device = {}'.format(device))

def to_GB(memory):
    return round(memory/1024**3, 1)


def print_gpu_memory(name="", id=0):
    t = torch.cuda.get_device_properties(id).total_memory
    # c = torch.cuda.memory_reserved(id)
    a = torch.cuda.memory_allocated(id)
    # f = c-a  # free inside cache

    # print('Free GPU memory : {}/{}'.format(f, t))
    logger.info('{} GPU memory : {}/{} GB'.format(name, to_GB(a), to_GB(t)))


# import pynvml
# pynvml.nvmlInit()
# # GPU的id
# handle = pynvml.nvmlDeviceGetHandleByIndex(0)
# meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
# print(meminfo.used)

batchsize = 8
keypoints = 1024
width = 2
height = 2

config = {
        'superpoint': {
                'nms_radius': 4.0,
                'keypoint_threshold': 0.005,
                'max_keypoints': keypoints
        },
        'superglue': {
                'weights': 'indoor',
                'sinkhorn_iterations': 100,
                'match_threshold': 0.2,
        }
}

print_gpu_memory(id=0)

superglue = SuperGlue(config.get('superglue', {}))

print_gpu_memory(name='init', id=0)

data = {'descriptors0':torch.rand((batchsize, 256, keypoints)).to(device),
        'descriptors1':torch.rand((batchsize, 256, keypoints)).to(device),
        'keypoints0':torch.rand((batchsize, keypoints, 2)).to(device),
        'keypoints1':torch.rand((batchsize, keypoints, 2)).to(device),
        'scores0':torch.rand((batchsize, keypoints)).to(device),
        'scores1':torch.rand((batchsize, keypoints)).to(device),
        'image0':np.random.rand(batchsize, 3, height, width),
        'image1': np.random.rand(batchsize, 3, height, width)}

print_gpu_memory(name='data', id=0)

superglue.to(device)

print_gpu_memory(name='model', id=0)

superglue(data)

print_gpu_memory(name='forward', id=0)

The runtime error message is here:

device = cuda
2020-12-29 18:25:19.262 | INFO     | __main__:print_gpu_memory:38 -  GPU memory : 0.0/10.8 GB
Loaded SuperGlue model ("indoor" weights)
2020-12-29 18:25:20.128 | INFO     | __main__:print_gpu_memory:38 - init GPU memory : 0.0/10.8 GB
2020-12-29 18:25:25.125 | INFO     | __main__:print_gpu_memory:38 - data GPU memory : 0.5/10.8 GB
2020-12-29 18:25:25.169 | INFO     | __main__:print_gpu_memory:38 - model GPU memory : 0.6/10.8 GB
Traceback (most recent call last):
  File "test_model_gputils.py", line 87, in <module>
    superglue(data)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Develop/SuperGluePretrainedNetwork/models/superglue.py", line 253, in forward
    desc0, desc1 = self.gnn(desc0, desc1)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Develop/SuperGluePretrainedNetwork/models/superglue.py", line 138, in forward
    delta0, delta1 = layer(desc0, src0), layer(desc1, src1)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Develop/SuperGluePretrainedNetwork/models/superglue.py", line 120, in forward
    message = self.attn(x, source, source)
  File "/home/liuxiao/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/Develop/SuperGluePretrainedNetwork/models/superglue.py", line 108, in forward
    x, _ = attention(query, key, value)
  File "/data/Develop/SuperGluePretrainedNetwork/models/superglue.py", line 89, in attention
    scores = torch.einsum('bdhn,bdhm->bhnm', query, key) / dim**.5
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 10.76 GiB total capacity; 9.12 GiB already allocated; 359.12 MiB free; 9.55 GiB reserved in total by PyTorch)
INFO[0019] [Agent: A309085X9311734] Worker FAILED: user command error: exit code 1

Thx,
skylook

Details for training on Megadepth

Hi, thanks for the excellent work!

I'm trying to train SuperGlue on the MegaDepth dataset and have some questions about the details of data preprocessing.
I saw in the discussion that random cropping is used for MegaDepth.

  1. Is the random cropping method similar to D2-Net (selected a random w*h crop centered around one correspondence)?
  2. What is the size of cropping (w*h)?

This is my first time training such a big model, so maybe I am confused about some basic details.
Looking forward to your reply :)

About pose estimation issue on YFCC100M dataset

Thank you very much for releasing the official implementation of superglue.

I'm trying to test the pose estimation performance on YFCC100M (using the same test sequences as in the SuperGlue paper) with the command python match_pairs.py --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float --eval, but I get some strange results.

The AUC @5/10/20 values are 39.08/59.18/75.28, which are lower than reported. After investigation, I found some pairs with a very high matching inlier ratio but low pose accuracy; see the picture below for an example.
[Figure: evaluation visualization for the pair 34511353_8612252871 / 58780959_4178228]

This pair has a nearly 100% inlier ratio, but the angular error for translation is high (27.5 degrees).

Is there any explanation for such results? Also, do you have any suggestions for reproducing the performance on YFCC100M?

About the RANSAC

I think it's a great job. But I have a question: I tested the pretrained model and found that most of the time the output does not have many wrong matches. Did you use RANSAC inside the model to achieve this, or some other way to remove the wrong matches?

The question about results on Aachen dataset in original paper.

Hi, thanks for your great work! The paper is very solid and I think it will have a profound influence.
But I have some questions (uncertain details) about the visual localization results in the original paper.

  1. Are the results on the Aachen dataset obtained with a model trained on MegaDepth, or with one trained separately?
  2. Is the localization pipeline used in the original paper similar to the one from the local feature challenge 2019?
  3. I reproduced similar results using the Hierarchical Localization pipeline, but the results from the original paper are marginally lower than the workshop results at CVPR 2020. Is this caused by a difference in the pipeline or by something else?

already matched superpoints gone after adding new superpoints into array

In my case I have two images, x and y. In the first scenario, I extracted 30 SuperPoint keypoints from image x and 512 from image y. In order to make the arrays the same size, I created a zeros array of size 512 and filled its first 30 elements with the keypoints extracted from image x. After that I used SuperGlue and it gave reasonable matches between those images.

In the second scenario, I added the same 30 keypoints to the zeros array again, so it now contains 60 keypoints (the set from image x duplicated) with the rest zero. When I tried to match this array against the keypoint array from image y, it gave zero matches, even though I used the same keypoints, just duplicated in the array.

Could anyone help me understand the rationale behind this behavior of SuperGlue? Thanks in advance.

Will this be supported on mobile platforms?

Thank you for your great work.

I am wondering if this can be used on smartphone platforms. Or do you have any suggestions for using functions such as match_pairs with a mobile phone camera?

Thank you again.

Question about phototourism test pairs

Hi,

Thank you for releasing the test pairs for outdoor evaluation on the Phototourism dataset. However, I'm a little confused about the data format of assets/phototourism_test_pairs.txt. I assume that the first two strings in each row correspond to the names of the two images used in each pair, but I find that they are the same in every row. In addition, I'm wondering what the last two numbers in each row stand for?

sad

Thank you for your attention.

Sinkhorn Algorithm

Does the implemented Sinkhorn algorithm assume that there is the same number of keypoints in both images? If so, how are situations where the numbers of keypoints differ handled?

Would you mind sharing the model trained on synthetic homographies?

Hello, I want to reproduce Table 5 (Generalization to real data) in the paper, but the precision and recall measured with the model you shared (outdoor) are a little different (recall increased while precision decreased). Would you mind sharing the model trained on synthetic homographies?

batch mode on match_pairs.py?

I noticed that when running match_pairs.py in matching mode, it seems to match ONE image pair at a time. I'm wondering, is it possible to run a batch of image pairs at a time? This should speed things up when using a GPU.
batch pairs = [im00, im01, im02, ... , im0n] with [im10, im11, im12, ..., im1n]

About SIFT implementation

Hello,

Thanks for your excellent work. I'm trying to use SIFT with your SuperGlue model. However, SIFT descriptors have length 128, while your model's default descriptor length is 256, the same as SuperPoint. I get errors when I feed SIFT descriptors into the SuperGlue model, even if I modify the 'descriptor_dim' parameter to 128.

What should I do in this case?

About the GT depth images of Megadepth

Hello, I have recently been reproducing the paper's results on the MegaDepth dataset. But I found that the MegaDepth depth images reconstructed by COLMAP (the same as used by D2-Net) are noisy and have many outliers. Have you refined the depth images generated by COLMAP?
