
ReKD's Introduction

Self-Supervised Equivariant Learning for Oriented Keypoint Detection (CVPR 2022)

This is the official implementation of the CVPR 2022 paper "Self-Supervised Equivariant Learning for Oriented Keypoint Detection" by Jongmin Lee, Byungjin Kim, and Minsu Cho.

Detecting robust keypoints from an image is an integral part of many computer vision problems, and the characteristic orientation and scale of keypoints play an important role in keypoint description and matching. Existing learning-based methods for keypoint detection rely on standard translation-equivariant CNNs but often fail to detect reliable keypoints under geometric variations. To learn to detect robust oriented keypoints, we introduce a self-supervised learning framework using rotation-equivariant CNNs. We propose a dense orientation alignment loss, computed over image pairs generated by synthetic transformations, to train a histogram-based orientation map. Our method outperforms previous methods on an image matching benchmark and a camera pose estimation benchmark.

Rotation-equivariant Keypoint Detection

PyTorch source code for the CVPR 2022 paper.

"Self-Supervised Equivariant Learning for Oriented Keypoint Detection".
Jongmin Lee, Byungjin Kim, Minsu Cho. CVPR 2022.

[Paper] [Project page]

Installation

Clone the Git repository

git clone https://github.com/bluedream1121/ReKD.git

Install dependency

Run the script to install all the dependencies. You need to provide the conda install path (e.g. ~/anaconda3) and a name for the conda environment to be created.

bash install.sh [conda_install_path] rekd

Requirements

  • Ubuntu 18.04
  • python 3.8
  • pytorch 1.8.1
  • torchvision 0.9.1
  • kornia 0.5.2
  • opencv-python 4.5.2.54
  • scipy 1.6.3
  • e2cnn 0.1.9

Dataset preparation

Training data

  • ImageNet 2012 for synthetic dataset generation (6.4G): [Download ImageNet 2012 validation set]

    • Note that you do not have to use the ImageNet 2012 validation set: any image collection can be used for training, since the framework trains the model in a self-supervised manner.

Evaluation data

Synthetic data generation

python train.py --data_dir [ImageNet_directory] --synth_dir datasets/synth_data --patch_size 192 --max_angle 180

  • Dataset parameters (a sketch of the pair-generation step follows this list):

    • data_dir: Path to the images (e.g. the ImageNet directory) used to generate the synthetic training data.
    • patch_size: The patch size of the generated dataset.
    • max_angle: The maximum rotation angle for generating a synthetic training view.
    • num_training_data: The number of generated training samples.
  • We release the training data that we used to train our model; please download it from this link (841M) (password: rekd).
  • Please put the extracted folder in the datasets/ directory.
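
For illustration, here is a minimal sketch of how a rotated training pair could be generated with kornia under the settings above; the function and its cropping/sampling details are assumptions, not the repository's actual pipeline.

```python
# Hypothetical sketch: build a (source, target) pair by center-cropping a patch
# and rotating it by a random angle in [-max_angle, max_angle] with kornia.
import torch
import kornia.geometry as KG

def make_synthetic_pair(img: torch.Tensor, patch_size: int = 192, max_angle: float = 180.0):
    """img: (1, C, H, W) float tensor. Returns (source, target, angle_in_degrees)."""
    _, _, h, w = img.shape
    top, left = (h - patch_size) // 2, (w - patch_size) // 2
    src = img[:, :, top:top + patch_size, left:left + patch_size]  # center crop

    angle = (torch.rand(1) * 2 - 1) * max_angle  # uniform in [-max_angle, max_angle]
    tgt = KG.rotate(src, angle)                  # in-plane rotation about the patch center
    return src, tgt, angle.item()
```

The ground-truth rotation between the two views is known by construction, which is what makes the self-supervised orientation alignment loss possible.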

Training

python train.py --synth_dir datasets/synth_data --group_size 36 --batch_size 16 --ori_loss_balance 100

  • Network architecture parameters (see the e2cnn sketch after this list):

    • group_size: The order of the rotation group used for the group convolution. Default: 36.
    • dim_first: The number of channels in the first layer. Default: 2.
    • dim_second: The number of channels in the second layer. Default: 2.
    • dim_third: The number of channels in the third layer. Default: 2.
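
To illustrate what group_size controls, here is a hedged sketch of a single rotation-equivariant convolution layer built with e2cnn (listed in the requirements) over a cyclic group of order 36; the field types and layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical single rotation-equivariant layer with e2cnn (group order 36).
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

gspace = gspaces.Rot2dOnR2(N=36)                              # cyclic rotation group C_36
in_type = enn.FieldType(gspace, [gspace.trivial_repr])        # one-channel (gray) input
out_type = enn.FieldType(gspace, 2 * [gspace.regular_repr])   # 2 regular fields (dim_first=2)

layer = enn.SequentialModule(
    enn.R2Conv(in_type, out_type, kernel_size=5, padding=2),
    enn.InnerBatchNorm(out_type),
    enn.ReLU(out_type),
)

x = enn.GeometricTensor(torch.randn(1, 1, 192, 192), in_type)
y = layer(x)  # y.tensor has shape (1, 2 * 36, 192, 192)
# Rotating the input by a multiple of 10 degrees cyclically permutes the 36 group channels.
```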

Test on the HPatches

You can download the pretrained weights here: [best models] (password: rekd)

python eval_with_extract.py --load_dir [Trained_models] --eval_split full

  • descriptor: The local descriptor paired with the detector for evaluation (HardNet is used for the reported results).
  • exp_name: Experiment name; extracted features are saved to extracted_features/[exp_name].
  • num_points: The number of features to extract. Default: 1500.
  • pyramid_levels: The number of downsampled pyramid levels.
  • upsampled_level: The number of upsampled image levels.
  • nms_size: The window size for non-maximum suppression (a sketch follows this list).
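
For reference, here is a minimal sketch of the kind of window-based non-maximum suppression that nms_size controls; this is an illustration, not the repository's exact routine.

```python
# Hypothetical NMS: keep a score only where it is the maximum of its window.
import torch
import torch.nn.functional as F

def nms(scores: torch.Tensor, nms_size: int = 15) -> torch.Tensor:
    """scores: (1, 1, H, W) keypoint score map; returns the suppressed map."""
    pad = nms_size // 2
    local_max = F.max_pool2d(scores, kernel_size=nms_size, stride=1, padding=pad)
    return scores * (scores == local_max)
```
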
HPatches all variations

Results on HPatches, all variations. '*' denotes the results with outlier filtering using the orientation. We use the HardNet descriptor for evaluation.

Model | Repeatability | MMA@3 | MMA@5 | pred. match. | Links | Notes
--- | --- | --- | --- | --- | --- | ---
CVPR2022 | 57.6 | 73.1 | 79.6 | 505.8 | - | CVPR 2022 results
CVPR2022* | 57.6 | 76.7 | 82.3 | 440.1 | - | CVPR 2022 results
REKD_release | 58.4 | 73.5 | 80.1 | 511.6 | model | Official retrained model
REKD_release* | 58.4 | 77.1 | 82.9 | 444.4 | model | Official retrained model

To reproduce these numbers:

python eval_with_extract.py --load_dir trained_models/release_group36_f2_s2_t2.log/best_model.pt --eval_split full

HPatches viewpoint variations

Results on HPatches, viewpoint variations. '*' denotes the results with outlier filtering using the orientation. We use the HardNet descriptor for evaluation.

Model | Repeatability | MMA@3 | MMA@5 | pred. match. | Notes
--- | --- | --- | --- | --- | ---
REKD_release | 59.1 | 72.5 | 78.7 | 464.9 | Official retrained model
REKD_release* | 59.1 | 75.7 | 81.1 | 399.8 | Official retrained model

HPatches illumination variations

Results on HPatches, illumination variations. '*' denotes the results with outlier filtering using the orientation. We use the HardNet descriptor for evaluation.

Model | Repeatability | MMA@3 | MMA@5 | pred. match. | Notes
--- | --- | --- | --- | --- | ---
REKD_release | 57.6 | 74.4 | 81.5 | 559.9 | Official retrained model
REKD_release* | 57.6 | 78.5 | 84.7 | 490.6 | Official retrained model

Citation

If you find our code or paper useful in your research, please consider citing our work using the following BibTeX:

@inproceedings{lee2022self,
  title={Self-Supervised Equivariant Learning for Oriented Keypoint Detection},
  author={Lee, Jongmin and Kim, Byungjin and Cho, Minsu},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={4837--4847},
  year={2022},
  organization={IEEE}
}

Contact

Questions can be left as issues in the repository.

ReKD's Issues

RuntimeError: CUDA out of memory.

Hello! I'm very interested in your paper and code. When I try to run the HPatches test, I encounter: RuntimeError: CUDA out of memory. Tried to allocate 1.70 GiB (GPU 0; 11.91 GiB total capacity; 8.88 GiB already allocated; 1.32 GiB free; 9.82 GiB reserved in total by PyTorch). How can I resolve this issue?
Thanks for your reply!

How to reproduce the results on IMC2021 datasets?

Hi Authors!

I think your paper is very interesting, and its clear and compact presentation inspired me a lot. However, I ran into some puzzles when evaluating your released best model REKD on the IMC2021 dataset. I followed the steps in the GitHub repository you offered, but I only reached around 0.6 mAA (10 degrees), while 0.710 is reported in your paper. A screenshot of the specific results is attached.

I'd like to ask how you implemented the evaluation on IMC2021, particularly the implementation and hyperparameter settings of "MultiScaleFeatureExtractor" on this dataset. Many thanks!

[screenshot of results]

About results visualization

Hi, could you release the code for visualizing the image matching results?
I'm very interested in how the matching results can be displayed on the images.
Thanks.

Implementation of GIFT descriptor

In the paper, the detector is also evaluated using the GIFT descriptor, but this code only allows the other three descriptors to be used. Would it be possible to get the code for the GIFT implementation too? Since it gave some of the best results in the paper, it is odd that it has been left out.
Thanks.

Channel Dimension

In the paper, the orientation maps have dimensions |G| x C x H x W, but the maps in the code appear to have dimensions C x |G| x H x W, so I just wanted to check that when the argmax, for example, is taken over dim=1, it is applied over the correct dimension. Also, the paper says the number of channels C is 2, but running the code shows orientation maps with C equal to 1. It is not particularly clear from the paper or the code what C is supposed to represent: is it the colour channels of an image? I'm guessing the group order |G| corresponds to bin_size/B in the code? Clarifying these points would be really helpful. Many thanks.
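
As a hedged illustration of the layout question above (the shapes here are assumptions for exposition, not the repository's actual tensors):

```python
# Hypothetical layout: an orientation map stored as (B, C * |G|, H, W) can be
# reshaped so that the per-pixel histogram argmax runs over the group dimension.
import torch

B, C, G, H, W = 1, 2, 36, 48, 48
ori_map = torch.randn(B, C * G, H, W)       # e.g. the output of a regular-field conv
hist = ori_map.view(B, C, G, H, W)          # split descriptor channels from group bins
dominant_bin = hist.argmax(dim=2)           # (B, C, H, W): per-pixel orientation bin
angle = dominant_bin.float() * (360.0 / G)  # bin index -> orientation in degrees
```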

About train.py

Hello, I have configured the experimental environment according to the configuration you posted, but I get the following error:

[screenshot of error]

The error shows a failure when reading the image paths, but I have already completed the first round of training data generation. How can I solve this problem?

Matching for homography

Maybe I missed some code, but I don't see the matching part.
I used OpenCV's FLANN matcher to match the two v_boat pictures from the demo; the result is shown below.
I would appreciate it if you could point me to the matching part of the project.

[matching visualization]
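
For context, here is a minimal sketch of the FLANN-based matching described above; the descriptor arrays and the ratio threshold are assumptions, not output produced by this repository.

```python
# Hypothetical FLANN matching of two float descriptor sets with Lowe's ratio test.
import cv2
import numpy as np

def flann_match(desc1: np.ndarray, desc2: np.ndarray, ratio: float = 0.8):
    """desc1, desc2: (N, D) float descriptor arrays; returns the good matches."""
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=1, trees=5),  # FLANN_INDEX_KDTREE
        dict(checks=50),
    )
    matches = flann.knnMatch(desc1.astype(np.float32), desc2.astype(np.float32), k=2)
    return [m for m, n in matches if m.distance < ratio * n.distance]
```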

How to reproduce results in Figure 4

Hello,

I'm interested in trying to reproduce the results in Figure 4 (repeatability under synthetic rotations). In particular, can you give me more information on how much Gaussian noise was added after applying the rotation and cropping? Section 4.4 of the ORB paper mentions "Gaussian noise of 10," but this is very vague.

Thank you
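
One common reading of "Gaussian noise of 10" is zero-mean Gaussian noise with a standard deviation of 10 on 8-bit intensities, added after the rotation; here is a hedged sketch under that assumption.

```python
# Hypothetical noise step: add N(0, 10^2) to 0-255 intensities, then clip.
import numpy as np

def add_gaussian_noise(img_u8: np.ndarray, std: float = 10.0) -> np.ndarray:
    noisy = img_u8.astype(np.float32) + np.random.normal(0.0, std, img_u8.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```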

Testing on own images

Hi,

Great results! Is there code to test on my own images, or should I work off of the demo code?

I noticed that this uses cv2 and scikit-learn functions for the feature extraction. I see that KeyNet released a PyTorch version. Is the inference time a lot slower than KeyNet's? And would it be worth creating a pure PyTorch version of this?

Thanks

Reproduce results of retrained model

I am trying to reproduce the results of your retrained model, but I don't know what settings you used, e.g. epochs, num_training_data, learning rate, etc.
I was wondering if you could describe the changes made between the model in the paper and the retrained model.
Thanks
