zju3dv / LoFTR
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
Home Page: https://zju3dv.github.io/loftr/
License: Apache License 2.0
Hi, thanks for sharing your work!
I noticed you enable masking in the training phase; is it also possible during inference?
(Applying this change fails due to a tensor size mismatch between feat_c0 and mask_c0.)
I tried to match the image pairs in issue #7.
I don't agree with your answers there.
JiamingSuen: I would like to point out that LoFTR (and SuperGlue) learns the matching priors (i.e., the distribution of matches between the image pair) from the training data, which means that it can only handle rotations exhibited in the training set.
SuperGlue gives a so-so result, while LoFTR cannot find any matches.
Hi, thanks for the fabulous work!
I was looking for works leveraging dense feature matching operations and ended up in this repository.
I am new to the feature matching task, so please be patient with me :)
What is the purpose of bordering the matching mask (the mask_border() function)?
In the function, does the indexing [-b:0] actually apply the border to any portion of the tensors? If it is intended, I don't get its purpose.
Thanks!
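To make the question concrete, here is a quick check (my own sketch, not the repo's code) of the slicing behavior I am asking about:

```python
# With a positive step, `t[-b:0]` is an empty slice and selects nothing,
# while `t[-b:]` selects the last b elements -- which is presumably what
# a border mask intends.
import torch

t = torch.arange(10)
b = 2
print(t[-b:0])  # tensor([], dtype=torch.int64) -- empty slice
print(t[-b:])   # tensor([8, 9]) -- the last b elements
```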
Hi. Why do pos-conf and neg-conf both use 0.25 * (1 - conf)^2 * log(conf)?
neg-conf means a bad match, or one that doesn't need to be matched. Given that, as neg-conf goes down, the neg-loss should go down too.
But in the code, loss_neg = - alpha * torch.pow(1 - neg_conf, gamma) * neg_conf.log(), so when neg-conf decreases, the neg-loss increases instead?
Looking forward to your reply.
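For reference, here is a small numeric check I ran (my own sketch, using the quoted formula) showing how the term behaves as neg-conf varies:

```python
# As neg_conf rises toward 1 (a confident non-match), the loss term decays
# toward 0; as neg_conf falls, the loss grows. Values here are illustrative.
import torch

alpha, gamma = 0.25, 2.0
for neg_conf in (0.1, 0.5, 0.9, 0.99):
    c = torch.tensor(neg_conf)
    loss = -alpha * torch.pow(1 - c, gamma) * c.log()
    print(f"neg_conf={neg_conf}: loss={loss.item():.6f}")
```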
Hi,
First I'd like to sincerely thank the author for sharing this amazing work.
When I tried the provided Colab with the default setting (indoor), the resulting plot was quite poor.
A similar problem occurred when I cloned the repo and tested both indoor and outdoor pairs with
indoor_ds.ckpt and outdoor_ds.ckpt.
Can anyone advise on this?
Thank you very much!
Hi,
Can I use it as an alternative to Akaze?
Best Regards
Hi, for the indoor scenes, it seems that validation and testing both use the scannet_test_1500 dataset, but their results differ even when using the same model. What is the reason?
Hi, maybe an off-topic question that is not quite related to the code ;)
I really like the way you illustrate your LoFTR pipeline in your paper. Could you please share which tool you used to draw that pipeline?
Thanks!
Hi, first thanks for your great work.
In your demo, the matching lines have different colors. Do different colors represent different accuracy rates? For example, blue indicates that the point is accurately matched, and yellow indicates that the matching accuracy is relatively low?
Hi,
When using Colab to match two images with large parallax, the results are not good.
Is there any way to improve this?
Thank you for sharing this code.
Hi ! Thank you very much for releasing the implementation.
As I am learning about this technology, I would like to kindly ask about the training option self.loss_config['coarse_type'] == 'focal'.
What are alpha and gamma for?
Specifically, why did you shape the supervised loss as follows:
loss_pos = - alpha * torch.pow(1 - conf[pos_mask], gamma) * (conf[pos_mask]).log()
loss_neg = - alpha * torch.pow(conf[neg_mask], gamma) * (1 - conf[neg_mask]).log()
And how are alpha and gamma related to the previous supervised loss in the 'cross_entropy' option?
loss_pos = - torch.log(conf[pos_mask])
loss_neg = - torch.log(1 - conf[neg_mask])
And what are torch.pow(1 - conf[pos_mask], gamma) and torch.pow(conf[neg_mask], gamma) for?
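To sharpen the question, here is a small check I wrote (my own sketch, not the authors' code) showing how the two options relate:

```python
# With gamma = 0 and alpha = 1, the focal terms reduce exactly to the
# cross-entropy terms quoted above; for gamma > 0, the modulating factors
# (1 - conf)^gamma and conf^gamma down-weight easy, already-confident
# examples (as in "Focal Loss for Dense Object Detection", Lin et al. 2017).
import torch

conf = torch.rand(10).clamp(1e-6, 1 - 1e-6)
alpha, gamma = 1.0, 0.0
focal_pos = -alpha * torch.pow(1 - conf, gamma) * conf.log()
focal_neg = -alpha * torch.pow(conf, gamma) * (1 - conf).log()
assert torch.allclose(focal_pos, -conf.log())        # cross-entropy, positive
assert torch.allclose(focal_neg, -(1 - conf).log())  # cross-entropy, negative
```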
Hi, thanks for sharing this great work.
I wonder if there is an implementation mistake in the position encoding, in line 21 of position_encoding.py:
div_term = torch.exp(torch.arange(0, d_model//2, 2).float() * (-math.log(10000.0) / d_model//2))
Should we change (-math.log(10000.0) / d_model // 2) to (-math.log(10000.0) / (d_model // 2))? Otherwise this part ends up as -1.0, which means a very high temperature.
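For concreteness, a quick demonstration of the operator-precedence issue (my own snippet, not the repo's code):

```python
# `/` and `//` have equal precedence and associate left to right, so the
# expression is parsed as (-math.log(10000.0) / d_model) // 2, and flooring
# the small negative quotient yields -1.0.
import math

d_model = 256
print(-math.log(10000.0) / d_model // 2)    # -1.0 (floored)
print(-math.log(10000.0) / (d_model // 2))  # ~ -0.0720 (intended scaling)
```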
Thanks for the amazing work!
I have some confusion about reproducing the visual localization work with hloc.
As I understand it, LoFTR outputs local keypoints0, local keypoints1, and their matching confidences, even though the matching process is free of explicit local features. So the keypoint locations depend on the image pair: pairs (image a, image b) and (image a, image c) may produce totally different keypoint distributions on image a.
During the SfM procedure in hloc, the triangulation process requires long keypoint tracks to ensure the quality of the 3D points. So how do you handle the situation where different keyframe pairs produce different keypoints for the same keyframe?
Is there any strategy or principle for combining keypoints that are closely distributed?
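One hypothetical strategy (purely my own sketch; I have not seen it in this repo) is to quantize keypoint coordinates to a coarse grid so that near-identical detections from different pairs collapse into a single track:

```python
# Snap [N, 2] pixel coordinates to the centers of `cell`-pixel grid cells;
# keypoints from different pairs that fall in the same cell become identical
# and can then share one track during triangulation.
import numpy as np

def quantize_keypoints(kpts: np.ndarray, cell: float = 4.0) -> np.ndarray:
    return (np.floor(kpts / cell) + 0.5) * cell
```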
Hi! Thank you for releasing the training code and the awesome paper.
I want to know the RESNETFPN.BLOCK_DIMS when using ResNetFPN with a RESOLUTION of (16, 4).
Thank you very much.
Also, regarding these lines:
https://github.com/zju3dv/LoFTR/blob/master/train.py#L27
https://github.com/zju3dv/LoFTR/blob/master/train.py#L28
https://github.com/zju3dv/LoFTR/blob/master/train.py#L39
Are these coding mistakes?
Hello,
What image size did you use for training? It is unclear from your paper.
Thank you
Hello, in coarse_matching.py, how is MNN implemented? I find it a bit difficult to understand just by looking at the code. Where was this method proposed before? I couldn't find a related introduction. Or is it a new method of yours?
Thank you!
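For context, here is my understanding of mutual nearest neighbors as a minimal sketch (illustrative only, not the exact repo code):

```python
# Mutual-nearest-neighbor (MNN) filtering on a score matrix S of shape
# [L0, L1]: keep (i, j) only if column j is row i's best match AND row i is
# column j's best match.
import torch

def mutual_nearest(S: torch.Tensor) -> torch.Tensor:
    row_best = S == S.max(dim=1, keepdim=True).values  # best j for each i
    col_best = S == S.max(dim=0, keepdim=True).values  # best i for each j
    return row_best & col_best  # boolean mask of mutual matches
```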
Hi, authors
If I want to train the model on my own dataset, how can I get the ground-truth labels?
I saw you said 'We use dense depth maps and camera parameters to compute ground-truth correspondences. The depth maps are provided by depth sensor (ScanNet) and MVS system (MegaDepth).' What are the camera parameters?
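My current understanding of that sentence, as a minimal sketch (my own code, assuming pinhole intrinsics K0, K1 and a relative pose T_0to1; not the repo's supervision code):

```python
# Back-project a pixel from image 0 using its depth, transform it by the
# relative camera pose, and project it into image 1 to obtain the
# ground-truth correspondence.
import numpy as np

def warp_pixel(uv0, depth0, K0, K1, T_0to1):
    """uv0: (2,) pixel; depth0: scalar depth; K0, K1: (3, 3); T_0to1: (4, 4)."""
    p0 = depth0 * np.linalg.inv(K0) @ np.array([uv0[0], uv0[1], 1.0])  # 3D, cam0
    p1 = (T_0to1 @ np.append(p0, 1.0))[:3]                             # 3D, cam1
    uv1_h = K1 @ p1                                                    # project
    return uv1_h[:2] / uv1_h[2]
```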
I trained the model with 4 GPUs and got an AUC@20 of 51 at epoch 15. Is that normal?
Hi, thanks for your great work.
I tried to reproduce the training code and ran into some problems:
This is good work, but I'm very interested in how you make the ground-truth labels. I found your match-pair results similar to SP+SG; did you use SuperPoint labels?
Hello,
I'm wondering about something:
https://ai.googleblog.com/2020/07/on-device-supermarket-product.html
This article uses OCR for product recognition; I think your method could be more efficient, but I'm not sure. Your opinion is very important to me.
In your opinion, which would be more effective at recognizing grocery products: OCR or your method?
Best Regards
Onur Güzeldemirci
Hi,
Is there an (easy) way of supplying LoFTR with "query points" in one image in order to get the most probable matching locations in the other image?
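In case it clarifies what I mean, here is a hypothetical post-processing step (my own sketch; LoFTR itself chooses the match locations):

```python
# Snap each query point to the nearest LoFTR keypoint in image 0 and return
# the corresponding match in image 1, or None if nothing is close enough.
# `mkpts0`/`mkpts1` are the [N, 2] matched keypoint arrays LoFTR outputs.
import numpy as np

def lookup_queries(queries, mkpts0, mkpts1, max_dist=8.0):
    out = []
    for q in queries:
        d = np.linalg.norm(mkpts0 - np.asarray(q), axis=1)
        i = int(d.argmin())
        out.append(mkpts1[i] if d[i] <= max_dist else None)
    return out
```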
Hi,
Thank you for your solid work!
I noticed that you computed the overlap score of image pairs in the given pre-processing file. Can you provide more details on how to compute this score, or maybe release the pre-processing code for it?
Thanks.
Hi, there is an error when I try to train/test LoFTR with multiple GPUs on a single node:
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 208, in evaluation_epoch_end
model.test_epoch_end(outputs)
File "/mount/workspace/Code/ARCORT/src/lightning/lightning_loftr.py", line 234, in test_epoch_end
metrics = {k: flattenList(gather(flattenList([_me[k] for _me in _metrics]))) for k in _metrics[0]}
File "/mount/workspace/Code/ARCORT/src/lightning/lightning_loftr.py", line 234, in <dictcomp>
metrics = {k: flattenList(gather(flattenList([_me[k] for _me in _metrics]))) for k in _metrics[0]}
File "/mount/workspace/Code/ARCORT/src/utils/comm.py", line 196, in gather
group = _get_global_gloo_group()
File "/mount/workspace/Code/ARCORT/src/utils/comm.py", line 90, in _get_global_gloo_group
return dist.new_group(backend="gloo")
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2503, in new_group
pg = _new_process_group_helper(group_world_size,
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 588, in _new_process_group_helper
pg = ProcessGroupGloo(
RuntimeError: [enforce fail at /opt/conda/conda-bld/pytorch_1616554788289/work/third_party/gloo/gloo/transport/tcp/device.cc:208] ifa != nullptr. Unable to find interface for: [0.32.0.61]
It seems that dist.new_group() encounters a problem. I'm not familiar with distributed PyTorch; could you provide some suggestions?
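One workaround I have seen suggested for this class of gloo errors (an assumption on my part, not verified against this repo) is to pin gloo to a specific network interface before the process group is created, via the documented GLOO_SOCKET_IFNAME environment variable:

```python
# Must be set before torch.distributed creates any gloo process group;
# replace "eth0" with the interface name shown by `ifconfig`/`ip addr`.
import os
os.environ["GLOO_SOCKET_IFNAME"] = "eth0"
```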
I find there are 197 sequences in the MegaDepth dataset, from 0000 to 5018.
Could you provide the sequence numbers you used for training LoFTR? Are they the same as the ones used in SuperGlue?
Thank you so much~ 🙂
Hi, thank you so much for your great work! When I tried to train the model on my computer, I got some errors like this:
Traceback (most recent call last):
File "/home/xzliu/LoFTR-master/train.py", line 123, in
main()
File "/home/xzliu/LoFTR-master/train.py", line 119, in main
trainer.fit(model, datamodule=data_module)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
self.accelerator.start_training(self)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
return self._run_train()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1031, in _run_train
self._run_sanity_check(self.lightning_module)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1115, in _run_sanity_check
self._evaluation_loop.run()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 93, in advance
batch_idx, batch = next(dataloader_iter)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 219, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/home/xzliu/LoFTR-master/src/datasets/scannet.py", line 78, in getitem
image0 = read_scannet_gray(img_name0, resize=(640, 480), augment_fn=None)
File "/home/xzliu/LoFTR-master/src/utils/dataset.py", line 154, in read_scannet_gray
image = cv2.resize(image, (640,480))
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-zeowd5_m/opencv/modules/imgproc/src/resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
I searched for the errors on the Internet; the answers suggested that my dataset paths might be wrong. Can you provide the detailed directory structure for the training datasets? Or have you ever seen errors like this? Thank you, and looking forward to your reply.
I tested the code on some random image pairs, and the results are extremely bad. How do I improve the accuracy of the inference?
I noticed that the dimension of the input is "Nx1xHxW". I tried using a batch_size of 3 and got the wrong "m_bids". I am very puzzled. Can you give me an answer? Thank you!
When running conda env create -f environment.yaml, it fails with:
Solving environment: failed
InvalidVersionSpecError: Invalid version spec: =2.7
Python version: 3.8. How do I solve this problem?
Hi there,
Could you please provide the training pairs for ScanNet (maybe just a txt file)? It seems generating the pairs takes quite a long time, so it would be really helpful if you already have them. Thanks a lot!
Hi,
I have some questions about the coarse supervision in supervision.py.
1. About the mask:
if 'mask0' in data:
    grid_pt0_i = mask_pts_at_padded_regions(grid_pt0_i, data['mask0'])
    grid_pt1_i = mask_pts_at_padded_regions(grid_pt1_i, data['mask1'])
Where is the mask obtained from? The data consist of color, depth, pose, and camera intrinsics.
2. Checking for mutual nearest neighbors:
w_pt0_c_round = w_pt0_c[:, :, :].round().long()
nearest_index1 = w_pt0_c_round[..., 0] + w_pt0_c_round[..., 1] * w1
w_pt1_c_round = w_pt1_c[:, :, :].round().long()
nearest_index0 = w_pt1_c_round[..., 0] + w_pt1_c_round[..., 1] * w0
I don't quite understand this step; can you give me some interpretation? (See the worked example after this issue.)
3. Warping grids:
# create kpts in meshgrid and resize them to image resolution
grid_pt0_c = create_meshgrid(h0, w0, False, device).reshape(1, h0w0, 2).repeat(N, 1, 1)  # [N, hw, 2]
grid_pt0_i = scale0 * grid_pt0_c
grid_pt1_c = create_meshgrid(h1, w1, False, device).reshape(1, h1w1, 2).repeat(N, 1, 1)
grid_pt1_i = scale1 * grid_pt1_c
How are the kpts (keypoints) created?
Hoping for an answer, thank you.
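Worked example for step 2 (my own illustration of the quoted code, not an authoritative explanation): a 2D coarse-grid coordinate (x, y) in an image whose coarse grid is w cells wide maps to the row-major flat index x + y * w, so warped points can be compared by a single integer index.

```python
# Flatten a rounded, warped grid coordinate into a 1D index; two points are
# mutual nearest neighbors when their flat indices point back at each other.
w1 = 80                      # coarse-grid width of image 1 (assumed value)
x, y = 12, 3                 # a warped, rounded coordinate on that grid
nearest_index1 = x + y * w1  # -> 252, the flat index of cell (12, 3)
print(nearest_index1)
```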
Hi, thanks for your great work. Will you release the test code for the Aachen localization benchmark?
Hi, thanks for your awesome work. Have you tried adding auxiliary losses (like other transformer-based models, e.g., DETR) in the training stage? If yes, would that improve the matching performance?
Hi, thanks for sharing your work!
Did you train with different window sizes? I want to figure out how the window size affects the results.
Hi, thanks for your great work.
After reading your paper, I have a question about what the match threshold is set to during training.
When I tried setting the match threshold to 0.2 as in your paper, I found there were 0 matches from the coarse matching layer, so the fine matching layer and fine-level supervision lost efficacy. Should the threshold be set to a smaller value, such as 0.0, during training?
Hello,
Thank you for the great work.
I have tried to use your method for outdoor images.
However, after the bug fix, it seems that the weight "outdoor_ds.ckpt" is not working as well as it used to...
Therefore, could you please provide an update for "outdoor_ds.ckpt" as well?
It's very exciting work! Will the training code be released soon?
Hi there,
I wanted to use a different image resolution; mine is 2307x3466, which I know is quite large.
Do I have to re-train the network to use such a resolution? (The network breaks if anything other than 640x480 is used.)
Do I really need to resize the image? In my mind, higher resolution means better feature matching, but is that correct?
Thanks
Hi, I tried your code and model, and the result exceeded my expectations.
I still have two questions about LoFTR:
(1) Why not feed color images into the network, since they carry more information, especially in natural scenes?
(2) For some reason I do not have annotations of corresponding points. Is there any weakly-supervised solution that could be used to train LoFTR?
Thanks for your work again, and looking forward to your reply :)
Thank you for your wonderful work.
I have questions about your code variables.
In LoFTR/src/loftr/utils/coarse_matching.py, can you tell me what each "coarse_matches" key means?
(b_ids, i_ids, j_ids, gt_mask, m_bids, mkpts0_c, mkpts1_c, mconf)
Thank you!
Hi, thanks very much for your great work!
However, I am confused about how you formulate the position encoding here; it looks different from how you define it in your paper. Could you please elaborate on that?
I thought it would be more like the following, according to your paper:
`div_term = (1/10000.0) ** (torch.arange(0, d_model//2, 2).float()/ d_model )`
Thanks!
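For what it's worth, a quick numerical check I ran (my own sketch, assuming the intended denominator is d_model // 2): the exp/log form and the direct power form agree.

```python
# exp(i * (-ln(10000) / D)) == 10000 ** (-i / D) == (1/10000) ** (i / D),
# so the two formulations are identical up to floating-point error.
import math
import torch

d_model = 256
D = d_model // 2
i = torch.arange(0, D, 2).float()
a = torch.exp(i * (-math.log(10000.0) / D))
b = (1 / 10000.0) ** (i / D)
assert torch.allclose(a, b)
```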
Hi! Thank you for releasing the code and the awesome paper.
I am trying to re-implement the training process and would like to ask some questions:
How do you generate ground truth for training?
For example, if I use the MegaDepth dataset, did you use the keypoints as part of generating the ground-truth matches? Or did you consider the entire HxW of the coarse scale (but that would be a lot of data)?
From my understanding, the positional encoder of LoFTR has a fixed size of Channels x H x W, so at training time your images will be a certain size, but this may differ in the inference phase. How do you solve this problem?
For example, did you scale the positional encoder up? Or did you up/down-sample the number of elements along the HxW grid?
Can you train the transformer of the coarse scale separately from the fine scale, or do they need to be trained together?
I have noticed that you used masks in training. Could you explain how you generated the mask? Does the mask represent the edge structure of the GAT?
Are you still aiming to release the code on 06/10? It would be great to see the official implementation... :)
Dear @JiamingSuen
Thank you for sharing such great work!
I was testing a random pair of images, and this is the result I got:
I'm wondering whether LoFTR suffers from the same problem as SuperGlue, in that it can't detect matches when the image is rotated by more than 45 degrees?
I was using the indoor_ds weights.
Hello, first of all, really great work ~~~
I have just a small question about the training process:
Is there any chance to train the model with only a single 1080Ti GPU?
Thank you for your amazing work. I have a question: the expec_f_gt values after the spvs_fine() function are all very large numbers, and this causes the correct_mask in the fine loss computation to be all zeros.
On my version of pip, the environment creation is crashing with the following error:
AttributeError: 'FileNotFoundError' object has no attribute 'read'
According to this SO answer, the issue is caused by a pip interface change; one needs to remove 'file:' or provide an absolute path to requirements.txt in the following section of environment.yaml:
- pip:
  - -r requirements.txt
Is it possible to specify, in the code, the number of matches that will be further used for evaluation (for example, for findEssentialMat)?