zju3dv / LoFTR
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
Home Page: https://zju3dv.github.io/loftr/
License: Apache License 2.0
Hi, thanks for sharing your work!
I noticed you enable masking in the training phase; is it also possible during inference?
(Applying this change fails due to a tensor size mismatch between feat_c0 and mask_c0.)
I tried to match the image pairs in issue #7.
I don't agree with your answers there.
JiamingSuen: I would like to point out that LoFTR (and SuperGlue) learns the matching priors (i.e., the distribution of matches between the image pair) from the training data, which means that it can only handle rotations exhibited in the training set.
SuperGlue gives a so-so result, while LoFTR cannot find any matches.
Hi, thanks for the fabulous work!
I was looking for works leveraging dense feature matching operations and ended up in this repository.
I am new to the feature matching task, so please be patient with me :)
What is the purpose of bordering the matching mask (the mask_border() function)?
In the function, does the indexing [-b:0] actually apply the border to any portion of the tensors? If it is intended, I don't get its purpose.
Thanks!
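To make the question concrete, here is a quick check (my own sketch, not the repo's code) of the slicing behavior I am asking about:

```python
# With a positive step, `t[-b:0]` is an empty slice and selects nothing,
# while `t[-b:]` selects the last b elements -- which is presumably what
# a border mask intends.
import torch

t = torch.arange(10)
b = 2
print(t[-b:0])  # tensor([], dtype=torch.int64) -- empty slice
print(t[-b:])   # tensor([8, 9]) -- the last b elements
```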
Hi. Why do pos-conf and neg-conf both use 0.25 * (1 - conf)^2 * log(conf)?
neg-conf means a bad match, or one that doesn't need to be matched. Given that, as neg-conf goes down, the neg-loss should go down too.
But in the code, loss_neg = - alpha * torch.pow(1 - neg_conf, gamma) * neg_conf.log(), so when neg-conf decreases, the neg-loss increases instead?
Looking forward to your reply.
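For reference, here is a small numeric check I ran (my own sketch, using the quoted formula) showing how the term behaves as neg-conf varies:

```python
# As neg_conf rises toward 1 (a confident non-match), the loss term decays
# toward 0; as neg_conf falls, the loss grows. Values here are illustrative.
import torch

alpha, gamma = 0.25, 2.0
for neg_conf in (0.1, 0.5, 0.9, 0.99):
    c = torch.tensor(neg_conf)
    loss = -alpha * torch.pow(1 - c, gamma) * c.log()
    print(f"neg_conf={neg_conf}: loss={loss.item():.6f}")
```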
Hi,
First I'd like to sincerely thank the author for sharing this amazing work.
When I tried the provided Colab with the default setting (indoor), the resulting plot was quite poor.
A similar problem occurred when I cloned the repo and tested both indoor and outdoor pairs with
indoor_ds.ckpt and outdoor_ds.ckpt.
Can anyone advise on this?
Thank you very much!
Hi,
Can I use it as an alternative to Akaze?
Best Regards
Hi, for the indoor scenes, it seems that validation and testing both use the scannet_test_1500 dataset, but their results differ even when using the same model. What is the reason?
Hi, maybe an off-topic question that is not quite related to the code ;)
I really like the way you illustrate your LoFTR pipeline in your paper. Could you please share which tool you used to draw that pipeline?
Thanks!
Hi, first thanks for your great work.
In your demo, the matching lines have different colors. Do different colors represent different accuracy rates? For example, blue indicates that the point is accurately matched, and yellow indicates that the matching accuracy is relatively low?
Hi,
When using Colab to match two images with large parallax, the results are not good.
Is there any way to improve this?
Thank you for sharing this code.
Hi ! Thank you very much for releasing the implementation.
As I am learning about this technology, I would like to kindly ask about the training option self.loss_config['coarse_type'] == 'focal'.
What are alpha and gamma for?
Specifically, why did you shape the supervised loss as follows:
loss_pos = - alpha * torch.pow(1 - conf[pos_mask], gamma) * (conf[pos_mask]).log()
loss_neg = - alpha * torch.pow(conf[neg_mask], gamma) * (1 - conf[neg_mask]).log()
And how are alpha and gamma related to the previous supervised loss in the 'cross_entropy' option?
loss_pos = - torch.log(conf[pos_mask])
loss_neg = - torch.log(1 - conf[neg_mask])
And what are torch.pow(1 - conf[pos_mask], gamma) and torch.pow(conf[neg_mask], gamma) for?
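To sharpen the question, here is a small check I wrote (my own sketch, not the authors' code) showing how the two options relate:

```python
# With gamma = 0 and alpha = 1, the focal terms reduce exactly to the
# cross-entropy terms quoted above; for gamma > 0, the modulating factors
# (1 - conf)^gamma and conf^gamma down-weight easy, already-confident
# examples (as in "Focal Loss for Dense Object Detection", Lin et al. 2017).
import torch

conf = torch.rand(10).clamp(1e-6, 1 - 1e-6)
alpha, gamma = 1.0, 0.0
focal_pos = -alpha * torch.pow(1 - conf, gamma) * conf.log()
focal_neg = -alpha * torch.pow(conf, gamma) * (1 - conf).log()
assert torch.allclose(focal_pos, -conf.log())        # cross-entropy, positive
assert torch.allclose(focal_neg, -(1 - conf).log())  # cross-entropy, negative
```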
Hi, thanks for sharing this great work.
I wonder if there is an implementation mistake in the position encoding, in line 21 of position_encoding.py:
div_term = torch.exp(torch.arange(0, d_model//2, 2).float() * (-math.log(10000.0) / d_model//2))
Should we change (-math.log(10000.0) / d_model // 2) to (-math.log(10000.0) / (d_model // 2))? Otherwise this part ends up as -1.0, which means a very high temperature.
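For concreteness, a quick demonstration of the operator-precedence issue (my own snippet, not the repo's code):

```python
# `/` and `//` have equal precedence and associate left to right, so the
# expression is parsed as (-math.log(10000.0) / d_model) // 2, and flooring
# the small negative quotient yields -1.0.
import math

d_model = 256
print(-math.log(10000.0) / d_model // 2)    # -1.0 (floored)
print(-math.log(10000.0) / (d_model // 2))  # ~ -0.0720 (intended scaling)
```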
Thanks for the amazing work!
I have some confusion about reproducing the visual localization work with hloc.
As I understand it, LoFTR outputs local keypoints0, local keypoints1, and their matching confidences, even though the matching process is free of explicit local features. So the keypoint locations depend on the image pair: pairs (image a, image b) and (image a, image c) may produce totally different keypoint distributions on image a.
During the SfM procedure in hloc, the triangulation process requires long keypoint tracks to ensure the quality of the 3D points. So how do you handle the situation where different keyframe pairs produce different keypoints for the same keyframe?
Is there any strategy or principle for combining keypoints that are closely distributed?
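One hypothetical strategy (purely my own sketch; I have not seen it in this repo) is to quantize keypoint coordinates to a coarse grid so that near-identical detections from different pairs collapse into a single track:

```python
# Snap [N, 2] pixel coordinates to the centers of `cell`-pixel grid cells;
# keypoints from different pairs that fall in the same cell become identical
# and can then share one track during triangulation.
import numpy as np

def quantize_keypoints(kpts: np.ndarray, cell: float = 4.0) -> np.ndarray:
    return (np.floor(kpts / cell) + 0.5) * cell
```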
Hi! Thank you for releasing the training code and the awesome paper.
I want to know the RESNETFPN.BLOCK_DIMS when using ResNetFPN with a RESOLUTION of (16, 4).
Thank you very much.
Also, regarding these lines:
https://github.com/zju3dv/LoFTR/blob/master/train.py#L27
https://github.com/zju3dv/LoFTR/blob/master/train.py#L28
https://github.com/zju3dv/LoFTR/blob/master/train.py#L39
Are these coding mistakes?
Hello,
What image size did you use for training? It is unclear from your paper.
Thank you
Hello, in coarse_matching.py, how is MNN implemented? I find it a bit difficult to understand just by looking at the code. Where was this method proposed before? I couldn't find a related introduction. Or is it a new method of yours?
Thank you!
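For context, here is my understanding of mutual nearest neighbors as a minimal sketch (illustrative only, not the exact repo code):

```python
# Mutual-nearest-neighbor (MNN) filtering on a score matrix S of shape
# [L0, L1]: keep (i, j) only if column j is row i's best match AND row i is
# column j's best match.
import torch

def mutual_nearest(S: torch.Tensor) -> torch.Tensor:
    row_best = S == S.max(dim=1, keepdim=True).values  # best j for each i
    col_best = S == S.max(dim=0, keepdim=True).values  # best i for each j
    return row_best & col_best  # boolean mask of mutual matches
```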
Hi, authors
If I want to train the model on my own dataset, how can I get the ground-truth labels?
I saw you said 'We use dense depth maps and camera parameters to compute ground-truth correspondences. The depth maps are provided by depth sensor (ScanNet) and MVS system (MegaDepth).' What are the camera parameters?
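My current understanding of that sentence, as a minimal sketch (my own code, assuming pinhole intrinsics K0, K1 and a relative pose T_0to1; not the repo's supervision code):

```python
# Back-project a pixel from image 0 using its depth, transform it by the
# relative camera pose, and project it into image 1 to obtain the
# ground-truth correspondence.
import numpy as np

def warp_pixel(uv0, depth0, K0, K1, T_0to1):
    """uv0: (2,) pixel; depth0: scalar depth; K0, K1: (3, 3); T_0to1: (4, 4)."""
    p0 = depth0 * np.linalg.inv(K0) @ np.array([uv0[0], uv0[1], 1.0])  # 3D, cam0
    p1 = (T_0to1 @ np.append(p0, 1.0))[:3]                             # 3D, cam1
    uv1_h = K1 @ p1                                                    # project
    return uv1_h[:2] / uv1_h[2]
```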
I trained the model with 4 GPUs and got an AUC@20 of 51 at epoch 15. Is that normal?
Hi, thanks for your great work.
I tried to reproduce the training code and ran into some problems:
This is good work, but I'm very interested in how you make the ground-truth labels. I found your match-pair results similar to SP+SG; did you use SuperPoint labels?
Hello,
I'm wondering about something:
https://ai.googleblog.com/2020/07/on-device-supermarket-product.html
This article uses OCR for product recognition; I think your method could be more efficient, but I'm not sure. Your opinion is very important to me.
In your opinion, which would be more effective at recognizing grocery products: OCR or your method?
Best Regards
Onur Güzeldemirci
Hi,
Is there an (easy) way of supplying LoFTR with "query points" in one image in order to get the most probable matching locations in the other image?
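In case it clarifies what I mean, here is a hypothetical post-processing step (my own sketch; LoFTR itself chooses the match locations):

```python
# Snap each query point to the nearest LoFTR keypoint in image 0 and return
# the corresponding match in image 1, or None if nothing is close enough.
# `mkpts0`/`mkpts1` are the [N, 2] matched keypoint arrays LoFTR outputs.
import numpy as np

def lookup_queries(queries, mkpts0, mkpts1, max_dist=8.0):
    out = []
    for q in queries:
        d = np.linalg.norm(mkpts0 - np.asarray(q), axis=1)
        i = int(d.argmin())
        out.append(mkpts1[i] if d[i] <= max_dist else None)
    return out
```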
Hi,
Thank you for your solid work!
I noticed that you computed the overlap score of image pairs in the given pre-processing file. Can you provide more details on how to compute this score, or maybe release the pre-processing code for it?
Thanks.
Hi, there is an error when I try to train/test LoFTR with multiple GPUs on a single node:
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 208, in evaluation_epoch_end
model.test_epoch_end(outputs)
File "/mount/workspace/Code/ARCORT/src/lightning/lightning_loftr.py", line 234, in test_epoch_end
metrics = {k: flattenList(gather(flattenList([_me[k] for _me in _metrics]))) for k in _metrics[0]}
File "/mount/workspace/Code/ARCORT/src/lightning/lightning_loftr.py", line 234, in <dictcomp>
metrics = {k: flattenList(gather(flattenList([_me[k] for _me in _metrics]))) for k in _metrics[0]}
File "/mount/workspace/Code/ARCORT/src/utils/comm.py", line 196, in gather
group = _get_global_gloo_group()
File "/mount/workspace/Code/ARCORT/src/utils/comm.py", line 90, in _get_global_gloo_group
return dist.new_group(backend="gloo")
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2503, in new_group
pg = _new_process_group_helper(group_world_size,
File "/root/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 588, in _new_process_group_helper
pg = ProcessGroupGloo(
RuntimeError: [enforce fail at /opt/conda/conda-bld/pytorch_1616554788289/work/third_party/gloo/gloo/transport/tcp/device.cc:208] ifa != nullptr. Unable to find interface for: [0.32.0.61]
It seems that dist.new_group() encounters a problem. I'm not familiar with distributed PyTorch; could you provide some suggestions?
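One workaround I have seen suggested for this class of gloo errors (an assumption on my part, not verified against this repo) is to pin gloo to a specific network interface before the process group is created, via the documented GLOO_SOCKET_IFNAME environment variable:

```python
# Must be set before torch.distributed creates any gloo process group;
# replace "eth0" with the interface name shown by `ifconfig`/`ip addr`.
import os
os.environ["GLOO_SOCKET_IFNAME"] = "eth0"
```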
I find there are 197 sequences in the MegaDepth dataset, from 0000 to 5018.
Could you provide the sequence numbers you used for training LoFTR? Are they the same as the ones used in SuperGlue?
Thank you so much~ 🙂
Hi, thank you so much for your great work! When I tried to train the model on my computer, I got some errors like this:
Traceback (most recent call last):
File "/home/xzliu/LoFTR-master/train.py", line 123, in
main()
File "/home/xzliu/LoFTR-master/train.py", line 119, in main
trainer.fit(model, datamodule=data_module)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
self.accelerator.start_training(self)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 92, in start_training
self.training_type_plugin.start_training(trainer)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
self._results = trainer.run_stage()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 996, in run_stage
return self._run_train()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1031, in _run_train
self._run_sanity_check(self.lightning_module)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1115, in _run_sanity_check
self._evaluation_loop.run()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
dl_outputs = self.epoch_loop.run(
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 111, in run
self.advance(*args, **kwargs)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 93, in advance
batch_idx, batch = next(dataloader_iter)
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 557, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/xzliu/anaconda3/envs/loftr/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 219, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/home/xzliu/LoFTR-master/src/datasets/scannet.py", line 78, in getitem
image0 = read_scannet_gray(img_name0, resize=(640, 480), augment_fn=None)
File "/home/xzliu/LoFTR-master/src/utils/dataset.py", line 154, in read_scannet_gray
image = cv2.resize(image, (640,480))
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-zeowd5_m/opencv/modules/imgproc/src/resize.cpp:3929: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
I searched for the errors on the Internet; the answers suggested that my dataset paths might be wrong. Can you provide the detailed directory structure for the training datasets? Or have you ever seen errors like this? Thank you, and looking forward to your reply.
I tested the code on some random image pairs, and the results are extremely bad. How do I improve the accuracy of the inference?
I noticed that the dimension of the input is "Nx1xHxW". I tried using a batch_size of 3 and got the wrong "m_bids". I am very puzzled. Can you give me an answer? Thank you!
When running conda env create -f environment.yaml, it fails with:
Solving environment: failed
InvalidVersionSpecError: Invalid version spec: =2.7
Python version: 3.8. How do I solve this problem?
Hi there,
Could you please provide the training pairs for ScanNet (maybe just a txt file)? It seems generating the pairs takes quite a long time, so it would be really helpful if you already have them. Thanks a lot!
Hi,
I have some questions about the coarse supervision in supervision.py.
1. About the mask:
if 'mask0' in data:
    grid_pt0_i = mask_pts_at_padded_regions(grid_pt0_i, data['mask0'])
    grid_pt1_i = mask_pts_at_padded_regions(grid_pt1_i, data['mask1'])
Where is the mask obtained from? The data consist of color, depth, pose, and camera intrinsics.
2. Checking for mutual nearest neighbors:
w_pt0_c_round = w_pt0_c[:, :, :].round().long()
nearest_index1 = w_pt0_c_round[..., 0] + w_pt0_c_round[..., 1] * w1
w_pt1_c_round = w_pt1_c[:, :, :].round().long()
nearest_index0 = w_pt1_c_round[..., 0] + w_pt1_c_round[..., 1] * w0
I don't quite understand this step; can you give me some interpretation? (See the worked example after this issue.)
3. Warping grids:
# create kpts in meshgrid and resize them to image resolution
grid_pt0_c = create_meshgrid(h0, w0, False, device).reshape(1, h0w0, 2).repeat(N, 1, 1)  # [N, hw, 2]
grid_pt0_i = scale0 * grid_pt0_c
grid_pt1_c = create_meshgrid(h1, w1, False, device).reshape(1, h1w1, 2).repeat(N, 1, 1)
grid_pt1_i = scale1 * grid_pt1_c
How are the kpts (keypoints) created?
Hoping for an answer, thank you.
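Worked example for step 2 (my own illustration of the quoted code, not an authoritative explanation): a 2D coarse-grid coordinate (x, y) in an image whose coarse grid is w cells wide maps to the row-major flat index x + y * w, so warped points can be compared by a single integer index.

```python
# Flatten a rounded, warped grid coordinate into a 1D index; two points are
# mutual nearest neighbors when their flat indices point back at each other.
w1 = 80                      # coarse-grid width of image 1 (assumed value)
x, y = 12, 3                 # a warped, rounded coordinate on that grid
nearest_index1 = x + y * w1  # -> 252, the flat index of cell (12, 3)
print(nearest_index1)
```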
Hi, thanks for your great work. Will you release the test code for the Aachen localization benchmark?
Hi, thanks for your awesome work. Have you tried adding auxiliary losses (like other transformer-based models, e.g., DETR) in the training stage? If yes, would that improve the matching performance?
Hi, thanks for sharing your work!
Did you train with different window sizes? I want to figure out how the window size affects the results.
Hi, thanks for your great work.
After reading your paper, I have a question about what the match threshold is set to during training.
When I tried setting the match threshold to 0.2 as in your paper, I found there were 0 matches from the coarse matching layer, so the fine matching layer and fine-level supervision lost efficacy. Should the threshold be set to a smaller value, such as 0.0, during training?
Hello,
Thank you for the great work.
I have tried to use your method for outdoor images.
However, after the bug fix, it seems that the weight "outdoor_ds.ckpt" is not working as well as it used to...
Therefore, could you please provide an update for "outdoor_ds.ckpt" as well?
It's very exciting work! Will the training code be released soon?
Hi there,
I wanted to use a different image resolution; mine is 2307x3466, which I know is quite large.
Do I have to re-train the network to use such a resolution? (The network breaks if anything other than 640x480 is used.)
Do I really need to resize the image? In my mind, higher resolution means better feature matching, but is that correct?
Thanks
Hi, I tried your code and model, and the result exceeded my expectations.
I still have two questions about LoFTR:
(1) Why not feed color images into the network, since they carry more information, especially in natural scenes?
(2) For some reason I do not have annotations of corresponding points. Is there any weakly-supervised solution that could be used to train LoFTR?
Thanks for your work again, and looking forward to your reply :)
Thank you for your wonderful work.
I have questions about your code variables.
In LoFTR/src/loftr/utils/coarse_matching.py, can you tell me what each "coarse_matches" key means?
(b_ids, i_ids, j_ids, gt_mask, m_bids, mkpts0_c, mkpts1_c, mconf)
Thank you!
Hi, thanks very much for your great work!
However, I am confused about how you formulate the position encoding here; it looks different from how you define it in your paper. Could you please elaborate on that?
I thought it would be more like the following, according to your paper:
`div_term = (1/10000.0) ** (torch.arange(0, d_model//2, 2).float()/ d_model )`
Thanks!
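For what it's worth, a quick numerical check I ran (my own sketch, assuming the intended denominator is d_model // 2): the exp/log form and the direct power form agree.

```python
# exp(i * (-ln(10000) / D)) == 10000 ** (-i / D) == (1/10000) ** (i / D),
# so the two formulations are identical up to floating-point error.
import math
import torch

d_model = 256
D = d_model // 2
i = torch.arange(0, D, 2).float()
a = torch.exp(i * (-math.log(10000.0) / D))
b = (1 / 10000.0) ** (i / D)
assert torch.allclose(a, b)
```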
Hi! Thank you for releasing the code and the awesome paper.
I am trying to re-implement the training process and would like to ask some questions:
How do you generate ground truth for training?
For example, if I use the MegaDepth dataset, did you use the keypoints as part of generating the ground-truth matches? Or did you consider the entire HxW of the coarse scale (but that would be a lot of data)?
From my understanding, the positional encoder of LoFTR has a fixed size of Channels x H x W, so at training time your images will be a certain size, but this may differ in the inference phase. How do you solve this problem?
For example, did you scale the positional encoder up? Or did you up/down-sample the number of elements along the HxW grid?
Can you train the transformer of the coarse scale separately from the fine scale, or do they need to be trained together?
I have noticed that you used masks in training. Could you explain how you generated the mask? Does the mask represent the edge structure of the GAT?
Are you still aiming to release the code on 06/10? It would be great to see the official implementation... :)
Dear @JiamingSuen
Thank you for sharing such great work!
I was testing a random pair of images, and this is the result I got:
I'm wondering whether LoFTR suffers from the same problem as SuperGlue, in that it can't detect matches when the image is rotated by more than 45 degrees?
I was using the indoor_ds weights.
Hello, first of all, really great work ~~~
I have just a small question about the training process:
Is there any chance to train the model with only a single 1080Ti GPU?
Thank you for your amazing work. I have a question: the expec_f_gt values after the spvs_fine() function are all very large numbers, and this causes the correct_mask in the fine loss computation to be all zeros.
On my version of pip, the environment creation is crashing with the following error:
AttributeError: 'FileNotFoundError' object has no attribute 'read'
According to this SO answer, the issue is caused by a pip interface change; one needs to remove 'file:' or provide an absolute path to requirements.txt in the following section of environment.yaml:
- pip:
  - -r requirements.txt
Is it possible to specify, in the code, the number of matches that will be further used for evaluation (for example, for findEssentialMat)?