Hello, I'm Gabriele Berton, a PhD student from the Vandal Lab at Polytechnic of Turin!
Official code for ICCV 2023 paper "EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition"
License: MIT License
Hello, I thoroughly appreciated your series of works, which shows a profound understanding of the VPR task. However, I've run into problems while trying to reproduce the training of ResNet-50 with dim=2048 on a single 3090 GPU. I would greatly appreciate your help in identifying the issue and gaining a better grasp of the paper. For reference, I've included the info.log file below. Your help would be immensely valuable:
Mine:
< test - #q: 1000; #db: 2805840 >: R@1: 73.2, R@5: 79.7, R@10: 82.0, R@20: 84.0
Yours:
R@1: 84.1
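For readers comparing these numbers: R@N is the percentage of queries for which at least one of the top-N retrieved database images lies within `positive_dist_threshold` (25 m by default) of the query. Below is a brute-force sketch of that metric, assuming descriptors and UTM coordinates are given as plain tuples and that "within" means at most 25 m. `recall_at_n` is a hypothetical helper, not the repo's evaluation code, which presumably uses a much faster nearest-neighbour index over the ~2.8M database images.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def recall_at_n(query_desc: List[Vec], db_desc: List[Vec],
                query_utm: List[Tuple[float, float]], db_utm: List[Tuple[float, float]],
                ns: Tuple[int, ...] = (1, 5, 10, 20),
                threshold_m: float = 25.0) -> Dict[int, float]:
    """Brute-force R@N (in %): a query counts as correct at N if any of its
    top-N database matches (by L2 descriptor distance) is <= threshold_m away."""
    hits = {n: 0 for n in ns}
    max_n = max(ns)
    for qd, qu in zip(query_desc, query_utm):
        # Rank the whole database by descriptor distance to this query
        order = sorted(range(len(db_desc)), key=lambda i: math.dist(qd, db_desc[i]))
        # Rank (0-based) of the first geographically correct prediction, if any
        first_ok = next((rank for rank, i in enumerate(order[:max_n])
                         if math.dist(qu, db_utm[i]) <= threshold_m), None)
        for n in ns:
            if first_ok is not None and first_ok < n:
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(query_desc) for n in ns}
```

Because a query that is correct at rank k is also correct for every N > k, the values are non-decreasing in N, which matches the pattern R@1 ≤ R@5 ≤ R@10 ≤ R@20 in the logs.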
Training:
2023-11-06 17:30:53 train.py --save_dir processed_Train
2023-11-06 17:30:53 Arguments: Namespace(M=15, N=3, focal_dist=10, s=100, m=0.4, lambda_lat=1.0, lambda_front=1.0, groups_num=9, min_images_per_class=5, backbone='ResNet50', fc_output_dim=2048, batch_size=32, epochs_num=40, iterations_per_epoch=5000, lr=1e-05, classifiers_lr=0.01, brightness=0.7, contrast=0.7, hue=0.5, saturation=0.7, random_resized_crop=0.5, infer_batch_size=16, positive_dist_threshold=25, resume_train=None, resume_model=None, device='cuda', seed=0, num_workers=8, visualize_classes=0, train_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/train', val_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/val', test_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/test', save_dir='processed_Train')
2023-11-06 17:30:53 The outputs are being saved in logs/processed_Train/2023-11-06_17-30-53
2023-11-06 17:30:53 Loading pretrained backbone's weights from CosPlace
2023-11-06 17:30:54 There are 1 GPUs and 64 CPUs.
2023-11-06 17:30:54 Using cached dataset cache/sfxl_M15_N3_mipc5.torch
2023-11-06 17:30:57 Using cached dataset cache/sfxl_M15_N3_mipc5.torch
2023-11-06 17:31:55 Using 18 groups
2023-11-06 17:31:55 The 18 groups have respectively the following number of classes {[len(g) for g in groups]}
2023-11-06 17:31:55 The 18 groups have respectively the following number of images {[g.get_images_num() for g in groups]}
2023-11-06 17:31:55 There are 5559 classes for the first group, each epoch has 5000 iterations with batch_size 32, therefore the model sees each class (on average) 28.8 times per epoch
2023-11-06 17:31:55 Validation set: < val - #q: 7983; #db: 8015 >
2023-11-06 17:31:55 Start training ...
2023-11-06 18:00:40 Epoch 00 in 0:28:45, < val - #q: 7983; #db: 8015 >: R@1: 90.4, R@5: 95.0, R@10: 96.1, R@20: 97.1
2023-11-06 18:29:30 Epoch 01 in 0:28:48, < val - #q: 7983; #db: 8015 >: R@1: 92.6, R@5: 96.1, R@10: 97.0, R@20: 97.8
2023-11-06 18:58:24 Epoch 02 in 0:28:52, < val - #q: 7983; #db: 8015 >: R@1: 92.6, R@5: 96.0, R@10: 97.0, R@20: 97.7
2023-11-06 19:27:21 Epoch 03 in 0:28:54, < val - #q: 7983; #db: 8015 >: R@1: 93.1, R@5: 96.4, R@10: 97.2, R@20: 98.0
2023-11-06 19:56:34 Epoch 04 in 0:29:11, < val - #q: 7983; #db: 8015 >: R@1: 92.8, R@5: 95.9, R@10: 96.8, R@20: 97.5
2023-11-06 20:26:30 Epoch 05 in 0:29:53, < val - #q: 7983; #db: 8015 >: R@1: 92.5, R@5: 95.7, R@10: 96.6, R@20: 97.4
2023-11-06 20:55:58 Epoch 06 in 0:29:25, < val - #q: 7983; #db: 8015 >: R@1: 92.4, R@5: 95.5, R@10: 96.6, R@20: 97.3
2023-11-06 21:26:05 Epoch 07 in 0:30:05, < val - #q: 7983; #db: 8015 >: R@1: 92.8, R@5: 96.2, R@10: 97.0, R@20: 97.8
2023-11-06 21:56:17 Epoch 08 in 0:30:10, < val - #q: 7983; #db: 8015 >: R@1: 93.2, R@5: 96.1, R@10: 97.0, R@20: 97.7
2023-11-06 22:26:18 Epoch 09 in 0:29:57, < val - #q: 7983; #db: 8015 >: R@1: 93.1, R@5: 96.0, R@10: 97.0, R@20: 97.6
2023-11-06 22:56:23 Epoch 10 in 0:30:01, < val - #q: 7983; #db: 8015 >: R@1: 93.9, R@5: 96.8, R@10: 97.6, R@20: 98.1
2023-11-06 23:26:21 Epoch 11 in 0:29:54, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.8, R@10: 97.6, R@20: 98.2
2023-11-06 23:56:17 Epoch 12 in 0:29:54, < val - #q: 7983; #db: 8015 >: R@1: 93.5, R@5: 96.6, R@10: 97.3, R@20: 97.8
2023-11-07 00:26:10 Epoch 13 in 0:29:49, < val - #q: 7983; #db: 8015 >: R@1: 94.1, R@5: 96.8, R@10: 97.6, R@20: 98.1
2023-11-07 00:56:25 Epoch 14 in 0:30:13, < val - #q: 7983; #db: 8015 >: R@1: 93.2, R@5: 96.2, R@10: 96.9, R@20: 97.6
2023-11-07 01:26:18 Epoch 15 in 0:29:49, < val - #q: 7983; #db: 8015 >: R@1: 92.9, R@5: 95.8, R@10: 96.5, R@20: 97.3
2023-11-07 01:56:30 Epoch 16 in 0:30:08, < val - #q: 7983; #db: 8015 >: R@1: 93.6, R@5: 96.4, R@10: 97.1, R@20: 97.8
2023-11-07 02:26:28 Epoch 17 in 0:29:54, < val - #q: 7983; #db: 8015 >: R@1: 93.9, R@5: 96.7, R@10: 97.5, R@20: 98.0
2023-11-07 02:56:37 Epoch 18 in 0:30:06, < val - #q: 7983; #db: 8015 >: R@1: 94.4, R@5: 96.9, R@10: 97.5, R@20: 98.0
2023-11-07 03:26:57 Epoch 19 in 0:30:16, < val - #q: 7983; #db: 8015 >: R@1: 93.7, R@5: 96.5, R@10: 97.2, R@20: 97.9
2023-11-07 03:57:18 Epoch 20 in 0:30:18, < val - #q: 7983; #db: 8015 >: R@1: 93.8, R@5: 96.5, R@10: 97.3, R@20: 97.9
2023-11-07 04:27:42 Epoch 21 in 0:30:20, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.8, R@10: 97.4, R@20: 97.9
2023-11-07 04:58:01 Epoch 22 in 0:30:15, < val - #q: 7983; #db: 8015 >: R@1: 94.2, R@5: 96.8, R@10: 97.3, R@20: 98.0
2023-11-07 05:28:23 Epoch 23 in 0:30:19, < val - #q: 7983; #db: 8015 >: R@1: 93.7, R@5: 96.6, R@10: 97.2, R@20: 97.8
2023-11-07 05:58:43 Epoch 24 in 0:30:17, < val - #q: 7983; #db: 8015 >: R@1: 93.7, R@5: 96.4, R@10: 97.2, R@20: 97.7
2023-11-07 06:29:06 Epoch 25 in 0:30:19, < val - #q: 7983; #db: 8015 >: R@1: 94.1, R@5: 96.8, R@10: 97.5, R@20: 98.0
2023-11-07 06:59:31 Epoch 26 in 0:30:22, < val - #q: 7983; #db: 8015 >: R@1: 93.8, R@5: 96.6, R@10: 97.2, R@20: 97.8
2023-11-07 07:29:53 Epoch 27 in 0:30:18, < val - #q: 7983; #db: 8015 >: R@1: 93.7, R@5: 96.3, R@10: 97.1, R@20: 97.6
2023-11-07 08:00:18 Epoch 28 in 0:30:22, < val - #q: 7983; #db: 8015 >: R@1: 94.3, R@5: 97.1, R@10: 97.6, R@20: 98.2
2023-11-07 08:30:42 Epoch 29 in 0:30:20, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.7, R@10: 97.3, R@20: 97.9
2023-11-07 09:01:24 Epoch 30 in 0:30:39, < val - #q: 7983; #db: 8015 >: R@1: 93.5, R@5: 96.4, R@10: 97.0, R@20: 97.6
2023-11-07 09:31:50 Epoch 31 in 0:30:20, < val - #q: 7983; #db: 8015 >: R@1: 93.9, R@5: 96.6, R@10: 97.3, R@20: 97.9
2023-11-07 10:02:30 Epoch 32 in 0:30:33, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.7, R@10: 97.4, R@20: 98.0
2023-11-07 10:32:55 Epoch 33 in 0:30:22, < val - #q: 7983; #db: 8015 >: R@1: 93.7, R@5: 96.4, R@10: 97.2, R@20: 97.8
2023-11-07 11:04:10 Epoch 34 in 0:31:12, < val - #q: 7983; #db: 8015 >: R@1: 93.8, R@5: 96.5, R@10: 97.2, R@20: 97.8
2023-11-07 11:35:41 Epoch 35 in 0:31:27, < val - #q: 7983; #db: 8015 >: R@1: 94.1, R@5: 96.7, R@10: 97.4, R@20: 97.8
2023-11-07 12:07:15 Epoch 36 in 0:31:31, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.6, R@10: 97.3, R@20: 97.8
2023-11-07 12:38:48 Epoch 37 in 0:31:30, < val - #q: 7983; #db: 8015 >: R@1: 94.0, R@5: 96.8, R@10: 97.5, R@20: 98.0
2023-11-07 13:10:13 Epoch 38 in 0:31:21, < val - #q: 7983; #db: 8015 >: R@1: 93.8, R@5: 96.6, R@10: 97.1, R@20: 97.8
2023-11-07 13:41:33 Epoch 39 in 0:31:17, < val - #q: 7983; #db: 8015 >: R@1: 93.9, R@5: 96.5, R@10: 97.3, R@20: 97.8
2023-11-07 13:41:36 Trained for 40 epochs, in total in 20:10:43
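Two quick sanity checks on the training log above, as a sketch: the "28.8 times per epoch" figure is simply iterations × batch size ÷ number of classes in the group (5000 × 32 / 5559), and the best validation R@1 can be recovered by scanning the epoch lines. `visits_per_class` and `best_epoch` are hypothetical helpers; the sketch assumes `best_model.pth` is selected by validation R@1, which the file name suggests but the log does not state.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches lines like "Epoch 18 in 0:30:06, < val ... >: R@1: 94.4, R@5: ..."
EPOCH_RE = re.compile(r"Epoch (\d+) in .*?R@1: ([\d.]+)")


def visits_per_class(iterations: int, batch_size: int, num_classes: int) -> float:
    """Average number of times each class is sampled in one epoch."""
    return iterations * batch_size / num_classes


def best_epoch(lines: Iterable[str]) -> Optional[Tuple[int, float]]:
    """Return (epoch, val R@1) of the first epoch reaching the highest R@1."""
    best = None
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            epoch, r1 = int(m.group(1)), float(m.group(2))
            if best is None or r1 > best[1]:  # strict ">" keeps the earliest maximum
                best = (epoch, r1)
    return best
```

Run over the full log above, `best_epoch` should return epoch 18 with R@1 94.4, i.e. the validation peak is only about 0.5 points above the final epochs.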
Eval:
2023-11-10 17:30:05 eval.py --backbone ResNet50 --fc_output_dim 2048 --save_dir processed_test_train --resume_model /work/chenshunpeng/project/VPR/EigenPlaces-main/EigenPlaces-main/logs/processed_Train/2023-11-06_17-30-53/best_model.pth
2023-11-10 17:30:05 Arguments: Namespace(M=15, N=3, focal_dist=10, s=100, m=0.4, lambda_lat=1.0, lambda_front=1.0, groups_num=9, min_images_per_class=5, backbone='ResNet50', fc_output_dim=2048, batch_size=32, epochs_num=40, iterations_per_epoch=5000, lr=1e-05, classifiers_lr=0.01, brightness=0.7, contrast=0.7, hue=0.5, saturation=0.7, random_resized_crop=0.5, infer_batch_size=16, positive_dist_threshold=25, resume_train=None, resume_model='/work/chenshunpeng/project/VPR/EigenPlaces-main/EigenPlaces-main/logs/processed_Train/2023-11-06_17-30-53/best_model.pth', device='cuda', seed=0, num_workers=8, visualize_classes=0, train_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/train', val_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/val', test_dataset_folder='/work/chenshunpeng/data/sf_xl/processed/test', save_dir='processed_test_train')
2023-11-10 17:30:05 The outputs are being saved in logs/processed_test_train/2023-11-10_17-30-04
2023-11-10 17:30:05 Loading pretrained backbone's weights from CosPlace
2023-11-10 17:30:08 There are 1 GPUs and 64 CPUs.
2023-11-10 17:30:08 Loading model_ from /work/chenshunpeng/project/VPR/EigenPlaces-main/EigenPlaces-main/logs/processed_Train/2023-11-06_17-30-53/best_model.pth
2023-11-10 22:43:39 < test - #q: 1000; #db: 2805840 >: R@1: 73.2, R@5: 79.7, R@10: 82.0, R@20: 84.0
Also, I get the same results as in the paper when using the released trained weights.
test/queries_v1:
< test - #q: 1000; #db: 2805840 >: R@1: 84.1, R@5: 89.1, R@10: 90.8, R@20: 92.6
Given the dataset's large size, I'd like to verify that nothing went wrong during the download. Below are the sizes, in MB, of the subfolders of the processed dataset, specifically the training and validation sets:
sf_xl/processed/train:

| Size (MB) | Folder |
|---|---|
| 10008 | 37.70 |
| 14002 | 37.71 |
| 13916 | 37.72 |
| 22513 | 37.73 |
| 20563 | 37.74 |
| 24293 | 37.75 |
| 32104 | 37.76 |
| 35248 | 37.77 |
| 35154 | 37.78 |
| 25809 | 37.79 |
| 13383 | 37.80 |
| 34 | 37.81 |
sf_xl/processed/val:

| Size (MB) | Folder |
|---|---|
| 357 | database |
| 355 | queries |
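To double-check a download against sizes like these, one option is to recompute per-folder totals the way `du -sm` reports them. A minimal sketch; `folder_size_mb` and `report` are hypothetical names. It sums apparent file sizes in MiB, whereas `du` counts allocated disk blocks, so small discrepancies are expected even for a complete copy.

```python
import os
from typing import Dict


def folder_size_mb(path: str) -> float:
    """Total apparent size (MiB) of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):  # skip symlinks, like du does by default
                total += os.path.getsize(fp)
    return total / (1024 * 1024)


def report(base: str) -> Dict[str, float]:
    """Size in MiB of each immediate subfolder of `base` (e.g. 37.70, 37.71, ...)."""
    sizes = {}
    for entry in sorted(os.listdir(base)):
        full = os.path.join(base, entry)
        if os.path.isdir(full):
            sizes[entry] = folder_size_mb(full)
    return sizes
```

For example, `report("sf_xl/processed/train")` returns one entry per latitude folder; summing the train table above gives 247,027 MB in total.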
Thank you for your help!
@gmberton
Hello @gmberton, thank you for the work. I am trying to evaluate EigenPlaces on a custom dataset captured from a car, and I was wondering how to tell whether my data can be used as a multi-view dataset. This is my capture schematic:
The paper says "This method assumes that the images available in the database are collected looking at all sides of the vehicle, and in particular towards the side of the street. However, this is not always the case and many VPR datasets: for example, the datasets built with autonomous driving applications in mind only contain images collected from a front facing camera"
Do you think a capture as shown above qualifies as multiview?
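One rough way to check the "towards the side of the street" condition on your own rig, as a minimal sketch: estimate the local street direction as the first principal axis of the UTM positions of nearby frames (EigenPlaces, as its name suggests, relies on an eigen-decomposition of image positions; this sketch only borrows that idea and is not the repo's code), then flag cameras pointing roughly perpendicular to it. It assumes positions are (east, north) in metres and camera directions are in degrees counter-clockwise from east; `street_direction_deg`, `is_lateral` and the 30° tolerance are hypothetical choices.

```python
import math
from typing import List, Tuple


def street_direction_deg(points: List[Tuple[float, float]]) -> float:
    """Orientation in [0, 180) of the first principal axis of 2-D UTM points,
    in degrees counter-clockwise from east (the axis has no direction)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form angle of the dominant eigenvector of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.degrees(theta) % 180.0


def is_lateral(camera_dir_deg: float, street_dir_deg: float, tol_deg: float = 30.0) -> bool:
    """True if the camera looks roughly perpendicular to the street axis."""
    diff = (camera_dir_deg - street_dir_deg) % 180.0
    diff = min(diff, 180.0 - diff)  # angle to the undirected street axis, in [0, 90]
    return diff >= 90.0 - tol_deg
```

With only a front-facing camera, every frame has a camera direction close to the direction of travel, so `is_lateral` would be false almost everywhere, which is the situation the quoted passage describes.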
Hi, I just downloaded SF-XL and used this repo to train on it. Since I'm not sure the download completed without errors, could you confirm whether this image count looks correct to you?
2023-10-09 00:46:28 Searching images in /data1/shaoshihao/sf_xl/train with glob()
2023-10-09 00:47:11 Found 3431092 images
Thanks in advance!
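To reproduce such a count independently of the training script, a minimal sketch with `glob` is below. It assumes the images are `.jpg` files at any depth under the dataset folder; `count_images` is a hypothetical helper, and the count it gives only confirms what is on disk, not what the expected total should be.

```python
import glob
import os


def count_images(folder: str, ext: str = "jpg") -> int:
    """Count files ending in .<ext> anywhere below `folder` (recursive glob)."""
    pattern = os.path.join(folder, "**", f"*.{ext}")
    return len(glob.glob(pattern, recursive=True))
```

For example, `count_images("/data1/shaoshihao/sf_xl/train")` should match the "Found ... images" line if the script searches the same folder with the same extension.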
Hello!
I'm very interested in EigenPlaces and want to reproduce the reported recalls.
However, I'm getting the errors attached below.
How can I resolve them?
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620865.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620866.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620867.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620868.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620869.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620870.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620871.png
request failed [400]: https://a.tile.openstreetmap.org/22/670283/1620872.png
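One plausible cause, not verified against the repo's map-drawing code: each failing URL has the form `/{zoom}/{x}/{y}.png` with zoom 22, while the standard OpenStreetMap tile layer only serves zoom levels up to 19. The sketch below applies the standard slippy-map tile formula with a clamped zoom; `MAX_OSM_ZOOM`, `latlon_to_tile` and `tile_url` are hypothetical names. Since these maps appear to be tied to `--visualize_classes` (default 0), keeping it at 0 may avoid the downloads altogether, but that is an assumption.

```python
import math
from typing import Tuple

MAX_OSM_ZOOM = 19  # assumed maximum zoom of the standard tile.openstreetmap.org layer


def latlon_to_tile(lat: float, lon: float, zoom: int) -> Tuple[int, int, int]:
    """Slippy-map tile (zoom, x, y) containing a WGS84 point, zoom clamped to 19."""
    zoom = min(zoom, MAX_OSM_ZOOM)
    n = 2 ** zoom  # number of tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return zoom, x, y


def tile_url(lat: float, lon: float, zoom: int) -> str:
    z, x, y = latlon_to_tile(lat, lon, zoom)
    return f"https://tile.openstreetmap.org/{z}/{x}/{y}.png"
```

Inverting the formula on the failing tiles (x = 670283, y around 1620865 at zoom 22) gives roughly latitude 37.80, longitude -122.47, i.e. San Francisco, so the coordinates themselves look fine and only the zoom level stands out.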
Hi, me again. Thank you for your earlier answer confirming that my image count is correct! I really like EigenPlaces, so I kept digging into the details of your work, and I ran into another point of confusion that I hope you can help clarify.
I noticed in the code that your model is trained on top of the CosPlace pretrained weights. Since I couldn't find this described in the paper (I might have missed it), I wonder whether this is intended, or whether we are supposed to fine-tune starting from ImageNet-pretrained weights instead?
Either option sounds reasonable to me (the paper's contribution stands whichever way it is). I just want to clarify this detail :) Thank you again for your help, and I hope to see your work again at CVPR 2024. I really appreciate easy-to-reproduce work like this, which helps the community a lot.
Hi, it's me again!
I just finished training ResNet-50 with dim=128 and, unfortunately, I cannot reproduce the result reported in the paper. Here are the numbers (SF-XL test v1):
Mine:
< test - #q: 1000; #db: 2588640 >: R@1: 65.5, R@5: 71.3, R@10: 74.0, R@20: 76.4
Yours:
R@1: 72.4
I wonder whether I got some hyperparameters wrong. Could you check whether all of my hyperparameters are correct?
import argparse


def parse_arguments():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # CosPlace Groups parameters
    parser.add_argument("--M", type=int, default=15, help="")
    parser.add_argument("--N", type=int, default=3, help="")
    parser.add_argument("--focal_dist", type=int, default=10, help="")  # done GS
    parser.add_argument("--s", type=float, default=100, help="")
    parser.add_argument("--m", type=float, default=0.4, help="")
    parser.add_argument("--lambda_lat", type=float, default=1., help="")
    parser.add_argument("--lambda_front", type=float, default=1., help="")
    parser.add_argument("--groups_num", type=int, default=0,
                        help="If set to 0 use N*N groups")
    parser.add_argument("--min_images_per_class", type=int, default=5, help="")
    # Model parameters
    parser.add_argument("--backbone", type=str, default="ResNet50",
                        choices=["VGG16", "ResNet18", "ResNet50", "ResNet101", "ResNet152"], help="")
    parser.add_argument("--fc_output_dim", type=int, default=128,
                        help="Output dimension of final fully connected layer")
    # Training parameters
    parser.add_argument("--batch_size", type=int, default=64, help="")
    parser.add_argument("--epochs_num", type=int, default=40, help="")
    parser.add_argument("--iterations_per_epoch", type=int, default=5000, help="")
    parser.add_argument("--lr", type=float, default=0.00001, help="")
    parser.add_argument("--classifiers_lr", type=float, default=0.01, help="")
    # Data augmentation
    parser.add_argument("--brightness", type=float, default=0.7, help="")
    parser.add_argument("--contrast", type=float, default=0.7, help="")
    parser.add_argument("--hue", type=float, default=0.5, help="")
    parser.add_argument("--saturation", type=float, default=0.7, help="")
    parser.add_argument("--random_resized_crop", type=float, default=0.5, help="")
    parser.add_argument("--num_preds_to_save", type=int, default=1, help="")
    # Validation / test parameters
    parser.add_argument("--infer_batch_size", type=int, default=16,
                        help="Batch size for inference (validating and testing)")
    parser.add_argument("--positive_dist_threshold", type=int, default=25,
                        help="distance in meters for a prediction to be considered a positive")
    # Resume parameters
    parser.add_argument("--resume_train", type=str, default=None,
                        help="path to checkpoint to resume, e.g. logs/.../last_checkpoint.pth")
    parser.add_argument("--resume_model", type=str, default=None,
                        help="path to model_ to resume, e.g. logs/.../best_model.pth")
    # Other parameters
    parser.add_argument("--device", type=str, default="cuda",
                        choices=["cuda", "cpu"], help="")
    parser.add_argument("--seed", type=int, default=0, help="")
    parser.add_argument("--num_workers", type=int, default=8, help="_")
    parser.add_argument("--visualize_classes", type=int, default=0,
                        help="Save map visualizations for X classes in the save_dir")
    # Paths parameters
    parser.add_argument("--train_dataset_folder", type=str, default=None,
                        help="path of the folder with training images")
    parser.add_argument("--val_dataset_folder", type=str, default=None,
                        help="path of the folder with val images (split in database/queries)")
    parser.add_argument("--test_dataset_folder", type=str, default=None,
                        help="path of the folder with test images (split in database/queries)")
    parser.add_argument("--save_dir", type=str, default="default",
                        help="name of directory on which to save the logs, under logs/save_dir")
    args = parser.parse_args()
    if args.groups_num == 0:
        args.groups_num = args.N * args.N
    return args
I really want to reproduce the result. Thanks for your help!
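To compare hyperparameters between runs, one option is to diff the `Namespace(...)` line that the scripts write to the log (as in the logs earlier in this page) against another run's arguments. A minimal standard-library sketch; `parse_namespace` and `diff_args` are hypothetical helpers, and the parser assumes every logged value is a Python literal (numbers, strings, `None`), which holds for the Namespace lines shown here.

```python
import ast
from typing import Any, Dict, Tuple


def parse_namespace(line: str) -> Dict[str, Any]:
    """Turn a logged 'Namespace(a=1, b='x', ...)' repr into a dict."""
    start = line.index("Namespace(")
    call = ast.parse(line[start:].strip(), mode="eval").body  # an ast.Call node
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


def diff_args(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Keys whose values differ between two argument dicts, as (a_value, b_value)."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For instance, feeding it the `Arguments:` line of one run and of another run immediately shows differences such as `batch_size` or `fc_output_dim`, instead of comparing long lines by eye.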
Hi @gmberton, thank you again.
EigenPlaces outperforms the state of the art on street-view VPR across different viewpoints. Since a UAV image and a satellite image can show the same location from different viewpoints, can EigenPlaces be adapted to drone-view to satellite-view VPR, e.g. given a UAV image, retrieve the matching satellite image?
Would retraining the model on a dataset of paired UAV and satellite images make this possible, or would other work be needed? Any suggestions are appreciated.