
overlaptransformer's Introduction

OverlapTransformer

The code for our RAL/IROS 2022 paper:

OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition. [paper]

OverlapTransformer (OT) is a novel lightweight neural network that exploits LiDAR range images to achieve fast execution, taking less than 4 ms per frame in Python and less than 2 ms per frame in C++ for LiDAR similarity estimation. It is a newer version of our previous OverlapNet, and it is faster and more accurate in LiDAR-based loop closure detection and place recognition.

Developed by Junyi Ma, Xieyuanli Chen and Jun Zhang.

OverlapTransformer is not a sophisticated model, but it holds natural mathematical properties (such as yaw-angle invariance) in a lightweight design for surround-view observations. It can be seamlessly integrated into any range-image-based approach as a backbone, e.g., EINet (IROS 2024). You are welcome to post results in the issues if you have tried other input types (e.g., RGB-D cameras, Livox, or 16/32-beam LiDARs).

News!

[2024-06] EINet successfully integrates OT into its framework as a powerful submodule and has been accepted to IROS 2024!
[2023-09] The multi-view extension of OT, CVTNet, has been accepted by IEEE Transactions on Industrial Informatics (TII)! Better long-term recognition performance is available ⭐
[2022-12] SeqOT has been accepted by IEEE Transactions on Industrial Electronics (TIE)!
[2022-09] We further developed a sequence-enhanced version of OT named SeqOT, which can be found here.

Haomo Dataset

Fig. 1 An online demo for finding the top-1 candidate with OverlapTransformer on sequence 1-1 (database) and 1-3 (query) of the Haomo Dataset.

Fig. 2 The Haomo Dataset, collected by HAOMO.AI.

More details of the Haomo Dataset can be found in the dataset description (link).

Table of Contents

  1. Introduction and Haomo Dataset
  2. Publication
  3. Dependencies
  4. How to Use
  5. Datasets Used by OT
  6. Related Work
  7. License

Publication

If you use the code or the Haomo dataset in your academic work, please cite our paper (PDF):

@ARTICLE{ma2022ral,
  author={Ma, Junyi and Zhang, Jun and Xu, Jintao and Ai, Rui and Gu, Weihao and Chen, Xieyuanli},
  journal={IEEE Robotics and Automation Letters}, 
  title={OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition}, 
  year={2022},
  volume={7},
  number={3},
  pages={6958-6965},
  doi={10.1109/LRA.2022.3178797}}

Dependencies

We use PyTorch (GPU version) for the neural networks.

An NVIDIA GPU is needed for faster retrieval. OverlapTransformer is also fast enough when running the neural network on a CPU.

To use a GPU, first install the NVIDIA driver and CUDA.

  • CUDA installation guide: link
    We use CUDA 11.3 in our work. Other CUDA versions are also supported, but you should choose the corresponding torch version in the Torch dependencies below.

  • System dependencies:

    sudo apt-get update 
    sudo apt-get install -y python3-pip python3-tk
    sudo -H pip3 install --upgrade pip
  • Torch dependencies:
    Following this link, you can install the Torch dependencies with pip:

    pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

    or by conda:

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  • Other Python dependencies (they may also work with versions other than those mentioned in the requirements file):

    sudo -H pip3 install -r requirements.txt
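
After installing the dependencies, you can run a quick sanity check (not part of the repository) to confirm that the installed PyTorch build can use your GPU:

    # Quick sanity check for the PyTorch + CUDA installation.
    import torch

    print(torch.__version__)          # e.g. 1.10.2+cu113
    print(torch.cuda.is_available())  # should print True if the GPU setup works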

How to Use

We provide training and testing tutorials for KITTI sequences in this repository. The tutorials for the Haomo dataset will be released together with the complete Haomo dataset.

We recommend you follow our code and data structures as follows.

Code Structure

├── config
│   ├── config_haomo.yml
│   └── config.yml
├── modules
│   ├── loss.py
│   ├── netvlad.py
│   ├── overlap_transformer_haomo.py
│   └── overlap_transformer.py
├── test
│   ├── test_haomo_topn_prepare.py
│   ├── test_haomo_topn.py
│   ├── test_kitti00_prepare.py
│   ├── test_kitti00_PR.py
│   ├── test_kitti00_topN.py
│   ├── test_results_haomo
│   │   └── predicted_des_L2_dis_bet_traj_forward.npz (to be generated)
│   └── test_results_kitti
│       └── predicted_des_L2_dis.npz (to be generated)
├── tools
│   ├── read_all_sets.py
│   ├── read_samples_haomo.py
│   ├── read_samples.py
│   └── utils
│       ├── gen_depth_data.py
│       ├── split_train_val.py
│       └── utils.py
├── train
│   ├── training_overlap_transformer_haomo.py
│   └── training_overlap_transformer_kitti.py
├── valid
│   └── valid_seq.py
├── visualize
│   ├── des_list.npy
│   └── viz_haomo.py
└── weights
    ├── pretrained_overlap_transformer_haomo.pth.tar
    └── pretrained_overlap_transformer.pth.tar

Dataset Structure

In the file config.yml, the data root parameters are described as follows:

  data_root_folder (KITTI sequences root) follows:
  ├── 00
  │   ├── depth_map
  │     ├── 000000.png
  │     ├── 000001.png
  │     ├── 000002.png
  │     ├── ...
  │   └── overlaps
  │     ├── train_set.npz
  ├── 01
  ├── 02
  ├── ...
  ├── 10
  └── loop_gt_seq00_0.3overlap_inactive.npz
  
  valid_scan_folder (KITTI sequence 02 velodyne) contains:
  ├── 000000.bin
  ├── 000001.bin
  ...

  gt_valid_folder (KITTI sequence 02 computed overlaps) contains:
  ├── 02
  │   ├── overlap_0.npy
  │   ├── overlap_10.npy
  ...

You need to download or generate the following files and put them in the correct locations in the structure above:

  • You can find the ground truth for KITTI 00 here: loop_gt_seq00_0.3overlap_inactive.npz
  • You can find gt_valid_folder for sequence 02 here.
  • Since the whole set of KITTI sequences requires a lot of memory, we recommend generating range images such as 00/depth_map/000000.png with the preprocessing from Overlap_Localization or its C++ version; we will not provide these images. Please note that OverlapTransformer uses .png images instead of the .npy files saved by Overlap_Localization.
  • More directly, you can generate .png range images with the script from OverlapNet updated by us (a minimal projection sketch follows this list).
  • The overlaps folder of each sequence under data_root_folder is provided by the authors of OverlapNet here. You should rename the files to train_set.npz.
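
The range images are produced by spherically projecting each velodyne scan. A minimal sketch of such a projection is given below, assuming a 64 x 900 image and the typical KITTI HDL-64E vertical field of view of +3 to -25 degrees; these values and the function name are assumptions, so check gen_depth_data.py for the exact parameters used in this repository.

    import numpy as np

    def range_projection(scan_path, fov_up=3.0, fov_down=-25.0, H=64, W=900, max_range=50.0):
        # Load a KITTI .bin scan: N x 4 columns of (x, y, z, intensity).
        scan = np.fromfile(scan_path, dtype=np.float32).reshape(-1, 4)
        xyz = scan[:, :3]
        depth = np.linalg.norm(xyz, axis=1)
        keep = (depth > 0) & (depth < max_range)
        xyz, depth = xyz[keep], depth[keep]

        fov_up_rad, fov_down_rad = np.radians(fov_up), np.radians(fov_down)
        fov = abs(fov_up_rad) + abs(fov_down_rad)

        yaw = -np.arctan2(xyz[:, 1], xyz[:, 0])   # azimuth angle
        pitch = np.arcsin(xyz[:, 2] / depth)      # elevation angle

        u = 0.5 * (yaw / np.pi + 1.0) * W                   # column index
        v = (1.0 - (pitch + abs(fov_down_rad)) / fov) * H   # row index
        u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
        v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

        # Write far points first so that closer returns overwrite them.
        order = np.argsort(depth)[::-1]
        proj = np.zeros((H, W), dtype=np.float32)
        proj[v[order], u[order]] = depth[order]
        return proj  # save this (suitably scaled) as a .png depth map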

Quick Use

For quick use, you can download our model pretrained on KITTI; the following two files should also be downloaded:

Then you should modify demo1_config in the file config.yml.

Run the demo by:

cd demo
python ./demo_compute_overlap_sim.py

You can see a query scan (000000.bin of KITTI 00) with a reprojected positive sample (000005.bin of KITTI 00) and a reprojected negative sample (000015.bin of KITTI 00), and the corresponding similarity.
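
The similarity is estimated from the scans' global descriptors. A minimal sketch of such a comparison by Euclidean distance is shown below; the variable names are hypothetical, and the demo's exact similarity definition may differ.

    import numpy as np

    # des_query, des_pos, des_neg: (256,) global descriptors produced by OT.
    d_pos = np.linalg.norm(des_query - des_pos)
    d_neg = np.linalg.norm(des_query - des_neg)
    print("positive pair distance:", d_pos)  # expected to be small
    print("negative pair distance:", d_neg)  # expected to be larger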

Fig. 3 Demo for calculating overlap and similarity with our approach.

Training

In the file config.yml, training_seqs specifies the KITTI sequences used for training.

You can start the training with

cd train
python ./training_overlap_transformer_kitti.py

You can resume training from our pretrained model here.

Testing

Once a model has been trained, the performance of the network can be evaluated. Before testing, the following parameters should be set in config.yml:

  • test_seqs: sequence number for evaluation, which is "00" in our work.
  • test_weights: path of the pretrained model.
  • gt_file: path of the ground truth file provided by the author of OverlapNet, which can be downloaded here.

Then you can run the testing scripts as follows:

cd test
mkdir test_results_kitti
python test_kitti00_prepare.py
python test_kitti00_PR.py
python test_kitti00_topN.py

After you run test_kitti00_prepare.py, a file named predicted_des_L2_dis.npz is generated in test_results_kitti. It is used by test_kitti00_PR.py to calculate the PR curve and F1max, and by test_kitti00_topN.py to calculate top-N recall.
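
The retrieval behind these scripts is a nearest-neighbor search over descriptors by L2 distance. A minimal sketch using faiss is shown below; the descriptor dimension of 256 and the array names are assumptions, and the actual scripts may organize the search differently.

    import faiss
    import numpy as np

    def top_n_candidates(database_des, query_des, n=25):
        # database_des: (N, 256) float32 descriptors of previously visited places.
        # query_des: (1, 256) float32 descriptor of the current query scan.
        index = faiss.IndexFlatL2(database_des.shape[1])   # exact L2 search
        index.add(database_des.astype(np.float32))
        distances, indices = index.search(query_des.astype(np.float32), n)
        return indices[0], distances[0]  # candidate frame ids and squared L2 distances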

For a quick test of the training and testing procedures, you could use our pretrained model.

Visualization

Visualize evaluation on KITTI 00

First, to visualize the evaluation on KITTI 00 with the search space, the following three files should be downloaded:

Then modify the paths in the file config.yml and run:

cd visualize
python viz_kitti.py

Fig. 4 Evaluation on KITTI 00 with search space from SuMa++ (a semantic LiDAR SLAM method).

Visualize evaluation on Haomo challenge 1 (after Haomo dataset is released)

We also provide a visualization demo for the Haomo dataset, to be used after the Haomo dataset is released (Fig. 1). Please first download the descriptors of the database (sequence 1-1 of the Haomo dataset) and then run:

cd visualize
python viz_haomo.py

C++ Implementation

We provide a C++ implementation of OverlapTransformer with libtorch for faster retrieval.

  • Please download the .pt file and put it in the OT_libtorch folder.
  • Before building, make sure that PCL exists in your environment.
  • Here we use LibTorch for CUDA 11.3 (Pre-cxx11 ABI). Please modify the path of Torch_DIR in CMakeLists.txt.
  • For more details of LibTorch installation, please check this website.
    Then you can generate a descriptor of 000000.bin of KITTI 00 by:
cd OT_libtorch/ws
mkdir build
cd build/
cmake ..
make -j6
./fast_ot 

You will find that our C++ OT can generate a descriptor in less than 2 ms per frame.
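
If you want to export the TorchScript .pt file from the Python model yourself instead of downloading it, a minimal tracing sketch is given below. The model class name, constructor arguments, checkpoint key, and input size of 1 x 1 x 64 x 900 are assumptions; check the repository code for the exact values.

    import torch
    from modules.overlap_transformer import featureExtracter  # assumed class name

    model = featureExtracter()                                 # assumed constructor
    checkpoint = torch.load("weights/pretrained_overlap_transformer.pth.tar",
                            map_location="cuda")
    model.load_state_dict(checkpoint["state_dict"])            # assumed checkpoint key
    model.cuda().eval()

    example = torch.zeros(1, 1, 64, 900).cuda()                # one range image (assumed size)
    traced = torch.jit.trace(model, example)
    traced.save("OT_libtorch/overlap_transformer.pt")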

Datasets Used by OT

In this section, we list the files of the different datasets used by OT for quicker reference.

KITTI Dataset

KITTI is used to validate the place recognition performance in our paper. Currently we have released all the necessary files for evaluation on KITTI.

Ford Campus Dataset

Ford is used to validate the generalization ability with zero-shot transfer in our paper. We have released all the necessary preprocessed files of Ford except the evaluation code, which is similar to that for KITTI; you just need to follow our existing scripts.

Haomo Dataset

You can find the detailed description of Haomo dataset here.

Related Work

You can find our more recent LiDAR place recognition approaches below, which achieve better performance over larger time gaps.

  • SeqOT: spatial-temporal network using sequential LiDAR data (IEEE TIE 2022)
@ARTICLE{ma2022tie,
  author={Ma, Junyi and Chen, Xieyuanli and Xu, Jingyi and Xiong, Guangming},
  journal={IEEE Transactions on Industrial Electronics}, 
  title={SeqOT: A Spatial-Temporal Transformer Network for Place Recognition Using Sequential LiDAR Data}, 
  year={2022},
  doi={10.1109/TIE.2022.3229385}}
  • CVTNet: cross-view Transformer network using RIVs and BEVs (IEEE TII 2023)
@ARTICLE{10273716,
  author={Ma, Junyi and Xiong, Guangming and Xu, Jingyi and Chen, Xieyuanli},
  journal={IEEE Transactions on Industrial Informatics}, 
  title={CVTNet: A Cross-View Transformer Network for LiDAR-Based Place Recognition in Autonomous Driving Environments}, 
  year={2023},
  doi={10.1109/TII.2023.3313635}}

License

Copyright 2022, Junyi Ma, Xieyuanli Chen, Jun Zhang, HAOMO.AI Technology Co., Ltd., China.

This project is free software made available under the GPL v3.0 License. For details see the LICENSE file.

overlaptransformer's People

Contributors

bit-mjy, chen-xieyuanli


overlaptransformer's Issues

Does it only work with LiDAR?

Hi, thanks for the great contribution to loop closure detection work.
I am wondering whether it applies to sensors like an RGB-D camera (which has a FOV < 180 degrees).
Sorry for publishing this; it is a duplicate of #31.

Cannot get higher accuracy than Lidar-Iris when running test demo

Hi, thanks a lot for your great effort.
I have followed the guidance and run the test demos, but the PR curve shows that the precision-recall break-even point is about 0.64. Below is what I did to generate the PR curve:

  1. Generate depth images with python tools/utils/gen_depth_data.py; download the model, calib, pose, and ground truth files.
  2. Run python test_kitti00_prepare.py and python test_kitti00_PR.py.

[PR curve of OverlapTransformer]

When I switch to Lidar-Iris, its PR curve shows a better PRBEP of about 0.89.

[PR curve of Lidar-Iris]

Is this comparison right?

CUDA out of memory

Thank you for your work!
Recently, when running the code on an RTX 3090, I get a CUDA out of memory error, while the paper uses an RTX 3070. Where could the problem be, or how can I reduce the GPU memory usage? Thanks!

format of depth image

Thank you very much for your work. Why are SeqOT's depth maps saved in numpy format, while OverlapTransformer's depth maps are saved in png format? Does this difference affect the training and testing of the network?
[Screenshot: png format]
[Screenshot: numpy format]

SeqOT

Can I directly use gen_ground_truth.py from SeqOT to generate the ground-truth overlap values for the NCLT 0108 dataset? I want to test OT on the NCLT dataset, but the dataset is too large, so converting it and generating the ground truth is too slow. Alternatively, could you provide the ground truth for the NCLT 0108 and 0205 sequences?

training tensors

Hi there,

I'm sorry to bother you again during the break.

I'm wondering why you set the input tensor requires_grad as true here

input_batch.requires_grad_(True)

As I understand it, input_batch contains the range-image tensors, which should not be modified in the training step (although this does not affect the optimization, since they are not added to the optimizer).

Appreciate your time and help.

Performance on 16-beam LiDAR

Hi,
I recently read your work on OverlapTransformer and this is a really interesting approach! I was wondering whether performance on 16-beam LiDARs was evaluated?

Recall@1% Ford Campus PR curve Questions.

Hello! I admire your work very much and appreciate your contribution to the field. After reading your paper and replicating your experiments, I have a few questions that I would be grateful if you could answer.

1. In Table 2 of your paper, there is a metric called Recall@1%. It is well known that Recall@1 is equivalent to the top-1 rate. What does Recall@1% mean in this context?

2. When replicating the experiments on the Ford Campus dataset, I extracted 3891 frames from the Ford dataset. However, the ground truth and pose.txt files provided on your website only have 3817 frames. Did you trim the dataset from the beginning (75-3891) or from the end (1-3817)? This information is crucial for the experiment, and I would appreciate it if you could clarify.

3. I have some questions regarding the PR curve. In your code, the faiss library is used for fast descriptor retrieval, and the resulting des_list is based on L2 distance. I'm not sure if my understanding is correct, but if it is, is it appropriate to directly compare this L2 distance to the threshold when calculating the PR curve?

Thank you very much for your help in addressing my concerns.

HAOMO Dataset links

Hi,

Thanks for your excellent work- I was interested in downloading the HAOMO dataset for my own work, but the links appear to be broken. Would you be able to fix these links or provide an alternative way to download?

Thank you,
Josh

16-beam LiDAR support

Hello, could you provide a configuration that supports 16-beam LiDARs? Currently, when testing 16-beam LiDAR data, an input-size error is reported.

In addition, when the input is 1800 x 32 LiDAR data, the descriptor size is [2, 256]. Does this mean that the descriptor size output by the network depends on the input size?

Thank you!

Will a coarse rotation guess be provided after place recognition?

Hi, Ma and Chen,

Really nice contribution! Your released code is easy to run and queries really fast!

Here is one request:

From my understanding, this model provides a similarity score, which is good enough to find previously visited places.
But I didn't find a coarse rotation guess, which is also of interest in applications, as Scan Context provides.

I would be really happy if you could give me more information about the possibility of getting a coarse yaw from your model. :D

gt_valid_folder and cov_files

Hello authors,

thank you for sharing your work!

Are the files in gt_valid_folder generated from the OverlapNet demo? And how do I get the cov_file from SuMa++?

I’m looking forward to hearing from you :)

loss function question

Hi, Junyi,

Thanks for sharing your excellent work.

I have a question about your loss function implementation. In the paper, the loss function is kp*(margin + max_p(d(Vq, Vp))) - ..., but in the implementation, the loss function is margin + kp*(max_n(d(Vq, Vp))) - ... . Since num_pos and num_neg are both 6 in the setup and the loss is clamped to be >= 0.0, it seems not to matter in the experiment. To confirm: will the change of loss function affect the training result?

Best,
Yanlong

Deploying C++ OT model on ROS

Hello, thank you for your great work!

Have you tried deploying the C++ OT model on ROS?
I am having a hard time running the OT model with ROS Kinetic in C++. Libtorch breaks ROS when I try to include it in my package's CMakeLists.txt, but this fast_ot.cpp works in my environment when compiled with CMake.

My Environment:
Ubuntu 20.04, CUDA 11.1, PyTorch 1.10.0, Libtorch 1.8.0/1.9.0/1.10.0 (I have tried all of them, but with no success on ROS)

About the Haomo dataset

Hello, thank you for your excellent open-source work. With the data files and code you provided, I have successfully reproduced your work on the KITTI dataset. I would also like to run it on the Haomo dataset and use that dataset in my own work. You have provided the scans and poses files of the Haomo dataset; how should I generate train_set and the ground truth? If the OT code includes the code for generating the train_set and ground truth files of KITTI sequence 00, I would like to use it to generate train_set, but there is no calib.txt file. For the Haomo dataset, is a calib.txt file needed to generate train_set, and will you release the train_set and ground truth files for the Haomo dataset?
Thanks again for your open-source work!

Loop Detection

Your work has taught me a lot. Can SeqOT also be used for loop closure detection in SLAM?

How to generate a three-channel depth map?

Hi, thank you very much for your work, and I admire it very much!
But I generated a single-channel depth map with gen_depth_data in your utils. How can I generate a three-channel depth map with different colors like the one in your paper? Attached is the depth map I generated.

[Attached depth map: 000000]

How to create train_set.npz for my own dataset

First, thank you for your excellent work. I have some questions about train_set.npz. I used a simple script to inspect the sequence 02 train_set.npz you provided:
[Screenshot of the provided train_set.npz]
The data I generated with OverlapNet's demo4_gen_gt_files.py looks like this:
[Screenshot of the generated data]
My understanding is that in the data generated by OverlapNet's demo4 script, the first column is frame zero, the second column is the other frame, the third column is the overlap, and the fourth column is the yaw angle. I want to modify the program to generate a corresponding (57413, 3) train_set.npz, but I don't fully understand the data you provided. My understanding is that the first and second columns are frame indices and the third column is the overlap between the two frames; is that correct?

Testing

Hello authors, thank you for sharing your work! When I run test_kitti00_prepare.py, it reports an error: "IndexError: index 4516 is out of bounds for axis 0 with size 4516". May I ask why this problem arises?

ford campus dataset

Hello authors, thank you very much for sharing this project.
Due to the differences in data format between the Ford Campus dataset and the KITTI dataset, I can only do the spherical projection of KITTI through demo1 of Dr. Chen's OverlapNet code. Therefore, I would like to know the main steps for processing the Ford Campus data into range images.

pretrained model to generate results in the paper

Hello, thanks for the great work!

I downloaded your pretrained model on KITTI, generated depth data using gen_depth_data.py, and ran

python test_kitti00_prepare.py
python test_kitti00_topN.py

but I get 0.883 for Top1 and 0.950 for Top1%, which are good but leave a small gap to the 0.906 and 0.964 reported in the paper.

Could you please give me some suggestions to reproduce your results? Thanks!

ground truth of the test dataset

Hi, the ground truth file for sequence 00 provided with your code seems to have a problem: it only contains data for the last few frames. Also, how did you obtain loop_gt_seq00_0.3overlap_inactive.npz? Looking forward to your reply, thanks!

Some questions about train_set

First of all, thank you very much for your excellent project.

1. For building the train_set, following your suggestion I used demo4_gen_gt_files from OverlapNet and computed the values for every frame.

[Screenshot of the computed data]
I computed KITTI sequence 04 (271 frames), but the dist_norm_data I obtain from com_overlap_yaw and normalize_data has shape (9100, 4), and split_train_val gives train_data (8190, 4) and validation_data (910, 4), which differ a lot from the train_data (2674,) of sequence 04 that you shared.
Could you tell me where my processing went wrong? Could you share your processing code?

2. Your other two projects (CVTNet and SeqOT) do not perform well on my dataset (repetitive scenes similar to an underground parking garage) when using your pretrained models (not fine-tuned on my data). Do you have any suggestions or ideas for improvement?

Looking forward to your reply!

training time

Hi Junyi,

A quick question: may I know how long training takes on the KITTI and HAOMO datasets on your device? Thanks.

Some problems I'm having when trying to test my own dataset

Thanks for your open-source work! @BIT-MJY @Chen-Xieyuanli I plan to record a dataset myself to test place recognition under long-term or partial changes, but while reading the code I don't quite understand how some files are generated. 1. The KITTI experiment trains on sequences 03-09 and tests on sequence 00, so what is sequence 02 in data_root used for? 2. I plan to collect data for experiments following the Haomo experiment, but in the haomo .yml config file, how are the three npy files triplets_for_training, pose_file_database and pose_file_query generated? Could you please let us know if it is convenient for you?

Files related to the Haomo dataset

@BIT-MJY Hello, it's me again. Thanks once more for your open-source work!
I am reproducing the results on the Haomo dataset. How can I obtain the files referenced in config_haomo.yml, such as more_chosen_normalized_data_1208_1_01.npy and gt_0.3overlap_1.2tau_bet_two_traj.npy? Could they be made public?

During testing, how is the generated query descriptor matched against previous descriptors?

Hello! Is my understanding of the test process correct? First the query scan is encoded into a descriptor, then this descriptor is matched against previous descriptors to find a suitable candidate scan, and finally the overlap between the query scan and the candidate scan is computed to decide whether there is a loop closure. If this is right, how are the descriptors matched? Is it by checking whether the 256-dimensional descriptors are identical? Can a loop closure be determined just from the descriptor matching score, without further computing the overlap?

Fine Tuning and no. of epochs

Dear authors, thanks for providing the code for your novel method!
I have a few questions:

  1. Can we fine-tune your KITTI model on other datasets by setting self.resume to True in this line?
  2. You have set max epochs to 100 in your code, but when I do fine-tuning (self.resume=True), training starts from the 20th epoch. So did you train your KITTI model for 19 epochs? I can't find that in the paper or the README.
    I am asking because the model takes around 30 minutes per epoch on KITTI (whether fine-tuning or training from scratch), so training for 100 epochs is quite long.
  3. You skip anchors for which neg_num (the number of negative samples) is 0. In my dataset, pos_num is 0 for some anchors. Is it fine to also skip when pos_num == 0?

    @BIT-MJY @Chen-Xieyuanli
