
[ICCV 2023] PEANUT: Predicting and Navigating to Unseen Targets

Home Page: https://ajzhai.github.io/PEANUT/

License: MIT License

Topics: computer-vision, deep-learning, indoor-navigation, navigation, robotics, visual-navigation, object-goal-navigation, exploration


PEANUT: Predicting and Navigating to Unseen Targets

Albert J. Zhai, Shenlong Wang
University of Illinois at Urbana-Champaign

ICCV 2023

Paper | Project Page

Example Video

Requirements

As required by the Habitat Challenge, our code runs inside Docker. Install nvidia-docker by following the instructions here (only Linux is supported). There is no need to manually install any other dependencies, but you do need to download and place several files, as follows:

File Setup

  • Create a folder habitat-challenge-data/data/scene_datasets/hm3d
  • Download the HM3D train and val scenes and extract them into habitat-challenge-data/data/scene_datasets/hm3d/<split> so that you have habitat-challenge-data/data/scene_datasets/hm3d/val/00800-TEEsavR23oF etc.
  • Download the episode dataset and extract it into habitat-challenge-data so that you have habitat-challenge-data/objectgoal_hm3d/val etc.
  • Download the Mask R-CNN weights and place them at nav/agent/utils/mask_rcnn_R_101_cat9.pth
  • Download the prediction network weights and place them at nav/pred_model_wts.pth

The file structure should look like this:

PEANUT/
├── habitat-challenge-data/
│   ├── objectgoal_hm3d/
│   │   ├── train/
│   │   ├── val/
│   │   └── val_mini/
│   └── data/
│       └── scene_datasets/
│           └── hm3d/
│               ├── train/
│               └── val/
└── nav/
    ├── pred_model_wts.pth
    └── agent/
        └── utils/
            └── mask_rcnn_R_101_cat9.pth
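The setup steps above can be sketched as shell commands. Download URLs are omitted, and the archive names in the comments are hypothetical placeholders; use whatever filenames your downloads actually have.

```shell
# Create the expected directory layout (paths taken from the file structure above).
mkdir -p habitat-challenge-data/data/scene_datasets/hm3d
mkdir -p nav/agent/utils

# After downloading, extract/place the files, for example:
# (archive names below are illustrative, not the real download names)
# unzip hm3d-val.zip -d habitat-challenge-data/data/scene_datasets/hm3d/val
# tar -xzf objectgoal_hm3d.tar.gz -C habitat-challenge-data
# mv mask_rcnn_R_101_cat9.pth nav/agent/utils/
# mv pred_model_wts.pth nav/
```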

Usage

In general, you should modify nav_exp.sh to run the specific Python script and command-line arguments that you want. Then, simply run

sh build_and_run.sh

to build and run everything in Docker. Note: depending on how Docker is set up on your system, you may need sudo for this.
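For illustration, a minimal nav_exp.sh could be as simple as the fragment below, which just runs the default evaluation script; real experiments would append flags from nav/arguments.py (this sketch is an assumption about a typical setup, not the shipped script):

```shell
#!/bin/sh
# Hypothetical minimal nav_exp.sh: run the default ObjectNav evaluation.
# Add command-line arguments from nav/arguments.py as needed.
python nav/collect.py
```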

Evaluating the navigation agent

An example script for evaluating ObjectNav performance is provided in nav/collect.py. This script is a good entry point for understanding the code and it is what nav_exp.sh runs by default. See nav/arguments.py for available command-line arguments.

Collecting semantic maps

An example script for collecting semantic maps and saving them as .npz files is provided in nav/collect_maps.py. A link to download the original map dataset used in the paper is provided below.
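For illustration, saved .npz map files can be written and read back with NumPy as below. The array key "maps", the shape, and the dtype are hypothetical assumptions for this sketch, not taken from collect_maps.py.

```python
import numpy as np

# Write a dummy semantic-map sequence the way a collector script might.
# The key name "maps" and the (steps, channels, H, W) shape are
# illustrative assumptions, not the repo's actual format.
dummy = np.zeros((10, 16, 240, 240), dtype=np.uint8)
np.savez_compressed("episode_demo.npz", maps=dummy)

# Load it back for inspection or training.
with np.load("episode_demo.npz") as data:
    maps = data["maps"]
print(maps.shape)  # (10, 16, 240, 240)
```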

Training the prediction model

We use MMSegmentation to train and run PEANUT's prediction model. A custom clone of MMSegmentation is contained in prediction/, and a training script is provided in prediction/train_prediction_model.py. Please see the MMSegmentation docs in the prediction/ folder for more info about how to use MMSegmentation.
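A typical invocation might then look like the fragment below; the config-file argument is a placeholder assumption, so check the MMSegmentation docs in the prediction/ folder for the actual arguments.

```shell
# Hypothetical invocation -- replace <config-file> with a real
# MMSegmentation config; the actual arguments may differ.
python prediction/train_prediction_model.py <config-file>
```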

Semantic Map Dataset

The original map dataset used in the paper can be downloaded from this Google Drive link.

It contains sequences of semantic maps from 5000 episodes (4000 train, 1000 val) of Stubborn-based exploration in HM3D. This dataset can be directly used to train a target prediction model using prediction/train_prediction_model.py.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{zhai2023peanut,
  title={{PEANUT}: Predicting and Navigating to Unseen Targets},
  author={Zhai, Albert J and Wang, Shenlong},
  booktitle={ICCV},
  year={2023}
}

Acknowledgments

This project builds upon code from Stubborn, SemExp, and MMSegmentation. We thank the authors of these projects for their amazing work!


Issues

running time

When you tested this on HM3D, what was the approximate running time per frame? In my testing, each frame took up to 5 seconds. Is that normal?

Reproducing MP3D Experiments

Hi,
Thanks for this work.
Would it be possible for you to also open-source the model weights and code for reproducing the MP3D experiments (Table 2)?
Also, I have a question about the PONI results: in their official paper the Success rate was 31.8, so why is this number lower in the PEANUT paper?
(Sorry for opening two issues.)

distribute train

Hello! Sorry to bother you, but I would like to know how to use DDP (DistributedDataParallel) in train_prediction_model.py.
Where should I initialize the process group to avoid errors? Or do you have any guidance on using DDP with this code?

Map Prediction Data Collection

Hi,
I see that in the collect_maps.py file you use the pre-trained Mask R-CNN to collect semantic maps for map prediction.
May I know whether you tried using ground-truth segmentation (the habitat-sim semantic sensor) for the map data?
Thanks

Semantic Map Dataset for MP3D

Hi, thanks so much for your wonderful work.

Could you please share the semantic map dataset for MP3D? Only the HM3D one is shared in the repo. Thanks!

ObjectNav results on HM3D (test-standard)

Hello! Sorry to bother you again, but I would like to know whether PEANUT's reported performance on HM3D was measured on 500 episodes or on all 2000 episodes. With my reproduced checkpoint I obtain comparable performance on 500 episodes, but on all 2000 episodes the performance is lower.

which version of habitat?

Following the "File Setup" section, using the episode dataset raises an error about an unrecognized key "scene_dataset_config" in the [episode dataset]. Which version of Habitat does the code expect?

Evaluation on Map Prediction

Hi,
Thanks for this great work!
I see that the provided semantic-map dataset has train and val folders.
Have you tried using the val data to validate and test the map predictor? I tried integrating an evaluation hook during training, but the results are mostly all-zero IoU, and I'm not sure whether I'm doing something wrong.
Also, I ran all 2000 episodes in the HM3D val set, but only 30 of them produced non-zero map predictions. I'm using your official model; is this expected?

How to finetune maskrcnn on HM3D

Hello! Sorry to bother you. I was hoping you could guide me on fine-tuning a COCO-pretrained Mask R-CNN on HM3D images. I am interested in exploring other semantic segmentation methods and would greatly appreciate any recommendations for available datasets or sample training pipelines. Thank you very much for your time and consideration! 😸

About the switch between stubborn and peanut

Hi, I see there is an implementation for switching between a corner goal and the prediction goal, but the switch step seems to be set to 0.
Does PEANUT rely solely on the prediction model to decide the long-term goal at a fixed frequency?

Thanks for your interesting and inspiring work!
