GOMAA-Geo

PyTorch implementation of GOMAA-Geo: GOal Modality Agnostic Active Geo-localization

Anindya Sarkar*, Srikumar Sastry*, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, Yevgeniy Vorobeychik (*Corresponding Author)

This repository is the official implementation of GOMAA-Geo, a goal modality agnostic active geo-localization agent that can geo-localize a goal location -- specified as an aerial patch, ground-level image, or textual description -- by navigating partially observed aerial imagery.

⏭️ Next

Update Gradio demo
Release Models to 🤗 HuggingFace
Release PyTorch ckpt files for all models

🎬 Installation

You can use the following commands to install the necessary dependencies to run the code:

conda create --name gomaa_geo
conda activate gomaa_geo
conda install python==3.11
pip install -r requirements.txt

⬇️ Getting the data

To run the code with the Masa data, download the zip file from the following link: https://www.kaggle.com/datasets/balraj98/massachusetts-buildings-dataset

To run the code with the xBD data, download the zip file from the following link: https://xview2.org/ (note that you must first log in with a valid email ID to download the dataset).

To run the code with our MM-GAG data, download the zip file from the following Hugging Face link: https://huggingface.co/datasets/MVRL/MMGAG

The uncompressed folder named data should be placed in the root directory of this repository. This folder includes processed data for the following active geo-localization problems:

Masa dataset in masa_data, xBD dataset in xBD_data

This folder also includes other intermediate results to recreate our analyses and figures.
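As a quick sanity check of the layout described above, the following minimal sketch reports which expected subfolders are missing. The helper name `missing_data_dirs` is hypothetical and not part of this repository; only the folder names `data`, `masa_data` and `xBD_data` come from the text above.

```python
from pathlib import Path

# Subfolder names taken from the text above; the helper itself is hypothetical.
EXPECTED_SUBDIRS = ("masa_data", "xBD_data")


def missing_data_dirs(repo_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders of <repo_root>/data that do not exist."""
    data_dir = Path(repo_root) / "data"
    return [name for name in expected if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_data_dirs(".")
    if missing:
        print("Missing data folders:", ", ".join(missing))
    else:
        print("Data folder layout looks complete.")
```

Running the file from the repository root prints either the missing folder names or a confirmation message.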

📄 Specify all configurations

Before setting up data or running experiments, set all parameters of interest in the file config.py. This includes the grid size, model configuration, training configuration, etc.
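To illustrate the kind of nested configuration this implies, here is a minimal sketch. It is not the actual contents of config.py: only `cfg.train.llm_checkpoint` is named in this README, and every other field name and value (`cfg.data.grid_size`, `batch_size`, the helper `num_positions`) is an assumption made for illustration.

```python
from types import SimpleNamespace

# Hypothetical layout: only cfg.train.llm_checkpoint is named in this README.
# Every other field name and value below is an illustrative assumption.
cfg = SimpleNamespace(
    data=SimpleNamespace(
        grid_size=5,  # assumed: each image is split into a 5 x 5 grid of patches
    ),
    train=SimpleNamespace(
        # Placeholder path; replace with your own checkpoint file.
        llm_checkpoint="gomaa_geo/checkpoint/<your_checkpoint_file>",
        batch_size=32,  # assumed value
    ),
)


def num_positions(config):
    """Number of grid cells available for a square grid of the configured size."""
    return config.data.grid_size ** 2
```

Grouping options into sections such as `data` and `train` keeps related parameters together and matches the dotted access style of `cfg.train.llm_checkpoint`.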

📀 Process the Data

Extract all data of interest into the folder gomaa_geo/data.

Then, create patches (grids) for each image in a dataset using the following script:

python -m gomaa_geo.data_utils.get_patches
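For intuition about what creating patches (grids) involves, the sketch below splits one image into a square grid of non-overlapping patches with NumPy. It is an illustration under stated assumptions, not the repository's script: the function name `split_into_patches`, the row-by-row patch ordering, and the cropping of leftover pixels are all choices made here.

```python
import numpy as np


def split_into_patches(image, grid_size):
    """Split an H x W x C image into grid_size x grid_size non-overlapping patches.

    Returns an array of shape (grid_size**2, H // grid_size, W // grid_size, C),
    ordered row by row. Pixels that do not fit evenly into the grid are cropped.
    """
    h, w, c = image.shape
    ph, pw = h // grid_size, w // grid_size  # patch height and width
    cropped = image[: ph * grid_size, : pw * grid_size]
    # Separate the row axis into (grid row, pixel row) and likewise for columns.
    patches = cropped.reshape(grid_size, ph, grid_size, pw, c)
    # Bring the two grid axes to the front: (grid row, grid col, ph, pw, c).
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(grid_size * grid_size, ph, pw, c)


if __name__ == "__main__":
    dummy = np.zeros((300, 300, 3), dtype=np.uint8)
    print(split_into_patches(dummy, grid_size=5).shape)  # (25, 60, 60, 3)
```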

Then get CLIP-MMFE embeddings for each patch:

python -m gomaa_geo.data_utils.get_sat_embeddings_sat2cap

Using the same script and the function get_ground_embeds, you can create embeddings for ground-level images.

To create text embeddings, run the following script:

python -m gomaa_geo.data_utils.get_text_embeddings

🔥 Running the code

To run our pre-training procedure with GOMAA-Geo, use the following command:

python -m gomaa_geo.pretrain

Again, all parameters of interest must be specified in the config.py file.

The weights of the trained LLM network at each iteration are saved locally in gomaa_geo/checkpoint/

To run training of the pre-trained model, use the following command:

python -m gomaa_geo.train

To run inference, run the following command:

python -m gomaa_geo.validate

Set the path to the pre-trained LLM in the variable cfg.train.llm_checkpoint.

To visualize exploration behaviour of the trained model, run the following script:

python -m gomaa_geo.viz_path --idx=77 --start=0 --end=24

where idx is the image ID, start is the starting position, and end is the goal position.
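The positions 0 and 24 in the command above are consistent with flat indices into a 5 x 5 grid. The sketch below converts such an index into a (row, column) cell under that assumption. Row-major numbering, the default grid size of 5, the restriction to up/down/left/right moves, and both helper names are assumptions made here; check config.py and the code for the actual convention.

```python
def position_to_cell(position, grid_size=5):
    """Convert a flat position index into (row, col), assuming row-major order."""
    if not 0 <= position < grid_size * grid_size:
        raise ValueError(f"position {position} is outside a {grid_size}x{grid_size} grid")
    return divmod(position, grid_size)


def manhattan_distance(start, end, grid_size=5):
    """Minimum number of up/down/left/right moves between two positions."""
    r0, c0 = position_to_cell(start, grid_size)
    r1, c1 = position_to_cell(end, grid_size)
    return abs(r0 - r1) + abs(c0 - c1)


if __name__ == "__main__":
    print(position_to_cell(0))        # (0, 0), top-left cell
    print(position_to_cell(24))       # (4, 4), bottom-right cell
    print(manhattan_distance(0, 24))  # 8
```

Under these assumptions, --start=0 --end=24 asks the agent to travel from one corner of the grid to the opposite corner.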

🐨 Model Zoo

Download GOMAA-Geo models from the links below: Coming Soon ...

📑 Citation

@article{sarkar2024gomaa,
  title={GOMAA-Geo: GOal Modality Agnostic Active Geo-localization},
  author={Sarkar, Anindya and Sastry, Srikumar and Pirinen, Aleksis and Zhang, Chongjie and Jacobs, Nathan and Vorobeychik, Yevgeniy},
  journal={arXiv preprint arXiv:2406.01917},
  year={2024}
}

🔍 Additional Links

Check out our lab website for other interesting works on geospatial understanding and mapping:

Multi-Modal Vision Research Lab (MVRL) - Link
Related Works from MVRL - Link
