GOMAA-Geo

PyTorch implementation of GOMAA-Geo: GOal Modality Agnostic Active Geo-localization

This repository is the official implementation of GOMAA-Geo, a goal modality agnostic active geo-localization agent that can geo-localize a goal location -- specified as an aerial patch, ground-level image, or textual description -- by navigating partially observed aerial imagery.

⏭️ Next

  • Update Gradio demo
  • Release Models to 🤗 HuggingFace
  • Release PyTorch ckpt files for all models

🎬 Installation

You can use the following commands to install the necessary dependencies to run the code:

conda create --name gomaa_geo
conda activate gomaa_geo
conda install python==3.11
pip install -r requirements.txt

⬇️ Getting the data

To run the code with the Masa data, download the zip file from the following link: https://www.kaggle.com/datasets/balraj98/massachusetts-buildings-dataset

To run the code with the xBD data, download the zip file from the following link: https://xview2.org/ (note that you must first log in with a valid email ID to download the dataset).

To run the code with our MM-GAG data, download the zip file from the following Hugging Face link: https://huggingface.co/datasets/MVRL/MMGAG

The uncompressed folder, named data, should be placed in the root directory of this repository. This folder includes processed data for the following active geo-localization problems:

  • Masa dataset in masa_data
  • xBD dataset in xBD_data

This folder also includes other intermediate results to recreate our analyses and figures.
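Before moving on, it can help to confirm that the uncompressed folder is laid out as described above. The following is a minimal sketch, assuming the subfolders masa_data and xBD_data sit directly under data; the helper name missing_data_dirs is hypothetical and not part of this repository:

```python
from pathlib import Path

# Subfolder names taken from the list above; adjust them if your copy differs.
EXPECTED_SUBDIRS = ("masa_data", "xBD_data")


def missing_data_dirs(repo_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders of <repo_root>/data that are absent."""
    data_dir = Path(repo_root) / "data"
    return [name for name in expected if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data folder layout looks complete.")
```

Run it from the root directory of the repository; an empty result means both dataset folders were found.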

📄 Specify all configurations

Before setting up data or running experiments, set all parameters of interest in the file config.py. These include the grid size, model configuration, and training configuration, among others.
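To give a rough idea of what such a configuration might look like, here is a minimal sketch of a nested configuration object. It is not the actual content of config.py: only the attribute cfg.train.llm_checkpoint is named in this README, and every other field name and every value below is a hypothetical placeholder.

```python
from types import SimpleNamespace

# Hypothetical layout: only cfg.train.llm_checkpoint appears in this README.
# All other field names and all values are illustrative placeholders.
cfg = SimpleNamespace(
    data=SimpleNamespace(grid_size=5),
    train=SimpleNamespace(
        llm_checkpoint="gomaa_geo/checkpoint/<your-checkpoint-file>",
        batch_size=32,
    ),
)

print(cfg.data.grid_size)
print(cfg.train.llm_checkpoint)
```

Check the real config.py for the actual parameter names before changing anything.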

📀 Process the Data

Extract all data of interest into the folder gomaa_geo/data.

Then, create patches (grids) for each image in a dataset using the following script:

python -m gomaa_geo.data_utils.get_patches
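To illustrate what splitting an image into a grid of patches involves, here is a minimal sketch that computes the pixel box of each patch. It is not the repository's get_patches script: the function name patch_boxes, the non-overlapping equal-size layout, and the example image size are all assumptions made for illustration.

```python
def patch_boxes(height, width, grid_size):
    """Return (top, left, bottom, right) pixel boxes for a grid_size x grid_size grid.

    Boxes are listed row by row. Any remainder pixels on the bottom and right
    edges are dropped so that all patches share the same size.
    """
    if grid_size <= 0:
        raise ValueError("grid_size must be positive")
    patch_h = height // grid_size
    patch_w = width // grid_size
    if patch_h == 0 or patch_w == 0:
        raise ValueError("image is smaller than the grid")
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            top, left = row * patch_h, col * patch_w
            boxes.append((top, left, top + patch_h, left + patch_w))
    return boxes


# Illustrative values only: a 1500 x 1500 image split into a 5 x 5 grid.
boxes = patch_boxes(height=1500, width=1500, grid_size=5)
print(len(boxes))   # 25
print(boxes[0])     # (0, 0, 300, 300)
print(boxes[-1])    # (1200, 1200, 1500, 1500)
```

Each box can then be used to crop the corresponding patch from the image with whichever image library you prefer.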

Then get CLIP-MMFE embeddings for each patch:

python -m gomaa_geo.data_utils.get_sat_embeddings_sat2cap

The same script also provides the function get_ground_embeds, which can be used to create embeddings for ground-level images.

To create text embeddings, run the following script:

python -m gomaa_geo.data_utils.get_text_embeddings

🔥 Running the code

To run our pre-training procedure with GOMAA-Geo, use the following command:

python -m gomaa_geo.pretrain

Again, all parameters of interest must be specified in the config.py file.

The weights of the trained LLM network at each iteration will be saved locally in gomaa_geo/checkpoint/.
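Since a checkpoint is written at each iteration, you may want to locate the newest one. The following is a minimal sketch that picks the most recently modified file in the checkpoint folder. The checkpoint file naming scheme is not documented here, so the sketch relies on modification time only; the helper name latest_checkpoint is hypothetical and not part of this repository.

```python
from pathlib import Path


def latest_checkpoint(checkpoint_dir="gomaa_geo/checkpoint"):
    """Return the most recently modified file in checkpoint_dir, or None."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    # Assumption: the newest file on disk is the latest checkpoint.
    return max(files, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path could then be assigned to cfg.train.llm_checkpoint in config.py.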

To run training of the pre-trained model, use the following command:

python -m gomaa_geo.train

To run inference, use the following command:

python -m gomaa_geo.validate

Set the path to the pre-trained LLM in the variable cfg.train.llm_checkpoint.

To visualize exploration behaviour of the trained model, run the following script:

python -m gomaa_geo.viz_path --idx=77 --start=0 --end=24

where idx is the image ID, start is the starting position, and end is the goal position.
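The example values 0 and 24 suggest that positions are flat indices into the patch grid. The following is a minimal sketch of how such indices could map to grid cells, assuming a 5 x 5 grid numbered row by row from the top-left corner and movement in four directions; these are assumptions, not something this README confirms, and the helper names are hypothetical.

```python
def index_to_cell(position, grid_size=5):
    """Convert a flat, row-major position into a (row, col) grid cell."""
    if not 0 <= position < grid_size * grid_size:
        raise ValueError(
            f"position {position} is outside a {grid_size}x{grid_size} grid"
        )
    return divmod(position, grid_size)


def grid_distance(start, end, grid_size=5):
    """Minimum number of up/down/left/right moves between two positions."""
    r0, c0 = index_to_cell(start, grid_size)
    r1, c1 = index_to_cell(end, grid_size)
    return abs(r1 - r0) + abs(c1 - c0)


print(index_to_cell(0))      # (0, 0)
print(index_to_cell(24))     # (4, 4)
print(grid_distance(0, 24))  # 8
```

Under these assumptions, the command above would start in the top-left cell and target the bottom-right cell, at least eight moves away.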

🐨 Model Zoo

Download GOMAA-Geo models from the links below: Coming Soon ...

📑 Citation

@article{sarkar2024gomaa,
  title={GOMAA-Geo: GOal Modality Agnostic Active Geo-localization},
  author={Sarkar, Anindya and Sastry, Srikumar and Pirinen, Aleksis and Zhang, Chongjie and Jacobs, Nathan and Vorobeychik, Yevgeniy},
  journal={arXiv preprint arXiv:2406.01917},
  year={2024}
}

🔍 Additional Links

Check out our lab website for other interesting works on geospatial understanding and mapping:

  • Multi-Modal Vision Research Lab (MVRL) - Link
  • Related Works from MVRL - Link
