
geosam's Introduction

GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

(Note: this is the older approach we tried. The updated approach, which accepts text prompts in addition to sparse (click) prompts, can be found in the GeoSAM_with_text branch.)

This repository is dedicated to the work of GeoSAM. Please find the paper here: Link

Also, please find the link for the weights.

See the demo here.

This work has been submitted and is awaiting a decision.

Abstract:

The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 26%, 7%, and 17% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.
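The abstract mentions a sparse visual prompt derived from a pre-trained CNN segmentation model. As a rough illustration of that idea only, the sketch below turns a coarse binary mask into a handful of foreground point prompts, expressed as (x, y) pixel coordinates with a label of 1 for foreground, which is the general form of point prompt SAM accepts. The function name `sample_point_prompts`, the uniform random sampling strategy, and the toy mask are all assumptions made for this example; they are not taken from the GeoSAM code or paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinate


def sample_point_prompts(
    mask: Sequence[Sequence[int]],
    num_points: int,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Sample up to `num_points` distinct foreground pixels from a binary mask.

    Hypothetical helper: uniform sampling is an assumption, not the
    authors' documented strategy. Returns (points, labels), where every
    label is 1 (foreground click).
    """
    # Collect the coordinates of every non-zero (foreground) pixel.
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    k = min(num_points, len(foreground))
    points = rng.sample(foreground, k)  # sampling without replacement
    labels = [1] * k
    return points, labels


# Toy 4x6 stand-in for a CNN prediction: a thin horizontal strip,
# loosely resembling a sidewalk in an aerial tile.
coarse_mask = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
points, labels = sample_point_prompts(coarse_mask, num_points=3)
```

In a real pipeline the mask would come from the CNN's output for each image tile, and the resulting points and labels would be passed to SAM together with any dense prompt. Narrow structures such as sidewalks may need a sampling scheme that spreads points along the object rather than choosing them uniformly.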

GeoSAM

## Acknowledgement

We want to thank these two works for their open-source code and contributions to their respective fields!

Segment Anything Model (SAM)

Mapping the Walk: A Scalable Computer Vision Approach for Generating Sidewalk Network Datasets from Aerial Imagery

Citations

If this code is helpful for your research, please cite:

@article{sultan2023geosam,
  title={GeoSAM: Fine-tuning SAM with sparse and dense visual prompting for automated segmentation of mobility infrastructure},
  author={Sultan, Rafi Ibn and Li, Chengyin and Zhu, Hui and Khanduri, Prashant and Brocanelli, Marco and Zhu, Dongxiao},
  journal={arXiv preprint arXiv:2311.11319},
  year={2023}
}


geosam's Issues

Thank u!

Hello author, I read your article with great interest, as I am also a researcher in this field. I would like to know how the h5 file was generated. Could you please share some insights on this? Thank you for your assistance.

Dataset questions

Great work! I would like to know what data formats are stored in image_dir, mask_dir, and gt_dir in your dataset, and what each of them corresponds to.

About the dataset

Could you please provide the datasets used in the training and inference stages?

Please provide the data

Hello, author! Thank you for publishing this research. I am very interested in your findings and would like to reproduce your results. Could you provide the data for this research, including the image, gt, and mask files and the image_embeddings.h5 file?
Could you also tell me the structure and dimensions of the image_embeddings.h5 file?
Thanks for your help!
