SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

📢 News and Updates

🔥🔥🔥 Last Updated on 2024.07.22🔥🔥🔥

  • 2024.07.22: The FIT-RSFG benchmark has been uploaded here and the evaluation scripts have been released here! See Evaluation for details on how to evaluate.
  • 2024.07.20: The FIT-RS dataset (training set, 1415k samples), categorized by task, has been uploaded here.
  • 2024.07.01: The FIT-RS dataset (training set, 1415k samples) has been uploaded here.
  • 2024.06.17: Our paper is available on arXiv!
  • 2024.06.07: First version.

📌 Introduction

[Paper][Dataset][Model][Code]

In this project, we propose the FIT-RS (Remote Sensing Fine-Grained Instruction Tuning) dataset, which contains 1,800,851 high-quality instruction samples covering various vision-language comprehension tasks. FIT-RS aims to enhance the fine-grained comprehension ability of Remote Sensing Large Multi-Modal Models (RSLMMs), specifically their ability to understand semantic relationships among objects in complex remote sensing scenes. Based on FIT-RS, we establish the FIT-RSFG (Remote Sensing Fine-Grained Comprehension) Benchmark to evaluate RSLMMs' ability in fine-grained understanding.

In addition, we constructed the FIT-RSRC (Remote Sensing Relation Comprehension) Benchmark, which adopts the commonly used single-choice format and the CircularEval strategy. It includes high-quality distractor options derived from commonsense word lists, as well as unanswerable questions, and aims to evaluate the remote sensing relation comprehension capabilities of LMMs.
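
As a rough illustration of the CircularEval idea mentioned above, the sketch below follows the strategy as it is commonly described for single-choice benchmarks: each question is asked once per circular shift of its options, and it counts as correct only if the model picks the right option in every pass. This is not the repository's evaluation script. The names `circular_variants`, `circular_eval`, and `predict`, and the sample options, are hypothetical and chosen only for this example.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

LETTERS = "ABCDEFGH"  # option labels; assumes at most 8 options per question


def circular_variants(
    options: Sequence[str], answer_idx: int
) -> Iterator[Tuple[List[str], int]]:
    """Yield each circular shift of the options with the new index of the answer."""
    n = len(options)
    for shift in range(n):
        rotated = [options[(i + shift) % n] for i in range(n)]
        # The correct option moves left by `shift` positions (with wrap-around).
        yield rotated, (answer_idx - shift) % n


def circular_eval(
    question: str,
    options: Sequence[str],
    answer_idx: int,
    predict: Callable[[str, List[str]], str],
) -> bool:
    """Return True only if `predict` is correct under every circular shift."""
    for rotated, idx in circular_variants(options, answer_idx):
        if predict(question, rotated) != LETTERS[idx]:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical question; `predict` stands in for a call to the model.
    opts = ["runs along", "parked at", "over", "Cannot be determined"]
    always_a = lambda q, o: "A"  # a position-biased "model"
    print(circular_eval("What is the relation?", opts, 2, always_a))
```

A predictor that always answers with the same letter can be right in at most one of the shifted passes, so it fails the question. Penalizing that kind of position bias is the reason for asking every shift.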

⭐️ Dataset and Download

  • FIT-RS

    FIT-RS is a large-scale fine-grained instruction tuning dataset containing 1,800,851 high-quality instruction samples, aimed at enhancing the fine-grained comprehension ability of RSLMMs.

  • FIT-RSRC

    Given the current lack of a publicly available benchmark for comprehensive and quantitative evaluation of existing LMMs in remote sensing relation understanding, we propose the FIT-RSRC (Remote Sensing Relation Comprehension) benchmark. It consists of single-choice questions of four different types, each with high-quality distractor options. Following mainstream general-purpose benchmarks, FIT-RSRC employs CircularEval as the evaluation strategy.

  • Download Links
    • FIT-RS: A fine-grained remote sensing instruction tuning dataset containing about 1800k instruction samples, of which 1415k are for training.
    • FIT-RSFG: A fine-grained benchmark for remote sensing vision-language evaluation.
    • FIT-RSRC: A single-choice benchmark for remote sensing relation comprehension evaluation.
    • SkySenseGPT: A remote sensing large multi-modal model, capable of handling complex comprehension tasks like image-level scene graph generation.

Evaluation

  1. Download the FIT-RSFG and FIT-RSRC benchmarks.
  2. Install the necessary packages listed in requirements.txt.
  3. See evaluation.sh for evaluation.

License

This project is released under the Apache 2.0 license.

Citation

Our FIT-RS dataset is built based on the STAR dataset. If you find this work helpful for your research, please consider giving this repo a star ⭐ and citing our paper:

@article{luo2024sky,
  title={SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding},
  author={Luo, Junwei and Pang, Zhen and Zhang, Yongjun and Wang, Tingzhu and Wang, Linlin and Dang, Bo and Lao, Jiangwei and Wang, Jian and Chen, Jingdong and Tan, Yihua and Li, Yansheng},
  journal={arXiv preprint arXiv:2406.10100},
  year={2024}
}

@article{li2024scene,
  title={STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery},
  author={Li, Yansheng and Wang, Linlin and Wang, Tingzhu and Yang, Xue and Luo, Junwei and Wang, Qi and Deng, Youming and Wang, Wenbin and Sun, Xian and Li, Haifeng and Dang, Bo and Zhang, Yongjun and Yu, Yi and Yan, Junchi},
  journal={arXiv preprint arXiv:2406.09410},
  year={2024}
}

We are thankful to LLaVA-1.5 and GeoChat for releasing their models and code as open-source contributions.

skysensegpt's Issues

About License

Hi,

Thank you for sharing the great work. I am wondering about the licensing.
On GitHub, the license is listed as Apache 2.0.
However, on Hugging Face it is listed as Creative Commons Attribution Non Commercial 4.0. [https://huggingface.co/datasets/ll-13/FIT-RS/tree/main/FIT-RSFG]

My question is: is commercial use allowed?

Regards
Mustansar

FIT-RSFG

May I ask when FIT-RSFG will be released? I only see FIT-RSRC

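When will the complete training code and model be open-sourced?

Hello, I have seen your team's results on large models for remote sensing, and they are very impressive. I would like to ask when the fine-tuning code and model will be open-sourced, as I would like to test and validate them once they are available. Thank you.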
