remote-sense-image-captioning's Introduction

Image Captioning for Remote Sensing Data

Demo Video

Introduction

This project builds an application that generates captions for remote sensing data (satellite imagery) from the RSICD dataset.

From urban planning and environmental monitoring to disaster management and agricultural analysis, the applications of remote sensing data are diverse and far-reaching.

Dataset Description

The dataset consists of three primary files: train.csv, test.csv, and valid.csv. These files contain information about image filenames and their respective captions. Each file includes multiple captions for each image to support diverse training techniques.

  • train.csv: Contains image filenames (filename column) and their corresponding captions (captions column), used to train the image captioning model.
  • test.csv: Contains the test set, with the same structure as train.csv. It is used to evaluate trained models on unseen data.
  • valid.csv: Contains the validation set, with the same filename and captions columns. It is used to monitor performance during training and tune the model accordingly.
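Because each image has several captions, a common first step is to group the captions by filename. The sketch below is only an illustration: it assumes one caption per row with the filename repeated, which the description above does not confirm, and the sample rows and the helper name `load_captions` are hypothetical. If the real files pack all captions for an image into a single cell, the parsing would need to change.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows imitating train.csv; the real layout may differ.
SAMPLE_CSV = """filename,captions
airport_1.jpg,many planes are parked next to a long building
airport_1.jpg,several planes are near a terminal
beach_7.jpg,a beach with white sand and green water
"""


def load_captions(csv_file):
    """Group captions by image filename from a file-like CSV object."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Column names follow the dataset description: filename, captions.
        grouped[row["filename"]].append(row["captions"].strip())
    return dict(grouped)


captions_by_image = load_captions(io.StringIO(SAMPLE_CSV))
for name, caps in captions_by_image.items():
    print(f"{name}: {len(caps)} caption(s)")
```

To read an actual split, the same helper could be called with `open("train.csv", newline="", encoding="utf-8")` in place of the in-memory sample.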

Evaluation Metric

BLEU (Bilingual Evaluation Understudy) Score - BLEU provides a quantitative measure of the quality of generated captions, based on their n-gram overlap with the reference captions.
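To make the metric concrete, here is a minimal sentence-level BLEU sketch in plain Python. It is not necessarily the implementation used in this project; it assumes whitespace tokenization, n-grams up to length 4 with uniform weights, and no smoothing. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one candidate against one or more references."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate length.
    ref_len = min((len(r) for r in refs), key=lambda length: (abs(length - len(cand)), length))
    brevity = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


references = ["a plane is parked at the airport"]
print(sentence_bleu("a plane is parked at an airport", references))
```

Because this version is unsmoothed, a short caption that shares no 4-gram with any reference scores zero; library implementations commonly offer smoothing to soften that behavior.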

Model Used

GIT (GenerativeImage2Text), base-sized:

Model description

  • GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens. The model is trained using "teacher forcing" on many (image, text) pairs.
  • The goal for the model is simply to predict the next text token, given the image tokens and the previous text tokens.
  • The model has full access to the image patch tokens (i.e. a bidirectional attention mask is used for them), but only has access to the previous text tokens (i.e. a causal attention mask is used for the text tokens) when predicting the next text token.
  • We fine-tuned the GIT-Base model on the RSICD dataset.
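The attention pattern described above can be pictured as a boolean matrix. The following is a small illustrative sketch, not the model's actual code: the function name `build_git_style_mask` is hypothetical, and it assumes image tokens come first in the sequence and that image positions do not attend to text positions.

```python
def build_git_style_mask(num_image_tokens, num_text_tokens):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions 0..num_image_tokens-1 are image tokens; the rest are text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < num_image_tokens:
                # Bidirectional part: every position sees all image tokens.
                mask[i][j] = True
            elif i >= num_image_tokens:
                # Causal part: a text position sees itself and earlier text,
                # which is what it uses to predict the next text token.
                mask[i][j] = j <= i
    return mask


mask = build_git_style_mask(num_image_tokens=2, num_text_tokens=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position: the first two columns (image tokens) are always visible, while the text columns form a lower-triangular pattern.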

Application

The Streamlit application takes an image as input from the user, runs inference with the model, and displays the generated caption, as seen in the demonstration.
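The inference step behind such an app might look like the sketch below, which uses the Hugging Face Transformers classes for GIT. Several things here are assumptions: the project's fine-tuned checkpoint is not named in this document, so the public `microsoft/git-base` checkpoint stands in for it; the helper names `load_git` and `caption_image` and the `max_length` value are illustrative; and the Streamlit wiring itself is not shown.

```python
def load_git(checkpoint="microsoft/git-base"):
    """Load a GIT processor and model.

    The default checkpoint is the public base model, used as a stand-in
    for the fine-tuned weights, which are not specified here.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    return processor, model


def caption_image(image, processor, model, max_length=50):
    """Generate one caption for a single PIL image."""
    # Convert the image into the pixel tensor the model expects.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # Autoregressively generate caption token ids conditioned on the image.
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    # Decode token ids back into text and drop special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


# Example usage (downloads the checkpoint on first run; the path is hypothetical):
# from PIL import Image
# processor, model = load_git()
# print(caption_image(Image.open("scene.jpg").convert("RGB"), processor, model))
```

In a Streamlit front end, the uploaded file from `st.file_uploader` could be opened as a PIL image and passed to `caption_image`, with the model loaded once and reused across reruns.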

Contributors

raajanwankhade, deepakachu5114