
disaster-response-pipeline's Introduction

Disaster Response Messages

A set of messages related to disaster response, covering multiple languages, suitable for text categorization and related natural language processing tasks.

Contents

  1. Motivation
  2. Dataset
  3. Installation
  4. Files Description
  5. Results

Motivation

In this project, I apply data engineering skills to analyze disaster data from Figure Eight. The classifier is built with an Extract, Transform and Load (ETL) process, natural language processing (NLP), and a machine learning pipeline that classifies disaster messages. The project also includes a web app where an emergency worker can enter a new message and get classification results across several categories. This can help identify which messages actually need attention during a disaster.


Dataset

The dataset contains 30,000 messages drawn from events including an earthquake in Haiti in 2010, an earthquake in Chile in 2010, floods in Pakistan in 2010, and Superstorm Sandy in the U.S.A. in 2012, as well as news articles spanning many years and hundreds of different disasters.

The data has been encoded with 36 different categories related to disaster response and has been stripped of messages with sensitive information in their entirety.
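The source does not show how the 36 category labels are stored in data/disaster_categories.csv. As a minimal sketch, assuming each row holds a single semicolon-separated string of name-flag pairs (for example `related-1;request-0`), the labels could be split into one 0/1 value per category as follows. The function name `parse_categories` and the sample string are hypothetical, not taken from the project.

```python
from typing import Dict


def parse_categories(raw: str) -> Dict[str, int]:
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0}.

    Assumption: pairs are separated by ';' and each pair ends in '-<flag>'.
    """
    labels: Dict[str, int] = {}
    for pair in raw.split(";"):
        # rpartition splits on the LAST hyphen, so a name that itself
        # contains a hyphen is kept intact.
        name, _, flag = pair.strip().rpartition("-")
        labels[name] = int(flag)
    return labels


# Hypothetical sample row, for illustration only.
sample = "related-1;request-0;aid_related-1"
print(parse_categories(sample))
```

If the real file uses a different layout, only the two separators would need to change.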

The disaster response messages dataset has imbalanced category labels. Some labels, such as aid-related and weather-related, have many more examples than others. This imbalance may affect model training because the classes are not represented equally. It can be handled by resampling the dataset or by generating synthetic samples. I have not applied these methods yet, but I plan to do so in the future.
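The resampling idea mentioned above can be illustrated with random oversampling of a single rare label. This is only a sketch under stated assumptions, not something the project currently does: the row layout (a dict per message with 0/1 label fields), the label name `water`, and the helper `oversample` are all hypothetical.

```python
import random
from typing import Dict, List


def oversample(rows: List[Dict], label: str, seed: int = 0) -> List[Dict]:
    """Duplicate randomly chosen positive rows for `label` until the
    positives match the negatives in number."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    positives = [r for r in rows if r[label] == 1]
    negatives = [r for r in rows if r[label] == 0]
    if not positives or len(positives) >= len(negatives):
        return list(rows)  # nothing to balance
    extra = rng.choices(positives, k=len(negatives) - len(positives))
    return list(rows) + extra


# Toy data: 2 of 8 messages carry the rare label.
rows = [{"message": f"msg {i}", "water": 1 if i < 2 else 0} for i in range(8)]
balanced = oversample(rows, "water")
print(len(balanced), sum(r["water"] for r in balanced))  # 12 6
```

Because each message carries many labels at once, duplicating rows for one label also shifts the balance of every other label, so a per-label approach like this is a rough starting point. Resampling is normally applied to the training split only, so that the test set keeps the original distribution.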


Installation

  1. Run the following commands in the project's root directory to set up the database and model.
  • To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  • To run the ML pipeline that trains the classifier and saves it:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

  2. Run the following command in the app's directory to run your web app:
    python run.py

  3. Go to http://0.0.0.0:3001/
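After the ETL step it can be useful to confirm that the database file was actually populated before training. The sketch below assumes that data/DisasterResponse.db is a SQLite file, which the README does not state explicitly; the helper name `list_tables` is hypothetical. It makes no assumption about table names and simply reports whatever tables it finds.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List, Tuple


def list_tables(db_path: str) -> List[Tuple[str, int]]:
    """Return (table_name, row_count) for every table in a SQLite file."""
    if not Path(db_path).is_file():
        # sqlite3.connect would silently create an empty file otherwise.
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        return [
            (name, conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0])
            for name in names
        ]
```

Calling `list_tables("data/DisasterResponse.db")` from the project's root directory should then return one entry per table written by the ETL script, along with its row count.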


Files Description

This is the file-folder structure of the project.

.
├── app     
│   ├── run.py                           #Flask file that runs app
│   └── templates   
│       ├── go.html                      #Classification result page of web app
│       └── master.html                  #Main page of web app    
├── data                   
│   ├── disaster_categories.csv          #Dataset including all the categories  
│   ├── disaster_messages.csv            #Dataset including all the messages
│   └── process_data.py                  #Data cleaning
├── models
│   └── train_classifier.py              #Training ML model           
└── README.md

Results

The model reaches an accuracy of 95.34% and a precision of 76%. Here is a screenshot of the web app:

[Screenshot: Disasters (1)]

Example search message

[Screenshot: Disasters]
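The gap between the reported accuracy and precision is typical of imbalanced labels: when most messages do not belong to a category, true negatives dominate accuracy, while precision only looks at the messages the model flagged. The toy example below illustrates the effect with made-up labels. It is not the project's evaluation code, and the README does not say how the reported figures were averaged across the 36 categories.

```python
from typing import Sequence, Tuple


def accuracy_and_precision(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute accuracy and precision for one binary label."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return accuracy, precision


# Made-up data: 100 messages, only 5 truly belong to the category.
y_true = [1] * 5 + [0] * 95
# The model finds 3 of the 5 positives and raises 1 false alarm.
y_pred = [1, 1, 1, 0, 0] + [1] + [0] * 94

acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f}")  # accuracy=0.97 precision=0.75
```

Here 94 correctly ignored messages lift accuracy to 97% even though two of the five relevant messages were missed, which is why precision (and recall) are worth reading alongside accuracy for rare categories.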

Contributors

shweta-yadav15
