Disaster Response Pipeline Project

Instructions:

  1. Run the following commands in the project's root directory to set up your database and model.

    • To run the ETL pipeline that cleans the data and stores it in a database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline that trains the classifier and saves it:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
  2. Run the following command in the app's directory to run your web app:
      python run.py

  3. Go to http://127.0.0.1:3001/

Motivation

In this project, we analyze labeled message data provided by Figure Eight. Our goal is to use Python's NLTK library to classify each message into the correct categories, capturing features from the NLP preprocessing step to refine the classification.
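The NLP preprocessing typically looks something like the sketch below. This is a minimal illustration, assuming a helper named `tokenize`; it is not necessarily the exact function defined in models/train_classifier.py.

```python
# Minimal sketch of NLTK-based message preprocessing (assumed helper name).
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download(["punkt", "stopwords", "wordnet"], quiet=True)


def tokenize(text):
    """Normalize, tokenize, remove stopwords, and lemmatize a raw message."""
    text = re.sub(r"[^a-zA-Z0-9]", " ", text.lower())
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(token)
        for token in tokens
        if token not in stopwords.words("english")
    ]
```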

Introduction

  • First, we process the data, turning the raw message and label files into a single dataset for later use.
  • Second, we fit a machine learning pipeline using the AdaBoost and RandomForest algorithms, and we save the metrics from the fitting process for later comparison (see the pipeline sketch after this list).
  • Finally, we present the results in a web app built with Flask, where users can also classify messages online.
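A minimal sketch of such a pipeline, assuming scikit-learn's Pipeline with a bag-of-words + TF-IDF front end feeding a multi-output classifier; the exact estimators and parameters used in the project may differ.

```python
# Minimal sketch of a multi-output text classification pipeline (assumed setup).
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_pipeline(classifier):
    """Text features (bag of words + TF-IDF) feeding a multi-output classifier."""
    return Pipeline([
        ("vect", CountVectorizer(tokenizer=tokenize)),  # tokenize() from the sketch above
        ("tfidf", TfidfTransformer()),
        ("clf", MultiOutputClassifier(classifier)),
    ])


# Either algorithm mentioned above can be swapped in:
rf_pipeline = build_pipeline(RandomForestClassifier())
ada_pipeline = build_pipeline(AdaBoostClassifier())
```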

Dataset

There are two major datasets in our project, both located in the data folder: disaster_messages.csv contains the message IDs and raw text, and disaster_categories.csv contains the labels keyed by message ID.
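A minimal sketch of how the two files can be merged on id and written to SQLite, roughly along the lines of data/process_data.py; the table name "DisasterMessages" and the exact layout of the categories column are assumptions.

```python
# Minimal sketch: merge messages with their labels and store them in SQLite.
import pandas as pd
from sqlalchemy import create_engine

messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")

# Join the labels to the messages on the shared message id.
df = messages.merge(categories, on="id")

# Split the single "categories" column (assumed format "related-1;request-0;...")
# into one 0/1 column per label.
labels = df["categories"].str.split(";", expand=True)
labels.columns = labels.iloc[0].str.rsplit("-", n=1).str[0]
labels = labels.apply(lambda col: col.str.rsplit("-", n=1).str[1].astype(int))

df = pd.concat([df.drop(columns="categories"), labels], axis=1).drop_duplicates()

engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("DisasterMessages", engine, index=False, if_exists="replace")  # assumed table name
```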

Libraries Used

  • nltk
  • pandas
  • plotly
  • scikit-learn
  • sqlalchemy
  • os
  • flask

Files and Folders

Notebook
|-ETL pipeline preparation
|-ML pipeline preparation
|-Graph Notebook
App Folder
|-templates folder
| |-go.html: HTML page that displays online classification results
| |-master.html: HTML page for the main web page
|-run.py: Main script for the Flask web app
Data Folder
|-disaster_categories.csv: Classification labels for the messages
|-disaster_messages.csv: Dataset of raw messages
|-DisasterResponse.db: SQLite database file
|-process_data.py: Script that handles data processing
Models Folder
|-classifier.pkl: Pickle file of the trained model
|-Metrics.csv: Metric results from ML pipeline fitting
|-train_classifier.py: Script that runs the ML pipeline

Summary

  • We can now display visualizations of our dataset and the metrics of our ML pipeline on the main page, as shown in the image below.

  • We can also perform online classification by typing a text message into the box on the web page; the result is displayed as shown in the image below.

  • Our algorithms achieve accuracy above 90%, but accuracy is not a meaningful metric here: most entries in the label matrix are zero, so predicting all zeros already yields high accuracy. Instead, we evaluate the model with the F1 score, which is more informative for this application (a minimal evaluation sketch follows this list).

  • As mentioned above, this is a very imbalanced dataset: the labels contain many zeros, and some categories consist almost entirely of zeros. This makes classification hard (or even impossible) for those categories, and it also complicates evaluation, because computing the F1 score can run into division by zero. You can see the effect in the last plot on our main page (captured below): the blank points represent NaN values caused by division by zero, and the categories with low F1 scores correspond to sparse label vectors.
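One way to compute the per-category F1 scores mentioned above, with a guard for the divide-by-zero cases, is sketched below; the project's own evaluation may differ (for example, zero_division=0 maps undefined scores to 0 rather than NaN).

```python
# Minimal sketch of per-category F1 evaluation (assumed interface; the fitted
# pipeline, X_test, and the multi-label Y_test DataFrame come from training).
import pandas as pd
from sklearn.metrics import f1_score


def evaluate(model, X_test, Y_test):
    """Return one F1 score per category."""
    Y_pred = model.predict(X_test)
    scores = {
        category: f1_score(Y_test[category], Y_pred[:, i], zero_division=0)
        for i, category in enumerate(Y_test.columns)
    }
    return pd.Series(scores, name="f1_score").sort_values()
```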

Acknowledgement

Special thanks to Figure Eight for providing the dataset and to Udacity for the guidance and concepts needed to complete this project.
