This project is in collaboration with Figure Eight. The data contains pre-labelled tweets and messages from real-life disasters. The aim of the project is to build a Natural Language Processing tool that categorises messages.
The project contains the following parts:
- ETL Pipeline: process_data.py: reads in the data, cleans and stores it in a SQL database. The script merges the messages and categories datasets, splits the categories column into separate, clearly named columns, converts values to binary, and drops duplicates.
- Dataset: disaster_categories.csv and disaster_messages.csv
- DisasterResponse.db: database created from the transformed and cleaned data.
- ML Model: train_classifier.py: includes the code necessary to load the data, transform it using natural language processing, and train a RandomForest model tuned with GridSearchCV.
- Web App: run.py: Flask app and the user interface used to predict results and display them.
--> open folder in VS-Code
--> cd app
--> python run.py
--> Go to http://0.0.0.0:3001/
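As a rough illustration of the ETL steps listed above (merge, split the categories column, convert to binary, drop duplicates), here is a minimal pandas sketch. It is not the code in process_data.py. It assumes, without confirmation from this README, that both CSV files share an id column and that the categories column holds strings such as related-1;request-0. The function name clean_data and the sample rows are hypothetical.

```python
import pandas as pd


def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge the two datasets and expand `categories` into binary columns."""
    # Assumption: both frames share an `id` column to join on.
    df = messages.merge(categories, on="id")

    # Split "related-1;request-0" into one column per category.
    cats = df["categories"].str.split(";", expand=True)

    # Name each column after the text before the last "-" in the first row.
    cats.columns = [value.rsplit("-", 1)[0] for value in cats.iloc[0]]

    for col in cats.columns:
        # Keep the trailing number; clip anything above 1 so values are 0 or 1
        # (clipping is an assumption about how to make the labels binary).
        cats[col] = cats[col].str.rsplit("-", n=1).str[-1].astype(int).clip(upper=1)

    df = pd.concat([df.drop(columns="categories"), cats], axis=1)
    return df.drop_duplicates()


# Hypothetical sample rows, only to show the shape of the result.
messages = pd.DataFrame(
    {"id": [1, 2, 2], "message": ["we need water", "storm coming", "storm coming"]}
)
categories = pd.DataFrame(
    {"id": [1, 2], "categories": ["related-1;request-1", "related-2;request-0"]}
)
cleaned = clean_data(messages, categories)
print(cleaned)
```

The cleaned frame could then be written to DisasterResponse.db, for example with pandas' DataFrame.to_sql.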
Dependencies
- Python 3.5+ (I used Python 3.7)
- Machine Learning Libraries: NumPy, SciPy, Pandas, Scikit-Learn
- Natural Language Processing Libraries: NLTK
- SQLite Database Libraries: SQLAlchemy
- Web App and Data Visualization: Flask, Plotly
Installing
Clone this Git repository: git clone https://github.com/singh728om/Twitter-Disaster-response
Executing Program:
- Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database: python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it: python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app: python run.py
- Go to http://0.0.0.0:3001/
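To give a feel for what the ML pipeline step trains, the following is a minimal scikit-learn sketch that combines text vectorisation, a RandomForest classifier and GridSearchCV. It is an illustration under stated assumptions, not the contents of train_classifier.py: the regex tokenizer stands in for the NLTK-based processing, wrapping the forest in MultiOutputClassifier is an assumption based on each message having several categories, and the function names, parameter grid and sample data are hypothetical.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens.

    Simple stand-in for an NLTK tokenizer and lemmatizer.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def build_model(cv=3):
    """Return a grid search over a TF-IDF + multi-output RandomForest pipeline."""
    pipeline = Pipeline([
        # token_pattern=None because a custom tokenizer is supplied.
        ("tfidf", TfidfVectorizer(tokenizer=tokenize, token_pattern=None)),
        # One forest per category column (assumed multi-label setup).
        ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
    ])
    # Hypothetical, deliberately tiny grid to keep the example fast.
    param_grid = {"clf__estimator__n_estimators": [10, 20]}
    return GridSearchCV(pipeline, param_grid=param_grid, cv=cv)


# Hypothetical toy data: two category columns (e.g. water, storm).
X = [
    "we need water", "storm is coming", "please send water",
    "big storm tonight", "water is running out", "storm damaged roads",
]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])

model = build_model(cv=2)
model.fit(X, Y)
prediction = model.predict(["no drinking water here"])
print(prediction.shape)
```

Each row of the prediction holds one 0/1 value per category, which is the kind of output the web app could use to highlight matching categories.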
Below are a few screenshots of the web app.
After clicking Classify Message, the categories that the message belongs to are highlighted in green.