nlp-with-disaster-tweets by mieczmik

Kaggle: NLP with Disaster Tweets

This is a tutorial in an IPython Notebook for the Kaggle competition Natural Language Processing with Disaster Tweets. The goal of this repository is to provide an example competition analysis for those interested in getting into data analytics or in using Python for Kaggle's data science competitions.

Quick Start: View a static version of the notebook in the comfort of your own web browser.

Installation:

To run this notebook interactively:

1. Download this repository as a zip file, or clone it from the terminal: git clone https://github.com/Mieczmik/nlp-with-disaster-tweets.git
2. Install virtualenv.
3. Navigate to the directory where you unzipped or cloned the repository and create a virtual environment with virtualenv env.
4. Activate the environment with source env/bin/activate.
5. Install the required dependencies with pip install -r requirements.txt.
6. Run jupyter notebook (ipython notebook on older IPython versions) from the command line or terminal.
7. Click on nlp-with-disaster-tweets.ipynb in the notebook dashboard and enjoy!
8. When you're done, deactivate the virtual environment with deactivate.

Kaggle Competition | Natural Language Processing with Disaster Tweets

"Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real-time. Because of this, more agencies are interested in programatically monitoring Twitter (i.e. disaster relief organizations and news agencies). In this competition, you’re challenged to build a machine learning model that predicts which Tweets are about real disasters and which one’s aren’t. You’ll have access to a dataset of 10,000 tweets that were hand classified." From the competition homepage.

Goal for this Notebook:

Show a simple example of an analysis of disaster tweets in Python using a full complement of PyData utilities. It is aimed at those looking to get into the field, and at those already in it who want to see an example of an analysis done with Python.

This Notebook will show basic examples of:

Data Preprocessing

Importing Data with Pandas
Data Review
Work with text data:
- Tokenize tweets with TweetTokenizer
- Remove numbers, punctuation and hashtags
- Stem words with PorterStemmer
- Build pipelines with TfidfVectorizer or CountVectorizer
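The text-cleaning steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code.

- The helper name preprocess_tweet is hypothetical.
- Lowercasing, stripping @handles, dropping whole hashtag tokens (rather than only the # sign) and trimming punctuation from token edges are all assumptions.

TweetTokenizer and PorterStemmer are NLTK classes that work without any extra data downloads.

```python
import re
import string

from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# Assumed settings: lowercase everything, drop @handles, shorten "soooo" to "sooo".
tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
stemmer = PorterStemmer()


def preprocess_tweet(text: str) -> list[str]:
    """Hypothetical helper: tokenize a tweet, drop hashtags/numbers/punctuation, stem the rest."""
    cleaned = []
    for token in tokenizer.tokenize(text):
        if token.startswith("#"):  # assumption: drop the whole hashtag token
            continue
        token = re.sub(r"\d+", "", token)  # remove digits anywhere in the token
        token = token.strip(string.punctuation)  # trim leading/trailing punctuation
        if not token:  # nothing left (pure number or punctuation)
            continue
        cleaned.append(stemmer.stem(token))
    return cleaned


print(preprocess_tweet("Forest fire near La Ronge Sask. Canada #wildfire 2 people evacuated!!"))
```

The returned token list can be joined back into a string. Alternatively, the function can be passed as tokenizer=preprocess_tweet to CountVectorizer or TfidfVectorizer.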

Data Analysis

Training the machine learning models listed below with GridSearchCV and Pipelines:

Naive Bayes Classifier
Linear Support Vector Classifier
Support Vector Classifier
Logistic Regression
k-Nearest Neighbors
Decision Tree Classifier
Random Forest
Bagging Classifier
Extra Trees Classifier
Adaptive Boosting (AdaBoost)
Gradient Boosting
Extreme Gradient Boosting (xgboost)
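Below is a hedged sketch of the GridSearchCV-plus-Pipeline approach. It uses two of the listed models (multinomial Naive Bayes and logistic regression) together with scikit-learn's TfidfVectorizer and CountVectorizer. The toy tweets, the step names "vect" and "clf", and the grid values are hypothetical placeholders, not the competition data or the notebook's real search space. Scoring uses F1, the metric the competition is evaluated on.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy tweets standing in for train.csv ("text" -> features, "target" -> label).
texts = [
    "Forest fire near La Ronge Sask Canada",
    "Earthquake shakes the city, buildings collapsed",
    "Flood warning issued, residents evacuated",
    "Wildfire spreading fast near the highway",
    "Hurricane destroys homes along the coast",
    "Emergency crews rescue people after explosion",
    "I love fruits and sunny days",
    "This new album is fire",
    "What a lovely morning at the beach",
    "My phone battery died again",
    "Watching a movie with friends tonight",
    "The cake was an absolute disaster lol",
]
labels = [1] * 6 + [0] * 6  # 1 = real disaster, 0 = not

# Two named steps: a text vectorizer followed by a classifier.
pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# Each dict is a separate grid; whole steps are swapped by using the step name as a parameter.
param_grid = [
    {
        "vect": [TfidfVectorizer(), CountVectorizer()],
        "vect__ngram_range": [(1, 1), (1, 2)],
        "clf": [MultinomialNB()],
        "clf__alpha": [0.1, 1.0],
    },
    {
        "vect": [TfidfVectorizer(), CountVectorizer()],
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0, 10.0],
    },
]

# F1 matches the competition metric; cv=3 only because the toy set is tiny.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(texts, labels)
print(search.best_params_)
print(f"best cross-validated F1: {search.best_score_:.3f}")
```

Putting whole steps in the grid lets a single search compare both the vectorizers and the classifiers. The other models in the list can be added the same way, as extra dictionaries, each with its own hyperparameters.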

Summary

Plotting results
Export results
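The outline only says "Export results". Assuming this means producing a Kaggle submission, a minimal sketch follows. The competition expects a CSV with an id column and a binary target column, which is the layout of its sample_submission.csv. The helper build_submission and the ids and predictions below are hypothetical.

```python
import pandas as pd


def build_submission(ids, predictions):
    """Hypothetical helper: pair each test-set id with its predicted label (1 = disaster, 0 = not)."""
    return pd.DataFrame({"id": ids, "target": predictions})


# Hypothetical values; in practice they would come from test.csv and a fitted model's predict().
test_ids = [101, 102, 103, 104, 105]
predictions = [1, 1, 0, 0, 1]

submission = build_submission(test_ids, predictions)
# With no path, to_csv returns the CSV text; pass a path such as "submission.csv" to write it to disk.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

index=False keeps pandas' row index out of the file, so only the two columns Kaggle reads are written.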
