NLP Analysis on US Airline Tweets

I used a natural language processing (NLP) approach to understand the topics underlying customer tweets in a Twitter US Airlines dataset. I used a semi-supervised classification method: first, I classified the sentiment of each tweet as positive/neutral or negative using a Logistic Regression model. Then, I overlaid topic modeling/dimensionality reduction techniques to extract unique topics for each sentiment. Finally, I created an application using Twitter's API and Flask to deploy my model to production.

Project Intro/Objective

Today, social media posts on Twitter, Facebook and Instagram have become a place to gauge public perception almost instantly. People openly share their opinions, and customers have changed the way they engage with brands online. One of the largest industries affected by this shift to online customer engagement is the US airline industry.

When it comes to customer service, travelers are increasingly skipping calls to the airlines and taking their requests to Twitter and Facebook instead, and airlines are responding by expanding their social media staff to help them. With online customer inquiries rising, it makes sense to have a system in place that can ingest this constant stream of data, organize it into categories, and direct customers to the staff responsible for each category so that questions are answered quickly and effectively. To do this, I wanted to create a classifier that identifies the sentiment behind each customer tweet as negative or positive/neutral, and then extracts the main topics associated with each sentiment. Negative tweets can provide insight into the issues that are bothering customers, and positive/neutral tweets can shed light on how effectively the staff resolve those issues. Through this analysis, US airlines can better serve their customers as well as evaluate their own performance.

Datasets Used

  • February 2015 Major US Airlines Tweets from Kaggle

Methods Used

  • Exploratory Data Analysis (EDA)
  • Data Visualization with Tableau
  • Natural Language Processing (NLP)
  • Topic Modeling
  • Sentiment Analysis
  • Feature Engineering
  • Application with Flask

Notable Technologies Used

  • Python 3, Jupyter Notebook
  • NLTK, spaCy, scikit-learn # NLP Text Processing
  • CountVectorizer, TfidfVectorizer, NMF # Topic Modeling
  • Logistic Regression # Sentiment Analysis Model
  • pandas, NumPy, Matplotlib, Seaborn, Tableau, Flask # Data Processing/Visualization tools
  • etc.

Main Analysis Threads

  • Tweet Cleaning - Cleaned tweets by removing hashtags, retweets, @mentions, links, and special characters
  • Tweet Tokenization and Vectorization - Tokenization through lowercasing and removal of numbers, punctuation and stopwords; term frequency-inverse document frequency (TF-IDF) vectorization using scikit-learn's TfidfVectorizer class
  • Sentiment Analysis - Used a Logistic Regression model to classify the overall sentiment of each tweet based on features extracted with CountVectorizer
  • Topic Modeling / Dimensionality Reduction - Ran topic modeling for each sentiment using non-negative matrix factorization (NMF) and TF-IDF with ngrams = 2; extracted 4 unique topics for negative tweets, and 3 unique topics for positive/neutral tweets
  • Application - Created a Flask application that outputs the sentiment and topic behind any customer tweet
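
As a rough illustration of the cleaning and tokenization steps listed above, a minimal sketch might look like the following. The function name clean_tweet and the exact regular expressions are assumptions made for this example, not the code used in the project; in particular, it assumes that "removing hashtags" means dropping the whole hashtag rather than only the # symbol.

```python
import re


def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip retweet markers, links, @mentions,
    hashtags, numbers and special characters, then lowercase."""
    text = re.sub(r"^RT\s+", "", text)         # leading retweet marker
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"[@#]\w+", " ", text)       # @mentions and hashtags (assumed: drop the whole tag)
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # numbers, punctuation, other symbols
    return re.sub(r"\s+", " ", text).strip().lower()


cleaned = clean_tweet("RT @united: Flight delayed 3 hours!! http://t.co/abc #fail")
print(cleaned)
```

Stopword removal is left out here; the cleaned string could be passed to a vectorizer configured with a stopword list.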

Results and Conclusions

My best model for sentiment classification was a Logistic Regression model. I considered KNN, Decision Tree, Random Forest, Gaussian NB, and XGBoost as well. However, I selected a Logistic Regression model fine-tuned with GridSearchCV based on both interpretability and the F1 score, since both negative and positive/neutral sentiments carry telling information for US airlines.
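
To make the tuning step concrete, here is a minimal sketch of a Logistic Regression classifier on CountVectorizer features, tuned with GridSearchCV and scored on F1. The toy tweets, the label names, the parameter grid over C, the macro averaging of F1, and the 2-fold split are all assumptions chosen so the example runs on its own; they are not the settings or data used in the project.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data for illustration only (not taken from the Kaggle dataset).
tweets = [
    "flight delayed three hours no update",
    "flight cancelled and no help to rebook",
    "worst customer service on hold for hours",
    "lost my bag and nobody answers",
    "great customer service thank you",
    "thanks for the quick reply dm sent",
    "please follow so i can dm my info",
    "smooth flight and friendly crew",
]
labels = ["negative"] * 4 + ["positive_neutral"] * 4

pipeline = Pipeline([
    ("vec", CountVectorizer()),                  # bag-of-words counts
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed grid: only the regularization strength C is searched.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=2)
search.fit(tweets, labels)

prediction = search.predict(["flight cancelled and nobody will help"])[0]
print(search.best_params_, prediction)
```

Because the fitted model is linear, the learned coefficients can be read per word, which is one way the interpretability mentioned above can be examined.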

For my topics, I got my best results using features extracted with TF-IDF with ngrams = 2, though I also considered TF-IDF with ngrams = 1. I was able to extract 4 topics from negative tweets and 3 topics from positive/neutral tweets.

Reasons for Negative Tweets and Top Word Associations:

  • Flight Delays:
    • 'flight delayed', 'delayed hours', 'connecting flight'
  • Flight Cancellations:
    • 'flight cancelled', 'just cancelled', 'rebook help'
  • Customer Service Issues:
    • 'customer service', 'worst customer', 'hold hours'
  • Unknown

Reasons for Positive/Neutral Tweets and Top Word Associations:

  • Inquiries:
    • 'follow dm', 'need follow', 'dm info'
  • Customer Satisfaction:
    • 'customer service', 'great customer', 'dm sent'
  • Unknown

In conclusion, these insights can help US Airlines effectively identify their strengths and weaknesses through customer feedback, and ultimately find ways to improve customer service.
