airline's Introduction

airline

⚡ 🚀 Airline negative comments analysis


The objective of this project is to identify the most important issues faced by airline companies based on customers’ negative reviews.

🧐 About

The investigation is carried out with Python scripts. First, the data is cleaned and narrowed down to three major airlines. Then, reviews are classified with Naïve Bayes and Decision Tree models, and a confusion matrix is used to check their accuracy. Lastly, the results are analyzed and compared using Chunking, Word Cloud, and Topic Modeling.

Prerequisites

SQL, Tableau, Python or Jupyter Notebook

🔖 Data Cleaning

The dataset shown below was downloaded from Kaggle.com. It has 14 columns and 27,284 rows, with no null values. The top three airlines in this dataset are chosen for analysis: Air Canada Rouge, British Airways, and United Airlines.

image
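As a rough sketch of the narrowing step described above, the snippet below keeps only the rows for the three chosen airlines. It uses the standard library rather than whatever the project scripts actually use; the sample rows are invented for illustration, and the exact spellings of the values in `airline_name` are an assumption.

```python
import csv
import io

# Toy stand-in for the Kaggle CSV; these rows and values are invented.
SAMPLE_CSV = """airline_name,content,recommended
united-airlines,Rude staff and a long delay,0
british-airways,Good food and friendly crew,1
air-canada-rouge,Very uncomfortable seat,0
some-other-airline,Fine flight,1
"""

# Assumed spellings of the three airline names in the airline_name column.
SELECTED = {"air-canada-rouge", "british-airways", "united-airlines"}


def load_selected(csv_text, selected=SELECTED):
    """Keep only rows for the selected airlines, with recommended cast to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["airline_name"] in selected:
            rows.append({
                "airline_name": row["airline_name"],
                "content": row["content"],
                "recommended": int(row["recommended"]),
            })
    return rows


reviews = load_selected(SAMPLE_CSV)
print(len(reviews))
```

With a real file, the same function could be fed the file contents instead of `SAMPLE_CSV`.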

Before the analysis, I used Python NLTK, SQL, and Tableau to check the overall reviews of the airline industry. The word “good” appeared most often in customer reviews, with almost 16,000 occurrences.

SQL:

Tableau:

image
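The word count above can be approximated in a few lines of Python. This is a minimal sketch using `collections.Counter` and a simple regular-expression tokenizer instead of NLTK; the two sample reviews are invented, and the tokenization rule is an assumption.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word occurrences across an iterable of review strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented sample reviews, for illustration only.
sample_reviews = [
    "Good crew and good food.",
    "The seat was not good.",
]
counts = word_counts(sample_reviews)
print(counts.most_common(1))
```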

🌱 Classification

A deeper analysis is done to explore why customers give negative reviews. Three attributes are kept for classification and topic modeling: airline_name, content, and recommended. In the recommended attribute, 0 represents a negative review and 1 represents a positive review.

Here are the total counts of negative and positive reviews for each airline. United Airlines and Air Canada Rouge have significantly more negative reviews than positive ones. It is important to investigate what causes this result and how to improve it.

A confusion table is created to test the accuracy of the classification results. To begin, two attributes are extracted from the data frame: content and recommended. Then, the reviews are converted to a list of lists.

Here are the object sets for each airline, which are lists of tuples. The review contents are broken down into individual words, and these words are labeled as neg or pos. Only adjectives are selected for analysis, to eliminate background noise words.

Tagged words:

Extracted adjectives:
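A minimal sketch of the adjective filter, assuming the words have already been tagged with Penn Treebank part-of-speech tags (the `(word, tag)` shape a tagger such as NLTK's would produce). The function names and the hand-tagged example are hypothetical.

```python
def adjectives(tagged_tokens):
    """Return lowercase words whose Penn Treebank tag marks an adjective (JJ, JJR, JJS)."""
    return [word.lower() for word, tag in tagged_tokens if tag.startswith("JJ")]


def labeled_adjectives(tagged_tokens, recommended):
    """Pair a review's adjectives with its label: 'pos' if recommended == 1, else 'neg'."""
    label = "pos" if recommended == 1 else "neg"
    return (adjectives(tagged_tokens), label)


# Hand-tagged toy review, invented for illustration.
tagged = [("The", "DT"), ("seat", "NN"), ("was", "VBD"),
          ("uncomfortable", "JJ"), ("and", "CC"), ("narrow", "JJ")]
example = labeled_adjectives(tagged, recommended=0)
print(example)
```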

Classification starts after defining each feature set. First, training and testing sets are generated with an 80/20 split. Then, Naïve Bayes and Decision Tree classifiers are trained, and their results and accuracy are compared.
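To make the split-and-classify step concrete, here is a small standard-library sketch: an 80/20 shuffle split and a tiny multinomial Naïve Bayes with add-one smoothing. It is a stand-in for illustration, not the classifier the project scripts use; the class name, the fixed seed, and the toy data are all assumptions, and the Decision Tree is not covered.

```python
import math
import random
from collections import Counter, defaultdict


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) with an 80/20 split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word lists, with add-one smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for words, label in examples:
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model.predict(w) == y for w, y in examples) / len(examples)


# Invented (adjectives, label) pairs, for illustration only.
data = [(["rude", "late"], "neg"), (["terrible"], "neg"),
        (["uncomfortable", "rude"], "neg"),
        (["friendly"], "pos"), (["great", "friendly"], "pos")] * 2
train, test = split_80_20(data)
model = TinyNaiveBayes().fit(train)
print(len(train), len(test), accuracy(model, test))
```

Fixing the seed keeps the split reproducible, which makes it easier to compare two models on the same test set.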

🌳 Result of classification and confusion table:

In classification, Naïve Bayes and Decision Tree are used and their accuracy is tested, and confusion tables are built to visualize how each algorithm performs. In addition, Chunking and Word Cloud are used to extract informative words from customers’ negative reviews.
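A confusion table is just a count of (actual, predicted) label pairs. The sketch below shows one way to build it and derive accuracy from it; the function names and the toy labels are made up for illustration and do not reproduce the project's numbers.

```python
from collections import Counter


def confusion_table(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy_from_table(table):
    """Share of pairs where the predicted label equals the actual label."""
    correct = sum(n for (a, p), n in table.items() if a == p)
    return correct / sum(table.values())


# Invented labels, for illustration only.
actual = ["neg", "neg", "neg", "pos", "pos", "neg"]
predicted = ["neg", "pos", "neg", "pos", "neg", "neg"]
table = confusion_table(actual, predicted)

# ("neg", "pos") counts reviews that should be negative but were predicted positive.
print(table[("neg", "pos")], table[("pos", "neg")], accuracy_from_table(table))
```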

Air Canada Rouge:

For Air Canada Rouge, the accuracy of the two models is high, at around 90%. According to the confusion table, five reviews should be negative but were predicted positive, and eight reviews should be positive but were predicted negative. The most informative word for this airline is “uncomfortable”. From the Chunking and Word Cloud results, many customers complained that the seats were uncomfortable and that leg room was limited.

British Airways:

British Airways has an accuracy of 80% from the two models. Nineteen reviews should be negative but were predicted positive, and twenty-six reviews should be positive but were predicted negative. The most informative words for this airline are “awful”, “terrible”, “worst”, “uncomfortable”, and “disappointed”. From the Chunking and Word Cloud results, customers mainly complained about the seat, food, and schedule delays.

United Airlines:

United Airlines has an accuracy of 85% from the two models. Thirty-one reviews should be negative but were predicted positive, and forty reviews should be positive but were predicted negative. The overall accuracy is good, and the most informative words for this airline are “worst”, “terrible”, and “rude”. From the Chunking and Word Cloud results, customers mainly complained about the seat, food, and customer service.

🌽 Topic Modeling

Topic Modeling is also performed, to compare against the Chunking results. First, the review contents are broken into individual words, which are used to initialize a dictionary. Then, a corpus is generated, which is a library of words. Lastly, an LDA model is used to get the weight of each word in the negative reviews.
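The dictionary and corpus steps can be sketched without any topic-modeling library. The snippet below assumes the corpus is a bag-of-words representation, with each review stored as `(word_id, count)` pairs, which is the usual input to an LDA implementation; the function names and toy documents are hypothetical, and the LDA fit itself is left to a library.

```python
from collections import Counter


def build_dictionary(tokenized_docs):
    """Map each distinct word to an integer id, in order of first appearance."""
    word_to_id = {}
    for doc in tokenized_docs:
        for word in doc:
            word_to_id.setdefault(word, len(word_to_id))
    return word_to_id


def to_bow(doc, word_to_id):
    """Convert one tokenized document to sorted (word_id, count) pairs."""
    counts = Counter(word_to_id[w] for w in doc if w in word_to_id)
    return sorted(counts.items())


# Invented tokenized reviews, for illustration only.
docs = [["seat", "leg", "seat"], ["food", "seat", "delay"]]
dictionary = build_dictionary(docs)
corpus = [to_bow(doc, dictionary) for doc in docs]
print(dictionary)
print(corpus)
```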

The circles shown below represent the corpus, and the distances between the circles represent similarity. For Air Canada Rouge, the main keywords are “seat”, “leg”, and “back”; for British Airways, the main keywords are “seat”, “food”, “time”, and “hours”; for United Airlines, the main keywords are “seat”, “service”, and “delay”. These words are very similar to the earlier Chunking and Word Cloud results.

Air Canada Rouge:

British Airways:

United Airlines:

🎉 Conclusion

In conclusion, this data analysis project has identified the major customer complaints for the top three airline companies. They fall into four categories: seat comfort, food quality, customer service, and on-time performance. An airline that provides comfortable seats, high-quality food, exceptional customer service, and on-time flights is more likely to earn high customer satisfaction and good reviews, and to succeed in this competitive industry.

airline's People

Contributors

yinghu1234 avatar
