This project was done as part of Udacity's Data Analyst Nanodegree program. Its objective was to use Python visualization libraries to explore a dataset systematically. The analysis begins with single-variable (univariate) exploration, followed by bivariate and multivariate analysis.
The exploration is followed by a short presentation that uses explanatory data analysis to convey the most important findings. The slide deck follows the main path of the exploration and tells a story so that readers can understand what was discovered.
The dataset was selected from a list provided by Udacity. It is the Flights dataset, which reports flight performance metrics in the US. The data explored and analyzed covers the years 2006 to 2008 and is available from the official Bureau of Transportation Statistics website.
Link to dataset: http://stat-computing.org/dataexpo/2009/the-data.html
Link to Bureau of Transportation Statistics: https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
The analysis found that the largest causes of delays were carrier-related and weather-related problems. American Airlines was the worst-performing carrier, with the highest delays and cancellations from 2006 to 2008. The presentation illustrates these findings with graphs and charts.
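The project's own analysis code is not reproduced here, so the following is only a minimal sketch of how delay causes and cancellation rates could be aggregated from the yearly CSV files. It assumes the Data Expo 2009 column names (`UniqueCarrier`, `Cancelled`, `CarrierDelay`, `WeatherDelay`, `NASDelay`, `SecurityDelay`, `LateAircraftDelay`), with delay values in minutes and `NA` for missing entries. The helper names and the three sample rows are hypothetical and serve only to illustrate the aggregation.

```python
import csv
import io
from collections import Counter

# Delay-cause columns as named in the Data Expo 2009 CSV files (assumed schema).
# Values are minutes; missing values appear as "NA".
CAUSE_COLUMNS = ["CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay", "LateAircraftDelay"]


def total_delay_by_cause(rows):
    """Hypothetical helper: sum delay minutes per cause, skipping "NA" or empty values."""
    totals = Counter()
    for row in rows:
        for col in CAUSE_COLUMNS:
            value = row.get(col, "NA")
            if value not in ("NA", ""):
                totals[col] += float(value)
    return totals


def cancellation_rate_by_carrier(rows):
    """Hypothetical helper: fraction of flights cancelled for each UniqueCarrier code."""
    flights = Counter()
    cancelled = Counter()
    for row in rows:
        carrier = row["UniqueCarrier"]
        flights[carrier] += 1
        cancelled[carrier] += int(row["Cancelled"])
    return {carrier: cancelled[carrier] / flights[carrier] for carrier in flights}


# Toy rows for illustration only, not real data; real input would be a yearly file such as 2006.csv.
SAMPLE = """UniqueCarrier,Cancelled,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay
AA,0,30,10,5,0,0
AA,1,NA,NA,NA,NA,NA
WN,0,0,0,12,0,20
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(total_delay_by_cause(rows).most_common())  # causes ranked by total delay minutes
print(cancellation_rate_by_carrier(rows))        # cancellation rate per carrier
```

The standard-library `csv` module keeps the sketch dependency-free. For the full multi-million-row yearly files, pandas `read_csv` with `usecols` (and possibly `chunksize`) followed by a `groupby` would likely be the more practical choice.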
The plots in the presentation were chosen so that readers can follow the story of delays and cancellations side by side and compare them. Since carrier performance can be gauged by both of these metrics, both are explored and highlighted in the presentation.