Investigate TMDB Movies Dataset

Executive Summary

What is the project's main purpose?

This project analyzes the "TMDB Movies dataset", draws insights from it, and shares the findings, using methods ranging from descriptive analysis to Exploratory Data Analysis (EDA). The techniques used include data wrangling (gathering, assessing, and cleaning), data visualization, and explanatory data analysis.

What are some of the key findings and conclusions drawn from your analysis?

The key findings and conclusions of the analysis include:

  • The highest-grossing movies of all time include Avatar, Star Wars, Jurassic World, Titanic, and The Net, among others.
  • The most profitable year in the movie industry was 2015.
  • The most popular genres are Drama, Comedy, Thriller, and Action. For investment purposes, these are the genres that should be given consideration.
  • A well-planned budget should be considered when investing in the movie industry.

Introduction

Overview

The analysis uses the TMDB Movies Dataset, which contains information about the movie industry from 1960 to 2015. The dataset is made up of 12 columns and 10866 rows. Its fields include id, imdb_id, popularity, budget, etc. A few characteristics of the data are worth noting:

  • 'cast', 'genres' and 'production_companies' contain multiple values separated by pipes (|)
  • some values in the TMDB dataset contain unrecognisable, non-English characters that cannot be read or understood
  • the dataset is not up to date, as it only covers 1960 to 2015
  • the dataset contains a large number of 0's.
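The pipe-separated fields mentioned above can be split into lists before analysis. The following is a minimal sketch in plain Python, not the project's actual code; the helper name `split_pipes` and the sample row (including its `title` key) are hypothetical.

```python
def split_pipes(value):
    """Split a pipe-separated field such as 'Action|Adventure' into a list.

    Empty or missing values yield an empty list.
    """
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


# Hypothetical row, shaped like one record of the dataset.
row = {"title": "Example Movie", "genres": "Action|Adventure|Thriller"}

genres = split_pipes(row["genres"])
print(genres)  # ['Action', 'Adventure', 'Thriller']
```

The same helper would apply to 'cast' and 'production_companies', since they use the same separator.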

What are the problems tackled in this project?

The problems tackled in this project explore the data further, giving insight into the trends, characteristics, and features hidden in the data to support more informed, data-driven decisions. These problems include:

  • What are the Top 10 highest grossing movies of all-time?
  • What is the movies industry's most profitable year in the most recent decade?
  • What is the relationship between overall annual Profit and Budget in the movies industry for the most recent decade?
  • What is the relationship between overall annual Profit and Revenue in the movies industry for the most recent decade?
  • What are the most popular genres throughout the years?

Why is the project important?

The project walks through each step of the data analysis process in order: data gathering, data assessment and cleaning, data exploration, data visualization, and sharing findings, so that the data in question is analyzed thoroughly.

Where is this data sourced from?

This project is one of three carried out during the Udacity Data Analyst Nanodegree Program. The data was provided by Udacity for the sole purpose of building projects that use the data analysis process.

Methodology

The main methods used to carry out the analysis are:

  • Descriptive Data Analysis
  • Exploratory Data Analysis (EDA)

Results/Data Analysis

Here are some of the results generated from the analysis:

What are the Top 10 highest grossing movies of all-time?

[Figure: top 10 movies]

A bar chart shows the 10 highest-grossing movies of all time according to the dataset.
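A ranking like this can be computed by sorting movies on revenue and keeping the first few. The sketch below uses only the standard library; the `title` and `revenue` keys and the sample values are made up for illustration and are not taken from the dataset.

```python
import heapq


def top_grossing(movies, n=10):
    """Return the n movies with the largest revenue, highest first."""
    return heapq.nlargest(n, movies, key=lambda m: m["revenue"])


# Made-up sample records; real rows would come from the dataset file.
movies = [
    {"title": "Movie A", "revenue": 500},
    {"title": "Movie B", "revenue": 1200},
    {"title": "Movie C", "revenue": 0},
    {"title": "Movie D", "revenue": 900},
]

for rank, movie in enumerate(top_grossing(movies, n=3), start=1):
    print(rank, movie["title"], movie["revenue"])
```

The resulting list can be passed directly to a plotting library to draw the bar chart.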

What is the relationship between overall annual Profit and Budget in the movies industry for the most recent decade?

[Figure: budget vs profit]

This visual shows the relationship between the total movie budget and the profit made in each year. It turns out that over the past decade the movie industry made more than twice the amount it budgeted, and this trend held consistently from year to year.
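One way to arrive at yearly figures like these is to sum budget and profit per release year and compare the two totals. This is a sketch under stated assumptions: profit is taken to be revenue minus budget, the `release_year` and `revenue` keys are assumed names, and the numbers are invented.

```python
from collections import defaultdict


def annual_totals(movies):
    """Sum budget and profit (revenue - budget) for each release year."""
    totals = defaultdict(lambda: {"budget": 0, "profit": 0})
    for movie in movies:
        year = movie["release_year"]
        totals[year]["budget"] += movie["budget"]
        totals[year]["profit"] += movie["revenue"] - movie["budget"]
    return dict(totals)


# Invented sample records for two years.
movies = [
    {"release_year": 2014, "budget": 100, "revenue": 300},
    {"release_year": 2014, "budget": 50, "revenue": 200},
    {"release_year": 2015, "budget": 200, "revenue": 700},
]

totals = annual_totals(movies)
for year in sorted(totals):
    ratio = totals[year]["profit"] / totals[year]["budget"]
    print(year, totals[year], f"profit/budget = {ratio:.2f}")
```

A profit-to-budget ratio above 2 for a year would correspond to the "more than twice" observation, if profit is what the statement measures.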

What are the most popular genres throughout the years?

[Figure: most popular genres]

This visual depicts the most popular movie genres over the years. The three most popular genres in each decade are:

  • 1960s: Drama, Adventure and Action
  • 1970s: Drama, Thriller and Action
  • 1980s: Comedy, Drama and Action
  • 1990s: Drama, Comedy and Thriller
  • 2000s: Drama, Comedy and Thriller
  • 2010s: Drama, Comedy and Thriller
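A per-decade ranking like the one above can be sketched by counting how often each genre appears among the movies released in that decade. This assumes "popular" means "most frequent", which the text does not confirm (it could instead be based on the popularity column); the `release_year` key and the sample rows are also assumptions.

```python
from collections import Counter, defaultdict


def top_genres_by_decade(movies, n=3):
    """Count genre occurrences per decade and return the n most common."""
    counts = defaultdict(Counter)
    for movie in movies:
        decade = movie["release_year"] // 10 * 10  # e.g. 1987 -> 1980
        for genre in movie["genres"].split("|"):
            if genre:
                counts[decade][genre] += 1
    return {
        decade: [genre for genre, _ in counter.most_common(n)]
        for decade, counter in sorted(counts.items())
    }


# Invented sample rows with pipe-separated genres.
movies = [
    {"release_year": 1985, "genres": "Comedy|Drama"},
    {"release_year": 1987, "genres": "Comedy|Drama"},
    {"release_year": 1988, "genres": "Comedy|Action"},
    {"release_year": 1994, "genres": "Drama"},
    {"release_year": 1996, "genres": "Drama|Comedy"},
]

result = top_genres_by_decade(movies)
print(result)  # {1980: ['Comedy', 'Drama', 'Action'], 1990: ['Drama', 'Comedy']}
```

Because a movie can list several genres, each movie contributes one count to every genre it carries.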

What are the limitations associated with the data?

  • Certain values in this dataset contain unrecognisable, non-English characters that cannot be read or understood.
  • The dataset is not up to date, as it only covers 1960 to 2015.
  • There is no clear description of each column in the dataset, which makes some columns effectively redundant.
  • The dataset contains a large number of 0's.
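One common way to handle the zeros is to treat them as missing values in the columns where 0 is not a plausible figure, then work only with complete rows. The sketch below assumes this applies to `budget` and `revenue`; the text does not say which columns contain the zeros, so the column choice, the helper name `zeros_to_missing`, and the sample rows are all hypothetical.

```python
def zeros_to_missing(row, columns=("budget", "revenue")):
    """Return a copy of row with 0 replaced by None in the given columns."""
    return {
        key: (None if key in columns and value == 0 else value)
        for key, value in row.items()
    }


# Invented sample rows; 0 stands in for an unrecorded amount.
rows = [
    {"id": 1, "budget": 0, "revenue": 0},
    {"id": 2, "budget": 150, "revenue": 0},
    {"id": 3, "budget": 200, "revenue": 700},
]

cleaned = [zeros_to_missing(r) for r in rows]
usable = [
    r for r in cleaned
    if r["budget"] is not None and r["revenue"] is not None
]
print(len(usable), "of", len(cleaned), "rows have both budget and revenue")
```

Reporting how many rows survive this filter makes it clear how much of the dataset the financial findings actually rest on.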

Conclusion

The findings and recommendations are summarized below:

  1. The highest-grossing movies of all time include Avatar, Star Wars, Jurassic World, Titanic, and The Net, among others.
  2. The most profitable year in the movie industry was 2015.
  3. The most popular genres are Drama, Comedy, Thriller, and Action. For investment purposes, these are the genres that should be given consideration.
  4. A well-planned budget should be considered when investing in the movie industry.

Contributors

sadiq-marcelo