Using pandas, NumPy, and matplotlib to analyze the TMDb dataset
This dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.
The dataset contains:
- 10866 observations/rows
- 26 features/columns
- 9 columns with null values
- 8874 rows that have null values in one or more columns
- Columns such as budget, revenue, budget_adj, and revenue_adj contain many 0 values.
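The counts above can be reproduced with a few pandas calls. The following is a minimal sketch, not the original analysis code: the helper name `summarize` and the tiny `toy` frame are hypothetical stand-ins, and in practice the frame would come from loading the actual TMDb CSV file.

```python
import numpy as np
import pandas as pd


def summarize(df, zero_cols):
    """Return shape, missing-value, and zero-value counts for df.

    `summarize` is a hypothetical helper written for this illustration.
    """
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        # Columns that contain at least one null value
        "columns_with_nulls": int(df.isnull().any(axis=0).sum()),
        # Rows that contain a null value in one or more columns
        "rows_with_nulls": int(df.isnull().any(axis=1).sum()),
        # Number of exact zeros in each requested numeric column
        "zero_counts": {col: int((df[col] == 0).sum()) for col in zero_cols},
    }


# Toy stand-in data (made up); replace with the real dataset when available.
toy = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "homepage": [np.nan, "http://example.com", np.nan, np.nan],
    "tagline": ["x", np.nan, "y", "z"],
    "budget": [0, 100, 0, 250],
    "revenue": [0, 300, 50, 0],
})

summary = summarize(toy, ["budget", "revenue"])
print(summary)
```

Run against the full dataset, the same helper should yield the row, column, and null counts listed above, assuming the file is loaded without dropping or filtering anything first.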
Questions we would like to answer with this dataset include:
- Which genres are most popular from year to year?
- How many movies were released each year?
- Which movies are the most and least popular, and what features are associated with high and low popularity?
- Has the runtime of movies been declining over the years?
- Is the movie industry making or losing money, and what is the relationship between budget and popularity?
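Several of these questions reduce to a group-by or a correlation. The sketch below shows one possible approach on made-up data. The column names `release_year`, `runtime`, and `popularity` are assumptions about the dataset's schema, and treating a 0 budget or revenue as "unknown" is an assumption made here, not something the dataset documents.

```python
import pandas as pd

# Toy stand-in data (made up); column names are assumed, not confirmed.
toy = pd.DataFrame({
    "release_year": [2000, 2000, 2001, 2001, 2001],
    "runtime": [100, 120, 90, 95, 85],
    "budget": [10, 0, 30, 40, 50],
    "revenue": [25, 5, 20, 90, 60],
    "popularity": [1.0, 0.5, 2.0, 3.0, 4.0],
})

# Number of movies released each year
movies_per_year = toy.groupby("release_year").size()

# Average runtime per year, to check for a downward trend
mean_runtime = toy.groupby("release_year")["runtime"].mean()

# Assumption: a 0 in budget or revenue means the value is unknown,
# so those rows are excluded from the financial questions.
known = toy[(toy["budget"] > 0) & (toy["revenue"] > 0)]

# Aggregate profit across movies with known budget and revenue
total_profit = (known["revenue"] - known["budget"]).sum()

# Pearson correlation between budget and popularity
budget_popularity_corr = known["budget"].corr(known["popularity"])

print(movies_per_year)
print(mean_runtime)
print(total_profit, budget_popularity_corr)
```

A correlation only measures linear association, so a scatter plot of budget against popularity (for example with matplotlib) would be a sensible companion check.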