This project is an exploratory data analysis (EDA) of a Netflix titles dataset, aimed at uncovering patterns in Netflix's content catalog.
The dataset, netflix_titles.csv, describes the movies and TV shows available on Netflix with the following columns:
- show_id: Unique identifier for each title
- type: Distinguishes between Movies and TV Shows
- title: Name of the title
- director: Director(s) of the title
- cast: Cast members involved
- country: Country of production
- date_added: When the title was added to Netflix
- release_year: Original release year
- rating: Content rating
- duration: Length of the title (minutes for movies, number of seasons for TV shows)
- listed_in: Genre(s) of the title
- description: Brief description of the title
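As a rough sketch of how these columns might be loaded and prepared with Pandas, the snippet below parses `date_added` into a date and splits `duration` into a number and a unit. The sample rows, the `load_titles` helper, and the derived column names (`year_added`, `duration_value`, `duration_unit`) are assumptions made for illustration and are not taken from the project itself.

```python
import io

import pandas as pd

# Hypothetical sample rows in the netflix_titles.csv layout (illustrative, not real data).
SAMPLE_CSV = """show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Example Film,Jane Doe,"A. Actor, B. Actor",United States,"September 25, 2021",2020,PG-13,90 min,"Dramas, Independent Movies",A quiet drama about a family reunion.
s2,TV Show,Example Series,,C. Actor,India,"January 1, 2020",2019,TV-MA,2 Seasons,"Crime TV Shows, TV Dramas",Detectives chase a dangerous criminal.
"""


def load_titles(source):
    """Read the CSV and add a few derived columns (hypothetical helper)."""
    df = pd.read_csv(source)

    # Parse dates such as "September 25, 2021"; unparseable values become NaT.
    df["date_added"] = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    df["year_added"] = df["date_added"].dt.year

    # Split "90 min" / "2 Seasons" into a numeric value and a unit.
    df["duration_value"] = pd.to_numeric(
        df["duration"].str.extract(r"(\d+)", expand=False), errors="coerce"
    )
    df["duration_unit"] = df["duration"].str.extract(r"\d+\s*(\w+)", expand=False)
    return df


df = load_titles(io.StringIO(SAMPLE_CSV))
print(df[["title", "year_added", "duration_value", "duration_unit"]])
```

To run this against the real file, `load_titles("netflix_titles.csv")` should work the same way, assuming the file sits in the working directory.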
The analysis covered the following areas:
- Distribution Analysis: Explored the distribution of movies vs. TV shows, the number of titles added per year, and the distribution of show ratings.
- Trend Analysis: Examined trends in release years and the country-wise distribution of titles.
- Genre Analysis: Identified the most prevalent genres within the catalog.
- Text Analysis: Conducted keyword extraction and sentiment analysis on titles and descriptions to surface recurring themes and the overall emotional tone.
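The distribution and genre counts described above can be sketched with a few Pandas calls. This is a minimal illustration under stated assumptions: the rows are hypothetical, `count_genres` is a helper name introduced here, and `listed_in` is assumed to hold comma-separated genre labels.

```python
import pandas as pd

# Hypothetical rows for illustration only (not real catalog data).
df = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie"],
        "rating": ["PG-13", "TV-MA", "PG-13"],
        "listed_in": [
            "Dramas, Independent Movies",
            "Crime TV Shows, TV Dramas",
            "Dramas, Comedies",
        ],
    }
)

# Movies vs. TV shows, and the distribution of content ratings.
type_counts = df["type"].value_counts()
rating_counts = df["rating"].value_counts()


def count_genres(frame):
    """Count how often each genre appears (hypothetical helper)."""
    genres = (
        frame["listed_in"]
        .dropna()
        .str.split(",")   # one list of genres per title
        .explode()        # one row per (title, genre) pair
        .str.strip()
    )
    return genres.value_counts()


genre_counts = count_genres(df)
print(type_counts)
print(genre_counts.head())
```

Splitting and exploding `listed_in` counts a title once for each genre it carries, so the genre totals add up to more than the number of titles.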
The following tools were used:
- Python: The programming language used for the whole analysis.
- Pandas: For loading, cleaning, and aggregating the data.
- Matplotlib and Seaborn: For data visualization.
- Scikit-learn: For TF-IDF-based keyword extraction from text.
- TextBlob: For performing sentiment analysis.