An analysis of the Music Industry using Spotify charts data and leveraging distributed concepts in Spark
In this project I explore Spotify's top charts across various regions of the world and try to glean insights into the music industry by performing distributed operations on the charts dataset.
I have also applied some machine learning techniques using the Spark ML library, which supports building models for prediction tasks such as classification and regression.
The analysis and results can be found in the Jupyter Notebook spotify_analysis.
Spotify releases Top 200 Charts & Viral 50 Charts every 2 days. These can be accessed using Spotify APIs. The charts data gives us information about:
- Regions covered by the charts, e.g. Argentina, Paraguay, Global, United States
- Various artists along with their titles
- Total amount of data: 3.48GB
Since the dataset is too big for this repository, I have not added it here.
Kaggle Link to Dataset - Spotify Charts