Download the MyAnimeList dataset and compute some metrics with PySpark.
Genre statistics per date:
- Median number of reviews written per date;
- Median review score per date;
- Genre rank within each date.
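
To make the three metrics concrete, here is a minimal plain-Python sketch of their definitions. It is not the PySpark code from the notebooks. The row layout, the sample values, the `genre_stats` name, and the choice to rank genres by median score (highest first, ties broken by sort order) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (date, genre, reviews_written, review_score).
rows = [
    ("2020-01-01", "Action", 10, 7.5),
    ("2020-01-01", "Action", 30, 8.5),
    ("2020-01-01", "Comedy", 5, 6.0),
    ("2020-01-01", "Comedy", 15, 7.0),
    ("2020-01-02", "Action", 20, 9.0),
]


def genre_stats(rows):
    """Return {(date, genre): {"median_reviews", "median_score", "rank"}}."""
    # Group the raw values by (date, genre).
    groups = defaultdict(list)
    for date, genre, reviews, score in rows:
        groups[(date, genre)].append((reviews, score))

    # Median reviews written and median review score per group.
    stats = {}
    for key, values in groups.items():
        stats[key] = {
            "median_reviews": median(v[0] for v in values),
            "median_score": median(v[1] for v in values),
        }

    # Assumption: genres are ranked inside each date by median score, best first.
    by_date = defaultdict(list)
    for (date, genre), s in stats.items():
        by_date[date].append((genre, s["median_score"]))
    for date, genres in by_date.items():
        ordered = sorted(genres, key=lambda g: g[1], reverse=True)
        for position, (genre, _) in enumerate(ordered, start=1):
            stats[(date, genre)]["rank"] = position
    return stats


stats = genre_stats(rows)
print(stats[("2020-01-01", "Action")])
```

In PySpark the same shape would typically be a `groupBy` on date and genre followed by a window function partitioned by date; the sketch above only pins down what each number means.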
MyAnimeList Dataset - Largest Anime Dataset on Kaggle
- Make
- Poetry
- Docker
- Download your Kaggle API key (instructions) and add it to the credentials directory;
- Activate the Poetry environment with
    poetry shell
- Export the Kaggle key location to the environment:
    export KAGGLE_CONFIG_DIR=${PWD}/credentials/
- Set up your Kaggle account so you can download the dataset using
    make download_dataset
- Execute the command
    make start_jupyter_notebook
  check the terminal for the URL, and then run the desired Jupyter notebook (inside work/notebook processing).
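
Before running the download step, it can help to confirm that the credentials are where the Kaggle client will look. The following is a minimal sketch, assuming the API key was saved under its default file name `kaggle.json`; the helper name `kaggle_credentials_present` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def kaggle_credentials_present(config_dir=None):
    """Return True if kaggle.json exists in config_dir or $KAGGLE_CONFIG_DIR."""
    # Fall back to the environment variable exported in the setup steps.
    directory = config_dir or os.environ.get("KAGGLE_CONFIG_DIR")
    if not directory:
        return False
    # Assumption: the key file keeps its default name, kaggle.json.
    return (Path(directory) / "kaggle.json").is_file()


if __name__ == "__main__":
    print("Kaggle credentials found:", kaggle_credentials_present())
```

If this prints `False`, check that `KAGGLE_CONFIG_DIR` was exported in the same shell and that the key file is inside the credentials directory.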