This repo demonstrates loading data into an Amazon Redshift data warehouse. The source data lives on S3. The following two staging tables are built:
- `log_data_staging`: copy of user session details.
- `song_data_staging`: copy of song metadata.
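Staging tables like these are typically filled with a Redshift `COPY` command that bulk-loads files straight from S3. Here is a minimal sketch that only builds the SQL string; the bucket path, IAM role ARN, and JSON format are illustrative assumptions, not this repo's actual values (check `dwh.cfg` and `./src/etl.py` for those):

```python
def build_copy_sql(table, s3_path, iam_role, json_option="auto"):
    """Return a Redshift COPY statement that bulk-loads JSON files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS JSON '{json_option}';"
    )

sql = build_copy_sql(
    "song_data_staging",
    "s3://udacity-dend/song_data",              # hypothetical bucket path
    "arn:aws:iam::123456789012:role/dwhRole",   # hypothetical role ARN
)
print(sql)
```

`COPY` is preferred over row-by-row `INSERT`s here because Redshift parallelizes the load across its slices.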
From these two staging tables, the following fact and dimension tables are created:
- `songplays_fact`
- `user_dim`
- `songs_dim`
- `artist_dim`
- `time_dim`
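The fact table in a star schema like this is usually populated with an `INSERT ... SELECT` that joins the two staging tables. The sketch below shows that pattern only; every column name in it is an assumption for illustration, not this repo's actual schema:

```python
# Hedged sketch of filling songplays_fact from the staging tables.
# Column names (ts, user_id, song, artist, title, artist_name, ...)
# are hypothetical -- see the SQL in this repo for the real schema.
SONGPLAYS_INSERT = """
INSERT INTO songplays_fact (start_time, user_id, song_id, artist_id, session_id)
SELECT l.ts, l.user_id, s.song_id, s.artist_id, l.session_id
FROM log_data_staging l
JOIN song_data_staging s
  ON l.song = s.title AND l.artist = s.artist_name;
"""
print(SONGPLAYS_INSERT)
```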
Use `dwh.cfg` to set up credentials for the Redshift cluster and the AWS service user.
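The scripts can read `dwh.cfg` with the standard-library `configparser`. A minimal sketch, assuming section and key names like `CLUSTER`/`HOST` (match them to the actual `dwh.cfg` in this repo):

```python
import configparser
import os
import tempfile

# Hypothetical dwh.cfg contents -- section and key names are assumptions.
sample = """\
[CLUSTER]
HOST=example-cluster.abc123.us-west-2.redshift.amazonaws.com
DB_NAME=dwh
DB_USER=dwhuser
DB_PASSWORD=change-me
DB_PORT=5439
"""

# Write a throwaway copy so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "dwh.cfg")
with open(path, "w") as f:
    f.write(sample)

config = configparser.ConfigParser()
config.read(path)
host = config.get("CLUSTER", "HOST")
print(host)
```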
Install the dependencies with `pip install -r requirements.txt`. Then run `./src/create_tables.py` followed by `./src/etl.py`.
The notebook `./src/test.ipynb` runs queries against the data warehouse and prints the total number of rows in each table.
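The row-count check can be sketched as follows. Executing the queries needs a live cluster connection (e.g. via `psycopg2`), so this sketch only builds the statements; the table names are the ones listed above:

```python
# Build one COUNT(*) query per warehouse table, as the notebook does.
tables = [
    "log_data_staging", "song_data_staging",
    "songplays_fact", "user_dim", "songs_dim", "artist_dim", "time_dim",
]
queries = {t: f"SELECT COUNT(*) FROM {t};" for t in tables}
for q in queries.values():
    print(q)
```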