In this project, we implement the main algorithm of M. Kula (2015) and compare the results with the LightFM library as a sanity check.
Using a Starbucks offers transactions dataset, we first transform the data into:
- Users' features from several binned features of Starbucks customers
- Items' features from several binned features of Starbucks offers
- Binary transaction data from completed and declined Starbucks offers
After splitting the data into train and test transaction datasets, we train both the MetadataEmbeddings algorithm and the LightFM library, predict recommendations for all test users, and compare the results.
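The transformation described above can be sketched as follows. This is a minimal illustration, not the project's preprocessing code: the helper names `one_hot_bins` and `binary_interactions`, the age bin edges, and the sample events are all hypothetical, and we assume a completed offer is encoded as 1 and anything else as 0.

```python
import numpy as np

def one_hot_bins(values, edges):
    """One-hot encode a numeric column into len(edges) + 1 bins."""
    values = np.asarray(values, dtype=float)
    bin_ids = np.digitize(values, edges)  # bin index in 0..len(edges)
    features = np.zeros((len(values), len(edges) + 1), dtype=np.int8)
    features[np.arange(len(values)), bin_ids] = 1
    return features

def binary_interactions(events, n_users, n_items):
    """Build a users x items matrix with 1 where an offer was completed."""
    matrix = np.zeros((n_users, n_items), dtype=np.int8)
    for user, item, completed in events:
        if completed:
            matrix[user, item] = 1
    return matrix

# Hypothetical sample data: three customers and ten offers.
ages = [23, 41, 67]
user_features = one_hot_bins(ages, edges=[30, 50])  # bins: <30, 30-49, >=50
events = [(0, 1, True), (1, 9, False), (2, 1, True)]
interactions = binary_interactions(events, n_users=3, n_items=10)
```

Each binned feature becomes a block of indicator columns, so a user or an offer is represented by the set of bins it falls into.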
- Install anaconda3
- Create and activate a virtual environment
sudo pip install --upgrade virtualenv
mkdir venvs
virtualenv my_venv
source my_venv/bin/activate
- Install python libraries
pip install -r requirements.txt
- Download data from starbucks-data or
kaggle datasets download -d ihormuliar/starbucks-customer-data
unzip starbucks-customer-data.zip
- Copy all files to the data folder of this project
- portfolio: This dataset contains the unique offers that were promoted to customers. Every offer is described by its properties.
- profile: This dataset contains unique customer ids. Every customer is described by their properties.
- transcript: This dataset contains events that occurred for several customers. In this exercise, we are interested in events where a customer interacted with an offer.
This script reads the datasets, creates and stores all the aforementioned features, and builds the train and test datasets.
cd src/
python preprocessing.py
This script reads the preprocessed datasets and feeds the corresponding data to the LightFM model and the MetadataEmbeddings model.
cd src/
python3 evaluate.py
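For intuition, here is a minimal sketch of how a metadata-embedding model of this kind scores user-item pairs, as we read Kula (2015): each user and item is represented by the sum of the embeddings of its active features, and the score is the sigmoid of their dot product plus the summed feature biases. The function `predict_scores`, the array shapes, and the random parameters are assumptions made for illustration; this is not the code in `evaluate.py`, and it omits training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_scores(user_features, item_features,
                   user_emb, item_emb, user_bias, item_bias):
    """Score every user-item pair from feature-level embeddings and biases."""
    user_repr = user_features @ user_emb    # (n_users, dim): sum of feature embeddings
    item_repr = item_features @ item_emb    # (n_items, dim)
    b_u = user_features @ user_bias         # (n_users,): sum of feature biases
    b_i = item_features @ item_bias         # (n_items,)
    logits = user_repr @ item_repr.T + b_u[:, None] + b_i[None, :]
    return sigmoid(logits)

# Hypothetical sizes and randomly initialised (untrained) parameters.
rng = np.random.default_rng(0)
n_users, n_items, n_user_feats, n_item_feats, dim = 4, 10, 6, 5, 8
user_features = rng.integers(0, 2, size=(n_users, n_user_feats))
item_features = rng.integers(0, 2, size=(n_items, n_item_feats))
user_emb = rng.normal(scale=0.1, size=(n_user_feats, dim))
item_emb = rng.normal(scale=0.1, size=(n_item_feats, dim))
user_bias = np.zeros(n_user_feats)
item_bias = np.zeros(n_item_feats)

scores = predict_scores(user_features, item_features,
                        user_emb, item_emb, user_bias, item_bias)
ranking = np.argsort(-scores, axis=1)  # best-scoring items first, per user
```

Sorting each row of `scores` in descending order gives one ranked list of item indices per user.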
Next, we report MAP@3 evaluated on the test dataset for both models. The two scores are nearly identical (0.362 vs. 0.361), and the recommended offers are similar as well.
Results for test sample
Model | MAP@3 |
---|---|
LightFM | 0.362 |
MetadataEmbeddings | 0.361 |
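As a reference for how a MAP@k score can be computed, here is a small sketch. Definitions of average precision at k vary; this one assumes normalisation by min(k, number of relevant items) and recommendation lists without duplicates, which may differ from what the evaluation script uses. The function names and sample lists are hypothetical.

```python
def average_precision_at_k(recommended, relevant, k=3):
    """Average precision over the top-k recommended items for one user."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(relevant))

def mean_average_precision_at_k(all_recommended, all_relevant, k=3):
    """Mean of the per-user average precision scores."""
    scores = [average_precision_at_k(rec, rel, k)
              for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical example: two users, top-3 lists and the items they completed.
recommended = [[1, 7, 6], [9, 1, 7]]
completed = [[1, 6], [4]]
map_at_3 = mean_average_precision_at_k(recommended, completed, k=3)
```

In this example the first user scores (1/1 + 2/3) / 2 and the second scores 0, so the mean is about 0.417.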
- LightFM item recommendations for the first 10 users:
[1, 7, 6, 8, 2, 4, 0, 5, 3, 9]
[1, 9, 2, 7, 6, 4, 0, 5, 3, 8]
[1, 7, 2, 4, 0, 8, 6, 5, 3, 9]
[1, 9, 8, 6, 4, 5, 3, 0, 2, 7]
[1, 9, 6, 8, 2, 7, 4, 0, 5, 3]
[1, 9, 2, 8, 7, 4, 0, 5, 3, 6]
[9, 1, 7, 6, 8, 2, 5, 3, 0, 4]
[1, 9, 2, 4, 0, 8, 6, 5, 3, 7]
[9, 7, 2, 8, 6, 0, 4, 5, 3, 1]
[9, 1, 7, 2, 8, 6, 4, 5, 3, 0]
- MetadataEmbeddings item recommendations for the first 10 users:
[1, 6, 7, 4, 0, 8, 2, 3, 5, 9]
[1, 9, 6, 7, 4, 0, 2, 3, 5, 8]
[1, 6, 7, 4, 8, 0, 2, 3, 5, 9]
[1, 9, 6, 8, 4, 3, 5, 0, 2, 7]
[1, 9, 6, 8, 2, 7, 4, 0, 3, 5]
[1, 9, 8, 2, 7, 4, 0, 3, 5, 6]
[1, 6, 9, 7, 8, 2, 3, 5, 0, 4]
[1, 9, 6, 4, 8, 0, 2, 3, 5, 7]
[9, 6, 7, 4, 0, 8, 2, 3, 5, 1]
[1, 9, 6, 7, 4, 8, 2, 3, 5, 0]
Finally, we visualize the item space implied by the item embeddings, along with 3 users who had 2 transactions in the train dataset. To visualize our features against our target variables and evaluate their explanatory power, we apply t-SNE dimensionality reduction, projecting all feature embeddings onto a 2-D space.
User 5, had bought items [0 9]. Top 2 recos are: [1, 6]
User 7, had bought items [8 9]. Top 2 recos are: [1, 6]
User 18, had bought items [0 4]. Top 2 recos are: [1, 9]
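The 2-D projection can be sketched with scikit-learn's `TSNE`. The embedding matrix below is a random placeholder standing in for the learned item embeddings, and the perplexity value is an assumption; it is set low because scikit-learn requires the perplexity to be smaller than the number of samples, and there are only 10 items.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder for the learned item embeddings: 10 items, 8 dimensions.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(10, 8))

# Project to 2-D; perplexity must be smaller than the number of items.
tsne = TSNE(n_components=2, perplexity=3, random_state=0)
item_coords = tsne.fit_transform(item_embeddings)  # shape (10, 2)
```

Each row of `item_coords` can then be drawn as a point in a scatter plot, labelled with its item index.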