This repo covers an unsupervised learning problem in finance. It uses the credit card information of several thousand customers to segment them into groups with shared characteristics. KMeans clustering is used to perform the segmentation.
- The problem is based on a Kaggle dataset and aims to develop a marketing strategy that targets credit card holders based on their credit card usage.
- It is an unsupervised problem because the data has no labels. The idea is therefore to divide customers into segments based on their similarity in the feature space considered.
- The details are in the notebook. Techniques employed include clustering methods like KMeans and dimensionality reduction methods like Principal Component Analysis (PCA) and Autoencoders. The dataset consists of around 9000 observations with 18 different features.
- Visualizations are used extensively to explore the dataset. One thing that becomes clear immediately is that most variables have a narrow spread of values, while a few stand out as clear differentiators, for example 'PURCHASES_FREQUENCY'.
- This does not mean the other variables are uninformative, but it does suggest that the clustering may produce segments of unequal size. This is what we actually see as well.
- On the technical side, we use the elbow method to choose the number of clusters. We do this manually and also with the 'KElbowVisualizer' from yellowbrick.
- Some of the clusters map readily onto recognizable customer profiles; these are explained in the notebook.
- We use PCA mostly for demonstration purposes and to visualize the clusters, assuming that a 2-component representation of the data is adequate.
- We also use deep learning, mostly as a dimensionality reduction technique. We employ a simple autoencoder built only from dense layers to encode a 10-dimensional representation of the dataset. A second KMeans run on this reduced dataset gives us 5 clusters, as opposed to 6 previously.
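
The manual elbow search, the KMeans fit, and the 2-component PCA projection described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: it runs on synthetic data from `make_blobs` instead of the Kaggle CSV, the variable names (`X_scaled`, `inertias`, `chosen_k`) are hypothetical, and `chosen_k = 4` reflects how the synthetic data was generated, not the 6 clusters found on the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card data (assumption: the notebook
# loads the Kaggle dataset instead). 600 rows, 8 features, 4 true groups.
X, _ = make_blobs(n_samples=600, n_features=8, centers=4, random_state=0)

# Scale features so no single column dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# Manual elbow: fit KMeans for k = 1..10 and record the inertia
# (within-cluster sum of squared distances) for each k.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k:2d}  inertia={value:.1f}")

# Fit the final model with the k read off the elbow
# (4 here only because the synthetic data has 4 groups).
chosen_k = 4
labels = KMeans(n_clusters=chosen_k, n_init=10, random_state=0).fit_predict(X_scaled)

# Project onto 2 principal components, e.g. for a scatter plot of clusters.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("cluster sizes:", np.bincount(labels))
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Plotting `inertias` against `k` and looking for the point where the curve flattens is the manual version of what yellowbrick's `KElbowVisualizer` automates. The same KMeans step could be run on an autoencoder's encoded output in place of `X_scaled`.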
Tools: Python, scikit-learn, TensorFlow, Keras, pandas, seaborn
I am deeply thankful to Udemy and all the instructors for hosting this wonderful course on machine learning in finance. My curiosity about machine learning and deep learning applications led me to this course, and I am glad to have learnt not only some useful techniques but also a great deal about the world of finance in general.