batermj / predict-customer-churn

This project forked from featuretools/predict-customer-churn

A general-purpose framework for solving problems with machine learning applied to predicting customer churn

Home Page: https://blog.featurelabs.com/how-to-create-value-with-machine-learning/

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 99.88% Python 0.12%

predict-customer-churn's Introduction

A Machine Learning Framework with an Application to Predicting Customer Churn

This project demonstrates applying a three-step, general-purpose framework to solve problems with machine learning. The purpose of this framework is to provide scaffolding for rapidly developing machine learning solutions across industries and datasets.

The end result is both a specific solution to a customer churn use case, with a reduction in revenue lost to churn of more than 10%, and a general approach you can use to solve your own problems with machine learning.

Framework Steps

  1. Prediction Engineering
  • State business need
  • Translate business requirement into machine learning task by specifying problem parameters
  • Develop set of labels along with cutoff times for supervised machine learning
  2. Feature Engineering
  • Create features - predictor variables - out of raw data
  • Use cutoff times to make valid features for each label
  • Apply automated feature engineering to automatically make hundreds of relevant, valid features
  3. Modeling
  • Train a machine learning model to predict labels from features
  • Use a pre-built solution with common libraries
  • Optimize model in line with business objectives
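
The role of cutoff times in the first two steps can be illustrated with a minimal sketch. It is not the code from the notebooks: the transaction log, the 31-day window, and the helper names `make_label` and `make_features` are assumptions made up for illustration. The point it shows is that a label looks only at data after the cutoff time, while the features for that label use only data at or before it.

```python
from datetime import date, timedelta

# Hypothetical transaction log: (customer_id, transaction_date, amount_paid)
TRANSACTIONS = [
    ("a", date(2016, 1, 1), 149),
    ("a", date(2016, 2, 1), 149),
    ("a", date(2016, 3, 1), 149),
    ("b", date(2016, 1, 5), 99),
    ("b", date(2016, 2, 5), 99),
]


def make_label(customer_id, cutoff, window_days=31):
    """Return 1 (churn) if the customer makes no transaction in the
    window_days following the cutoff time, otherwise 0.

    The churn definition and window length are illustrative assumptions.
    """
    window_end = cutoff + timedelta(days=window_days)
    renewed = any(
        cid == customer_id and cutoff < when <= window_end
        for cid, when, _ in TRANSACTIONS
    )
    return 0 if renewed else 1


def make_features(customer_id, cutoff):
    """Build features from data at or before the cutoff time only,
    so that no information from the label window leaks in."""
    past = [
        amount
        for cid, when, amount in TRANSACTIONS
        if cid == customer_id and when <= cutoff
    ]
    return {"n_transactions": len(past), "total_paid": sum(past)}


cutoff = date(2016, 2, 10)
training_rows = [
    (cid, cutoff, make_features(cid, cutoff), make_label(cid, cutoff))
    for cid in ("a", "b")
]
```

Changing the problem parameters (here `window_days`) produces a different set of labels from the same raw data, which is what lets one pipeline serve several related prediction problems.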

Machine learning is currently an ad-hoc process requiring a custom solution for each problem. Even for the same dataset, a slightly different prediction problem requires an entirely new pipeline built from scratch. This has made it too difficult for many companies to take advantage of the benefits of machine learning. The standardized procedure presented here will make it easier to solve meaningful problems with machine learning, allowing more companies to harness this transformative technology.

Application to Customer Churn

The notebooks in this repository document a step-by-step application of the framework to a real-world use case and dataset - predicting customer churn. This is a critical need for subscription-based businesses and an ideal application of machine learning.

The dataset is provided by KKBOX, Asia's largest music streaming service, and can be downloaded here.

Within the overall scaffolding, several standard data science toolboxes are used to solve the problem:

Results

The final results comparing several models are shown below:

| Model | ROC AUC | Recall | Precision | F1 Score |
|---|---|---|---|---|
| Naive Baseline (no ML) | 0.5 | 3.47% | 1.04% | 0.016 |
| Logistic Regression | 0.577 | 0.51% | 2.91% | 0.009 |
| Random Forest Default | 0.929 | 65.2% | 14.7% | 0.240 |
| Random Forest Tuned for 75% Recall | 0.929 | 75% | 8.31% | 0.150 |
| Auto-optimized Model | 0.927 | 2.88% | 64.4% | 0.055 |
| Auto-optimized Model Tuned for 75% Recall | 0.927 | 75% | 9.58% | 0.170 |
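
The rows "Tuned for 75% Recall" trade precision for recall by moving the decision threshold on the predicted probabilities. How the notebooks do this is not stated here, so the following is only a sketch of one common approach, on made-up scores: pick the highest threshold whose recall still meets the target, then report precision, recall, and F1 at that threshold. The function names and the toy data are assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_at(y_true, scores, threshold):
    """Precision, recall and F1 when predicting churn for score >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def threshold_for_recall(y_true, scores, target_recall):
    """Highest threshold whose recall is at least target_recall."""
    for threshold in sorted(set(scores), reverse=True):
        if metrics_at(y_true, scores, threshold)[1] >= target_recall:
            return threshold
    return min(scores)


# Made-up labels (1 = churn) and predicted churn probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.8, 0.3, 0.1, 0.05]

tuned = threshold_for_recall(y_true, scores, target_recall=0.75)
default_metrics = metrics_at(y_true, scores, 0.5)
tuned_metrics = metrics_at(y_true, scores, tuned)
```

The same `f1_score` formula reproduces the F1 column of the table from its precision and recall columns; for example, precision 14.7% and recall 65.2% give roughly 0.240.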

Final Confusion Matrix

Feature Importances

Notebooks

  1. Partitioning Data: separate data into independent subsets to run operations in parallel.
  2. Prediction Engineering: create labels based on the business need and historical data.
  3. Feature Engineering: implement an automated feature engineering workflow using label times and raw data.
  4. Feature Engineering on Spark: parallelize feature engineering calculations by distributing them across multiple machines.
  5. Modeling: develop machine learning algorithms to predict labels from features; use automated genetic search tools to search for the best model.

Feature Engineering with Spark

To scale the feature engineering to a large dataset, the data was partitioned and automated feature engineering was run in parallel using Apache Spark with PySpark.

Featuretools natively supports scaling to multiple cores on one machine, or to multiple machines using a Dask cluster. However, this approach shows that Spark can also be used to parallelize feature engineering, resulting in reduced run times even on large datasets.

The notebook Feature Engineering on Spark demonstrates the procedure. The article Featuretools on Spark documents the approach.
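
The partition-then-parallelize idea can be sketched without Spark. The example below is not the notebook's code and does not use the PySpark or Featuretools APIs: it uses a thread pool from the Python standard library as a stand-in for a cluster, and the hash-based partitioning scheme, the helper names, and the toy rows are all assumptions. What it shows is that once every row for a customer lands in the same partition, each partition can be processed independently and the results merged.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_of(customer_id, n_partitions):
    """Stable partition index for a customer (same id, same partition)."""
    return zlib.crc32(customer_id.encode("utf-8")) % n_partitions


def partition_rows(rows, n_partitions):
    """Group rows so that all rows for a customer share one partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_of(row[0], n_partitions)].append(row)
    return parts


def features_for_partition(rows):
    """Toy feature engineering: total amount paid per customer."""
    totals = defaultdict(int)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


# Made-up rows: (customer_id, amount_paid)
rows = [("a", 149), ("b", 99), ("a", 149), ("c", 180), ("b", 99)]
parts = partition_rows(rows, n_partitions=4)

# Each partition is independent, so the work can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(features_for_partition, parts.values()))

features = {}
for partial in results:
    features.update(partial)
```

Because no customer spans two partitions, merging the partial results with `update` never overwrites a value; that independence is what makes the partitions safe to hand to separate workers.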

Feature Labs

Featuretools

Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Contact

Any questions can be directed to [email protected]

Contributors

gsheni, willkoehrsen
