Prêt à Dépenser

Description

The goal of this project was to build and train a prediction model allowing the company "Prêt à dépenser" to estimate the probability that a given client will be able to repay their loan.

The data provided allowed us to train and test different models.

Based on the results of the different models, the aim was then to develop an API that predicts a client's score using the best model.

Steps

Training

We first cleaned the data in order to train different models.

The results were logged in an MLflow registry. For each model, we logged the model itself, the best parameters identified by GridSearchCV, and the different metrics. Using SHAP, we also calculated the feature importance for each model and stored the generated graph as an artifact.
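As a minimal sketch of how one of these runs might be logged, the snippet below uses an illustrative LogisticRegression with synthetic data; the experiment name, parameter grid, and file names are assumptions, not the project's actual values.

```python
import matplotlib.pyplot as plt
import mlflow
import mlflow.sklearn
import shap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Illustrative data; the real project uses the cleaned client dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_tracking_uri("http://127.0.0.1:8080")
mlflow.set_experiment("pret-a-depenser")  # hypothetical experiment name

# Hyperparameter search whose best parameters and score get logged.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

with mlflow.start_run(run_name="logistic_regression"):
    mlflow.log_params(search.best_params_)
    mlflow.log_metric("roc_auc_cv", search.best_score_)
    mlflow.sklearn.log_model(search.best_estimator_, "model")

    # Global feature importance with SHAP, stored as an image artifact.
    explainer = shap.Explainer(search.best_estimator_, X_train)
    shap_values = explainer(X_test)
    shap.summary_plot(shap_values, X_test, show=False)
    plt.savefig("feature_importance.png", bbox_inches="tight")
    mlflow.log_artifact("feature_importance.png")
```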

The registry is accessible by running the command mlflow server --host 127.0.0.1 --port 8080

API

Once we had trained a model that made acceptable predictions according to an in-house business score, we developed an API that leverages that model.
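The exact business score is not documented in this README; as a hedged illustration, credit-scoring projects often use a cost-based metric where a missed defaulter (false negative) is weighted more heavily than a wrongly rejected good client (false positive). The cost weights below are purely illustrative.

```python
from sklearn.metrics import confusion_matrix, make_scorer

def business_cost(y_true, y_pred, fn_cost=10, fp_cost=1):
    """Hypothetical business cost: lower is better.

    A false negative (defaulting client predicted as safe) is assumed to
    cost more than a false positive (good client refused a loan).
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return fn * fn_cost + fp * fp_cost

# greater_is_better=False lets GridSearchCV minimise the cost.
business_scorer = make_scorer(business_cost, greater_is_better=False)
```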

We created the endpoint POST /predict that takes a clientId as a parameter and returns a response containing the prediction.
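The web framework is not named in this section; assuming FastAPI (which would also provide the auto-generated documentation mentioned below), the endpoint could look roughly like this sketch. The file paths, column name, and request field are illustrative.

```python
import pickle

import pandas as pd
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Illustrative paths; the repository stores the model under api/model/model.pkl.
with open("model/model.pkl", "rb") as f:
    model = pickle.load(f)

clients = pd.read_csv("data/clients.csv", index_col="SK_ID_CURR")  # hypothetical file and index

class PredictionRequest(BaseModel):
    clientId: int

@app.post("/predict")
def predict(request: PredictionRequest):
    if request.clientId not in clients.index:
        raise HTTPException(status_code=404, detail="Unknown client")
    features = clients.loc[[request.clientId]]
    probability = float(model.predict_proba(features)[0, 1])
    return {"clientId": request.clientId, "probability": probability}
```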

The API documentation is accessible here.

Deployment

Once the API was up and running, we could then deploy it using Azure.

We started by containerizing the API with Docker, building an image that we uploaded to an Azure Container Registry, and then creating the Web App that deploys the container image.

We then configured GitHub Actions to run these steps every time a new commit is pushed to the repository, so that an updated image is automatically built and the new version deployed.

GitHub Actions was also configured to run the API unit and integration tests before the build and deploy steps, in order to make sure the app works as expected before being pushed to production.
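As an illustration of the kind of tests the pipeline runs, a minimal pytest sketch against FastAPI's test client (assuming the endpoint shown above) might look like the following; the real tests live in api/tests/tests.py and the client id used here is hypothetical.

```python
from fastapi.testclient import TestClient

from main import app  # the FastAPI application defined in api/main.py

client = TestClient(app)

def test_predict_known_client_returns_probability():
    # Illustrative client id; the real tests use ids present in api/data.
    response = client.post("/predict", json={"clientId": 100001})
    assert response.status_code == 200
    assert 0.0 <= response.json()["probability"] <= 1.0

def test_predict_unknown_client_returns_404():
    response = client.post("/predict", json={"clientId": -1})
    assert response.status_code == 404
```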

Data Drift Analysis

We then created a data drift report using Evidently, based on a sample of the historical data (the data used to train the model) and the current data (the data used to make predictions through the API).

The report was saved in HTML format.
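A hedged sketch of how such a report can be generated with Evidently's Report API (0.4-style); the file paths and sample sizes are illustrative, and the exact column selection used in notebook_evidently is not shown here.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Illustrative file names: reference = training data, current = data seen by the API.
reference = pd.read_csv("notebooks/data/application_train.csv").sample(1000, random_state=0)
current = pd.read_csv("api/data/clients.csv").sample(1000, random_state=0)

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("notebooks/data_drift_report.html")
```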

Project Folder Structure

A single GitHub repository was used for the entire project, not just the API. The project was organized as follows:

  • The modeling part of the project is located in the notebooks folder.

  • The API is located in the api folder and is independent from the modeling part of the project. This is the only folder used by Docker and pushed to production.

project-root/

  • notebooks/
    • notebook_modelisation - model training and testing
    • notebook_test_api - notebook to test production api
    • notebook_evidently - notebook to create the data drift report
    • data_drift_report - html page containing the data drift report
    • data/
      • csv - data used by modeling notebook
  • api/
    • requirements.txt - api package requirements
    • main.py - main application
    • data/ - data used by api
    • tests/
      • tests.py - unit and integration tests for api
    • model/
      • model.pkl - best model saved from MLflow
  • README.md
