georgemuriithi / kaggle-competitions

House Prices Prediction and Credit Default Risk Prediction competitions. Advanced decision tree-based regression and classification models are used.

License: GNU General Public License v3.0

Jupyter Notebook 100.00%
python machine-learning decision-tree-regression kaggle-house-prices home-credit-default-risk decision-tree-classification

Kaggle Competitions

House Prices Prediction and Credit Default Risk Prediction competitions.

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

https://www.kaggle.com/c/home-credit-default-risk

In both, advanced decision tree-based regression and classification models are used.

In House Prices Prediction, performance is evaluated with RMSLE (Root Mean Squared Logarithmic Error), while in Credit Default Risk Prediction, it is evaluated with AUROC (Area Under the Receiver Operating Characteristic curve).
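
As a rough illustration of the two metrics, here is a minimal pure-Python sketch. The function names `rmsle` and `auroc` are made up for this example, and the RMSLE shown uses the common `log(1 + x)` form, which is an assumption; the competition pages define the exact scoring.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (log1p form); inputs must be non-negative."""
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def auroc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive is scored higher.

    Ties count as half a win. Equivalent to the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A lower RMSLE is better (0 means perfect predictions), while a higher AUROC is better (0.5 corresponds to random ranking, 1.0 to perfect ranking).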

In House Prices Prediction, I ranked 816 out of 5011, with an RMSLE of 0.12549, compared to the best leaderboard score of 0.00000.

In Credit Default Risk Prediction, I scored an AUROC of 0.73610, compared to the best score of 0.81724. A ranking was unavailable.

My submissions can be accessed from the submissions folder.

Problem Description

The problems are described in detail on the Kaggle competition pages linked above.

Solution Approach

For House Prices Prediction, after feature engineering, the following regression models are tested:

  • Ridge
  • BaggingRegressor
    • n_estimators=50
  • RandomForestRegressor
    • n_estimators=50
  • XGBRegressor
    • max_depth=5
    • objective='reg:squarederror'
  • LGBMRegressor
  • VotingRegressor
    • estimators=[ridge, bagging, random_forest, xgb, lgbm]
    • n_jobs=-1
  • StackingRegressor
    • estimators=[ridge, bagging, random_forest, xgb, lgbm]
    • final_estimator=Ridge
    • n_jobs=-1
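
The ensemble entries in the list above could be wired together roughly as follows. This is a sketch under stated assumptions, not the notebook's code: it uses synthetic data, fixes `random_state=0` on the tree ensembles for repeatability, and includes only the three scikit-learn base estimators to stay dependency-light. The `XGBRegressor` and `LGBMRegressor` entries would be appended to `base` in the same `(name, estimator)` form.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge

# The list above uses n_jobs=-1 (all cores); 1 keeps this sketch portable.
N_JOBS = 1


def build_ensembles():
    """Return (voting, stacking) ensembles over the scikit-learn base models."""
    base = [
        ("ridge", Ridge()),
        ("bagging", BaggingRegressor(n_estimators=50, random_state=0)),
        ("random_forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ]
    # VotingRegressor averages the base predictions.
    voting = VotingRegressor(estimators=base, n_jobs=N_JOBS)
    # StackingRegressor fits a Ridge model on cross-validated base predictions.
    stacking = StackingRegressor(
        estimators=base, final_estimator=Ridge(), n_jobs=N_JOBS
    )
    return voting, stacking


# Synthetic stand-in for the engineered features (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
voting, stacking = build_ensembles()
voting.fit(X, y)
stacking.fit(X, y)
```

Both ensembles clone the base estimators when fitted, so sharing the same `base` list between them is safe.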

Validation setup:

  • train_test_split(test_size=0.2, random_state=0)
  • kfold = KFold(n_splits=5, shuffle=True, random_state=0)
  • cross_val_score(cv=kfold)
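
The three settings above fit together roughly like this. It is a minimal sketch on synthetic data with a plain `Ridge` model. Running cross-validation on the training portion only is an assumption, since the source does not say which data `cross_val_score` received.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for the engineered features (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# Hold out 20% of the rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Ridge()
model.fit(X_train, y_train)
val_r2 = r2_score(y_val, model.predict(X_val))

# 5-fold shuffled cross-validation; the default score for regressors is R2.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=kfold)
cv_r2_mean = cv_scores.mean()
```

Here `val_r2` corresponds to a validation R2 score and `cv_r2_mean` to a mean cross-validation R2 score.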

VotingRegressor performs best when the validation R2 score, RMSLE and mean cross-validation R2 score are considered together.

For Credit Default Risk Prediction, after feature engineering, the following classification models are tested:

  • XGBClassifier
    • tree_method='gpu_hist'
    • gpu_id=0
  • LGBMClassifier
    • device='gpu'
  • RandomForestClassifier
    • n_estimators=50
  • StackingClassifier
    • estimators=[xgb, lgbm, random_forest]
    • final_estimator=LGBMClassifier
    • n_jobs=-1

Validation split: train_test_split(test_size=0.2, random_state=42)

A GPU is used, since the classification task requires more computation power.

LGBMClassifier performs best, with the highest validation AUROC score.
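
A validation AUROC of this kind can be computed along the following lines. This is a sketch on synthetic data that uses `RandomForestClassifier`, the one CPU-only scikit-learn model in the list, as a stand-in. The GPU-configured `XGBClassifier` and `LGBMClassifier` are left out because they need GPU-enabled builds, and `random_state=42` on the classifier is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Same split settings as listed: 20% validation, random_state=42.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# AUROC is computed from the predicted probability of the positive class,
# not from hard 0/1 predictions.
val_scores = clf.predict_proba(X_val)[:, 1]
val_auroc = roc_auc_score(y_val, val_scores)
```

Repeating the last two lines for each candidate model and comparing `val_auroc` is one way to pick the best performer on the validation split.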
