Phase-3-Project

Overview

This project analyzes car crashes in Chicago to build a classification model that predicts whether an injury occurred in the event of a car crash. The findings could be useful for insurance companies that handle claims. A common issue insurance companies face is fraud, so a model that can either confirm or deny a reported injury would help with that and lead to more genuine insurance claims.

Business Understanding

Insurance companies commonly use statistics to determine insurance rates. When there are more claims than expected, a natural suspicion of fraud arises. The goal of this project is to create an algorithm that uses data collected in Chicago to determine whether an injury took place in the event of a car crash. If the model is accurate, its predictions should mirror the true distribution of injuries in whichever area it is applied to. In this Chicago example, the predicted split between injury and no injury should match the true proportion of people who were injured in car crashes.

After establishing the best-performing model, I will market the findings to the car insurance company Progressive. By finding the true rate of injury, they will be able to set insurance rates more accurately, which will reduce fraud and therefore unneeded payouts.

Modeling

I started with a logistic regression model, since it is a simple classifier and a natural baseline.
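A minimal sketch of such a baseline, assuming scikit-learn. The Chicago crash data and its preprocessing are not shown here, so synthetic, imbalanced data from `make_classification` stands in for it (label 1 = injury, 0 = no injury); the split sizes and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the crash data (assumption): roughly 15% "injury" rows.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale the features, then fit a plain logistic regression as the baseline.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

# Score the baseline on the held-out test split.
y_pred = baseline.predict(X_test)
baseline_recall = recall_score(y_test, y_pred)
baseline_accuracy = accuracy_score(y_test, y_pred)
print(f"recall={baseline_recall:.3f} accuracy={baseline_accuracy:.3f}")
```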

After that, I set up a grid search over a pipeline that uses a DecisionTreeClassifier. My reasoning was to start hyperparameter tuning while also moving to a more flexible model.
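A sketch of a grid search over a decision-tree pipeline, assuming scikit-learn's `Pipeline` and `GridSearchCV`. The parameter grid below is hypothetical (the project's actual grid is not given), and synthetic data again stands in for the crash features. Any preprocessing steps would go in the pipeline ahead of the `"tree"` step.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crash data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

pipe = Pipeline([("tree", DecisionTreeClassifier(random_state=42))])

# Illustrative grid (assumption); keys follow the "<step>__<param>" convention.
param_grid = {
    "tree__max_depth": [3, 5, 10, None],
    "tree__min_samples_leaf": [1, 5, 20],
    "tree__criterion": ["gini", "entropy"],
}

# Rank candidates by recall, since missing a real injury is the costlier error.
grid = GridSearchCV(pipe, param_grid, scoring="recall", cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
# score() uses the same "recall" scorer on the refit best model.
test_recall = grid.score(X_test, y_test)
print(f"test recall={test_recall:.3f}")
```

Scoring on recall during the search is a design choice that mirrors the evaluation priorities described under Model Evaluation; `scoring="accuracy"` would rank candidates differently.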

I then instantiated a default XGBoost (XGB) model. XGBoost uses gradient boosting, fitting decision trees one after another so that each new tree corrects the errors of the ones before it, so I figured it was a natural step after a single decision tree. When I was finished with that, I set up a grid-searched XGB model. My thought was to combine the last two models I made to get the best of both worlds: the grid search iterated through my parameter grid to establish the best hyperparameters for XGB.

Model Evaluation

To evaluate how well each model performed, I looked at its recall and accuracy scores.

Recall measures how well the model finds the positive samples, that is, the fraction of actual injuries it correctly identifies. This is especially important here because I am trying to find when an injury truly occurred. Also, in a real-world application, a false negative (predicting no injury when one occurred) is ultimately worse than a false positive.

Accuracy reflects the overall performance of the model: the fraction of all predictions, injury or no injury, that are correct. It is still important to consider, since the goal of the project is to use the model to scrutinize insurance claims.
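The two metrics can be computed by hand from confusion-matrix counts, with recall = TP / (TP + FN) and accuracy = (TP + TN) / (TP + TN + FP + FN). The small sketch below uses made-up labels (not project data) and also shows why accuracy alone is not enough on imbalanced data: a model that always predicts "no injury" still scores 0.6 accuracy but 0 recall.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (injuries) the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up example: 4 injuries, 6 non-injuries.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(recall(y_true, y_pred), accuracy(y_true, y_pred))  # 0.75 0.7

# A model that always predicts "no injury".
always_no = [0] * len(y_true)
print(recall(y_true, always_no), accuracy(y_true, always_no))  # 0.0 0.6
```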

How I conducted model testing:

After instantiating the models, I ran code that fit each model and reported its evaluation metrics on both the training and test data. Runs on training data resampled with SMOTE were also included.
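SMOTE balances the classes by creating synthetic minority-class (injury) samples at random points on the line segment between a minority sample and one of its k nearest minority neighbours. The plain-Python sketch below only illustrates that idea; the project most likely used a library implementation such as imbalanced-learn's `SMOTE`, which is an assumption. The helper name `smote` is hypothetical. Resampling should be applied to the training split only, so the test data keep their real class balance.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Hypothetical helper: generate n_new synthetic points from minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Its k nearest minority neighbours by Euclidean distance, excluding itself.
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate a random fraction of the way from base toward the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2, seed=0)
print(len(new_points))  # 10
```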

Because the models are stochastic, running the same code produced different results each time. To address that, I wrote code that ran the ModelSummary function n times. With n set to 5, I ran each of my models five times. I ended up choosing the grid-searched decision tree model because its recall and accuracy scores were the most consistent and best-performing of the models. It also did not take long to run, especially compared to the grid-searched XGB model.
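ModelSummary is the project's own function and is not shown here, so the sketch below uses a hypothetical `run_once(seed)` callable in its place: one fit-and-score cycle that returns a dict of test metrics. Summarizing each metric's mean and standard deviation over n runs captures both the performance and the consistency used to choose the final model. The `noisy_model` function is an invented stand-in, not project output.

```python
import random
import statistics


def repeat_evaluation(run_once, n=5):
    """Call run_once(seed) n times; return {metric: (mean, standard deviation)}."""
    results = [run_once(seed) for seed in range(n)]
    summary = {}
    for metric in results[0]:
        values = [r[metric] for r in results]
        spread = statistics.stdev(values) if n > 1 else 0.0
        summary[metric] = (statistics.mean(values), spread)
    return summary


def noisy_model(seed):
    # Hypothetical stand-in for one stochastic fit/score cycle.
    rng = random.Random(seed)
    return {
        "recall": 0.80 + rng.uniform(-0.05, 0.05),
        "accuracy": 0.85 + rng.uniform(-0.02, 0.02),
    }


summary = repeat_evaluation(noisy_model, n=5)
for metric, (mean, sd) in summary.items():
    print(f"{metric}: mean={mean:.3f} sd={sd:.3f}")
```

A lower standard deviation across runs indicates a more consistent model; comparing models on both columns matches the selection criteria described above.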

nickogreeno / phase-3-project
