Phase-3-Project

Overview

This project analyzes car crashes in Chicago to build a classification model that predicts whether a crash resulted in an injury. The results could be useful to insurance agencies that handle claims. Fraud is a common problem for insurance companies, so a model that can confirm or cast doubt on a reported injury would help flag suspicious claims and lead to more genuine insurance cases.

Business Understanding

Insurance companies commonly use statistics to determine insurance rates. When there are more claims than expected, a natural suspicion of fraud arises. The goal of this project is to create an algorithm that uses data collected from Chicago to determine whether an injury took place in the event of a car crash. Such a model should mirror the true population of injuries in whichever area it is predicting. In this example, if the model is accurate, its predicted distribution of injury versus no injury should match the true distribution of people who were actually injured in a Chicago car crash.

[Image: Chicago, Illinois traffic]

After establishing an optimal model, I will market the findings to the car insurance company Progressive. Knowing the true rate of injury would let them set insurance rates more accurately, which would reduce fraud and therefore unneeded payouts.

Modeling

I started with a logistic regression model, since it is a simple classifier and makes a good baseline.

After that, I set up a grid search over a pipeline that uses a DecisionTreeClassifier.
My reasoning was to begin parameter tuning while also moving to a more flexible model.
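
As an illustration of that setup, here is a minimal sketch of a grid search over a pipeline ending in a DecisionTreeClassifier. The synthetic data, the scaling step, the step names, and the parameter grid are all assumptions made for the example; they are not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crash data (assumption: 1 = injury, 0 = no injury).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Trees do not need scaling; the step only shows where preprocessing would go.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Hypothetical grid. Parameters are addressed as <step name>__<parameter>.
param_grid = {
    "tree__max_depth": [3, 5, None],
    "tree__min_samples_leaf": [1, 5, 20],
}

# Cross-validated search that ranks candidates by recall.
search = GridSearchCV(pipe, param_grid, scoring="recall", cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_recall = search.score(X_test, y_test)  # scored with the recall scorer
print(best_params, round(test_recall, 3))
```

Setting scoring="recall" makes the search pick the parameters with the best cross-validated recall; a different scorer could be swapped in if accuracy mattered more.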

I then instantiated a plain XGB model. Because gradient boosting fits each new tree to the gradient of the loss left by the trees before it, I figured it was a natural step after a DecisionTree. When I was finished with that, I set up a grid-searched XGB model. My thought was to combine the last two approaches to get the best of both worlds: the grid search iterated through my parameter grid to find the best settings for the XGB model.

Model Evaluation


In order to evaluate how well each model performed, I looked at their Recall and Accuracy scores.

I chose Recall because it measures how well the model finds the positive samples. That is especially important in this case, since I am trying to find when an injury truly occurred. Also, in a real-world application, a false negative (an actual injury predicted as no injury) is ultimately worse than a false positive.

The Accuracy score reflects the overall performance of the model. This is still important to consider, since the goal of the project is to use the model for scrutinizing insurance claims.
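
To make the two metrics concrete, here is a small worked example in plain Python. The function name and the ten labels are made up for illustration and do not come from the project data.

```python
def recall_and_accuracy(y_true, y_pred):
    """Return (recall, accuracy) for binary labels where 1 = injury."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # injuries found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # injuries missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct "no injury"
    # Recall: share of actual injuries the model found, TP / (TP + FN).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / len(pairs)
    return recall, accuracy


# Three actual injuries, of which the model finds only one.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

recall, accuracy = recall_and_accuracy(y_true, y_pred)
print(round(recall, 3), accuracy)  # 0.333 0.8
```

Here the accuracy is 0.8 even though two of the three injuries were missed, which is why Recall is tracked alongside Accuracy when the positive class is the minority.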

How I conducted model testing:

After instantiating the models, I ran code that fit each model and reported its evaluation metrics on the train and test data. Samples resized with SMOTE were also included.

Because the models are stochastic, they gave different results each time I ran the same code. To address that, I wrote code that runs the ModelSummary function n times. With n set to 5, I ran every model that many times. I ended up choosing the gridsearch model (the DecisionTreeClassifier pipeline) because its recall and accuracy scores were the most consistent and the best performing of the models. It also did not take long to run, especially compared to the XGB grid model.
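
The repeated-run idea can be sketched as follows. The ModelSummary function itself is not shown in this README, so the hypothetical helper repeat_runs stands in for it, and the data and model settings are assumptions for the example.

```python
from statistics import mean, stdev

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def repeat_runs(make_model, X, y, n=5):
    """Fit and score a fresh model n times, each with a different seed."""
    recalls, accuracies = [], []
    for seed in range(n):
        # A new split and a new model per run, so results vary between runs.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, stratify=y, random_state=seed
        )
        model = make_model(seed).fit(X_train, y_train)
        pred = model.predict(X_test)
        recalls.append(float(recall_score(y_test, pred)))
        accuracies.append(float(accuracy_score(y_test, pred)))
    # The mean shows typical performance; the spread shows consistency.
    return {
        "recall_mean": mean(recalls),
        "recall_std": stdev(recalls),
        "accuracy_mean": mean(accuracies),
        "accuracy_std": stdev(accuracies),
    }


# Synthetic stand-in for the crash data (assumption: 1 = injury).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0
)

result = repeat_runs(
    lambda seed: DecisionTreeClassifier(max_depth=4, random_state=seed),
    X, y, n=5,
)
print(result)
```

A low standard deviation across the runs is one way to express "consistent" as a number when comparing models.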
