
student_admission_prediction's Introduction

Supervised Machine Learning Algorithms for Predicting Student Dropout and Academic Success: A Comparative Study

In this project, I applied a resampling technique (oversampling) to an imbalanced dataset and compared the performance of different machine learning models. Random Forest achieved the best accuracy score (85%).

Table of Contents

1. Overview

The purpose of this project is to build and compare different ML models that predict student admission to graduate school. As a starting point, I used a tutorial from UCLA. The tutorial applies Logistic Regression in R; I reimplemented it in Python. After predicting student admission with Logistic Regression, I decided to address the unequal class distribution of the original dataset.

The tutorial does not handle this imbalance, so it was a good opportunity to address it myself. After resampling, I applied Logistic Regression again to the new dataset. To take the project further, I implemented three more algorithms (Decision Tree, SVM-SVC and Random Forest), with performance measurement and K-fold cross-validation for all of the models. Finally, I created a graph comparing the accuracy scores of the different ML algorithms (Accuracy Comparison Graph).

1.1. Quick Start

Check out a static version of the notebook with Jupyter NBViewer from the comfort of your web browser.

2. Problem Statement

A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average) and the prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is binary. This dataset has a binary response (outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate school and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest. Source: UCLA
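As a rough illustration of the variable description above, the sketch below builds a feature matrix in which gre and gpa stay continuous and rank is expanded into dummy columns. The rows are made up for the example and are not taken from the UCLA data, and dummy-encoding rank is one common choice rather than necessarily what the notebook does.

```python
import pandas as pd

# Made-up rows that only mimic the four columns described above.
df = pd.DataFrame({
    "admit": [0, 1, 1, 0, 0, 1],
    "gre": [400, 620, 780, 560, 500, 700],
    "gpa": [3.2, 3.5, 3.9, 3.1, 2.8, 3.6],
    "rank": [3, 2, 1, 4, 4, 2],
})

# Assumption: treat rank as categorical. Dropping the first level (rank 1)
# makes it the reference category and avoids perfectly collinear columns.
rank_dummies = pd.get_dummies(df["rank"], prefix="rank", drop_first=True)

# gre and gpa are kept as continuous predictors; admit is the binary response.
X = pd.concat([df[["gre", "gpa"]], rank_dummies], axis=1)
y = df["admit"]

print(list(X.columns))
```

With this encoding, each remaining rank column is read relative to rank 1, the most prestigious group.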

2.1. Dataset

Check out the dataset

3. Approach

This project is organized in two chapters. In Chapter 1, I worked with the original dataset (which was imbalanced). In Chapter 2, I resampled the data (oversampling), repeated the logistic regression and added other algorithms.

3.1. Chapter 1 - Predicting student admission with Logistic Regression

  • PART 1: Data Handling -> Importing data with Pandas, cleaning data, data description.
  • PART 2: Data Analysis -> Supervised ML technique: Logistic Regression.
  • PART 3: Evaluation of the Analysis -> Performance measurement + K-fold cross-validation to evaluate results locally.
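The three parts above could be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions: the data is synthetic (generated with NumPy, not the UCLA file), and the scaler, split ratio, and number of folds are illustrative choices rather than the notebook's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# PART 1 (stand-in): synthetic gre/gpa values and an imbalanced admit label.
rng = np.random.default_rng(0)
n = 400
gre = rng.normal(580, 115, n)
gpa = rng.normal(3.4, 0.4, n)
logit = 0.004 * (gre - 580) + 1.0 * (gpa - 3.4) - 0.8
admit = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([gre, gpa])
y = admit

# PART 2: fit Logistic Regression on a stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# PART 3: performance measurement on the held-out split...
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# ...and 5-fold cross-validation over the full dataset.
cv_scores = cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression()),
    X, y, cv=5, scoring="accuracy",
)

print("test accuracy:", round(test_accuracy, 3))
print("confusion matrix:\n", cm)
print("mean CV accuracy:", round(cv_scores.mean(), 3))
```

On imbalanced data, the confusion matrix is worth reading alongside accuracy, since a model that mostly predicts the majority class can still score well on accuracy alone.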

3.2. Chapter 2 - Predicting student admission with Logistic Regression, Decision Tree, SVM (SVC) and Random Forest

  • PART 1: Data Handling -> Importing data with Pandas, cleaning data, data description.
  • PART 2: Data Analysis -> Supervised ML techniques: Logistic Regression, Decision Tree, SVM (SVC) and Random Forest.
  • PART 3: Evaluation of the Analysis -> Performance measurement + K-fold cross-validation to evaluate results locally + Accuracy comparison graph.
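One way to compare the four algorithms listed above is to score each with the same K-fold splitter. The sketch below uses a synthetic dataset from `make_classification` and default-like hyperparameters; both are assumptions for illustration and do not reproduce the notebook's data or its reported scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with three predictors (like gre, gpa and rank).
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

# Hypothetical model settings; the scale-sensitive models get a scaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM (SVC)": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Reuse one splitter so every model is scored on identical folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the comparison, best model first.
for name, score in sorted(mean_accuracy.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The `mean_accuracy` dictionary is the kind of input a bar chart such as the accuracy comparison graph would be drawn from.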

4. Dependencies

5. Results

In this project, I applied resampling, a widely adopted technique for dealing with imbalanced datasets. Random resampling creates a transformed version of the data with a new class distribution. The goal is to reduce the influence of the class imbalance on the ML algorithm. Oversampling is often preferred because undersampling can discard important data. I chose to apply oversampling and then use four ML algorithms: Logistic Regression, Decision Tree, SVM (SVC) and Random Forest. I evaluated all of these models (performance measurement and K-fold cross-validation). Finally, I created a graph to compare the accuracy of the different models. Random Forest achieved the best accuracy score (0.8545, about 85%).
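Random oversampling as described above can be sketched with Pandas alone: rows of the minority class are drawn with replacement until both classes have the same size. The rows below are made up, and the notebook may use a different tool (for example a dedicated resampling library), so treat this as an illustration of the idea only.

```python
import pandas as pd

# Made-up imbalanced data: 8 rejected (admit=0) and 4 admitted (admit=1).
df = pd.DataFrame({
    "admit": [0] * 8 + [1] * 4,
    "gre": [400, 520, 480, 560, 600, 440, 500, 540, 700, 760, 680, 720],
})

majority_size = df["admit"].value_counts().max()

parts = []
for label, group in df.groupby("admit"):
    if len(group) < majority_size:
        # Draw extra minority rows with replacement to match the majority size.
        extra = group.sample(
            n=majority_size - len(group), replace=True, random_state=0
        )
        group = pd.concat([group, extra])
    parts.append(group)

# Shuffle so the duplicated rows are not grouped at the end.
balanced = pd.concat(parts).sample(frac=1, random_state=0).reset_index(drop=True)

print(balanced["admit"].value_counts().to_dict())
```

Because oversampling duplicates rows, applying it before a train/test split or before K-fold splitting can place copies of the same row on both sides and inflate the measured accuracy. Resampling only the training portion avoids that.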
