Git Product home page Git Product logo

check-it's Introduction

Student Performance Prediction

Introduction

This project aims to predict the performance of students based on various demographic and educational factors. The goal is to use regression analysis to predict student scores in mathematics.

About the Data

The dataset used for this project includes the following variables:

Independent Variables:

  • gender: Gender of the student (male/female)
  • race_ethnicity: Ethnic group of the student
  • parental_level_of_education: Highest education level of the student's parents
  • lunch: Type of lunch (standard/free/reduced)
  • test_preparation_course: Completion status of the test preparation course (completed/none)
  • math_score: Student's score in mathematics
  • reading_score: Student's score in reading
  • writing_score: Student's score in writing

Target Variable:

  • Overall performance score: A combined metric based on math, reading, and writing scores

Dataset Source

The dataset can be accessed from Kaggle.

Observations

It is observed that the categorical variables 'gender', 'race_ethnicity', 'parental_level_of_education', 'lunch', and 'test_preparation_course' play significant roles in predicting student performance.

Example Data

gender race_ethnicity parental_level_of_education lunch test_preparation_course math_score reading_score writing_score
female group B bachelor's degree standard none 72 72 74
female group C some college standard completed 69 90 88
female group B master's degree standard none 90 95 93
male group A associate's degree free/reduced none 47 57 44
male group C some college standard none 76 78 75

Approach for the Project

Data Ingestion

  • In the Data Ingestion phase, the data is first read as a CSV file.
  • Then the data is split into training and testing sets and saved as CSV files.

Data Transformation

  • In this phase, a ColumnTransformer pipeline is created.
  • For numeric variables, first, SimpleImputer is applied with the median strategy, then standard scaling is performed on the numeric data.
  • For categorical variables, SimpleImputer is applied with the most frequent strategy, then ordinal encoding is performed, and after this, the data is scaled with StandardScaler.
  • This preprocessor is saved as a pickle file.

Model Training

  • In this phase, various regression models are tested. The performance of these models is evaluated based on their R2 scores.
  • The following models were tested:
    • Ridge: R2 Score = 0.880593
    • Linear Regression: R2 Score = 0.880433
    • CatBoost Regressor: R2 Score = 0.851632
    • AdaBoost Regressor: R2 Score = 0.848774
    • XGBoost Regressor: R2 Score = 0.827797
    • Lasso: R2 Score = 0.825320
    • K-Neighbors Regressor: R2 Score = 0.784030
    • Decision Tree: R2 Score = 0.755628
  • Based on the R2 scores, Ridge and Linear Regression models performed the best.
  • Hyperparameter tuning is performed on the best models.
  • A final VotingRegressor is created, which combines the predictions of Ridge, Linear Regression, and CatBoost models.
  • This final model is saved as a pickle file.

Prediction Pipeline

  • This pipeline converts given data into a dataframe and has various functions to load pickle files and predict the final results in Python.

Flask App Creation

  • A Flask app is created with a user interface to predict the student performance scores inside a web application. ]

Screenshot of UI

Homepage UI Prediction_UI

check-it's People

Contributors

kirti36 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.