
Diabetes Prediction Project

Overview

This project aims to predict whether a person has diabetes or not using key health metrics such as glucose levels and BMI. The project involves data preprocessing, feature selection, model training, evaluation, and prediction using various machine learning algorithms.

Table of Contents

  • Overview
  • Dataset
  • Features
  • Libraries Used
  • Data Preprocessing
  • Model Training and Evaluation
  • Making Predictions
  • Results
  • Conclusion
  • Usage
  • Contributing
  • License

Dataset

The dataset contains health information about individuals, including:

  • Glucose levels
  • BMI
  • Gender
  • Age
  • Hypertension
  • Heart disease
  • Smoking history
  • HbA1c level
  • Diabetes status (target variable)

Features

The primary features used for prediction in this project are:

  • Glucose levels
  • BMI

Libraries Used

  • Python
  • Pandas
  • NumPy
  • Scikit-learn

Data Preprocessing

  1. Importing Libraries: Import necessary libraries for data manipulation and machine learning.
  2. Loading Dataset: Load the dataset into a pandas DataFrame.
  3. Checking for Null Values: Identify and handle any missing values.
  4. Checking for Duplicate Values: Identify and remove duplicate entries.
  5. Checking Data Types: Ensure all data types are correct for analysis.
  6. Generating Statistical Summaries: Summarize the data using descriptive statistics.
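The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's notebook code: the toy records and the column names `bmi` and `blood_glucose_level` are assumptions made so the example runs without the real data file.

```python
import pandas as pd

# Hypothetical toy records standing in for the real dataset; the column
# names are assumptions, not taken from the project's data file.
df = pd.DataFrame({
    "bmi": [25.19, 27.32, 27.32, None, 23.45],
    "blood_glucose_level": [140, 80, 80, 155, 130],
    "diabetes": [0, 0, 0, 1, 0],
})

# Step 3: count missing values per column, then drop incomplete rows.
null_counts = df.isnull().sum()
df = df.dropna()

# Step 4: count exact duplicate rows, then remove them.
duplicate_count = int(df.duplicated().sum())
df_new = df.drop_duplicates().reset_index(drop=True)

# Step 5: inspect the data type of each column.
print(df_new.dtypes)

# Step 6: descriptive statistics for the numeric columns.
summary = df_new.describe()
print(summary)
```

Dropping incomplete rows is only one way to handle missing values; imputing them may be preferable when many rows are affected.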

Model Training and Evaluation

1. Splitting Data

Split the data into training and testing sets:

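from sklearn.model_selection import train_test_split

# Drop the target and every column except the two predictors (glucose level and BMI)
X = df_new.drop(['gender', 'age', 'hypertension', 'heart_disease', 'smoking_history', 'HbA1c_level', 'diabetes'], axis=1)
y = df_new['diabetes']  # target variable
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)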
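As a quick sanity check on the split, the sketch below runs the same call on synthetic stand-in data and prints the resulting shapes. The data, the column names, and the `stratify=y` argument are assumptions added for illustration; the original split does not stratify. Stratifying may be worth considering when the positive class is rare, since it keeps the class ratio equal in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 100 rows, 25% positive labels.
rng = np.random.default_rng(42)
df_new = pd.DataFrame({
    "bmi": rng.normal(27, 5, 100),
    "blood_glucose_level": rng.normal(140, 40, 100),
    "diabetes": np.tile([0, 0, 0, 1], 25),
})

X = df_new[["bmi", "blood_glucose_level"]]
y = df_new["diabetes"]

# stratify=y is an addition for illustration; it preserves the class
# ratio in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Positive rate (train/test):", y_train.mean(), y_test.mean())
```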

2. AdaBoostClassifier

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
abc = AdaBoostClassifier()
abc.fit(X_train, y_train)
abc_pred = abc.predict(X_test)
abc_accuracy = accuracy_score(y_test, abc_pred)
print(f"AdaBoostClassifier Accuracy: {abc_accuracy}")

3. GradientBoostingClassifier

from sklearn.ensemble import GradientBoostingClassifier
gc = GradientBoostingClassifier()
gc.fit(X_train, y_train)
gc_pred = gc.predict(X_test)
gc_accuracy = accuracy_score(y_test, gc_pred)
print(f"GradientBoostingClassifier Accuracy: {gc_accuracy}")

4. RandomForestClassifier

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
print(f"RandomForestClassifier Accuracy: {rf_accuracy}")
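The three blocks above share the same fit, predict, and score pattern, so they can also be written as one loop. The sketch below assumes synthetic data from `make_classification` in place of the project's dataset, and it fixes `random_state` on each model (the original code does not) so repeated runs give the same numbers. The accuracies it prints say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-feature data (assumption) standing in for glucose and BMI.
X, y = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "AdaBoostClassifier": AdaBoostClassifier(random_state=42),
    "GradientBoostingClassifier": GradientBoostingClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}

# Train each model on the same split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Accuracy: {accuracies[name]:.2%}")
```

Using one shared split for all models keeps the comparison fair, and the `:.2%` format prints the score as a percentage, matching how the results are reported.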

Making Predictions

Use the trained GradientBoostingClassifier to make predictions on new data:

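import numpy as np

# One sample; values must follow the column order of X
# (presumably BMI first, then glucose level)
input_data = np.array([[25.19, 140]])
prediction = gc.predict(input_data)
print(f"Prediction: {prediction}")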
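When a model is fitted on a DataFrame, passing the new sample as a DataFrame with the same column names makes the feature order explicit and avoids scikit-learn's feature-name warning. The sketch below illustrates this on toy data; the column names, the random values, and the labeling rule are all assumptions and have no clinical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Toy training data (assumption): random BMI and glucose values with a
# made-up labeling rule, used only so the example runs end to end.
rng = np.random.default_rng(0)
bmi = rng.uniform(15, 45, 200)
glucose = rng.uniform(80, 300, 200)
X_train = pd.DataFrame({"bmi": bmi, "blood_glucose_level": glucose})
y_train = (glucose > 180).astype(int)

gc = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Build the new sample with the same column names and order as X_train.
input_data = pd.DataFrame([[25.19, 140]], columns=X_train.columns)

prediction = gc.predict(input_data)               # predicted class label
probability = gc.predict_proba(input_data)[0, 1]  # probability of class 1

print(f"Prediction: {prediction[0]}")
print(f"Probability of class 1: {probability:.3f}")
```

`predict_proba` returns one column per class, so `[0, 1]` selects the positive-class probability for the first (and only) sample.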

Results

  • AdaBoostClassifier Accuracy: 94.57%
  • GradientBoostingClassifier Accuracy: 94.57%
  • RandomForestClassifier Accuracy: 92.59%

Conclusion

GradientBoostingClassifier and AdaBoostClassifier tied for the highest test-set accuracy at 94.57%, ahead of RandomForestClassifier at 92.59%. These results suggest that glucose level and BMI alone carry substantial signal for predicting diabetes status in this dataset.

Usage

  1. Clone the repository:
git clone https://github.com/Ehtisham33/Diabetes-Prediction.git
  2. Navigate to the project directory:
cd Diabetes-Prediction
  3. Install the required libraries:
pip install -r requirements.txt
  4. Run the project:
python main.py

Contributing

Contributions are welcome! Please create a pull request or open an issue to discuss any changes.

License

This project is licensed under the MIT License.
