In this project, we use machine learning to predict whether a patient has diabetes.
According to the World Health Organisation, over 460 million people suffer from diabetes globally, and about 70 percent of them, especially in Africa, do not know they have it. While diabetes is an easily preventable disease and does not have to be fatal when diagnosed early, so many people in Africa do not know their health status that 77% of recorded diabetes-associated deaths occur in Africa. Some of the reasons associated with this include:
- Lack of early information about the disease
- Busy schedules, so people do not take time out to get tested
- A shortage of doctors, nurses and other health practitioners compared with the WHO-recommended doctor-to-patient ratio of 1:600
- Lack of access to health facilities for people in remote places
We propose a machine learning solution that predicts over the web whether an individual has diabetes, based on symptoms they are experiencing (such as polyphagia, polydipsia and weakness) and demographic information (such as age and gender). Because the solution is packaged as a web app, it bridges the accessibility gap.
In the future, it can be repackaged as an API and used over USSD to further improve accessibility.
├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment,
│                         e.g. generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── notebooks      <- Jupyter notebooks.
│   │
│   └── scripts        <- Scripts to download or generate data
│       ├── make_dataset.py
│       │
│       ├── modelling  <- Scripts to train models and then use trained models
│       │                 to make predictions
│       │
│       ├── preparation <- Scripts to turn raw data into features for modeling
│       │
│       └── test
│
└── config.txt         <- tox file with settings for running tox; see tox.readthedocs.io
The dataset was obtained from the Pima Indian diabetes dataset that is freely available on Kaggle. The data has 17 columns, namely:
The dataset also has n_rows rows and is publicly available; please refer to [Kaggle](link).
For this project, the road map was to:
- Perform minimal feature engineering to determine which features to include in the model pipeline
- Try out different machine learning models: decision trees, support vector machines, etc. (Weiss et al.)
- Deploy the model
During preprocessing, rows with missing values were removed. Categorical values were also transformed using a one-hot encoder.
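As a minimal sketch of this step (on a hypothetical mini-sample; the actual column names come from the Kaggle dataset), dropping missing rows and one-hot encoding with pandas could look like:

```python
import pandas as pd

# Hypothetical mini-sample; the real columns come from the Kaggle dataset.
df = pd.DataFrame({
    "age": [45, 33, None, 51],
    "gender": ["Male", "Female", "Male", "Female"],
    "polydipsia": ["Yes", "No", "Yes", "No"],
    "class": ["Positive", "Negative", "Positive", "Negative"],
})

df = df.dropna()  # remove rows with missing values
df = pd.get_dummies(df, columns=["gender", "polydipsia"])  # one-hot encode
```

The same effect can be achieved with scikit-learn's `OneHotEncoder` inside a pipeline, which is preferable when the encoder has to be reused at inference time.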
Features were engineered so that the features that best describe the dataset were used. New columns such as policy duration were added. Polynomial relationships between the columns were also computed, after which a correlation was run between all the columns, and the columns most correlated with the target were kept. The columns used for model training were as listed.
Some charts were produced to better understand the data. These include the age distribution of the users.
A bar chart was also plotted to see their best products.
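A headless sketch of such EDA charts; the ages and class counts below are hypothetical stand-ins, not the project's data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the Age column and the class counts.
rng = np.random.default_rng(1)
ages = rng.integers(20, 80, 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(ages, bins=12)          # age distribution histogram
ax1.set_title("Age distribution")
ax2.bar(["Positive", "Negative"], [120, 80])  # class-balance bar chart
ax2.set_title("Class balance")
```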
For our project we tried:
- Logistic Regression: as the baseline
- Support Vector Machine: to visualize the kernel support and feature engineering
- XGBoost: for state-of-the-art results on tabular data
We also performed a grid search on our models to select the best hyperparameters, and evaluated them using cross-validation.
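The grid-search-with-cross-validation step can be sketched with scikit-learn's `GridSearchCV`; the data and parameter grid below are illustrative assumptions, not the grids actually used in the notebooks:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    cv=5,                                   # 5-fold cross-validation
    scoring="f1",
)
grid.fit(X, y)
```

`grid.best_params_` and `grid.best_score_` then hold the winning hyperparameters and their cross-validated score.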
For the evaluation of our models, the best metric is []. Looking at the confusion matrix, there is a greater penalty on false negatives compared to false positives: for every false negative, the company would have to [...], which is more costly than [...] for a false positive. We therefore evaluate our models primarily by the false-negative rate, followed by the F1-score, which seeks to maintain a good balance between precision and recall.
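These quantities fall straight out of the confusion matrix; with hypothetical predictions purely for illustration:

```python
from sklearn.metrics import confusion_matrix, f1_score, recall_score

# Hypothetical labels/predictions to illustrate the metrics.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

# sklearn's binary confusion matrix ravels as (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

recall = recall_score(y_true, y_pred)  # tp / (tp + fn): penalises false negatives
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
```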
From the experiments carried out, using the best hyperparameters found with our grid search, the results showed that the best model for our dataset is [name].
Here we will discuss the various technologies and techniques used to deploy the model.
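The README does not pin down the web framework, so as one illustrative sketch only (Flask, with a toy rule standing in for the trained model; both the framework choice and the rule are assumptions, not the project's actual code), a prediction endpoint could look like:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy stand-in for the trained pipeline; a real app would load a
# pickled model instead. The rule below is purely illustrative.
def predict_one(features):
    return "Positive" if features.get("polydipsia") == "Yes" else "Negative"

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body of feature name -> value.
    features = request.get_json()
    return jsonify({"prediction": predict_one(features)})
```

A client would then POST the symptom/demographic fields as JSON and read the prediction from the response body.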
To get started and set up the project in your local environment, please install the packages listed in requirements.txt.
You can install them from the terminal using
- pip: `pip install -r requirements.txt`
- or conda: `conda install --file requirements.txt`
- Download Jupyter Notebook or JupyterLab. For Linux or Mac users: `pip install notebook`
  For Windows users, you can download it from the Jupyter homepage.
- Clone the repo: `git clone https://github.com/Ajalamarvellous/autolearn.git`
- Install the necessary packages: `pip install -r requirements.txt`
- Open your JupyterLab or Notebook
- Go to the folder 📂 where you just downloaded the project
- Open the Untitled.ipynb notebook 📔 there, and you are ready to rumble.
To try out the deployed page, you can visit it here.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
- Ajala, Marvellous - @madeofajala - [email protected]
Project Link: https://github.com/ajalamarvellous/Diabetes-risk-factor