
Diabetes Diagnosis App

Using machine learning to predict diabetes in patients.

Table of Contents
  1. About The Project
  2. Dataset
  3. Folder Structure
  4. Roadmap
  5. Preprocessing
  6. Feature engineering
  7. Visualizations
  8. Experiments
  9. Results
  10. Model deployment
  11. Getting Started
  12. Usage
  13. Contributing
  14. License
  15. Contact
  16. Acknowledgments

About The Project

Aim

In this project, we seek to use machine learning to predict whether a patient has diabetes or not.

Problem statement

According to the World Health Organisation, over 460 million people suffer from diabetes globally, and about 70 percent of them, especially in Africa, do not know they have it. Diabetes is an easily preventable disease and does not have to be fatal if diagnosed early; yet because of the high percentage of people in Africa who do not know their health status, 77% of recorded deaths associated with diabetes occur in Africa. Some of the reasons associated with this include:

  • Lack of early information about the disease
  • Busy schedules, so people do not take time out to get tested
  • Too few doctors, nurses and other health practitioners compared with the WHO-recommended doctor-to-patient ratio of 1:600
  • Lack of access to health facilities for people in remote places

Solution

We propose a machine learning solution that predicts over the web whether an individual has diabetes, based on symptoms they are experiencing (such as polyphagia, polydipsia and weakness) and demographic information (such as age and gender). Because the solution is packaged as a web app, it helps bridge the accessibility gap.

Future work: the solution can later be repackaged as an API used over USSD to further improve accessibility.

(back to top)

Built With

(back to top)

Folder structure


    ├── LICENSE
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    │
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    │
    ├── src                <- Source code for use in this project.
    │   ├── __init__.py    <- Makes src a Python module
    │   │
    │   ├── notebooks      <- Jupyter notebooks.
    │   │
    │   └── scripts        <- Scripts to download or generate data
    │       ├── make_dataset.py
    │       │
    │       ├── modelling      <- Scripts to train models and then use trained
    │       │                     models to make predictions
    │       │
    │       ├── preparation    <- Scripts to turn raw data into features for modeling
    │       │
    │       └── test
    │
    └── config.txt         <- tox file with settings for running tox; see tox.readthedocs.io

Dataset

The dataset was obtained from the Pima Indian diabetes dataset, which is freely available on Kaggle. The data has 17 columns, namely:

Columns in the dataset

The dataset also has n_rows

size of dataset

The dataset is publicly available. Please refer to the dataset page on [Kaggle](link).


Roadmap

For this project, the roadmap was to:

  • Perform minimal feature engineering to determine which features to include in the model pipeline
  • Try out different machine learning models: decision trees, support vector machines (Weiss et al.)
  • Deploy the model

Preprocessing

During preprocessing, missing values were removed. Categorical values were also transformed using a one-hot encoder.
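A minimal sketch of this step, assuming the data is loaded into a pandas DataFrame; the exact column names are illustrative, not the project's actual schema:

```python
# Preprocessing sketch: drop rows with missing values, then one-hot
# encode the categorical (object-dtype) columns.
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna()  # remove rows containing missing values
    categorical = df.select_dtypes(include="object").columns
    # one-hot encode each categorical column into indicator columns
    return pd.get_dummies(df, columns=list(categorical))
```

`pd.get_dummies` expands each categorical column into one indicator column per category, which is what the one-hot encoder described above produces.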

Feature engineering

The features were engineered so that the features that best describe the dataset were used. New columns such as policy duration were added. Polynomial relationships between the columns were also calculated, after which a correlation was run between all the columns, and the columns most correlated with the target were kept. The columns used for model training are as listed.

Visualizations

Some charts were produced to better understand the data, including the age distribution of the users.

A bar chart was also plotted to show their best products.

Experiments

For our project we tried:

  • Logistic Regression: the baseline
  • Support Vector Machine: to visualise the kernel support and guide feature engineering
  • XGBoost: for state-of-the-art results on tabular data

We also performed a grid search on our models to select the best hyperparameters, and evaluated them using cross-validation.
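A minimal sketch of the grid search with cross-validation, shown here on the logistic regression baseline; the parameter grid and the synthetic data are illustrative assumptions:

```python
# Grid search sketch: try several values of the regularisation
# strength C with 5-fold cross-validation and keep the best.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=42)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,           # 5-fold cross-validation
    scoring="f1",   # score each fold by F1
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the SVM and XGBoost models, with their own parameter grids.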

Results

For the evaluation of our model, the best metric is []. Looking at the confusion matrix, there is a greater penalty on false negatives than on false positives: every false negative is a missed diabetes case, which is far more costly than a false alarm. Therefore we evaluate our models primarily by false negatives, followed by the F1-score, which maintains a good balance between precision and recall.
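The evaluation described above can be sketched with scikit-learn's metrics on a toy set of labels (the labels here are made up for illustration):

```python
# Evaluation sketch: count false negatives from the confusion matrix,
# then compute recall and F1.
from sklearn.metrics import confusion_matrix, f1_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false negatives: {fn}")  # missed diabetes cases -> 1
print(f"recall: {recall_score(y_true, y_pred):.2f}")   # 0.75
print(f"F1: {f1_score(y_true, y_pred):.2f}")           # 0.75
```

Minimising false negatives corresponds to maximising recall, which is why recall and F1 are the natural metrics here.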

From the experiments carried out, using the best hyperparameters found with our grid search, the results showed that the best model for our dataset is [name].

Model deployment

Here we will discuss the various technologies and techniques used to deploy the model.
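As a hypothetical sketch (the README does not specify the actual stack), a Flask endpoint could wrap the trained model and serve predictions as JSON; the feature names and the `DummyModel` stand-in below are assumptions for illustration:

```python
# Deployment sketch: a web endpoint that accepts symptom/demographic
# data as JSON and returns the model's prediction.
from flask import Flask, jsonify, request

class DummyModel:
    """Stand-in for the trained classifier loaded at startup (e.g. via joblib)."""
    def predict(self, rows):
        return [1 for _ in rows]

app = Flask(__name__)
model = DummyModel()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"age": 45, "polydipsia": 1, "polyphagia": 0}
    features = [[payload["age"], payload["polydipsia"], payload["polyphagia"]]]
    return jsonify({"diabetes": int(model.predict(features)[0])})
```

Exposing the same endpoint over an API gateway is what would allow the future USSD integration mentioned earlier.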

(back to top)

Getting Started

To get started and set up the project in your local environment, please install the packages listed in the requirements file.

Prerequisites

You can install them from the terminal using requirements.txt:

  • pip
    pip install -r requirements.txt
  • or Conda
    conda install --file requirements.txt

Installation

  1. Download Jupyter Notebook or JupyterLab. For Linux or Mac users:

    pip install notebook

    For Windows users, you can download it from the Jupyter Homepage

  2. Clone the repo

    git clone https://github.com/ajalamarvellous/Diabetes-risk-factor.git
  3. Install the necessary packages

    pip install -r requirements.txt
  4. Open your jupyter lab or notebook

  5. Go to the folder📂 where you just downloaded the project to

  6. Open the Untitled.ipynb notebook📔 there.

    And you are ready to rumble

(back to top)

Usage

You can try out the deployed app here.

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Project Link: https://github.com/ajalamarvellous/Diabetes-risk-factor

(back to top)

