This college project builds a machine learning model that predicts whether a person has diabetes based on diagnostic measurements.
The data is the Pima Indians Diabetes Database, obtained from Kaggle. It contains 768 observations, each with diagnostic measurements and a binary outcome indicating whether the patient has diabetes.
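The dataset has the following columns:
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome (the binary target, not an input feature)

The workflow consists of the following steps:
- Data inspection and preprocessing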
- Standard scaling
- KNN classifier
- Model evaluation
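
The scaling, KNN, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the project's actual code: the function name `train_knn`, the 80/20 stratified split, `n_neighbors=5`, and the random seed are all assumptions chosen for the example.

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_knn(X, y, n_neighbors=5, test_size=0.2, random_state=42):
    """Split the data, fit a scaled KNN model, and return it with test metrics.

    All default values are illustrative assumptions, not the project's settings.
    """
    # Hold out a stratified test set so both classes keep their proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    # The pipeline fits the scaler on the training split only, so no
    # test-set statistics leak into training.
    model = make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=n_neighbors),
    )
    model.fit(X_train, y_train)

    # Evaluate on the held-out split.
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    return model, metrics
```

With the Kaggle CSV loaded into a pandas DataFrame `df` (loading is not shown here), a call might look like `train_knn(df.drop(columns="Outcome"), df["Outcome"])`. Scaling matters for KNN because distances would otherwise be dominated by the columns with the largest numeric ranges.
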
The model achieved an accuracy of 79% and an F1-score of 0.72 on the test set.
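
For reference, both metrics can be computed directly from the predicted and true labels. The sketch below uses plain Python and a hypothetical helper named `accuracy_and_f1`; it assumes the positive class (diabetes) is encoded as 1.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    Assumes `positive` marks the diabetic class; the name and signature
    are illustrative, not taken from the project.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Accuracy: share of all predictions that match the true label.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    # F1: harmonic mean of precision and recall, written in terms of counts.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1
```

Because F1 ignores true negatives, it can sit below accuracy when the positive class is the harder one to predict.
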
Tuning hyperparameters, such as the number of neighbors in KNN, and testing additional algorithms might improve the model's accuracy.
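
One possible way to approach the hyperparameter tuning is a cross-validated grid search. The sketch below assumes scikit-learn; the function name `tune_knn`, the candidate values in the grid, the 5-fold setting, and the choice of F1 as the selection metric are illustrative assumptions rather than anything the project has tried.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_knn(X_train, y_train, cv=5):
    """Search a small KNN grid with cross-validation and return the fitted search."""
    pipeline = Pipeline([
        ("scaler", StandardScaler()),    # refit inside every CV fold
        ("knn", KNeighborsClassifier()),
    ])

    # Candidate values are arbitrary examples; adjust them to the data.
    param_grid = {
        "knn__n_neighbors": [3, 5, 7, 9, 11, 15],
        "knn__weights": ["uniform", "distance"],
    }

    search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
    search.fit(X_train, y_train)
    return search
```

After fitting, `search.best_params_` holds the selected settings and `search.best_estimator_` is a pipeline refit on the full training data, which can then be scored once on the held-out test set.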