In Tanzania, 4 million people lack access to safe water, according to water.org. Most rely on water wells for clean drinking water, but many of the wells in Tanzania are not functional. My goal is to build a classification model for a charity, The Tanzania Water Project, that predicts whether a water well is functional or non-functional.
Nearly 50% of the water wells in Tanzania are non-functional, according to the data I was provided. This is a major waste of both money and resources. I built a predictive model for the Tanzania Water Project, a charity that is helping build water wells throughout the country. By predicting which water wells are non-functional and need to be rebuilt or repaired, the model can help the charity allocate its precious resources.
My data comes from drivendata.com and contains nearly 60,000 data points on water wells in Tanzania. Each data point records whether the well is functional or non-functional.
I used a variety of classification models to improve the ability to predict whether a water well was functional or non-functional. I started with a Dummy Classifier to establish a baseline against which to compare all later models. After this I tried several algorithms, tuning their hyperparameters where I felt it was necessary. The specific methods used are:
- Logistic Regression
- Logistic Regression with GridSearch to tune hyperparameters
- Random Forest Classifier
- XGBoost Random Forest Classifier
- XGBoost Random Forest Classifier with GridSearch to tune hyperparameters
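The workflow above (a baseline first, then progressively stronger models, with a grid search where tuning helps) can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the data is synthetic, the parameter grid and model settings are assumptions, and the XGBoost models are left out so that the sketch depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the well data (assumption: numeric features, binary target).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; the grid over C is illustrative, not the grid used in the project.
models = {
    "dummy_baseline": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "logistic_regression_grid": GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.1, 1.0, 10.0]},
        cv=3,
        scoring="accuracy",
    ),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each model on the training split and record accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Comparing every model against the dummy baseline on the same held-out split shows how much of the accuracy comes from the model rather than from class imbalance.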
My best performing model was my XGBoost Classifier with default values, which had an accuracy of 86% on my training data and 84.9% on my test data when predicting functional or non-functional. The accuracy for the 'functional' status alone was 75.39%, and the accuracy for the 'non functional' status alone was 93.99%.
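A per-status accuracy like the ones quoted above can be read as the share of wells with a given true status that the model labeled correctly. The sketch below assumes that interpretation (it is equivalent to per-class recall); the helper name `per_class_accuracy` and the toy labels are hypothetical and are not taken from the notebook.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return, for each true label, the fraction of its samples predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels for illustration only, not the project's data.
y_true = ["functional"] * 4 + ["non functional"] * 5
y_pred = [
    "functional", "functional", "functional", "non functional",
    "non functional", "non functional", "non functional", "non functional",
    "functional",
]

result = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(result)
print(f"overall accuracy: {overall:.3f}")
```

Reporting the per-status figures next to the overall accuracy makes it visible when a model is much better at one status than the other.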
Given more time, I would like to use time series analysis to better predict when functional wells will begin to break down and need repair. I would also like to develop an application that lets the charity enter the specifications of wells it is considering building (e.g., location, waterpoint type, water quality) and get an instant prediction of whether the well will be functional and how long it can be expected to remain functional. This would help the charity allocate its resources properly.
├── Data
├── Images
├── Pickles
├── gitignore
├── Final_Notebook.ipynb
├── Tanzanian Water Wells.pdf
├── README.md
└── LICENCE