Tanzanian Water Wells

Phase 3 Project

Author

8a447806-33a1-4f65-aff9-fe4a790768c1

Overview

In Tanzania, 4 million people lack access to safe water, according to water.org. They rely mainly on water wells for clean drinking water, but many of the wells in Tanzania are not functional. My goal is to build a classification model for a charity, The Tanzania Water Project, that predicts whether a water well is functional or non-functional.

Problem

Nearly 50% of the water wells in Tanzania are non-functional, according to the data I was provided. This is a major waste of both money and resources. I built a predictive model for the Tanzania Water Project, a charity that is helping build water wells throughout the country. By predicting which wells are non-functional and need to be rebuilt or repaired, I can help the charity allocate its limited resources.

Data

My data comes from drivendata.com and contains nearly 60,000 data points about water wells in Tanzania. Each record indicates whether the well is functional or non-functional.

Methods

I used a variety of classification models to improve the ability to predict whether a water well was functional or non-functional. I started with a Dummy Classifier to establish a baseline against which to compare all later models. After that I tried several algorithms, tuning hyperparameters where I felt it was necessary. The specific methods used are:

  • Logistic Regression
  • Logistic Regression with GridSearch to tune hyperparameters
  • Random Forest Classifier
  • XGBoost Random Forest Classifier
  • XGBoost Random Forest Classifier with GridSearch to tune hyperparameters
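To show what the Dummy Classifier baseline measures, here is a minimal plain-Python sketch. It assumes a "most frequent class" strategy, which the write-up does not confirm, and it is not the notebook's code. The function name and the toy labels are hypothetical and are not the project's data.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    # Find the label that occurs most often in the training set.
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    # Count how many test labels that single constant guess gets right.
    correct = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, correct / len(test_labels)


# Hypothetical toy labels (assumption: illustration only).
train = ["functional"] * 6 + ["non functional"] * 4
test = ["functional"] * 3 + ["non functional"] * 2

label, baseline_acc = majority_baseline_accuracy(train, test)
print(label, baseline_acc)
```

Any model in the list above is only useful if its test accuracy clearly exceeds this kind of constant-guess score.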

My best performing model was my XGBoost Classifier with default values, which had an accuracy of 86% on my training data and an accuracy of 84.9% on my testing data when predicting functional or non-functional. The accuracy for just the 'functional' status was 75.39%, and the accuracy for just the 'non functional' status was 93.99%.
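Overall and per-class accuracy figures like those above can be read off a confusion matrix. The sketch below shows the arithmetic under one assumption: that the per-class numbers mean the share of wells of each true class that were predicted correctly. The counts in the matrix are hypothetical and are not the project's results.

```python
def per_class_accuracy(matrix, labels):
    """Return overall accuracy and the share of each true class predicted correctly.

    matrix[i][j] is the number of wells whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    # Correct predictions sit on the diagonal of the matrix.
    overall = sum(matrix[i][i] for i in range(len(labels))) / total
    by_class = {
        label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)
    }
    return overall, by_class


# Hypothetical counts (assumption: illustration only).
labels = ["functional", "non functional"]
matrix = [
    [75, 25],  # true functional: 75 predicted functional, 25 predicted non functional
    [6, 94],   # true non functional: 6 predicted functional, 94 predicted non functional
]

overall, by_class = per_class_accuracy(matrix, labels)
print(overall, by_class)
```

Splitting accuracy by class shows whether a model with a reasonable overall score is noticeably weaker on one of the two statuses.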

XGBoost Confusion Matrix on Training Data

XGBoost Confusion Matrix on Testing Data

Next Steps

Given more time, I would like to use time series analysis to better predict when functional wells will begin to break down and need repair. I would also like to develop an application that lets the charity enter the specifications of wells it is considering building (e.g., location, waterpoint type, water quality) and get an instant prediction of whether the well will be functional and how long it can be expected to remain functional. This would help the charity allocate its resources.

Repository Structure

 ├── Data
 ├── Images
 ├── Pickles
 ├── gitignore
 ├── Final_Notebook.ipynb
 ├── Tanzanian Water Wells.pdf
 ├── README.md
 └── LICENCE
