thinkingmachines/ph-poverty-mapping

Mapping Philippine Poverty using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information

Home Page: https://stories.thinkingmachin.es/philippines-most-vulnerable-communities/

License: MIT License

Topics: poverty-prediction, poverty-mapping, unicef, satellite-imagery


This repo is no longer maintained. Check out our latest poverty mapping repo here!

Setup | Code Organization | Data Sources | Models | Key Results | Acknowledgements

Philippine Poverty Mapping

This repository accompanies our research work, "Mapping Philippine Poverty using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information".

The goal of this project is to provide a means for faster, cheaper, and more granular estimation of poverty measures in the Philippines using machine learning, satellite imagery, and open geospatial data.

[map of poverty estimates in Pampanga]

Setup

To get started, run the Jupyter notebooks in notebooks/ in order. Note that all dependencies must be installed before running the notebooks. We provide a Makefile for this:

make venv
make build

This creates a virtual environment, venv, and installs all dependencies listed in requirements.txt. To run the notebooks inside venv, register it as a Jupyter kernel:

ipython kernel install --user --name=venv

Notable dependencies include:

  • matplotlib==3.0.2
  • seaborn==0.9.0
  • numpy==1.16.0
  • pandas==0.24.0
  • torchsummary==1.5.1
  • torchvision==0.2.1
  • tqdm==4.30.0

Code Organization

This repository is divided into three main parts:

  • notebooks/: contains all Jupyter notebooks for different wealth prediction models.
  • utils/: contains utility methods for loading datasets, building models, and performing training routines.
  • src/: contains the transfer learning training script.

You can follow our experiments and reproduce the models we built by going through the notebooks one by one. For model training, we used a Google Compute Engine (GCE) instance with 16 vCPUs and 60 GB of memory (n1-standard-16) and an NVIDIA Tesla P100 GPU.

Downloading Datasets

Demographic and Health Survey (DHS)

We used the poverty indicators in the 2017 Philippine Demographic and Health Survey as ground truth for socioeconomic indicators. The survey is conducted every three to five years and contains nationally representative information on different indicators across the country.

Due to data access agreements, users need to independently download data files from the Demographic and Health Survey Website. This may require you to create an account and fill out a Data User Agreement form.

Once downloaded, copy and unzip the file in the /data directory. The notebook /notebooks/00_dhs_prep.ipynb will walk you through how to prepare the dataset for modeling.
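The cluster-level aggregation performed in that notebook can be sketched as follows. The column names mirror those used in 00_dhs_prep.ipynb, but the tiny DataFrame here is synthetic, so treat this as an illustration of the approach rather than the notebook itself.

```python
import pandas as pd

# Synthetic stand-in for the DHS household recode (the real file is a
# Stata .DTA loaded with pd.read_stata); column names follow the notebook.
dhs = pd.DataFrame({
    "Cluster number": [1, 1, 2, 2],
    "Wealth index factor score combined (5 decimals)": [10000, 30000, -5000, 5000],
    "Time to get to water source (minutes)": [15, 996, 30, 10],
})

# Average the wealth score per cluster (the ground-truth label).
data = (dhs[["Cluster number",
             "Wealth index factor score combined (5 decimals)"]]
        .groupby("Cluster number").mean())

# 996 is a DHS sentinel value ("water on premises"); the notebook recodes
# it to 0 before taking the per-cluster median.
data["Time to get to water source (minutes)"] = (
    dhs.replace(996, 0)
       .groupby("Cluster number")["Time to get to water source (minutes)"]
       .median()
)

print(data)
```

The resulting frame is indexed by cluster number, one row per DHS cluster, which is the unit the downstream models predict on.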

Google Static Maps

We used the Google Static Maps API to download 400x400 px, zoom 17 satellite images. To download the satellite images and generate the training/validation sets, run the following script in the src/ directory:

python data_download.py

Note that this script downloads 134,540 satellite images from Google Static Maps and may incur costs. See this page for more information on Maps Static API usage and billing.
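The exact internals of data_download.py are not shown here, but a request URL for a single tile looks roughly like the sketch below. `center`, `zoom`, `size`, `maptype`, and `key` are standard Maps Static API parameters; the function name and the example coordinates are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/staticmap"

def build_static_map_url(lat, lon, api_key):
    """Build a request URL for one 400x400 px, zoom-17 satellite tile."""
    params = {
        "center": f"{lat},{lon}",
        "zoom": 17,
        "size": "400x400",
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Illustrative coordinates (somewhere in Pampanga); key is a placeholder.
url = build_static_map_url(15.0794, 120.6200, "YOUR_API_KEY")
print(url)
```

Fetching the image is then a plain HTTP GET of that URL, repeated once per DHS cluster coordinate.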

Training the Model

To train the nighttime lights transfer learning model, run the following script in src/:

python train.py

Usage is as follows:

usage: train.py [-h] [--batch-size N] [--lr LR] [--epochs N] [--factor N]
                [--patience N] [--data-dir S] [--model-best-dir S]
                [--checkpoint-dir S]
Philippine Poverty Prediction
optional arguments:
  -h, --help          show this help message and exit
  --batch-size N      input batch size for training (default: 32)
  --lr LR             learning rate (default: 1e-6)
  --epochs N          number of epochs to train (default: 100)
  --factor N          factor to reduce learning rate by on plateau (default:
                      0.1)
  --patience N        number of iterations before reducing lr (default: 10)
  --data-dir S        data directory (default: "../data/images/")
  --model-best-dir S  best model path (default: "../models/model.pt")
  --checkpoint-dir S  model directory (default: "../models/")
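The help text above corresponds to an argparse parser along the following lines. This is a reconstruction from the usage string, not the actual train.py source:

```python
import argparse

def build_parser():
    """Rebuild the CLI described in the usage text above."""
    parser = argparse.ArgumentParser(description="Philippine Poverty Prediction")
    parser.add_argument("--batch-size", type=int, default=32, metavar="N",
                        help="input batch size for training (default: 32)")
    parser.add_argument("--lr", type=float, default=1e-6, metavar="LR",
                        help="learning rate (default: 1e-6)")
    parser.add_argument("--epochs", type=int, default=100, metavar="N",
                        help="number of epochs to train (default: 100)")
    parser.add_argument("--factor", type=float, default=0.1, metavar="N",
                        help="factor to reduce learning rate by on plateau (default: 0.1)")
    parser.add_argument("--patience", type=int, default=10, metavar="N",
                        help="number of iterations before reducing lr (default: 10)")
    parser.add_argument("--data-dir", type=str, default="../data/images/", metavar="S",
                        help="data directory (default: '../data/images/')")
    parser.add_argument("--model-best-dir", type=str, default="../models/model.pt", metavar="S",
                        help="best model path (default: '../models/model.pt')")
    parser.add_argument("--checkpoint-dir", type=str, default="../models/", metavar="S",
                        help="model directory (default: '../models/')")
    return parser

# Parsing an empty argument list yields the documented defaults.
args = build_parser().parse_args([])
print(args.batch_size, args.lr, args.data_dir)
```

The --factor and --patience flags suggest a reduce-on-plateau learning rate schedule wrapped around the optimizer.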

Data Sources

Models

We developed wealth prediction models using different data sources. You can follow our analysis by looking at the notebooks in the notebooks/ directory.

  • Nighttime Lights Transfer Learning Model (notebooks/03_transfer_model.ipynb): we used a transfer learning approach proposed by Xie et al. and Jean et al. The main assumption is that nighttime lights act as a good proxy for economic activity. We started with a Convolutional Neural Network (CNN) pre-trained on ImageNet, and used its feature embeddings as input to a ridge regression model.
  • Nighttime Lights Statistics Model (notebooks/01_lights_eda.ipynb, notebooks/03_lights_model.ipynb): in this model, we generated nighttime light features consisting of summary statistics and histogram-based features. We then compared the performance of three different machine learning algorithms: ridge regression, random forest regressor, and gradient boosting method (XGBoost).
  • OpenStreetMaps (OSM) Model (notebooks/04_osm_model.ipynb): we extracted three types of OSM features, roads, buildings, and points of interest (POIs), within a 5-km radius for rural areas and a 2-km radius for urban areas. We then trained a random forest regressor on these features.
  • OpenStreetMaps (OSM) + Nighttime Lights (NTL) (notebooks/02_lights_eda.ipynb, notebooks/04_osm_model.ipynb): we also trained a random forest model combining OSM data and nighttime lights-derived features as input.
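The final stage of the transfer learning pipeline, ridge regression on CNN feature embeddings, can be sketched with the closed-form ridge solve below. The embeddings here are random stand-ins for the real image features, and the notebooks presumably use scikit-learn's Ridge rather than this hand-rolled version; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins: 1000 clusters, 50-dim CNN embeddings, a scalar wealth index
# generated from a known linear model plus small noise.
X = rng.normal(size=(1000, 50))
true_w = rng.normal(size=50)
y = X @ true_w + 0.01 * rng.normal(size=1000)

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

w = ridge_fit(X, y, alpha=1.0)
pred = X @ w
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"in-sample R^2: {r2:.3f}")
```

The L2 penalty (alpha) matters in the real setting because the embedding dimension is large relative to the number of DHS clusters, so an unregularized fit would overfit badly.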

Citation

Use this BibTeX entry to cite this repository:

@misc{ph_poverty_prediction_2018,
  title={Mapping Poverty in the Philippines Using Machine Learning, Satellite Imagery, and Crowd-sourced Geospatial Information},
  author={Tingzon, Isabelle and Orden, Ardie and Sy, Stephanie and Sekara, Vedran and Weber, Ingmar and Fatehkia, Masoomali and Herranz, Manuel Garcia and Kim, Dohyung},
  year={2018},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/thinkingmachines/ph-poverty-mapping}},
}

Acknowledgments

This work was supported by the UNICEF Innovation Fund.

Contributors

ardieorden, ibtingzon425, issa-tingzon, jtmiclat, ljvmiranda921, marksteve, tm-ardie-orden

Issues

Add project scaffold

  • Add readme file
  • Add Makefile
  • Add src and tests directories
  • Add .gitignore
  • Add requirements file

Best Model

Hi @ardieorden

While running the 03_transfer_model notebook, I am getting the following error message. How do I select Best_model.pt? Do I have to do this manually, or will the code do it for me?

[screenshot of the error message]

Pop_sum and spatial join for nightlights.csv

Hi @ardieorden

How are you generating pop_sum in the nightlights file? And when performing a spatial join using (1) the layer with the longitude and latitude and (2) the buffered layer with the DHS clusters, are you using a one-to-one or a one-to-many spatial join?

Thanks

boundaries CSV files

Quick question. I am trying to reproduce the work, and while running 03_transfer_model.ipynb,

it expects the following files, which seem to be missing:

dhs_provinces_file = data_dir+'dhs_provinces.csv'
dhs_regions_file = data_dir+'dhs_regions.csv'

I don't see where they were created. Should they have been generated during preparation? If so, it could be that the original DHS data mentioned in 00_dhs_prep.ipynb (PHHR70DT/PHHR70FL.DTA, PHHR70DT/PHHR70FL.DO) is no longer available; instead, I was able to find these:

dhs_file = dhs_zip + 'PHHR71DT/PHHR71FL.DTA'
dhs_dict_file = dhs_zip + 'PHHR71DT/PHHR71FL.DO'

Any pointers would be greatly appreciated! Thanks again for publishing your work, and merry Christmas! 🎄 🎅

Regarding 03_lights_model

Can you let me know whether ntl_summary_stats_file is the same file as ph_ntl_ntl_points.csv, or another file that is not uploaded to the repo?

I got confused by the code under

Define feature columns

feature_cols = ['cov', 'kurtosis', 'max', 'mean', 'median', 'min', 'skewness', 'std']

because in the Correlations section I'm getting an error: KeyError: 'cov'
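For reference, the eight statistics named in feature_cols can be computed from a raw array of nighttime light radiance values with plain NumPy. This is a sketch, not the repo's own feature extraction code, and reading 'cov' as the coefficient of variation (std/mean) is an assumption:

```python
import numpy as np

def ntl_summary_stats(values):
    """Summary statistics matching feature_cols.

    Assumption: 'cov' means the coefficient of variation (std/mean).
    Kurtosis is reported as excess kurtosis (normal distribution -> 0).
    """
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    centered = v - mean
    return {
        "cov": std / mean,
        "kurtosis": np.mean(centered ** 4) / std ** 4 - 3.0,
        "max": v.max(),
        "mean": mean,
        "median": np.median(v),
        "min": v.min(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "std": std,
    }

# Illustrative radiance values for one cluster's pixels.
stats = ntl_summary_stats([0.0, 1.0, 2.0, 3.0, 10.0])
print(stats)
```

If the KeyError persists, it most likely means the CSV being loaded is not the per-cluster summary statistics file, since these column names would have to be produced by a step like the one above.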

Issue on 02_lights_eda.ipynb

I have encountered a few issues while working on this notebook.

In the Night Time Lights (NTL) Dataset EDA section, in the "Get sum of population per cluster" cell:

there is an error where nightlights_unstack.csv is not created with a pop_sum column (it gives a 'no object found' error), while nightlights.csv does have a pop_sum column.

I need a suggestion on whether to use the unstack file or the nightlights file for population clustering.

In the Load DHS Dataset section, merging the nightlights and indicators data shows this error:
[screenshot of the DHS merge error]

In the Correlations between Average Nighttime Lights and Socioeconomic Indicators section, the data is not loading properly:
[screenshot of the correlation error]

The Balancing Nighttime Lights Intensity Levels section requires report.csv files.

Could you suggest solutions to these issues?
Since a lot of data files are missing while executing the program, could you share the data files (up to the OSM model) to [email protected]? Generating the files myself leads to confusion about plotting the data properly, or the data gets mismatched for further analysis, and it takes a lot of time.
In return, I would share research work that could be helpful for both of us.

dhs csv data

In the first notebook, I saw that you changed many of the duplicated values of "Time to get to water source (minutes)" from 996 to 0. May I ask what the reason for this change was?

Thanks in advance.

Update README

Add the following:

  • Setup instructions (c/o @ljvmiranda921 )
  • Some description on the models and results (c/o @ibtingzon )

Note: don't forget to link it on our blog post!

dhs_regions

Hi

I hope you are doing well.

Can you please let me know how you are getting the dhs_regions.csv file?

Thanks

Transformation of VIIRS DNB data to .csv file ('nightlights.csv')

Dear ph-poverty-mappers! How did you manage to transform the VIIRS DNB data into the .csv file 'nightlights.csv'? I would like to apply your approach to other countries, so I need to generate a 'nightlights.csv' equivalent for them.
I am still downloading the VIIRS data, but it seems to be image data only. How did you extract the longitude, latitude, and nighttime light intensity information from it?
Thanks in advance for your answer!

00_dhs_prep.ipynb Code error

Hi @ardieorden

In the notebook 00_dhs_prep.ipynb:
data = dhs[[
    'Cluster number',
    'Wealth index factor score combined (5 decimals)',
    'Education completed in single years',
    'Has electricity'
]].groupby('Cluster number').mean()

data['Time to get to water source (minutes)'] = dhs[[
    'Cluster number',
    'Time to get to water source (minutes)'
]].replace(996, 0).groupby('Cluster number').median()

data.columns = [[
    'Wealth Index',
    'Education completed (years)',
    'Access to electricity',
    'Access to water (minutes)'
]]

print('Data Dimensions: {}'.format(data.shape))
data.head(2)

For the bold part, I am getting the error message below. How can I resolve this, please?

[screenshot of the error message]

Thanks
