- Feature-Extraction - This notebook extracts features from data generated with Google Earth Engine to use in modeling flood extent for Malawi.
- Flood-Modeling - This notebook trains a regression model on data from the 2015 flood and tests it on the 2019 flood. Of the several algorithms tried, CatBoost produced the best results.
- Visualization was done in a variety of programs, mainly QGIS and Tableau.
In recent decades, countries across Africa have experienced an increase in the frequency and severity of floods. Malawi is particularly vulnerable to flooding. In March 2019, Cyclone Idai affected more than 922,900 people, causing 56 deaths and 577 injuries and displacing more than 82,700 people. Food insecurity also rose sharply due to the destruction of crops in a country that is highly dependent on rain-fed agriculture.
The vulnerability of this region is projected to increase with climate change. The goal of this project is therefore to create a flood prediction model as a first step toward a flood planning app that governments can use to prepare for future flood impacts.
Southern Malawi experienced major flooding in 2015 and again in 2019, so I trained my model on data from the 2015 flood and then tested it on data from the 2019 flood.
For the target variable, I started with a polygon of the flooded area for each year. The map of southern Malawi was divided into 1 km² rectangles and overlaid onto the flood extent. The percent area flooded was calculated for each rectangle and used as the target value.
Several of the features were extracted for the study area using the Google Earth Engine Code Editor, which runs the processing on Google Cloud.
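As a minimal sketch of how such a target could be computed, the snippet below clips a flood polygon to one grid cell and divides the overlapping area by the cell area. It uses plain Python with planar coordinates in kilometers. The function names (`clip_to_cell`, `polygon_area`, `flooded_fraction`) and the toy polygon are illustrative assumptions, not the code used in the notebooks, which may well have relied on GIS tooling instead.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_cell(polygon, xmin, ymin, xmax, ymax):
    """Clip a simple polygon (no holes) to an axis-aligned rectangle
    using Sutherland-Hodgman clipping, one rectangle edge at a time."""

    def cross_x(x):
        # Point where segment p-q crosses the vertical line at x.
        def intersect(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return intersect

    def cross_y(y):
        # Point where segment p-q crosses the horizontal line at y.
        def intersect(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return intersect

    edges = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]

    points = list(polygon)
    for inside, intersect in edges:
        if not points:
            break
        clipped = []
        for i, current in enumerate(points):
            previous = points[i - 1]
            if inside(current):
                if not inside(previous):
                    clipped.append(intersect(previous, current))
                clipped.append(current)
            elif inside(previous):
                clipped.append(intersect(previous, current))
        points = clipped
    return points


def flooded_fraction(flood_polygon, xmin, ymin, xmax, ymax):
    """Share of the grid cell's area covered by the flood polygon (0 to 1)."""
    overlap = clip_to_cell(flood_polygon, xmin, ymin, xmax, ymax)
    cell_area = (xmax - xmin) * (ymax - ymin)
    return polygon_area(overlap) / cell_area if overlap else 0.0


# Toy flood extent (hypothetical), coordinates in km.
flood = [(0.5, 0.0), (2.5, 0.0), (2.5, 1.0), (0.5, 1.0)]
half_flooded = flooded_fraction(flood, 0, 0, 1, 1)   # cell partly covered
fully_flooded = flooded_fraction(flood, 1, 0, 2, 1)  # cell entirely covered
dry = flooded_fraction(flood, 3, 0, 4, 1)            # cell outside the flood
```

Repeating `flooded_fraction` over every 1 km² cell would give one target value per rectangle, which is the shape of target described above.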
- Soil organic carbon (proxy for soil health and ecosystem condition)
- % clay content of soil at 10cm
- Distance to wetlands
- Landcover classes
- Elevation
- Topographic position index
- Total weekly precipitation for the 2 months before the event
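Of the features above, the topographic position index (TPI) has a compact definition: a cell's elevation minus the mean elevation of its neighbors. A hedged sketch on a small elevation grid follows. The 3x3 window (`radius=1`), the function name, and the toy values are assumptions, since the neighborhood size used in this project is not stated here.

```python
def topographic_position_index(dem, radius=1):
    """Return a grid of TPI values: elevation minus mean neighbor elevation.

    dem is a list of rows of elevations. Cells at the border use only
    the neighbors that fall inside the grid.
    """
    rows, cols = len(dem), len(dem[0])
    tpi = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                dem[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if neighbors:
                tpi[r][c] = dem[r][c] - sum(neighbors) / len(neighbors)
    return tpi


# Toy elevation grid in meters (hypothetical): a single peak in the center.
dem = [
    [10, 10, 10],
    [10, 20, 10],
    [10, 10, 10],
]
tpi = topographic_position_index(dem)
```

Positive values mark cells higher than their surroundings (ridges, hilltops) and negative values mark cells lower than their surroundings (valleys, depressions), which is why the index is a plausible signal for where water collects.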
I ran several models, and the best-performing one was CatBoost.
The most important features were distance to wetlands, elevation, and clay content.
I chose RMSE as the evaluation metric because it gives a relatively high weight to large errors. For this project, large errors in any given cell are undesirable, so I want to penalize them heavily.
My final RMSE score was 0.11.
This means that, in root-mean-square terms, my model's prediction is typically off by about 11 percentage points of area flooded in any given rectangle over the map of southern Malawi.
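To illustrate why RMSE weights large errors more heavily, the sketch below compares it with mean absolute error (MAE) on two made-up sets of predictions that have the same MAE. The numbers and function names are illustrative only, not results from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical fraction-flooded values for four grid cells.
actual = [0.0, 0.0, 0.0, 0.0]
evenly_off = [0.1, 0.1, 0.1, 0.1]   # small error in every cell
one_big_miss = [0.0, 0.0, 0.0, 0.4]  # one large error, rest exact

mae_even, mae_miss = mae(actual, evenly_off), mae(actual, one_big_miss)
rmse_even, rmse_miss = rmse(actual, evenly_off), rmse(actual, one_big_miss)
```

Both cases have an MAE of 0.1, but the RMSE rises from 0.1 to 0.2 when the same total error is concentrated in a single cell. That is the behavior wanted here, where a badly mispredicted rectangle matters more than many slightly wrong ones.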