This project classifies cassava leaf diseases using deep learning, leveraging image augmentation and transfer learning with CNN models.
(Midterm Project for DATA 2040 - Deep Learning @ Brown University Spring 2021)
- Project Description
- Methods
- Technologies Used
- Screenshots
- Setup
- Deliverables
- Contributing Members
- Acknowledgements
- Background: Cassava is a key crop for food security across Sub-Saharan Africa. However, viral diseases threaten cassava yields and are costly to detect manually.
- As a midterm group project for DATA 2040, this project fine-tunes a series of CNN models to detect cassava leaf diseases, using image augmentation and transfer learning to increase classification accuracy.
- Why?: Fine-tuned deep learning models may help identify diseased cassava plants more efficiently and ultimately prevent crop loss.
- The data consists of 21,367 labeled images of cassava leaves belonging to five categories: four disease categories and one category for healthy plants. The images were crowdsourced from farmers in Uganda and labeled by experts at the National Crops Resources Research Institute (NaCRRI) in collaboration with the AI lab at Makerere University, Kampala. The data and task were made available as a Kaggle competition.
- Result: Using image augmentation and transfer learning, we increased our accuracy from a baseline of 61.5% (majority classifier) to 80% (DenseNet201 model).
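
For context on the baseline figure above: a majority classifier always predicts the most frequent class, so its accuracy equals that class's share of the data. The following is a minimal sketch of that calculation; the function name and the toy labels are made up for illustration and are not the project's data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Toy, made-up labels (integers 0-4 stand in for the five categories).
toy_labels = [3] * 6 + [4] * 2 + [0] + [1]
print(majority_baseline_accuracy(toy_labels))
```

Any trained model should beat this number before its accuracy is considered meaningful on an imbalanced dataset.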
- EDA
- Image data augmentation
- Deep learning (CNN)
- Transfer learning
- Fine-tuning
- Hyperparameter tuning
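
The transfer learning step can be sketched roughly as follows, assuming a frozen ImageNet-pretrained DenseNet201 backbone with a small classification head on top. The input size, dropout rate, learning rate, and `ReduceLROnPlateau` settings here are illustrative assumptions, not the values used in the project notebook, and `build_transfer_model` is a hypothetical helper name.

```python
import tensorflow as tf

NUM_CLASSES = 5  # four disease categories plus healthy


def build_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen DenseNet201 backbone with a new softmax head (illustrative)."""
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # train only the new head at first

    # Inputs are assumed to be preprocessed already, e.g. with
    # tf.keras.applications.densenet.preprocess_input.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)  # assumed rate
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Lower the learning rate when validation loss stops improving (assumed settings).
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2
)
```

Fine-tuning would then typically mean unfreezing some or all of the backbone layers and recompiling with a much smaller learning rate, passing `lr_schedule` in the `callbacks` argument of `model.fit`.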
- Python (3.8)
- TensorFlow (2.4.1)
- Keras (2.4.0)
- Scikit-learn (0.24.0)
- Pandas (1.2.1)
- Numpy (1.19.2)
Cassava disease class balance (normalized). Due to the data imbalance, we used a StratifiedKFold split to preserve class ratios in each fold.
Cassava leaf images by disease category. The images vary by angle, lighting, and position.
Validation and training accuracy for our baseline VGG16 model over 16 epochs.
Validation and training accuracy for fine-tuned DenseNet model with a ReduceLROnPlateau schedule over 31 epochs.
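
To illustrate the stratified split mentioned in the first caption, here is a minimal sketch using scikit-learn's `StratifiedKFold` on made-up, imbalanced labels. The class counts, the number of folds, and the helper name `fold_class_counts` are assumptions for the example, not the project's actual configuration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def fold_class_counts(labels, n_splits=5, seed=0):
    """Return the per-class counts of each validation fold."""
    labels = np.asarray(labels)
    features = np.zeros((len(labels), 1))  # placeholder; only labels matter here
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return [
        np.bincount(labels[val_idx], minlength=labels.max() + 1)
        for _, val_idx in splitter.split(features, labels)
    ]


# Toy, made-up imbalanced labels for five classes (200 samples in total).
toy_labels = np.array([0] * 10 + [1] * 20 + [2] * 20 + [3] * 120 + [4] * 30)
for counts in fold_class_counts(toy_labels):
    print(counts)  # every fold keeps the same class proportions
```

A plain random split could leave a rare class under-represented in some folds, which makes validation accuracy noisier across folds.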
To read the project code as a Jupyter/IPython notebook, click on the RAD_Final_Blog_Post_3.ipynb file here to open it in your browser.
The project notebook can also be opened and run in Google Colab (with GPU). To download the Kaggle Cassava Leaf Disease Classification dataset, create a Kaggle account and create an API token. Then, replace the Kaggle username and key in the following code cell:
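
The original notebook cell is not reproduced here. As a rough sketch of what such a cell might look like, the snippet below sets the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables read by the Kaggle API client and builds the download command. The helper names, the `data` destination folder, and the placeholder credentials are assumptions; the `kaggle` package must be installed, and the competition rules must be accepted on the Kaggle website before the download will succeed.

```python
import os
import subprocess

COMPETITION = "cassava-leaf-disease-classification"


def set_kaggle_credentials(username, key):
    """Expose the API token to the Kaggle client via environment variables."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def download_command(competition=COMPETITION, dest="data"):
    """Build the Kaggle CLI command that downloads the competition files."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest]


def download_dataset(dest="data"):
    """Run the download (needs network access and valid credentials)."""
    subprocess.run(download_command(dest=dest), check=True)


# Replace the placeholders with the values from your own kaggle.json token.
set_kaggle_credentials("your_kaggle_username", "your_kaggle_api_key")
# download_dataset()  # uncomment once real credentials are set
```

Avoid committing real credentials to the repository; in Colab, the token values can be pasted into the cell at run time instead.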
We described our project process - from data exploration and augmentation to transfer learning and model fine-tuning - in the following blog posts:
- This project's task and data were sourced from Kaggle’s Cassava Leaf Disease Classification competition.
- As such, many thanks to the Makerere Artificial Intelligence (AI) Lab at Makerere University in Uganda, which applies AI and data science to real-world challenges, as well as to the experts and collaborators from the National Crops Resources Research Institute (NaCRRI) for assisting in preparing this dataset.
- All of our sources for our EDA and model development are listed in our blog posts (1, 2, and 3) in the Sources Used section.
- Many thanks to Annie and Roma for your collaboration!