Machine learning project for the course EL-GY 9123

Deep Learning for Lung Nodule Detection

Lung cancer is the leading cause of cancer-related deaths in the US and worldwide. The general prognosis is poor because the disease is often not found until it has reached an advanced stage. Five-year survival is around 54% for early-stage lung cancer that is localized to the lungs, but only around 4% for advanced, inoperable lung cancer. Early detection of lung nodules is therefore important, since early diagnosis can substantially improve the chances of survival. In this project, we use deep learning to detect lung cancer nodules with an approach based on the YOLO algorithm, which has predominantly been used for self-driving cars and other non-medical applications. The results were promising.

Code Hierarchy:

NODULES XML PROCESSING WITHOUT REGION DUPLICATES.ipynb
NON NODULES XML PROCESSING WITHOUT REGION DUPLICATES.ipynb
NODULES DCM-XML MAPPING.ipynb
NON NODULES DCM-XML MAPPING
Creating .npy files of image pixels for training.ipynb
YOLO_FINAL.ipynb

The JSON string of the model is provided as a pickled file "model_json_final.pkl". The trained weights are provided in "model_final_weights". The model can be reconstructed from these.
The model visualization code and the resulting .png file visualizing the model are also provided.

Note: All code is original. We read the paper on the YOLO algorithm and coded a version appropriate for the problem at hand. The code for the callback function was taken from one of the assignments in the course.

Overview

We provide a brief overview here. A detailed analysis can be found in the Jupyter notebook "YOLO_FINAL.ipynb".

1. Dataset

The LIDC-IDRI dataset was used for this project. The whole dataset, which is 124 GB of images plus annotations in the form of XML files, can be obtained from the following link:

https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI

The dataset consists of CT scan series of varying lengths for 1018 patients, with annotations in three categories:

Nodules greater than 3 mm in diameter
Nodules smaller than 3 mm in diameter
Non-nodules

The first and biggest step was to organize the data. XML parsing was carried out for all 1018 patients, and a dictionary was built with the SERIES ID (which identifies the slice of the CT scan series) as the key and all the nodules annotated in that slice as the values.
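
The dictionary-building step can be sketched roughly as follows. This is not the code from the notebooks: it assumes the LIDC-IDRI XML layout in which each `roi` element carries an `imageSOP_UID` identifying the slice and a list of `edgeMap` points, and it uses a tiny inline sample instead of a real annotation file. The function name `parse_annotations` is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for one annotation file (assumed element names).
SAMPLE_XML = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <roi>
        <imageSOP_UID>1.2.3.4</imageSOP_UID>
        <edgeMap><xCoord>312</xCoord><yCoord>355</yCoord></edgeMap>
        <edgeMap><xCoord>313</xCoord><yCoord>356</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>"""


def local(tag):
    """Strip the XML namespace, e.g. '{http://www.nih.gov}roi' -> 'roi'."""
    return tag.rsplit("}", 1)[-1]


def parse_annotations(xml_text):
    """Map each slice identifier to the list of regions annotated on it."""
    slices = defaultdict(list)
    for roi in ET.fromstring(xml_text).iter():
        if local(roi.tag) != "roi":
            continue
        uid, points = None, []
        for child in roi:
            name = local(child.tag)
            if name == "imageSOP_UID":
                uid = child.text.strip()
            elif name == "edgeMap":
                coords = {local(c.tag): int(c.text) for c in child}
                points.append((coords["xCoord"], coords["yCoord"]))
        # Skip exact duplicates of a region already stored for this slice.
        if uid is not None and points and points not in slices[uid]:
            slices[uid].append(points)
    return dict(slices)


annotations = parse_annotations(SAMPLE_XML)
```

Keying on a per-slice identifier makes it straightforward to look up all annotations for a given DICOM image later on.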

Next, using these dictionaries, an array of pixel values was extracted from each DICOM image with the pydicom package, rescaled using the mean and standard deviation, and stored in a separate folder. Another folder contained the coordinates of the annotations for the corresponding images.
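
A minimal sketch of the rescaling step is shown below. It assumes the mean and standard deviation are computed per image; the text does not say whether per-image or dataset-wide statistics were used. A random array stands in for the pixel data, which in practice would come from `pydicom.dcmread(path).pixel_array`. The function name and the output filename in the comment are hypothetical.

```python
import numpy as np


def standardize(pixels, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    pixels = np.asarray(pixels, dtype=np.float64)
    return (pixels - pixels.mean()) / (pixels.std() + eps)


# Stand-in for one CT slice; real data would be read with pydicom.
rng = np.random.default_rng(0)
fake_slice = rng.integers(-1000, 400, size=(512, 512))

scaled = standardize(fake_slice)

# Each array could then be written out for training, for example:
# np.save("slice_0001.npy", scaled)   # hypothetical filename
```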

A total of 3000 annotated CT images were used, of which 2100 were used for training and the remaining 900 for testing.
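
One simple way to produce such a split is to shuffle the image indices and cut at 2100. The sketch below assumes a plain random split with a fixed seed; the exact procedure used in the notebooks may differ.

```python
import random


def split_indices(n_total=3000, n_train=2100, seed=0):
    """Shuffle indices reproducibly and split them into train and test."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices()
```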

2. Training

The YOLO (You Only Look Once) algorithm detects objects and predicts bounding boxes with a single pass through the image instead of multiple sliding windows. Inspired by this idea, we adapted the YOLO approach to predict the probability that a nodule is present in each cell of a CT scan, dividing each scan into a 16 by 16 grid. Our algorithm does not predict bounding boxes.
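
To illustrate the grid formulation, the sketch below builds a 16 by 16 target for one slice by marking every cell that contains a nodule. It rests on several assumptions not stated above: slices are 512 by 512 pixels, each nodule is reduced to a single (x, y) center point, and the target is binary. The function name `grid_target` is illustrative.

```python
import numpy as np


def grid_target(centers, image_size=512, grid=16):
    """Return a grid x grid array with 1.0 in cells holding a nodule center."""
    cell = image_size / grid  # 32 pixels per cell for a 512-pixel image
    target = np.zeros((grid, grid), dtype=np.float32)
    for x, y in centers:
        # Clamp so points on the far edge still fall in the last cell.
        col = min(int(x // cell), grid - 1)
        row = min(int(y // cell), grid - 1)
        target[row, col] = 1.0
    return target


# A nodule centered at x=312, y=355 lands in row 11, column 9.
target = grid_target([(312, 355)])
```

A network trained against targets of this shape outputs one probability per cell, which matches the description of predicting nodule presence without bounding boxes.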

For more information on the YOLO algorithm, read the paper at this link: https://arxiv.org/abs/1506.02640

A convolutional neural network with 18 layers, inspired by U-Net and ResNet, was designed after multiple iterations. A batch size of 1 worked best, and training was carried out on 2100 images.

3. Testing and Analysis

An accuracy of 65%, as defined by our accuracy metric, was obtained on randomly selected test data that the model had never seen. For more details, please refer to the Jupyter notebook.

We checked the training accuracy (not shown in the notebook) and it is very close to the testing accuracy. We therefore believe that the model is underfitting, and we are working on improving the network architecture.

Work in progress:

Change neural network architecture to increase accuracy.
