Using Neural Networks to Classify X-Ray Images of Pneumonia

Author: Lauren Esser

The contents of this repository detail an analysis of the Module Four project. The analysis is documented with the aim of making the work accessible and replicable.

Business Problem

  1. Objective: Build a neural network that classifies X-ray images of pediatric patients to identify whether or not they have pneumonia.

  2. Project plan: I worked on this project daily for the allotted time (one week). The goal was to test parameters of basic neural networks and convolutional neural networks to see which model would provide the best accuracy and precision when classifying whether an X-ray image shows pneumonia.

  3. Success criteria: A successful model will classify X-ray images correctly more than 50% of the time; the goal I set for myself was a model that classifies images correctly at least 80% of the time. I will know I have met this criterion when the classification report and confusion matrix show scores over 80%. I am also paying attention to the loss, accuracy, precision, and recall visualizations in order to monitor gradient descent.

Data

The dataset was downloaded from Chest X-Ray Images (Pneumonia) on Kaggle (linked here). The provided dataset was already organized into three folders: train, test, and validation, each with Pneumonia and Normal subfolders. All images come from pediatric patients aged one to five years old at Guangzhou Women and Children's Medical Center. For this notebook I uploaded the zip file from Kaggle to my Google Drive. You may access the files in my Drive here.

Methods

I used the OSEMN Method while working on this project. The steps are outlined in detail below:

Obtain:

Data was obtained from the Chest X-Ray Images (Pneumonia) dataset on Kaggle and uploaded into my Google Drive. Since I chose to complete my work on Google Colab, I had to mount my Drive and point to the source folder to access the correct zip file. I then unzipped the images for use.

Example of code:

from google.colab import drive
import glob

# Mount Google Drive so the uploaded zip file is reachable from Colab
drive.mount('/content/gdrive')

# Drive folder holding the Kaggle zip (this path is an assumption) and local working directory
source_folder = r'/content/gdrive/My Drive/'
target_folder = r'/content/'

# Locate the zip in Drive, copy it locally, extract it quietly, then remove the archive
zip_path = glob.glob(source_folder + 'chest-xray-pneumonia-jmi.zip', recursive=True)[0]
!cp '{zip_path}' .
!unzip -q chest-xray-pneumonia-jmi.zip
!rm chest-xray-pneumonia-jmi.zip

Scrub:

To scrub the data I first organized it into the proper directories: train, test, and validation.

Example of code:

from pathlib import Path

# Create data directory
data_dir = Path(r'chest_xray/')

# Train directory and its class subfolders
train_dir = data_dir/'train'
pneumonia_train_dir = data_dir/'train'/'PNEUMONIA'
normal_train_dir = data_dir/'train'/'NORMAL'

I then checked the length of each directory to see what I was working with, as sketched below. For this project I went back and forth between Explore and Scrub, so you will see further details in the next section.
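A minimal sketch of that check, using the directory variables defined above:

import os

# Count how many images are in each training subfolder
print('Pneumonia train images:', len(os.listdir(pneumonia_train_dir)))
print('Normal train images:', len(os.listdir(normal_train_dir)))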

Explore:

To begin the exploration section, I first wanted to see the images I was working with. I created a function to view images and took a look at a normal train image and a pneumonia train image.

Example of code:

import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img

# See image function: load an image file at 150x150 and display it
def see_image(img_file):
  img1 = load_img(img_file, target_size=(150, 150))
  return plt.imshow(img1)

Once I viewed the X-ray images I began to prepare the data for modeling. Since I planned to work with both basic neural networks and convolutional neural networks, I knew I would have to arrange the data differently for each. For the CNN I used an ImageDataGenerator to rescale the images, set the image size to 64 x 64 and the batch size to the full dataset, and set the class mode to binary. I then used the next() function to iterate through the generator and create the needed datasets.

Example of code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Example with train set: rescale pixels, resize to 64x64, load the full set in one batch
train_set = ImageDataGenerator(rescale=1./255).flow_from_directory(
    train_dir, target_size=(64, 64), batch_size=5216, class_mode="binary")
train_images, train_labels = next(train_set)

I then continued to reformat the dataset for my basic neural network. First I identified the shape of my images and labels.

Example of code:

# Identify shapes of image/labels
print('Train Images Shape: ', str(train_images.shape))
print('Train Labels Shape: ', str(train_labels.shape))
print("Number of Training Samples: ", str(train_images.shape[0]))

Then I reshaped the images using the .reshape method.

Example of code:

# Reshape images for basic NN modeling
train_img = train_images.reshape(train_images.shape[0], -1)

To finish scrubbing/exploring I looked at another image, identified the class indices, and renamed my train_labels as train_y to help avoid confusion later on.
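A minimal sketch of those last steps, assuming the train_set generator and train_labels array created above:

# Class indices assigned by the generator, e.g. {'NORMAL': 0, 'PNEUMONIA': 1}
print(train_set.class_indices)

# Rename the labels to avoid confusion later on
train_y = train_labels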

Model:

Within my modeling section I created three Basic Neural Networks and three Convolutional Neural Networks. I will discuss the Basic Neural Networks first.

Basic Neural Networks

Baseline Model

To begin modeling I created a baseline neural network. I initialized a random seed, created three layers, and compiled the model using tanh as the activation, binary_crossentropy as the loss, and sgd for the optimizer.
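A sketch of that baseline (the layer sizes, seed value, and sigmoid output layer are my own placeholders; the exact architecture is in the notebook):

import numpy as np
from tensorflow.keras import models, layers
from tensorflow.keras.metrics import Precision, Recall

np.random.seed(123)  # random seed for reproducibility (seed value assumed)

# Three Dense layers with tanh activations on the flattened 64x64x3 images;
# a sigmoid output is assumed for the binary pneumonia/normal label
baseline_nn = models.Sequential([
    layers.Dense(64, activation='tanh', input_shape=(train_img.shape[1],)),
    layers.Dense(32, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

baseline_nn.compile(loss='binary_crossentropy', optimizer='sgd',
                    metrics=['accuracy', Precision(), Recall()])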

While creating this model I set a callback and wrote my own helper functions for recording the training history, timing the model, reporting results, and producing visualizations (these can be viewed in the Google Colab notebook).
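A sketch of that setup (an EarlyStopping callback is assumed, as are the epoch count and batch size; val_img and val_y stand for validation data prepared the same way as the training set):

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once validation loss stops improving and keep the best weights
callback = [EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)]

# epochs and batch_size shown are placeholders
history = baseline_nn.fit(train_img, train_y, epochs=50, batch_size=32,
                          validation_data=(val_img, val_y), callbacks=callback)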

Results:

In the subsequent basic neural networks I experimented with changing the image size, trying different activations and optimizers, changing the batch size, and adding additional Dense layers. The third model had the best results, with 85% accuracy, high precision and recall, and relatively smooth training curves.

Third Basic Neural Network Model and Results:

Convolutional Neural Networks

Convolutional neural networks are popular in deep learning tasks that require analyzing visual imagery. For my baseline CNN model I used the same parameters as the final basic neural network and simply added Conv2D, MaxPooling, and Dropout layers.
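A sketch of that kind of architecture (filter counts, dropout rate, and layer sizes here are placeholders, not the exact values from the notebook):

from tensorflow.keras import models, layers
from tensorflow.keras.metrics import Precision, Recall

# Convolution/pooling/dropout blocks in front of the dense layers; the input is
# the unflattened 64x64x3 images produced by the ImageDataGenerator
baseline_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='tanh', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(64, (3, 3), activation='tanh'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])

baseline_cnn.compile(loss='binary_crossentropy', optimizer='sgd',
                     metrics=['accuracy', Precision(), Recall()])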

With the first convolutional neural network, accuracy dropped, the gradient descent steps became more extreme, and precision changed. The next step was to tweak the parameters and layers to build a better CNN model. I show the final CNN model in the Interpret section, because that is the model I would select.

Interpret:

Final CNN Model

This model exceeds the 80% goal, classifying images with 88% accuracy. Its precision and recall scores are also above 80%.

Recommendations

  1. Use an image size of 64x64 for faster processing.
  2. If focusing on highest Accuracy use CNN Model 3.
  3. If focusing on highest Recall use CNN Model 2.
  4. If focusing on highest Precision use the Baseline Neural Network Model.
  5. Overall the best Model to use would be CNN Model 3.

Future Work

  1. Use a larger dataset to improve accuracy and precision.
  2. Look at X-Rays of adult lungs with and without Pneumonia.
  3. Use a premade (pretrained) CNN to see whether it works better.

Reproduction Instructions

This project uses:

  • Anaconda, a package and environment management tool
  • Python 3.6.9
  • Numpy
  • Pandas
  • Seaborn
  • Matplotlib
  • Sklearn
  • Tensorflow
  • Datetime
  • Scipy
  • Os, Glob
  • Yellowbrick

If you would like to follow the analysis locally and have the above tools:

  1. Fork and clone this repository
  2. Obtain the data (linked here) and upload it into your Google Drive under /gdrive/My Drive
  3. Download the notebook and run it in Google Colab, making sure to connect your own Google Drive.

You should then be able to run the exploration and analysis in the provided X-Ray-Prediction-Notebook.ipynb.

For Further Information

Please review the full narrative of my analysis in the Jupyter notebook or review my presentation. For any additional questions, please contact me via e-mail at [email protected] or on LinkedIn.

Repository Structure:

README.md                        <- README for reviewers of this project
X-Ray-Prediction-Notebook.ipynb  <- narrative documentation of analysis in Jupyter notebook
Presentation.pdf                 <- pdf version of project presentation
