About this repository
This repository contains scripts for learning interpretable features from X-ray imaging data collected from various sources related to the COVID-19 pandemic.
Files Contained
- Extract_COVIDnet_Features.py - Uses the pretrained COVID-Net models to extract the learned representations of the data at the final three layers of COVID-Net. The header of this script explains how to run it.
- kmeans_DenseNet_Representations.m - Performs k-means clustering on the learned representations of the data for DenseNet [Add Reference] and computes the average cluster purity across the different k-means initializations.
- kmeans_COVIDnet_Representations.m - Performs k-means clustering on the learned representations of the data for COVID-Net and computes the average cluster purity across the different k-means initializations.
- make_COVIDx_labels.py - Reads test_COVIDx.txt and train_COVIDx.txt and creates .mat files containing the labels.
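The cluster purity mentioned for the two k-means scripts can be illustrated with a short Python sketch. This is not the code from the .m files; it assumes that the purity of a cluster is the fraction of its points carrying the cluster's most common label, and that the reported figure is the mean over clusters, averaged again over k-means runs. The function names are hypothetical.

```python
from collections import Counter


def cluster_purities(assignments, labels):
    """Return {cluster_id: purity}.

    Assumed definition: purity of a cluster is the share of its points
    that carry the cluster's most common label.
    """
    if len(assignments) != len(labels):
        raise ValueError("assignments and labels must have the same length")
    if not assignments:
        raise ValueError("at least one data point is required")
    by_cluster = {}
    for cluster_id, label in zip(assignments, labels):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    return {
        cluster_id: max(counts.values()) / sum(counts.values())
        for cluster_id, counts in by_cluster.items()
    }


def average_purity(assignments, labels):
    """Unweighted mean purity over the clusters of one k-means run."""
    purities = cluster_purities(assignments, labels)
    return sum(purities.values()) / len(purities)


def average_purity_over_runs(runs, labels):
    """Mean of average_purity over several runs (one assignment list each)."""
    scores = [average_purity(assignments, labels) for assignments in runs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = [0, 0, 1, 2, 2, 2]
    run_a = [0, 0, 0, 1, 1, 1]  # cluster 0 is 2/3 pure, cluster 1 is pure
    run_b = [0, 0, 1, 2, 2, 2]  # every cluster is pure
    print(average_purity(run_a, labels))
    print(average_purity_over_runs([run_a, run_b], labels))
```

A mean weighted by cluster size is another common convention; which one the .m scripts use should be checked against their source.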
Requirements
- Generating the data set for COVID-Net and running COVID-Net requires: PyDicom, Pandas, Jupyter, TensorFlow 1.15, OpenCV 4.2.0, Python 3.5+, NumPy, Scikit-Learn, Matplotlib
Usage
To perform clustering on the learned representations of COVID-Net:
- Clone the repository at https://github.com/lindawangg/COVID-Net and follow their instructions on how to generate the relevant data set
- Run the Extract_COVIDnet_Features.py script, passing the paths to the pretrained model and the data as arguments
- Run the script make_COVIDx_labels.py to generate .mat files containing the numerical labels for each data point. This is needed when computing the average purity for k-means clustering.
- Run the Octave (MATLAB) script kmeans_COVIDnet_Representations.m after adjusting the load paths and variables to match the .mat files generated in the previous two steps. Warning: The 4/15/2020 update of COVID-Net changed the number and type of classes used in the classification task.
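The label-generation step can be sketched roughly as follows. This is an illustration, not the contents of make_COVIDx_labels.py. It assumes each line of train_COVIDx.txt or test_COVIDx.txt is whitespace-separated with the class name in the third field, and that the classes are normal, pneumonia and COVID-19. Both assumptions may not hold for every COVID-Net release (see the warning above), so the column index and the mapping are parameters. The names read_labels and CLASS_TO_INDEX are hypothetical.

```python
# Assumed class names and numbering; adjust to the COVID-Net release in use.
CLASS_TO_INDEX = {"normal": 0, "pneumonia": 1, "COVID-19": 2}


def read_labels(path, class_to_index=CLASS_TO_INDEX, label_column=2):
    """Read one numerical label per non-empty line of a COVIDx split file.

    label_column is the zero-based index of the field holding the class
    name; the default of 2 is an assumption about the file layout.
    """
    labels = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                labels.append(class_to_index[fields[label_column]])
            except (IndexError, KeyError) as error:
                raise ValueError(
                    "unexpected format on line %d of %s" % (line_number, path)
                ) from error
    return labels
```

The resulting list could then be written to a .mat file with scipy.io.savemat, for example savemat("train_labels.mat", {"labels": labels}), where the file and variable names are placeholders that must match whatever the k-means script loads.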