
alzheimers_diagnosis_stagedetermination's Introduction

Alzheimers_Diagnosis_StageDetermination

EECS 6893 Big Data Analytics - Final Project

Project ID: 201712-18

Authors: Jing Ai (ja3130), Michael Nguyen (mn2769), Haoquan Zhao (hz2441)

Alzheimer’s Disease affects 1 in 3 seniors in the US and is one of the fastest-rising parts of the healthcare budget. Gaining a better understanding of disease patterns and achieving accurate diagnoses that reflect disease progression are crucial problems to address. Our project aims to identify the Alzheimer’s Disease biomarker combinations with the highest diagnostic power and to examine the disease patterns of patients at different disease stages. The novelty of our project is that we performed a comprehensive analysis that integrated clinical, genomic and imaging data and included patients at multiple disease stages (Normal, Early Mild Cognitive Impairment, Late Mild Cognitive Impairment and Diagnosed Alzheimer’s), whereas previous studies focused their analyses on a single modality and binary phenotypes (diseased/not diseased).

Dataset: ADNI data collection

The Alzheimer's Disease Neuroimaging Initiative (ADNI) data collection is a publicly available collection consisting of clinical, genetic and imaging datasets based on studies of approximately 1,550 participants, including Alzheimer’s disease patients, mild cognitive impairment subjects and elderly controls, across 3 multi-year cohorts (ADNI1, ADNI GO, ADNI2) between 2004 and 2017. More information on the data collection can be found here: http://adni.loni.usc.edu/data-samples/

Analytics

Data Normalization and Preprocessing using PCA

Preprocessing

Clinical

Imaging - radiology measurements

Genetic

Combined

Data-merging

Classification, feature importance, feature correlations and data visualization (a minimal sketch of the tabular steps appears after this list)

Multi-layer Perceptron (MLP) for merged data classification

Convolutional Neural Network (CNN) for raw image classification

Random Forest for classification and assessing feature importance

Spearman’s correlation for estimating feature correlations

t-SNE for visualizing high-dimensional data
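
For illustration, the sketch below strings the tabular steps together on synthetic data: standardization and PCA for preprocessing, a Random Forest for multi-class classification and feature importance, Spearman’s correlation between features, and a t-SNE embedding for visualization. The feature matrix, column names and parameter values are placeholders rather than the project’s actual data or settings, and the MLP/CNN models are omitted.

    # Sketch of the tabular pipeline on synthetic data: standardization + PCA,
    # Random Forest classification with feature importances, Spearman
    # correlation between features, and a t-SNE embedding. All names and
    # parameter values here are placeholders, not the project's settings.
    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.manifold import TSNE
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(200, 20)),
                     columns=[f"feat_{i}" for i in range(20)])
    y = rng.integers(0, 4, size=200)          # 4 stages: CN, EMCI, LMCI, AD

    # Normalize to zero mean / unit variance, then reduce dimensionality with PCA
    X_scaled = StandardScaler().fit_transform(X)
    X_pca = PCA(n_components=10).fit_transform(X_scaled)

    # Random Forest for multi-class classification and feature importance
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_scaled, y)
    importance = pd.Series(rf.feature_importances_, index=X.columns)
    print(importance.sort_values(ascending=False).head())

    # Spearman's rank correlation between all pairs of features
    rho, pval = spearmanr(X_scaled)

    # 2-D t-SNE embedding of the PCA-reduced features for visualization
    embedding = TSNE(n_components=2, random_state=0).fit_transform(X_pca)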

alzheimers_diagnosis_stagedetermination's People

Contributors

endomain, michaelng1


alzheimers_diagnosis_stagedetermination's Issues

Problem duplicating your study run

Hi, I am trying to replicate your project. I got the data from ADNI, but there are some issues.

(a) I assume I can run the steps one by one, i.e. pre-processing first, then merge, and so on. I am concentrating on the 4 pre-processing steps and the 2 merge steps first.

The most important issue is the error in (c), which I cannot handle, but I present my questions in a step-by-step manner.

(b) For the steps before the merge, I cannot find the sources for some of the datasets. I handled it somehow, but many of the datasets cannot be found in the CSV downloads. I ended up just restoring your dataset, but that is no good, since I should generate them from the ADNI sources. In particular, I am not sure where the sources for at least the following 4 files are:

Merged_clinobioimg_nona.csv        <-- seems to be generated by the 2nd merge step
adin_clin_gwas.csv                 <-- not sure where this comes from
Merged_Filtered.csv                <-- seems to be generated by preprocessing step 4
MergedProcessedMRI_filtered.csv    <-- seems to be generated by the 1st merge step, but is also read from outside:
      
For the last one, see the statement in Merge step 2:

     #img = pd.read_csv('/Users/ja/Documents/BigDataAnalytics/BigData_ADNI_project/Data/ProcessedImaging/MergedProcessedMRI_filtered.csv')

I cannot find this file in the ADNI download, and I think there is another file like this. Where does the BigData_ADNI_project directory come from? Your advice would be very helpful.


(c) I did manage to get through the 4 pre-processing steps, but once I get into the Merge steps I have issues.

(i) VISCODE cannot be dropped, because the merges above generate VISCODE_x and VISCODE_y. I amended the code to drop those instead of VISCODE, but I am not sure that is right (the change is sketched below).
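
This is roughly the change I made, as a sketch; the frames and the columns other than VISCODE are placeholders:

    import pandas as pd

    # Placeholder frames standing in for two tables that both carry VISCODE
    left = pd.DataFrame({'RID': [1, 2], 'VISCODE': ['bl', 'm06'], 'a': [0.1, 0.2]})
    right = pd.DataFrame({'RID': [1, 2], 'VISCODE': ['bl', 'm12'], 'b': [1.0, 2.0]})

    merged = left.merge(right, on='RID')   # pandas adds the _x / _y suffixes
    merged = merged.drop(columns=['VISCODE_x', 'VISCODE_y'])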

(ii) Then there are these errors, which I cannot resolve. Do you mind having a look?

# Convert features to numpy array for PCA transformation
d_x = d_x.as_matrix()
d_x = d_x.astype(float)

# Normalize features to mean=0, variance = 1
x_mean = np.mean(d_x, axis = 0)
x_std = np.std(d_x, axis = 0)

d_x = np.subtract( d_x, np.matlib.repmat(x_mean, n_sampels, 1) )
d_x = np.divide( d_x,np.matlib.repmat(x_std, n_sampels, 1) )

--- errors below: the first warning about .values is unimportant for the moment, but I do not know how to handle the others ---

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:78: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:140: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:110: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:130: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)

If I ignore them, the PCA obviously has not been done and nothing comes out.
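
For reference, here is a sketch of the workaround I am trying; it assumes the warnings come from columns that are entirely NaN or have zero variance after the merge, and it replaces the deprecated .as_matrix() call:

    # Possible workaround (sketch): use .values instead of the deprecated
    # .as_matrix(), drop all-NaN columns, and guard against zero variance
    # so the normalization does not produce NaN/inf.
    import numpy as np

    d_x = d_x.dropna(axis=1, how='all')   # drop columns that are entirely NaN
    X = d_x.values.astype(float)

    x_mean = np.nanmean(X, axis=0)
    x_std = np.nanstd(X, axis=0)
    x_std[x_std == 0] = 1.0               # avoid division by zero

    X = (X - x_mean) / x_std              # broadcasting replaces np.matlib.repmat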

(d) Under the Merge directory there is another notebook about imaging files, "Extract NIfTI imaging files".

I am not sure where to find these files or what the purpose of this script is.

    rootdir = '/Users/ja/Documents/BigDataAnalytics/Preprocessed_AD/Preprocessed_AD_%s'%n

and the output is

    copyfile(rootdir+'/'+fid+'/'+sub +'/'+sub2+'/'+sub3+'/'+image_name, "/Users/ja/Documents/BigDataAnalytics/Preprocessed_AD/"+image_name)

I am asking in advance just in case it will be useful in the future.
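
For what it is worth, my reading of that script is sketched below: it walks the nested download directory and copies every NIfTI file into one flat folder. Both paths are placeholders for my local layout, not the project's actual directories.

    # Sketch of my understanding of the extraction step: walk the nested
    # directory tree and copy each NIfTI file into a single flat folder.
    # Both paths below are placeholders.
    import os
    from shutil import copyfile

    rootdir = '/path/to/Preprocessed_AD'   # placeholder for the download root
    outdir = '/path/to/flat_nifti'         # placeholder for the flat output folder
    os.makedirs(outdir, exist_ok=True)

    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in filenames:
            if name.endswith('.nii') or name.endswith('.nii.gz'):
                copyfile(os.path.join(dirpath, name), os.path.join(outdir, name))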


Thank you in advance for your kind advice.
