Hi, I am trying to replicate your project. I got the data from ADNI, but I have run into some issues.
(a) I assume I can run the steps one by one, i.e. pre-processing first, then merge, then ... I am concentrating on the 4 pre-processing steps and the 2 merge steps first.
###The most important issue is in (c): an error I cannot resolve. I present my questions step by step anyway.
(b) For the pre-merge steps, I cannot find the sources of some datasets. I worked around some of them, but many of the datasets cannot be found among the CSV downloads and/or the challenge data. I ended up just restoring your datasets, but that is not a good solution, since I should generate them from the ADNI sources. In particular, I am not sure where the sources of at least the following 4 files are:
Merged_clinobioimg_nona.csv <-- seems to be generated by the 2nd merge step
adin_clin_gwas.csv <-- ??? not sure where this comes from
Merged_Filtered.csv <-- seems to be generated by pre-processing step 4
MergedProcessedMRI_filtered.csv <-- seems to be generated by the 1st merge step, but it is also read from an outside path:
For the last one, see this statement in Merge step 2:
#img = pd.read_csv('/Users/ja/Documents/BigDataAnalytics/BigData_ADNI_project/Data/ProcessedImaging/MergedProcessedMRI_filtered.csv')
I cannot find this file in the ADNI download, and I think there is another file like it. Where does the BigData_ADNI_project data come from? Your advice would be very helpful.
(c) I did manage to get through the 4 pre-processing steps, but in the Merge steps I have the following issues:
(i) VISCODE cannot be dropped, because the merges before this step generate VISCODE_x and VISCODE_y instead. I amended all the drops to remove those two columns instead of VISCODE, but I am not sure this is right.
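A likely cause of (i), sketched with toy data: when `pandas.merge` joins two frames that both have a `VISCODE` column but `VISCODE` is not among the `on=` keys, pandas keeps both copies and renames them with the default suffixes `_x` and `_y`. The column names and values below are hypothetical stand-ins for the ADNI tables, and whether the notebooks intend to merge on `RID` alone or on `RID` plus `VISCODE` is an assumption to check against the original code.

```python
import pandas as pd

# Hypothetical toy frames standing in for a clinical table and an imaging table.
clin = pd.DataFrame({"RID": [1, 1, 2], "VISCODE": ["bl", "m06", "bl"], "MMSE": [29, 28, 24]})
img = pd.DataFrame({"RID": [1, 1, 2], "VISCODE": ["bl", "m06", "bl"],
                    "Hippocampus": [7100.0, 7000.0, 6200.0]})

# Merge on RID only: VISCODE is in both frames but not a key,
# so pandas keeps both copies as VISCODE_x and VISCODE_y.
on_rid = pd.merge(clin, img, on="RID")
print(sorted(on_rid.columns), len(on_rid))

# Merge on both keys: a single VISCODE column, and visits are matched one-to-one.
on_both = pd.merge(clin, img, on=["RID", "VISCODE"])
print(sorted(on_both.columns), len(on_both))
```

Note the row counts as well: merging on `RID` alone pairs every visit of a subject with every visit in the other table (5 rows here instead of 3), so simply dropping both `VISCODE_x` and `VISCODE_y` could silently keep rows with mismatched visits.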
(ii) Then I get the errors below, which I cannot resolve. Would you mind having a look?
# Convert features to numpy array for PCA transformation
d_x = d_x.as_matrix()
d_x = d_x.astype(float)
# Normalize features to mean=0, variance = 1
x_mean = np.mean(d_x, axis = 0)
x_std = np.std(d_x, axis = 0)
d_x = np.subtract( d_x, np.matlib.repmat(x_mean, n_sampels, 1) )
d_x = np.divide( d_x,np.matlib.repmat(x_std, n_sampels, 1) )
--- Errors below. The first warning (use .values) is unimportant for the moment, but I do not know how to handle the others ---
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:78: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:140: RuntimeWarning: Degrees of freedom <= 0 for slice
keepdims=keepdims)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:110: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\_methods.py:130: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
If I ignore them, the PCA is obviously not done and no output comes out.
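The RuntimeWarnings above ("Mean of empty slice", "Degrees of freedom <= 0 for slice") are what NumPy emits when `np.mean`/`np.std` are taken along an axis of length zero, which suggests `d_x` has no rows at this point, for example because an earlier merge or `dropna()` removed every sample. A minimal sketch of a guarded normalization, assuming pandas 0.24 or later (for `to_numpy`; older versions can use `.values`) and using NumPy broadcasting instead of `np.matlib.repmat`. The function name `standardize` and the constant-column guard are hypothetical additions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def standardize(df):
    """Return a float array with each column scaled to mean 0, std 1."""
    x = df.to_numpy(dtype=float)  # replacement for the deprecated .as_matrix()
    if x.shape[0] == 0:
        # Empty input is what triggers "Mean of empty slice" in np.mean.
        raise ValueError("no rows left: check the merge keys and any dropna() upstream")
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (x - mean) / std  # broadcasting replaces np.matlib.repmat(..., n_samples, 1)


# Usage on a tiny hypothetical frame: column "b" is constant and maps to zeros.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 10.0]})
z = standardize(df)
print(z)
```

Printing `d_x.shape` just before the normalization in the notebook would confirm whether the row count is really zero.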
(d) Under Merge there is another notebook about imaging files, "Extract NIfTI imaging files".
I am not sure where to find these files or what the purpose of this script is. The input directory is
rootdir = '/Users/ja/Documents/BigDataAnalytics/Preprocessed_AD/Preprocessed_AD_%s'%n
and the output is
copyfile(rootdir+'/'+fid+'/'+sub +'/'+sub2+'/'+sub3+'/'+image_name, "/Users/ja/Documents/BigDataAnalytics/Preprocessed_AD/"+image_name)
I am asking in advance in case it will be useful in the future.
Thank you for your kind advice.
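Judging only from the `copyfile` line, the script appears to gather image files out of the nested per-subject folders (`fid/sub/sub2/sub3`) into one flat `Preprocessed_AD` folder. A minimal sketch of that idea using `os.walk`, assuming the images are `.nii` or `.nii.gz` files; the function name `collect_nifti` and its parameters are hypothetical, and this is not the notebook's actual code.

```python
import os
import shutil


def collect_nifti(rootdir, outdir, suffixes=(".nii", ".nii.gz")):
    """Copy every NIfTI file found anywhere under rootdir into one flat outdir."""
    os.makedirs(outdir, exist_ok=True)
    copied = []
    # Walk all nested subfolders instead of hard-coding fid/sub/sub2/sub3.
    for dirpath, _dirnames, filenames in os.walk(rootdir):
        for name in filenames:
            if name.endswith(suffixes):
                # Like the original copyfile call, files with the same name overwrite each other.
                shutil.copyfile(os.path.join(dirpath, name), os.path.join(outdir, name))
                copied.append(name)
    return copied
```

It would be called as `collect_nifti(rootdir, "/path/to/Preprocessed_AD")` with paths adjusted to the local download; `outdir` should not sit inside `rootdir`, otherwise the walk may revisit the copied files.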