berkeley-stat159 / project-alpha

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.29% TeX 22.62% Python 68.58% Jupyter Notebook 7.51%

project-alpha's People

Contributors

kentschen reychil janewliang benjaminleroy hiroto-udagawa karenceli matthew-brett flaviu-gostin

project-alpha's Issues

Data update

Could someone (like Jane) update that data/Makefile test file :)

How to Run the final stuff

@janewliang @hiroto-udagawa @reychil @kentschen

I think this is the order needed: (@ 12 December 1:23 am)

required

smooth_final.py
convolution_final.py
glm_final_fourier.py
bh_t_beta_final_fourier.py
image_overlay_final.py

some really late "eda"

selection_final_fourier.py
parameter_selection_final_fourier.py

feedback

Great work. Keep it up. Let us know how we can help.

I know you are aware of this, but you need to make it clear to us and
documented that everyone is contributing. I know you are all participating,
but you should consider trying to switch who is generating code to mostly
reviewing code (or some other approach).

Multiple Linear Regression

Use the different condition files to run multiple linear regression and explore the results

Time Correction

Code up time correction and include good comments

(with some help from Ben maybe)

Final report draft feedback

Sorry --- I skipped you guys originally since you got a lot of time with Jarrod and Matthew on Tuesday. Here's my feedback from the paper:

The introduction is very good: you present your plan clearly, and it is
apparent you developed it in a very methodical way.

I'm a little confused as to what the table in figure 2 is displaying.
Neither the text nor the caption really clarifies what is being
shown, especially with regard to "accuracy". Maybe it would make more
sense if I stared at it longer, but consider polishing this part up.

For section 3.2.3 - if you have results from this (maybe one figure showing
the different HRFs convolved with the event course and what the resulting
columns in the design matrix look like) I'd be interested to see it, though
it is by no means necessary.

I didn't comment on any grammar mistakes as I assume you'll proofread before
submitting, but the first paragraph of section 3.3 has a typo regarding
regressors as "rows" of your design matrix.

In figure 5 in your results section, is the y-axis enumeration meaningful?
It looks like you just wanted to plot the lines on the same plot so chose
to set the baselines at increments of -2. If this is the case, you can
suppress the printing of the y-tick marks. If the values on the y-axis
are indeed meaningful, you should mention it either in the caption or
text.

Also, I'm not clear exactly what you are showing in figures 3 and 4 - try
to clarify what these images show. It's clear they are from a single voxel,
but what are the "fitted values" in this case? Is this the application of
your linear model to that voxel?

Overall you do a very good job of explaining the logical thought processes
that went into your decisions. I very much appreciate this.

Updates Slides Someone!

I really wish somebody would update these slides

Clustering for Group analysis

PCA beginning

Create function for PCA, test file, and script by Monday

Auto correlation and Time series analysis

Extend simple regression with autocorrelation and examine other time series analyses of the hemodynamic response (goal: completion by Monday)

Smoothing

Create function to smooth 3D images, creating mega-voxels (goal: completion by Monday)
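One possible starting point, as a rough sketch only: the issue does not say what kind of smoothing is intended, so this assumes "mega-voxel" means replacing each voxel with the mean of itself and its 6 face neighbors. The function name `neighbor_mean_smooth` and the edge handling (repeating border voxels) are assumptions for illustration, not the project's actual code.

```python
import numpy as np


def neighbor_mean_smooth(volume):
    """Average each voxel with its 6 face neighbors (hypothetical helper).

    Border voxels are handled by repeating the edge values, which is an
    assumption; other edge rules would be just as reasonable.
    """
    volume = np.asarray(volume, dtype=float)
    # Pad one voxel on every side so every voxel has 6 neighbors.
    padded = np.pad(volume, 1, mode="edge")
    total = padded[1:-1, 1:-1, 1:-1].copy()
    # Add the two neighbors along each of the three axes.
    total += padded[:-2, 1:-1, 1:-1] + padded[2:, 1:-1, 1:-1]
    total += padded[1:-1, :-2, 1:-1] + padded[1:-1, 2:, 1:-1]
    total += padded[1:-1, 1:-1, :-2] + padded[1:-1, 1:-1, 2:]
    return total / 7.0
```

For a 4D BOLD array this would be applied to one 3D volume at a time, e.g. `neighbor_mean_smooth(data[..., t])`.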

Meeting Friday 11/20, 12-1

@jarrodmillman

Benjamini-Hochberg (scipy)

5 out of 9 tests failed with make test

When I clone your project and run make test, I get 5 out of 9 test failures.

I've attached the error message I get. Take a look and see if you can figure out what the issue is.

NOTE: You guys are very active on your project --- this is intended to be an exercise for lab on Monday, so please don't resolve the issue before then :)

nosetest_error_summary.txt

Beginning Subject Comparison

Begin cross-subject comparison, looking into R^2, RSS, \betas and other statistics to compare observations within a single person to between-subject comparisons (goal: initial effort completed by Monday)

Commentary on the new analysis

Dear all,

I hope this finds you well. I have spent 12+ hours since our meeting on Wednesday cleaning up and working out why the overlap of our Benjamini-Hochberg multiple comparison of p-values with the other thresholds (upper quantiles of abs(t-statistic) and abs(beta)) was not giving accurate locations of activation regions.

My first conclusions were twofold:
(1) the brains fail to line up well enough, so a lot of the area interpretation is lost via this problem alone, and
(2) certain subjects' linear regression models overfit, and the inclusion of 6 principal components created collinearity with our predicted hemodynamic response function. This generally occurred when the variance explained by these principal components was much larger than our lower threshold of 0.4 (40% variance explained). ** Recall this is all in terms of principal components of the A^T A matrix, where A is the matrix of masked voxel time courses.
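For anyone who wants to check the variance-explained number for a subject, here is a minimal sketch. It assumes A has shape (n_voxels, n_timepoints) and that each voxel's time course is mean-centered first; neither detail is stated above, and the name `variance_explained` is made up for this example.

```python
import numpy as np


def variance_explained(A, n_components=6):
    """Fraction of variance captured by the top components of A^T A.

    Assumes A is (n_voxels, n_timepoints); the centering step is also
    an assumption about how the components were computed.
    """
    A = np.asarray(A, dtype=float)
    # Center each voxel's time course.
    A = A - A.mean(axis=1, keepdims=True)
    # eigvalsh returns eigenvalues in ascending order; reverse them.
    eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
    # Clip tiny negative values caused by floating point error.
    eigvals = np.clip(eigvals, 0.0, None)
    return eigvals[:n_components].sum() / eigvals.sum()
```

A value well above 0.4 for a subject would match the cases described in (2).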

Introduction of Fourier

In an attempt to correct for this problem (of overfitting/collinearity), and with a firm understanding that 6 Fourier features would not introduce such collinearity or explain as much of the variance, I ran models with 6 Fourier features instead. Let’s call this model “_fourier”.
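A rough sketch of what 6 Fourier features could look like in a design matrix. It assumes the 6 features are sine/cosine pairs at the 3 lowest frequencies over the run, which is not spelled out above; `fourier_features`, `design_matrix`, and `hrf_predictor` are hypothetical names, not the ones in glm_final_fourier.py.

```python
import numpy as np


def fourier_features(n_timepoints, n_pairs=3):
    """Return an (n_timepoints, 2 * n_pairs) array of sin/cos regressors."""
    # Time rescaled to [0, 1) so frequency k completes k cycles per run.
    t = np.arange(n_timepoints) / n_timepoints
    columns = []
    for k in range(1, n_pairs + 1):
        columns.append(np.sin(2 * np.pi * k * t))
        columns.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(columns)


def design_matrix(hrf_predictor, n_pairs=3):
    """Stack intercept, convolved HRF predictor, and Fourier features."""
    hrf_predictor = np.asarray(hrf_predictor, dtype=float)
    n = hrf_predictor.shape[0]
    return np.column_stack(
        [np.ones(n), hrf_predictor, fourier_features(n, n_pairs)]
    )
```

These columns are fixed ahead of time and are orthogonal to each other and to the intercept, so unlike principal components they are not derived from the voxel data itself.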

For the subjects with ~40% variance explained by the first 6 principal components, the beta values and t-statistics from this new model (“_fourier”) were comparable to those from the old model (with the 6 principal components, “_pca”). Impressed by this outcome (as we saw earlier that _fourier was less able to predict the BOLD response than the _pca model in the model selection analysis), I decided to implement the analysis for all subjects.

Parameter Selection for “multiple comparison” and threshold analysis

While trying to find a single set of parameters to use across all subjects for all of these analyses, I discovered that for Benjamini-Hochberg the Q value was much more variable for each subject than the parameters of the other threshold analyses. As such, even though Benjamini-Hochberg is more theoretically sound, it works less well in our case.

Recall that Benjamini-Hochberg requires a Q (generally thought of as an \alpha value) and a # of neighbors for neighbor smoothing, and our threshold statistics require a quantile value for the proportion of values saved, and a # of neighbors for neighbor smoothing.
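As a reminder of what Q controls, here is a minimal sketch of the standard Benjamini-Hochberg step-up rule on a flat array of p-values. It leaves out the neighbor smoothing step, and the function name `benjamini_hochberg` is only illustrative; it is not necessarily how bh_t_beta_final_fourier.py does it.

```python
import numpy as np


def benjamini_hochberg(p_values, Q=0.05):
    """Return a boolean array marking p-values significant at FDR level Q."""
    p = np.asarray(p_values, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds: the i-th smallest p-value is compared to Q * i / m.
    thresholds = Q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank that passes; every smaller p-value is also kept.
        k = np.nonzero(below)[0].max()
        significant[order[: k + 1]] = True
    return significant
```

The result is flat, so for a 3D map of p-values it needs to be reshaped back to the volume shape before any neighbor smoothing or plotting.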

In general, the data fails to provide many regions of activation, and the only current method for such analysis requires analyzing each subject separately (by eye). The charts to find these patterns can be created by running “image_overlay_final.py”. As such, our conclusions should probably only include identification of the frontal lobe (@KentChen, I still need the full range of possibilities of actual activation locations created for “sub001” at least).

Basic “Bro” Conclusion

In general the only thing that sticks out to me is the frontal cortex as previously noted. #BasicBro

******* Please review/comment and add my stuff

Final Paper Draft

Need to write the final draft of the paper and do some organizing. We maybe want to address Ross's Piazza post answer and squish the appendices into a single PDF with the main report. Ideally, we'd all have things written a day or two before the deadline, so there is time to revise and proofread and every section gets more than one pair of eyes checking it.

Also need to write the scripts to generate the images.

Glm residuals and fitted

Just make sure that shit works

Paper/Reports

In text file, add name, title, link to paper. In bibliography, please add in the URL/link

General Organization/Documentation

Need to check that all the requirements are listed in requirements.txt and that all the Makefile recipes run and make sense. Should also update the READMEs so that every directory has a README that outlines the Makefile recipes, what the directory stores, etc. May want to do some organizing to delete useless files and consolidate where the user-generated images and data files get saved.

Script initializing

Start formatting/exploring the approach for the FINAL script

Analysis fails due to data that is not unpacked

Hey guys,

I tried make analysis and it fails at a certain point because at least one bold.nii.gz doesn't get unpacked by make data.

If any of you are not getting killed by finals I'd appreciate if you could take a look. If you're all super busy that's fine, I'll work it out on my own. Let me know if you'll have a chance to look at it today

Code Review

ALL the code needs to be reviewed. Check if it runs, makes sense, and has good comments.

9-page Paper Draft

PLEASE FINISH WRITING YOUR ASSIGNED SECTIONS BEFORE SUNDAY, 11/29, so Jane can proofread/organize as needed. If your section already exists in some form from the previous draft, the .tex file for the section is just copied over. If not, it will be blank.

In general, try to cite more frequently, write more detailed captions (e.g. you can tell what's going on in the figure based on just reading the caption, as opposed to a one-line title), write "p-value" in quotes, and it's hemodynamic, not hemoglobin.

In addition to the assigned sections outlined below, we also need to go over/update the abstract, intro, data, and discussion sections. Jane is happy to do that when she organizes/proofreads on Sunday, but you are all more than welcome to contribute too.

Methods:
Smoothing (Rachel)
Convolution and time correction (Ben)
GLM (Jane)
Normality checks/assumptions (Kent)
Hypothesis testing (Hiro)
Benjamini-Hochberg (Rachel)
Clustering (Hiro)

Results:
GLM (Jane/Ben)
Hypothesis testing -- normality and Benjamini-Hochberg are lumped in here too (Hiro/Rachel/Kent)
Clustering (Hiro)

Appendix (If you can fit everything you want to say gracefully in the main paper, it's fine not to do these):
Convolution analysis (Ben)
Benjamini-Hochberg (Rachel)
Clustering (Hiro)
Time series (Jane)
NOTE: If you are using citations in your appendix section, please uncomment the 3 other lines in the make file associated with your section. If you're not using citations, it's fine to leave the make file as is and I'll clean it up. But to get your citation references to render correctly, you'll need to uncomment the 3 other lines for your section.

Convolution

Final approach to Convolution