greenelab / biobombe



BioBombe: Sequentially compressed gene expression features enhance biological signatures

Home Page: https://greenelab.github.io/BioBombe/

License: BSD 3-Clause "New" or "Revised" License

Languages: Jupyter Notebook 72.80%, Python 1.57%, R 1.29%, Shell 0.07%, HTML 24.27%

Topics: gene-sets, msigdb, gene-expression, tcga, compression, hetnet, biobombe, network, autoencoder

biobombe's Issues

Missing x axis label in main coverage figure

Update Supplementary Figure S4 - Stability

Switch labels for panels B and C

Make the Mission of Predicting Cancer Types and Mutations Clear

Need to write carefully about this point in the README (see #90) and especially in the manuscript

Determine which genes were removed from the GTEx Signature Transformation

Restructure visualize_genesets.R

Currently the results are being read in for all gene sets. They should be read in once, and then visualized and subset.
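The restructuring described above (read once, then subset) can be sketched as follows. visualize_genesets.R itself is an R script; this is only a language-neutral illustration in Python, and it assumes a single tab-separated results table with a hypothetical `geneset` column. The function names `load_results_once` and `subset` are hypothetical and not part of the repository.

```python
import csv
import io
from collections import defaultdict


def load_results_once(handle):
    """Read every row of the results table a single time and index rows by gene set.

    Assumes a tab-separated table with a hypothetical `geneset` column.
    """
    by_geneset = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        by_geneset[row["geneset"]].append(row)
    return dict(by_geneset)


def subset(results_by_geneset, genesets):
    """Return only the rows for the requested gene sets, without re-reading any file."""
    return {g: results_by_geneset[g] for g in genesets if g in results_by_geneset}


# Tiny in-memory table standing in for the real (hypothetical) results file
example = io.StringIO(
    "geneset\tfeature\tscore\n"
    "REACTOME_A\tvae_3\t2.5\n"
    "REACTOME_B\tpca_1\t1.1\n"
    "REACTOME_A\tica_7\t0.4\n"
)
results = load_results_once(example)
selected = subset(results, ["REACTOME_A"])
```

Each plotting call then works from `selected` (or any other subset) instead of triggering a fresh read for every gene set.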

Add TCGA Analysis

Answer the question of how much additional information BioBombe serial compression provides compared with LASSO on 200 features

Add TCGA Classify Module README

Need to add results generated in #89 to an archived resource

Add t-test for NBL cell lines in MYCN amplification signature application
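One way to compare signature scores between MYCN-amplified and non-amplified NBL cell lines is an unequal-variance (Welch) two-sample t-test. The sketch below assumes the scores are already split into two lists; the issue does not state which t-test variant is intended, so Welch is an assumption here, and `welch_t` is a hypothetical helper. It uses only the standard library and returns the statistic and Welch-Satterthwaite degrees of freedom; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same test.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # statistics.variance uses the n - 1 (sample) denominator
    se1 = statistics.variance(a) / len(a)
    se2 = statistics.variance(b) / len(b)
    t = (m1 - m2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (len(a) - 1) + se2 ** 2 / (len(b) - 1))
    return t, df


# Hypothetical signature scores, not real cell line data
amplified = [2.1, 2.4, 1.9, 2.6]
not_amplified = [0.3, 0.8, 0.5, 0.1, 0.6]
t_stat, dof = welch_t(amplified, not_amplified)
```

Welch's variant avoids assuming the two groups share a variance, which is safer when the amplified and non-amplified groups differ in size.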

Add GTEx Module README and analysis bash script

Validate MYCN status in NBL Cell Lines

Related to #163

Resources include https://www.nature.com/articles/sdata201733/tables/3 and https://figshare.com/articles/STAR-reads/7613975

Update Figure 6 - Coverage Analysis

I don't think I need to label every facet; labeling just A, B, and C is probably sufficient

Additional Analysis: Applications to External Datasets

Get scores for all top scoring features across k dimension and algorithm for two publicly available datasets.

See if the score is associated with "separation" of target samples

Can also split out "monocyte" vs other in that plot

Update GTEx Supplementary Figure

Currently, A and B are plotted on the same row in two columns. I need to use two rows and one column instead

Reorder Modules to Squish in New Module 7

I am adding a new module 7 in #71; I will need to update the other module numbers (GTEx and TCGA)

Rename Module 6 to `6.biobombe-projection`

Also, see what happens for all feature predictions on TCGA cancer type and mutation

Update the colors in the stability boxplot figures

The colors in these figures are not adding anything - they are actually a bit confusing

Update GTEx Geneset Panels C and D

After changes are merged in #125, the function plot_gene_set() will change. I will need to rerun the visualize notebook in the gtex module after the update

Remake Supplementary Figure 1

Need to update the strip text background color; the figure should also fit in portrait orientation

Update TCGA Supplementary Figure

Switch panels A and B; also, the two panels currently in A are not in the correct order

Find Sex Feature in TCGA

Related to #163, as was previously done in GTEx. Also, in both cases the box plots can be changed to display different correlations with the transformed data

Update Supplementary Figure 3 - Correlation Summary

Switch panels c and d with a and b

Update GTEx Supplementary Figure

Need to lowercase the panel labels and change z to k

https://github.com/greenelab/BioBombe/blob/master/8.gtex-interpret/1.visualize-gtex-blood-interpretation.ipynb

Remove Redundant Supplementary TCGA Figures

A couple of figures are redundant; they were added twice under different names

Add Vince as an Author

Will need to update the author list (and title) on the website once new preprint is posted

related to #181

cc @vincerubinetti

Add analysis to supplementary TCGA classification

Need to predict with only the single top feature, and also determine which z dimension the features come from

Add Ensemble Analysis to TCGA Figure

We are interested in comparing ensemble VAE performance to ensemble multi-algorithm performance in cancer type and mutation prediction.

Move E and F of GTEx Figure to Supplement

Add Results Table For TCGA Classifier Figure Panel D

Probably good to map compressed features with high weights to their respective genesets

Consider changing `Z` dimension to `k` dimension

This may alleviate potential confusion between z dimension and z score language

Update TCGA Ensemble Model Coefficient Figure (panel h)

Instead of plotting weight sum per algorithm, plot average absolute value weight per algorithm
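The reason for the switch is that signed weights can cancel in a sum: an algorithm with large positive and large negative coefficients can end up with a sum near zero. A minimal sketch of the proposed summary, assuming the ensemble coefficients are available as hypothetical `(algorithm, weight)` pairs (`mean_abs_weight_by_algorithm` is a hypothetical helper, not a repository function):

```python
from collections import defaultdict


def mean_abs_weight_by_algorithm(coefficients):
    """Average absolute coefficient per algorithm from (algorithm, weight) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for algorithm, weight in coefficients:
        totals[algorithm] += abs(weight)
        counts[algorithm] += 1
    return {alg: totals[alg] / counts[alg] for alg in totals}


# Hypothetical coefficients: the two pca weights sum to 0.0 but are clearly not negligible
coefs = [("pca", 0.5), ("pca", -0.5), ("vae", 0.2)]
summary = mean_abs_weight_by_algorithm(coefs)
```

Averaging rather than summing also keeps algorithms with more features in the ensemble from looking important simply because they contribute more coefficients.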

Sample Correlation for HGSC Subtypes

Could be interesting to observe the correlation pattern for OV

https://github.com/greenelab/interpret-compression/blob/master/4.analyze-components/figures/TCGA/sample-correlation/sample-type/sample-correlation_OV_TCGA_signal_pearson.png?raw=true

Split out by HGSC subtype assignment

Change y axis label for Feature Rank Plot

The plot generated here needs an updated y axis label. It should read: "Absolute Rank Enrichment"

Add Supplementary Table for Signature Genes

Need to track Neutrophils_HPCA_2 and Monocytes_FANTOM_2 genes

Add Transcription Factor Analysis to Supplementary Coverage Figure

Move SVCCA

SVCCA does not seem to work for the sample activation patterns in our models. I will apply SVCCA to the weight matrices instead to see if the results appear more promising.
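A sketch of applying SVCCA to two weight matrices, assuming both are genes × k arrays whose rows refer to the same genes in the same order. It follows the general SVCCA recipe: reduce each matrix to the top singular directions explaining `var_kept` of the variance, then compute canonical correlations between the reduced matrices (the singular values of the product of their orthonormal bases). The 0.99 threshold and the helper names are assumptions, not the repository's implementation; the sketch requires NumPy.

```python
import numpy as np


def _top_directions(x, var_kept):
    """Center columns, then keep the top singular directions explaining var_kept of the variance."""
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    keep = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
    return u[:, :keep] * s[:keep]


def svcca(w1, w2, var_kept=0.99):
    """Mean canonical correlation between two weight matrices that share rows (genes)."""
    a = _top_directions(w1, var_kept)
    b = _top_directions(w2, var_kept)
    # Orthonormal bases of each reduced subspace; their cross-product's
    # singular values are the canonical correlations
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    rho = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))


# Hypothetical weight matrices: 200 genes by 5 latent features each
rng = np.random.default_rng(0)
w_model_a = rng.normal(size=(200, 5))
w_model_b = rng.normal(size=(200, 5))
same_score = svcca(w_model_a, w_model_a)
unrelated_score = svcca(w_model_a, w_model_b)
```

Identical matrices score 1, while unrelated random weights score much lower; a score near 1 between two models would indicate their weight matrices span similar gene subspaces.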

I removed the colorblindr dependency in #13 because the package is not currently available as a conda recipe. Adding back this dependency will require a conda-forge pull request that I will save for a later date.

Update panels in coverage figure

Currently, the panels are labeled by gene set, they should be lettered by model type

Update Figure 1 - Add numbering for each BioBombe analysis bit

Describe Directory Structure in Module README

A more complete description of the directory tree structure will help orient a new viewer to the results.

Hex Color Tables

As @ajlee21 pointed out in #56:

Is it worth creating a lookup table with colors -- HEX code as you've done before?

It will be good to update HEX colors in a table lookup. Also related to #14
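A single lookup table could look like the sketch below: one mapping from algorithm name to hex code, validated and written out as a two-column table that every plotting script reads. The algorithm keys are assumed to be the five BioBombe compression algorithms, and the hex values are placeholders only, not the palette used in the figures. `write_color_table` is a hypothetical helper.

```python
import csv
import io
import re

# Placeholder colors: hypothetical values, not the BioBombe figure palette
ALGORITHM_COLORS = {
    "pca": "#1b9e77",
    "ica": "#d95f02",
    "nmf": "#7570b3",
    "dae": "#e7298a",
    "vae": "#66a61e",
}

HEX_PATTERN = re.compile(r"^#[0-9a-fA-F]{6}$")


def write_color_table(colors, handle):
    """Validate each hex code and write a tab-separated algorithm -> color table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["algorithm", "hex_color"])
    for name, code in sorted(colors.items()):
        if not HEX_PATTERN.match(code):
            raise ValueError(f"invalid hex color for {name}: {code}")
        writer.writerow([name, code])


buffer = io.StringIO()
write_color_table(ALGORITHM_COLORS, buffer)
```

Keeping the colors in one file means a palette change (for example, after colorblindr is available again) only has to be made once.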
