
cube's Introduction

Cubé: Intuitive Gene Network Search Algorithm


How It Works

Given a single-cell dataset and one or two input genes, Cubé looks for simple, nonlinear gene-gene relationships to construct a regulatory network informed by prior gene signatures. For example, Cubé might report that GeneA * GeneB ~= GeneC, which could mean that genes A & B coregulate to produce C, or that some other nonlinear relationship holds. Cubé then recursively feeds its outputs back into itself to create a gene network.
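To make the GeneA * GeneB ~= GeneC idea concrete, here is a minimal sketch using NumPy only. It is not Cubé's actual scoring code, which is not described here. It assumes logged expression values, so a product of genes becomes a sum of logs, and it uses a Pearson correlation over cells that express all three genes as a stand-in measure of fit. The function name `product_correlation` and the toy data are hypothetical.

```python
import numpy as np

def product_correlation(log_a, log_b, log_c):
    """Correlate log(A * B) with log(C) over cells expressing all three genes.

    Hypothetical helper for illustration; inputs are 1-D arrays of logged
    expression values, one entry per cell.
    """
    log_a, log_b, log_c = (np.asarray(v, dtype=float) for v in (log_a, log_b, log_c))
    # Keep only cells where all three genes are detected (nonzero)
    mask = (log_a > 0) & (log_b > 0) & (log_c > 0)
    if mask.sum() < 3:
        return float("nan")  # too few co-expressing cells to say anything
    # In log space a product is a sum: log(A * B) = log(A) + log(B)
    return float(np.corrcoef(log_a[mask] + log_b[mask], log_c[mask])[0, 1])

# Toy data in which gene C really does follow the product of A and B
rng = np.random.default_rng(0)
log_a = rng.uniform(0.5, 3.0, size=200)
log_b = rng.uniform(0.5, 3.0, size=200)
log_c = log_a + log_b + rng.normal(0.0, 0.1, size=200)
print(product_correlation(log_a, log_b, log_c))
```

A value close to 1 would suggest that C tracks the product of A and B in the cells where all three are detected. A value close to -1 would suggest an inverse relationship.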


Install

$ python3 -m pip install git+https://github.com/connerlambden/Cube.git


Running Cubé

from sc_cube import cube
import scanpy as sc
adata = sc.read_h5ad('my_expression_data.h5ad') # Load AnnData Object containing logged expression matrix
go_files = ['BioPlanet_2019.tsv', 'GeneSigDB.tsv'] # Load Gene Signatures to Search In

cube.run_cube(adata=adata, seed_gene_1='ifng', seed_gene_2='tbx21', go_files=go_files, 
            out_directory='Cubé_Results', num_search_children=4, search_depth=2)

Inputs

adata: AnnData Object with logged expression matrix

seed_gene_1: Starting search gene of interest

seed_gene_2: Optional: Additional seed gene of interest (to search for seed_gene_1 * seed_gene_2)

go_files: List of pathway files to search in. Each edge in Cubé requires all connected genes to be present in at least 2 pathways. Example files are available to download, and more can be downloaded from Enrichr.

out_directory: Folder to put results in

num_search_children: How many search children to add to the network on each iteration. For example, a value of 2 will add two children to each node.

search_depth: Recursive search depth. Values above 2 may take a long time to run
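
The go_files requirement above (all connected genes present in at least 2 pathways) can be sketched as a simple set check. This is an illustration under assumptions, not Cubé's implementation: pathways are represented as an in-memory dict of gene sets rather than parsed from the .tsv files, and the names `shared_pathways`, `edge_allowed`, and `toy_pathways` are hypothetical.

```python
def shared_pathways(genes, pathways):
    """Return the sorted names of pathways that contain every gene in `genes`."""
    genes = set(genes)
    return sorted(name for name, members in pathways.items() if genes <= set(members))

def edge_allowed(genes, pathways, min_pathways=2):
    """An edge is kept only if its genes co-occur in at least `min_pathways` pathways."""
    return len(shared_pathways(genes, pathways)) >= min_pathways

# Toy pathway collection (made-up memberships, for illustration only)
toy_pathways = {
    "pathway_1": {"ifng", "tbx21", "il2"},
    "pathway_2": {"ifng", "tbx21", "stat1"},
    "pathway_3": {"ifng", "gata3"},
}

print(shared_pathways(["ifng", "tbx21"], toy_pathways))
print(edge_allowed(["ifng", "tbx21"], toy_pathways))
print(edge_allowed(["ifng", "gata3"], toy_pathways))
```

Under this reading, supplying more pathway files can only widen the set of candidate edges, since each additional pathway is another chance for a gene set to co-occur.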


Outputs

Cubé_data_table.csv: Table showing the genes, pathways, and weight for each edge in the network. Positive correlations will have small edge weights and negative correlations will have large edge weights.

*.graphml: Network file that can be visualized in programs like Cytoscape

Cubé_network.png: Network visualization where green edges are positive correlation & red edges are negative correlation. For better visualizations, we recommend loading the .graphml file into Cytoscape


Visualizing The Product of 2 Genes Using Scanpy

import numpy as np
# Visualizing Product of 2 Genes using Scanpy (assuming adata.X is logged and sparse)
gene_1 = 'ifng'
gene_2 = 'tbx21'
# Dense 1-D expression vectors, one value per cell
expr_1 = adata[:, gene_1].X.toarray().flatten()
expr_2 = adata[:, gene_2].X.toarray().flatten()
# Keep only cells expressing both genes; copy so that writing to .obs does not touch a view
both = (expr_1 > 0) & (expr_2 > 0)
adata_expressing_both = adata[both, :].copy()
# Summing logged values and exponentiating gives the product of the two genes
adata_expressing_both.obs[gene_1 + ' * ' + gene_2] = np.exp(expr_1[both] + expr_2[both])
sc.pl.umap(adata_expressing_both, color=[gene_1 + ' * ' + gene_2])
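
A note on what "logged" means for the snippet above: exponentiating a sum of logs recovers the exact product only for a plain log transform. If the matrix was transformed with log(1 + x), as Scanpy's sc.pp.log1p does, the same expression yields (1 + a) * (1 + b) instead. That is still increasing in both genes, so it may be fine for visualization. The small check below uses made-up values to show the difference.

```python
import numpy as np

a, b = 3.0, 4.0  # made-up raw expression values for two genes in one cell

# Plain log transform: exp(log a + log b) is exactly a * b
plain = np.exp(np.log(a) + np.log(b))

# log1p transform: exp(log1p a + log1p b) is (1 + a) * (1 + b)
shifted = np.exp(np.log1p(a) + np.log1p(b))

# Undo log1p with expm1 first to recover the exact product
exact = np.expm1(np.log1p(a)) * np.expm1(np.log1p(b))

print(plain, shifted, exact)
```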

Why Cubé?

Single-cell RNA sequencing offers unprecedented resolution into the transcriptome of individual cells. However, the sheer complexity of the data and high rates of dropout pose interpretive and computational challenges when deriving biological meaning and gene relationships. Many methods have been proposed for inferring gene regulatory networks, and their results can differ dramatically depending on the initial assumptions made 😬. Even in the case of unsupervised learning (UMAP) or clustering (Leiden), it's not clear how to balance local and global structure or which data features are most important. Additionally, these "black-box" machine learning methods are closed to scrutiny of their inner workings, cannot explicate logical, understandable steps, and tend to be fragile to model parameters.

Cubé addresses the dropout issue by only comparing a set of genes in cells that have nonzero expression of every gene in the set. This removes the need for biased imputation methods and focuses each relationship on the relevant cells. Cubé addresses the interpretability problem by presenting solutions in the form expression(gene1) ~= expression(gene2) * expression(gene3), which succinctly expresses nonlinear relationships between specific genes in an understandable way without any pesky parameters. Since Cubé samples from the space of all possible nonlinear gene-gene pairs, results have high representational capacity and low ambiguity. Cubé is a descriptive search algorithm that optimizes for biologically & statistically informed gene patterns.


Special Thanks to Vijay Kuchroo, Ana Anderson, Lloyd Bod, & Aviv Regev

Contact: [email protected]

cube's Issues

adata not defined

Got past the last one, but now encountering another - adata not defined in the function def build_data_structures(go_files, lower_gene_names)

go_files = ['BioPlanet_2019.tsv'] # Load Gene Signatures to Search In

cube.run_cube(adata=mdata_nucall, seed_gene_1='Alb', seed_gene_2='Fgb', go_files=go_files, 
            out_directory='Cubé_Results', num_search_children=4, search_depth=2)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-5474995593bf> in <module>
      2 
      3 cube.run_cube(adata=mdata_nucall, seed_gene_1='Alb', seed_gene_2='Fgb', go_files=go_files, 
----> 4             out_directory='Cubé_Results', num_search_children=4, search_depth=2)

\cube.py in run_cube(adata, seed_gene_1, seed_gene_2, go_files, out_directory, num_search_children, search_depth)
    346     if seed_gene_1 == seed_gene_1.lower():
    347         lower_gene_names = True
--> 348     G = find_cancellations(adata, seed_gene_1, seed_gene_2, search_depth, lower_gene_names, go_files, num_search_children=num_search_children)
    349     out_name = 'Cubé_' + seed_gene_1
    350     if seed_gene_2 is not None:

\cube.py in build_data_structures(go_files, lower_gene_names)
    255     go_dict = dict()
    256     genes_set = set()
--> 257     valid_gene_symbols = set(adata.var.index)
    258     go_adata_dict = dict()
    259     pathway_list = []

NameError: name 'adata' is not defined
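
The traceback above indicates that build_data_structures reads adata even though adata is not among its parameters (go_files, lower_gene_names). One possible fix is to pass the gene symbols in explicitly. The sketch below uses a simplified, hypothetical stand-in for the function body. The extra parameter `valid_gene_symbols` is an assumption, not the package's actual signature, and pathway parsing is omitted.

```python
def build_data_structures(go_files, lower_gene_names, valid_gene_symbols):
    """Hypothetical, simplified stand-in for the function in the traceback.

    The gene symbols arrive as an argument instead of being read from a
    name (`adata`) that does not exist inside this function's scope.
    """
    # Parsing of go_files and handling of lower_gene_names are omitted here
    return set(valid_gene_symbols)

# In the real call site this would be something like adata.var.index;
# a plain list is used so the sketch runs on its own.
symbols = ['Alb', 'Fgb', 'Fga']
print(sorted(build_data_structures([], False, symbols)))
```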

gene name capitalization error

Mouse genes in scanpy (first letter capitalized)

call:
cube.run_cube(adata=mdata_heps, seed_gene_1='Alb', seed_gene_2='Fga', go_files=go_files, out_directory='Cube_Results', num_search_children=4, search_depth=2)

----> 1 cube.run_cube(adata=mdata_heps, seed_gene_1='Alb', seed_gene_2='Fga', go_files=go_files, out_directory='Cube_Results', num_search_children=4, search_depth=2)

~\Miniconda3\envs\scrna\lib\site-packages\sc_cube\cube.py in run_cube(adata, seed_gene_1, seed_gene_2, go_files, out_directory, num_search_children, search_depth)
    345     if seed_gene_1 == seed_gene_1.lower():
    346         lower_gene_names = True
--> 347     G = find_cancellations(adata, seed_gene_1, seed_gene_2, search_depth, lower_gene_names, go_files, num_search_children=num_search_children)
    348     out_name = 'Cubé_' + seed_gene_1
    349     if seed_gene_2 is not None:

UnboundLocalError: local variable 'lower_gene_names' referenced before assignment

Looks like either gene names assumed to be all lower case, or if not just missing a lower_gene_names=False

Neat concept, hope to get working!
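
For reference, the pattern visible in the traceback (lower_gene_names assigned only inside the if branch) can be reproduced and fixed in isolation. The function names below are hypothetical and wrap only the two lines shown in the traceback. The fix is the one suggested above: initialize lower_gene_names to False before the check.

```python
def detect_lower_gene_names_buggy(seed_gene_1):
    # Mirrors the traceback: the variable is only bound when the name is all lowercase
    if seed_gene_1 == seed_gene_1.lower():
        lower_gene_names = True
    return lower_gene_names  # raises UnboundLocalError for names like 'Alb'

def detect_lower_gene_names(seed_gene_1):
    # Default to False so mixed-case (e.g. mouse) gene names are handled
    lower_gene_names = False
    if seed_gene_1 == seed_gene_1.lower():
        lower_gene_names = True
    return lower_gene_names

print(detect_lower_gene_names('ifng'))
print(detect_lower_gene_names('Alb'))
try:
    detect_lower_gene_names_buggy('Alb')
except UnboundLocalError as err:
    print('buggy version fails:', err)
```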

local variable 'lower_gene_names' referenced before assignment

Hi Connor, thanks for the last updates -- still not working, here's the latest error. Let me know if you want a minimal mouse example dataset to play with, would be happy to share

go_files = ['BioPlanet_2019.tsv'] # Load Gene Signatures to Search In

cube.run_cube(adata=mdata_nuc, seed_gene_1='Alb', seed_gene_2='Fgb', go_files=go_files, 
            out_directory='Cubé_Results', num_search_children=4, search_depth=2)
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-5-fdfb6c0198ac> in <module>
      2 
      3 cube.run_cube(adata=mdata_nuc, seed_gene_1='Alb', seed_gene_2='Fgb', go_files=go_files, 
----> 4             out_directory='Cubé_Results', num_search_children=4, search_depth=2)

~\Miniconda3\envs\scrna\lib\site-packages\sc_cube\cube.py in run_cube(adata, seed_gene_1, seed_gene_2, go_files, out_directory, num_search_children, search_depth)
    345     if seed_gene_1 == seed_gene_1.lower():
    346         lower_gene_names = True
--> 347     G = find_cancellations(adata, seed_gene_1, seed_gene_2, search_depth, lower_gene_names, go_files, num_search_children=num_search_children)
    348     out_name = 'Cubé_' + seed_gene_1
    349     if seed_gene_2 is not None:

UnboundLocalError: local variable 'lower_gene_names' referenced before assignment
