

CRC

Transcriptional Core Regulatory Circuitry (CRC) analysis. CRC analysis leverages the observation that almost all known master transcription factors are regulated by large enhancers or super enhancers and that the master transcription factors bind to and regulate the enhancers of other master transcription factors. CRC analysis takes in a set of enhancer or other cis-regulatory regions and a list of active genes. For known transcription factors, CRC identifies those with proximal enhancers and learns how transcription factors are connected to one another. For each transcription factor, it does this by calculating the IN degree (the number of distinct transcription factors that regulate it) and the OUT degree (the number of other transcription factors regulated by the transcription factor in question). CRC analysis also determines which enhancers are bound by each transcription factor, and this can be useful to identify transcription factor cis-regulomes and gene expression programs.
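The IN and OUT degree described above can be illustrated with a minimal sketch. It is not the CRC implementation: the function name `degree_table`, the `(regulator, target)` edge representation, and the TF labels are hypothetical, and counting a self-regulating edge in both degrees is an assumption.

```python
from collections import defaultdict


def degree_table(edges):
    """Return {tf: (in_degree, out_degree)} from (regulator, target) pairs.

    Duplicate edges are collapsed, so each degree counts distinct TFs.
    Assumption: a self-regulating edge counts toward both degrees.
    """
    regulators = defaultdict(set)  # target -> TFs that regulate it
    targets = defaultdict(set)     # regulator -> TFs it regulates
    for source, target in edges:
        targets[source].add(target)
        regulators[target].add(source)
    nodes = set(regulators) | set(targets)
    return {tf: (len(regulators[tf]), len(targets[tf])) for tf in sorted(nodes)}


# Hypothetical toy network; the repeated edge is only counted once.
edges = [
    ("TF_A", "TF_B"),
    ("TF_B", "TF_A"),
    ("TF_B", "TF_C"),
    ("TF_A", "TF_B"),
]
table = degree_table(edges)
for tf, (in_degree, out_degree) in table.items():
    print(tf, in_degree, out_degree)
```

In this toy network TF_B regulates two factors and is regulated by one, so it has an OUT degree of 2 and an IN degree of 1.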


Dependencies

The CRC software depends on external tools, including FIMO and fasta-get-markov, both of which can be found in The MEME Suite.

Install

pip install git+https://github.com/linlabcode/CRC.git

Usage

As a command line tool:

crc -e <enhancer_text_file> -g <genome_build> -s <subpeak_bed_file> -c <chromosomes_dir> -o <out_dir> -n <name>

As a Python library:

import crc

crc.crc(enhancers, genome_input, chrom_path, output, analysis_name, bam=None, subpeak_file=None,
        mask_file=None, activity_path=None, const_extension=100, number=1, motifs=None,
        config='')

Inputs/Options


All example files were generated with data from a publicly available sample under the GEO accession number GSM2712746. Reads were aligned using the hg19 genome build.


REQUIRED ARGUMENTS

-e/--enhancer_file

ROSE2 generated enhancers table (_AllEnhancers_ENHANCER_TO_GENE.txt). Example

-g/--genome

Build of the genome to be used for the analysis. Currently supports HG38, HG19, MM10, and RN6.

-c/--chrom-path

Path to a folder with a separate FASTA file for each chromosome.

-o/--output

Output folder.

-n/--name

Name of the analysis.


ONE OF THE FOLLOWING ARGUMENTS IS REQUIRED

If a BAM file is provided, valley regions will be determined and used for motif search.
If a BED file is provided, the regions from the file will be used for motif search instead.

-b/--bam

Comma-separated list of BAMs for valley finding.

-s/--subpeaks

BED file of regions to search for motifs. Example


OPTIONAL ARGUMENTS

-m/--mask

Masking file in BED format.

-a/--activity

Text file with active gene names: either a list with one gene per line or a table in which one column holds the gene names. Example

-l/--extension-length

Subpeak regions used in motif finding get extended by this length. Default is 100.

-N/--number

Number of non-overlapping motifs in a region required to assign a binding event. Default is 1.

--motifs

An additional PWM file for the analysis.

--config

Genome configuration file to overwrite default paths.
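The arguments above can be assembled into a command line from a script. The following is a minimal sketch under stated assumptions: `build_crc_command` is a hypothetical helper (not part of CRC), all file paths are placeholders, and only flags listed in this README are used. It builds the argument list and prints it; running it would additionally require CRC to be installed.

```python
import shlex


def build_crc_command(enhancer_file, genome, chrom_path, output, name,
                      bams=None, subpeaks=None,
                      extension_length=None, number=None):
    """Build a `crc` argument list from the options documented above.

    Hypothetical helper for illustration; it only assembles the command.
    """
    # One of -b/--bam or -s/--subpeaks is required.
    if not bams and not subpeaks:
        raise ValueError("provide BAM files (-b) or a subpeak BED file (-s)")

    cmd = ["crc", "-e", enhancer_file, "-g", genome,
           "-c", chrom_path, "-o", output, "-n", name]
    if bams:
        cmd += ["-b", ",".join(bams)]  # comma-separated list of BAMs
    if subpeaks:
        cmd += ["-s", subpeaks]
    if extension_length is not None:
        cmd += ["-l", str(extension_length)]
    if number is not None:
        cmd += ["-N", str(number)]
    return cmd


# Placeholder paths; replace them with real files before running.
cmd = build_crc_command(
    "sample_AllEnhancers_ENHANCER_TO_GENE.txt", "HG19",
    "chromosomes/", "crc_out/", "sample",
    subpeaks="sample_peaks.bed", number=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once CRC is installed and the paths point to real files.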


Outputs

_ADJ_LIST.txt
Adjacency list - the first label in a line is the source node and further labels in the line are target nodes.
_all_motifs.bed
BED file containing motif regions.
_all_subpeak.bed
BED file of all the subpeak regions.
_all_subpeak.fasta
FASTA file of all the subpeak regions.
_all_valleys.bed
BED file containing all valley regions used in motif search.
_bg.meme
First-order Markov background file used by FIMO for motif search.
_CLIQUE_SCORES_DEGREE.txt
List of top 100 cliques or CRCs.
_DEGREE_TABLE.txt
File describing how many TFs a specific TF interacts with (out degree) and how many TFs interact with a specific TF (in degree).
_EDGE_LIST.txt
List of edges describing which TFs interact with each other.
_EDGE_TABLE.txt
Similar to '_EDGE_LIST.txt' but it also contains the interaction region and enhancer name.
_ENHANCER_TABLE.txt
Table of enhancers.
_ENHANCER_TF_TABLE.txt
Table connecting enhancers to TFs.
_ENRICHED_CLIQUE_FACTORS.txt
Node scoring; percentage of cliques the node is found in.
_GENE_SUMMARY.txt
Table describing if a gene is a TF and which enhancers are/could be connected to it.
_GENE_TABLE.txt
Table connecting genes and enhancers.
_GENE_TF_TABLE.txt
Table connecting TFs and enhancers.
_NODELIST.txt
List of nodes, the same as in '_ENRICHED_CLIQUE_FACTORS.txt'. These TFs are likely a part of a CRC.
FIMO
Folder containing FIMO command (bash script) and output.
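
As an illustration of the adjacency list layout described for '_ADJ_LIST.txt', the sketch below turns such lines into (source, target) pairs. It assumes whitespace-separated labels and no header line, neither of which is confirmed here; `parse_adjacency_lines` and the TF labels are hypothetical.

```python
def parse_adjacency_lines(lines):
    """Convert adjacency list lines into (source, target) edge pairs.

    Assumption: labels are separated by whitespace and there is no header.
    The first label is the source node; the remaining labels are targets.
    """
    edges = []
    for line in lines:
        labels = line.split()
        if not labels:
            continue  # skip blank lines
        source, targets = labels[0], labels[1:]
        edges.extend((source, target) for target in targets)
    return edges


# Hypothetical content; a real file could be read with open(path) instead.
example_lines = ["TF_A\tTF_B\tTF_C", "TF_B\tTF_A", "", "TF_C"]
edges = parse_adjacency_lines(example_lines)
for source, target in edges:
    print(source, "->", target)
```

A source node with no targets, such as TF_C in the example, contributes no edges.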

Authors

Charles Y. Lin Baylor College of Medicine

Jost V. Koren Baylor College of Medicine

Alexander J. Federation University of Washington

Donald R. Polaski
