Git Product home page Git Product logo

txburst's Introduction

Genomic encoding of transcriptional burst kinetics

Repository for transcriptional burst kinetics inference and analysis from allelic single-cell RNA-seq described in the article Genomic encoding of transcriptional burst kinetics by Anton J. M. Larsson, Per Johnsson, Michael Hagemann-Jensen, Leonard Hartmanis, Omid R. Faridani, Björn Reinius, Åsa Segerstolpe, Chloe M. Rivera, Bing Ren and Rickard Sandberg Nature. 2019 Jan 2. doi: 10.1038/s41586-018-0836-1. [Epub ahead of print].

txburst scripts

System Requirements for inference and hypothesis testing scripts

txburstML.py, txburstPL.py and txburstTEST.py are all python3 scripts with dependencies:

pandas: 0.19.2
numpy: 1.9.0
scipy: 1.0.0
joblib: 0.11

No further installation is needed.

txburstML.py

txburstML.py estimates parameters of bursting kinetics given transcript counts for an allele of a gene.

If you have estimated the allelic transcript counts from the fraction of allelic reads, it is important to consider what is missing data in this case. Genes with expression (reads) but no allelic reads are different from genes without expression (and therefore no allelic reads). We handle the first case as missing data, since it is not possible to assign the expression. In the second case, we replace the NaN with a 0 since there is genuinely no detected expression. Omitting this step may have severe effects on the quality of the inference.

Usage

usage: txburstML.py [-h] [--njobs NJOBS] file

Maximum likelihood inference of bursting kinetics from scRNA-seq data

positional arguments: file .csv file with allelic-resolution UMI counts

optional arguments: -h, --help show this help message and exit --njobs NJOBS Number of jobs for the parallelization, default 50

Output

txburstML.py outputs a pickled pandas dataframe which for each gene contains an array with three entries [k_on, k_off, k_syn] where the burst frequency = k_on and burst size = k_syn/k_off, and a boolean indicating whether gene passed a rudimentary filtering step based on the quality of the inference procedure. For example:

gene name parameters keep
gene1 [k_on,k_off,k_syn]_gene1 True
gene2 [k_on,k_off,k_syn]_gene2 False

Example

./txburstML.py UMI_counts.csv

The output of this command is a pickled pandas dataframe UMI_counts_ML.pkl which can be loaded into python with the command

pd.read_pickle('UMI_counts_ML.pkl')

txburstPL.py

txburstPL.py calculates the confidence intervals parameters of bursting kinetics given transcript counts for an allele of a gene. Requires txburstML.py to be run on the dataset to provide input for txburstPL.py.

Usage

usage: txburstPL.py [-h] [--file file] [--njobs NJOBS] [--alpha ALPHA] [--MLFile MLFILE]

Confidence intervals on parameters of bursting kinetics from scRNA-seq data

optional arguments: -h, --help show this help message and exit --file file .csv file file with allelic-resolution transcript counts --njobs NJOBS Number of jobs for the parallelization, default 50 --alpha ALPHA Alpha significance level (default 0.05) --MLFile MLFILE Maximum Likelihood file (required)

Output

txburstML.py outputs a pickled pandas dataframe which for each gene has three arrays with the point estimate from txburstML.py and the confidence intervals for burst frequency and size. For example:

gene name point estimates bf_CI bs_CI
gene1 [k_on,k_off,k_syn]_gene1 [bf_point,bf_lower,bf_upper]_gene1 [bs_point,bs_lower,bs_upper]_gene1
gene2 [k_on,k_off,k_syn]_gene2 [bf_point,bf_lower,bf_upper]_gene2 [bs_point,bs_lower,bs_upper]_gene2

Example

./txburstPL.py --file UMI_counts.csv --MLFile UMI_counts_ML.pkl

The output of this command is a pickled dataframe UMI_counts_PL.pkl which can be loaded into python with the command

pd.read_pickle('UMI_counts_PL.pkl')

txburstTEST.py

txburstTEST.py performs hypothesis testing for differential burst frequency and size between two experiments. Requires txburstML.py to be run on the dataset to provide input for txburstTEST.py.

Usage

usage: txburstTEST.py [-h] [--file1 file1] [--file2 file2] [--njobs NJOBS] [--ML1 ML1] [--ML2 ML2]

Hypothesis testing of differences in bursting kinetics from scRNA-seq data

optional arguments: -h, --help show this help message and exit --file1 file1 .csv file 1 with allelic-resolution transcript counts --file2 file2 .csv file 2 with allelic-resolution transcript counts --njobs NJOBS Number of jobs for the parallelization, default 50 --ML1 ML1 Maximum Likelihood file 1 (required) --ML2 ML2 Maximum Likelihood file 2 (required)

Output

txburstTEST.py outputs a pickled pandas dataframe with the two point estimates from txburstML.py and p-values for the significance tests for differential bursting kinetics. For example:

gene name point estimates 1 point estimates 2 bf_pvalue bs_pvalue
gene1 [k_on,k_off,k_syn]_gene1_sample1 [k_on,k_off,k_syn]_gene1_sample2 burst frequency pvalue gene1 burst size pvalue gene1
gene2 [k_on,k_off,k_syn]_gene2_sample1 [k_on,k_off,k_syn]_gene2_sample2 burst frequency pvalue gene2 burst size pvalue gene2

Example

./txburstTEST.py --file1 UMI_counts.csv --file2 UMI_counts_2.csv --ML1 UMI_counts_ML.pkl --ML2 UMI_counts_2_ML.pkl

The output of this command is a pickled dataframe UMI_counts_PL.pkl which can be loaded into python with the command

pd.read_pickle('UMI_counts_TEST.pkl')

txburst's People

Contributors

antonjmlarsson avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.