
gensum's Introduction

GenSum

RNA-seq gene expression counter

Counts reads on genes. Fast. Inspired by htseq-count. GenSum supports two quantification methods, union and strict, which are compatible with the htseq-count modes union and intersection-strict. GenSum and htseq-count produce identical counts in union mode and differ only slightly in strict mode (about 1 in 35000 reads). GenSum is much faster: a bam file with 15 million reads can be quantified in around 5 s, while htseq-count takes about 11 minutes.
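The difference between the two methods can be sketched with htseq-count's rules, which GenSum aims to be compatible with. The Rust sketch below is not GenSum's implementation; `Assignment`, `classify`, `union_mode` and `strict_mode` are hypothetical names. It assumes that, for every aligned base of a read, the set of genes whose exons cover that base is already known. Union takes the union of these sets, strict takes their intersection, and a read is assigned only when exactly one gene remains.

```rust
use std::collections::BTreeSet;

/// Outcome for one read, named after htseq-count's categories.
#[derive(Debug, PartialEq)]
enum Assignment {
    Gene(String),
    NoFeature,
    Ambiguous,
}

/// Exactly one remaining gene means the read is assigned to it.
fn classify(genes: BTreeSet<String>) -> Assignment {
    match genes.len() {
        0 => Assignment::NoFeature,
        1 => Assignment::Gene(genes.into_iter().next().unwrap()),
        _ => Assignment::Ambiguous,
    }
}

/// Union: every gene that overlaps any aligned base of the read.
fn union_mode(per_base: &[BTreeSet<String>]) -> Assignment {
    let mut all = BTreeSet::new();
    for genes in per_base {
        all.extend(genes.iter().cloned());
    }
    classify(all)
}

/// Strict: only genes that cover every aligned base. A single base outside
/// all exons (an empty set) empties the intersection.
fn strict_mode(per_base: &[BTreeSet<String>]) -> Assignment {
    let mut bases = per_base.iter();
    let first = match bases.next() {
        Some(genes) => genes.clone(),
        None => return Assignment::NoFeature,
    };
    let common = bases.fold(first, |acc, genes| acc.intersection(genes).cloned().collect());
    classify(common)
}
```

For a read that hangs over the edge of an exon of gene A, the per-base sets are {A} and {}: union assigns the read to A, while strict reports no feature because not every base lies within the exon.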

Installation

For now, build manually using the Rust toolchain (https://rustup.rs/):

cargo build --release
# binary ends up here:
./target/release/gensum --help

Docker containers

Docker containers for gensum are available on Docker Hub and ghcr (GitHub Container Registry).

Input

The required input is a GTF file that contains the exons and their gene ids. GenSum uses a very naive parser that extracts only 'exon' features. Each exon feature must have a gene_id entry in its attributes column, of the form gene_id "<the gene id to count>". It is recommended to use the GTF files generated by the Ensembl team at: http://ftp.ensembl.org/pub/current_gtf/
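A minimal sketch of such a naive exon parser, assuming the standard nine tab-separated GTF columns (feature type in the third column, attributes in the ninth). This is hypothetical code, not GenSum's parser; `Exon`, `parse_exon_line` and `extract_gene_id` are illustrative names.

```rust
/// One exon record; GTF coordinates are 1-based and inclusive.
#[derive(Debug, PartialEq)]
struct Exon {
    chrom: String,
    start: u64,
    end: u64,
    strand: char,
    gene_id: String,
}

/// Returns Some(Exon) for 'exon' lines with a gene_id, None for anything else
/// (comments, genes, transcripts, malformed lines).
fn parse_exon_line(line: &str) -> Option<Exon> {
    if line.starts_with('#') {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 9 || cols[2] != "exon" {
        return None;
    }
    let gene_id = extract_gene_id(cols[8])?;
    Some(Exon {
        chrom: cols[0].to_string(),
        start: cols[3].parse().ok()?,
        end: cols[4].parse().ok()?,
        strand: cols[6].chars().next()?,
        gene_id,
    })
}

/// Finds `gene_id "<value>"` among the ';'-separated attributes.
fn extract_gene_id(attributes: &str) -> Option<String> {
    attributes
        .split(';')
        .map(str::trim)
        .find_map(|field| field.strip_prefix("gene_id "))
        .map(|value| value.trim().trim_matches('"').to_string())
}
```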

The second input is the .bam file created by an aligner; TopHat, HISAT2 and STAR should all work fine. Stranded libraries and paired-end data are supported. For a stranded RNA library, supply the library type with the --strandness flag so that only correctly oriented reads are counted. Paired-end reads are expected to be oriented inwards (--->...<---). The bam file does not need to be sorted by position or name, but each paired-end read is kept in memory until its mate is encountered, so memory usage depends on how far apart mates are stored in the file.
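One way to express the strandness check, assuming htseq-count's conventions: for a forward library, read 1 (or a single-end read) must align to the gene's strand; for a reverse library, to the opposite strand. Because pairs are expected to face inwards, read 2 is flipped before comparing. `Strandness`, `from_flag` and `strand_ok` are hypothetical names; the orientation and mate inputs correspond to SAM flag bits 0x10 and 0x80.

```rust
/// Library types accepted by --strandness (F, R, U).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strandness {
    Forward,
    Reverse,
    Unstranded,
}

impl Strandness {
    /// Maps the single-letter option value to a library type.
    fn from_flag(c: char) -> Option<Self> {
        match c {
            'F' => Some(Strandness::Forward),
            'R' => Some(Strandness::Reverse),
            'U' => Some(Strandness::Unstranded),
            _ => None,
        }
    }
}

/// Hypothetical check: may this read be counted on a gene on the given strand?
/// `read_reverse`: alignment is on the reverse strand (SAM flag 0x10).
/// `second_mate`: read is read 2 of a pair (SAM flag 0x80); false for single-end.
/// `gene_plus`: the gene is annotated on the '+' strand.
fn strand_ok(lib: Strandness, read_reverse: bool, second_mate: bool, gene_plus: bool) -> bool {
    // With inward-facing pairs, read 2 points opposite to read 1, so flipping
    // read 2 gives the fragment orientation as seen from read 1.
    let fragment_plus = read_reverse == second_mate;
    match lib {
        Strandness::Unstranded => true,
        Strandness::Forward => fragment_plus == gene_plus,
        Strandness::Reverse => fragment_plus != gene_plus,
    }
}
```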

Options

USAGE:
    gensum [FLAGS] [OPTIONS] --bam <FILE> --gtf <FILE>

FLAGS:
    -h, --help        Prints help information
    -n, --nosingle    Do not count paired-end reads that have only 1 mapped end. Default allows one mapped end. Only
                      affects paired-end reads.
    -d, --usedups     Also count read (pairs) marked as (optical) duplicate, default excludes duplicates. Requires a bam
                      files processed with a markdups tool
    -V, --version     Prints version information

OPTIONS:
    -b, --bam <FILE>                 The bam file to quantify
    -f, --gtf <FILE>                 The .gtf reference transcriptome file
    -q, --mapq <0-255>               The minimum required mapping quality to include [default: 10]
    -o, --output <FILE>              The output file
    -m, --method <qmethod>           The quantification method, 'strict' or 'union'. 'union' counts all genes that
                                     overlap any part of the reads, 'strict' requires the read to map within the exon
                                     boundaries. [default: union]  [possible values: union, strict]
    -s, --strandness <strandness>    The RNA library strandness [F]orward, [R]everse or [U]nstranded [default: U]
                                     [possible values: F, R, U]
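The --mapq, --usedups and --nosingle options act as per-read filters. A sketch of how they might combine, using flag bits from the SAM specification (0x1 paired, 0x4 unmapped, 0x8 mate unmapped, 0x400 duplicate). `ReadFilter` and `keep` are hypothetical names, and treating --mapq as an inclusive minimum is an assumption based on the help text.

```rust
// SAM flag bits (SAM specification, FLAG field).
const FLAG_PAIRED: u16 = 0x1;
const FLAG_UNMAPPED: u16 = 0x4;
const FLAG_MATE_UNMAPPED: u16 = 0x8;
const FLAG_DUPLICATE: u16 = 0x400;

/// Hypothetical filter mirroring the --mapq, --usedups and --nosingle options.
struct ReadFilter {
    min_mapq: u8,    // --mapq (default 10)
    use_dups: bool,  // --usedups
    no_single: bool, // --nosingle
}

impl ReadFilter {
    /// Returns true if a read with these SAM flags and mapping quality is counted.
    fn keep(&self, flags: u16, mapq: u8) -> bool {
        if (flags & FLAG_UNMAPPED) != 0 || mapq < self.min_mapq {
            return false;
        }
        // Duplicates are only recognised if a markdups tool has set flag 0x400.
        if !self.use_dups && (flags & FLAG_DUPLICATE) != 0 {
            return false;
        }
        // A paired-end read whose mate did not map.
        let one_mapped_end = (flags & FLAG_PAIRED) != 0 && (flags & FLAG_MATE_UNMAPPED) != 0;
        !(self.no_single && one_mapped_end)
    }
}
```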

Output

The output is a simple two-column, tab-delimited file. The first column contains the gene_id or a descriptive name for unassigned reads; the second column contains the number of reads counted on that gene.
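The output can be read back with a few lines of code, for example to compare totals between runs. A hypothetical Rust sketch (`parse_counts` is an illustrative name) that parses the two tab-separated columns into a map; since the exact names GenSum uses for unassigned reads are not listed here, those rows are simply stored like genes.

```rust
use std::collections::HashMap;

/// Parses two-column, tab-delimited count output into name -> count.
fn parse_counts(text: &str) -> Result<HashMap<String, u64>, String> {
    let mut counts = HashMap::new();
    for (i, line) in text.lines().enumerate() {
        if line.trim().is_empty() {
            continue; // tolerate a trailing blank line
        }
        let (name, value) = line
            .split_once('\t')
            .ok_or_else(|| format!("line {}: expected two tab-separated columns", i + 1))?;
        let count: u64 = value
            .trim()
            .parse()
            .map_err(|e| format!("line {}: bad count {:?}: {}", i + 1, value, e))?;
        counts.insert(name.to_string(), count);
    }
    Ok(counts)
}
```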

gensum's People

Contributors

veldsla

Forkers

matthiaszepper

gensum's Issues

Build fails with conda

Reproducer:

git clone https://github.com/NKI-GCF/gensum
cd gensum
conda install rust
cargo build --release

Result:

   Compiling gensum v0.1.0 (/home/<user>/Projects/gensum)
error: linking with `/opt/anaconda/envs/rust/bin/x86_64-conda-linux-gnu-cc` failed: exit status: 1
  |
  = note: "/opt/anaconda/envs/rust/bin/x86_64-conda-linux-gnu-cc" "-m64" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.0.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.1.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.10.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.11.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.2.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.3.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.4.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.5.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.6.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.7.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.8.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.gensum.6b8eeb36-cgu.9.rcgu.o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699.3ai763jtq5v65lqf.rcgu.o" "-Wl,--as-needed" "-L" "/home/<user>/Projects/gensum/target/release/deps" "-L" "/usr/lib" "-L" "/home/<user>/Projects/gensum/target/release/build/lzma-sys-82d560a9ec2a96e8/out" "-L" "/home/<user>/Projects/gensum/target/release/build/hts-sys-63c9e5a7389b1df2/out" "-L" "/home/<user>/Projects/gensum/target/release/build/curl-sys-a3e5f0f83674a732/out/build" "-L" "/home/<user>/Projects/gensum/target/release/build/libz-sys-26887ec18a5d8ff4/out/lib" "-L" "/home/<user>/Projects/gensum/target/release/build/libz-sys-26887ec18a5d8ff4/out/lib" "-L" 
"/home/<user>/Projects/gensum/target/release/build/openssl-sys-548701f82ae0d3f1/out/openssl-build/install/lib" "-L" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/home/<user>/Projects/gensum/target/release/deps/libniffler-56b4c6949e42dd44.rlib" "/home/<user>/Projects/gensum/target/release/deps/libxz2-23eaeb3809d709b9.rlib" "/home/<user>/Projects/gensum/target/release/deps/libbzip2-60f74516ee42d617.rlib" "/home/<user>/Projects/gensum/target/release/deps/libflate2-d09d8813daf84f72.rlib" "/home/<user>/Projects/gensum/target/release/deps/libminiz_oxide-d16dbb21614b649d.rlib" "/home/<user>/Projects/gensum/target/release/deps/libadler-e75f7594b8bae6be.rlib" "/home/<user>/Projects/gensum/target/release/deps/libcrc32fast-7ccf8c75e49749e3.rlib" "/home/<user>/Projects/gensum/target/release/deps/libenum_primitive-e09ea9aedc1a7c7e.rlib" "/home/<user>/Projects/gensum/target/release/deps/libnum_traits-8c682da58e2af5a8.rlib" "/home/<user>/Projects/gensum/target/release/deps/libcfg_if-f6a35cef654aa08d.rlib" "/home/<user>/Projects/gensum/target/release/deps/librust_htslib-1cff1a0d1341cc83.rlib" "/home/<user>/Projects/gensum/target/release/deps/libhts_sys-7e8095ab54e350ce.rlib" "/home/<user>/Projects/gensum/target/release/deps/libcurl_sys-60dba45bb7001c8f.rlib" "/home/<user>/Projects/gensum/target/release/deps/libopenssl_sys-6c280da2e4e7a01f.rlib" "/home/<user>/Projects/gensum/target/release/deps/liblzma_sys-abebe815e7a84592.rlib" "/home/<user>/Projects/gensum/target/release/deps/libbzip2_sys-6e191dbb478ea6a7.rlib" "/home/<user>/Projects/gensum/target/release/deps/liblibz_sys-cdd411373bbfb160.rlib" "/home/<user>/Projects/gensum/target/release/deps/libieee754-2e2a7a919f0748a3.rlib" "/home/<user>/Projects/gensum/target/release/deps/liburl-3b4f4ed37665a91b.rlib" "/home/<user>/Projects/gensum/target/release/deps/libidna-80163f900f3ff87b.rlib" "/home/<user>/Projects/gensum/target/release/deps/libunicode_normalization-b6e358d7e2d8421a.rlib" 
"/home/<user>/Projects/gensum/target/release/deps/libtinyvec-a8db09191bb25ded.rlib" "/home/<user>/Projects/gensum/target/release/deps/libtinyvec_macros-9e6dc58399d26886.rlib" "/home/<user>/Projects/gensum/target/release/deps/libunicode_bidi-ea0b37ba75857135.rlib" "/home/<user>/Projects/gensum/target/release/deps/libform_urlencoded-e5a8cba9af4c5455.rlib" "/home/<user>/Projects/gensum/target/release/deps/libpercent_encoding-39102524c2d994d3.rlib" "/home/<user>/Projects/gensum/target/release/deps/libmatches-9acb7d43912c29d5.rlib" "/home/<user>/Projects/gensum/target/release/deps/libbio_types-bd05a42c2a333733.rlib" "/home/<user>/Projects/gensum/target/release/deps/libthiserror-5de8223321bf1013.rlib" "/home/<user>/Projects/gensum/target/release/deps/libregex-9d2985763a1f01a7.rlib" "/home/<user>/Projects/gensum/target/release/deps/libregex_syntax-a29410a23c86ff7a.rlib" "/home/<user>/Projects/gensum/target/release/deps/libaho_corasick-9fbfb3e032d00e77.rlib" "/home/<user>/Projects/gensum/target/release/deps/libmemchr-9fec3276a4abc327.rlib" "/home/<user>/Projects/gensum/target/release/deps/liblinear_map-27241d64e4177a59.rlib" "/home/<user>/Projects/gensum/target/release/deps/liblazy_static-fe3ffb0d817d13d7.rlib" "/home/<user>/Projects/gensum/target/release/deps/libnewtype_derive-947572ba6887247a.rlib" "/home/<user>/Projects/gensum/target/release/deps/libcustom_derive-206c5fd94b81fee1.rlib" "/home/<user>/Projects/gensum/target/release/deps/libnclist-c29a058fa2326627.rlib" "/home/<user>/Projects/gensum/target/release/deps/libitertools-355bbf4b5a1e746a.rlib" "/home/<user>/Projects/gensum/target/release/deps/libeither-08f90d3f0d88bbf0.rlib" "/home/<user>/Projects/gensum/target/release/deps/libitoa-875eff8244583b39.rlib" "/home/<user>/Projects/gensum/target/release/deps/libindexmap-5d941a2530ae39d6.rlib" "/home/<user>/Projects/gensum/target/release/deps/libhashbrown-1740255a0c675020.rlib" "/home/<user>/Projects/gensum/target/release/deps/libatoi-5c352f89951263c5.rlib" 
"/home/<user>/Projects/gensum/target/release/deps/libnum_traits-190c641f9b28952f.rlib" "/home/<user>/Projects/gensum/target/release/deps/libanyhow-07dc354720abd4cf.rlib" "/home/<user>/Projects/gensum/target/release/deps/libclap-6f873020885f692e.rlib" "/home/<user>/Projects/gensum/target/release/deps/libvec_map-42adc355362c5416.rlib" "/home/<user>/Projects/gensum/target/release/deps/libtextwrap-b39d0db27731a433.rlib" "/home/<user>/Projects/gensum/target/release/deps/libunicode_width-3bf99c6b4d3f3e20.rlib" "/home/<user>/Projects/gensum/target/release/deps/libstrsim-cd375033641a485f.rlib" "/home/<user>/Projects/gensum/target/release/deps/libbitflags-4208964463941641.rlib" "/home/<user>/Projects/gensum/target/release/deps/libatty-8aad9bacf4c0a634.rlib" "/home/<user>/Projects/gensum/target/release/deps/liblibc-de5a67a0b94dcbd6.rlib" "/home/<user>/Projects/gensum/target/release/deps/libansi_term-1396373762392330.rlib" "-Wl,--start-group" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-008055cc7d873802.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-06f01ac2578bda94.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-f9a3c3274a1835e0.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-d4cbb754ee9f4daa.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-95c14e1c1e3ebcc4.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-d489f0ca872880cc.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-75f07df0b18fea39.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-0c35b278736219a2.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-e530649c9a06e3c6.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-6b148909d375a785.rlib" 
"/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-cd15fa647f4775d1.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-74be3a703f788ba2.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-8f2c5b445c28b2e3.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-8480e85e0be96197.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-ac23a75f6f42004e.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-557ba8776e04d182.rlib" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-4beb03d03503c439.rlib" "-Wl,--end-group" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-dd7db1bec6909f24.rlib" "-Wl,-Bdynamic" "-lbz2" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-znoexecstack" "-L" "/opt/anaconda/envs/rust/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/home/<user>/Projects/gensum/target/release/deps/gensum-314a923d97f3b699" "-Wl,--gc-sections" "-pie" "-Wl,-zrelro" "-Wl,-znow" "-Wl,-O1" "-nodefaultlibs"
  = note: /opt/anaconda/envs/rust/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/<user>/Projects/gensum/target/release/deps/libopenssl_sys-6c280da2e4e7a01f.rlib(threads_pthread.o): in function `fork_once_func':
          threads_pthread.c:(.text.fork_once_func+0x16): undefined reference to `pthread_atfork'
          collect2: error: ld returned 1 exit status
          
  = help: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)

error: could not compile `gensum` due to previous error
