
bigtools's Introduction

Bigtools


Bigtools is a library and associated tools for reading and writing bigwig and bigbed files.

The primary goals of the project are to be

  • Performant
  • Extensible
  • Modern

Performant

Bigtools uses async/await internally to allow for efficient, multi-core computation when possible. In addition, tools are optimized for minimal memory usage. See Benchmarks for more details.

Extensible

Bigtools is designed to be as modular as possible. This, in addition to the safety and reliability of Rust, allows both flexibility and correctness as a library. In addition, it's extremely easy to create new tools or binaries. Several binaries are available that mirror existing UCSC binaries, with drop-in compatibility for the most common flags.

Modern

Bigtools is written in Rust and published to crates.io, so the binaries can be installed with cargo install bigtools, or it can be used as a library by simply adding it to your Cargo.toml.

Library

To use bigtools in your Rust project, add bigtools to your Cargo.toml or run:

cargo add bigtools

See the bigtools 🦀 Documentation.

Example

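use bigtools::bbiread::BigWigRead;

fn main() {
    // Open the bigWig and query the values over chr1:0-10000.
    let mut reader = BigWigRead::open("test.bigWig").unwrap();
    let chr1 = reader.get_interval("chr1", 0, 10000).unwrap();
    // Print each interval as it is read from the file.
    for interval in chr1 {
        println!("{:?}", interval);
    }
}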

Binaries

The bigtools CLI binaries can be installed through crates.io or conda.

cargo install bigtools
conda install -c bioconda bigtools

Additionally, pre-built binaries can be downloaded from GitHub releases.

The following binaries are available:

Binary                  Description
bigtools                Provides access to multiple subcommands, including all below
bedgraphtobigwig        Writes a bigWig from a given bedGraph file
bedtobigbed             Writes a bigBed from a given bed file
bigbedinfo              Shows info about a provided bigBed
bigbedtobed             Writes a bed from the data in a bigBed
bigwigaverageoverbed    Calculates statistics over the regions of a bed file using values from a bigWig
bigwiginfo              Shows info about a provided bigWig
bigwigmerge             Merges multiple bigWigs, outputting to either a new bigWig or a bedGraph
bigwigtobedgraph        Writes a bedGraph from the data in a bigWig
bigwigvaluesoverbed     Gets the per-base values from a bigWig over the regions of a bed file

Renaming the bigtools binary to any of the subcommands (case-insensitive) allows you to run that subcommand directly.
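A binary that changes behavior based on the name it was invoked under typically inspects `argv[0]`. Below is a minimal sketch of that lookup. It assumes the subcommand names listed in the table above. `subcommand_from_argv0` is a hypothetical helper for illustration and is not part of the bigtools API.

```rust
use std::path::Path;

/// Subcommand names from the binaries table, lowercased.
const SUBCOMMANDS: &[&str] = &[
    "bedgraphtobigwig",
    "bedtobigbed",
    "bigbedinfo",
    "bigbedtobed",
    "bigwigaverageoverbed",
    "bigwiginfo",
    "bigwigmerge",
    "bigwigtobedgraph",
    "bigwigvaluesoverbed",
];

/// Hypothetical helper: map the invoked program path (argv[0]) to a
/// subcommand by comparing its file stem case-insensitively.
/// Returns None for `bigtools` itself or any unrecognized name.
fn subcommand_from_argv0(argv0: &str) -> Option<&'static str> {
    let stem = Path::new(argv0).file_stem()?.to_str()?.to_lowercase();
    SUBCOMMANDS.iter().copied().find(|name| *name == stem)
}

fn main() {
    let argv0 = std::env::args().next().unwrap_or_default();
    match subcommand_from_argv0(&argv0) {
        Some(cmd) => println!("running subcommand {cmd}"),
        None => println!("no subcommand implied by the binary name"),
    }
}
```

The sketch matches on the file stem rather than the full path. As a result, both `/usr/local/bin/bigWigInfo` and `bigWigInfo.exe` resolve to the same subcommand.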

Python wrapper

The pybigtools package is a Python wrapper written using PyO3. It can be installed or used as a dependency either through PyPI or conda.

pip install pybigtools
conda install -c bioconda pybigtools

See the pybigtools 🐍 API Documentation.

How to build from source

In order to build the bigtools binaries, you can run

cargo build --release

and the binaries can be found in target/release/.

Alternatively, you can install the binaries from source by running

cargo install --path bigtools/

Building the Python wheels for pybigtools requires maturin. To build the pybigtools wheel and install it, you can run

maturin build --release -m pybigtools/Cargo.toml
pip install target/wheels/pybigtools*.whl

or

maturin develop --release -m pybigtools/Cargo.toml

Benchmarks

Benchmarks are included in the ./bench directory. They require Python to run.

Multiple tools are compared against their UCSC counterparts. For completeness, both single-threaded and multi-threaded (when available) benchmarks are included. Several configuration options are benchmarked across multiple replicates, and a summary is available in the table below:

How to cite

This repository contains a CITATION.cff file with citation information. GitHub can generate a citation in either APA or BibTeX format; this is available under "Cite this repository" in the About section.

bigtools's People

Contributors

anderspitman, dependabot[bot], ghuls, jackh726, nvictus, pkerpedjiev


bigtools's Issues

Implement `-minMax` option of bigWigAverageOverBed.

Implement the -minMax option of UCSC bigWigAverageOverBed in bigwigaverageoverbed (e.g. as -minmax).

$ bigWigAverageOverBed 
bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
Options:
   -stats=stats.ra - Output a collection of overall statistics to stat.ra file
   -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended
   -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather
                     than the usual sample in the bed item.
   -minMax - include two additional columns containing the min and max observed in the area.
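The output columns listed above can be computed in one pass over the bigWig intervals that overlap each region. `-minMax` only adds two running values to that pass. Below is a minimal sketch for a single-block region (no introns). It assumes the intervals do not overlap each other and that an uncovered region reports 0 for both means. `Interval`, `RegionStats`, and `region_stats` are hypothetical names, not bigtools code.

```rust
/// Hypothetical bigWig interval: half-open [start, end) with a constant value.
#[derive(Clone, Copy, Debug)]
struct Interval {
    start: u32,
    end: u32,
    value: f32,
}

/// The bigWigAverageOverBed columns for one region, plus the two
/// columns that -minMax would add (None when nothing is covered).
#[derive(Debug, PartialEq)]
struct RegionStats {
    size: u32,
    covered: u32,
    sum: f64,
    mean0: f64,
    mean: f64,
    min: Option<f32>,
    max: Option<f32>,
}

/// Hypothetical helper: summarize the intervals overlapping [start, end).
fn region_stats(start: u32, end: u32, intervals: &[Interval]) -> RegionStats {
    let size = end - start;
    let mut covered = 0u32;
    let mut sum = 0.0f64;
    let mut min: Option<f32> = None;
    let mut max: Option<f32> = None;
    for iv in intervals {
        // Clip each interval to the region; skip it if nothing overlaps.
        let s = iv.start.max(start);
        let e = iv.end.min(end);
        if s >= e {
            continue;
        }
        let bases = e - s;
        covered += bases;
        sum += iv.value as f64 * bases as f64;
        min = Some(min.map_or(iv.value, |m| m.min(iv.value)));
        max = Some(max.map_or(iv.value, |m| m.max(iv.value)));
    }
    RegionStats {
        size,
        covered,
        sum,
        // mean0 counts uncovered bases as zeroes; mean uses covered bases only.
        mean0: if size > 0 { sum / size as f64 } else { 0.0 },
        mean: if covered > 0 { sum / covered as f64 } else { 0.0 },
        min,
        max,
    }
}

fn main() {
    let intervals = [
        Interval { start: 0, end: 4, value: 1.0 },
        Interval { start: 6, end: 8, value: 3.0 },
    ];
    println!("{:?}", region_stats(0, 10, &intervals));
}
```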

bigtools bigwigtobedgraph hangs for certain bigwig file created with pyBigWig

wget https://temp.aertslab.org/.bigtools/atac_fragments.pybigwig.bw

# bigtools hangs after writing 65530584 of the 130726419 output lines.
bigtools bigwigtobedgraph atac_fragments.pybigwig.bw atac_fragments.pybigwig.with_bigtools.bedgraph

# bigWigToBedGraph works fine on the same file
bigWigToBedGraph atac_fragments.pybigwig.bw atac_fragments.pybigwig.with_kenttools.bedgraph

# When the original bigWig was created with pybigtools instead of pyBigWig, the conversion works.
bigtools bigwigtobedgraph atac_fragments.bigtools.bw atac_fragments.bigtools.with_bigtools.bedgraph

pybigtools regression writing bigWigs in >=0.1.1

Looks like there was a regression from 0.1.0 to 0.1.1+ when trying to create a bigWig, using the example code from this repo:

import math
import random
import pybigtools

def genintervals(clengths):
    for chrom in clengths:
        clength = clengths[chrom]
        current = random.randint(0, 300)
        start = current
        while True:
            length = random.randint(1, 200)
            end = start + length
            if end > clength:
                break
            value = round(random.random(), 5)
            yield (chrom, start, end, value)
            start = end + random.randint(20, 50)

chroms = ["chr1", "chr2", "chr3"]
clengths = {"chr1": 10000, "chr2": 8000, "chr3": 6000}
intervals = list(genintervals(clengths))

b = pybigtools.open("test-xxx-delete.bw", "w")
b.write(clengths, iter(intervals))

This works fine with 0.1.0 but hangs badly with 0.1.1+ (I'm working from a Jupyter notebook and have to restart the kernel, as interrupting doesn't work).
Writing bigBeds seems to work fine across versions, though.

bigwigmerge: thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0'

$ RUST_BACKTRACE=1 bigwigmerge -b mouse.bw -b human.bw -b macaque.bw -b marmoset.bw all_species.bw
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', src/bin/bigwigmerge.rs:139:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_bounds_check
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:162:5
   3: bigwigmerge::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

UCSC bigWigMerge works fine on those files:

$ bigWigMerge mouse.bw human.bw macaque.bw marmoset.bw all_species.bedGraph
Got 87 chromosomes from 4 bigWigs
Processing.......................................................................................

Each bigwig has all chromosomes:

$ bigWigInfo -chroms mouse.bw
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 289,634,826
primaryIndexSize: 847,988
zoomLevels: 8
chromCount: 87
        mouse_chr1 0 195471971
        mouse_chr10 1 130694993
        mouse_chr11 2 122082543
        mouse_chr12 3 120129022
        mouse_chr13 4 120421639
        mouse_chr14 5 124902244
        mouse_chr15 6 104043685
        mouse_chr16 7 98207768
        mouse_chr17 8 94987271
        mouse_chr18 9 90702639
        mouse_chr19 10 61431566
        mouse_chr2 11 182113224
        mouse_chr3 12 160039680
        mouse_chr4 13 156508116
        mouse_chr5 14 151834684
        mouse_chr6 15 149736546
        mouse_chr7 16 145441459
        mouse_chr8 17 129401213
        mouse_chr9 18 124595110
        mouse_chrX 19 171031299
        human_chr1 20 248956422
        human_chr10 21 133797422
        human_chr11 22 135086622
        human_chr12 23 133275309
        human_chr13 24 114364328
        human_chr14 25 107043718
        human_chr15 26 101991189
        human_chr16 27 90338345
        human_chr17 28 83257441
        human_chr18 29 80373285
        human_chr19 30 58617616
        human_chr2 31 242193529
        human_chr20 32 64444167
        human_chr21 33 46709983
        human_chr22 34 50818468
        human_chr3 35 198295559
        human_chr4 36 190214555
        human_chr5 37 181538259
        human_chr6 38 170805979
        human_chr7 39 159345973
        human_chr8 40 145138636
        human_chr9 41 138394717
        human_chrX 42 156040895
        macaque_chr1 43 223616942
        macaque_chr2 44 196197964
        macaque_chr3 45 185288947
        macaque_chr4 46 169963040
        macaque_chr5 47 187317192
        macaque_chr6 48 179085566
        macaque_chr7 49 169868564
        macaque_chr8 50 145679320
        macaque_chr9 51 134124166
        macaque_chr10 52 99517758
        macaque_chr11 53 133066086
        macaque_chr12 54 130043856
        macaque_chr13 55 108737130
        macaque_chr14 56 128056306
        macaque_chr15 57 113283604
        macaque_chr16 58 79627064
        macaque_chr17 59 95433459
        macaque_chr18 60 74474043
        macaque_chr19 61 58315233
        macaque_chr20 62 77137495
        macaque_chrX 63 153388924
        marmoset_chr1 64 217961735
        marmoset_chr2 65 204486479
        marmoset_chr3 66 191910223
        marmoset_chr4 67 174041770
        marmoset_chr5 68 164351765
        marmoset_chr6 69 161003406
        marmoset_chr7 70 157546058
        marmoset_chr8 71 126850804
        marmoset_chr9 72 134044658
        marmoset_chr10 73 137671225
        marmoset_chr11 74 129688756
        marmoset_chr12 75 124486764
        marmoset_chr13 76 118934817
        marmoset_chr14 77 112090317
        marmoset_chr15 78 99198953
        marmoset_chr16 79 97817134
        marmoset_chr17 80 74942703
        marmoset_chr18 81 47031477
        marmoset_chr19 82 51570929
        marmoset_chr20 83 45615054
        marmoset_chr21 84 51259342
        marmoset_chr22 85 51300780
        marmoset_chrX 86 148168104
basesCovered: 1,932,598,589
mean: 0.116719
min: 0.014938
max: 271.738831
std: 0.333123
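UCSC bigWigMerge reports combining 87 chromosomes from the 4 inputs. Below is a minimal sketch of building such a combined chromosome list. It assumes each input's chromosomes are already available as (name, length) pairs. `merge_chrom_sizes` is hypothetical and does not reflect how bigwigmerge is actually implemented.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical helper: union the (name, length) chromosome lists of several
/// input bigWigs. A name seen with two different lengths is reported as an error.
fn merge_chrom_sizes(inputs: &[Vec<(String, u32)>]) -> Result<BTreeMap<String, u32>, String> {
    let mut merged: BTreeMap<String, u32> = BTreeMap::new();
    for chroms in inputs {
        for (name, len) in chroms {
            match merged.entry(name.clone()) {
                Entry::Vacant(slot) => {
                    slot.insert(*len);
                }
                Entry::Occupied(slot) => {
                    let existing: u32 = *slot.get();
                    if existing != *len {
                        return Err(format!("{name}: conflicting lengths {existing} and {len}"));
                    }
                }
            }
        }
    }
    Ok(merged)
}

fn main() {
    let inputs = vec![
        vec![("mouse_chr1".to_string(), 195471971u32)],
        vec![("human_chr1".to_string(), 248956422u32)],
    ];
    match merge_chrom_sizes(&inputs) {
        Ok(m) => println!("Got {} chromosomes from {} bigWigs", m.len(), inputs.len()),
        Err(e) => eprintln!("{e}"),
    }
}
```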

Different chromosome order with bigwigtobedgraph than with UCSC bigWigToBedGraph

bigtools bigwigtobedgraph gives a different chromosome order than UCSC bigWigToBedGraph on a bigwig file made with pyBigWig.

$ cut -f 1 test.bigtools.bedgraph | uniq
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY

$  cut -f 1 test.ucsc.bedgraph | uniq
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr20
chrX
chrY

โฏ  bigWigInfo -chroms test.bw
version: 4
isCompressed: yes
isSwapped: 0
primaryDataSize: 1,236,921,292
primaryIndexSize: 3,993,736
zoomLevels: 9
chromCount: 23
        chr1 0 223616942
        chr2 1 196197964
        chr3 2 185288947
        chr4 3 169963040
        chr5 4 187317192
        chr6 5 179085566
        chr7 6 169868564
        chr8 7 145679320
        chr9 8 134124166
        chr10 9 99517758
        chr11 10 133066086
        chr12 11 130043856
        chr13 12 108737130
        chr14 13 128056306
        chr15 14 113283604
        chr16 15 79627064
        chr17 16 95433459
        chr18 17 74474043
        chr19 18 58315233
        chr20 19 77137495
        chrM 20 16564
        chrX 21 153388924
        chrY 22 11753682
basesCovered: 2,768,582,316
mean: 24.193325
min: 1.000000
max: 12332.000000
std: 56.607673
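The `bigWigInfo -chroms` listing shows chromosome ids assigned in natural order: chr1 is 0, chr2 is 1, and chr10 is 9. The bigtools output, by contrast, follows plain byte-wise name order. Below is a small sketch contrasting the two orderings. That each tool uses exactly these orderings is inferred from the outputs above, not from either tool's source.

```rust
/// Names ordered by chromosome id, i.e. the order the writer assigned ids.
fn order_by_id(chroms: &[(&'static str, u32)]) -> Vec<&'static str> {
    let mut sorted = chroms.to_vec();
    sorted.sort_by_key(|&(_, id)| id);
    sorted.into_iter().map(|(name, _)| name).collect()
}

/// Names ordered as plain byte strings, which puts chr10 before chr2.
fn order_by_name(chroms: &[(&'static str, u32)]) -> Vec<&'static str> {
    let mut names: Vec<&'static str> = chroms.iter().map(|&(name, _)| name).collect();
    names.sort();
    names
}

fn main() {
    // (name, id) pairs taken from the bigWigInfo listing, deliberately shuffled.
    let chroms = [("chr10", 9), ("chr1", 0), ("chrX", 21), ("chr2", 1), ("chrM", 20)];
    println!("by id:   {:?}", order_by_id(&chroms));
    println!("by name: {:?}", order_by_name(&chroms));
}
```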

Also, bigtools bigwiginfo is missing some options that are available in UCSC bigWigInfo.

Avoiding OpenSSL

Hi there, thanks for the excellent library.

I'm trying to statically link my tool using musl and getting OpenSSL errors. This is most likely due to attohttpc. I know with reqwest it's possible to use rustls instead of OpenSSL using features. Is there a similar option with attohttpc, and would it be possible for you to expose it to users of bigtools?

Example of writing a bigbed

Hello,

I was excited to come across this crate, thanks for making it! Do you have an example of writing a bigbed file from bed data (ideally bed12 data)?

thanks!
Mitchell
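bigBed stores each record as a chromosome, a start, an end, and the remaining BED columns as a single tab-separated "rest" string. BED12 lines can therefore be prepared by splitting off the first three fields. Below is a minimal sketch of that parsing step only. `BedRecord` and `parse_bed_line` are hypothetical names, and the sketch does not show the bigtools writer API.

```rust
/// Hypothetical record shape: the three required BED fields plus all
/// remaining columns kept together as one tab-separated string.
#[derive(Debug, PartialEq)]
struct BedRecord {
    chrom: String,
    start: u32,
    end: u32,
    rest: String,
}

/// Hypothetical helper: parse one BED line (BED3 through BED12+).
fn parse_bed_line(line: &str) -> Result<BedRecord, String> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    // splitn(4, ..) leaves every field after `end` joined in the fourth piece.
    let mut fields = line.splitn(4, '\t');
    let chrom = fields.next().filter(|s| !s.is_empty()).ok_or("missing chrom")?;
    let start = fields
        .next()
        .ok_or("missing start")?
        .parse::<u32>()
        .map_err(|e| format!("invalid start: {e}"))?;
    let end = fields
        .next()
        .ok_or("missing end")?
        .parse::<u32>()
        .map_err(|e| format!("invalid end: {e}"))?;
    if end < start {
        return Err(format!("end {end} is before start {start}"));
    }
    let rest = fields.next().unwrap_or("").to_string();
    Ok(BedRecord { chrom: chrom.to_string(), start, end, rest })
}

fn main() {
    // A BED12 line with two blocks.
    let line = "chr1\t1000\t5000\tgene1\t0\t+\t1200\t4800\t0\t2\t500,800,\t0,3200,\n";
    match parse_bed_line(line) {
        Ok(rec) => println!("{rec:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```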

minor pybigtools complaints

pybigtools can feel a bit alien as a Python module.
It would be nice to have access to attributes like __version__, __path__, etc., e.g. import pybigtools as pybig; print(pybig.__version__)

Missing fields when using `-a` with bedtobigbed

Hello,

I have found a discrepancy in the number of fields and some other metadata when using bigtools vs. UCSC, and it is affecting how things appear in the browser.

Basically, when I test with bigbedinfo I see differences in field number and other metadata columns (the correct number of fields is 17):

# bigtools
version: 4
fieldCount: 3
isCompressed: yes
isSwapped: 0
itemCount: 1100122467552
primaryDataSize: 367,623,108
primaryIndexSize: 20,200
zoomLevels: 6
chromCount: 2
basesCovered: 97,494,196
meanDepth: 44.449296
minDepth: 1.000000
maxDepth: 3195.000000
std of depth: 69.740202

# UCSC
version: 4
fieldCount: 17
isCompressed: yes
isSwapped: 0
itemCount: 1100122467552
primaryDataSize: 386,514,961
primaryIndexSize: 46,952
zoomLevels: 8
chromCount: 2
basesCovered: 101,269,035
meanDepth: 82.861921
minDepth: 1.000000
maxDepth: 4056.000000
std of depth: 96.352168

I have uploaded all the data in my test to:
https://s3-us-west-2.amazonaws.com/stergachis-public1/index.html?prefix=Mitchell/temp/bigtools-test/

And here is the script I am running:

zcat my.bed.gz | bigtools bedtobigbed -a decoration.as -s start - hg38.fai bigtools.bb
bedToBigBed -as=decoration.as -type=bed12+ my.bed.gz hg38.fai ucsc.bb

bigbedinfo bigtools.bb
bigbedinfo ucsc.bb

Any help with an obvious error on my part would be great!

Cheers,
Mitchell

Enhancements to Python reader API

Common reader interface for BigWig and BigBed

  • props: is_bigwig, is_bigbed
  • info
  • schema -> translated autoSql schema
  • records
  • zoom_records
  • values -> Provide arr argument like numpy functions to populate pre-existing array view
  • Context Manager
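The `values` item asks for an `arr` argument, like numpy's `out=` parameters, so that results can be written into an existing array. In Rust terms, that is a function filling a caller-owned `&mut [f32]`. Below is a minimal sketch. It assumes the intervals for the window are already available and that uncovered bases should be NaN; both are assumptions. `values_into` is hypothetical.

```rust
/// Hypothetical bigWig interval: half-open [start, end) with a constant value.
struct Interval {
    start: u32,
    end: u32,
    value: f32,
}

/// Hypothetical helper: write per-base values for the window
/// [start, start + out.len()) into a caller-provided buffer.
/// Bases not covered by any interval are left as NaN.
fn values_into(start: u32, intervals: &[Interval], out: &mut [f32]) {
    out.fill(f32::NAN);
    let start = start as u64;
    let end = start + out.len() as u64;
    for iv in intervals {
        // Clip the interval to the window; skip it if nothing overlaps.
        let s = (iv.start as u64).max(start);
        let e = (iv.end as u64).min(end);
        if s >= e {
            continue;
        }
        out[(s - start) as usize..(e - start) as usize].fill(iv.value);
    }
}

fn main() {
    let intervals = [
        Interval { start: 2, end: 4, value: 1.5 },
        Interval { start: 6, end: 20, value: 3.0 },
    ];
    // The buffer is allocated once by the caller and reused across calls.
    let mut buf = vec![0.0f32; 8];
    values_into(0, &intervals, &mut buf);
    println!("{buf:?}");
}
```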

BED file with only 3 columns is not supported by bigwigaverageoverbed

A BED file with only 3 columns is not supported by bigwigaverageoverbed:

$ head test.bed
chr1    794809  795309
chr1    807890  808390
chr1    816094  816594
chr1    817089  817589
chr1    817792  818292
chr1    818522  819022
chr1    820322  820822
chr1    821778  822278
chr1    826530  827030
chr1    827297  827797

$  bigwigaverageoverbed -n interval test.bw test.bed /dev/stdout | head
Error: Custom { kind: InvalidData, error: "Invalid end: 795309\n" }

bigwigvaluesoverbed works with this file.

It would also be nice if bigwigvaluesoverbed -n supported the same options as bigwigaverageoverbed -n:

-n, --namecol Supports three types of options: interval, none, or a column number (one indexed). If interval, the name column in the output will be the interval in the form of chrom:start-end. If none, then all columns will be included in the output file. Otherwise, the one-indexed column will be used as the name. By default, column 4 is used as a name column
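The `Invalid end: 795309\n` message suggests that the end field still carried its line ending when it was parsed. Below is a minimal sketch of the `-n` semantics quoted above. It trims line endings first and accepts three-column lines. `NameCol`, `parse_namecol`, and `output_name` are hypothetical. Falling back to chrom:start-end when the requested column is missing is an assumption, not documented bigtools behavior.

```rust
/// Hypothetical representation of the three `-n, --namecol` choices.
#[derive(Debug, PartialEq)]
enum NameCol {
    /// "interval": name each region as chrom:start-end.
    Interval,
    /// "none": include all input columns in the output.
    AllColumns,
    /// A one-indexed column to use as the name.
    Column(usize),
}

/// Hypothetical parser for the option value.
fn parse_namecol(arg: &str) -> Result<NameCol, String> {
    match arg {
        "interval" => Ok(NameCol::Interval),
        "none" => Ok(NameCol::AllColumns),
        other => match other.parse::<usize>() {
            Ok(n) if n >= 1 => Ok(NameCol::Column(n)),
            _ => Err(format!("invalid name column: {other}")),
        },
    }
}

/// Hypothetical helper: the text identifying one BED line in the output.
fn output_name(line: &str, namecol: &NameCol) -> Result<String, String> {
    // Strip the line ending first so "795309\n" parses as a number.
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 3 {
        return Err(format!("expected at least 3 columns, got {}", fields.len()));
    }
    let start = fields[1].parse::<u64>().map_err(|_| format!("Invalid start: {}", fields[1]))?;
    let end = fields[2].parse::<u64>().map_err(|_| format!("Invalid end: {}", fields[2]))?;
    if end < start {
        return Err(format!("end {end} is before start {start}"));
    }
    let interval = format!("{}:{}-{}", fields[0], start, end);
    Ok(match namecol {
        NameCol::Interval => interval,
        NameCol::AllColumns => fields.join("\t"),
        // Assumed fallback: use the interval when the column does not exist.
        NameCol::Column(n) => fields.get(n - 1).map(|s| (*s).to_string()).unwrap_or(interval),
    })
}

fn main() {
    let namecol = parse_namecol("interval").unwrap();
    for line in ["chr1\t794809\t795309\n", "chr1\t807890\t808390\n"] {
        println!("{}", output_name(line, &namecol).unwrap());
    }
}
```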
