gia's Introduction

gia: Genome Interval Arithmetic

MIT licensed. Available on Crates.io.

Summary

gia is a free and open-source command-line tool for highly efficient and scalable set operations on genomic interval data.

It is inspired by the open-source command-line tools bedtools and bedops and aims to be a drop-in replacement for both.

gia is written in Rust and distributed via cargo. It is built on top of bedrs, a separate and abstracted genomic interval library.

Installation

gia is distributed using the Rust package manager, cargo.

cargo install gia

You can validate the installation by checking gia's help menu:

gia --help

Installing cargo

You can install cargo by following the official Rust installation instructions.

Usage

You can see more detailed usage for each subcommand on the documentation site.

Issues and Contributions

gia is a work-in-progress and under active development by Noam Teyssier.

If you are interested in building more functionality or want to get involved, please don't hesitate to reach out!

Please address all issues to future contributors.

gia's People

Contributors

mrvollger, noamteyssier

gia's Issues

bed12 support, -wo flag support

Thanks so much for providing this tool to the community! I am finding it much faster than the existing toolkits.
Would it be possible to provide support for the bed12 format? Additionally, would it be possible to extend functionality to additional bedtools intersect flags such as -wo?
For people working in the single-molecule sequencing space these additions would be massively helpful.

Thanks again!

Autodetermine file format

Right now the default is to read everything in as BED3 unless an alternative format is passed with the -T flag.

Ideally, gia should determine the format automatically from the number of columns in the input and fall back to BED3 if it cannot.

There should also be two flags for left format and right format in case this is necessary.
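The column-count idea above can be illustrated with a minimal Python sketch. This is not gia's implementation; the function name detect_format and the KNOWN_FORMATS table are hypothetical names chosen for illustration, and the sketch assumes the first data line is representative of the whole file.

```python
from typing import Optional

# Column counts for the BED variants named in this tracker (assumed mapping).
KNOWN_FORMATS = {3: "bed3", 4: "bed4", 6: "bed6", 12: "bed12"}


def detect_format(first_line: Optional[str], default: str = "bed3") -> str:
    """Guess the BED variant from the number of whitespace-separated columns.

    Falls back to `default` when the line is missing, empty, or has a
    column count that matches no known variant.
    """
    if not first_line or not first_line.strip():
        return default
    n_columns = len(first_line.split())
    return KNOWN_FORMATS.get(n_columns, default)


print(detect_format("chr1\t29300\t29400"))
print(detect_format("chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +"))
print(detect_format("chr1 10 20 extra1 extra2"))
```

With separate left and right flags, the same function could be called once per input file, with an explicit flag taking precedence over the detected value.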

Mix file formats when applicable

Need to be able to mix file formats (e.g. bed3, bed6, bed12) when provided as inputs to operations requiring 2 files.

Multiple Files

  • closest
  • intersect
  • subtract

Single Files

  • complement
  • extend
  • get_fasta
  • merge
  • sample
  • sort

Can close #65 once done

Stranded Methods

This issue tracks the implementation of stranded methods.

  • Closest
  • Extend
  • Get Fasta
  • Intersect
  • Flank
  • Merge
  • Random
  • Sort
  • Subtract
  • Window

GIA 0.2.0

Matching development of bedrs-0.2

  • Convert all instances of Containers into static structs of IntervalContainer

  • Convert all numeric instances of Bed3, Bed4, Bed6, Bed12 into bedrs structs

  • Handle mixed file formats and combinatorics with dispatch methods

  • #68

  • #31

Incorporate Streamed Methods

Streamable Methods

  • Closest
  • Complement
  • Extend
  • Intersect
  • Subtract

Named Streamable Methods

  • Closest
  • Complement
  • Extend
  • Intersect
  • Subtract

Retain BED6 format?

Hi Noam!

I'm running a lot of bedtools intersect commands that I would love to replace with gia, but I was relying on the information in the bed6 format being retained.

e.g. fileA.bed
chr1 29300 29400

e.g. fileB.bed
chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +
chr1 29328 29427 CTAACTTTCCTATCAT-1 40 -

e.g. output I need with the cell barcode.
chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +

In this case, would I need to use bedrs instead of gia & create an interval type with my additional field?
-- Amanda
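To make the requested behaviour concrete, here is a small Python sketch that keeps every field of a fileB record when it matches. It is not gia or bedrs code. It assumes the intended rule is "keep B records that lie entirely within an A interval", an assumption chosen only because it reproduces the example output above (the second fileB record extends past 29400 and is dropped). The function names are hypothetical.

```python
def parse_bed(line):
    """Split a BED line into (chrom, start, end, remaining fields)."""
    fields = line.split()
    return fields[0], int(fields[1]), int(fields[2]), fields[3:]


def contained_records(a_lines, b_lines):
    """Return B lines, unchanged, that lie entirely inside some A interval.

    Assumed semantics (containment); a plain overlap test would also keep
    records that only partially overlap A.
    """
    a_intervals = [parse_bed(line)[:3] for line in a_lines]
    kept = []
    for line in b_lines:
        chrom, start, end, _ = parse_bed(line)
        # Naive scan over all A intervals; fine for a small illustration.
        if any(c == chrom and s <= start and end <= e for c, s, e in a_intervals):
            kept.append(line)
    return kept


file_a = ["chr1 29300 29400"]
file_b = [
    "chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +",
    "chr1 29328 29427 CTAACTTTCCTATCAT-1 40 -",
]
for record in contained_records(file_a, file_b):
    print(record)
```

Because the matching B line is returned untouched, the name, score, and strand columns survive without any format-specific handling.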

Support for intersect -wo flag

Write the original A and B entries plus the number of base pairs of overlap between the two features.
  - Overlaps restricted by -f and -r. Only A features with overlap are reported.
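The output described above can be sketched in Python as follows. This is an illustration of the -wo semantics as quoted, not gia's or bedtools' code: it assumes half-open BED coordinates, ignores the -f and -r restrictions, and uses a naive pairwise comparison. The function names are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of shared base pairs between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersect_wo(a_lines, b_lines):
    """Emit the original A fields, the original B fields, then the overlap size.

    Only A/B pairs with a non-zero overlap on the same chromosome are
    reported. The -f and -r restrictions are not modelled here.
    """
    rows = []
    for a_line in a_lines:
        a = a_line.split()
        for b_line in b_lines:
            b = b_line.split()
            if a[0] != b[0]:
                continue
            n = overlap_bp(int(a[1]), int(a[2]), int(b[1]), int(b[2]))
            if n > 0:
                rows.append("\t".join(a + b + [str(n)]))
    return rows


file_a = ["chr1 29300 29400"]
file_b = [
    "chr1 29301 29400 CTAACTTTCCTATCAT-1 41 +",
    "chr1 29328 29427 CTAACTTTCCTATCAT-1 40 -",
]
for row in intersect_wo(file_a, file_b):
    print(row)
```

For these inputs the overlap column is 99 for the first pair (29301 to 29400) and 72 for the second (29328 to 29400).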

Reproduce bedtools methods

  • intersect
  • window
  • closest
  • coverage
  • map
  • genomecov
  • merge
  • cluster
  • complement
  • shift
  • subtract
  • slop
  • flank
  • sort
  • random
  • shuffle
  • sample
  • spacing
  • unionbedg

performance claims

I meant to do some testing on my own, but I may never get there. I'm one of the authors of BEDOPS. It is not easy to imagine a 6x or so improvement in runtimes, as these are linear (or n log n for sorting) time algorithms in bedops/closest-features utilities.

There are a couple of things that stand out to me in the bioRxiv paper. Mainly, the timed tests take at most 1 second for the slowest tool, which indicates very, very small inputs (Figures 1 and 2). If the trend held with large inputs, that would be far more interesting and impressive. Right now, the differences might be attributable to things that do not generalize beyond 1 second.

The memory overhead shown for bedops (Figure 5) makes me think that they used the "megarow" build of BEDOPS. That build is meant for very large sequencing results (nanopore and pacbio). It scales to those much larger data at the cost of some small memory overhead but also considerable time overhead. It would be worth measuring time/memory against that larger build but also against the more popular (and default) build for utilities in BEDOPS.

You can use the switch-BEDOPS-binary-type utility to switch between typical (default) and megarow builds of utilities in BEDOPS.
