
rust-bio-types's Introduction


Rust-Bio, a bioinformatics library for Rust.

This library provides Rust implementations of algorithms and data structures useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

Please see the API documentation for available features and examples of how to use them.

When using Rust-Bio, please cite the following article:

Köster, J. (2016). Rust-Bio: a fast and safe bioinformatics library. Bioinformatics, 32(3), 444-446.

You can also cite the specific versions you used via their DOIs:

Rust-Bio: DOI

Contribute

Any contributions are welcome, from a simple bug report to full-blown new modules:

If you find a bug and don't have the time or in-depth knowledge to fix it, check whether you can add information to an existing issue; otherwise, file a bug report with as much information as possible. Pull requests are welcome if you want to contribute fixes, documentation, or new code. Before making commits, it is helpful to first install pre-commit, which avoids failed continuous integration builds caused by issues such as formatting:

  1. Install pre-commit (see pre-commit.com/#installation)
  2. Run "pre-commit install" in the rust-bio base directory

Depending on your intended contribution frequency, you have two options for opening pull requests:

  1. For one-time contributions, simply fork the repository, apply your changes to a branch in your fork and then open a pull request.
  2. If you plan on contributing more than once, become a contributor by saying hi on the rust-bio Discord server, together with a short sentence saying who you are and what you want to contribute. We'll add you to the team. Then you don't have to create a fork, but can simply push new branches to the main repository and open pull requests there.

If you want to contribute and don't know where to start, have a look at the roadmap.

Documentation guidelines

Every public function and module should have documentation comments. Check out which types of comments to use where. In rust-bio, documentation comments should:

  • explain functionality
  • give at least one useful example of how to use it (ideally as doctests that run during testing, using descriptive expect() messages to handle any Err()s that might occur)
  • list time and memory complexity (where applicable)
  • cite and link sources and explanations for data structures, algorithms or code (where applicable)
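The guidelines above can be illustrated with a small documented function. This is a minimal sketch, not code from rust-bio. The function gc_content, the error type EmptySequenceError, and the crate name mycrate used in the doctest are all hypothetical. The doc comment explains the functionality, gives a doctest that handles the Result with a descriptive expect(), and lists the complexity.

```rust
use std::fmt;

/// Error returned by [`gc_content`] when the input sequence is empty.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct EmptySequenceError;

impl fmt::Display for EmptySequenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot compute GC content of an empty sequence")
    }
}

impl std::error::Error for EmptySequenceError {}

/// Computes the fraction of `G`/`C` bases (case-insensitive) in a DNA sequence.
///
/// # Example
///
/// ```
/// # use mycrate::gc_content; // `mycrate` is a hypothetical crate name
/// let gc = gc_content(b"ACGT").expect("ACGT is non-empty, so GC content is defined");
/// assert_eq!(gc, 0.5);
/// ```
///
/// # Complexity
///
/// Time: O(n) for a sequence of length n. Memory: O(1).
pub fn gc_content(seq: &[u8]) -> Result<f64, EmptySequenceError> {
    if seq.is_empty() {
        return Err(EmptySequenceError);
    }
    // Count G and C in either case; every other byte counts toward the length only.
    let gc = seq
        .iter()
        .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count();
    Ok(gc as f64 / seq.len() as f64)
}
```

When run with cargo test, the fenced example inside the doc comment is compiled and executed as a doctest. This keeps the documentation in sync with the code.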

For extra credit, feel free to familiarize yourself with:

Minimum supported Rust version

Currently the minimum supported Rust version is 1.65.0.

License

Licensed under the MIT license http://opensource.org/licenses/MIT. This project may not be copied, modified, or distributed except according to those terms.

rust-bio-types's People

Contributors

6br, adam-azarchs, dcroote, delehef, dependabot[bot], dlaehnemann, evolvedmicrobe, github-actions[bot], ingolia, johanneskoester, nh13, pmarks, ragnargrootkoerkamp, sjackman, sky-alin, tedil, tianyishi2001

rust-bio-types's Issues

Naming of "contig" module and type

I think the contig module and Contig struct have a naming issue. As I understand it, "contig" is a well-established term from genome assembly that describes a contiguously assembled sequence. This usage is supported, e.g., by the contig header in VCF. I believe genome::AbstractInterval::contig also uses the term in this sense.

What about renaming it to "linear"?

Error in pretty print with suffix xclip or yclip

When xclip or yclip is a suffix clip, pretty print shows the wrong sequence. I think the problem lies in src/alignment.rs, where the x_i and y_i bases that were already consumed should be skipped before printing the actual clipped sequence, like below:

[image]
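The index bookkeeping the issue describes can be sketched as follows. This is a simplified, hypothetical renderer, not bio-types' actual Alignment::pretty. The enum Op only mirrors the operation names, and the function render_x_row is made up for illustration. It renders only the x row and tracks the current x index, so that a suffix clip prints the trailing bases starting at x_i rather than starting again at the beginning of x.

```rust
/// Hypothetical, simplified stand-in for an alignment operation enum
/// (not the bio-types definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Match,        // consumes one base of x and one of y
    Subst,        // consumes one base of x and one of y
    Del,          // base present in y only: gap in the x row
    Ins,          // base present in x only: consumes one base of x
    Xclip(usize), // n clipped bases of x
    Yclip(usize), // n clipped bases of y: nothing from x
}

/// Renders the x row of an alignment, tracking the current index `x_i` into `x`.
pub fn render_x_row(x: &[u8], ops: &[Op]) -> String {
    let mut row = String::new();
    let mut x_i = 0;
    for &op in ops {
        match op {
            Op::Match | Op::Subst | Op::Ins => {
                row.push(x[x_i] as char);
                x_i += 1;
            }
            Op::Del => row.push('-'),
            Op::Xclip(n) => {
                // Start at x_i, not at 0, so a suffix clip shows the trailing bases.
                row.extend(x[x_i..x_i + n].iter().map(|&b| b as char));
                x_i += n;
            }
            // Clipped y bases have no counterpart in x; pad with spaces.
            Op::Yclip(n) => row.extend(std::iter::repeat(' ').take(n)),
        }
    }
    row
}
```

With x = "ACGTT" and operations [Match, Match, Match, Xclip(2)], this prints "ACGTT". Slicing from 0 instead, with x[..n], would print "ACGAC".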

action-semantic-pull-request does not accept 'deps' as type

The conventional-prs.yml Actions workflow, which uses amannn/action-semantic-pull-request, does not accept deps (which we want for release-please) as a commit type, as shown by the CI failure from #58:

Unknown release type "deps" found in pull request title "deps: bump serde from 1.0.136 to 1.0.156". 

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

The available types are configurable, though (https://github.com/amannn/action-semantic-pull-request#configuration), so I will open a PR to add deps.

Alignment::pretty has a bug

If 'y' overhangs at the beginning of the alignment, the sequence of 'x' is shown. There is a pull request which should fix this and contains a test illustrating the problem.

Error handling scheme

rust-bio and rust-htslib both use thiserror to automatically create custom error types, but this crate uses quick-error. Should we leave it as is, or make it consistent with the other crates?

String formatting for bio_types::annot::contig::Contig

What is the reason for choosing the following string format for contigs?

The display format for a Contig is chr:start-end(+/-/.). The boundaries are given as a half-open 0-based interval, like the Rust Range and BED format.

I would think the convention is that a coordinate written as 'chr:start-end' is 1-based and end-inclusive, while 'chr start end' is 0-based and end-exclusive. At least this is the convention followed by UCSC and a plethora of utilities.
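The conversion between the two conventions is simple arithmetic. A 0-based, half-open interval [start, end) covers 1-based positions start + 1 through end inclusive, so only the start changes. Below is a minimal sketch of both directions. The helper names to_one_based_inclusive and from_one_based_inclusive are hypothetical and not part of bio-types. The strand suffix of the Contig display format is left out for brevity.

```rust
/// Formats a 0-based, half-open interval `[start, end)` as a 1-based,
/// end-inclusive `chr:start-end` string. (Hypothetical helper.)
pub fn to_one_based_inclusive(chrom: &str, start: u64, end: u64) -> String {
    assert!(start <= end, "interval start must not exceed its end");
    // The first base moves from index `start` to position `start + 1`;
    // the exclusive 0-based end equals the inclusive 1-based end.
    format!("{}:{}-{}", chrom, start + 1, end)
}

/// Parses a 1-based, end-inclusive `chr:start-end` string back into
/// (chrom, 0-based start, exclusive end). Returns `None` if the string is
/// malformed or the coordinates are invalid. (Hypothetical helper.)
pub fn from_one_based_inclusive(region: &str) -> Option<(&str, u64, u64)> {
    // Split on the last ':' so contig names containing ':' still work.
    let (chrom, range) = region.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    let start: u64 = start.parse().ok()?;
    let end: u64 = end.parse().ok()?;
    // 1-based positions start at 1; start == end + 1 denotes an empty interval.
    if start == 0 || start > end + 1 {
        return None;
    }
    Some((chrom, start - 1, end))
}
```

For example, the 0-based interval [99, 200) on chr1 becomes "chr1:100-200". Both forms describe 101 bases.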

Improvement of AbstractInterval

Hi,

I think this crate is a very good idea.

Currently, however, AbstractInterval is clearly designed for genomic use only; it could be more generic.

First, rename contig to sequence. This is not very important, but not all intervals refer to contigs.
I would also like to replace the str type with a generic type that implements Index<Range<Position>>, which would allow users to use whatever data structure they want.

I propose something like this:

use std::ops::{Index, Range};

pub type Position = usize;

pub trait AbstractInterval<I: Index<Range<Position>>> {
    /// The sequence the interval lies on (any type that can be sliced by a range)
    fn sequence(&self) -> &I;

    /// Interval on the sequence, as a 0-based half-open range
    fn range(&self) -> Range<Position>;

    /// The part of the sequence covered by the interval
    fn subsequence(&self) -> &<I as Index<Range<Position>>>::Output;
}

pub struct Interval<I: Index<Range<Position>>> {
    sequence: I,
    range: Range<Position>,
}

impl AbstractInterval<String> for Interval<String> {
    fn sequence(&self) -> &String {
        &self.sequence
    }

    fn range(&self) -> Range<Position> {
        self.range.clone()
    }

    fn subsequence(&self) -> &str {
        // String implements Index<Range<usize>> with Output = str
        &self.sequence[self.range.clone()]
    }
}

Thanks
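The proposal implements the trait only for Interval<String>. Because the trait is already generic over I, a single blanket impl could cover String, Vec<u8>, and any other type indexable by a range. The sketch below restates the proposed trait so that it compiles on its own. The blanket impl and the public fields are assumptions added for illustration, not part of the proposal.

```rust
use std::ops::{Index, Range};

pub type Position = usize;

/// The trait as proposed above.
pub trait AbstractInterval<I: Index<Range<Position>>> {
    fn sequence(&self) -> &I;
    fn range(&self) -> Range<Position>;
    fn subsequence(&self) -> &<I as Index<Range<Position>>>::Output;
}

pub struct Interval<I: Index<Range<Position>>> {
    pub sequence: I,
    pub range: Range<Position>,
}

/// One blanket impl instead of one impl per sequence type:
/// works for String (Output = str), Vec<u8> (Output = [u8]), and so on.
impl<I: Index<Range<Position>>> AbstractInterval<I> for Interval<I> {
    fn sequence(&self) -> &I {
        &self.sequence
    }

    fn range(&self) -> Range<Position> {
        self.range.clone()
    }

    fn subsequence(&self) -> &<I as Index<Range<Position>>>::Output {
        // Panics if the range is out of bounds, like ordinary slicing.
        &self.sequence[self.range.clone()]
    }
}
```

With this impl, Interval { sequence: String::from("ACGTACGT"), range: 2..6 } yields the subsequence "GTAC". The same code works unchanged for a Vec<u8>.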

Genome annotations

A while ago, I put together a set of data types to represent genomic annotations or regions -- I had a few key features, which included the need to keep track of strand information some of the time, and the ability to track a "spliced" genomic location (in the sense of pre-mRNA splicing).

Briefly, it features a few different location types -- single positions, contiguous locations, and spliced locations -- as well as a general location trait. They're generic over the type of the chromosome name (so you could use String, an interned Rc, or an i32 target ID from a BAM file) and over the strandedness (some annotations may be stranded while others aren't, and this provides a type-level guarantee to separate this). They also feature some useful interval math, e.g., given a position and an annotation, find the offset of the position in the annotated feature.

I revisited some of this recently and made a quick attempt to add it as a module in rust-bio-types. Here is the annot module. Would something of this general design be of interest, either in this crate or in the rust-bio crate?
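The type-level separation of stranded and unstranded annotations, and the "offset of a position in a feature" calculation, can be sketched as follows. The types ReqStrand, NoStrand, Pos, and Contig, and the methods left_offset and offset_of, are hypothetical simplifications. They are not the actual API of the annot module. The point of the sketch is that offset_of exists only for stranded contigs, so the compiler rejects strand-dependent questions about unstranded annotations.

```rust
/// Hypothetical strand type for annotations whose strand is known.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReqStrand {
    Forward,
    Reverse,
}

/// Hypothetical marker type for unstranded annotations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NoStrand;

/// A single position on reference `R`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Pos<R> {
    pub refid: R,
    pub pos: u64,
}

/// A contiguous location, generic over the reference name type `R`
/// (e.g. String, Rc<str>, or an i32 BAM target ID) and the strandedness `S`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Contig<R, S> {
    pub refid: R,
    pub start: u64, // 0-based position of the leftmost base
    pub length: u64,
    pub strand: S,
}

impl<R: PartialEq, S> Contig<R, S> {
    /// Distance of `pos` from the leftmost base, or `None` if `pos` lies outside.
    pub fn left_offset(&self, pos: &Pos<R>) -> Option<u64> {
        // `&&` short-circuits, so the subtraction never underflows.
        if pos.refid == self.refid && pos.pos >= self.start && pos.pos - self.start < self.length {
            Some(pos.pos - self.start)
        } else {
            None
        }
    }
}

impl<R: PartialEq> Contig<R, ReqStrand> {
    /// Offset of `pos` counted from the feature's 5' end; only defined for
    /// stranded contigs.
    pub fn offset_of(&self, pos: &Pos<R>) -> Option<u64> {
        let left = self.left_offset(pos)?;
        Some(match self.strand {
            ReqStrand::Forward => left,
            ReqStrand::Reverse => self.length - 1 - left,
        })
    }
}
```

For a 10-base feature starting at position 100, position 103 has offset 3 on the forward strand and offset 6 on the reverse strand.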
