rust-bio-tools's Introduction

Rust-Bio, a bioinformatics library for Rust.

This library provides Rust implementations of algorithms and data structures useful for bioinformatics. All provided implementations are rigorously tested via continuous integration.

Please see the API documentation for available features and examples of how to use them.

When using Rust-Bio, please cite the following article:

Köster, J. (2016). Rust-Bio: a fast and safe bioinformatics library. Bioinformatics, 32(3), 444-446.

Further, you can cite the used versions via DOIs:

Rust-Bio: DOI

Contribute

Any contributions are welcome, from a simple bug report to full-blown new modules:

If you find a bug and don't have the time or in-depth knowledge to fix it, check whether you can add information to an existing issue; otherwise, file a bug report with as much information as possible. Pull requests are welcome if you want to contribute fixes, documentation, or new code. Before making commits, it is helpful to first install pre-commit to avoid failed continuous integration builds due to issues such as formatting:

  1. Install pre-commit (see pre-commit.com/#installation)
  2. Run pre-commit install in the rust-bio base directory

Depending on your intended contribution frequency, you have two options for opening pull requests:

  1. For one-time contributions, simply fork the repository, apply your changes to a branch in your fork and then open a pull request.
  2. If you plan on contributing more than once, become a contributor by saying hi on the rust-bio Discord server, together with a short sentence about who you are and what you want to contribute. We'll add you to the team. Then you don't have to create a fork but can simply push new branches into the main repository and open pull requests there.

If you want to contribute and don't know where to start, have a look at the roadmap.

Documentation guidelines

Every public function and module should have documentation comments. Check out which types of comments to use where. In rust-bio, documentation comments should:

  • explain functionality
  • give at least one useful example of how to use it (ideally as doctests that run during testing, using descriptive expect() statements to handle any Err()s that might occur; see the sketch after this list)
  • describe time and memory complexity (where applicable)
  • cite and link sources and explanations for data structures, algorithms or code (where applicable)
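
For illustration, a minimal sketch of what such a documentation comment could look like. The revcomp helper, its signature, and the mycrate path are hypothetical and serve only to demonstrate the doctest/expect() style and the complexity note:

/// Compute the reverse complement of a DNA sequence given as ASCII bytes.
///
/// Time complexity: O(n); memory: O(n) for the returned sequence.
///
/// # Example
///
/// ```
/// let rc = mycrate::revcomp(b"ACGT").expect("input must be valid DNA");
/// assert_eq!(rc, b"ACGT".to_vec());
/// ```
pub fn revcomp(seq: &[u8]) -> Result<Vec<u8>, String> {
    seq.iter()
        .rev()
        .map(|&base| match base {
            b'A' => Ok(b'T'),
            b'T' => Ok(b'A'),
            b'G' => Ok(b'C'),
            b'C' => Ok(b'G'),
            other => Err(format!("invalid DNA base: {}", other as char)),
        })
        .collect()
}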

For extra credit, feel free to familiarize yourself with:

Minimum supported Rust version

Currently the minimum supported Rust version is 1.65.0.

License

Licensed under the MIT license http://opensource.org/licenses/MIT. This project may not be copied, modified, or distributed except according to those terms.


rust-bio-tools's Issues

Multiple file support for csv-report

rbt csv-report needs improvements for use-cases with multiple csv files:

  • Accept multiple csv-files in CLI
  • Add navbar navigation elements to navigate between given csv files
  • Add a --foreign-keys parameter that lets the user set foreign keys via a JSON file. These keys will be used to generate link-outs between the tables (a hypothetical sketch of such a file is shown below).
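
A purely hypothetical sketch of such a --foreign-keys file (neither the option nor the format exists yet; the file names and keys are made up for illustration):

{
    "table-a.csv": {
        "sample": "table-b.csv"
    }
}

Here the sample column of table-a.csv would act as a foreign key into table-b.csv, and the report could render each value in that column as a link to the matching row of the other table.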

vcf-report tasks

@fxwiegand:

  • Use SYMBOL column of ANN instead of Gene column for left axis tick labels.
  • Use normalized stacked bar charts for impact, consequence and clinical significance.
  • Use tick marks for allele frequency.
  • With many samples, tooltips and clicking are slow on level 1. Instead of having a fixed number of rows, let us have a fixed number of cells and adjust the number of rows such that the fixed number of cells is not exceeded. Let's make this parameter configurable via the CLI. Default should be 1000.
  • Level 2 needs paging as well (with many samples it becomes unresponsive otherwise). Same threshold as for level 1.
  • Off by one error in variant positions: bcf::Record::pos() returns 0-based position. In the interface, this should be reported as 1-based position (1-based counting is the default when talking about genomic variants).
  • We should save space here: put all css and JS into separate files and refer to them from the HTMLs. Also add all external JS to separate files and refer to them from the HTMLs.
  • Vega read plot: remove the title.
  • Improve visibility of the bar plots on the right with many samples.
  • Use a darker hover color for rows (level 1) and columns (level 2).
  • Clicks for deeper levels should also work on the bar plots.
  • With only one sample given, genes/alterations should be ordered by IMPACT and then CLIN_SIG.
  • I am not sure whether all reads are properly displayed in the Vega plot. See this example. According to the VCF, there should be at least 36 reads here, and two which contain a C at the variant position. That does not seem to be the case...

VCF for the vega plot example:

13      28399082        .       T       C       .       .       SVLEN=.;PROB_ARTIFACT=16.0747;PROB_ABSENT=23.6698;PROB_PRESENT=0.127744;PROB_FFPE_ARTIFACT=inf;ANN=C|intron_variant|MODIFIER|FLT1|ENSG00000102755|Transcript|ENST00000282397|protein_coding||11/29|ENST00000282397.9:c.1552-2014A>G|||||||||-1||SNV|HGNC|HGNC:3763|YES|NM_002019.4|1|P1|CCDS9330.1|ENSP00000282397|P17948|L7RSL3|UPI000013DCDD|1|||||||||||||||||||||||||||||||||0.463,C|missense_variant|MODERATE|FLT1|ENSG00000102755|Transcript|ENST00000539099|protein_coding|12/12||ENST00000539099.1:c.1588A>G|ENSP00000442630.1:p.Thr530Ala|1873|1588|530|T/A|Aca/Gca|||-1||SNV|HGNC|HGNC:3763|||1||CCDS53861.1|ENSP00000442630|P17948||UPI00001FC918|1|tolerated_low_confidence(0.08)|benign(0)|||||||||||||||||||||||||||||||0.463,C|intron_variant|MODIFIER|FLT1|ENSG00000102755|Transcript|ENST00000541932|protein_coding||11/14|ENST00000541932.5:c.1552-2014A>G|||||||||-1||SNV|HGNC|HGNC:3763|||1||CCDS53860.1|ENSP00000437631|P17948||UPI0000488BDC|1|||||||||||||||||||||||||||||||||0.463,C|intron_variant|MODIFIER|FLT1|ENSG00000102755|Transcript|ENST00000615840|protein_coding||11/12|ENST00000615840.4:c.1552-2014A>G|||||||||-1||SNV|HGNC|HGNC:3763|||1||CCDS73556.1|ENSP00000484039|P17948||UPI0000001C77|1|||||||||||||||||||||||||||||||||0.463,C|intron_variant|MODIFIER|FLT1|ENSG00000102755|Transcript|ENST00000639477|protein_coding||11/13|ENST00000639477.1:c.1552-2014A>G|||||||||-1||SNV|HGNC|HGNC:3763|||5|||ENSP00000491097||A0A1W2PNW4|UPI00097BA61C|1|||||||||||||||||||||||||||||||||0.463       DP:AF:OBS:SB    36:0.054784:17N+17N-1V-1V+:.

CLI --help formatting

Currently, the CLI --help output is especially wide, which causes odd line wrapping unless the shell is equally wide. I have not investigated the solution yet, but before I potentially battle clap to make it so, are there any initial objections to re-formatting the CLI help to have less empty space around subcommands?

From:

...
    fastq-split              Split FASTQ file from STDIN into N chunks.
                             
                             Example:
                             rbt fastq-split A.fastq B.fastq < test.fastq
...

To:

...
    fastq-split
        Split FASTQ file from STDIN into N chunks.

        Example:
        rbt fastq-split A.fastq B.fastq < test.fastq
...

is what I am thinking (if I can make clap participate in such formatting).

(I could knock this out while working on #36 )

Feel free to contribute

Rust-Bio-Tools (RBT) provides command line utilities for small bioinformatics tasks that are currently either not available elsewhere or not implemented in a fast and reliable way.

Everybody is welcome to contribute. Please post here if you want me to add you to the team.

@rust-bio/contributors: check this out :-). Do you have something to add as well?

name change on call-consensus-reads?

Hi,
this call-consensus-reads sounds very different from what it actually does. Why not call it remove-dups or similar? Calling consensus sounds more like read merging and quality correction, as with SPOA / racon etc.

Just my two cents, thanks for the interesting library.
Colin

call-consensus-reads     Tool to remove PCR duplicates from either FASTQ or BAM files.

                         Requirements:
                           - starcode

xlsxwriter v0.3.5 compiling failed

Hi,
thanks a lot for this amazing repo.

I'm trying to install it via cargo install --path .

This is the error I get on an Apple M2 Pro running macOS Ventura 13.4.1:

   Compiling xlsxwriter v0.3.5
error[E0412]: cannot find type `size_t` in crate `libxlsxwriter_sys`
    --> /Users/adrianodemarino/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xlsxwriter-0.3.5/src/worksheet.rs:1126:52
     |
1126 |                 buffer.len() as libxlsxwriter_sys::size_t,
     |                                                    ^^^^^^ help: a type alias with a similar name exists: `time_t`
     |
    ::: /Users/adrianodemarino/Downloads/rust_pro/rust-bio-tools-0.42.0/target/release/build/libxlsxwriter-sys-cc1660963e333409/out/bindings.rs:3:6995
     |
3    | ... , stringify ! (_offset))) ; } pub type FILE = __sFILE ; pub type time_t = __darwin_time_t ; pub type lxw_row_t = u32 ; pub type lxw_col_t = u16 ; pub const lxw_boolean_LXW_FALSE ...
     |                                                             --------------- similarly named type alias `time_t` defined here

error[E0412]: cannot find type `size_t` in crate `libxlsxwriter_sys`
    --> /Users/adrianodemarino/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xlsxwriter-0.3.5/src/worksheet.rs:1150:52
     |
1150 |                 buffer.len() as libxlsxwriter_sys::size_t,
     |                                                    ^^^^^^ help: a type alias with a similar name exists: `time_t`
     |
    ::: /Users/adrianodemarino/Downloads/rust_pro/rust-bio-tools-0.42.0/target/release/build/libxlsxwriter-sys-cc1660963e333409/out/bindings.rs:3:6995
     |
3    | ... , stringify ! (_offset))) ; } pub type FILE = __sFILE ; pub type time_t = __darwin_time_t ; pub type lxw_row_t = u32 ; pub type lxw_col_t = u16 ; pub const lxw_boolean_LXW_FALSE ...
     |                                                             --------------- similarly named type alias `time_t` defined here

error[E0063]: missing fields `output_buffer` and `output_buffer_size` in initializer of `lxw_workbook_options`
  --> /Users/adrianodemarino/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xlsxwriter-0.3.5/src/workbook.rs:90:40
   |
90 |             let mut workbook_options = libxlsxwriter_sys::lxw_workbook_options {
   |                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `output_buffer` and `output_buffer_size`

Some errors have detailed explanations: E0063, E0412.
For more information about an error, try `rustc --explain E0063`.
error: could not compile `xlsxwriter` (lib) due to 3 previous errors
warning: build failed, waiting for other jobs to finish...
^C  Building [=======================> ] 448/456: librocksdb-sys(build)

More fixes and improvements for vcf-report

  • It seems like sometimes empty files like .html occur. 🐞
  • The table of the third stage can sometimes contain empty data attributes that should be removed.
  • For improved usage of disk space the data attributes containing json for the read plots should be moved into separate javascript files.
  • Format fields with type String cause a panic when the table report template is rendered. 🐞
  • Table reports containing more than one variant have multiple tab/nav-bars for read plots. 🐞
  • Add COSV and COSN to the report. COSN variants should also be linked to COSMIC in the third stage.

Pin table until given column

such that scrolling to the right does not hide the column and those before.

E.g.
rbt csv-report --pin-until sample

Various read plot updates for vcf-report and plot-bam

The read plots for rbt vcf-report and rbt plot-bam need the following updates:

  • Show MAPQ and CIGAR in tooltip
  • Add border to reads depending on the MAPQ value.
  • Improve CLI usage help section for rbt plot-bam
  • The CLI should ensure that at least one BAM file is given when using the rbt plot-bam command

Improved error messages for input data

Hi there,
I'd suggest improving some error messages:

panic!("there is no flag type for format");

The error simply states that

there is no flag type for format

and it took me quite a while digging through rust-htslib to figure out that it actually wants to say something like

Format "XX" was requested for inclusion in vcf-to-txt, but that format field does not appear in the input vcf file(s).

where "XX" is some format annotation shortcut for vcf files. The same probably applies for the info fields as well.

I'd provide a PR, but I don't know rust well enough to gauge whether the error message can occur in other circumstances as well, where my improved message would not be fitting.

All the best
Lucas

Python 2.7 has issues with a regex parameter when compiling

File "/usr/lib/python2.7/site-packages/conda/models/channel.py", line 131, in
TOKEN_RE = re.compile(r'(/t/[a-z0-9A-Z-]+)?(\S*)?')
File "/usr/lib64/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib64/python2.7/re.py", line 242, in _compile
raise error, v # invalid expression
sre_constants.error: nothing to repeat

csv-report improvements

  • In the search dialog, only the page should be a link, not the value in the left column.
  • When clicking on a page in the search dialog and the page is opened, it would be great if the rows corresponding to the search value could be highlighted. This could happen by adding them as optional args to the URL (e.g. file:///home/johannes/Downloads/csv-report-test/indexes/index4.html?highlight=2,3,7) and parsing those via JS for highlighting the rows on the page.
  • For numeric columns that are integer only, the search should not show bins with fractional numbers but only integer bins.
  • In the search dialog title, just say "Search for ", remove the word "Prefix".
  • Remove the note "Note: Non-numeric plots with 11 or more different values are reduced down to their 10 most frequent values." for numeric columns.
  • The search and plot buttons at each column should trigger a "hand" cursor when hovering them. Currently, I just see the normal pointer.
  • Currently, the content of the table on each page is encoded directly via HTML. I think we can save more space by storing the content in a separate JSON document (no need for repeating the column names in the JSON, just arrays of the values) and load that via a script tag with subsequent rendering into the table.
  • Use lz-string compression for data
  • Remove the top right search field from the main table
  • The javascript folder contains a file from the vcf-report that is not needed 🐞
  • Move all javascript (same for css) inside script tags to a separate file for saving more space.

General grammar for even more flexible csv-reports

This is a generalization of #192, also replacing other specialized functionality from before. We replace the CLI interface with a textual JS file definition like this (example with annotations):

{
    "tables": {
        "table-a": {
            "path": "table-a.tsv",
            "render-columns": {
              "x": {
                  "custom": function(value) {
                      return `<b>${value}</b>`; // applies the given function to render column content
                  },
              },
              "y": {
                  "link-to-url": "https://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g={value}" // renders as <a href="https://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g={value}">{value}</a>
              },
              "z": {
                  "link-to-table-row": "table-b/gene" // renders as link to the other table highlighting the row in which the gene column has the same value as here
              }
            }
        },
        "table-b": {
            "path": "table-b.tsv"
            "render-columns": {
                "gene": {
                    "link-to-table": "gene-{value}" // renders as link to the given table, not a specific row
                }
            }
        },
        "gene-mycn": {
            "path": "genes/table-mycn.tsv"
            "render-columns": {
                "score": {
                    "plot": {
                        "type": "quantitative" // renders as vega-lite tick marks with a tooltip showing the value
                    },
                    "summary-plot": "hist" // shows a histogram at the top of the column
                },
                "type": {
                    "plot": {
                        "type": "nominal" // renders as colored cells (via vega-lite) with a tooltip showing the value and a global legend showing the color meanings
                    }
                },
                "length": {
                    "custom-plot": {
                        "data": function(value) { // a function to return the data needed for the schema below from the content of the column cell
                            return [{"length": value}]
                        },
                        "schema": { // a schema for a vega plot that is rendered into each cell of this column
                            "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
                            "encoding": { "y": { "field": "variety", "type": "ordinal" } },
                            "layer": [
                                {
                                    "mark": { "type": "point", "filled": true },
                                    "encoding": {
                                        "x": {
                                            "aggregate": "mean",
                                            "field": "yield",
                                            "type": "quantitative",
                                            "scale": { "zero": false },
                                            "title": "Barley Yield"
                                        },
                                        "color": { "value": "black" }
                                    }
                                },
                                {
                                    "mark": { "type": "errorbar", "extent": "stdev" },
                                    "encoding": {
                                        "x": { "field": "yield", "type": "quantitative", "title": "Barley Yield" }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}

Feature request: FILTER field for vcf-to-txt

Currently, there are command line options to output genotype calls and arbitrary INFO and FORMAT fields, but there doesn't seem to be a way to extract the standard FILTER column (nor does the tool keep only PASS variants etc.), so any FILTER information is lost.
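
For reference, a minimal sketch of how the FILTER entries can be read per record with rust-htslib (assuming its bcf API; calls.bcf is a placeholder path, and this is not necessarily how vcf-to-txt would integrate it):

use rust_htslib::bcf::{Read, Reader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = Reader::from_path("calls.bcf")?;
    for result in reader.records() {
        let record = result?;
        let header = record.header();
        // Translate the numeric filter IDs back into their names, e.g. "PASS" or "q10".
        let filters: Vec<String> = record
            .filters()
            .map(|id| String::from_utf8_lossy(&header.id_to_name(id)).into_owned())
            .collect();
        let column = if filters.is_empty() { ".".to_owned() } else { filters.join(";") };
        println!("{}", column);
    }
    Ok(())
}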

Broken CI build

Hello!

Last merge broke the CI pipeline. I am getting the same errors as in Travis when I locally run cargo build.

I wanted to try to implement some stuff here, do I have to install dependencies that are not listed? Or was the Cargo.toml file improperly pushed?

vcf-report improvements and fixes

Level 1

  • add search function for gene names, similar to the table report (iframe, prefix based approach that points people to pages).

Level 1 and 2

  • add column to first and second level showing the ANN field Existing_variation. This field contains comma-separated IDs like rs237687,COSM26876823. For first and second level, you should show the non-numeric prefixes, in the usual barplot style. On third level, they should be listed in the transcript table on the right.
  • it seems like only snv, mnv and complex are shown in the matrix. Deletions, for example, aren't shown currently ⚡.

Level 3:

  • it seems like the first table row is no longer selected on level 3 ⚡.
  • always highlight the selected table row (not only on hover)
  • sometimes, tabs and alignment plots do not match and seem switched ⚡.
  • add custom format values to columns to top left table via CLI
  • display Existing_variation field in the transcript table on the right. IDs like COSM<somenumber> should have a link that points to https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=<somenumber>; IDs like rs<somenumber> should have a link that points to https://www.ncbi.nlm.nih.gov/snp/rs<somenumber>.

[vcf-report] 0-based position reporting for 1-based VCF file

In my VCF file, I have a variant at the following specific position:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	TUMOR
22	42524491	.	C	<DUP>	334	PASS	TYPE=DUP;AF=0.1005	GT:DP	0/1:3091

The above variant is missing HGVSg, and vcf-report clearly reports it. But it reports a 0-based position. It is a minor issue, but it would be useful to either make it properly 1-based or clearly specify which convention vcf-report uses for reporting. I prefer the former; it just makes debugging easier.

Also, it does not report the chromosome either.

Here is the vcf-report output:

thread '<unnamed>' panicked at 'Failed building table report for sample a. Found variant . at position 42524490 with no HGVSg value.', src/main.rs:207:21

Ideally the output could be something like:

thread '<unnamed>' panicked at 'Failed building table report for sample a. Found variant . at position 22:42524491 with no HGVSg value.', src/main.rs:207:21
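
A minimal sketch of how such a CHROM:POS label could be built (assuming rust-htslib's bcf API; this is an illustration, not the actual vcf-report code):

use rust_htslib::bcf::Record;

/// Build a "CHROM:POS" label with the conventional 1-based position for error messages.
fn variant_label(record: &Record) -> String {
    let chrom = record
        .rid()
        .and_then(|rid| record.header().rid2name(rid).ok())
        .map(|name| String::from_utf8_lossy(name).into_owned())
        .unwrap_or_else(|| ".".to_owned());
    // Record::pos() is 0-based, so add 1 before showing it to the user.
    format!("{}:{}", chrom, record.pos() + 1)
}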

bam-anonymize may require soft clipping for supplementary alignments

Hi,

For the sequence in the link below aligned to the reference in the link below, bam-anonymize throws an error that, I guess, means there is no soft clipping for a supplementary alignment when using minimap2. My suggestion would be to produce an error message that suggests turning on soft clipping for supplementary alignments in the aligner if the goal is to later output all sequences from the BAM file (i.e., I would not suggest changing bam-anonymize to ignore such a sequence).

get sequence and reference

wget https://www.dropbox.com/s/3r7eozi23dzrifk/test.fasta
wget https://www.dropbox.com/s/9vf8sa56aecgypk/reference.fasta.gz
gzip -d reference.fasta.gz
samtools faidx reference.fasta

map read to reference with minimap2 without the -Y option
-Y use soft clipping for supplementary alignments

~/bin/minimap2-2.24_x64-linux/minimap2 -ax map-hifi reference.fasta test.fasta |samtools sort \
> 9.pbccs-6.3.0.fasta-to-reference.bam && samtools index 9.pbccs-6.3.0.fasta-to-reference.bam

[M::mm_idx_gen::0.165*1.01] collected minimizers
[M::mm_idx_gen::0.188*1.25] sorted minimizers
[M::main::0.188*1.25] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.203*1.24] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.210*1.23] distinct minimizers: 454545 (99.24% are singletons); average occurrences: 1.021; average spacing: 9.997; total length: 4641652
[M::worker_pipeline::0.223*1.21] mapped 1 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: /nfs/scistore16/itgrp/jelbers/bin/minimap2-2.24_x64-linux/minimap2 -ax map-hifi reference.fasta test.fasta
[M::main] Real time: 0.232 sec; CPU: 0.280 sec; Peak RSS: 0.042 GB

run bam-anonymize - returns error

/nfs/scistore16/itgrp/jelbers/.cargo/bin/rbt bam-anonymize 9.pbccs-6.3.0.fasta-to-reference.bam reference.fasta \
anonymous-reads.bam anonymous-reference.fasta NC_000913.3 1 4641652

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /nfs/scistore16/itgrp/jelbers/.cargo/registry/src/github.com-1ecc6299db9ec823/rust-bio-tools-0.39.0/src/bam/anonymize_reads.rs:139:60
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

map read to reference with minimap2 with the -Y option

~/bin/minimap2-2.24_x64-linux/minimap2 -ax map-hifi -t 34 -Y reference.fasta test.fasta |samtools sort \
> 9.pbccs-6.3.0.fasta-to-reference.bam && samtools index 9.pbccs-6.3.0.fasta-to-reference.bam

[M::mm_idx_gen::0.166*1.02] collected minimizers
[M::mm_idx_gen::0.182*2.89] sorted minimizers
[M::main::0.182*2.89] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.191*2.80] mid_occ = 50
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.197*2.74] distinct minimizers: 454545 (99.24% are singletons); average occurrences: 1.021; average spacing: 9.997; total length: 4641652
[M::worker_pipeline::0.212*2.63] mapped 1 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: /nfs/scistore16/itgrp/jelbers/bin/minimap2-2.24_x64-linux/minimap2 -ax map-hifi -t 34 -Y reference.fasta test.fasta
[M::main] Real time: 0.220 sec; CPU: 0.566 sec; Peak RSS: 0.045 GB

run bam-anonymize - runs without error using -Y with minimap2

/nfs/scistore16/itgrp/jelbers/.cargo/bin/rbt bam-anonymize 9.pbccs-6.3.0.fasta-to-reference.bam reference.fasta \
anonymous-reads.bam anonymous-reference.fasta NC_000913.3 1 4641652

Rust-Bio-Tools version

/nfs/scistore16/itgrp/jelbers/.cargo/bin/rbt --version

Rust-Bio-Tools 0.39.0

vcf-report improvements

  • URGENT: Allow the field CLIN_SIG to be not present (in that case, omit the column in the visualization).
  • Proper error message in case of missing ANN field.

GSL not linked in final binary

A problem experienced by @HenningTimm and myself during an otherwise benign change (#39) is that the final binary is not properly linked against GSL.

I personally have tried the following:

  • Manually purging and reinstalling libgsl-dev
  • Setting up a conda environment and installing the gsl package that way

Neither of these fixed the GSL linking problem* and I suspect the issue is somewhere in either our use of rgsl or within rgsl itself. There are instructions on rgsl for linking to GSL v2, however we already follow those instructions via addition of the "GSL/v2" feature in Cargo.toml here.

* I am away from the machine where I experienced this GSL linking problem so will copy the error message into this thread later this evening.

Failing build

From repo

cargo 0.20.0-nightly (397359840 2017-05-18)
nightly-x86_64-apple-darwin - rustc 1.19.0-nightly (81734e0e0 2017-05-22)

Installing rust-bio-tools v0.1.2 (file:///Users/USER/rust-bio-tools)
Compiling bit-vec v0.4.3
Compiling log v0.3.7
Compiling approx v0.1.1
Compiling bitflags v0.5.0
Compiling strsim v0.4.1
Compiling itertools v0.4.19
Compiling custom_derive v0.1.7
Compiling libc v0.2.23
Compiling num-traits v0.1.37
Compiling void v1.0.2
warning[E0122]: trait bounds are not (yet) enforced in type definitions
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/itertools-0.4.19/src/lib.rs:123:1
|
123 | pub type MapFn<I, B> where I: Iterator = iter::Map<I, fn(I::Item) -> B>;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Compiling semver v0.1.20
Compiling lazy_static v0.1.16
Compiling rust-htslib v0.9.0
Compiling ieee754 v0.1.1
Compiling fern v0.3.5
Compiling rustc-serialize v0.3.24
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:164:17
|
164 | Self::from_bits(bits)
| ^^^^^^^^^^^^^^^
...
290 | mk_impl!(f32, u32, u8, u32, 8, 23);
| ----------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:164:17
|
164 | Self::from_bits(bits)
| ^^^^^^^^^^^^^^^
...
291 | mk_impl!(f64, u64, u16, u64, 11, 52);
| ------------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:178:17
|
178 | Self::from_bits(bits)
| ^^^^^^^^^^^^^^^
...
290 | mk_impl!(f32, u32, u8, u32, 8, 23);
| ----------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:178:17
|
178 | Self::from_bits(bits)
| ^^^^^^^^^^^^^^^
...
291 | mk_impl!(f64, u64, u16, u64, 11, 52);
| ------------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:205:17
|
205 | Self::from_bits(
| ^^^^^^^^^^^^^^^
...
290 | mk_impl!(f32, u32, u8, u32, 8, 23);
| ----------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: use of unstable library feature 'float_bits_conv': recently added (see issue #40470)
--> /Users/USER/.cargo/registry/src/github.com-1ecc6299db9ec823/ieee754-0.1.1/src/lib.rs:205:17
|
205 | Self::from_bits(
| ^^^^^^^^^^^^^^^
...
291 | mk_impl!(f64, u64, u16, u64, 11, 52);
| ------------------------------------- in this macro invocation
|
= help: add #![feature(float_bits_conv)] to the crate attributes to enable
error: aborting due to 6 previous errors
error: Could not compile ieee754.
Build failed, waiting for other jobs to finish...

Thanks!

Unify CLI and Cargo.yaml content via clap macros

I noticed we are using clap for CLI. clap has macros (e.g. crate_authors!() and crate_version!()) to pull content from Cargo.toml to ensure these two are not out of sync. Is there interest in me working on this unification?

Related note: is there a reason the command name (rbt) and the package name (rust-bio-tools) are not unified? I have found this dual naming can cause unnecessary confusion about how exactly to reference the tool. As I unify the CLI to pull from Cargo.toml, would there be interest in me unifying this dual naming strictly within the repo -- i.e., not changing the repo's name, but rather the package.name in Cargo.toml and the name used at the CLI, so it is always referred to as rbt?

Edit: Cargo.yaml -> Cargo.toml
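
A minimal sketch of what this could look like (builder style; the exact signatures and required features, e.g. clap's cargo feature for these macros in newer versions, depend on the clap version rust-bio-tools pins, so treat this as an illustration rather than a drop-in change):

use clap::{crate_authors, crate_version, App};

fn main() {
    // Version and author strings come straight from Cargo.toml, so the CLI
    // and the manifest cannot drift out of sync.
    let _matches = App::new("rbt")
        .version(crate_version!())
        .author(crate_authors!())
        .about("Rust-Bio-Tools: command line utilities for bioinformatics")
        .get_matches();
}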

vcf-split breaks sorting

vcf-split keeps breakends together, but I thought it would just keep them in the same file. In practice, it moves the records around and breaks the file sorting.

So, for example, records:

III     3226364 MantaDEL:78744:0:0:0:0:0
III     3228674 MantaBND:78991:0:1:1:0:0:1
III     3228676 MantaBND:78991:0:1:0:1:0:1
III     3228678 MantaINS:78991:0:0:0:0:0
III     3228681 MantaDEL:78991:0:0:0:0:1
III     3228782 MantaBND:78991:0:1:0:1:0:0
III     3228783 MantaBND:78991:0:1:1:0:0:0

become:

III     3226364 MantaDEL:78744:0:0:0:0:0
III     3228678 MantaINS:78991:0:0:0:0:0
III     3228681 MantaDEL:78991:0:0:0:0:1
III     3228674 MantaBND:78991:0:1:1:0:0:1
III     3228676 MantaBND:78991:0:1:0:1:0:1
III     3228782 MantaBND:78991:0:1:0:1:0:0
III     3228783 MantaBND:78991:0:1:1:0:0:0

Is there any reason for this behavior?
Would it make sense to add an option to keep the records sorted?

thanks,

Error and file handling in vcf-report

As the vcf-report subcommand may receive many input files, it becomes difficult to identify invalid file paths, since rbt only returns a generic "file/path does not exist" error.
It would be helpful to receive more detailed output that includes the affected input path.
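
As an illustration, a sketch using the anyhow crate (not necessarily how rbt structures its errors internally) of attaching the offending path to the error:

use anyhow::{Context, Result};
use std::fs::File;
use std::path::Path;

// Attach the offending path to the error so that, with many input files,
// the user can immediately see which one could not be opened.
fn open_input(path: &Path) -> Result<File> {
    File::open(path).with_context(|| format!("failed to open input file {}", path.display()))
}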

Also, vcf-report currently fails in case the output path does not end with a / e.g. path/to/folder.1 is invalid and must be path/to/folder.1/.

Failed to compile rust-bio-tools v0.2.10-alpha.0 via cargo

Hi,

I tried compiling the latest release of rust-bio-tools on our Linux server, but the build step failed.

The error message is:

$ cargo install
  Installing rust-bio-tools v0.2.10-alpha.0 (file:///home/user/git/rust-bio-tools)
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling matches v0.1.8                                                     
   Compiling bindgen v0.43.2
...
   Compiling approx v0.3.1
error[E0552]: unrecognized representation hint
  --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/ordered-float-1.0.1/src/lib.rs:44:8
   |
44 | #[repr(transparent)]
   |        ^^^^^^^^^^^

error[E0552]: unrecognized representation hint
   --> /home/user/.cargo/registry/src/github.com-1ecc6299db9ec823/ordered-float-1.0.1/src/lib.rs:173:8
    |
173 | #[repr(transparent)]
    |        ^^^^^^^^^^^

error: aborting due to 10 previous errors

error: Could not compile `ordered-float`.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `rust-bio-tools v0.2.10-alpha.0 (file:///home/user/git/rust-bio-tools)`, intermediate artifacts can be found at `/home/user/git/rust-bio-tools/target`

Caused by:
  build failed

I got the cargo and rust versions from the maintainers' sources:

$ cargo version
cargo 0.25.0

$ rustc -V
rustc 1.24.1

The server is running Debian 9.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.8 (stretch)
Release:	9.8
Codename:	stretch

$ uname -a
Linux servername 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux

Assuming the maintainers' versions of rust and cargo might be outdated, I tried to install on a local computer running Ubuntu 16.04, again with the maintainers' versions of rust and cargo.

$ rustc -V
rustc 1.30.0

$ cargo -V
cargo 1.30.0

I ran into different problems.

First Problem:

 $ cargo install
error: Using `cargo install` to install the binaries for the project in current working directory is no longer supported, use `cargo install --path .` instead. Use `cargo build` if you want to simply build the package.

switching to cargo build:

$ cargo build
   Compiling arrayvec v0.4.10
   Compiling libc v0.2.49                                                                                                           
   Compiling nodrop v0.1.13                      
...
   Compiling librocksdb-sys v5.14.3                                                                                                 
error: failed to run custom build command for `librocksdb-sys v5.14.3`                                                              
process didn't exit successfully: `/home/user/git/rust-bio-tools/target/debug/build/librocksdb-sys-6bb47e4aca51bf41/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=rocksdb/
cargo:rerun-if-changed=snappy/
cargo:rerun-if-changed=lz4/
cargo:rerun-if-changed=zstd/
cargo:rerun-if-changed=zlib/
cargo:rerun-if-changed=bzip2/

--- stderr
rocksdb/include/rocksdb/c.h:48:9: warning: #pragma once in main file [-Wpragma-once-outside-header]
rocksdb/include/rocksdb/c.h:68:10: fatal error: 'stdarg.h' file not found
rocksdb/include/rocksdb/c.h:48:9: warning: #pragma once in main file [-Wpragma-once-outside-header], err: false
rocksdb/include/rocksdb/c.h:68:10: fatal error: 'stdarg.h' file not found, err: true
thread 'main' panicked at 'unable to generate rocksdb bindings: ()', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Is there a specific version of rust/cargo/... you support/suggest?

Thanks.
Marten

Compile error when depending on `rust-bio` HEAD/master

I'm getting this error when building rust-bio-tools against the latest version of rust-bio:

error[E0277]: the trait bound `rust_htslib::bam::Record: SequenceRead` is not satisfied
   --> src/bam/collapse_reads_to_fragments/calc_consensus.rs:190:10
    |
190 | impl<'a> CalcConsensus<'a, bam::Record> for CalcOverlappingConsensus<'a> {
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `SequenceRead` is not implemented for `rust_htslib::bam::Record`
    |
   ::: src/common.rs:14:32
    |
14  | pub trait CalcConsensus<'a, R: SequenceRead> {
    |                                ------------ required by this bound in `CalcConsensus`

It looks like SequenceRead is implemented for a fastq record, but not for bam records.

`vcf-report` exits with "thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value'" error

I'm testing out vcf-report with VCF files generated by VarDict, as I was looking for packages/libraries/tools to generate oncoprint plots for our workflow: https://github.com/Clinical-Genomics/BALSAMIC

I followed routine install on MacOS Big Sur: conda create --name rust-bio-tools rust-bio-tools

And ran the following command:

rbt vcf-report fasta.fa --vcfs a=sample1.vcf.gz b=sample2.vcf.gz --bams a:sampleA=sample1.bam b:sampleB=sample2.bam -- oncoprint_example/

The output is quite cryptic and doesn't really tell me what went wrong:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', src/bcf/report/table_report/create_report_table.rs:65:thread '77<unnamed>
' panicked at 'called `Option::unwrap()` on a `None` valuenote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
', src/bcf/report/table_report/create_report_table.rs:65:77

Are there any example files or documentations I can follow to find out what might be wrong?

`index out of bounds` error in `allel_frequencies` vector in `oncoprint.rs`

Running the following command goes well for quite a while (minutes to hours), but finally I get a panic:

RUST_BACKTRACE=full rbt vcf-report   resources/genome.fasta   --bams 18:18_B=results/mapped/18_B.sorted.bam 18:18_C=results/mapped/18_C.sorted.bam 18:18_D=results/mapped/18_D.sorted.bam 18:18_F=results/mapped18_F.sorted.bam   --vcfs 18=results/annotated-calls/18.basic.present.1.fdr-controlled.two_samples_cov.annotated.bcf   -f DP AF OBS   -i  PROB_PRESENT PROB_ARTIFACT PROB_ABSENT   -j workflow/resources/custom-table-report.js   -- results/vcf-report/18.basic.present.1.fdr-controlled.annotated/

It seems to come from this code line:

allel_frequency: allel_frequencies[i],

And here's the full backtrace, although I think the most useful info is the first row at the top, with the line number included:

thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/bcf/report/oncoprint.rs:181:50
stack backtrace:
   0:     0x556dbf4f99f0 - std::backtrace_rs::backtrace::libunwind::trace::h577ea05e9ca4629a
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/../../backtrace/src/backtrace/libunwind.rs:96
   1:     0x556dbf4f99f0 - std::backtrace_rs::backtrace::trace_unsynchronized::h50b9b72b84c7dd56
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/../../backtrace/src/backtrace/mod.rs:66
   2:     0x556dbf4f99f0 - std::sys_common::backtrace::_print_fmt::h6541cf9823837fac
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:79
   3:     0x556dbf4f99f0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hf64fbff071026df5
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:58
   4:     0x556dbf52197c - core::fmt::write::h9ddafa4860d8adff
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/fmt/mod.rs:1082
   5:     0x556dbf4f3a77 - std::io::Write::write_fmt::h1d2ee292d2b65481
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/io/mod.rs:1514
   6:     0x556dbf4fc250 - std::sys_common::backtrace::_print::ha25f9ff5080d886d  
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:61
   7:     0x556dbf4fc250 - std::sys_common::backtrace::print::h213e8aa8dc5405c0   
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:48
   8:     0x556dbf4fc250 - std::panicking::default_hook::{{closure}}::h6482fae49ef9d963
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:200
   9:     0x556dbf4fbf9c - std::panicking::default_hook::he30ad7589e0970f9
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:219
  10:     0x556dbf4fc8b3 - std::panicking::rust_panic_with_hook::haa1ed36ada4ffb03
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:569
  11:     0x556dbf4fc489 - std::panicking::begin_panic_handler::{{closure}}::h7001af1bb21aeaeb
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:476
  12:     0x556dbf4f9e7c - std::sys_common::backtrace::__rust_end_short_backtrace::h39910f557f5f2367
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/sys_common/backtrace.rs:153
  13:     0x556dbf4fc449 - rust_begin_unwind
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:475
  14:     0x556dbf51f541 - core::panicking::panic_fmt::h4e2659771ebc78eb
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/panicking.rs:85
  15:     0x556dbf51f502 - core::panicking::panic_bounds_check::h2e8c50d2fb4877c0 
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/panicking.rs:62
  16:     0x556dbebd8d03 - rbt::bcf::report::oncoprint::oncoprint::h30b8a45d1d21b2d7
  17:     0x556dbeb6db17 - rbt::main::h284af7b901acdb78
  18:     0x556dbebc1e23 - std::sys_common::backtrace::__rust_begin_short_backtrace::h75709365801ea186
  19:     0x556dbeb293ed - std::rt::lang_start::{{closure}}::hcf4c7d8d71619a23
  20:     0x556dbf4fcd81 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h6a3209f124be2235
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/core/src/ops/function.rs:259
  21:     0x556dbf4fcd81 - std::panicking::try::do_call::h88ce358792b64df0
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:373
  22:     0x556dbf4fcd81 - std::panicking::try::h6311c259678e50fc
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panicking.rs:337
  23:     0x556dbf4fcd81 - std::panic::catch_unwind::h56c5716807d659a1
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/panic.rs:379
  24:     0x556dbf4fcd81 - std::rt::lang_start_internal::h73711f37ecfcb277
                               at /rustc/18bf6b4f01a6feaf7259ba7cdae58031af1b7b39/library/std/src/rt.rs:51
  25:     0x556dbeb6fc62 - main
  26:     0x7fb9c0181b97 - __libc_start_main
  27:     0x556dbeaf4651 - <unknown>

@fxwiegand: Am I specifying the samples for this group wrong somehow? Or is this a bug? It seems like the vector allel_frequencies from before does not have an entry for every sample at some point:

let allel_frequencies = record.format(b"AF").float()?[0].to_vec();

But I double-checked the input BCF file with:

bcftools query -f '%CHROM\t%POS\t%ALT[\t%SAMPLE=%AF]\n' 18.basic.present.1.fdr-controlled.two_samples_cov.annotated.bcf | less

And this gives an AF FORMAT field entry for every sample at every given genomic position. So I'm not sure what might be triggering this behaviour. If you have any further ideas, I'd be glad to investigate further.
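
For what it's worth, a minimal sketch of a checked access (a hypothetical helper, not the actual oncoprint.rs fix; the underlying question of why AF has fewer entries than samples would still need investigating):

// Replace the panicking `allel_frequencies[i]` with a checked access that
// names the offending sample index in the error message.
fn af_for_sample(allel_frequencies: &[f32], i: usize) -> Result<f32, String> {
    allel_frequencies
        .get(i)
        .copied()
        .ok_or_else(|| format!("record has no AF entry for sample index {}", i))
}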

Parental leave until end of February, please be patient :-)

Hi folks, I am on parental leave until end of February.
Hence, I won't have the chance to look into your bug reports and PRs until then.

Of course, you are all invited to do mutual reviews :-)!

Thanks a lot for your patience.
Johannes

Tests for vcf-annotate-dgidb fail

Both tests for vcf-annotate-dgidb fail with Error: Error(Json(Error("missing field `matchedTerms`", line: 1, column: 46))). This could be connected to the new v4 release.

It seems like the API is not working anymore, although the API docs still state that v2 is the current version and up to date. All I can get is {"status":500,"error":"Internal Server Error"}.
