ABySS

ABySS is a de novo sequence assembler intended for short paired-end reads and genomes of all sizes.

Please cite our papers.

Installation

Install ABySS using Conda (recommended)

If you have the Conda package manager (Linux, MacOS) installed, run:

conda install -c bioconda -c conda-forge abyss

Or you can install ABySS in a dedicated environment:

conda create -n abyss-env
conda activate abyss-env
conda install -c bioconda -c conda-forge abyss

Install ABySS using Homebrew

If you have the Homebrew package manager (Linux, MacOS) installed, run:

brew install abyss

Install ABySS on Windows

Install Windows Subsystem for Linux, from which you can run the Conda or Homebrew installation.

Dependencies

Dependencies for linked reads

  • ARCS for scaffolding.
  • Tigmint for correcting assembly errors.

These can be installed through Conda:

conda install -c bioconda arcs tigmint

Or Homebrew:

brew install brewsci/bio/arcs brewsci/bio/links-scaffolder

Optional dependencies

  • pigz for parallel gzip.
  • samtools for reading BAM files.
  • zsh for reporting time and memory usage.

Conda:

conda install -c bioconda samtools
conda install -c conda-forge pigz zsh

Homebrew:

brew install pigz samtools zsh

Compiling ABySS from source

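Compiling ABySS from source requires the GNU build tools Autoconf and Automake, which ./autogen.sh uses to generate the configure script, and a C++ compiler that supports OpenMP, such as GCC.

The following libraries are required:

  • Boost
  • Open MPI
  • sparsehash
  • btllib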

Conda:

conda install -c conda-forge boost openmpi
conda install -c bioconda google-sparsehash btllib

It is also helpful to install the compilers Conda package that automatically passes the correct compiler flags to use the available Conda packages:

conda install -c conda-forge compilers

Homebrew:

brew install boost open-mpi google-sparsehash

ABySS fails to compile with Boost 1.51.0 or 1.52.0 because those releases contain a bug. Later versions of Boost compile without error.

To compile, run the following:

./autogen.sh
mkdir build
cd build
../configure --prefix=/path/to/abyss
make
make install

You may also pass the following flags to the configure script:

--with-boost=PATH
--with-mpi=PATH
--with-sqlite=PATH
--with-sparsehash=PATH
--with-btllib=PATH

Where PATH is the path to the directory containing the corresponding dependencies. This should only be necessary if configure doesn't find the dependencies by default. If you are using Conda, PATH would be the path to the Conda installation. SQLite and MPI are optional dependencies.

The above steps install ABySS at the provided path, in this case /path/to/abyss. Not specifying --prefix would install in /usr/local, which requires sudo privileges when running make install.

ABySS requires a modern compiler such as GCC 6 or greater. If you have multiple versions of GCC installed, you can specify a different compiler:

../configure CC=gcc-10 CXX=g++-10

While OpenMPI is assumed by default, you can switch to LAM/MPI or MPICH using:

../configure --enable-lammpi
../configure --enable-mpich

The default maximum k-mer size is 192. It may be decreased at compile time to reduce memory usage, or increased to allow larger values of k. The value must be a multiple of 32 (e.g. 32, 64, 96, 128):

../configure --enable-maxk=160
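
As a small sketch, the smallest valid --enable-maxk value for a given target k can be found by rounding up to the next multiple of 32. The helper name maxk_for is hypothetical and not part of ABySS.

```sh
# Round a desired k-mer size up to the next multiple of 32,
# the granularity that --enable-maxk accepts.
# maxk_for is a hypothetical helper, not an ABySS command.
maxk_for() {
    echo $(( ($1 + 31) / 32 * 32 ))
}

maxk_for 111   # prints 128
maxk_for 160   # prints 160
```

The result can then be passed on as ../configure --enable-maxk=128, for example.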

If you encounter compiler warnings that are not critical, you can allow the compilation to continue:

../configure --disable-werror

To run ABySS, its executables should be found in your PATH environment variable. If you installed ABySS in /opt/abyss, add /opt/abyss/bin to your PATH:

PATH=/opt/abyss/bin:$PATH

Before starting an assembly

ABySS stores temporary files in TMPDIR, which is /tmp by default on most systems. If your default temporary disk volume is too small, set TMPDIR to a larger volume, such as /var/tmp or your home directory.

export TMPDIR=/var/tmp
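
To judge whether the temporary volume is large enough, one option is to look at its free space before starting. The sketch below assumes a POSIX df and a file system name without spaces; tmp_free_kb is a hypothetical helper name.

```sh
# Print the free space, in kilobytes, of the volume that holds the
# temporary directory (TMPDIR, or /tmp when TMPDIR is unset).
# tmp_free_kb is a hypothetical helper, not an ABySS command.
tmp_free_kb() {
    # df -P prints one header line, then one line per file system;
    # the fourth column is the available space in 1024-byte blocks.
    df -Pk "${TMPDIR:-/tmp}" | awk 'NR == 2 { print $4 }'
}

echo "Free space for temporary files: $(tmp_free_kb) KB"
```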

Modes

Bloom filter mode

The recommended mode of running ABySS is the Bloom filter mode. Specifying the Bloom filter memory budget with the B parameter enables this mode, which can reduce memory consumption ten-fold compared to the MPI mode. B may be specified with the unit suffixes 'k' (kilobytes), 'M' (megabytes) or 'G' (gigabytes). If no unit is specified, bytes are assumed. Internally, the Bloom filter assembler allocates 8/9 of the memory budget (B * 8/9) to a counting Bloom filter and the remaining 1/9 (B/9) to a second Bloom filter that tracks k-mers that have already been included in contigs.
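
The arithmetic of the B * 8/9 and B/9 shares can be sketched as follows. The helper names to_bytes and bloom_split are hypothetical, and treating the suffixes as powers of 1024 is an assumption made for this illustration, not something this document states.

```sh
# Convert a B value such as 500G, 100M, 64k or 1024 to bytes.
# Assumption: suffixes are treated as powers of 1024 in this sketch.
to_bytes() {
    case $1 in
        *k) echo $(( ${1%k} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# Print the two shares of the budget: B * 8/9 for the counting Bloom
# filter and B/9 for the filter that tracks k-mers already used in contigs.
bloom_split() {
    total=$(to_bytes "$1")
    echo "counting=$(( total * 8 / 9 )) tracking=$(( total / 9 ))"
}

bloom_split 9G   # prints counting=8589934592 tracking=1073741824
```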

A good value for B depends on a number of factors, but primarily on the genome being assembled. A general guideline is:

  • P. glauca (~20Gbp): B=500G
  • H. sapiens (~3.1Gbp): B=50G
  • C. elegans (~101Mbp): B=2G

For other genome sizes, the value for B can be interpolated. There is no downside to using a larger B value than necessary, other than the memory required. To check that the selected B value is adequate, inspect the standard error log of the assembly and make sure that the FPR value reported under Counting Bloom filter stats is 5% or less. This requires verbosity level 1, set with the v=-v option.
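
One way to interpolate is sketched below. It assumes plain linear interpolation between the neighbouring guideline points and rounds up to whole gigabytes; both choices, and the helper name suggest_B, are assumptions of this example rather than an ABySS recommendation.

```sh
# Suggest a B value for a genome size given in base pairs.
# Assumption: linear interpolation between the guideline points above.
# suggest_B is a hypothetical helper, not an ABySS command.
suggest_B() {
    awk -v g="$1" 'BEGIN {
        g += 0
        # Guideline points: genome size (bp) and B (gigabytes).
        size[1] = 101e6; mem[1] = 2
        size[2] = 3.1e9; mem[2] = 50
        size[3] = 20e9;  mem[3] = 500
        n = 3
        if (g < size[1] || g > size[n]) exit 1   # outside the guideline range
        for (i = 1; i < n; i++)
            if (g <= size[i + 1]) {
                b = mem[i] + (g - size[i]) / (size[i + 1] - size[i]) * (mem[i + 1] - mem[i])
                printf "B=%dG\n", (b == int(b)) ? b : int(b) + 1   # round up
                exit 0
            }
    }'
}

suggest_B 1000000000   # a 1 Gbp genome; prints B=17G
```

The function returns a non-zero status for genome sizes outside the 101 Mbp to 20 Gbp range instead of extrapolating. Whatever value is chosen, the FPR reported in the log remains the real check.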

MPI mode (legacy)

This mode is legacy and we do not recommend running ABySS with it. To run ABySS in the MPI mode, you need to specify the np parameter, which specifies the number of processes to use for the parallel MPI job. Without any MPI configuration, this will allow you to use multiple cores on a single machine. To use multiple machines for assembly, you must create a hostfile for mpirun, which is described in the mpirun man page.

Do not run mpirun -np 8 abyss-pe. To run ABySS with 8 MPI processes, use abyss-pe np=8. The abyss-pe driver script will start the MPI processes itself, like so: mpirun -np 8 ABYSS-P.

The paired-end assembly stage is multithreaded, but must run on a single machine. The number of threads to use may be specified with the parameter j. The default value for j is the value of np.

Examples

Assemble a small synthetic data set

wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz
tar xzvf test-data.tar.gz
abyss-pe k=25 name=test B=1G \
	in='test-data/reads1.fastq test-data/reads2.fastq'

Calculate assembly contiguity statistics:

abyss-fac test-unitigs.fa test-contigs.fa test-scaffolds.fa

Assembling a paired-end library

To assemble paired reads in two files named reads1.fa and reads2.fa into contigs in a file named ecoli-contigs.fa, run the command:

abyss-pe name=ecoli k=96 B=2G in='reads1.fa reads2.fa'

The parameter in specifies the input files to read. They may be in FASTA, FASTQ, qseq, export, SRA, SAM or BAM format, may be compressed with gz, bz2 or xz, and may be tarred. The assembled contigs will be stored in ${name}-contigs.fa and the scaffolds will be stored in ${name}-scaffolds.fa.

A pair of reads must be named with the suffixes /1 and /2 to identify the first and second read, or the reads may be named identically. The paired reads may be in separate files or interleaved in a single file.
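
The naming rule can be checked before an assembly with a short script. This is a sketch that assumes four-line FASTQ records and two separate files; check_pairs is a hypothetical helper, and it only compares names record by record (it does not detect a second file that is shorter than the first).

```sh
# Check that two FASTQ files list their reads in the same order, with
# names that are identical or differ only by the /1 and /2 suffixes.
# check_pairs is a hypothetical helper, not part of ABySS.
check_pairs() {
    awk '
        # Header lines are every fourth line, starting with the first.
        FNR % 4 == 1 {
            name = substr($1, 2)        # drop the leading "@"
            sub(/\/[12]$/, "", name)    # drop a trailing /1 or /2
            if (NR == FNR)
                first[FNR] = name       # remember names from the first file
            else if (first[FNR] != name) {
                print "mismatch at line " FNR
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}

# Demonstration on two tiny made-up files.
dir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' > "$dir/reads1.fastq"
printf '@r1/2\nTTGA\n+\nIIII\n' > "$dir/reads2.fastq"
check_pairs "$dir/reads1.fastq" "$dir/reads2.fastq" && echo "read names agree"
rm -r "$dir"
```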

Reads without mates should be placed in a file specified by the parameter se (single-end). Reads without mates in the paired-end files will slow down the paired-end assembler considerably during the abyss-fixmate stage.

Assembling multiple libraries

The distribution of fragment sizes of each library is calculated empirically by aligning paired reads to the contigs produced by the single-end assembler, and the distribution is stored in a file with the extension .hist, such as ecoli-3.hist. The N50 of the single-end assembly must be well over the fragment-size to obtain an accurate empirical distribution.

Here's an example scenario of assembling a data set with two different fragment libraries and single-end reads. Note that the names of the libraries (pea and peb) are arbitrary.

  • Library pea has reads in two files, pea_1.fa and pea_2.fa.
  • Library peb has reads in two files, peb_1.fa and peb_2.fa.
  • Single-end reads are stored in two files, se1.fa and se2.fa.

The command line to assemble this example data set is:

abyss-pe k=96 B=2G name=ecoli lib='pea peb' \
	pea='pea_1.fa pea_2.fa' peb='peb_1.fa peb_2.fa' \
	se='se1.fa se2.fa'

The empirical distribution of fragment sizes will be stored in two files named pea-3.hist and peb-3.hist. These files may be plotted to check that the empirical distribution agrees with the expected distribution. The assembled contigs will be stored in ${name}-contigs.fa and the scaffolds will be stored in ${name}-scaffolds.fa.

Scaffolding

Long-distance mate-pair libraries may be used to scaffold an assembly. Specify the names of the mate-pair libraries using the parameter mp. The scaffolds will be stored in the file ${name}-scaffolds.fa. Here's an example of assembling a data set with two paired-end libraries and two mate-pair libraries. Note that the names of the libraries (pea, peb, mpc, mpd) are arbitrary.

abyss-pe k=96 B=2G name=ecoli lib='pea peb' mp='mpc mpd' \
	pea='pea_1.fa pea_2.fa' peb='peb_1.fa peb_2.fa' \
	mpc='mpc_1.fa mpc_2.fa' mpd='mpd_1.fa mpd_2.fa'

The mate-pair libraries are used only for scaffolding and do not contribute towards the consensus sequence.

Scaffolding with linked reads

ABySS can scaffold using linked reads from 10x Genomics Chromium. The barcodes must first be extracted from the read sequences and added to the BX:Z tag of the FASTQ header, typically using the longranger basic command of Long Ranger or EMA preproc. The linked reads are used to correct assembly errors, which requires Tigmint. The linked reads are also used for scaffolding, which requires ARCS. See Dependencies for installation instructions.

ABySS can combine paired-end, mate-pair, and linked-read libraries. The pe and lr libraries will be used to build the de Bruijn graph. The mp libraries will be used for paired-end/mate-pair scaffolding. The lr libraries will be used for misassembly correction using Tigmint and scaffolding using ARCS.

abyss-pe k=96 B=2G name=hsapiens \
	pe='pea' pea='lra.fastq.gz' \
	mp='mpa' mpa='lra.fastq.gz' \
	lr='lra' lra='lra.fastq.gz'

ABySS performs better with a mixture of paired-end, mate-pair, and linked reads, but it is possible to assemble only linked reads using ABySS, though this mode of operation is experimental.

abyss-pe k=96 name=hsapiens lr='lra' lra='lra.fastq.gz'

Rescaffolding with long sequences

Long sequences such as RNA-Seq contigs can be used to rescaffold an assembly. Sequences are aligned using BWA-MEM to the assembled scaffolds. Additional scaffolds are then formed between scaffolds that can be linked unambiguously when considering all BWA-MEM alignments.

Similar to scaffolding, the names of the datasets can be specified with the long parameter. These scaffolds will be stored in the file ${name}-long-scaffs.fa. The following is an example of an assembly with paired-end libraries, mate-pair libraries and an RNA-Seq assembly. Note that the names of the libraries are arbitrary.

abyss-pe k=96 B=2G name=ecoli lib='pe1 pe2' mp='mp1 mp2' long='longa' \
	pe1='pe1_1.fa pe1_2.fa' pe2='pe2_1.fa pe2_2.fa' \
	mp1='mp1_1.fa mp1_2.fa' mp2='mp2_1.fa mp2_2.fa' \
	longa='longa.fa'

Assembling using a paired de Bruijn graph

Assemblies may be performed using a paired de Bruijn graph instead of a standard de Bruijn graph. In paired de Bruijn graph mode, ABySS uses k-mer pairs in place of k-mers, where each k-mer pair consists of two equal-size k-mers separated by a fixed distance. A k-mer pair is functionally similar to a large k-mer spanning the breadth of the k-mer pair, but uses less memory because the sequence in the gap is not stored. To assemble using paired de Bruijn graph mode, specify both the individual k-mer size (K) and the k-mer pair span (k). For example, to assemble E. coli with an individual k-mer size of 16 and a k-mer pair span of 96:

abyss-pe name=ecoli K=16 k=96 in='reads1.fa reads2.fa'

In this example, the size of the intervening gap between the two k-mers of a pair is 64 bp (96 - 2*16). Note that the k parameter takes on a new meaning in paired de Bruijn graph mode: k indicates the k-mer pair span when K is set, whereas k indicates the k-mer size in standard de Bruijn graph mode (when K is not set).
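
The gap arithmetic generalizes to any pair of values. A minimal sketch, with kmer_pair_gap as a hypothetical helper name:

```sh
# Gap between the two k-mers of a pair: span k minus two k-mers of size K.
# Arguments: k (k-mer pair span), then K (individual k-mer size).
# kmer_pair_gap is a hypothetical helper, not an ABySS command.
kmer_pair_gap() {
    echo $(( $1 - 2 * $2 ))
}

kmer_pair_gap 96 16   # prints 64
```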

Assembling a strand-specific RNA-Seq library

Strand-specific RNA-Seq libraries can be assembled such that the resulting unitigs, contigs and scaffolds are oriented correctly with respect to the original transcripts that were sequenced. In order to run ABySS in strand-specific mode, the SS parameter must be used as in the following example:

abyss-pe name=SS-RNA B=2G k=96 in='reads1.fa reads2.fa' SS=--SS

The expected orientation of the read sequences with respect to the original RNA is RF, i.e. the first read in a read pair is always in reverse orientation.

Optimizing the parameters k and kc

It is standard practice when running ABySS to run multiple assemblies to find the optimal values for the k and kc parameters. k determines the k-mer size in the de Bruijn Graph, and kc is the k-mer minimum coverage multiplicity cutoff, which filters out erroneous k-mers. The range in which k should be tested depends on the read size and read coverage.

As a rough guide, for 2x150bp reads at 40x coverage the best k value is often around 70 to 90. For 2x250bp reads at 40x coverage, it might be around 110 to 140.

For kc, 2 is most often a good value, but can go as high as 4.

The following shell snippet will assemble for kc values 2 and 3, and every eighth value of k from 50 to 90. At the end, we calculate the contiguity statistics as a proxy for identifying the optimal assembly. Other metrics can be used as needed.

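# Try every combination of kc (2, 3) and k (50, 58, ..., 90).
for kc in 2 3; do
	for k in $(seq 50 8 90); do
		# Each assembly runs in its own directory, so reads.fa is one level up.
		mkdir -p k${k}-kc${kc}
		abyss-pe -C k${k}-kc${kc} name=ecoli B=2G k=$k kc=$kc in=../reads.fa
	done
done
# Compare the contiguity statistics of all assemblies.
abyss-fac k*/ecoli-scaffolds.fa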
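
When many assemblies are compared, the row with the largest N50 can be picked out automatically. The sketch below assumes that the abyss-fac table is whitespace-separated, starts with a header line containing a column named N50, and ends each row with the file name; best_n50 is a hypothetical helper. The sample rows follow that layout.

```sh
# Read an abyss-fac table on standard input and print the file with the
# largest N50, followed by that N50.
# best_n50 is a hypothetical helper, not an ABySS command.
best_n50() {
    awk '
        # Locate the N50 column by name in the header line.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "N50") col = i; next }
        # Keep the row with the largest value in that column.
        col && $col + 0 > best { best = $col + 0; name = $NF }
        END { if (name != "") print name, best }
    '
}

best_n50 <<'EOF'
n n:500 n:N50 min N80 N50 N20 max sum
817 108 38 509 642 905 1323 2260 93364 test-unitigs.fa
497 80 19 500 1418 2969 5099 10702 173468 test-contigs.fa
421 26 6 578 6166 10350 20266 27989 180109 test-scaffolds.fa
EOF
# prints: test-scaffolds.fa 10350
```

With real output, the table would be piped in directly, for example abyss-fac k*/ecoli-scaffolds.fa | best_n50. N50 alone does not measure correctness, so this is only a first filter.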

The default maximum value for k is 192. This limit may be changed at compile time using the --enable-maxk option of configure. It may be decreased to 32 to decrease memory usage or increased to larger values.

Running ABySS on a cluster

ABySS integrates well with cluster job schedulers, such as:

  • Sun Grid Engine (SGE)
  • Portable Batch System (PBS)
  • Load Sharing Facility (LSF)
  • IBM LoadLeveler

For example, to submit an array of jobs to assemble every eighth value of k between 50 and 90 using 64 processes for each job:

qsub -N ecoli -pe openmpi 64 -t 50-90:8 \
	<<<'mkdir k$SGE_TASK_ID && abyss-pe -C k$SGE_TASK_ID in=/data/reads.fa'

Using the DIDA alignment framework

ABySS supports the use of DIDA (Distributed Indexing Dispatched Alignment), an MPI-based framework for computing sequence alignments in parallel across multiple machines. The DIDA software must be separately downloaded and installed from http://www.bcgsc.ca/platform/bioinfo/software/dida. In comparison to the standard ABySS alignment stages which are constrained to a single machine, DIDA offers improved performance and the ability to scale to larger targets. Please see the DIDA section of the abyss-pe man page (in the doc subdirectory) for details on usage.

Assembly Parameters

Parameters of the driver script, abyss-pe

  • a: maximum number of branches of a bubble [2]
  • b: maximum length of a bubble (bp) [""]
  • B: Bloom filter size (e.g. "100M")
  • c: minimum mean k-mer coverage of a unitig [sqrt(median)]
  • d: allowable error of a distance estimate (bp) [6]
  • e: minimum erosion k-mer coverage [round(sqrt(median))]
  • E: minimum erosion k-mer coverage per strand [1 if sqrt(median) > 2 else 0]
  • G: genome size, used to calculate NG50
  • H: number of Bloom filter hash functions [4]
  • j: number of threads [2]
  • k: size of k-mer (when K is not set) or the span of a k-mer pair (when K is set)
  • kc: minimum k-mer count threshold for Bloom filter assembly [2]
  • K: the length of a single k-mer in a k-mer pair (bp)
  • l: minimum alignment length of a read (bp) [40]
  • m: minimum overlap of two unitigs (bp) [0 (interpreted as k - 1) if mp is provided or if k<=50, otherwise 50]
  • n: minimum number of pairs required for building contigs [10]
  • N: minimum number of pairs required for building scaffolds [15-20]
  • np: number of MPI processes [1]
  • p: minimum sequence identity of a bubble [0.9]
  • q: minimum base quality [3]
  • s: minimum unitig size required for building contigs (bp) [1000]
  • S: minimum contig size required for building scaffolds (bp) [100-5000]
  • t: maximum length of blunt contigs to trim [k]
  • v: use v=-v for verbose logging, v=-vv for extra verbose
  • x: spaced seed (Bloom filter assembly only)
  • lr_s: minimum contig size required for building scaffolds with linked reads (bp) [S]
  • lr_n: minimum number of barcodes required for building scaffolds with linked reads [10]
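
The default of m is the least obvious entry in this list, so here is a sketch of the rule as the list words it. The helper effective_m is hypothetical and only restates the description of m above; it is not taken from the abyss-pe source.

```sh
# Effective minimum unitig overlap when m is left at its default.
# Arguments: k, then "yes" if mate-pair libraries (mp) are provided.
# effective_m is a hypothetical helper, not an ABySS command.
effective_m() {
    ksize=$1
    has_mp=${2:-no}
    if [ "$has_mp" = yes ] || [ "$ksize" -le 50 ]; then
        echo $(( ksize - 1 ))   # default m=0 is interpreted as k - 1
    else
        echo 50
    fi
}

effective_m 96       # prints 50
effective_m 32       # prints 31
effective_m 96 yes   # prints 95
```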

Environment variables

abyss-pe configuration variables may be set on the command line or from the environment, for example with export k=96. As a result, abyss-pe can pick up variables from your environment that you did not intend it to use, which can cause trouble. To troubleshoot that situation, use the abyss-pe env command to print the values of all the abyss-pe configuration variables:

abyss-pe env [options]
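
A quick check that does not need abyss-pe is to look for exported variables whose names match the parameters listed under Assembly Parameters. This is a sketch: abyss_env_clashes is a hypothetical helper, and its name list covers only the parameters named in this document, so abyss-pe env remains the authoritative check.

```sh
# List environment variables whose names collide with abyss-pe parameters.
# abyss_env_clashes is a hypothetical helper, not an ABySS command.
abyss_env_clashes() {
    for name in a b B c d e E G H j k kc K l m n N np p q s S t v x lr_s lr_n; do
        # printenv succeeds only for variables that are actually exported.
        if printenv "$name" > /dev/null; then
            printf '%s=%s\n' "$name" "$(printenv "$name")"
        fi
    done
}

abyss_env_clashes
```

An empty result means none of these names is exported in the current shell.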

ABySS programs

abyss-pe is a driver script implemented as a Makefile. Any option of make may be used with abyss-pe. Particularly useful options are:

  • -C dir, --directory=dir Change to the directory dir and store the results there.
  • -n, --dry-run Print the commands that would be executed, but do not execute them.

abyss-pe uses the following programs, which must be found in your PATH:

  • ABYSS: de Bruijn graph assembler
  • ABYSS-P: parallel (MPI) de Bruijn graph assembler
  • AdjList: find overlapping sequences
  • DistanceEst: estimate the distance between sequences
  • MergeContigs: merge sequences
  • MergePaths: merge overlapping paths
  • Overlap: find overlapping sequences using paired-end reads
  • PathConsensus: find a consensus sequence of ambiguous paths
  • PathOverlap: find overlapping paths
  • PopBubbles: remove bubbles from the sequence overlap graph
  • SimpleGraph: find paths through the overlap graph
  • abyss-fac: calculate assembly contiguity statistics
  • abyss-filtergraph: remove shim contigs from the overlap graph
  • abyss-fixmate: fill the paired-end fields of SAM alignments
  • abyss-map: map reads to a reference sequence
  • abyss-scaffold: scaffold contigs using distance estimates
  • abyss-todot: convert graph formats and merge graphs
  • abyss-rresolver: resolve repeats using short reads
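
Whether all of these programs are reachable can be checked with the POSIX command -v builtin. A minimal sketch, with missing_tools as a hypothetical helper name:

```sh
# Print the name of every given program that is not found in PATH.
# missing_tools is a hypothetical helper, not an ABySS command.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

missing_tools ABYSS ABYSS-P AdjList DistanceEst MergeContigs MergePaths \
    Overlap PathConsensus PathOverlap PopBubbles SimpleGraph abyss-fac \
    abyss-filtergraph abyss-fixmate abyss-map abyss-scaffold abyss-todot \
    abyss-rresolver
```

No output means every program was found. ABYSS-P is only built when MPI support is available, so its absence matters only for the MPI mode.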

This flowchart shows the ABySS assembly pipeline and its intermediate files.

Export to SQLite Database

ABySS has built-in support for exporting log values to a SQLite database file and/or .csv files at runtime.

Database parameters

Of abyss-pe:

  • db: path to SQLite repository file [$(name).sqlite]
  • species: name of species to archive [ ]
  • strain: name of strain to archive [ ]
  • library: name of library to archive [ ]

For example, to export data of species 'Ecoli', strain 'O121' and library 'pea' into your SQLite database repository named '/abyss/test.sqlite':

abyss-pe db=/abyss/test.sqlite species=Ecoli strain=O121 library=pea [other options]

Helper programs

The following helper programs are found in your PATH:

  • abyss-db-txt: create a flat file showing entire repository at a glance
  • abyss-db-csv: create .csv table(s) from the repository

Usage:

abyss-db-txt /your/repository
abyss-db-csv /your/repository program(s)

For example,

abyss-db-txt repo.sqlite
abyss-db-csv repo.sqlite DistanceEst
abyss-db-csv repo.sqlite DistanceEst abyss-scaffold
abyss-db-csv repo.sqlite --all

Citation

Shaun D Jackman, Benjamin P Vandervalk, Hamid Mohamadi, Justin Chu, Sarah Yeo, S Austin Hammond, Golnaz Jahesh, Hamza Khan, Lauren Coombe, René L Warren, and Inanc Birol (2017). ABySS 2.0: Resource-efficient assembly of large genomes using a Bloom filter. Genome research, 27(5), 768-777. doi:10.1101/gr.214346.116

Simpson, Jared T., Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven JM Jones, and Inanc Birol (2009). ABySS: a parallel assembler for short read sequence data. Genome research, 19(6), 1117-1123. doi:10.1101/gr.089532.108

Related Publications

Vladimir Nikolić, Amirhossein Afshinfard, Justin Chu, Johnathan Wong, Lauren Coombe, Ka Ming Nip, René L. Warren & Inanç Birol (2022). RResolver: efficient short-read repeat resolution within ABySS. BMC Bioinformatics 23, Article number: 246 (2022). doi:10.1186/s12859-022-04790-z

Robertson, Gordon, Jacqueline Schein, Readman Chiu, Richard Corbett, Matthew Field, Shaun D. Jackman, Karen Mungall, et al (2010). De novo assembly and analysis of RNA-seq data. Nature methods, 7(11), 909-912. doi:10.1038/nmeth.1517

Nielsen, Cydney B., Shaun D. Jackman, Inanc Birol, and Steven JM Jones (2009). ABySS-Explorer: visualizing genome sequence assemblies. IEEE Transactions on Visualization and Computer Graphics, 15(6), 881-888. doi:10.1109/TVCG.2009.116

Support

Create a new issue on GitHub.

Ask a question on Biostars.

Subscribe to the ABySS mailing list, [email protected].

For questions related to transcriptome assembly, contact the Trans-ABySS mailing list, [email protected].

Authors

Supervised by Dr. Inanc Birol.

Copyright 2016-present Canada's Michael Smith Genome Sciences Centre

Contributors

aafshinfard, benvvalk, eernst, eli-schwartz, ibirol, jts, justinchu, jwcodee, kkapuria3, kraghavan, kubu4, lcoombe, mbargull, mmokrejs, mohamadi, mr-c, nsoranzo, parham-k, pkubaj, sauparna, schutzekatze, seekingmeaning, sjackman, smattr, traymond, tskir, vlad0x00, warrenlr, wookietreiber, zkamvar

abyss's Issues

Stats is different when ABYSS-P is used

Here are the Statistics on the suggested test data set downloaded from the bcgsc site:

Non-MPI result is below:

n    n:500  n:N50  min  N80   N50    N20    max    sum
817  108    38     509  642   905    1323   2260   93364   test-unitigs.fa
497  80     19     500  1418  2969   5099   10702  173468  test-contigs.fa
421  26     6      578  6166  10350  20266  27989  180109  test-scaffolds.fa

4-node parallelism gives slightly different stats:

n    n:500  n:N50  min  N80   N50    N20    max    sum
817  108    38     509  642   905    1323   2260   93364   test-unitigs.fa
499  80     19     500  1407  2969   5099   10702  173017  test-contigs.fa
421  25     5      578  6166  10392  27989  30442  179959  test-scaffolds.fa

(2- and 8-node parallelism gave the same stats)

V1.3.5 and Apps are built on the x86 with the recommended packages.

abyss-scaffold: Assertion `v1 != v2' failed

From: Nathaniel Street
Date: 2012-03-15 15:33:30 PDT

Hi

My assembly has crashed during the MP scaffolding step with the error

abyss-scaffold: scaffold.cc:227: void resolveForks(Graph&, const Graph&): Assertion `v1 != v2' failed.
/bin/bash: line 1: 30823 Aborted abyss-scaffold -v -k47 -s200-5000 -n3 -g abyss008-6.path1.dot abyss008-6.dot MP1-6.dist.dot MP2-6.dist.dot > abyss008-6.path1
make: *** [abyss008-6.path1] Error 134
make: *** Deleting file `abyss008-6.path1'

I have pasted the end of stdout below. I am using v1.3.3 and my assembly command was

abyss-pe k=47 aligner=map np=12 n=10 N=3 s=500 b=10000 p=0.8 S=200-5000 v=-v SIMPLEGRAPH_OPTIONS=--dist-error=30 POPBUBBLES_OPTIONS=--scaffold FAC_OPTIONS=--mmd lib='PE1 PE2 PE3' mp='MP1 MP2' MP1_l=24 MP2_l=24 name=$outName \
	se="$reads/asp201_g_ipe_150.i.QCa.RSa.unique.adapterTrimmed.joined.fq $reads/asp201_g_ipe_150-1.i.QCa.RSa.unique.adapterTrimmed.joined.fq $reads/asp201_g_ipe_150.se.QCa.RSa.fq $reads/asp201_g_ipe_150-1.se.QCa.RSa.fq $reads/asp201_g_ipe_300.se.QCa.RSa.fq $reads/asp201_g_ipe_650.se.QCa.RSa.fq" \
	PE1="$reads/asp201_g_ipe_150-1.i.QCa.RSa.unique.adapterTrimmed.notjoined.fq $reads/asp201_g_ipe_150.i.QCa.RSa.unique.adapterTrimmed.notjoined.fq" \
	PE2="$reads/asp201_g_ipe_300.i.QCa.RSa.unique.fq" PE3="$reads/asp201_g_ipe_650.i.QCa.RSa.unique.fq" \
	MP1="$readsMP_SciLife/asp201_g_imp_3Kb_1.i.QCa.RSa.unique.linkerRemoved.trim.rev.fq" \
	MP2="$readsMP_BGI/120224_I235_FCD0PP2ACXX_L8_ASPvaaDADDTAAPEI-93.rev.unique.fq" \
	> $outName.log 2>&1

The MP reads for both MP libraries were reverse complemented, so they should align as FR.

Can anyone point me to the likely source of the problem? Let me know what more information you might need.

Thanks

Nate

Mapped 49051008 of 54963174 reads (89.2%)
Mapped 43643338 of 54963174 reads uniquely (79.4%)
Read 54963174 alignments
Mateless 0
Unaligned 456181 1.66%
Singleton 4999804 18.2%
FR 699611 2.55%
RF 1429321 5.2%
FF 12695 0.0462%
Different 19883975 72.4%
Total 27481587
FR Stats mean: 7842 median: 10338 sd: 4395 n: 705363 min: 0 max: 14368
▁▂▃▄▃▃▂▁_
RF Stats mean: 144.8 median: 137 sd: 120.6 n: 1428099 min: 0 max: 3065
█▇▆▅▇▅▃▁__
Mate orientation FR: 699611 (32.9%) RF: 1429321 (67.1%)
The library MP2-6.hist is oriented reverse-forward (RF).
Stats mean: 144.8 median: 137 sd: 120.6 n: 1428099 min: 0 max: 3065
█▇▆▅▇▅▃▁__
Minimum and maximum distance are set to -46 and 3065 bp.
abyss-scaffold -v -k47 -s200-5000 -n3 -g abyss008-6.path1.dot abyss008-6.dot MP1-6.dist.dot MP2-6.dist.dot >abyss008-6.path1
Reading `abyss008-6.dot'...
V=5261822 E=8260506 E/V=1.57
Degree: _█▂_
01234
0: 6.4% 1: 57% 2-4: 33% 5+: 2.8% max: 79
Reading `MP1-6.dist.dot'...
V=5261822 E=8266922 E/V=1.57
Degree: █▂
01234
0: 6.3% 1: 57% 2-4: 33% 5+: 2.8% max: 79
Reading `MP2-6.dist.dot'...
V=5261822 E=8271661 E/V=1.57
Degree: _█▂_
01234
0: 6.3% 1: 57% 2-4: 33% 5+: 2.8% max: 94
Added 3393 complementary edges.
Removed 4970902 vertices.
Removed 139887 edges.
V=290920 E=13612 E/V=0.0468
Degree: █
01234
0: 96% 1: 3.7% 2-4: 0.11% 5+: 0.068% max: 24
Removed 272 cyclic edges.
V=290920 E=13334 E/V=0.0458
Degree: █
01234
0: 96% 1: 3.7% 2-4: 0.12% 5+: 0.061% max: 22
abyss-scaffold: scaffold.cc:227: void resolveForks(Graph&, const Graph&): Assertion `v1 != v2' failed.
/bin/bash: line 1: 30823 Aborted abyss-scaffold -v -k47 -s200-5000 -n3 -g abyss008-6.path1.dot abyss008-6.dot MP1-6.dist.dot MP2-6.dist.dot > abyss008-6.path1
make: *** [abyss008-6.path1] Error 134
make: *** Deleting file `abyss008-6.path1'

DistanceEst missing options

Hello,
I have version abyss-pe (ABySS) 1.5.1 compiled with maxk=128
I ran abyss-pe --dry-run and looked at the commands. The first two DistanceEst commands have values for -l -s -n and -o

 DistanceEst  -j8 -k111 -l111 -s200 -n5   -o lib1-3.dist lib1-3.hist

 DistanceEst --dot  -j8 -k111 -l111 -s200 -n5   -o mp1-6.dist.dot mp1-6.hist

But the third DistanceEst command is lacking values for -l, -s, -n and -o

 DistanceEst --dot  -j8 -k111 -l -s -n   -o mp2lib1=sample1_pairs.fastq-6.dist.dot mp2lib1=sample1_pairs.fastq-6.hist

Also when using a kmer of 111, the -s value is set to 200 which gives the warning:

 warning: the seed-length should be at least twice k: k=111, s=200

Is this the expected behavior?
Thanks

Processing many files

@sjackman @traymond

Please review this patch for inclusion in 1.3.7. It works around the problem that awk can't handle many files at once.
diff --git a/Parallel/parallelAbyss.cpp b/Parallel/parallelAbyss.cpp
index dbdaced..ee05883 100644
--- a/Parallel/parallelAbyss.cpp
+++ b/Parallel/parallelAbyss.cpp
@@ -39,10 +39,10 @@ static void concatenateFiles(const string& dest,
 {
 	cout << "Concatenating to " << dest << endl;
 	ostringstream s;
-	s << command;
+	s << "cat";
 	for (int i = 0; i < opt::numProc; i++)
 		s << ' ' << prefix << i << suffix;
-	s << " >'" << dest << '\'';
+	s << " | " << command << " > " << dest;
 	systemx(s.str());
 
 	bool die = false;
@@ -50,7 +50,7 @@ static void concatenateFiles(const string& dest,
 		s.str("");
 		s << prefix << i << suffix;
 		const char* path = s.str().c_str();
-		if (unlink(path) == -1) {
+		if (unlink(s.str().c_str()) == -1) {
 			cerr << "error: removing `" << path << "': "
 				<< strerror(errno) << endl;
 			die = true;

warning: converting to size_t from double

g++ -DHAVE_CONFIG_H -I. -I.. -I.. -I../Common -I../DataLayer -I. -I/usr/local/openmpi/include -I/usr/local/sparsehash-1.10/include -Wall -Wextra -Werror -g -O2 -MT libassembly_a-AssemblyAlgorithms.o -MD -MP -MF .deps/libassembly_a-AssemblyAlgorithms.Tpo -c -o libassembly_a-AssemblyAlgorithms.o `test -f 'AssemblyAlgorithms.cpp' || echo './'`AssemblyAlgorithms.cpp
cc1plus: warnings being treated as errors
../Common/Histogram.h: In member function ‘int Histogram::percentile(float) const’:
../Common/Histogram.h:135: warning: converting to ‘size_t’ from ‘double’
../Common/Histogram.h: In member function ‘int Histogram::weightedPercentile(float) const’:
../Common/Histogram.h:155: warning: converting to ‘long long unsigned int’ from ‘double’

Alignments are written to *.sam.gz instead of being piped to DistanceEst

Sometimes alignments are written to *.sam.gz instead of being piped to DistanceEst although *.sam.gz was not specified as an endpoint in abyss-pe.

Reads-to-unitig alignments are written to *-3.sam.gz with this command:
$ abyss-pe [options] contigs
An endpoint is specified.

Reads-to-unitig alignments are piped to DistanceEst with one of these commands:
$ abyss-pe [options] unitigs contigs
An endpoint and its prerequisite is specified.

$ abyss-pe [options] mp=''
No endpoint is specified.

Scaffolding may have a similar issue.

Multi-line CASAVA-formatted FASTQ files are broken

Test case:

    >HISEQ2:311:D0PTKACXX:4:1101:1189:2054 1:Y:0:
    NTGCNTTNNNNNNNNNNNNNNNNNNNNNAAGGAATATCTTCAATTTAGAATGCCAATGCT
    TCCATAATGATGTAGAAGAGCTTAATCATGTGTTTTGAATG

    error: Expected either `>' or `@' or 11 fields
    and saw `T' and 1 fields near
    TCCATAATGATGTAGAAGAGCTTAATCATGTGTTTTGAATG

Abyss 1.3.4 and GCC 4.1 DistanceEst.cpp:141: warning: converting to ‘int’ from ‘double’

When I run 'make' on ABySS with GCC 4.1, I get the following error.

make[2]: Entering directory `/home/velvet/abyss-1.3.4/DistanceEst'
g++ -DHAVE_CONFIG_H -I. -I.. -I.. -I../Common -I. -Wall -Wextra -Werror -fopenmp -g -O2 -MT DistanceEst-DistanceEst.o -MD -MP -MF .deps/DistanceEst-DistanceEst.Tpo -c -o DistanceEst-DistanceEst.o `test -f 'DistanceEst.cpp' || echo './'`DistanceEst.cpp
cc1plus: warnings being treated as errors
DistanceEst.cpp: In function ‘int estimateDistanceUsingMean(const std::vector<int, std::allocator<int> >&, const PMF&, unsigned int&)’:
DistanceEst.cpp:141: warning: converting to ‘int’ from ‘double’
make[2]: *** [DistanceEst-DistanceEst.o] Error 1
make[2]: Leaving directory `/home/velvet/abyss-1.3.4/DistanceEst'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/velvet/abyss-1.3.4'
make: *** [all] Error 2

Then I ran
make AM_CXXFLAGS=-Wall
and everything worked fine.

Running ABySS on aarch64 machine

Version 1.5.2 is built without a problem on an APM's X-Gene server. Programs including ABYSS-P in the package all ran fine and assembly statistics are consistent with results obtained on x86.

Does not compile with GCC 4.7

../../FMIndex/FMIndex.h: In function 'std::istream& operator>>(std::istream&, FMIndex&)':
../../FMIndex/FMIndex.h:540:7: error: variable 'c' set but not used [-Werror=unused-but-set-variable]

and

I am trying to compile Abyss 1.3.3 on a Linux machine with GCC 4.7.0
and Boost 1.49.0 and am running into the following issues:

abyss-fixmate.cc:130:8: error: ‘intptr_t’ does not name a type
abyss-fixmate.cc: In function ‘void printProgress(const Alignments&)’:
abyss-fixmate.cc:143:38: error: expected type-specifier before ‘intptr_t’
abyss-fixmate.cc:143:38: error: expected ‘>’ before ‘intptr_t’
abyss-fixmate.cc:143:38: error: expected ‘(’ before ‘intptr_t’
abyss-fixmate.cc:143:38: error: ‘intptr_t’ was not declared in this scope
abyss-fixmate.cc:143:54: error: ‘sbrk’ was not declared in this scope
abyss-fixmate.cc:143:59: error: ‘sbrk0’ was not declared in this scope
abyss-fixmate.cc:143:64: error: expected ‘)’ before ‘;’ token
make[2]: *** [abyss_fixmate-abyss-fixmate.o] Error 1
make[2]: Leaving directory `/n/home01/jhess/Software/abyss-1.3.3_n/abyss-1.3.3/ParseAligns'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/n/home01/jhess/Software/abyss-1.3.3_n/abyss-1.3.3'
make: *** [all] Error 2

I managed to alleviate some of those by including an additional header in
abyss-fixmate.cc and adding -std=gnu++11 to the compilation command, but I am
still seeing the following errors, which I am at a loss about:

abyss-fixmate.cc:130:58: error: ‘sbrk’ was not declared in this scope
abyss-fixmate.cc: In function ‘void printProgress(const Alignments&)’:
abyss-fixmate.cc:143:54: error: ‘sbrk’ was not declared in this scope

Fails to compile on OS X 10.8 Mountain Lion

ABySS 1.3.4 fails to compile on OS X 10.8 Mountain Lion. There's some issue with stl_relops.h.

/usr/bin/clang++ -DHAVE_CONFIG_H -I. -I..  -I.. -I../Common -I.   -isystem /usr/local/include -Wall -Wextra -Werror -Os -w -pipe -march=native -Qunused-arguments -mmacosx-version-min=10.8 -c -o abyss_drawgraph-drawgraph.o `test -f 'drawgraph.cc' || echo './'`drawgraph.cc
In file included from junction.cc:5:
../Graph/ContigGraphAlgorithms.h:332:19: error: call to function 'operator!=' that is neither visible in the template definition nor found by argument-dependent lookup
                } else if (g[e] != g[f]) {
                                ^
junction.cc:192:21: note: in instantiation of function template specialization 'addComplementaryEdges<DirectedGraph<NoProperty, NoProperty> >' requested here
                size_t numAdded = addComplementaryEdges(scaffoldG);
                                  ^
/usr/include/c++/4.2.1/bits/stl_relops.h:92:7: note: 'operator!=' should be declared prior to the call site or in the global namespace
      operator!=(const _Tp& __x, const _Tp& __y)
      ^
1 error generated.
make[1]: *** [abyss_junction-junction.o] Error 1
make[1]: *** Waiting for unfinished jobs....
In file included from scaffold.cc:9:
In file included from ../Graph/Assemble.h:6:
In file included from ../Graph/DepthFirstSearch.h:4:
../Graph/ContigGraphAlgorithms.h:332:19: error: call to function 'operator!=' that is neither visible in the template definition nor found by argument-dependent lookup
                } else if (g[e] != g[f]) {
                                ^
scaffold.cc:702:20: note: in instantiation of function template specialization 'addComplementaryEdges<DirectedGraph<Length, DistanceEst> >' requested here
        size_t numAdded = addComplementaryEdges(g);
                          ^
/usr/include/c++/4.2.1/bits/stl_relops.h:92:7: note: 'operator!=' should be declared prior to the call site or in the global namespace
      operator!=(const _Tp& __x, const _Tp& __y)
      ^
1 error generated.
make[1]: *** [abyss_scaffold-scaffold.o] Error 1
In file included from drawgraph.cc:13:
../Graph/ContigGraphAlgorithms.h:332:19: error: call to function 'operator!=' that is neither visible in the template definition nor found by argument-dependent lookup
                } else if (g[e] != g[f]) {
                                ^
drawgraph.cc:177:2: note: in instantiation of function template specialization 'addComplementaryEdges<DirectedGraph<Length, DistanceEst> >' requested here
        addComplementaryEdges(g);
        ^
/usr/include/c++/4.2.1/bits/stl_relops.h:92:7: note: 'operator!=' should be declared prior to the call site or in the global namespace
      operator!=(const _Tp& __x, const _Tp& __y)
      ^
1 error generated.
make[1]: *** [abyss_drawgraph-drawgraph.o] Error 1
make: *** [install-recursive] Error 1
Error: #<BuildError: Failed executing: make install >

error: use of overloaded operator '!=' is ambiguous

I believe this issue is caused by a compiler bug in clang.

error: use of overloaded operator '!=' is ambiguous
(with operand types 'Matches::reverse_iterator'
(aka 'reverse_iterator<iterator>') and 'reverse_iterator'
(aka 'reverse_iterator<iterator>'))
it != matches.rend(); ++it)

automake warning: the 'subdir-objects' automake option hasn't been enabled

Unittest/Makefile.am:27: warning: source file '$(top_srcdir)/connectpairs/test/BloomFilter.cc' is in a subdirectory,
Unittest/Makefile.am:27: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled.  For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory.  However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.

AdjList produces incorrect adjacency info

On the SPARC machine, AdjList produces an incorrect -1.adj file that is not usable in the downstream analyses. I compared it against the -1.adj file produced on x86: about 0.3% of the u->v records differ.

sam2afg: Skip secondary alignments

Dr. James Abbott wrote...

Hello,

I see Shaun Jackman's (abyss-)sam2afg script is available within the
Amos source repository. I find this very useful for coercing output from
assemblers which don't track read location into amos by remapping
reads against the assembly and converting to afg - not ideal but better
than nothing when you don't have the 'real' read locations to work with.

This fails, however, when reads are aligned with 'bwa mem', which can
output multi-part alignments. sam2afg checks for reuse of the same read
id (presumably to prevent the generation of non-unique eid values),
consequently encountering multiple alignments for a read causes it to die.

The following one-line patch allows sam2afg to skip these secondary
alignments present in 'bwa mem' output, provided bwa mem has been run
with the '-M' argument which sets the SAM 'secondary alignment' flag on
the alignments in question.

Hopefully this will also be of use to others...

Best Regards,
James

*** /usr/biosoft/packages/abyss/current/bin/abyss-samtoafg 2012-12-06 11:55:17.468551266 +0000
--- ../bin/abyss-samtoafg       2013-09-10 11:45:31.267125932 +0100
***************
*** 105,110 ****
--- 105,111 ----
        die unless defined $qqual;

        $tstart--; # convert to zero-based coordinate
+       next if $flag & 0x100; # secondary alignment
        $qid .= "/1" if $flag & 0x40; #FREAD1
        $qid .= "/2" if $flag & 0x80; #FREAD2

Dr. James Abbott
Lead Bioinformatician
Bioinformatics Support Service
Imperial College, London
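
The flag test in the patch above can be sketched outside of Perl as follows. This is a minimal illustration in Python, not part of abyss-samtoafg: the function name `primary_records` and the sample records are hypothetical. It relies only on the SAM convention that bit 0x100 of the FLAG field marks a secondary alignment.

```python
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment


def primary_records(sam_lines):
    """Yield SAM lines whose FLAG does not have the secondary bit set.

    Header lines (starting with '@') are passed through unchanged.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & SECONDARY:
            continue  # skip the secondary alignment, as the patch does
        yield line


# Hypothetical records: read r1 has a primary (flag 0) and a secondary (flag 256) alignment.
example = [
    "@HD\tVN:1.4",
    "r1\t0\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\tcontig2\t40\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
kept = list(primary_records(example))
```

With this filter each read id reaches the converter at most once per mate, which is the condition the duplicate-id check depends on.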

abyss-map crashes when a FASTA header matches the query better than the sequence

For instance:

$ pwd
/projects/trans_scratch/validations/abyss_map_crash
$ cat test.fq
@FOO
NNNCCCNNN
+
#########
$ cat good.fa
>AAA
CCC
$ cat bad.fa
>CCC
AAA
$ ./abyss-1.5.2/Map/abyss-map ./test.fq bad.fa > /dev/null
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
abyss-map: ../DataLayer/FastaIndex.h:101: boost::tuples::tuple<const FAIRecord&, long unsigned int, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> FastaIndex::operator[](size_t) const: Assertion `it != m_data.begin()' failed.
Aborted
$ ./abyss-1.5.2/Map/abyss-map ./test.fq good.fa > /dev/null
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
# All good

That's pretty inconvenient for us when, for instance, we're using the header to track which sequence variant we're looking for in the query.

As a quick workaround, it would be nice if the crash message identified the problematic sequence so I can change the name.

Cheers,
Alex

errors when running abyss-pe

I ran abyss-pe using the following command:
abyss-pe k=25 name=test in='test-data/reads1.fastq test-data/reads2.fastq'
but it failed with an error. The terminal output is:
ABYSS -k25 -q3 --coverage-hist=coverage.hist -s test-bubbles.fa -o test-1.fa test-data/reads1.fastq test-data/reads2.fastq
ABySS 1.2.5
ABYSS -k25 -q3 --coverage-hist=coverage.hist -s test-bubbles.fa -o test-1.fa test-data/reads1.fastq test-data/reads2.fastq
Reading `test-data/reads1.fastq'
Reading `test-data/reads2.fastq'
Loaded 938113 k-mer
Minimum k-mer coverage is 22
Using a coverage threshold of 2...
The median k-mer coverage is 5
The reconstruction is 227624
The k-mer coverage threshold is 2.23607
Setting parameter e (erode) to 2
Setting parameter E (erodeStrand) to 1
Setting parameter c (coverage) to 2.23607
Generating adjacency
Generated 1897999 edges
Eroding tips
Eroded 499695 tips
Eroded 0 tips
Trimming short branches: 1
Trimmed 11 k-mer in 11 branches
Trimming short branches: 2
Trimmed 16 k-mer in 8 branches
Trimming short branches: 4
Trimmed 73 k-mer in 21 branches
Trimming short branches: 8
Trimmed 140 k-mer in 23 branches
Trimming short branches: 16
Trimmed 278 k-mer in 24 branches
Trimming short branches: 25
Trimmed 93 k-mer in 5 branches
Trimming short branches: 25
Trimmed 92 branches in 6 rounds
Marked 35786 edges of 17755 ambiguous vertices.
Removing low-coverage contigs (mean k-mer coverage < 2.23607)
Found 437801 k-mer in 25227 contigs before removing low-coverage contigs
Removed 231837 k-mer in 9467 low-coverage contigs
Split 18897 ambiguous branches
Eroding tips
Eroded 3118 tips
Eroded 0 tips
Trimming short branches: 1
Trimmed 9 k-mer in 9 branches
Trimming short branches: 2
Trimmed 20 k-mer in 17 branches
Trimming short branches: 4
Trimmed 86 k-mer in 42 branches
Trimming short branches: 8
Trimmed 245 k-mer in 58 branches
Trimming short branches: 16
Trimmed 737 k-mer in 80 branches
Trimming short branches: 25
Trimmed 768 k-mer in 53 branches
Trimming short branches: 25
Trimmed 259 branches in 6 rounds
Popping bubbles
Removed 81 bubbles
Removed 81 bubbles
Marked 525 edges of 258 ambiguous vertices.
6 unassembled k-mer in circular contigs
Assembled 198539 k-mer in 907 contigs
AdjList -k25 -m50 test-1.fa >test-1.adj
AdjList: invalid option -- m
AdjList: invalid option -- 5
AdjList: invalid option -- 0
Try `AdjList --help' for more information.
make: *** [test-1.adj] error 1
make: *** deleting the file “test-1.adj”
In this case, what should I do? Why is there another parameter m?

Thanks!

abyss-map cannot handle multi-line fasta

I'm trying to use abyss-map with a multi-line FASTA file (sequences are broken across multiple lines). Unfortunately, FastaIndex assumes that each sequence is on a single line, so the assertion on line 83 trips. Would it be possible to support multi-line FASTA?
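
Until multi-line FASTA is supported, one possible workaround is to unwrap the file before passing it to abyss-map. The following is a minimal sketch in Python, not part of ABySS; the function name `unwrap_fasta` and the sample records are hypothetical.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs, joining sequences split across lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical input: seq1 is wrapped over two lines.
wrapped = [">seq1", "ACGT", "TTGA", ">seq2", "GG"]
single_line = [text for record in unwrap_fasta(wrapped) for text in record]
```

Writing `single_line` out one entry per line gives a FASTA file in which every sequence occupies exactly one line.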

Assertion `it != m_edges.end()' failed.

Removing shim contigs from the graph...
abyss-filtergraph: ../Graph/DirectedGraph.h:301: const EdgeProp &DirectedGraph<VertexProp, EdgeProp>::Vertex::operator const [with VertexProp = ContigProperties, EdgeProp = Distance]: Assertion `it != m_edges.end()' failed.
/bin/bash: line 1: 7160 Aborted abyss-filtergraph -v -k65 -g celegans-small-2.adj celegans-small-1.adj > celegans-small-1.path
make: *** [celegans-small-2.adj] Error 134

I've been trying to process the C. elegans reads SRR065390 from SRA with a fixed k-mer size of 65. If I bring the -1.fa file to the x86 machine, it does not crash at the point above. I am looking at the read_adj() method and related code.

DistanceEst missing options

Hello,
I have version abyss-pe (ABySS) 1.5.1 compiled with maxk=128
I ran abyss-pe --dry-run to look at the commands.

 abyss-pe --dry-run j=8 np=96 k=111 n=5 name=test_build lib='lib1' mp='mp1 mp2'\
 lib1="pe_lib.fastq" \
 mp1="mp1_a.fastq.gz mp1_b.fastq.gz" \
 mp2="mp2_a.fastq.gz mp2_b.fastq.gz" \
 se="se_lib.fastq se2_lib.fastq.gz"

The first two DistanceEst commands have values for -l -s -n and -o

 DistanceEst  -j8 -k111 -l111 -s200 -n5   -o lib1-3.dist lib1-3.hist

 DistanceEst --dot  -j8 -k111 -l111 -s200 -n5   -o mp1-6.dist.dot mp1-6.hist

But the third DistanceEst command is lacking values for -l -s -n and -o

 DistanceEst --dot  -j8 -k111 -l -s -n   -o mp2lib1=sample1_pairs.fastq-6.dist.dot mp2lib1=sample1_pairs.fastq-6.hist

Also when using a kmer of 111, the -s value is set to 200 which gives the warning:

 warning: the seed-length should be at least twice k: k=111, s=200

Is this the expected behavior?
Thanks
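
The rule quoted in the warning can be checked by hand. The sketch below is in Python and only restates the inequality from the warning text (s should be at least 2*k); the function names are hypothetical and this is not DistanceEst code.

```python
def min_seed_length(k):
    """Smallest seed length that satisfies the rule s >= 2*k from the warning."""
    return 2 * k


def seed_length_ok(k, s):
    """Return True when s is at least twice k, the condition named in the warning."""
    return s >= min_seed_length(k)


# Values from the report above: k=111 and s=200.
ok = seed_length_ok(111, 200)
needed = min_seed_length(111)
```

For k=111 the rule asks for a seed length of at least 222, so s=200 triggers the warning.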

PATCH for compiling 1.3.7 with FCC

FCC doesn't support std::hash nor std::tr1::hash. Use boost as an alternative.

Patch is below.

diff --git a/AdjList/AdjList.cpp b/AdjList/AdjList.cpp
index 92336fc..4f1b916 100644
--- a/AdjList/AdjList.cpp
+++ b/AdjList/AdjList.cpp
@@ -147,8 +147,13 @@ static void addOverlapsSA(Graph& g, const vector<Kmer>& prefixes)
 }

 /** An index of suffixes of k-1 bp. */
+#if defined(HAVE_STD_HASH) || defined(HAVE_STD_TR1_HASH)
 typedef unordered_map<Kmer, vector<ContigNode>, hash<Kmer> >
        SuffixMap;
+#else
+typedef unordered_map<Kmer, vector<ContigNode>, boost::hash<Kmer> >
+       SuffixMap;
+#endif

 /** Read contigs. Add contig properties to the graph. Add prefixes to
  * the collection and add suffixes to their index.

diff --git a/Common/Hash.h b/Common/Hash.h
index 1079c7d..7a7c3e9 100644
--- a/Common/Hash.h
+++ b/Common/Hash.h
@@ -14,7 +14,11 @@ using std::tr1::hash;
 # define NAMESPACE_STD_HASH_BEGIN namespace std { namespace tr1 {
 # define NAMESPACE_STD_HASH_END } }
 #else
-# error One of either std::hash or std::tr1::hash is required.
+//# error One of either std::hash or std::tr1::hash is required.
+# include <boost/functional/hash.hpp>
+using boost::hash;
+# define NAMESPACE_STD_HASH_BEGIN namespace boost {
+# define NAMESPACE_STD_HASH_END }
 #endif

 #endif
diff --git a/Common/Kmer.h b/Common/Kmer.h
index e6686d8..cb4ae73 100644
--- a/Common/Kmer.h
+++ b/Common/Kmer.h
@@ -124,6 +124,7 @@ static inline Kmer reverseComplement(const Kmer& seq)
 }

 NAMESPACE_STD_HASH_BEGIN
+       template <typename T> struct hash;
        template <> struct hash<Kmer> {
                size_t operator()(const Kmer& kmer) const
                {

MergePaths: Assertion failed: (!m_ambig)

A bug in MergePaths causes the following error in PathConsensus:
Assertion failed: (!m_ambig), function index, file ../../Common/ContigNode.h, line 141.

MergePaths may output two gaps immediately next to each other with no contig in between:
818857- 1N 130N 2701535+
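
A path with this defect can be detected before it reaches PathConsensus. The sketch below is in Python and is not part of ABySS; it assumes that gap tokens always have the form of digits followed by `N`, as in the example above, and the function name `adjacent_gaps` is hypothetical.

```python
import re

GAP = re.compile(r"^\d+N$")  # assumed gap token format, e.g. "130N"


def adjacent_gaps(path):
    """Return index pairs (i, i+1) where two gap tokens are adjacent in a path."""
    tokens = path.split()
    return [
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if GAP.match(tokens[i]) and GAP.match(tokens[i + 1])
    ]


bad = adjacent_gaps("818857- 1N 130N 2701535+")   # the path from the report
good = adjacent_gaps("818857- 130N 2701535+")     # hypothetical well-formed path
```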

Error when SAM file contains @CO headers

From the SAM spec, the headers should match this:

/^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$/ or /^@CO\t.*/

The first case is handled, but not the @CO tags. This was reported with the error:

error: expected `+' and saw `@' near @CORG:HSQ1009_126_5 PR:ITGE:hs37d5
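
The two header forms can be tested with the patterns quoted from the SAM spec above. This is a minimal sketch in Python, not the ABySS parser; the function name `is_sam_header` and the sample lines are hypothetical.

```python
import re

# The two patterns quoted above from the SAM spec.
TAGGED_HEADER = re.compile(r"^@[A-Za-z][A-Za-z](\t[A-Za-z][A-Za-z0-9]:[ -~]+)+$")
COMMENT_HEADER = re.compile(r"^@CO\t.*")


def is_sam_header(line):
    """Return True if the line matches either SAM header form."""
    line = line.rstrip("\n")
    return bool(TAGGED_HEADER.match(line) or COMMENT_HEADER.match(line))


# Hypothetical lines: a tagged header, a comment header, and an alignment record.
tagged = is_sam_header("@SQ\tSN:chr1\tLN:100")
comment = is_sam_header("@CO\tfree-form comment text")
record = is_sam_header("r1\t0\tchr1\t1\t60\t5M\t*\t0\t0\tACGTA\tIIIII")
```

A parser that checks only the first pattern would reject the `@CO` line, which matches the failure described above.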

KAligner fails to build with clang 5.0

g++ -Wall -Wextra -Werror -g -O2 -pthread  -o KAligner KAligner-KAligner.o KAligner-Aligner.o ../DataLayer/libdatalayer.a ../Common/libcommon.a -ldl -lm 
clang: error: argument unused during compilation: '-pthread'
make: *** [KAligner] Error 1

Autoconf complains of undefined macros

I'm getting rather cryptic build errors on the 1.3.7 release and later:

    [ahammel@login3]─[abyss] ./autogen.sh
    + aclocal
    + autoconf
    configure.ac:33: error: possibly undefined macro: AC_TYPE_SSIZE_T
          If this token and others are legitimate, please use m4_pattern_allow.
          See the Autoconf documentation.
    configure.ac:34: error: possibly undefined macro: AC_TYPE_INT64_T
    configure.ac:35: error: possibly undefined macro: AC_TYPE_UINT8_T
    configure.ac:36: error: possibly undefined macro: AC_TYPE_UINT16_T
    configure.ac:37: error: possibly undefined macro: AC_TYPE_UINT32_T
    configure.ac:38: error: possibly undefined macro: AC_TYPE_UINT64_T
    configure.ac:200: error: possibly undefined macro: AC_OPENMP

I can power through just by running autogen.sh again, but this seems to break
the dialign build, and (possibly unrelated) the 'docdir' variable goes
missing:

    [ahammel@login3]─[abyss] ./autogen.sh
    + aclocal
    + autoconf
    + autoheader
    + automake -a
    configure.ac: installing `./install-sh'
    configure.ac: installing `./missing'
    ABYSS/Makefile.am: installing `./depcomp'
    dialign/Makefile.am: C objects in subdir but `AM_PROG_CC_C_O' not in `configure.ac' # <---
    dialign/Makefile.am: installing `./compile'
    doc/Makefile.am:2: `dist_doc_DATA' is used but `docdir' is undefined
    Makefile.am:18: `dist_doc_DATA' is used but `docdir' is undefined

And then again during the configuration step:

    [ahammel@login3]─[abyss] ./configure
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    [...]
    ./configure: line 5366: AC_TYPE_SSIZE_T: command not found
    ./configure: line 5367: AC_TYPE_INT64_T: command not found
    ./configure: line 5368: AC_TYPE_UINT8_T: command not found
    ./configure: line 5369: AC_TYPE_UINT16_T: command not found
    ./configure: line 5370: AC_TYPE_UINT32_T: command not found
    ./configure: line 5371: AC_TYPE_UINT64_T: command not found
    [...]
    ./configure: line 9209: AC_OPENMP: command not found
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: error: cannot find input file: Makefile.in

Interestingly, the error I marked above isn't present in the 1.3.4 release:

    [ahammel@login3]─[abyss] ./autogen.sh
    + aclocal
    + autoconf
    + autoheader
    + automake -a
    configure.ac: installing `./install-sh'
    configure.ac: installing `./missing'
    ABYSS/Makefile.am: installing `./depcomp'
    doc/Makefile.am:2: `dist_doc_DATA' is used but `docdir' is undefined
    Makefile.am:5: `dist_doc_DATA' is used but `docdir' is undefined

I suspect this is down to an old version of autotools:

    [ahammel@login3]─[Map] autoconf --version
    autoconf (GNU Autoconf) 2.59
    [...]
    [ahammel@login3]─[Map] automake --version
    automake (GNU automake) 1.9.6
    [...]

error: typedef 'vertex_descriptor' locally defined but not used

BTW, I checked the latest version and injected it into the Git
repository but I realised that it does not build because of:

g++ -DHAVE_CONFIG_H -I. -I.. -I.. -I../Common -I../DataLayer -I. -I/usr/lib/openmpi/include -D_FORTIFY_SOURCE=2 -Wall -Wextra -Werror -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -c -o libassembly_a-DotWriter.o `test -f 'DotWriter.cpp' || echo './'`DotWriter.cpp
In file included from DotWriter.cpp:5:0:
../Graph/ContigGraphAlgorithms.h: In function 'OutIt assemble_if(Graph&, OutIt, Predicate)':
../Graph/ContigGraphAlgorithms.h:202:44: error: typedef 'vertex_descriptor' locally defined but not used [-Werror=unused-local-typedefs]
typedef typename Graph::vertex_descriptor vertex_descriptor;
^
cc1plus: all warnings being treated as errors
make[3]: *** [libassembly_a-DotWriter.o] Error 1
make[3]: Leaving directory `/tmp/buildd/abyss-1.3.6/Assembly'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/tmp/buildd/abyss-1.3.6'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/buildd/abyss-1.3.6'
dh_auto_build: make -j1 returned exit code 2

Since you are upstream, dealing with things like this would be way
simpler if you just fixed them upstream rather than leaving us
poor packagers alone with patching.

Kind regards

   Andras.

System.IO.MMap Error (abyss 1.3.6)

When building:

./configure --prefix=/soft/abyss/1.3.4 CPPFLAGS=-I/soft/sparsehash/2.0.2/include --enable-maxk=96 --with-mpi=/soft/openmpi/gcc/1.7.2-gnu-4.8.1-static/
make
[...]
ghc --make -O2 -hidir ./ -odir ./ -o abyss-samtobreak samtobreak.hs

samtobreak.hs:16:8:
Could not find module `System.IO.MMap':
Use -v to see a list of the files searched for.
make[2]: *** [abyss-samtobreak] Error 1
[...]

ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.0.4

make: mixed implicit and normal rules

Courtney Hofman wrote...

Hi,

I installed ABySS 1.3.7 using Homebrew and tried running abyss-pe on a small mitogenome dataset. However, I keep getting this error:
/usr/local/bin/abyss-pe:613: *** mixed implicit and normal rules. Stop.

I also tried the activity found at sjackman.github.io/abyss-activity using the run-abyss scripts (using the included data and following the directions) to see if it was just my data but I got the same error. I haven't seen anything in seq answers about this so I thought I would ask for any advice you might have to fix this problem.

Thanks,
Courtney

ParseAligns -l is broken

The option -l of ParseAligns is broken, giving the error message
ParseAligns: invalid option: `-l32'

ParseAligns is only used when using KAligner:
abyss-pe aligner=kaligner

Error running DistanceEst with Bowtie, Bowtie 2 and BWA

Thanks to Simone Scalabrin for sending test data to reproduce this error:
DistanceEst: error: The observed fragment of size 128 bp is shorter than 2*l (l=71). Decrease l to 64.

The minimum alignment length, 'l', is not used by Bowtie or BWA, and causes the above error.
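
The numbers in the error message follow from simple arithmetic. The sketch below is in Python and encodes only what the message itself states (the fragment must not be shorter than 2*l); it is not taken from the DistanceEst source, and the function name `suggest_min_align` is hypothetical.

```python
def suggest_min_align(fragment_size, l):
    """Return a smaller l when fragment_size < 2*l, else None.

    Assumed rule, inferred from the error message: the observed fragment
    must be at least 2*l long, so the largest acceptable l is
    fragment_size // 2.
    """
    if fragment_size < 2 * l:
        return fragment_size // 2
    return None


# Values from the report above: a 128 bp fragment with l=71.
suggestion = suggest_min_align(128, 71)
```

With a 128 bp fragment and l=71, 2*l is 142, which exceeds the fragment size, and the largest l that passes is 64, matching the message.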

ABYSS-P hangs at higher parallelism

With MAX_MESSAGES set to 100, ABYSS-P run with np=8 or higher on the SPARC-based system hangs in the NAS_GEN_ADJ state, because two nodes issue MPI_Send to each other at almost the same time. MPI_Send is called by setBaseExtension while iterating through the assigned k-mers.

As far as I recall, I never observed this on x86 clusters, even with somewhat higher parallelism (16 or 32).

A workaround is to reduce MAX_MESSAGES to 10; with that I was able to run the application with np=384 without a problem. The underlying cause should still be understood in order to achieve higher parallelism.

Downstream parallelization

After ABYSS-P, the downstream analyses cannot complete unless one has a high-end machine with enough memory. Are there any plans to introduce MPI programs for the downstream processes?

abyss-fixmate: std::unordered_map::erase can be slow

With a large number of buckets and low density, std::unordered_map::erase can be very slow. handleAlignment calls std::unordered_map::erase. Using boost::unordered_map::quick_erase is a straight-forward fix suggested by the submitter.

fanzhang wrote…

Hi, Shaun
I am an ABySS user. First of all, thanks for developing such a good program.
Recently I have been using ABySS on high-depth data (80x in a single file). When the run reaches abyss-fixmate, it runs for a week and seems like it will never stop. So I looked into it and traced the problem to the erase operation of the unordered_map. As far as I understand, this should not take nearly this long with O(1) complexity. I found that, by the time it slows down, the bucket load is 0. I think this might be a side effect of the iterator returned by erase. I then changed it to boost::unordered_map::quick_erase(). This solved the problem: the program now finishes in hours.
So I think this could be an improvement to ABySS performance. It is not a bug, but it is certainly a pain.

scaffold.cc: warning: converting to unsigned int from double

Laurent MANCHON wrote...

Hi,

In the latest version of ABySS (1.3.3),
I get this error when compiling:

g++ -DHAVE_CONFIG_H -I. -I..  -I.. -I../Common -I. -I/home/manchonl/abyss-1.3.3/boost -I/home/manchonl/openmpi/include -I/home/manchonl/sparsehash/include -Wall -Wextra -Werror -fopenmp -g -O2 -MT abyss_scaffold-scaffold.o -MD -MP -MF .deps/abyss_scaffold-scaffold.Tpo -c -o abyss_scaffold-scaffold.o `test -f 'scaffold.cc' || echo './'`scaffold.cc
cc1plus: warnings being treated as errors
scaffold.cc: In function int main(int, char**):
scaffold.cc:733: warning: converting to unsigned int from double
scaffold.cc:734: warning: converting to unsigned int from double
scaffold.cc:736: warning: converting to unsigned int from double
scaffold.cc:740: warning: converting to unsigned int from double
make[2]: *** [abyss_scaffold-scaffold.o] Error 1
make[2]: Leaving directory `/home/manchonl/abyss-1.3.3/Scaffold'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/manchonl/abyss-1.3.3'
make: *** [all] Error 2

Does a patch exist?

Thank you for your help.

abyss-samtobreak: automake error

warning: deprecated feature: target 'abyss-samtobreak' overrides 'abyss-samtobreak$(EXEEXT)'
change your target to read 'abyss-samtobreak$(EXEEXT)'

error: no viable conversion from 'std::istream' (aka 'basic_istream<char>') to 'void *'

Does not compile on OS X 10.9 Mavericks

clang++ -DHAVE_CONFIG_H -I. -I../../Assembly -I..  -I../.. -I../../Common -I../../DataLayer -I..     -Wall -Wextra -Werror -g -O2 -MT libassembly_a-AssemblyAlgorithms.o -MD -MP -MF .deps/libassembly_a-AssemblyAlgorithms.Tpo -c -o libassembly_a-AssemblyAlgorithms.o `test -f 'AssemblyAlgorithms.cpp' || echo '../../Assembly/'`AssemblyAlgorithms.cpp
In file included from ../../Assembly/AssemblyAlgorithms.cpp:4:
../../DataLayer/FastaReader.h:49:35: error: no viable conversion from
      'std::istream' (aka 'basic_istream<char>') to 'void *'
                operator void*() const { return m_in; }
                                                ^~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/c++/v1/ios:576:9: note: 
      candidate function
        operator bool() const {return !fail();}
        ^
1 error generated.

subcommand gives incorrect version

After git clone, autoconf, configure & make, ./bin/abyss-pe -v returns the version of GNU make rather than the version of abyss. This is a bit surprising.
(I haven't tested other subcommands)

ABYSS-P not exiting when file not found by bunzip2

...
1: Reading `/genesis/scratch/ABySS/traymond/CPCGC_32/B09WGABXX_8_1_concat_qseq.txt.bz2'...
bunzip2: Can't open input file /genesis/scratch/ABySS/traymond/CPCGC_32/B09WGABXX_8_1_concat_qseq.txt.bz2: No such file or directory.
/genesis/scratch/ABySS/traymond/CPCGC_32/B09WGABXX_8_1_concat_qseq.txt.bz2:0: warning: file is empty
1: Read 0 reads. 1: Hash load: 0 / 268435456 = 0 using 137 MB
warning: `/genesis/scratch/ABySS/traymond/CPCGC_32/B09WGABXX_8_1_concat_qseq.txt.bz2': contains no usable sequence
2: Read 100000 reads. 2: Hash load: 3770968 / 268435456 = 0.014 using 243 MB
...

ABYSS-P continues when bunzip2 returns an error that the file is empty. This occurs on CentOS 5.4, but not on Mac OS X.
