Git Product home page Git Product logo

repark's Introduction

## Description ##
RepARK.pl is a wrapper script for constructing a repeat library from sequencing reads using Jellyfish and either Velvet or CLC.


## System Requirements ##
- Jellyfish (tested with versions 1.1.6, 2.1.1, 2.1.4)
	http://www.cbcb.umd.edu/software/jellyfish
- Velvet (tested with version 1.2.08)
	http://www.ebi.ac.uk/~zerbino/velvet
- CLC Assembler (optional, tested with version 4.4.0)
	http://www.clcbio.com/

These programs should be in the PATH, or the appropriate variables (e.g. $jellyfish_path) in RepARK.pl should be modified accordingly.


## Input Data ##
Input data can be one or more fasta or fastq files. They can be given as arguments with the option -l (multiple times) or included in the RepARK.pl
The reads can be in gzipped format ending with ".gz". Note that it is currently not supported to mix compressd and uncompressed libraries.

## Example Demo Data ##
In order to test the pipeline run ./RepARK.pl without arguments.
The command line parameters are optional. If either are omitted, RepARK.pl will use the defaults which are adapted to the demo data.
Check if the assembly results in 133 sequences of a size between 27,902-27,917bp. Due to velvet creates slightly different assemblies with the same data, we provided this size range and 10 libraries based on the demo data with defauld parameters that can be downloaded from [1]. If you discover greater disagreements please contact the authors (see Support).


## Output ##
The repeat consensuses can be found in the file repeat_lib.fasta within the working directory.


## Usage ##

RepARK.pl [ options ]

Options: [default_values]
  -o  output dir [RepARK_working]
  -l  library file - can be used multiple times [pe400.fq.gz]
  -a  assembler  (options: clc,velvet) [velvet]
  -s  jellyfish hash size [100000000]
  -k  jellyfish kmer size [31]
  -p  number of threads used by jellyfish [1]
  -t  set manually a threshold, if automatic prediction is not working (this step will then be skipped)
  -d  debug mode gives some more information
  -n --nojf  skip Jellyfish computation (jf_RepARK.kmers|histo must exist in the working dir)
  -h --help  this help


## Notes ##
For 40x human data, we set -s to 54100000000 which reserves 370Gb of RAM for jellyfish (v 1.1.6) but there is no merging of the intermediate files needed.

In order to discriminate between the error peak and the unique-kmer peak, sufficient read coverage depth is required (>10x).

## Support ##
In case of problems, please run the script in debugging mode (-d) and send the output to [email protected].



[1] ftp://genome.fli-leibniz.de/pub/repeat-assemblies/RepARKdemoruns.tar.gz


Change Log
---------
1.3.0	- clc_assembler support
	- gzipped input file support
	
1.2.2	- Checking for k <= velvet's hash size

1.2.1	- Included support for jellyfish v2.x

1.2	- Initial release

repark's People

Contributors

phkoch avatar

Watchers

Chris Winefield avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.