
lighter_paper's Introduction

First:

  1. Change to the Evaluation subdirectory of the repo
  2. Obtain Quake; we used version 0.3
  3. Obtain Musket; we used version 1.1
  4. Obtain BLESS; we used v0p17
  5. Obtain SOAPec; we used version 2.01

(For Musket, we changed line 156 of kseq.cpp to "if (ks_getuntil(ks, '\n', &seq->name, &c) < 0)" so that it outputs Mason's id field.)
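The one-line Musket change described above could be applied with a small helper like the one below. This is only a sketch: `replace_line` is a hypothetical helper, not part of this repo or of Musket, and the path to kseq.cpp in your Musket source tree is an assumption you need to fill in. The helper overwrites the whole line, so include any leading indentation you want in the new text.

```shell
#!/bin/sh
# Hypothetical helper: replace one line of a file with new text,
# leaving every other line untouched.
# Usage: replace_line FILE LINE_NUMBER NEW_TEXT
replace_line() {
    file=$1
    lineno=$2
    # Pass the text through the environment so that awk does not
    # interpret backslash sequences such as \n inside it.
    NEW_TEXT=$3 awk -v n="$lineno" '
        NR == n { print ENVIRON["NEW_TEXT"]; next }
        { print }
    ' "$file" > "$file.patched" && mv "$file.patched" "$file"
}

# Example (the path is a placeholder; check line 156 before overwriting it):
# replace_line path/to/musket/kseq.cpp 156 \
#     "if (ks_getuntil(ks, '\n', &seq->name, &c) < 0)"
```

Check that line 156 is the `ks_getuntil` call in your copy of kseq.cpp before running it, since line numbers can differ between Musket releases.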

To generate the simulated E. coli data set:

  1. Download the FASTA file from RefSeq: http://www.ncbi.nlm.nih.gov/nuccore/NC_010473.1
  2. Move sequence.fa to the current directory and rename it
  3. Run sh scripts/generate_simulate.sh. This will take a while and use several gigabytes of memory.

To generate the simulated diploid test data:

  1. Run generate_diploid.sh

To generate a simulated data set using art:

  1. Run generate_art_simulate.sh
  2. Run perl MasonHeader.pl xxx.fq xxx.aln to add a Mason-style id field to each data set.

To generate the 2x100bp 75x ERR022075 data set:

  1. Download ERR022075_1.fastq.gz from the European Nucleotide Archive
  2. Unzip it: gunzip ERR022075_1.fastq.gz
  3. Make untrimmed read set: perl Sample.pl 0.077 < ERR022075_1.fastq > read1.fq
  4. Make trimmed read set: perl Sample.pl 0.077 < ERR022075_1.fastq > read2_tmp.fq ; perl Trim.pl read2_tmp.fq > read2.fq
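Sample.pl is not reproduced here. Assuming it keeps each read independently with the given probability (0.077 above), that kind of subsampling can be sketched in a few lines of awk. The function name `sample_fastq`, the optional seed argument, and the four-lines-per-record assumption are all illustrative and may not match what Sample.pl actually does.

```shell
#!/bin/sh
# Sketch: keep each FASTQ record with probability P.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
# Usage: sample_fastq P [SEED] < in.fastq > out.fastq
sample_fastq() {
    awk -v p="$1" -v seed="${2:-1}" '
        BEGIN { srand(seed) }
        # Decide once per record, on its header line (lines 1, 5, 9, ...).
        NR % 4 == 1 { keep = (rand() < p) }
        # Print all four lines of a kept record.
        keep { print }
    '
}

# Example, mirroring the command above:
# sample_fastq 0.077 < ERR022075_1.fastq > read1.fq
```

With a fixed seed the same reads are selected on every run, which helps when the untrimmed and trimmed read sets should be built from the same sample.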

To generate simulated data using art_illumina:

Generating the kmers to test:

  perl GetRefKmers.pl 17 < e_coli_k12.fa
  perl GetKmersAroundpos.pl 17 ref1.fa ref2.fa mutate.pos
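For readers unfamiliar with the term, a k-mer list from a reference is every length-k substring of each sequence. The sketch below shows that idea in awk. It assumes that GetRefKmers.pl does something along these lines, which the text here does not confirm, and `list_kmers` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print every k-mer of each FASTA record, one per line.
# Usage: list_kmers K < ref.fa
list_kmers() {
    awk -v k="$1" '
        # Emit all length-k substrings of the sequence collected so far.
        function flush_seq(   i) {
            for (i = 1; i + k - 1 <= length(seq); i++)
                print substr(seq, i, k)
            seq = ""
        }
        /^>/ { flush_seq(); next }   # header: finish the previous record
        { seq = seq $0 }             # sequence may be wrapped across lines
        END { flush_seq() }
    '
}

# Example with the k used above:
# list_kmers 17 < e_coli_k12.fa
```

The sketch does not deduplicate k-mers or handle reverse complements. Whether the original script does either is not stated here.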

Software:

  quake v0.3: run_quake.sh
  musket v1.1: run_musket.sh
  soapec v2.01: run_soapec.sh
  bless v0.17: run_bless.sh
  lighter: run_lighter.sh

Evaluating simulated data set:

  1. Compile verify.cpp
  2. Run simulate/run.sh

Evaluating real data set: in the corresponding folder, run:

  1. run_bowtie.sh
  2. evaluate_bowite.sh
  3. run_velvet.sh (pick the parameter giving the best NG50)
  4. run_quast.sh

Evaluating memory usage:

  1. Compile memusage.cpp (obtained from Musket's project)
  2. Run memusage (command to the program)

For Musket, we tested several Bloom filter sizes and chose the result with the minimum memory consumption. E.g., for the gage_chr14 data set:

  ~/Tools/memusage ./musket -p 10 -inorder -k 19 2200000000 -omulti tmp ~/data/Lighter/gage_chr14/frag_1.fastq ~/data/Lighter/gage_chr14/frag_2.fastq
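Choosing the run with the minimum memory consumption can be automated once the peak memory of each run has been recorded. The sketch below assumes you have collected one line per run in the form "SETTING PEAK_MEMORY". That two-column layout and the name `pick_min_memory` are assumptions for illustration, not the output format of memusage.

```shell
#!/bin/sh
# Sketch: read lines of "SETTING PEAK_MEMORY" (whitespace separated) on
# stdin and print the line with the smallest PEAK_MEMORY.
# Usage: pick_min_memory < results.txt
pick_min_memory() {
    awk '
        # Force numeric comparison with "+ 0"; the first line seeds the minimum.
        NR == 1 || $2 + 0 < min { min = $2 + 0; best = $0 }
        END { if (NR > 0) print best }
    '
}
```

If two runs tie, the first one encountered is reported.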

Contributors: benlangmead, mourisl
