
short_project

Author : Etienne JEAN
Date : 18th September 2018

This program is a de novo genome assembler.
It uses Euler cycles in balanced de Bruijn graphs to reassemble a genome from a file of sequencing reads. See the /doc directory for more information.
It runs on Python 3 and needs only the standard library. The source code and the Python executables are in the /src directory.
Test data for running the program are in the /data directory.
Some previously generated results are in the /results directory.
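The Euler-cycle idea can be illustrated with a minimal sketch, which is not taken from the /src code: nodes are (k-1)-mers, each k-mer is an edge from its prefix to its suffix, and when every node has equal in- and out-degree (a balanced, connected graph) an Euler cycle exists and spells a circular sequence. The function names `euler_cycle` and `spell_circular` and the toy graph are hypothetical; the cycle is found with Hierholzer's algorithm, which may differ from what the project actually uses.

```python
def euler_cycle(graph):
    """Return an Euler cycle of a balanced, connected directed multigraph.

    `graph` maps each node to the list of its successors, one entry per edge.
    This is the iterative form of Hierholzer's algorithm: every edge is used once.
    """
    remaining = {node: list(succ) for node, succ in graph.items()}  # do not mutate the input
    start = next(iter(remaining))
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            cycle.append(stack.pop())  # no unused edge left: node joins the cycle
    cycle.reverse()
    return cycle


def spell_circular(cycle):
    """Spell the circular sequence of a cycle whose nodes are (k-1)-mers."""
    linear = cycle[0] + "".join(node[-1] for node in cycle[1:])
    return linear[:-len(cycle[0])]  # the last k-1 letters repeat the start


# Hypothetical balanced toy graph (k = 3, so nodes are 2-mers).
graph = {
    "AT": ["TG"], "TG": ["GG", "GC"], "GG": ["GC"], "GC": ["CG", "CA"],
    "CG": ["GT"], "GT": ["TG"], "CA": ["AT"],
}
cycle = euler_cycle(graph)
print(spell_circular(cycle))  # → ATGCGTGGC
```

Because the sequence is read off a cycle, the starting node is arbitrary: another start node gives a rotation of the same circular sequence.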

Every script has a help message: run any of them with the -h option to display it.

---------------------------------------------------------------------
Example commands for running the program. All of them should be run from the base directory of the project.


I - Generation of a random genome
Random genome of size 10kb
	src/genome_generation.py 10000 > data/random_10kb.fasta
Random genome of size 1Mb, with 60% GC content
	src/genome_generation.py 1000000 --gc-content 0.6 > data/random_1Mb.fasta
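As an illustration of what a GC-content parameter means, here is a minimal sketch, not the project's genome_generation.py: each base is drawn independently, with G and C each having probability gc/2 and A and T each (1 - gc)/2, so the expected G+C fraction equals gc. The names `random_genome` and `to_fasta`, the default of 0.5, and the 60-column FASTA line width are assumptions.

```python
import random


def random_genome(length, gc_content=0.5, seed=None):
    """Return a random DNA string whose expected G+C fraction is gc_content."""
    rng = random.Random(seed)
    gc = gc_content / 2        # probability of G, and of C
    at = (1 - gc_content) / 2  # probability of A, and of T
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_fasta("random_1Mb", random_genome(1000000, gc_content=0.6)))
```

Printing to standard output matches the redirection style (`> data/...fasta`) used in the commands above.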


II - Sequencing simulation of a genome
Sequencing with default parameters (read length 100bp, coverage 50)
	src/sequencing.py data/random_10kb.fasta > data/random_10kb_reads.fasta
Sequencing with read length 120bp and coverage 60
	src/sequencing.py data/random_1Mb.fasta -l 120 -c 60 > data/random_1Mb_reads.fasta
Sequencing the genome of Mycoplasma genitalium, with read length 400bp and coverage 100
	src/sequencing.py -l 400 -c 100 data/NC_000908.2.fasta > data/NC_000908.2_reads.fasta
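Read length and coverage together fix the number of reads: with genome length G, read length L and coverage c, about c·G/L reads are needed. The minimal sketch below is not the project's sequencing.py. It assumes error-free reads taken from the forward strand only, start positions drawn uniformly, and a circular genome (reads may wrap around the origin). The name `simulate_reads` is hypothetical, and `read_length` is assumed not to exceed the genome length.

```python
import math
import random


def simulate_reads(genome, read_length=100, coverage=50, seed=None):
    """Sample error-free reads uniformly from a circular genome.

    The read count is chosen so that n_reads * read_length >= coverage * len(genome).
    """
    rng = random.Random(seed)
    n_reads = math.ceil(coverage * len(genome) / read_length)
    # Append the first read_length - 1 bases so a read can wrap around the origin.
    circular = genome + genome[:read_length - 1]
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome))
        reads.append(circular[start:start + read_length])
    return reads
```

With these formulas, the 10kb genome at 100bp and coverage 50 gives 5,000 reads, and the 1Mb genome at 120bp and coverage 60 gives 500,000 reads.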


III - De novo assembly from a file of reads
De novo assembly of the genome of size 10kb, with k-mer length 30bp
	src/denovo_assembly.py 30 data/random_10kb_reads.fasta results/random_10kb_assembly.fasta > results/random_10kb_assembly.log
De novo assembly of the genome of size 1Mb, with k-mer length 60bp
	src/denovo_assembly.py 60 data/random_1Mb_reads.fasta results/random_1Mb_assembly.fasta > results/random_1Mb_assembly.log
De novo assembly of the genome of Mycoplasma genitalium, with k-mer length 250bp
	src/denovo_assembly.py 250 data/NC_000908.2_reads.fasta results/NC_000908.2_assembly.fasta > results/NC_000908.2_assembly.log
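The first argument is the k-mer length used to build the de Bruijn graph. The minimal sketch below is not the project's denovo_assembly.py. It collects the distinct k-mers of all reads, which collapses the many copies produced by high coverage, turns each k-mer into an edge from its (k-1)-prefix to its (k-1)-suffix, and checks whether the resulting graph is balanced. The names `de_bruijn_from_reads` and `is_balanced` are hypothetical. Keeping only distinct k-mers also discards the multiplicity of k-mers that are genuinely repeated in the genome, which is one reason k must be chosen longer than the repeats and no longer than the reads.

```python
from collections import Counter


def de_bruijn_from_reads(reads, k):
    """Build a de Bruijn graph whose edges are the distinct k-mers of the reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):  # no k-mers if the read is shorter than k
            kmers.add(read[i:i + k])
    graph = {}
    for kmer in sorted(kmers):  # sorted only for deterministic output
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
    return graph


def is_balanced(graph):
    """True if every node has as many incoming as outgoing edges."""
    indeg, outdeg = Counter(), Counter()
    for node, successors in graph.items():
        outdeg[node] += len(successors)
        for succ in successors:
            indeg[succ] += 1
    return all(indeg[n] == outdeg[n] for n in set(indeg) | set(outdeg))
```

Reads that cover a circular genome without gaps give a balanced graph; a missing stretch of coverage leaves nodes whose in- and out-degrees differ.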


IV - Compare the assembly with the reference genome
Comparison of the reference and the assembly for the genome of size 10kb
	src/test_assembly.py data/random_10kb.fasta results/random_10kb_assembly.fasta
Comparison of the reference and the assembly for the genome of size 1Mb
	src/test_assembly.py data/random_1Mb.fasta results/random_1Mb_assembly.fasta
Comparison of the reference and the assembly for the genome of Mycoplasma genitalium
	src/test_assembly.py data/NC_000908.2.fasta results/NC_000908.2_assembly.fasta
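An assembly read off an Euler cycle can start at any position of a circular genome, so a plain string comparison with the reference may fail even when the assembly is correct. The minimal sketch below, which is not the project's test_assembly.py, treats two sequences as the same circular genome if one is a rotation of the other or of its reverse complement. The names `reverse_complement` and `same_circular_genome` are hypothetical, and the check assumes uppercase A/C/G/T sequences.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def same_circular_genome(reference, assembly):
    """True if assembly equals reference up to rotation or reverse complement."""
    if len(reference) != len(assembly):
        return False
    doubled = reference + reference  # every rotation of reference is a substring of this
    return assembly in doubled or reverse_complement(assembly) in doubled
```

For example, `same_circular_genome("ATGCGTGGC", "GTGGCATGC")` is true because the second string is a rotation of the first.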
