phageParser

##Taming Pathogens

Pathogens have played a crucial role in human history. The most iconic example is the Black Death of the 14th century, which is thought to have been caused by the bacterium Yersinia pestis. A more recent example is AIDS, caused by the human immunodeficiency virus (HIV). Understanding the fundamentals of host-pathogen interactions has therefore been a central problem in epidemiology, and in the genomic age it requires tools from quantitative fields to make sense of the large amount of sequencing data being generated.

Virus-bacteria interactions provide the simplest host-pathogen pair and are crucial to the functioning of systems ranging from the human gut to marine ecosystems. Interestingly, a mechanism for an “adaptive” immune system in bacteria against the viruses infecting them (usually referred to as phages) was discovered just a few years ago. Curiously, much like anti-virus software and intrusion detection systems that rely on detecting patterns found in malicious code, bacteria keep a dynamic library of small pieces of phage genomes (spacers) to detect and neutralize phage attacks.

The basic problem in understanding how this immune system works is to understand the pattern of spacers on phage genomes: how many spacers match each phage genome, where on the genome they fall, whether spacer-containing regions are more or less dynamic than the rest of the phage genome, and so on. Since we have a large number of sequenced phages and a library of spacers from a variety of bacteria, ranging from deadly human pathogens such as the bacterium that causes tuberculosis to bacteria that live in our guts, we can attempt to aggregate this information to develop a more “complete” understanding of phage-bacteria interactions.
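As a toy illustration of the questions above (how many spacers hit a phage genome, and where), the sketch below looks for exact matches of each spacer in a genome string. The sequences and the helper name `locate_spacers` are made up for illustration and are not part of this package; a real analysis would more likely rely on BLAST-style alignment, which tolerates mismatches and considers both strands.

```python
def locate_spacers(genome, spacers):
    """Return {spacer_name: [0-based start positions]} for exact matches.

    Hypothetical helper: exact, forward-strand matching only.
    """
    hits = {}
    for name, seq in spacers.items():
        positions = []
        start = genome.find(seq)
        while start != -1:
            positions.append(start)
            # Step by one so overlapping occurrences are also found.
            start = genome.find(seq, start + 1)
        hits[name] = positions
    return hits


# Made-up sequences, not real phage or spacer data.
toy_genome = "ATGCGTACGTTAGCATGCGTAC"
toy_spacers = {"spacer_1": "GCGTAC", "spacer_2": "TTAGC", "spacer_3": "AAAA"}

hits = locate_spacers(toy_genome, toy_spacers)
for name, positions in hits.items():
    # Prints the spacer name, its number of hits, and the hit positions.
    print(name, len(positions), positions)
```

The number of positions per spacer answers "how many", and the positions themselves answer "where"; comparing those positions across many phages is the kind of aggregation this project aims to automate.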

##Data Challenge

Happily, much of the existing data needed to understand bacteria-phage interactions has been released openly to the public and is available over the web. The current challenge is to help extract the relevant parts of that huge database and automate the production of targeted datasets for these studies. More details are in the issue tracker!

##Installation

This package depends on Biopython:

sudo pip install Biopython

Also, make yourself a directory phageParser/output - some data cleaning scripts will dump their results there.
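If you would rather create the directory from Python, a minimal sketch could look like the following. The function name `ensure_output_dir` is hypothetical, and it assumes you pass the path to your phageParser checkout.

```python
import os


def ensure_output_dir(repo_root="."):
    """Create <repo_root>/output if it is missing and return its path."""
    output_dir = os.path.join(repo_root, "output")
    # exist_ok=True makes this safe to run more than once.
    os.makedirs(output_dir, exist_ok=True)
    return output_dir
```

Calling `ensure_output_dir(".")` from inside the phageParser directory should leave you with phageParser/output.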

##Usage

  • To get a phage dataset, take a FASTA-formatted list of genes (example in data/velvet-distinct-spacers.fasta) and upload it to http://phagesdb.org/blast/ - an example result is in data/blast-phagesdb.txt

  • To clean up the results returned from phagesdb.org, change the raw filename in filterByExpect.py from data/blast-phagesdb.txt to whatever file contains the results from the BLAST search, then do

python filterByExpect.py

The result will be written to a file in phageParser/output, in a CSV formatted as

 Query, Name, Length, Score, Expect, QueryStart, QueryEnd, SubjectStart, SubjectEnd

with one header row (see #1 for discussion and details)
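Once the CSV exists, it can be read with Python's standard `csv` module. The sketch below is not how filterByExpect.py itself works; it assumes the header row uses exactly the column names listed above and that the Expect column parses as a float. The function name, the threshold, and the sample rows are made up for illustration.

```python
import csv
import io


def hits_below_expect(csv_file, max_expect):
    """Yield rows (as dicts) whose Expect value is at most max_expect."""
    # skipinitialspace tolerates "a, b, c" style separators.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    for row in reader:
        if float(row["Expect"]) <= max_expect:
            yield row


# Made-up rows in the documented column order, for illustration only.
sample = io.StringIO(
    "Query, Name, Length, Score, Expect, QueryStart, QueryEnd, SubjectStart, SubjectEnd\n"
    "spacer_1, PhageA, 50000, 60.0, 1e-10, 1, 30, 1200, 1229\n"
    "spacer_2, PhageB, 42000, 22.3, 0.5, 3, 14, 800, 811\n"
)

kept = list(hits_below_expect(sample, max_expect=1e-5))
print([row["Query"] for row in kept])
```

To use it on a real result, open the CSV written to phageParser/output with `open(path, newline="")` and pass the file object in place of `sample`.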

  • To query NCBI for full genomes, do
 cat accessionNumber.txt | python acc2gb.py [email protected] > NCBIresults.txt

where accessionNumber.txt contains a list of accession numbers of interest; results will be dumped to NCBIresults.txt - see #2 for ongoing development here.
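For orientation, fetching GenBank records with Biopython might be sketched as follows. This is an assumption about the general approach, not a copy of acc2gb.py: the function names are hypothetical, and it assumes the command-line argument is a contact email for NCBI and that records are requested from the nucleotide database in GenBank text format.

```python
def read_accessions(lines):
    """Return non-empty, stripped accession numbers, one per input line."""
    return [line.strip() for line in lines if line.strip()]


def fetch_genbank(accessions, email):
    """Fetch GenBank flat-file text for the given accessions from NCBI."""
    # Imported here so the parsing helper above works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(
        db="nucleotide", id=",".join(accessions), rettype="gb", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

A script built on this would pass `sys.stdin` to `read_accessions` and write the string returned by `fetch_genbank` to standard output, mirroring the pipe-and-redirect usage shown above. For long accession lists, consider fetching in smaller batches to stay within NCBI's usage guidelines.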
