
vcentenarian's Introduction

Supporting code repository for "Centenarians have a diverse population of gut bacteriophages that may promote healthy lifespan"

Repository content

  • Workflows: The major processing steps, written as WDL (Workflow Description Language) workflows, are in the wdls directory. These WDL pipelines can be imported into a compatible execution environment such as Terra (https://firecloud.terra.bio/#).
  • Workflow scripts: The individual R and Python scripts used by each workflow are in the code directory.

Data

Virus genomes and annotations have been deposited in Zenodo (https://zenodo.org/record/6579480#.Yo3xHZNBweY).

References

If you find any of the supporting code or information useful, please remember to cite the following articles:

Please also cite S. Nayfach et al. for the code used to organise viruses into vOTUs and genus-level groups.
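For readers unfamiliar with vOTU construction, the sketch below illustrates greedy, length-sorted centroid clustering of the kind used to group viral genomes into vOTUs. It is a minimal sketch, not the MGV code: the thresholds (95% average nucleotide identity over at least 85% of the shorter sequence, the commonly used vOTU criterion), the hypothetical function name greedy_votu_clusters, and the input format (precomputed pairwise ANI and aligned-fraction values) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input: (seq_a, seq_b) -> (ANI %, aligned fraction % of the shorter sequence).
PairStats = Dict[Tuple[str, str], Tuple[float, float]]


def greedy_votu_clusters(
    lengths: Dict[str, int],
    pairs: PairStats,
    min_ani: float = 95.0,  # assumed vOTU identity cutoff
    min_af: float = 85.0,  # assumed aligned-fraction cutoff
) -> Dict[str, List[str]]:
    """Greedy centroid clustering: the longest unassigned sequence seeds each cluster."""
    # Keep only pairs that pass both thresholds; the relation is treated as symmetric.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for (a, b), (ani, af) in pairs.items():
        if a != b and ani >= min_ani and af >= min_af:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters: Dict[str, List[str]] = {}
    assigned: Set[str] = set()
    # Visit sequences from longest to shortest (ties broken by ID for determinism).
    for seq in sorted(lengths, key=lambda s: (-lengths[s], s)):
        if seq in assigned:
            continue
        # Only direct, still-unassigned neighbours of the centroid join its cluster.
        members = [seq] + sorted(n for n in neighbors[seq] if n not in assigned)
        assigned.update(members)
        clusters[seq] = members
    return clusters


lengths = {"a": 40000, "b": 39000, "c": 38000, "d": 10000}
pairs = {("a", "b"): (97.0, 90.0), ("b", "c"): (96.0, 95.0), ("a", "d"): (80.0, 99.0)}
print(greedy_votu_clusters(lengths, pairs))  # {'a': ['a', 'b'], 'c': ['c'], 'd': ['d']}
```

In this toy example c forms its own cluster even though it is similar to b, because b has already been absorbed by the longer centroid a: greedy centroid clustering is not transitive, which is one reason cluster counts can shift with input order or thresholds.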

vcentenarian's People

Contributors

joacjo

vcentenarian's Issues

Getting a vOTU query

Hi,
I downloaded the vOTUs.fna file from the paper (it contains 5,251 vOTUs) and ran it through the MGV viral detection pipeline (https://github.com/snayfach/MGV/tree/master/viral_detection_pipeline) as follows:

conda activate hmmer
cd ~/virus1/temp/MGV/mgv/viral_detection_pipeline
cp ~/virus/viwrap_result/paper/vOTUs.fna ./input/
#Call viral genes
./prodigal.linux -i input/vOTUs.fna -a input/vOTUs.faa -d input/vOTUs.ffn -p meta -f gff > input/vOTUs.gff #5251
#Output files include proteins (.faa), genes (.ffn), and gene coordinates (.gff). Genes called in metagenomic mode (-p meta).
 
#Run HMMER on imgvr and pfam databases
hmmsearch -Z 1 --cpu 16 --noali --tblout output/imgvr.out input/imgvr.hmm input/vOTUs.faa
hmmsearch -Z 1 --cut_tc --cpu 16 --noali  --tblout output/pfam.out input/pfam.hmm input/vOTUs.faa
#Here the -Z 1 flag is specified to make E-values comparable between databases and between samples. The IMG/VR database contains viral genes while the Pfam database contains non-viral genes.

#Count genes hitting viral and microbial marker genes
python count_hmm_hits.py input/vOTUs.fna input/vOTUs.faa output/imgvr.out output/pfam.out > output/hmm_hits.tsv
#Each gene is assigned according to its best hit with E-value <1e-10. We found that several of the IMG/VR HMMs are commonly found in non-viral genomes and several of the Pfam HMMs are commonly found in viral genomes. Those HMMs are excluded from the analysis.

#Run VirFinder
Rscript virfinder.R input/vOTUs.fna output/virfinder.tsv
#VirFinder scores each contig using a machine-learning algorithm based on kmers https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0283-5

#Quantify strand switch rate of genes
python strand_switch.py input/vOTUs.fna input/vOTUs.faa > output/strand_switch.tsv
#Here the code scans the proteins from each genome in genomic order. It counts the number of strand switches (+ to - or - to +) and divides by the total number of genes.

#Create master table of sequence features
python master_table.py output/hmm_hits.tsv output/virfinder.tsv output/strand_switch.tsv > output/master_table.tsv

#Predict viral contigs
python viral_classify.py \
--features output/master_table.tsv \
--in_base input/vOTUs \
--out_base output/vOTUs #vOTUs.tsv  #finally obtained 2897 viruses
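Two of the features described in the comments above, the best-hit assignment of genes to IMG/VR or Pfam HMMs and the strand switch rate, can be sketched in a few lines of Python. This is a minimal illustration, not MGV's count_hmm_hits.py or strand_switch.py. It assumes Prodigal's standard protein header layout (>contig_N # start # end # strand # ...), the whitespace-delimited hmmsearch --tblout columns (target, accession, query, accession, full-sequence E-value, ...), that "best hit" means lowest full-sequence E-value, and that the list of excluded HMMs is supplied by the caller. All function names are hypothetical.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Tuple


def parse_prodigal_headers(faa_lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map contig ID -> [(gene start, strand)] from Prodigal protein FASTA headers."""
    genes: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in faa_lines:
        if not line.startswith(">"):
            continue  # skip amino-acid sequence lines
        fields = [field.strip() for field in line[1:].split("#")]
        protein_id, start, strand = fields[0], int(fields[1]), int(fields[3])
        contig_id = protein_id.rsplit("_", 1)[0]  # Prodigal appends _<gene index>
        genes[contig_id].append((start, strand))
    return dict(genes)


def strand_switch_rate(genes: List[Tuple[int, int]]) -> float:
    """Number of +/- strand changes between consecutive genes, divided by gene count."""
    strands = [strand for _, strand in sorted(genes)]  # genomic order by start coordinate
    switches = sum(1 for prev, cur in zip(strands, strands[1:]) if prev != cur)
    return switches / len(strands) if strands else 0.0


def best_hit_labels(
    tblouts: Dict[str, Iterable[str]],
    max_evalue: float = 1e-10,
    excluded: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """Label each gene with the database (e.g. "imgvr"/"pfam") of its lowest-E-value hit."""
    best: Dict[str, Tuple[float, str]] = {}
    for label, lines in tblouts.items():
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # tblout comment/header lines
            cols = line.split()
            gene, hmm, evalue = cols[0], cols[2], float(cols[4])  # full-sequence E-value
            if hmm in excluded or evalue >= max_evalue:
                continue
            if gene not in best or evalue < best[gene][0]:
                best[gene] = (evalue, label)
    return {gene: label for gene, (_, label) in best.items()}


def marker_counts(labels: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Count genes per contig for each database label."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for gene, label in labels.items():
        counts[gene.rsplit("_", 1)[0]][label] += 1
    return {contig: dict(per_label) for contig, per_label in counts.items()}


faa = [">c1_1 # 1 # 300 # 1 # ID=1_1", "MKV", ">c1_2 # 400 # 900 # -1 # ID=1_2",
       ">c1_3 # 1000 # 1500 # -1 # ID=1_3"]
print(strand_switch_rate(parse_prodigal_headers(faa)["c1"]))  # 1 switch / 3 genes
```

Because hmmsearch was run with -Z 1 against both databases, the E-values from the two tblout files are on the same scale, which is what makes a single lowest-E-value comparison across databases meaningful in this sketch.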

No errors were reported during the run, and every step processed all 5,251 sequences. However, the last step (python viral_classify.py) classified only 2,897 of them as viral, whereas the paper reports 4,422. Where could the discrepancy come from? I see that viral_classify.py relies on a rules file, classification_rules.tsv. Is that file different from the one you used? Thanks in advance for your help!
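A practical first step in tracking down a gap like 2,897 versus 4,422 is to list exactly which vOTUs were dropped and then inspect their rows in output/master_table.tsv to see which features or rules they failed. The sketch below does this with plain set differences. It assumes the local classification output and the master table are tab-separated files with a header row and the sequence ID in the first column, and that the paper's list of viral vOTU IDs is available as a one-ID-per-line file (the name paper_viral_ids.txt is hypothetical); none of these layouts is confirmed by the repository, and the function names are illustrative.

```python
import csv
from typing import Iterable, List, Set, Tuple


def read_ids(path: str, column: int = 0, has_header: bool = True) -> Set[str]:
    """Collect sequence IDs from one column of a tab-separated (or one-per-line) file."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        if has_header:
            next(rows, None)  # skip the header row
        return {row[column] for row in rows if row}


def compare_ids(local: Iterable[str], reference: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the reference set, IDs only in the local set)."""
    local_set, reference_set = set(local), set(reference)
    return reference_set - local_set, local_set - reference_set


def rows_for_ids(path: str, ids: Set[str], column: int = 0) -> List[List[str]]:
    """Return the header plus every row of a TSV whose ID column is in `ids`."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        header = next(rows, [])
        return [header] + [row for row in rows if row and row[column] in ids]
```

For example, missing, extra = compare_ids(read_ids("output/vOTUs.tsv"), read_ids("paper_viral_ids.txt", has_header=False)) followed by rows_for_ids("output/master_table.tsv", missing) would show the feature values (HMM hit counts, VirFinder score, strand switch rate) of the sequences the local run rejected, which can then be checked against the thresholds in classification_rules.tsv.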