BI_project_2020

The work describes the annotation of metagenomic assemblies of four cellulolytic microbial communities obtained from various substrates (straw, sawdust, leaf litter).

Student: Grigory Gladkov

Supervisors: Mike Raiko, Lavrentiy Danilov

System requirements

  • Python 3.8:

    • tqdm 4.54
    • fn 0.43
    • bio 0.23
    • biopython 1.78
    • pandas 1.1.4
  • MetaBAT2 2.15

  • CheckM 1.0.18

  • taxonkit 0.6.2

  • hmmer 3.3.1

  • R 3.5.2:

    • dplyr 1.0.2
    • plyr 1.8.6
    • ggplot2 3.3

Aims and tasks

Find the differences between microbial communities metabolising cellulose on straw and leaf litter, using ONT assemblies.

Tasks:

  • de novo binning (MetaBAT2), comparison with reference-based binning (Kaiju) and with 16S data

  • find community-specific glycoside hydrolases

Binning and quality control

Binning was done using MetaBAT2 with the following command:

metabat2 -i consensus.fasta -o metabat2_bins_3

Quality control was carried out using CheckM:

checkm lineage_wf -x fa  -f checkm/N2_MAGs_checkm.tab -t 50 metabat2_bins_3/ out/
checkm qa ./out/lineage.ms ./out -o 2 -f res --tab_table -t 20

For SSU rRNA gene extraction, we used the CheckM ssu_finder command:

checkm ssu_finder consensus.fasta bins/ out/ -x fa -t 20

where out is the output folder, metabat2_bins_3 is the folder with bins, -t is the number of threads, -x is the file extension of the bin FASTA files, and consensus.fasta is the assembly.
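The tab-separated table written by checkm qa can be screened for usable bins with a few lines of Python. The following is a minimal sketch, not the project's own code: the function name filter_bins and the cut-offs (at least 50% completeness, at most 10% contamination) are assumptions chosen for illustration, and the column names "Bin Id", "Completeness" and "Contamination" are assumed to match the header of the table. The example rows are made up.

```python
import csv
import io


def filter_bins(tab_text, min_completeness=50.0, max_contamination=10.0):
    """Return (bin_id, completeness, contamination) for bins passing both cut-offs.

    The cut-offs are illustrative assumptions, not values used in this project.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            kept.append((row["Bin Id"], completeness, contamination))
    return kept


# Hypothetical rows for illustration only, not results from this project.
example = (
    "Bin Id\tMarker lineage\tCompleteness\tContamination\n"
    "bin.1\tk__Bacteria\t92.5\t1.2\n"
    "bin.2\tk__Bacteria\t35.0\t0.0\n"
    "bin.3\tk__Bacteria\t80.1\t15.4\n"
)
print(filter_bins(example))
```

For the table produced above, the same function could be called as filter_bins(open("res").read()).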

A preliminary attempt at binning with coverage information was unsuccessful. To align the reads to the assembly, we used two aligners:

bwa mem  -t 84 consensus.fasta all.fq | samtools sort -o all.bam
maCMD -p 'Nanopore' -x mc_out/consensus.json -i ../all.fq -t 64 -o mc_out/consensus.sam

To generate a file with sequencing depth, the following command was used:

jgi_summarize_bam_contig_depths --outputDepth true_depth.txt --referenceFasta consensus.fasta  all.bam

Binning with this depth file gave the following result:

0 bins  formed.
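Before passing a depth table to MetaBAT2, it can be useful to check how many contigs are long enough and actually covered. The sketch below only illustrates such a check and does not explain why binning failed here. It assumes the depth table starts with the columns contigName, contigLen and totalAvgDepth, and that 2500 bp is the minimum contig length of interest; the function name summarize_depth and the example rows are hypothetical.

```python
import csv
import io


def summarize_depth(depth_text, min_len=2500):
    """Count contigs in a depth table that are long enough and have non-zero depth.

    min_len=2500 is an assumed length cut-off used for illustration.
    """
    reader = csv.DictReader(io.StringIO(depth_text), delimiter="\t")
    total = long_enough = long_and_covered = 0
    for row in reader:
        total += 1
        if float(row["contigLen"]) >= min_len:
            long_enough += 1
            if float(row["totalAvgDepth"]) > 0:
                long_and_covered += 1
    return {
        "contigs": total,
        "long_enough": long_enough,
        "long_and_covered": long_and_covered,
    }


# Hypothetical rows for illustration only, not data from this project.
example_depth = (
    "contigName\tcontigLen\ttotalAvgDepth\tall.bam\tall.bam-var\n"
    "contig_1\t12000\t8.5\t8.5\t3.1\n"
    "contig_2\t1800\t4.0\t4.0\t1.0\n"
    "contig_3\t5000\t0\t0\t0\n"
)
print(summarize_depth(example_depth))
```

With the file generated above, the call would be summarize_depth(open("true_depth.txt").read()).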

Search for glycosyl hydrolases with hmmer

To search for glycoside hydrolase domain families by HMM profile, the following code was used, which ran a search for each .faa file in the directory:

for filename in *.faa; do
    hmmscan --noali --notextw --acc -E 1e-18 --cpu 50 -o "hm_${filename}.txt" hmm/glyco.hmm "$filename"
done

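where glyco.hmm is the HMM profile database, -E is the E-value threshold, --acc reports profile accessions instead of names where available, and filename is the Prokka protein output (.faa).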
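The project parses the hmmscan output in its own notebook (parse_hmm). As an alternative illustration only, the sketch below reads the per-target table that hmmscan writes when it is additionally run with --tblout, which is an assumption and not what the command above does. The function name parse_tblout and the example lines are hypothetical; the E-value cut-off repeats the 1e-18 used above.

```python
def parse_tblout(lines, max_evalue=1e-18):
    """Collect profile hits from hmmscan --tblout lines.

    In hmmscan output the target is the profile (family) and the query is
    the protein sequence. The first 18 columns are whitespace-separated;
    the rest of the line is a free-text description.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # not a complete hit line
        evalue = float(fields[4])  # full-sequence E-value
        if evalue <= max_evalue:
            hits.append({
                "family": fields[0],
                "accession": fields[1],
                "protein": fields[2],
                "evalue": evalue,
                "score": float(fields[5]),
            })
    return hits


# Made-up lines in tblout layout, for illustration only.
example_tbl = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "Glyco_hydro_5 PF00150.21 PROKKA_00012 - 1.2e-45 150.3 0.1 1.5e-45 150.0 0.1 1.0 1 0 0 1 1 1 1 Cellulase (glycosyl hydrolase family 5)",
    "Glyco_hydro_3 PF00933.24 PROKKA_00340 - 3.0e-05 20.1 0.0 4.1e-05 19.7 0.0 1.2 1 0 0 1 1 1 1 Glycosyl hydrolase family 3 N terminal domain",
]
for hit in parse_tblout(example_tbl):
    print(hit["protein"], hit["family"], hit["evalue"])
```

Reading a real file would look like parse_tblout(open(path)), where path points to a --tblout file.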

Post-processing of the results was carried out in a Jupyter notebook (parse_hmm; this code is based on code by Ilya Corvigo). Kaiju results were added to the hmmer results with the following pipeline, run for each community:

cat ../c1_ann.tsv | cut -f2 |grep -f /dev/stdin pr_01.gff | awk '{print $1}' | grep -f /dev/stdin ka_01.tsv | cut -f3 | taxonkit --data-dir ~/storage/temp/  -j 50 lineage | taxonkit --data-dir ~/storage/temp/ -j 50 reformat > kid_01.txt

where c1_ann.tsv is the parsed hmmer output, pr_01.gff is the Prokka output, ka_01.tsv is the Kaiju output, and taxonkit adds the NCBI taxonomy. The hmmscan results were visualised with the com_vis.R script.
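The lookup chain in the pipeline above (protein ID, then contig, then Kaiju taxon ID) can also be written out step by step. This is a minimal sketch under stated assumptions, not the project's code: column 2 of the annotation table is assumed to hold Prokka protein IDs, the GFF ID attribute is assumed to carry the same IDs, and the Kaiju table is assumed to have the sequence name in column 2 and the NCBI taxon ID in column 3. The function name taxon_ids_for_hits and all example values are hypothetical.

```python
def taxon_ids_for_hits(annotation_lines, gff_lines, kaiju_lines):
    """Map proteins with HMM hits to contigs, then to Kaiju taxon IDs."""
    # Step 1: protein IDs from column 2 of the annotation table.
    protein_ids = set()
    for line in annotation_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            protein_ids.add(cols[1])

    # Step 2: contigs (GFF column 1) carrying a feature whose ID is a hit protein.
    contigs = set()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skips the FASTA part at the end of a Prokka GFF
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        if attrs.get("ID") in protein_ids:
            contigs.add(cols[0])

    # Step 3: taxon IDs (column 3) of Kaiju rows for those contigs.
    taxids = []
    for line in kaiju_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[1] in contigs:
            taxids.append(cols[2])
    return taxids


# Made-up inputs for illustration only.
ann = ["GH5\tPROKKA_00012\t1.2e-45\n"]
gff = [
    "##gff-version 3\n",
    "contig_1\tProdigal:2.6\tCDS\t100\t1200\t.\t+\t0\tID=PROKKA_00012;product=cellulase\n",
    "contig_10\tProdigal:2.6\tCDS\t50\t900\t.\t-\t0\tID=PROKKA_00340;product=hypothetical protein\n",
]
kaiju = ["C\tcontig_1\t1234\n", "C\tcontig_10\t5678\n", "U\tcontig_2\t0\n"]
print(taxon_ids_for_hits(ann, gff, kaiju))
```

Unlike grep -f, which matches patterns as substrings (so contig_1 would also select contig_10), this sketch compares identifiers exactly. The resulting taxon IDs can then be passed to taxonkit lineage as in the pipeline.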

Results

  • As a result of binning, 10 genomes were obtained from the metagenome.
  • Matches to glycoside hydrolase families responsible for cellulose/hemicellulose decomposition were found in the metagenomes.
  • The assembled genomes and the most represented groups in which the relevant glycoside hydrolases were found do not coincide.
  • Differences in taxonomy are more pronounced than differences in the representation of glycoside hydrolase families.
