profile-pipeline-for-metagenomic

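Directories to create in advance (all under the same parent directory)

Sequence quality control

mkdir 00_rawdata # raw sequence files
mkdir 01_trim # sequence files after quality control (trimming)
mkdir 02_rmhuman # reads left after aligning to the human genome and removing human sequences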

species profile

mkdir 03_metaphlan2 # species abundance annotation files from metaphlan2
mkdir 03_bowtie2 # intermediate output files from metaphlan2

functional profile

mkdir 04_humann2 # functional annotation result files from humann2
mkdir 04_combined # intermediate files
mkdir 03_SAM # SAM alignment files from mapping against the IGC gene catalog

gene profile

mkdir 04_genecount # per-sample gene counts computed from the SAM alignment files
mkdir 05_geneprofile # results after normalization
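
The directory layout above can also be created in one pass. The following is a minimal sketch, not part of the original pipeline: the function name `make_pipeline_dirs` and the `root` argument are hypothetical, while the directory names are copied from the `mkdir` commands above.

```python
from pathlib import Path

# Directory names copied from the mkdir commands in this README.
PIPELINE_DIRS = [
    "00_rawdata", "01_trim", "02_rmhuman",     # sequence quality control
    "03_metaphlan2", "03_bowtie2",             # species profile
    "04_humann2", "04_combined", "03_SAM",     # functional profile
    "04_genecount", "05_geneprofile",          # gene profile
]


def make_pipeline_dirs(root="."):
    """Create every pipeline directory under `root` (hypothetical helper).

    Directories that already exist are left untouched.
    Returns the list of directory paths.
    """
    root = Path(root)
    created = []
    for name in PIPELINE_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_pipeline_dirs(".")` from the project directory gives the same result as running the `mkdir` commands one by one, and it is safe to re-run.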

Required software:

  • Bowtie2
  • Trimmomatic
  • BWA
  • bedtools
  • Samtools
  • metaphlan2
  • humann2
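
Before running the pipeline it can help to confirm that these tools are reachable. The sketch below checks the `PATH` with Python's standard `shutil.which`. The executable names in `REQUIRED_TOOLS` are assumptions and may differ on a given system (Trimmomatic, for example, is often launched through `java -jar`, and this pipeline passes explicit tool directories to some scripts).

```python
import shutil

# Assumed executable names; adjust them to match the local installation.
REQUIRED_TOOLS = [
    "bowtie2",
    "trimmomatic",
    "bwa",
    "bedtools",
    "samtools",
    "metaphlan2.py",
    "humann2",
]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `find_missing_tools()` means every assumed executable was found. A non-empty list only says the name is not on `PATH`, not that the tool is absent.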

Required databases:

  • Human genome database: /home1/Laisenying/Tools/data/human/hg38
  • IGC gene catalog: /home1/Laisenying/.local/share/ngless/data/Modules/igc.ngm/0.9/igc.fna
  • ChocoPhlAn database: /share/home1/Laisenying/Tools/data/humann2/chocophlan
  • UniRef database: /share/home1/Laisenying/Tools/data/humann2/uniref

1. ------ Quality-control the raw sequences and remove host (human) sequences ------

./preprocess.sh accs_id.txt 

accs_id.txt: a text file containing the NCBI run IDs
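
The exact layout of accs_id.txt is not specified here. The sketch below assumes one run ID per line and, as a further assumption, skips blank lines, lines starting with `#`, and repeated IDs. The helper name `read_run_ids` is hypothetical and is not one of the pipeline scripts.

```python
def read_run_ids(path):
    """Read run IDs from a text file, assuming one ID per line.

    Blank lines, '#' comment lines, and duplicate IDs are skipped
    (assumed conventions, not confirmed by the pipeline scripts).
    The original order of first appearance is preserved.
    """
    run_ids = []
    seen = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            run_id = line.strip()
            if not run_id or run_id.startswith("#"):
                continue
            if run_id not in seen:
                seen.add(run_id)
                run_ids.append(run_id)
    return run_ids
```

A quick `len(read_run_ids("accs_id.txt"))` before launching the shell scripts is an easy way to confirm that the list holds the expected number of samples.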

2. ------ Generate species profile ------
./SpeciesProfile.sh accs_id.txt
3. ------ Generate functional profile ------
./FunctionalProfile.sh accs_id.txt
4. ------ Generate gene profile ------
python bowtie2_mapping.py \
  -s /home1/Laisenying/Data-analysis/CRC/02_rmhuman \
  -ref /home1/Laisenying/.local/share/ngless/data/Modules/igc.ngm/0.9/igc.fna \
  -o /home1/Laisenying/Data-analysis/CRC/03_SAM \
  -BWA /home1/Laisenying/Tools/bwa \
  -sam /home1/Laisenying/Tools/samtools/samtools-1.9 \
  -bed /home1/Laisenying/Tools/bedtools2-2.25.0/bin \
  -list /home1/Laisenying/Data-analysis/CRC/accs_id2.txt
  
# Merge the result files
python /home1/Laisenying/Tools/gene_annotation/generate_matrix.py \
  -i /home1/Laisenying/Data-analysis/CRC/03_SAM \
  -o /home1/Laisenying/Data-analysis/CRC/04_genecount 
  
# TPM normalization and filtering
python /home1/Laisenying/Tools/gene_annotation/normalization.py \
  -i /home1/Laisenying/Data-analysis/CRC/04_genecount/DNA_hGEM.txt \
  -minTPM 1.0 \
  -minSam 1 \
  -ref /home1/Laisenying/.local/share/ngless/data/Modules/igc.ngm/0.9/igc.fna \
  -o /home1/Laisenying/Data-analysis/CRC/05_geneprofil
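
To make the last step concrete, here is a minimal sketch of the standard TPM calculation followed by a threshold filter. It is not the code of normalization.py. It assumes that gene lengths come from the reference catalog, that `-minTPM` is a per-sample TPM threshold, and that `-minSam` is the minimum number of samples that must reach it. The function names and the in-memory data layout are hypothetical.

```python
def tpm_normalize(counts, lengths):
    """Convert raw gene counts to TPM, sample by sample.

    counts:  {gene: [count in sample 0, count in sample 1, ...]}
    lengths: {gene: gene length in bp}
    TPM = (count / length) / sum over genes of (count / length) * 1e6
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Length-normalized rate for every gene in every sample.
    rates = {g: [c / lengths[g] for c in row] for g, row in counts.items()}
    # Per-sample sum of rates, used as the scaling denominator.
    totals = [sum(rates[g][s] for g in rates) for s in range(n_samples)]
    return {
        g: [r / totals[s] * 1e6 if totals[s] > 0 else 0.0
            for s, r in enumerate(row)]
        for g, row in rates.items()
    }


def filter_genes(tpm, min_tpm=1.0, min_samples=1):
    """Keep genes with TPM >= min_tpm in at least min_samples samples
    (assumed meaning of -minTPM and -minSam)."""
    return {
        g: row for g, row in tpm.items()
        if sum(v >= min_tpm for v in row) >= min_samples
    }
```

Because each sample is scaled separately, the TPM values of every sample with at least one mapped read sum to one million, which makes abundances comparable across samples with different sequencing depth.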

Contributors

qxibai
