
amarinderthind / decontaminer


DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, that might derive either from laboratory contamination or from their biological source, and in both cases can be considered as worthy of further investigation and experimental validation. The novelty of DecontaMiner is mainly represented by its easy integration with the standard procedures of NGS data analysis, while providing a complete, reliable, and automatic pipeline. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2684-x

Home Page: http://www-labgtp.na.icar.cnr.it/decontaminer/

HTML 2.48% CSS 2.08% JavaScript 15.32% Perl 48.63% Shell 29.74% R 1.75%
rnaseq unmapped-ngs-data contaminating-sequences decontaminer-pipeline ngs decontaminer contamination contamination-detection investigation microbes infections sequencing bioinformatics rna-seq virus bacteria fungus virusscan cancer omics

decontaminer's Introduction

DecontaMiner: detecting unexpected contamination in unmapped NGS data

The DecontaMiner web server is under maintenance, so the tool is made available here. Please read the DecontaMiner publication for more details.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2684-x

The compressed archive "decontaMiner_1.4.tar.gz" contains the latest version of the tool. The required databases can be downloaded from the link given below.

DecontaMiner is a tool that detects contaminating sequences among unmapped reads. It uses a subtraction approach to identify bacterial, fungal, and viral genome contamination. DecontaMiner generates several output files that track all the processed reads and provide a complete report of their characteristics. Good-quality matches on microorganism genomes are counted and compared among samples. DecontaMiner also builds an offline HTML page containing summary statistics and plots. The software is freely available at http://www-labgtp.na.icar.cnr.it/decontaminer.

Database download link (pre-built)

https://drive.google.com/drive/u/2/folders/1UQCiuUVnS5TpkT0We2AkRVew-km_gR_u

R script for output visualization

The code is available at https://rpubs.com/amarinder1/deontaminer or https://github.com/amarinderthind/decontaminer/tree/master/R_decontaminer_visualizationScript

Update on database configuration settings

If your FASTA and index files are at the following paths:

fasta path DB/HUMAN_RNA/rRNA.fasta
idx path DB/HUMAN_RNA/rRNA.idx

set the directory and the file base name (without extension) separately in the configuration file:

RIBO_DB=DB/HUMAN_RNA
RIBO_NAME=rRNA
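
The directory/name split shown above can be sanity-checked before a run. The following is a minimal sketch and not part of DecontaMiner: `check_db` is a hypothetical helper, and it assumes the pipeline looks for `<RIBO_DB>/<RIBO_NAME>.fasta` and `<RIBO_DB>/<RIBO_NAME>.idx`, as the example paths suggest.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DecontaMiner): check that the
# directory/name pair from the configuration file resolves to both files.
check_db() {
    db_dir=$1    # e.g. the value of RIBO_DB
    db_name=$2   # e.g. the value of RIBO_NAME
    status=0
    for ext in fasta idx; do
        if [ ! -f "$db_dir/$db_name.$ext" ]; then
            echo "missing: $db_dir/$db_name.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with the values shown above
RIBO_DB=DB/HUMAN_RNA
RIBO_NAME=rRNA
if check_db "$RIBO_DB" "$RIBO_NAME"; then
    echo "ribosomal database found"
else
    echo "ribosomal database not found; check RIBO_DB and RIBO_NAME"
fi
```

Relative paths are resolved from the directory in which the check is run, so it is best run from the same working directory as the pipeline.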

Download link for example BAM files

https://drive.google.com/drive/u/2/folders/1B9WNJc1cGY_LIi2XGwkQ0h_9916_A8Ij

Three main scripts to run

(a) decontaMiner.sh

(b) filterBlastInfo.sh

(c) collectInfo.sh

Update on the paired-end read name error

DecontaMiner expects paired-end read names in the following format:

@A00121:137:HTLF3DSXX:3:1110:3097:35571/1
@A00121:137:HTLF3DSXX:3:1110:3097:35571/2

If your paired-end reads use a different format, such as

@A00121:137:HTLF3DSXX:3:1110:3097:35571 0:N: 00
@A00121:137:HTLF3DSXX:3:1110:3097:35571 1:N: 00

you can rename them to the required format with a simple Linux command. Here is an example; adjust the patterns if the suffixes in your files differ:

sed -e 's| 0:N:.*$|/1|' -e 's| 1:N:.*$|/2|' inputfile > outputfile

Both substitutions are applied in a single pass, and | serves as the sed delimiter because the replacement text contains a /.
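
Before running filterBlastInfo.sh on paired-end data, it can help to confirm that every read header already ends in /1 or /2. The sketch below is not part of DecontaMiner: `count_bad_headers` is a hypothetical helper, and it assumes standard four-line FASTQ records, so that header lines are lines 1, 5, 9, and so on.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of DecontaMiner): count FASTQ header
# lines that do not end in /1 or /2. Assumes 4-line FASTQ records.
count_bad_headers() {
    awk 'NR % 4 == 1 && $0 !~ /\/[12]$/ { bad++ } END { print bad + 0 }' "$1"
}

# Example: a tiny FASTQ with one converted and one unconverted header
tmp=$(mktemp)
printf '%s\n' \
    '@A00121:137:HTLF3DSXX:3:1110:3097:35571/1' 'ACGT' '+' 'FFFF' \
    '@A00121:137:HTLF3DSXX:3:1110:3097:35571 1:N: 00' 'ACGT' '+' 'FFFF' \
    > "$tmp"
count_bad_headers "$tmp"   # prints 1
rm -f "$tmp"
```

A result of 0 means all headers match the expected naming; any other value is the number of reads that still need renaming.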

[Image: DecontaMiner pipeline]

Overview of the results

[Image: overview of the results]

decontaminer's People

Contributors

amarinderthind


decontaminer's Issues

Format error for filterBlastInfo.sh

Hi,

When I run filterBlastInfo.sh with the paired-end option, I get the following error.
Although the error says "Paired end should be numbered as xxx/1 and xxx/2", the BLAST table files are named like "XXX1.table" or "XXXX1_vs_bacteria.table". What should I do to address this error?

filterBlastInfo.sh -i /media/.../decontaminer/RESULTS/BACTERIA

--------------output--------------
-- FILTERING BLAST RESULTS --

INFO: paired/single end flag -s not specified. PAIRED END assumed (default option).
INFO: organism type flag -V not specified. O(thers) assumed (default option).

  • PROCESSING DETAILS
    INFO: processing PAIRED END data.
    INFO: processing NON virus alignments
    Processing file /media/.../decontaminer/RESULTS/BACTERIA/0_unmapped.1_vs_bacteria.table

Using L=read length G =0 M=0
Processing...

Format error on paired end names! Paired end should be numbered as xxx/1 and xxx/2 where xxx is the name of the query
--------------↑output--------------

Before running filterBlastInfo.sh, I ran decontaMiner.sh on the paired-end fastq files (0_unmapped.1.fastq, 0_unmapped.2.fastq). In the directory RESULTS/BACTERIA, files such as 0_unmapped.1_vs_bacteria were generated.

Format error on paired end names!

Hi, I'm trying to run decontaMiner. The data I'm using is paired-end and in fastq format, and, as the manual says, I merged the two fastq files. The problem is that when I try to run filterBlastInfo.sh I get this error:

Format error on paired end names! Paired end should be numbered as xxx/1 and xxx/2 where xxx is the name of the query

I don't know if I'm merging the fastqs wrong or something, could you help me please?

Thanks!

Can't change parameters

Hey! I'm running decontaMiner with different parameters (filter_blastInfo), but it doesn't seem to accept them.
The parameters I'm setting are -g5 -m2 -l50 and the output says:

Using L=50G =2 M=2
Processing...
...done !

I can change -g to any value and it keeps reporting the same thing, although if I change the length, it shows the correct setting.

The Path Issue

Hi,
I ran into a problem when I ran decontaMiner.sh. I am using your original databases and data (downloaded from http://www-labgtp.na.icar.cnr.it/decontaminer/), but it says "./Human_rna/rRNA.fasta, Human_rna/rRNA.idx is not an existing/valid absolute or relative path". I have checked configure.txt and it looks correct. Could you help me solve the problem? Thank you so much!

error collectInfo.sh

Hi

I'm getting empty files after running collectInfo.sh. I think it is related to mineMicrobialData.pl.
Can you please advise on how to fix this error?

Thanks,

Lina

-- COLLECTING INFO --

INFO: organism type flag -V not specified. O(thers) assumed (default option).

  • PROCESSING DETAILS
    INFO: processing NON virus data
    INFO: filtering with DEFAULT match count threshold 5.
    Can't open input file: No such file or directory at /decontaminer_out/decontaMiner_1.4/perl_scripts/mineMicrobialData.pl line 44.
    Can't open input file: No such file or directory at /decontaMiner_1.4/perl_scripts/mineMicrobialData.pl line 44.
    INFO: filtering terminated. Generating data Matrix

WARNING: No output was written Input files do not contain non-zero matches

WARNING: No output was written Input files do not contain non-zero matches
INFO: processing terminated. Generating HTML report

ERROR !! NO FILES IN THE DIRECTORY: /decontaminer_out/RESULTS/BACTERIA/COLLECTED_INFO
ERROR !! NO FILES IN THE DIRECTORY: /decontaminer_out/RESULTS/BACTERIA/COLLECTED_INFOINFO: processing terminated. Output files written in the dir decontaminer_out/RESULTS/BACTERIA/HTML_REPORTS

Error with collectInfo.sh

./collectInfo.sh -i /path/to/BACTERIA/COLLECTED_INFO/VALID

-- COLLECTING INFO --

INFO: organism type flag -V not specified. O(thers) assumed (default option).

  • PROCESSING DETAILS
    INFO: processing NON virus data
    INFO: filtering with DEFAULT match count threshold 5.
    INFO: filtering terminated. Generating data Matrix

Processing sample IgG_S4, with total read counts: 571, valid reads: 571, others reads: 0
... finished.
Output files written in the directory:
/path/to/BACTERIA/COLLECTED_INFO/VALID

Processing sample IgG_S4, with total read counts: 571, valid reads: 571, others reads: 0
... finished.
Output files written in the directory:
/path/to/BACTERIA/COLLECTED_INFO/VALID

INFO: processing terminated. Generating HTML report

ERROR !! NO FILES IN THE DIRECTORY: /path/to/BACTERIA/COLLECTED_INFO/VALID/BACTERIA/COLLECTED_INFO
ERROR !! NO FILES IN THE DIRECTORY: /path/to/BACTERIA/COLLECTED_INFO/VALID/BACTERIA/COLLECTED_INFOINFO: processing terminated. Output files written in the dir /path/to/BACTERIA/COLLECTED_INFO/VALID/BACTERIA/HTML_REPORTS

How to interpret the output & visualize it

Dear authors,

Thank you for developing this nice tool. I would like to know how to interpret the output files of DecontaMiner, and whether the tool has an option to visualize its output.

Best regards,
Leonor
