
ngs-fzb / mtbseq_source


MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from Illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.

License: Other

Perl 100.00%
tuberculosis pipeline phylogeny variants epidemiology ngs genomics mycobacterium-tuberculosis

mtbseq_source's Introduction

License: GPL v3


MTBseq

MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from Illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.

Getting Started

For complete installation instructions, description and usage examples please read the MANUAL.md.

Installation

Conda

Version 1.0.4 on bioconda is currently broken because recent picard releases require a higher Java version. You can fix this by downgrading picard to a version below 3:

conda install picard=2.27.5

This has been fixed in v1.1.0, which should be available on bioconda soon.

Install Conda or Miniconda, then install MTBseq with:

conda install -c bioconda mtbseq

Source

Please see the MANUAL.md for installation from source.

Requirements

* Perl: Perl 5, version 22, subversion 1 (v5.22.1)
* Java: Oracle Java 8 or OpenJDK 8 (no other version works with GenomeAnalysisTK 3.8)

MTBseq uses the following CPAN and core modules:
* MCE                 (v1.833)
* Statistics::Basic   (v1.6611)

* FindBin             (v1.51, core)
* Cwd                 (v3.62, core)
* Getopt::Long        (v2.5, core)
* File::Copy          (v2.30, core)
* List::Util          (v1.49, core)
* Exporter            (v5.72, core)
* vars                (v1.03, core)
* lib                 (v0.63, core)
* strict              (v1.09, core)
* warnings            (v1.34, core)

MTBseq uses the following third-party software (binaries compiled on Ubuntu 16.04 are included, except for GenomeAnalysisTK 3.8):
* bwa                 (v0.7.17)
* GenomeAnalysisTK    (v3.8)
* picard              (v2.17.0)
* samtools            (v1.6)

mtbseq_source's People

Contributors

abhi18av, cutpatel, takohl


mtbseq_source's Issues

454 input sequencing data (sff > fastq.gz)

Tried to analyze data from 454 sequencing. Got the error "failed: 256"

How can I use my data in the correct format?

2_1280_merged_R1.fastq.gz

@JI8HBZ101EU04N length=92 xy=1876_2277 region=1 run=R_2015_07_08_13_05_49_
CGGCGGCGACTCGGGGCCGGGTCGCTGCCGTGGGCGACGACACCGATGCCGACGGTGGCGGCGCGGATCGCGGCCGCGTCATTGGAGCCGTC
+
IIIIIIIHICI>111777====IICICCCA<3339CCCCGFC@DDIID<<;<IEEIEEHEEIIIEEIIIIIEEEEHEEG;62222>998<@9
@JI8HBZ101ETFYY length=57 xy=1858_1928 region=1 run=R_2015_07_08_13_05_49_
ACGCGGAGGTACGCGACCGCCGATGACACCACGAAGACCGGTCGCGTACCTCCGCGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIG777687776<CIIIIIIIIIHBB@EFIII

best regards, thanks

No Read Files To Map

Hi All,

I am trying to run MTBseq and get the following error:

"No read files to map"

Anyone have any ideas? I have attached a copy of the terminal window below.

Cheers,

Peter

(base) linux-biostation@Linux-BioStation:~$ MTBseq --step TBfull '/home/linux-biostation/Documents/Peter F Work/MTBseq Test/DK progression to MDR/IEMDR03_S3_L001_R1_001.fastq.gz' + '/home/linux-biostation/Documents/Peter F Work/MTBseq Test/DK progression to MDR/IEMDR03_S3_L001_R2_001.fastq.gz'
Missing option after +
[2019-07-01 14:08:02] Found perl module: MCE
[2019-07-01 14:08:02] Found perl module: Statistics::Basic
[2019-07-01 14:08:02] Found bwa in your PATH!
[2019-07-01 14:08:02] Found samtools in your PATH!
[2019-07-01 14:08:02] Found gatk3 in your PATH!
[2019-07-01 14:08:02] Found picard in your PATH!

MTBseq 1.0.3 - Copyright (C) 2018 Thomas A. Kohl, Robin Koch, Christian Utpatel,
Maria Rosaria De Filippo, Viola Schleusener,
Patrick Beckert, Daniela M. Cirillo, Stefan Niemann

This program comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.

[2019-07-01 14:08:02] You are linux-biostation.
[2019-07-01 14:08:02] Your current working directory is: /home/linux-biostation
[2019-07-01 14:08:02] You requested 1 thread(s) for the pipeline.

[2019-07-01 14:08:02] Your parameter setting is:
[2019-07-01 14:08:02] --step TBfull
[2019-07-01 14:08:02] --continue 0
[2019-07-01 14:08:02] --samples NONE
[2019-07-01 14:08:02] --project NONE
[2019-07-01 14:08:02] --resilist /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/var/res/MTB_Resistance_Mediating.txt
[2019-07-01 14:08:02] --intregions /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/var/res/MTB_Extended_Resistance_Mediating.txt
[2019-07-01 14:08:02] --categories /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/var/cat/MTB_Gene_Categories.txt
[2019-07-01 14:08:02] --basecalib /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/var/res/MTB_Base_Calibration_List.vcf
[2019-07-01 14:08:02] --ref M._tuberculosis_H37Rv_2015-11-13.fasta
[2019-07-01 14:08:02] --minbqual 13
[2019-07-01 14:08:02] --all_vars 0
[2019-07-01 14:08:02] --snp_vars 0
[2019-07-01 14:08:02] --lowfreq_vars 0
[2019-07-01 14:08:02] --mincovf 4
[2019-07-01 14:08:02] --mincovr 4
[2019-07-01 14:08:02] --minphred20 4
[2019-07-01 14:08:02] --minfreq 75
[2019-07-01 14:08:02] --unambig 95
[2019-07-01 14:08:02] --window 12
[2019-07-01 14:08:02] --distance 12
[2019-07-01 14:08:02] --quiet 0

[2019-07-01 14:08:02] The following programs will be used, if necessary:
[2019-07-01 14:08:02] /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/opt/bwa_0.7.17
[2019-07-01 14:08:02] /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/opt/samtools_1.6
[2019-07-01 14:08:02] /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/opt/picard_2.17.0
[2019-07-01 14:08:02] /home/linux-biostation/miniconda3/share/mtbseq-1.0.4-0/opt/GenomeAnalysisTK_3.8

[2019-07-01 14:08:02] The following directories will be used, if necessary:
[2019-07-01 14:08:02] /home/linux-biostation/Bam
[2019-07-01 14:08:02] /home/linux-biostation/GATK_Bam
[2019-07-01 14:08:02] /home/linux-biostation/Mpileup
[2019-07-01 14:08:02] /home/linux-biostation/Position_Tables
[2019-07-01 14:08:02] /home/linux-biostation/Called
[2019-07-01 14:08:02] /home/linux-biostation/Joint
[2019-07-01 14:08:02] /home/linux-biostation/Amend
[2019-07-01 14:08:02] /home/linux-biostation/Classification
[2019-07-01 14:08:02] /home/linux-biostation/Groups

[2019-07-01 14:08:02] ### [TBfull] selected ###

[2019-07-01 14:08:02] No read files to map! Check content of /home/linux-biostation!
(base) linux-biostation@Linux-BioStation:~$
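In the log above, MTBseq reports its working directory and then says "No read files to map! Check content of /home/linux-biostation!". This suggests it only looks for reads in the current directory and does not use paths given on the command line. The file-name check quoted in a later issue (`s/(R1|R2).fastq.gz//`) also suggests that Illumina names ending in `_R1_001.fastq.gz` would not match. The minimal POSIX sh sketch below is a quick pre-flight check before you `cd` into the read directory and run MTBseq. It counts files ending in `_R1.fastq.gz` or `_R2.fastq.gz`. The helper names are hypothetical and are not part of MTBseq.

```sh
#!/bin/sh
# Hypothetical pre-flight helpers (not part of MTBseq).

# Print the names of files in a directory (default: current directory)
# that end in _R1.fastq.gz or _R2.fastq.gz.
list_mtbseq_reads() {
    dir=${1:-.}
    for f in "$dir"/*_R1.fastq.gz "$dir"/*_R2.fastq.gz; do
        # An unmatched glob stays as a literal pattern, so skip non-files
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
}

# Count those files; 0 means MTBseq would probably find nothing to map.
count_mtbseq_reads() {
    list_mtbseq_reads "$1" | wc -l | tr -d ' '
}

# Example:
#   cd "/path/to/reads" && count_mtbseq_reads .
```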

No data in TBamend and TBgroup? Please help

[2019-11-28 12:25:50] Start creating joint variant table...
[2019-11-28 12:25:50] Start parsing M._tuberculosis_H37Rv_2015-11-13.fasta...
[2019-11-28 12:25:50] Finished parsing M._tuberculosis_H37Rv_2015-11-13.fasta!
[2019-11-28 12:25:59] Start parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
[2019-11-28 12:26:02] Finished parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
[2019-11-28 12:26:02] Parsing variant files...
[2019-11-28 12:26:09] Finished parsing variant files!
[2019-11-28 12:26:09] Sart printing joint variant file scaffold...
[2019-11-28 12:26:09] Finished printing joint variant file scaffold!
[2019-11-28 12:26:09] Start parsing position lists, extend called variants and complete joint variant list...
[2019-11-29 01:28:07] Finished parsing position lists, extend called variants and complete joint variant list!
[2019-11-29 01:28:09] Start printing coverage breadth output...
[2019-11-29 01:29:00] Finished printing coverage breadth output!
[2019-11-29 01:29:04] Finished creating joint variant table!

[2019-11-29 01:29:06] Amending joint variant table:
[2019-11-29 01:29:06] MTBphylo_joint_cf4_cr4_fr75_ph4_samples457.tab

[2019-11-29 01:29:06] Start amending joint variant table...
[2019-11-29 01:29:06] Start parsing /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/cat/MTB_Gene_Categories.txt..
[2019-11-29 01:29:06] Finished parsing /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/cat/MTB_Gene_Categories.txt!
[2019-11-29 01:29:06] Start parsing /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Resistance_Mediating.txt and /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Extended_Resistance_Mediating.txt..
[2019-11-29 01:29:07] Finished parsing /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Resistance_Mediating.txt and /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Extended_Resistance_Mediating.txt!
[2019-11-29 01:31:46] Start creating MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo.tab with window length 12...
[2019-11-29 01:31:46] Finished creating MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo.tab with window length 12!
[2019-11-29 01:31:46] Finished amending joint variant table!

[2019-11-29 01:31:46] Calling groups from:
[2019-11-29 01:31:46] MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab

[2019-11-29 01:31:46] Start group calling...
[2019-11-29 01:31:46] Start parsing MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab...
[2019-11-29 01:31:46] Finished parsing MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab!
[2019-11-29 01:31:46] Start building distance matrix for MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab...
[2019-11-29 01:31:47] Finished building distance matrix for MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab!
[2019-11-29 01:31:47] Start calling groups for MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab...
[2019-11-29 02:14:09] Finished calling groups for MTBphylo_joint_cf4_cr4_fr75_ph4_samples457_amended_u95_phylo_w12.tab!
[2019-11-29 02:14:09] Finished group calling!

[2019-11-29 02:14:09] ### MTBseq finished!!! ###

It finished TBjoin and went on to the amend and grouping steps (distance matrix and phylogeny), but the results don't look right. I don't get any FASTA sequences at all, and every value in the distance matrix is 0. Maybe I have done something wrong, but with my limited knowledge of bioinformatics I can't spot it myself.

I zipped all the attachments.

D.zip
Can anyone please advise and help?

Annotation File - Other organisms

Hi,

This isn't an issue so much as a request for help, but I can't find where to add that tag.

I want to add Staph aureus to the /MTBseq_source/var/ref path. I have added the FASTA file and now want to add the annotation. The only thing is that I'm not sure how to make the file match the H37Rv.txt file that's already in that path. I downloaded the annotation from NCBI as a GFF3 file, but its format differs from the one MTBseq uses.

Where would you recommend downloading annotation files? Is there a universal standard, or do I have to make the file myself? I have attached a zip of the file from NCBI.

Cheers,

P

CP007659_1.gff3.zip

How do you upgrade to Java 1.8?

[2019-11-17 03:08:58] Found perl module: MCE
[2019-11-17 03:08:58] Found perl module: Statistics::Basic
[2019-11-17 03:08:58] Found bwa in your PATH!
[2019-11-17 03:08:58] Found samtools in your PATH!
[2019-11-17 03:08:58] Found gatk3 in your PATH!
[2019-11-17 03:08:58] Found picard in your PATH!
[2019-11-17 03:08:58] Need exatly java 1.8 for GATK execution. You have java 1.6!
I tried installing it in numerous ways without success. Please help.
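MTBseq's message says GATK needs exactly Java 1.8. Before changing anything, it helps to confirm which `java` comes first on the PATH. The minimal sketch below assumes the usual `java -version` output format, such as `java version "1.8.0_222"` or `openjdk version "11.0.2"`. That command writes to stderr. `java_major_version` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: extract the major Java version from `java -version`
# output, e.g. 'java version "1.8.0_222"' -> 8, 'openjdk version "11.0.2"' -> 11.
java_major_version() {
    # Take the first quoted version string
    v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    # Old numbering scheme: "1.8.0_222" means Java 8
    case $v in
        1.*) v=${v#1.} ;;
    esac
    # Keep only the leading digits
    printf '%s\n' "${v%%[!0-9]*}"
}

# Example (java -version writes to stderr):
#   [ "$(java_major_version "$(java -version 2>&1)")" = 8 ] || echo "java on PATH is not Java 8"
```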

Improvement: Downgrade GATK version

Issue:
Working with GATK 3.8 is a hassle. It requires downloading the jar and registering it separately due to licensing issues. This limits the ease of installation and distribution of MTBseq.

Solution:
Downgrade the required version to GATK-lite 2.3.9. This is the last version of GATK released under a free for distribution license. It has pretty much the same functionality as version 3.8. There is a conda package available for it: https://anaconda.org/faircloth-lab/gatk-lite

Limitations:
GATK 2.3.9 does not support indel searching in BaseRecalibrator. This can easily be disabled in TBrefine with the flag --disable_indel_quals. I made a few other changes to make it work.
The java version would also have to be downgraded to 1.7.

Summary:
I've activated a conda environment and made the above changes. MTBseq produces the exact same results.

SNP trees

Hi,

This is more a query than an issue. After running

  1. MTBseq --step TBfull
  2. MTBseq --step TBjoin .......

I use the .....amended_u95_phylo_w12.plainIDs.fasta file to generate a newick file in FastTree

My query is, is this tree showing SNP differences in the distance between isolates?
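FastTree reports branch lengths as estimated substitutions per site, not as raw SNP counts. The tree therefore does not show SNP differences directly. MTBseq's group-calling step already builds a distance matrix, as one of the logs on this page shows ("Start building distance matrix ..."). Pairwise differences can also be counted straight from the FASTA. This assumes `plainIDs.fasta` is an alignment of variable positions with one sequence per isolate. The awk sketch below is an illustration, not an MTBseq tool. It counts only positions where both sequences carry an unambiguous A/C/G/T and differ. N, gaps and other characters are ignored, which is a choice made for this sketch.

```sh
#!/bin/sh
# Hypothetical helper: print "id1<TAB>id2<TAB>differences" for every pair
# of sequences in an aligned FASTA file (sequences may span several lines).
snp_distances() {
    awk '
        /^>/ { n++; id[n] = substr($1, 2); next }
        { seq[n] = seq[n] toupper($0) }
        END {
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++) {
                    d = 0
                    len = length(seq[i])
                    for (k = 1; k <= len; k++) {
                        a = substr(seq[i], k, 1)
                        b = substr(seq[j], k, 1)
                        # Count only positions where both bases are A/C/G/T
                        if (a != b && a ~ /[ACGT]/ && b ~ /[ACGT]/) d++
                    }
                    printf "%s\t%s\t%d\n", id[i], id[j], d
                }
        }' "$1"
}
```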


Update Suggestions

Hi,

I wonder if I could make some suggestions for future updates of MTBseq?

  1. I'm currently running it on a college cluster. Is it possible to run it in parallel across a number of nodes?

  2. The file naming system for input files. Illumina file names end in _R1_001, and it's a bit laborious having to clean up the names as well as make a samples text file when you have hundreds of samples. Unless there's a command-line option I'm not aware of?

  3. The input/output directory. Can we point to a location? It's annoying that we can't, and that everything is written to one directory and has to be moved afterwards. Also, if I run TBfull on one run and a few weeks later run it again on a new run, I have to copy both into the same directory if I want to run TBjoin on them together. It would be nicer to be able to point to the directories containing the files.

  4. Can it run on a GPU or on more threads? Currently I am running 108 samples on 8 threads. I started this on the morning of 23 Oct, it is now lunchtime on the 25th, and it is still creating position lists. I'm just curious whether there is a way to speed it up.
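Renaming Illumina output can be scripted. The repository's test data is named `Test-20_MTBSeq_nextseq_151bp_R1.fastq.gz`. The mapping log later on this page turns that name into `SM:Test-20` and `LB:MTBSeq`. MTBseq therefore appears to read the sample ID and library ID from the first two underscore-separated fields. The sketch below works under that assumption. `mtbseq_name` and the library ID `LIB1` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: map an Illumina name such as
# IEMDR03_S3_L001_R1_001.fastq.gz to IEMDR03_<LibID>_R1.fastq.gz.
# $1: Illumina file name, $2: library ID of your choice.
mtbseq_name() {
    base=$(basename "$1")
    case $base in
        *_S[0-9]*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz) ;;
        *) return 1 ;;   # not an Illumina-style name
    esac
    read_no=$(printf '%s\n' "$base" | sed 's/.*_\(R[12]\)_001\.fastq\.gz$/\1/')
    sample=$(printf '%s\n' "$base" | sed 's/_S[0-9][0-9]*_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$//')
    # Assumption: MTBseq takes the first "_" field as the sample ID,
    # so underscores inside the sample name are replaced with "-"
    sample=$(printf '%s\n' "$sample" | tr '_' '-')
    printf '%s_%s_%s.fastq.gz\n' "$sample" "$2" "$read_no"
}

# Example: rename every Illumina read file in the current directory
#   for f in *_R[12]_001.fastq.gz; do
#       new=$(mtbseq_name "$f" LIB1) && mv "$f" "$new"
#   done
```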

Cheers,

P

Stop in TBjoin

I am analyzing 5 paired-end samples, and when running TBjoin I get the following error: "Could not find files for SampleID_LibID! Did you run TBlist and TBvariants for SampleID_LibID?"
My samples.txt file has two tab-separated columns. The first row is "SampleID" and "LibID", and the next five rows contain the sample names and library names. All samples have the same library name. The previous steps ran fine. I have attached the log file. Has anyone had this issue?
MTBseq_2019-11-26_.log

Thank you for your help

Tbjoin_Reg

I'm currently running the MTBseq pipeline for our analysis. I have successfully run the TBfull step. Now when I try to run TBjoin, I get the error "Could not find files for SampleID_LibID! Did you run TBlist and TBvariants for SampleID_LibID?". TBlist and TBvariants have already been run. Please kindly help me solve this.

command - MTBseq --step TBjoin --samples list.txt --project phylo
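In both reports the error names `SampleID_LibID` literally. This suggests that a header row ("SampleID" and "LibID") in the samples file is being read as a sample. Assuming the samples file should contain only tab-separated sample/library pairs with no header, it can be generated from the read file names. `make_samples_file` is a hypothetical helper. It takes the first two underscore-separated fields of each `*_R1.fastq.gz` name, following the `Test-20_MTBSeq_...` to `SM:Test-20`/`LB:MTBSeq` mapping seen in the test log.

```sh
#!/bin/sh
# Hypothetical helper: print one "SampleID<TAB>LibID" line (no header)
# per *_R1.fastq.gz file in a directory (default: current directory).
make_samples_file() {
    dir=${1:-.}
    for f in "$dir"/*_R1.fastq.gz; do
        [ -f "$f" ] || continue          # skip an unmatched glob
        base=$(basename "$f" _R1.fastq.gz)
        case $base in
            *_*) ;;
            *) continue ;;               # needs at least SampleID_LibID
        esac
        sample=${base%%_*}               # first field: sample ID
        rest=${base#*_}
        lib=${rest%%_*}                  # second field: library ID
        printf '%s\t%s\n' "$sample" "$lib"
    done | sort -u
}

# Example:
#   make_samples_file . > list.txt
#   MTBseq --step TBjoin --samples list.txt --project phylo
```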

TBtools.pm error?

Hi,

I was trying to run TBjoin on a large number of samples, and it appeared to run and finish. But when I looked at the w12.PlainID.fasta file, there was nothing in it. The log contained a stream of messages like the ones below, with the line number changing from 1 to 23861. I'm not sure what they mean. I can't send the log.txt file because it's over 24 GB in size!

Any ideas?

Peter

Use of uninitialized value $freq1 in substitution (s///) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1767, line 4.
Use of uninitialized value $type in string eq at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1769, line 4.
Use of uninitialized value $allel1 in string eq at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1769, line 4.
Use of uninitialized value $type in hash element at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1770, line 4.
Use of uninitialized value $allel1 in hash element at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1771, line 4.
Use of uninitialized value $covf in numeric ge (>=) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1774, line 4.
Use of uninitialized value $allel1 in pattern match (m//) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1780, line 4.
Use of uninitialized value $allel1 in pattern match (m//) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1781, line 4.
Use of uninitialized value $type in pattern match (m//) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1785, line 4.
Use of uninitialized value $type in pattern match (m//) at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1785, line 4.
Use of uninitialized value $type in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $allel1 in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $covf in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $covr in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $qual20 in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $freq1 in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $cov in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $subs in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1789, line 4.
Use of uninitialized value $allel1 in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1790, line 4.
Use of uninitialized value $subs in concatenation (.) or string at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 1791, line 4.
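Perl's "Use of uninitialized value" warnings mean that fields such as `$freq1`, `$covf` and `$allel1` were empty while TBtools.pm was processing an input row. That often points to a row with fewer columns than expected. This is an interpretation of the warnings, not a confirmed diagnosis. One quick check is to compare each row's field count with the header's in the tab-separated tables TBjoin reads. `check_tab_fields` is a hypothetical helper, and the `.tab` extension in the example is an assumption.

```sh
#!/bin/sh
# Hypothetical helper: report rows of a tab-separated file whose number of
# fields differs from the header row; exits non-zero if any are found.
check_tab_fields() {
    awk -F '\t' '
        NR == 1 { ncol = NF; next }
        NF != ncol {
            printf "%s: line %d has %d fields, header has %d\n", FILENAME, FNR, NF, ncol
            bad = 1
        }
        END { exit bad }' "$1"
}

# Example (assuming the position tables end in .tab):
#   for f in Position_Tables/*.tab; do check_tab_fields "$f"; done
```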

docker

Has anyone made a docker image of this package?

Resistance output

Hello!
Sorry for the basic question, but where can I find the resistance information in the output files?

Thanks in advance!
Sara

Problems with GATK and provided reference

Hi,
I am trying to use your pipeline. I have been able to install it and run the first step, TBbwa.

However, after ensuring that Java 8 was being used (not Java 9), the following error occurred at the GATK step:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
INFO  10:20:10,530 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Bad input: We encountered a non-standard non-IUPAC base in the provided reference: '0'
##### ERROR ------------------------------------------------------------------------------------------

After some investigation, it turned out that the reference contained inappropriate line endings. I fixed them, which meant I had to re-index the FASTA file (samtools faidx path/to/provided/reference.fasta).

The pipeline appears to run to completion now.
Cheers
Kristy
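The fix described in this post can be checked before rerunning the pipeline. The sketch below uses two hypothetical helpers, `fasta_check` and `strip_cr`. It reports sequence lines that contain a carriage return or a character outside the IUPAC nucleotide codes, and it writes a copy of the FASTA with carriage returns removed. The copy then needs re-indexing with `samtools faidx`, as in the post.

```sh
#!/bin/sh
# Hypothetical helper: report sequence lines (headers are skipped) that
# contain a carriage return or a non-IUPAC character; non-zero exit if any.
fasta_check() {
    awk '
        /^>/ { next }
        {
            if (index($0, "\r")) { printf "line %d: carriage return\n", NR; bad = 1 }
            s = $0
            gsub(/\r/, "", s)
            if (s ~ /[^ACGTRYSWKMBDHVNacgtryswkmbdhvn]/) {
                printf "line %d: non-IUPAC character\n", NR
                bad = 1
            }
        }
        END { exit bad }' "$1"
}

# Hypothetical helper: write a copy of $1 to $2 with all carriage returns removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example:
#   fasta_check ref.fasta || strip_cr ref.fasta ref.fixed.fasta
#   samtools faidx ref.fixed.fasta
```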

--resilist

Hi, sorry for the basic command question. I completed the whole pipeline, but I can't find the resistance predictions for first-line/second-line TB drugs, and only then did I notice the extra option --resilist. Given that I have already completed the whole process, how do I use this option without going back and restarting the whole analysis?

Updates for the resistance mutations database

Dear MTBseq developers,
I am using MTBseq, and so far it works very well, with ease of use and accurate results.
I do have one concern I would like you to help me with:
Do you update the resistance mutations database regularly?
If yes, how often?
If no, how do you propose to make sure the resistance data used is correct and up-to-date?

Regards,
Mor Rubinstein
National Public Health Laboratory, Tel Aviv, Israel

Do I have to put the FASTQ in the MTBseq.pl folder?

I am trying out MTBseq but am having some difficulties.

  1. I have installed it in a folder /opt/mtbseq/ as user nobody and have it in my $PATH
  2. I put my reads in /home/torst/tbdata with the right naming system.
  3. Then I cd /home/torst/tbdata and run MTBseq.pl --step TBbwa as user torst

But I get this error:

<ERROR> [2018-06-24 18:19:47]   move failed: TBbwa.pm line: 116

I traced it back to the .bamlog, where I found:

/opt/mtbseq/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta.pac' : Permission denied

Do I really need write access to the software folder?
That can never work on an HPC system.

Could you make it such that you have an --outdir and/or copy the reference into that first, along with all the other folders?

Relax *.fastq.gz to also support *.fq.gz

      die("wrong file name ($file_mod)") if not $file_mod =~ s/(R1|R2).fastq.gz//;

maybe this?

if not $file_mod =~ s/(R1|R2)\.f(?:ast)?q\.gz$//;

not sure where else this would need to be changed.
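You can try the relaxed pattern in shell before patching the Perl code. The sketch below mirrors the proposed substitution. The dots are escaped so they only match a literal `.`, and the pattern is anchored to the end of the name. `read_prefix` is a hypothetical name. The sketch assumes `sed -E` is available, which it is in both GNU and BSD sed.

```sh
#!/bin/sh
# Hypothetical helper: strip R1/R2 plus .fastq.gz or .fq.gz from the end of a
# file name, failing with a message (like MTBseq's die) if it does not match.
read_prefix() {
    case $1 in
        *R[12].fastq.gz|*R[12].fq.gz) ;;
        *) echo "wrong file name ($1)" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" | sed -E 's/(R1|R2)\.f(ast)?q\.gz$//'
}
```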

Creates all the folders before it knows reads exist

% ls
(empty)

% MTBseq --step TBfull
No read files to map! Check content of dir

% ls
Amend  Called          GATK_Bam  Joint    MTBseq_2018-08-17.log  Statistics
Bam    Classification  Groups    Mpileup  Position_Tables               R1.fq.gz R2.fq.gz

Gatk High Quality Score

Hi, I got the following error:

ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@e584b2b appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 66. Please see https://software.broadinstitute.org/gatk/documentation/article?id=6470 for more details and options related to this error.

I went to the site and it says:

If this happens to you, you'll need to run again with the flag [ --fix_misencoded_quality_scores / -fixMisencodedQuals]. What will happen is that the engine will simply subtract 31 from every quality score as it is read in, and proceed with the corrected values. Output files will include the correct scores where applicable.

I'm just wondering: can I simply add the flags to my sbatch script like this?

#!/bin/sh

# SLURM Commands

#SBATCH --partition=ProdQ
#SBATCH --nodes=1
#SBATCH --time=24:00:00
#SBATCH --job-name=C10
#SBATCH --account=ndlif075c
#SBATCH --output=TBfull_Log.txt

##SBATCH --mail-user=xxxxxxxxxxxxxxxx
##SBATCH --mail-type=BEGIN,END

cd $SLURM_SUBMIT_DIR

# load the environment module

module load conda/2

# load the conda environment

source activate MTBseq

# BASH Commands

MTBseq --step TBfull --distance 5 ----fix_misencoded_quality_scores -fixMisencodedQuals --threads 8

Cheers,

P
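A quality score of 66 usually means reads in the older Phred+64 encoding are being read as Phred+33. That is why GATK's fix subtracts 31, as the quoted documentation says. The flag is a GATK argument, and MTBseq's printed parameter list does not include it, so it is unclear whether MTBseq would pass it on. The encoding can be checked directly before changing the pipeline. The minimal sketch below assumes plain four-line FASTQ records and uses a hypothetical helper, `guess_phred_offset`. It reports `phred33` if any quality character is below ASCII 59 (`;`), `phred64` if all are at least ASCII 64 (`@`), and `ambiguous` otherwise.

```sh
#!/bin/sh
# Hypothetical helper: guess the quality offset of a FASTQ file (or stdin)
# from the lowest quality character seen on every 4th line.
guess_phred_offset() {
    awk '
        BEGIN {
            # Lookup table from printable ASCII character to its code
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            min = 999
        }
        NR % 4 == 0 {
            for (k = 1; k <= length($0); k++) {
                q = ord[substr($0, k, 1)]
                if (q != "" && q < min) min = q
            }
        }
        END {
            if (min == 999) print "unknown"
            else if (min < 59) print "phred33"
            else if (min >= 64) print "phred64"
            else print "ambiguous"
        }' "${1:--}"
}

# Example: check the first 10,000 reads of a gzipped file
#   gzip -dc sample_R1.fastq.gz | head -n 40000 | guess_phred_offset
```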

MTBseq unable to detect the Bam output, even though it's present in the file system

Hello team,

Thanks for the wonderful tool!

I've used this successfully for many genomes, but earlier today, when I was trying to analyze something, I came across a weird issue.

<INFO>  [2020-05-24 21:34:39]   Start statistics calculation...
<INFO>  [2020-05-24 21:34:39]   Start parsing M._tuberculosis_H37Rv_2015-11-13.fasta...
<INFO>  [2020-05-24 21:34:39]   Finished parsing M._tuberculosis_H37Rv_2015-11-13.fasta!
<INFO>  [2020-05-24 21:34:39]   Start parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
<INFO>  [2020-05-24 21:34:50]   Finished parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt!
<INFO>  [2020-05-24 21:34:50]   Start using Samtools for BWA mapping statistics of G04880_MTBSeq_nextseq_151bp.bam...
/root/projects/G04880/mtbseq/Bam/G04880_MTBSeq_nextseq_151bp.bam does not exist, TBstats.pm line: 83 
1 at /root/miniconda3/envs/mtbseq/share/mtbseq-1.0.4-0/lib/TBstats.pm line 83.

I'm also attaching the screenshot


However, the file does exist on the file system:

(mtbseq) root@tuffy:~/projects/G04880/mtbseq/Bam# file /root/projects/G04880/mtbseq/Bam/G04880_MTBSeq_nextseq_151bp.bam
/root/projects/G04880/mtbseq/Bam/G04880_MTBSeq_nextseq_151bp.bam: gzip compressed data, extra field

Can't quite figure out what's wrong here - could you please help me?

MTBseq in Parallel

Hi,

I run MTBseq on a college cluster. I am just wondering whether I can run it in parallel across multiple nodes.

Cheers,

P

Replace all /usr/bin/perl

In the main script

#!/usr/bin/perl

should be

#!/usr/bin/env perl

so that it uses the Perl in my PATH, not the one you have chosen.
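The shebang can be rewritten in place with sed. The sketch below uses a hypothetical helper, `fix_shebang`, and only touches a first line that is exactly `#!/usr/bin/perl`. It copies the result back with `cat` rather than `mv`, so the script keeps its execute permission. A shebang with extra arguments, such as `#!/usr/bin/perl -w`, is deliberately left alone. On Linux, `env` would receive `perl -w` as a single argument and fail.

```sh
#!/bin/sh
# Hypothetical helper: replace a first line of exactly "#!/usr/bin/perl"
# with "#!/usr/bin/env perl", leaving every other line untouched.
fix_shebang() {
    tmp=$(mktemp) || return 1
    if sed '1s|^#!/usr/bin/perl$|#!/usr/bin/env perl|' "$1" > "$tmp"; then
        # Copy back over the original so its permissions are preserved
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Example: fix_shebang /path/to/MTBseq
```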

bam does not exist error on test data

When I run MTBseq --step TBfull in the test/ directory, it fails with a bam does not exist error. However, the bam file it is looking for does indeed exist. Here's a copy of my terminal:

vagrant@ubuntu-bionic:~/tmp/mtbseq/MTBseq_source-1.0.3/test$ MTBseq --step TBfull
<INFO>	[2018-09-28 15:44:00]	Found perl module: MCE
<INFO>	[2018-09-28 15:44:00]	Found perl module: Statistics::Basic
<INFO>	[2018-09-28 15:44:00]	Found bwa in your PATH!
<INFO>	[2018-09-28 15:44:00]	Found samtools in your PATH!
<INFO>	[2018-09-28 15:44:00]	Found gatk in the MTBseq /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt folder!
<INFO>	[2018-09-28 15:44:00]	Found picard in the MTBseq /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt folder!
<WARN>	[2018-09-28 15:44:00]	Need samtools >= 1.6, please upgrade the version of samtools in your PATH.
<WARN>	[2018-09-28 15:44:00]	For this execution of MTBseq samtools 1.6 from the MTBseq /opt directory will be used


MTBseq 1.0.3 - Copyright (C) 2018   Thomas A. Kohl, Robin Koch, Christian Utpatel,
                              Maria Rosaria De Filippo, Viola Schleusener,
                              Patrick Beckert, Daniela M. Cirillo, Stefan Niemann

This program comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.


<INFO>	[2018-09-28 15:44:00]	You are vagrant.
<INFO>	[2018-09-28 15:44:00]	Your current working directory is: /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test
<INFO>	[2018-09-28 15:44:00]	You requested 1 thread(s) for the pipeline.

<INFO>	[2018-09-28 15:44:00]	Your parameter setting is:
<INFO>	[2018-09-28 15:44:00]	--step		TBfull
<INFO>	[2018-09-28 15:44:00]	--continue	0
<INFO>	[2018-09-28 15:44:00]	--samples	NONE
<INFO>	[2018-09-28 15:44:00]	--project	NONE
<INFO>	[2018-09-28 15:44:00]	--resilist	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Resistance_Mediating.txt
<INFO>	[2018-09-28 15:44:00]	--intregions	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Extended_Resistance_Mediating.txt
<INFO>	[2018-09-28 15:44:00]	--categories	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/cat/MTB_Gene_Categories.txt
<INFO>	[2018-09-28 15:44:00]	--basecalib	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Base_Calibration_List.vcf
<INFO>	[2018-09-28 15:44:00]	--ref		M._tuberculosis_H37Rv_2015-11-13.fasta
<INFO>	[2018-09-28 15:44:00]	--minbqual	13
<INFO>	[2018-09-28 15:44:00]	--all_vars	0
<INFO>	[2018-09-28 15:44:00]	--snp_vars	0
<INFO>	[2018-09-28 15:44:00]	--lowfreq_vars	0
<INFO>	[2018-09-28 15:44:00]	--mincovf	4
<INFO>	[2018-09-28 15:44:00]	--mincovr	4
<INFO>	[2018-09-28 15:44:00]	--minphred20	4
<INFO>	[2018-09-28 15:44:00]	--minfreq	75
<INFO>	[2018-09-28 15:44:00]	--unambig	95
<INFO>	[2018-09-28 15:44:00]	--window	12
<INFO>	[2018-09-28 15:44:00]	--distance	12
<INFO>	[2018-09-28 15:44:00]	--quiet		0

<INFO>	[2018-09-28 15:44:00]	The following programs will be used, if necessary:
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/bwa_0.7.17
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools_1.6
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/picard_2.17.0
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/GenomeAnalysisTK_3.8

<INFO>	[2018-09-28 15:44:00]	The following directories will be used, if necessary:
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Mpileup
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Position_Tables
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Called
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Joint
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Amend
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Classification
<INFO>	[2018-09-28 15:44:00]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Groups

<INFO>	[2018-09-28 15:44:00]	### [TBfull] selected ###

<INFO>	[2018-09-28 15:44:00]	Mapping samples:
<INFO>	[2018-09-28 15:44:00]	Test-20_MTBSeq_nextseq_151bp_R1.fastq.gz
<INFO>	[2018-09-28 15:44:00]	Test-20_MTBSeq_nextseq_151bp_R2.fastq.gz

<INFO>	[2018-09-28 15:44:00]	Start BWA mapping...
<INFO>	[2018-09-28 15:44:00]	Found at most two files for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:44:00]	Start BWA mapping for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:44:00]	bwa mem -t 1 -R '@RG\tID:Test-20_MTBSeq_nextseq_151bp\tSM:Test-20\tPL:Illumina\tLB:MTBSeq' /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta  /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Test-20_MTBSeq_nextseq_151bp_R1.fastq.gz /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Test-20_MTBSeq_nextseq_151bp_R2.fastq.gz > /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.sam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:44:40]	Finished BWA mapping for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:44:40]	Start using samtools to convert from .sam to .bam for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:44:40]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools view -@ 1 -b -T /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta -o /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.sam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:44:55]	Finished file conversion for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:44:55]	Start using samtools for sorting of Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:44:55]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools sort -@ 1 -T /tmp/Test-20_MTBSeq_nextseq_151bp.sorted -o /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.sorted.bam /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:45:08]	Finished using samtools for sorting of Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:08]	Start using samtools for indexing of Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:08]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools index -b /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.sorted.bam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:45:09]	Finished using samtools for indexing of Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:09]	Start removing putative PCR duplicates from Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:09]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools rmdup /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.sorted.bam /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.nodup.bam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:45:23]	Finished removing putative PCR duplicates for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:23]	Start recreating index for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:23]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools index -b /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.nodup.bam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bamlog
<INFO>	[2018-09-28 15:45:24]	Finished recreating index for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:24]	Removing temporary files...
<INFO>	[2018-09-28 15:45:24]	Finished mapping for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:24]	Finished BWA mapping!

<INFO>	[2018-09-28 15:45:24]	Refining mappings:
<INFO>	[2018-09-28 15:45:24]	Test-20_MTBSeq_nextseq_151bp.bam

<INFO>	[2018-09-28 15:45:24]	Start GATK refinement...
<INFO>	[2018-09-28 15:45:24]	Updating log file for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:24]	Start using GATK RealignerTargetCreator for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:24]	java -jar /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/GenomeAnalysisTK.jar --analysis_type RealignerTargetCreator --reference_sequence /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta --input_file /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam --downsample_to_coverage 10000 --num_threads 1 --out /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.intervals 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bamlog
There were no warn messages.
<INFO>	[2018-09-28 15:45:38]	Finished using GATK RealignerTargetCreator for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:38]	Start using GATK IndelRealigner for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:38]	java -jar /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/GenomeAnalysisTK.jar --analysis_type IndelRealigner --reference_sequence /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta --input_file /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam --defaultBaseQualities 12 --targetIntervals /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.intervals --noOriginalAlignmentTags --out /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.realigned.bam 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bamlog
There were no warn messages.
<INFO>	[2018-09-28 15:45:55]	Finished using GATK IndelRealigner for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:45:55]	Start using GATK BaseRecalibrator for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:45:55]	java -jar /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/GenomeAnalysisTK.jar --analysis_type BaseRecalibrator --reference_sequence /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta --input_file /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.realigned.bam --knownSites /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Base_Calibration_List.vcf --maximum_cycle_value 600 --num_cpu_threads_per_data_thread 1 --out /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.grp 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bamlog
There were no warn messages.
<INFO>	[2018-09-28 15:47:06]	Finished using GATK BaseRecalibrator for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:47:06]	Start using GATK PrintReads for Test-20_MTBSeq_nextseq_151bp...
<INFO>	[2018-09-28 15:47:06]	java -jar /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/GenomeAnalysisTK.jar -T --analysis_type PrintReads --reference_sequence /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta --input_file /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.realigned.bam --BQSR /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.grp --num_cpu_threads_per_data_thread 1 --out /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bam  2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bamlog
There were no warn messages.
<INFO>	[2018-09-28 15:48:33]	Finished using GATK PrintReads for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:48:33]	Removing temporary files...
<INFO>	[2018-09-28 15:48:33]	GATK refinement finished for Test-20_MTBSeq_nextseq_151bp!
<INFO>	[2018-09-28 15:48:33]	Finished GATK refinement!

<INFO>	[2018-09-28 15:48:33]	Creating mpileups:
<INFO>	[2018-09-28 15:48:33]	Test-20_MTBSeq_nextseq_151bp.gatk.bam

<INFO>	[2018-09-28 15:48:33]	Start creating .mpileup files...
<INFO>	[2018-09-28 15:48:33]	Updating logfile for Test-20_MTBSeq_nextseq_151bp.gatk...
<INFO>	[2018-09-28 15:48:33]	Start using samtools for creating a .mpileup file for Test-20_MTBSeq_nextseq_151bp.gatk...
<INFO>	[2018-09-28 15:48:33]	/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/opt/samtools mpileup -B -A -x -f /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bam > /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Mpileup/Test-20_MTBSeq_nextseq_151bp.gatk.mpileup 2>> /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Mpileup/Test-20_MTBSeq_nextseq_151bp.gatk.mpileuplog
<INFO>	[2018-09-28 15:48:38]	Finished using samtools for creating a .mpileup file for Test-20_MTBSeq_nextseq_151bp.gatk!
<INFO>	[2018-09-28 15:48:38]	Finished creating .mpileup files!

<INFO>	[2018-09-28 15:48:38]	Creating position lists:
<INFO>	[2018-09-28 15:48:38]	Test-20_MTBSeq_nextseq_151bp.gatk.mpileup

<INFO>	[2018-09-28 15:48:38]	Start creating position lists...
<INFO>	[2018-09-28 15:48:38]	Parsing reference genome M._tuberculosis_H37Rv_2015-11-13.fasta...
<INFO>	[2018-09-28 15:48:47]	Reference genome size (bp): 4411532!
<INFO>	[2018-09-28 15:48:47]	Start creating temporary output file...
<INFO>	[2018-09-28 15:48:47]	Finished creating temporary output file!
<INFO>	[2018-09-28 15:48:47]	Start parallel processing...
<INFO>	[2018-09-28 15:55:19]	Finished parallel processing!
<INFO>	[2018-09-28 15:55:19]	Start loading temporary file into hash structure...
<INFO>	[2018-09-28 15:55:44]	Finished loading temporary file into hash structure!
<INFO>	[2018-09-28 15:55:44]	Start creating final output file...
<INFO>	[2018-09-28 15:56:36]	Finished creating final output file!
<INFO>	[2018-09-28 15:56:36]	Finished creating position lists!

<INFO>	[2018-09-28 15:56:40]	Calling variants:
<INFO>	[2018-09-28 15:56:40]	Test-20_MTBSeq_nextseq_151bp.gatk_position_table.tab

<INFO>	[2018-09-28 15:56:40]	Start variant calling...
<INFO>	[2018-09-28 15:56:40]	Start parsing M._tuberculosis_H37Rv_2015-11-13.fasta...
<INFO>	[2018-09-28 15:56:40]	Finished parsing M._tuberculosis_H37Rv_2015-11-13.fasta!
<INFO>	[2018-09-28 15:56:40]	Parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
<INFO>	[2018-09-28 15:56:47]	Finished parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt!
<INFO>	[2018-09-28 15:56:47]	Start parsing /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Resistance_Mediating.txt and /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Extended_Resistance_Mediating.txt...
<INFO>	[2018-09-28 15:56:47]	Finished parsing /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Resistance_Mediating.txt and /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/var/res/MTB_Extended_Resistance_Mediating.txt!
<INFO>	[2018-09-28 15:56:47]	Start parsing Test-20_MTBSeq_nextseq_151bp.gatk_position_table.tab...
<INFO>	[2018-09-28 15:57:15]	Finished parsing Test-20_MTBSeq_nextseq_151bp.gatk_position_table.tab!
<INFO>	[2018-09-28 15:57:15]	Start calling variants from Test-20_MTBSeq_nextseq_151bp.gatk_position_table.tab...
<INFO>	[2018-09-28 15:58:39]	Finished calling variants from Test-20_MTBSeq_nextseq_151bp.gatk_position_table.tab!
<INFO>	[2018-09-28 15:58:39]	Printing:
<INFO>	[2018-09-28 15:58:39]	Test-20_MTBSeq_nextseq_151bp.gatk_position_variants_cf4_cr4_fr75_ph4_outmode000.tab
<INFO>	[2018-09-28 15:58:39]	Test-20_MTBSeq_nextseq_151bp.gatk_position_uncovered_cf4_cr4_fr75_ph4_outmode000.tab
<INFO>	[2018-09-28 15:58:43]	Printing finished!
<INFO>	[2018-09-28 15:58:50]	Finished variant calling!

<INFO>	[2018-09-28 15:58:50]	Calculating statistics for mappings:
<INFO>	[2018-09-28 15:58:50]	Test-20_MTBSeq_nextseq_151bp.bam

<INFO>	[2018-09-28 15:58:50]	Start statistics calculation...
<INFO>	[2018-09-28 15:58:50]	Start parsing M._tuberculosis_H37Rv_2015-11-13.fasta...
<INFO>	[2018-09-28 15:58:51]	Finished parsing M._tuberculosis_H37Rv_2015-11-13.fasta!
<INFO>	[2018-09-28 15:58:51]	Start parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
<INFO>	[2018-09-28 15:58:58]	Finished parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt!
<INFO>	[2018-09-28 15:58:58]	Start using Samtools for BWA mapping statistics of Test-20_MTBSeq_nextseq_151bp.bam...
/home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam does not exist, TBstats.pm line: 83
1 at /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/lib/TBstats.pm line 83.
vagrant@ubuntu-bionic:~/tmp/mtbseq/MTBseq_source-1.0.3/test$ ls -lh /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam
-rw-rw-r-- 1 vagrant vagrant 62M Sep 28 15:45 /home/vagrant/tmp/mtbseq/MTBseq_source-1.0.3/test/Bam/Test-20_MTBSeq_nextseq_151bp.bam

gatk-register command

Hi,

I have installed MTBseq via conda and I am using the gatk-register command to add GATK to the conda environment, but I am told the command is not found.

Any ideas?

Cheers!

TBjoin missing samples?

I tried running TBjoin earlier and it said no samples were defined for the analysis. I tried to search for the input files, but where do I find the samples.txt file? Do I need to create it? If so, how, and where should the samples.txt file be located? Please help. #42

Log file below.

[2019-11-27 12:13:39] ### MTBseq finished!!! ###
(MTBseq) yfong@yfong-VirtualBox:~/Documents/software/MTBseq/SequenceFiles$ MTBseq --step TBjoin --sample samples.txt --project phylo --threads 6
[2019-11-27 12:14:37] Found perl module: MCE
[2019-11-27 12:14:37] Found perl module: Statistics::Basic
[2019-11-27 12:14:37] Found bwa in your PATH!
[2019-11-27 12:14:37] Found samtools in your PATH!
[2019-11-27 12:14:37] Found gatk3 in your PATH!
[2019-11-27 12:14:37] Found picard in your PATH!

MTBseq 1.0.3 - Copyright (C) 2018 Thomas A. Kohl, Robin Koch, Christian Utpatel,
Maria Rosaria De Filippo, Viola Schleusener,
Patrick Beckert, Daniela M. Cirillo, Stefan Niemann

This program comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.

[2019-11-27 12:14:37] You are yfong.
[2019-11-27 12:14:37] Your current working directory is: /home/yfong/Documents/software/MTBseq/SequenceFiles
[2019-11-27 12:14:37] You requested 6 thread(s) for the pipeline.

[2019-11-27 12:14:37] Your parameter setting is:
[2019-11-27 12:14:37] --step TBjoin
[2019-11-27 12:14:37] --continue 0
[2019-11-27 12:14:37] --samples NONE
[2019-11-27 12:14:37] --project phylo
[2019-11-27 12:14:37] --resilist /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Resistance_Mediating.txt
[2019-11-27 12:14:37] --intregions /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Extended_Resistance_Mediating.txt
[2019-11-27 12:14:37] --categories /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/cat/MTB_Gene_Categories.txt
[2019-11-27 12:14:37] --basecalib /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/var/res/MTB_Base_Calibration_List.vcf
[2019-11-27 12:14:37] --ref M._tuberculosis_H37Rv_2015-11-13.fasta
[2019-11-27 12:14:37] --minbqual 13
[2019-11-27 12:14:37] --all_vars 0
[2019-11-27 12:14:37] --snp_vars 0
[2019-11-27 12:14:37] --lowfreq_vars 0
[2019-11-27 12:14:37] --mincovf 4
[2019-11-27 12:14:37] --mincovr 4
[2019-11-27 12:14:37] --minphred20 4
[2019-11-27 12:14:37] --minfreq 75
[2019-11-27 12:14:37] --unambig 95
[2019-11-27 12:14:37] --window 12
[2019-11-27 12:14:37] --distance 12
[2019-11-27 12:14:37] --quiet 0

[2019-11-27 12:14:37] The following programs will be used, if necessary:
[2019-11-27 12:14:37] /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/opt/bwa_0.7.17
[2019-11-27 12:14:37] /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/opt/samtools_1.6
[2019-11-27 12:14:37] /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/opt/picard_2.17.0
[2019-11-27 12:14:37] /home/yfong/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-0/opt/GenomeAnalysisTK_3.8

[2019-11-27 12:14:37] The following directories will be used, if necessary:
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Bam
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/GATK_Bam
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Mpileup
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Position_Tables
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Called
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Joint
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Amend
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Classification
[2019-11-27 12:14:37] /home/yfong/Documents/software/MTBseq/SequenceFiles/Groups

[2019-11-27 12:14:37] ### [TBjoin] selected ###

[2019-11-27 12:14:37] Skipping Joint analysis. No samples defined for joint analysis.

[2019-11-27 12:14:37] ### MTBseq finished!!! ###

TBjoin issue

Hi,

I tried running TBjoin and got an error. I have attached a screenshot below. Could you perhaps tell me what I did wrong?

Cheers,

Peter

Screenshot 2019-08-27 at 11 11 16

cat failed: TBpile.pm line: 52

Hi guys,

I'm getting the following error at the TBpile step:

cat failed: TBpile.pm line: 52

I'm running it on my college cluster, in case that makes a difference compared to running it locally.

GATK error with test data

The pipeline is stopping at the "Start GATK refinement" step.

I'm seeing the following error in the gatk.bamlog file:

ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/[conda env]/opt/gatk-3.8/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

I'm using Java 1.8

GATK error failed 256

I ran TBfull and it stopped with a "failed:256" message after GATK refinement started.
Is there a file where I can look up what a failure code means?
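If the number after "failed:" is the raw wait status that Perl's `system()` leaves in `$?` (an assumption: the issue does not show how MTBseq builds this message), it is not the exit code of the failing tool itself. The sketch below uses a hypothetical `decode_wait_status` helper to split it into the exit code (high byte) and the terminating signal (low 7 bits); under that assumption, 256 means the command exited with code 1.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status (the value of $? after
# system()) into a readable description.
sub decode_wait_status {
    my ($status) = @_;
    return 'could not start command' if $status == -1;
    my $signal = $status & 127;    # low 7 bits: signal that killed the command
    if ($signal) {
        my $core = ($status & 128) ? ' (core dumped)' : '';
        return "killed by signal $signal$core";
    }
    return 'exited with status ' . ($status >> 8);    # high byte: exit code
}

print decode_wait_status(256), "\n";    # prints "exited with status 1"
```

An exit code of 1 only says that the tool reported an error; the tool's own message should be in the log file its stderr was redirected to (for the GATK steps, the `.gatk.bamlog`).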

BWA Mapping Error

Hi All,

I am trying to run MTBseq on all my sequence files, and as soon as it starts I get the error attached below. Can anyone shed light on this for me, please?

Thanks,

P

Screenshot 2019-09-04 at 12 33 07

[bam_rmdup_core] inconsistent BAM file for pair

From gatk.bamlog

[bam_rmdup_core] processing reference M.tuberculosis_H37Rv...
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:4:11405:8774:6713'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:3:21606:5786:8728'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:21207:9075:4861'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:21303:26846:3368'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:1:11206:24436:8998'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:1:11209:15229:9800'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:1:21212:11379:14636'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:11210:17442:6322'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:21308:26619:1343'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:21308:7076:15721'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:2:21309:20464:13518'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:3:11409:24863:14468'. Continue anyway.
[bam_rmdup_core] inconsistent BAM file for pair 'NS500345:176:HTCKYAFXX:3:11502:10273:15918'. Continue anyway.

Variant Calling with M bovis

Hi,

I placed an M. bovis FASTA reference file in the ref path and ran MTBseq with the following command:

MTBseq --step TBfull --ref Mbovis_AF2122_97 --distance 5 --threads 1

The program ran until it stopped with this error:

[2020-09-07 18:20:11] Start variant calling...
[2020-09-07 18:20:11] Start parsing Mbovis_AF2122_97.fasta...
[2020-09-07 18:20:11] Finished parsing Mbovis_AF2122_97.fasta!
[2020-09-07 18:20:11] Parsing Mbovis_AF2122_97_genes.txt...
[2020-09-07 18:20:11] Can't open Mbovis_AF2122_97_genes.txt: TBtools line: 703
1 at /ichec/home/users/peflanag/.conda/envs/MTBseq/share/mtbseq-1.0.4-0/lib/TBtools.pm line 703.

My question is: do I need to create some form of _genes.txt file? I didn't see that in the manual; it only mentions a reference FASTA and an annotation file. I didn't add an annotation file, since I am not worried about annotations. I have noticed that MTBseq seems to have created additional files based on the M. bovis reference, but no .txt file.

Cheers,

P

Screenshot 2020-09-08 at 10 09 12

mpileup to vcf

Hi,
I'm new to bioinformatics and I have one question. Can you recommend a tool suitable for converting the mpileup files that MTBseq generates into VCF files for visualization in IGV? Or can MTBseq generate VCF files itself?
Thank you in advance

No Sequence in Amend - PlainIDs.fasta

Hi All,

I have managed to run MTBseq on 5 MiniSeq runs and generated MLTs. However, for two of the runs the PlainIDs.fasta file contains no sequence, and I don't know why. The TBfull and TBjoin logs both show that the runs finished normally.

I have attached the logs below; could someone point me in the right direction as to why it failed?

Cheers.

MTBseq_TBfull_Log.txt
MTBseq_TBjoin_Log.txt

A fatal error has been detected by the Java Runtime Environment:

At the step "Start using GATK PrintReads for Test-20_MTBSeq_nextseq_151bp...", a fatal error is detected by the Java Runtime Environment. The messages are below.

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x0000000114ae66f7, pid=7410, tid=0x0000000000002503

JRE version: OpenJDK Runtime Environment (8.0_152-b12) (build 1.8.0_152-release-1056-b12)

Java VM: OpenJDK 64-Bit Server VM (25.152-b12 mixed mode bsd-amd64 compressed oops)

Problematic frame:

C [libgkl_compression3719787253608670960.dylib+0x76f7] deflate_medium+0x867

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:

/Users/tomotada/Desktop/iwamoto/MTB_test/hs_err_pid7410.log

If you would like to submit a bug report, please visit:

http://bugreport.java.com/bugreport/crash.jsp

The crash happened outside the Java Virtual Machine in native code.

See problematic frame for where to report the bug.

gatk3 -T --analysis_type PrintReads --reference_sequence /Users/tomotada/miniconda3/envs/MTBseq/share/mtbseq-1.0.4-1/var/ref/M._tuberculosis_H37Rv_2015-11-13.fasta --input_file /Users/tomotada/Desktop/iwamoto/MTB_test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.realigned.bam --BQSR /Users/tomotada/Desktop/iwamoto/MTB_test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.grp --num_cpu_threads_per_data_thread 1 --out /Users/tomotada/Desktop/iwamoto/MTB_test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bam 2>> /Users/tomotada/Desktop/iwamoto/MTB_test/GATK_Bam/Test-20_MTBSeq_nextseq_151bp.gatk.bamlog failed: 64000

hs_err_pid7410.log

GATK error failed 256 - reopened

Hi,
I am having the same issue as shown here and didn't know how to reopen it.

#24

I have looked at the bamlog file; the only thing is that I don't know what to do about the error I see there. I tried to go to the website it mentions, but it doesn't load. I have copied the bamlog and the MTBseq TBfull log below.

Cheers,

Peter
MTBseq_2019-09-09_linux-biostation.log

The error seems to be with samtools and a high quality score of 65
bamlog.txt

Originally posted by @peflanag in #24 (comment)

compare isolates in different sequencing runs

I have successfully run MTBseq on all the fastq files in a single sequencing run.
I am curious whether MTBseq can analyse the output files from different runs using TBjoin, TBamend and TBgroups, so that I can compare isolates across sequencing runs without re-mapping the fastq files.

Amend multifasta with reference?

Dear MTBseq developers,

I am running MTBseq (TBfull) using an M. bovis reference (AF2122/97) and I am interested in the output files from the Amend step. I imagine the multi-FASTA obtained as output could be considered the "core SNPs"? If so, I have realised that the reference sequence is missing from this multi-FASTA, and I need it to build a phylogeny. Is this normal behaviour, or am I doing something wrong?

Thank you very much for your help,

Kind regards

non-unique naming in /tmp

Files are written into /tmp with names that are not guaranteed to be unique. I get this error:

[E::hts_open_format] Failed to open file /tmp/sampleID_libID.sorted.0000.bam
samtools sort: failed to create temporary file "/tmp/sampleID_libID.sorted.0000.bam": File exists

in the log file Bam/sampleID_libID.bamlog. I appreciate that the sample and library IDs are included in the name, which is usually enough to make it unique, but in my case it is not: the pipeline breaks if two runs happen to land on the same node of our compute cluster at the same time.

I suggest using File::Temp to safely create a temporary directory.
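A minimal sketch of the File::Temp suggestion, assuming the sort step builds a samtools sort command like the one shown in the log earlier on this page (`samtools sort -@ 1 -T /tmp/<name>.sorted -o ...`). The helper `sort_command` and the file names are hypothetical; this is not MTBseq's actual code. `tempdir` with `TMPDIR => 1` creates a new directory with a random suffix under the system temporary directory, so two concurrent runs for the same sample get different `-T` prefixes.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: build a samtools sort command whose temporary files go
# into a freshly created directory with a random suffix, instead of a fixed
# /tmp/<sample>.sorted prefix. CLEANUP => 1 removes the directory at exit.
sub sort_command {
    my ($sample, $in_bam, $out_bam, $threads) = @_;
    my $tmpdir = tempdir("${sample}_sort_XXXXXX", TMPDIR => 1, CLEANUP => 1);
    my $prefix = File::Spec->catfile($tmpdir, "$sample.sorted");
    return ('samtools', 'sort', '-@', $threads, '-T', $prefix,
            '-o', $out_bam, $in_bam);
}

my @cmd = sort_command('sampleID_libID', 'sampleID_libID.bam',
                       'sampleID_libID.sorted.bam', 1);
print join(' ', @cmd), "\n";
# system(@cmd) == 0 or die "samtools sort failed (status $?)\n";
```

Passing the command as a list runs samtools directly without a shell, so a stderr redirection such as the `2>> ...bamlog` used in the logs would need the single-string form of `system` instead.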

No Joint Variant File To Amend

Hi,

I am pooling all 302 of my samples from multiple TBfull runs into a single TBjoin step. When I run it, it stops after 1 min 40 s and the log states the following:

[2020-04-10 13:28:21] IMRLH94_L001.gatk_position_variants_cf4_cr4_fr75_ph4_outmode000.tab
[2020-04-10 13:28:21] IMRLH96_L001.gatk_position_variants_cf4_cr4_fr75_ph4_outmode000.tab
[2020-04-10 13:28:21] IMRLH99_S1.gatk_position_variants_cf4_cr4_fr75_ph4_outmode000.tab

[2020-04-10 13:28:21] Start creating joint variant table...
[2020-04-10 13:28:21] Start parsing M._tuberculosis_H37Rv_2015-11-13.fasta...
[2020-04-10 13:28:21] Finished parsing M._tuberculosis_H37Rv_2015-11-13.fasta!
[2020-04-10 13:28:30] Start parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
[2020-04-10 13:28:33] Finished parsing M._tuberculosis_H37Rv_2015-11-13_genes.txt...
[2020-04-10 13:28:33] Parsing variant files...
[2020-04-10 13:28:45] Finished parsing variant files!
[2020-04-10 13:28:45] Sart printing joint variant file scaffold...
[2020-04-10 13:28:45] Finished printing joint variant file scaffold!
[2020-04-10 13:28:45] Start parsing position lists, extend called variants and complete joint variant list...
[2020-04-10 13:28:45] Finished parsing position lists, extend called variants and complete joint variant list!
[2020-04-10 13:28:47] Start printing coverage breadth output...
[2020-04-10 13:29:32] Finished printing coverage breadth output!
[2020-04-10 13:29:36] Finished creating joint variant table!

[2020-04-10 13:29:38] No joint variant file All_No_C10_joint_cf4_cr4_fr75_ph4_samples302.tab to amend! Check content of /ichec/work/ndlif075c/Peter/Clusters_2019/All_Except_C10/Joint!

I'm at a bit of a loss. Any help would be great!

MPI Aware

Hi,

Is MTBseq MPI aware?

Cheers,

P

Check whether each system() call failed and stop the pipeline

While trying to get the pipeline working, I noticed that it keeps running even if a step fails. Calls such as

system("  .... ");

should check the return code, for example:

my $cmd = "bwa index ....";
system($cmd) == 0 or die "Could not run command: $cmd (status $?)";
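The return-code check proposed above can be wrapped in a small helper so that every external step stops the pipeline on failure. This is a sketch, not MTBseq's code: `run_or_die` is a hypothetical name, and the demo uses the running Perl interpreter (`$^X`) as a stand-in for bwa, samtools or GATK.

```perl
use strict;
use warnings;

# Hypothetical helper: run one external command and die unless it exits with
# status 0. Accepts either a program plus argument list, or a single string;
# a single string containing shell metacharacters (e.g. "2>> file.bamlog")
# is run via /bin/sh, so redirections keep working.
sub run_or_die {
    my @cmd  = @_;
    my $desc = join ' ', @cmd;
    return if system(@cmd) == 0;
    die "Could not start command: $desc ($!)\n" if $? == -1;
    die sprintf("Command failed (exit code %d, signal %d): %s\n",
                $? >> 8, $? & 127, $desc);
}

# Stand-in pipeline: each step only runs if the previous one succeeded.
run_or_die($^X, '-e', 'exit 0');
run_or_die($^X, '-e', 'print "second step ran\n"');
```

Because `die` aborts the script, any later mapping, refinement or calling steps are never started once one command fails, which is the behaviour the issue asks for.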

w12.PlainIDs.fasta Missing info

Hi,

MTBseq runs fine on most of my sequencing runs. However, I recently ran it on a year's worth of sequencing, and after the TBjoin step the w12.PlainIDs.fasta file is empty, which means I cannot process it in RAxML to generate a Newick file for an MLT. I've attached the file as well as the log file, but I don't see any error in it. It's just odd that on some runs the w12.PlainIDs file doesn't contain any sequence.

IMRL_Clusters_2019_joint_cf4_cr4_fr75_ph4_samples175_amended_u95_phylo_w12.plainIDs.txt

MTBseq_TBjoin_Log.txt
