tianshilu / qbrc-somatic-pipeline Goto Github PK



QBRC Somatic Mutation Calling Pipeline

Perl 26.72% R 14.42% Shell 0.25% Python 2.08% C 30.11% C++ 26.37% Rebol 0.05%

qbrc-somatic-pipeline's People


decodebiology zpeng1989 qbrc anastasia0123 bit-vs-it sunqiangzai

qbrc-somatic-pipeline's Issues

picard.jar missing

Hi Tianshi,

I was trying to run your pipeline on the example data and ran into the following error:

java -Djava.io.tmpdir=../test_results//tumor/tmp -jar /mnt/home/icb/laura.martens/QBRCpipeline/QBRC-Somatic-Pipeline/somatic_script//somatic_script/picard.jar AddOrReplaceReadGroups INPUT=../test_results//tumor/alignment.sam OUTPUT=../test_results//tumor/rgAdded.bam SORT_ORDER=coordinate RGID=tumor RGLB=tumor RGPL=illumina RGPU=tumor RGSM=tumor CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=0

Error: Unable to access jarfile /mnt/home/icb/laura.martens/QBRCpipeline/QBRC-Somatic-Pipeline/somatic_script//somatic_script/picard.jar

When I checked the somatic_script folder, there was no picard.jar file in it, so I was wondering whether I am missing something.

Thanks a lot for your help,
Laura
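"Unable to access jarfile" means the JVM found nothing it could open at the path given to `-jar`. A minimal pre-flight check, sketched in Python under the assumption that jars are expected at `<somatic_script dir>/somatic_script/<name>.jar` as in the logged command, lists any missing jars before a `java -jar` call is issued. The function name `missing_jars` is hypothetical and not part of the pipeline.

```python
from pathlib import Path
from typing import Iterable, List


def missing_jars(script_dir: str, jar_names: Iterable[str]) -> List[str]:
    """Return the expected jar paths that are not regular files on disk.

    Assumption: jars live in <script_dir>/somatic_script/, mirroring the
    path printed in the error message above.
    """
    base = Path(script_dir) / "somatic_script"
    return [str(base / name) for name in jar_names if not (base / name).is_file()]
```

For example, `missing_jars("/path/to/somatic_script", ["picard.jar"])` returning a non-empty list means picard.jar has to be obtained and placed there before the AddOrReplaceReadGroups step can run.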

disambiguate_pipeline/conda_env/

Hi,
Sorry to trouble you. How can I download the conda_env at "QBRC-Somatic-Pipeline/tree/master/disambiguate_pipeline/conda_env/"? Since the script already downloads a lot of things, do I still need to download conda_env separately?

Potential bug in somatic.pl

Dear @tianshilu ,

I am sorry to bother you again during this hard time. Around February we came across some confusing code in somatic.pl, and we are not sure whether it is a bug. I recently remembered it, so I am raising an issue here.

Around lines 487-491:

  system_call("lofreq call-parallel --pp-threads ".$thread." -s --sig 0.1 --bonf 1 -C 7 -f ".$index.
    " -S ".$resource_dbsnp." --call-indels -l ".$index.".exon.bed -o ".$output."/lofreq_t.vcf ".$tumor_bam);
  system_call("lofreq call-parallel --pp-threads ".$thread." -s --sig 1 --bonf 1 -C 7 -f ".$index.
    " -S ".$resource_dbsnp." --call-indels -l ".$index.".exon.bed -o ".$output."/lofreq_n.vcf ".$normal_bam);

We are not sure why the significance level (--sig) is set to 0.1 for the tumor sample but to 1 for the normal sample. Was this a typo, or is it intentional?

Thank you in advance!

Best regards,
Jianning
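The two LoFreq calls differ only in `--sig`, the p-value cutoff (`--bonf` is the Bonferroni factor, 1 in both calls). The toy model below is an assumption, not LoFreq's implementation: it treats a candidate site as called when its p-value times the Bonferroni factor is below the cutoff and ignores every other filter. It illustrates that a cutoff of 1 keeps nearly every candidate. One plausible reading, not confirmed in the source, is that the normal sample is called permissively so that weak evidence of a variant in the normal can later veto a somatic call. The site names and p-values are made up.

```python
from typing import Dict, List


def called(pvalues: Dict[str, float], sig: float, bonf: int = 1) -> List[str]:
    """Toy model: keep candidates whose Bonferroni-scaled p-value is below sig.

    Not LoFreq's implementation; coverage, quality and other filters are ignored.
    """
    return [site for site, p in pvalues.items() if p * bonf < sig]


# Hypothetical candidate sites and p-values, for illustration only.
candidates = {"site_a": 1e-6, "site_b": 0.03, "site_c": 0.4}

tumor_calls = called(candidates, sig=0.1)   # mirrors --sig 0.1 --bonf 1
normal_calls = called(candidates, sig=1.0)  # mirrors --sig 1 --bonf 1
```

Here `tumor_calls` keeps only `site_a` and `site_b`, while `normal_calls` keeps all three sites.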

Does it work with 10x scRNA-seq BAMs?

Hi,

I am just wondering whether this workflow works with 10x data.

Wilson

About LocatIt_

Hi Professor Wang (@wtwt5237),
Sorry to bother you, and thank you for such excellent work! I am trying to use QBRC-Somatic-Pipeline on my deep exome sequencing data. I see that the pipeline uses LocatIt_v4.0.1.jar, and I get an error at the "mark duplicates" step when it runs; I don't know why. Also, what is the difference between LocatIt and Picard for the "mark duplicates" step?
Errors:
Saving /tumor/tmp/_login01_fccf7c70-7e84-4817-83b3-3f6c7c576376_041.bam, #reads: 800000 (0), 703822 amplicons written to file.
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
at java.lang.StringCoding.encode(StringCoding.java:344)
at java.lang.StringCoding.encode(StringCoding.java:387)
at java.lang.String.getBytes(String.java:958)
at com.agilent.locatit.main.SequencingRead.readBarcode(SequencingRead.java:485)
at com.agilent.locatit.main.MolecularBarcodePairedEndProcess.createSequencingReadsFromSAMRecs(MolecularBarcodePairedEndProcess.java:295)
at com.agilent.locatit.main.MolecularBarcodePairedEndProcess.processRestOfCache(MolecularBarcodePairedEndProcess.java:580)
at com.agilent.locatit.main.MolecularBarcodePairedEndProcess.locate(MolecularBarcodePairedEndProcess.java:678)
at com.agilent.locatit.main.LocatIt.main(LocatIt.java:665)

Internal Error caught: 30.
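`java.lang.OutOfMemoryError: GC overhead limit exceeded` is raised when the JVM spends nearly all of its time in garbage collection while freeing very little heap, which usually means the heap is too small for the job. A common first step is to raise the maximum heap with the standard JVM option `-Xmx`. The Python sketch below only inserts that option in front of `-jar`; the `16g` default and the function name `with_heap` are assumptions, and the LocatIt arguments are passed through untouched because they are not shown here.

```python
from typing import List


def with_heap(cmd: List[str], heap: str = "16g") -> List[str]:
    """Return a copy of a `java ... -jar X.jar ...` command with -Xmx<heap>.

    Any existing -Xmx option is dropped; the new one is placed right before
    -jar so the JVM reads it as a JVM option, not a program argument.
    """
    if "-jar" not in cmd:
        raise ValueError("expected a 'java -jar' command")
    jar_at = cmd.index("-jar")
    jvm_opts = [a for a in cmd[1:jar_at] if not a.startswith("-Xmx")]
    return [cmd[0], *jvm_opts, f"-Xmx{heap}", *cmd[jar_at:]]
```

Whether 16 GB suffices depends on the data, and the node (or scheduler allocation) must actually provide that much memory.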

Problem with GATK3

Hi @tianshilu,
You used GATK3 in somatic.pl, but when I run GATK3, its RealignerTargetCreator, IndelRealigner and PrintReads functions cannot be found; maybe my GATK3 has been replaced by GATK4. When I use GATK4 instead, RealignerTargetCreator, IndelRealigner and PrintReads all fail. Someone told me that RealignerTargetCreator and IndelRealigner have little impact, so I wonder whether it is OK to run somatic.pl with GATK4 and skip the RealignerTargetCreator and IndelRealigner steps.

A USER ERROR has occurred: IndelRealigner is no longer included in GATK as of version 4.0.0.0. Please use GATK3 to run this tool

Thanks!
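As the error message says, GATK 4 no longer ships IndelRealigner (RealignerTargetCreator was removed as well), and base-quality recalibration is applied with `ApplyBQSR` instead of GATK3's `PrintReads -BQSR`. The hypothetical Python helper below is not part of somatic.pl; it only encodes that tool mapping so the step list can be chosen by GATK major version. Whether dropping realignment is acceptable for this pipeline is the open question of this issue and is not answered here.

```python
from typing import List

# Tools for the realignment + BQSR stage, keyed by GATK major version.
# GATK 4 removed indel realignment and replaced `PrintReads -BQSR` with ApplyBQSR.
_STEPS = {
    3: ["RealignerTargetCreator", "IndelRealigner", "BaseRecalibrator", "PrintReads"],
    4: ["BaseRecalibrator", "ApplyBQSR"],
}


def preprocessing_steps(gatk_major: int) -> List[str]:
    """Return the ordered GATK tools to run for the given major version."""
    try:
        return list(_STEPS[gatk_major])
    except KeyError:
        raise ValueError(f"unsupported GATK major version: {gatk_major}") from None
```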

Problems with job_somatic.pl

Hi Tianshi,
Sorry to trouble you. I am trying to use your job_somatic.pl, but I get the error "Illegal modulus zero at /home/QBRC-somatic-mutation/job_somatic.pl line 31, line 1".
This is my command: "perl /home/QBRC-somatic-mutation/job_somatic.pl somatic_design.txt example.sh 32 hg38 $genomeFasta /usr/bin/java /output 0 2 /disambiguate_3human_hepa"

somatic_design.txt:
RNA:home/SRR8990697_R1.fastq.gz home/SRR8990697_R2.fastq.gz NA NA ./output human
RNA:home/SRR8990698_R1.fastq.gz home/SRR8990698_R2.fastq.gz NA NA ./output human

Best wishes!

Issues on somatic.pl

Dear @tianshilu ,

Hi! My colleague and I learned about the QBRC Somatic Pipeline several months ago from a Cell paper discussed in our journal club, and thanks to your pipeline, we have managed to build a local pipeline on our computer cluster.

However, there are still several lines of code in somatic.pl that we could not fully understand. If you could help us, I would highly appreciate it. Thank you in advance!

Here are the issues:

 system_call("annotate_variation.pl -geneanno -dbtype refGene -buildver ".$build." ".$output."/".$type."_mutations_".$build.".txt ".$annovar_path.$annovar_db);
  system_call("coding_change.pl --includesnp --alltranscript --newevf ".$output."/".$type."_mutations_".$build.".txt_tmp.txt ".$output."/".$type."_mutations_".$build.".txt".
    ".exonic_variant_function ".$annovar_path.$annovar_db."/".$build."_refGene.txt ".$annovar_path.$annovar_db."/".$build."_refGeneMrna.fa >/dev/null 2>/dev/null");
  system_call("Rscript ".$path."/somatic_script/add_fs_annotation.R ".$output." ".$build." ".$type);
  system_call("rm -f ".$output."/".$type."_mutations_".$build.".txt?*");
}

These lines end by calling add_fs_annotation.R. However, in the home directory we also find a filter.R script. What is filter.R used for? Does it need to be run before add_fs_annotation.R?
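The last `system_call` in that snippet removes intermediate files with the shell glob `<type>_mutations_<build>.txt?*`. In a glob, `?` matches exactly one character and `*` any number, so the pattern matches every file whose name continues past `.txt` but not the `.txt` table itself. Python's `fnmatch.fnmatchcase` uses the same `?`/`*` semantics, so the effect can be checked on base names (the Perl code also prefixes `$output/`). The file names below are built from names in the code, with `somatic` and `hg38` chosen for illustration; the helper name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def removed_by_cleanup(filenames: List[str], type_: str, build: str) -> List[str]:
    """Base names matched by the glob `<type>_mutations_<build>.txt?*` in somatic.pl."""
    pattern = f"{type_}_mutations_{build}.txt?*"
    return [name for name in filenames if fnmatchcase(name, pattern)]


files = [
    "somatic_mutations_hg38.txt",                          # final table: kept
    "somatic_mutations_hg38.txt.exonic_variant_function",  # annotate_variation.pl output: removed
    "somatic_mutations_hg38.txt_tmp.txt",                  # file passed to --newevf: removed
]
```

`removed_by_cleanup(files, "somatic", "hg38")` returns the last two names, so only the `.txt` table survives the `rm -f`.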

Unable to access picard.jar

Hi,
I have a problem running somatic.pl. This is the error: "Error: Unable to access jarfile /QBRC-somatic-mutation/somatic_script//somatic_script/picard.jar". This is my command: perl /QBRC-somatic-mutation/somatic_script/somatic.pl NA NA SRR7246238_1.fastq.gz SRR7246238_2.fastq.gz 32 hg38 $gatkgenomeFasta /usr/bin/java $output human 1 /QBRC-somatic-mutation/disambiguate_pipeline .
I do have picard.jar at "/QBRC-somatic-mutation/somatic_script/somatic_script". Could the error be caused by the extra slash in "//somatic_script"? When I tried deleting the extra slash, a new problem appeared:

Use of /c modifier is meaningless in s/// at /QBRC-somatic-mutation/somatic_script/somatic.pl line 74.
String found where operator expected at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75, near "$mutect=$path.""
(Missing semicolon on previous line?)
Use of /c modifier is meaningless without /g at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75.
String found where operator expected at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75, near "$picard=$path.""
(Missing semicolon on previous line?)
Use of /c modifier is meaningless without /g at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75.
String found where operator expected at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75, near "$bam2fastq=$path.""
(Missing semicolon on previous line?)
Unknown regexp modifier "/t" at /QBRC-somatic-mutation/somatic_script/somatic.pl line 74, at end of line
Unknown regexp modifier "/_" at /QBRC-somatic-mutation/somatic_script/somatic.pl line 74, at end of line
Unknown regexp modifier "/t" at /QBRC-somatic-mutation/somatic_script/somatic.pl line 74, at end of line
syntax error at /QBRC-somatic-mutation/somatic_script/somatic.pl line 75, near "$mutect=$path.""
/QBRC-somatic-mutation/somatic_script/somatic.pl has too many errors.
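On POSIX systems, consecutive slashes inside a path act as a single separator, so `somatic_script//somatic_script/picard.jar` names the same file as `somatic_script/somatic_script/picard.jar`. The double slash is therefore unlikely to explain the jarfile error, and the syntax errors above appeared only after line 74 was edited. The Python check below illustrates the equivalence with `posixpath.normpath` (used explicitly so the result does not depend on the host OS) on the path from the error message; the helper name is hypothetical.

```python
import posixpath


def same_posix_path(a: str, b: str) -> bool:
    """True if two POSIX paths are identical after posixpath.normpath."""
    return posixpath.normpath(a) == posixpath.normpath(b)


logged = "/QBRC-somatic-mutation/somatic_script//somatic_script/picard.jar"
expected = "/QBRC-somatic-mutation/somatic_script/somatic_script/picard.jar"
```

Since the two paths are equivalent, reverting the edit to somatic.pl and checking that the jar at the normalized path exists and is readable is the more promising route.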


Issue on filter_vcf.R

Dear @tianshilu ,

Thank you for dealing with the issue I raised last time! That helped us a lot.

In filter_vcf.R, I noticed the following in lines 108 to 122:

  if (caller!="strelka_germline") 
  {
    vcf=vcf[vcf$normal_ref+vcf$normal_alt>=7,]
    vcf=vcf[vcf$tumor_alt>=3,]
    if (type=="somatic")
    {
      vcf=vcf[vcf$normal_alt/(vcf$normal_ref+vcf$normal_alt)<
                vcf$tumor_alt/(vcf$tumor_ref+vcf$tumor_alt)/2,]
      vcf=vcf[vcf$normal_alt/(vcf$normal_ref+vcf$normal_alt)<0.05,]
    }else
    {
      vcf=vcf[vcf$normal_alt>=3,]
    } 
  }else # for tumor-only calling, make the calling super sensitive
  {
    vcf=vcf[vcf$normal_ref+vcf$normal_alt>=3,]
    vcf=vcf[vcf$normal_alt>=1,]
  }

Several filtering criteria are applied here, and I am curious and a bit confused about why they were chosen. More specifically:

For tumor-normal samples, the filter requires vcf=vcf[vcf$normal_ref+vcf$normal_alt>=7,] and vcf=vcf[vcf$tumor_alt>=3,]. Why must the ref and alt reads in the normal sample add up to at least 7, and the alt reads in the tumor be at least 3? Is it because of the usual sequencing depth and coverage of tumor samples?
For somatic mutations, it requires vcf=vcf[vcf$normal_alt/(vcf$normal_ref+vcf$normal_alt)< vcf$tumor_alt/(vcf$tumor_ref+vcf$tumor_alt)/2,]. Why does the tumor allele fraction need to be divided by two here specifically?

I would highly appreciate it if you would like to help me on this issue. Thank you in advance!

Best wishes,
Jianning
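The somatic branch of that R filter can be restated as a per-variant predicate. The Python sketch below is a direct translation of the R conditions for `caller != "strelka_germline"` and `type == "somatic"`, using the same column names; the function name and the read counts in the examples are hypothetical. The halving rule means the tumor variant allele fraction (VAF) must be more than twice the normal VAF, and separately the normal VAF must stay below 5%.

```python
from typing import Mapping


def passes_somatic_filter(v: Mapping[str, int]) -> bool:
    """Restatement of filter_vcf.R for caller != "strelka_germline", type == "somatic"."""
    normal_depth = v["normal_ref"] + v["normal_alt"]
    tumor_depth = v["tumor_ref"] + v["tumor_alt"]
    if normal_depth < 7 or v["tumor_alt"] < 3:
        return False  # minimum normal depth and minimum tumor alt-read support
    # tumor_alt >= 3 and normal_depth >= 7 here, so neither division is by zero
    normal_vaf = v["normal_alt"] / normal_depth
    tumor_vaf = v["tumor_alt"] / tumor_depth
    # normal VAF below half the tumor VAF, and below 5% in absolute terms
    return normal_vaf < tumor_vaf / 2 and normal_vaf < 0.05
```

For instance, 1 alt read out of 25 in the normal (VAF 4%) against 3 out of 50 in the tumor (VAF 6%) passes the 5% cap but is rejected, because 6% is not more than twice 4%.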

Is there a labeling error in Fig. 1B.1?

Dear professors,

I am sorry to trouble you again! I am using the K563 patient in the CML dataset and want to reproduce the heatmap and histogram in Fig. 1A/B. I have some questions about this figure.

(1) First, I cannot find the variant ChrX 12975141 T-->A in my variant-calling results. All the other mutation sites you listed in Fig. 1B are present in our results, with similar trends. (The result is attached to this email.) In the figure, ChrX 12975141 T-->A has a VAF of 1 and a count of up to 80, yet it does not appear anywhere in our result files, which is very strange. So we would like to confirm whether the variant label in Fig. 1B.1 might be misplaced. This matters to us because it tells us whether there is an error in our calling pipeline, and we cannot move our work forward until we have confirmed it.

Because of your email settings, I cannot send the pictures. So we would just like you to check whether the variant ChrX 12975141 T-->A is present in your result files, such as the VCF, germline_mutations.txt, or somatic_mutations.txt.

Thanks!
Xiu
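One way to check the site in question is to scan the result VCFs directly. The Python sketch below reads standard VCF data lines (tab-separated `CHROM POS ID REF ALT ...`), skips `#` header lines, and normalizes chromosome names so that `ChrX`, `chrX` and `X` compare equal. It does not parse the pipeline's `*_mutations.txt` tables, whose column layout is not shown here; the function names are hypothetical.

```python
from typing import Iterable


def _norm_chrom(name: str) -> str:
    """Map 'ChrX', 'chrX' and 'X' to the same key."""
    name = name.strip()
    return name[3:].upper() if name.lower().startswith("chr") else name.upper()


def has_variant(vcf_lines: Iterable[str], chrom: str, pos: int, ref: str, alt: str) -> bool:
    """True if a VCF data line matches chrom/pos/ref and lists alt among its ALT alleles."""
    target = _norm_chrom(chrom)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed data line
        if (_norm_chrom(fields[0]) == target and int(fields[1]) == pos
                and fields[3].upper() == ref.upper()
                and alt.upper() in fields[4].upper().split(",")):
            return True
    return False
```

Usage, with a placeholder file name: `with open("somatic.vcf") as fh: print(has_variant(fh, "ChrX", 12975141, "T", "A"))`.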

Running somatic.pl through sbatch is very slow

Dear professors,
I am sorry to trouble you.
When I run somatic.pl from an .sbatch file to call mutations, it is extremely slow: it takes about 14 hours to reach the GATK BaseRecalibrator step.
Is there any way to speed it up?
The following is the content of the .sbatch file:
#!/bin/bash

#SBATCH -n 80
#SBATCH -t 0-30:00
#SBATCH -p xhacnormalb
#SBATCH --mem=150000
#SBATCH -o /public/home/wumeng01/NeoantigenML/SomaticMutationCalling.o
#SBATCH -e /public/home/wumeng01/NeoantigenML/SomaticMutationCalling.e

module load /public/software/modules/apps/biosoft/sambamba/0.8.1-linux-amd64
module load /public/software/modules/apps/biosoft/bwa/0.7.17-gcc-4.5.8
perl somatic.pl /public/home/wumeng01/NeoantigenML/PatientCohort/Patient1/SRR37_38N.R1.fastq.gz /public/home/wumeng01/NeoantigenML/PatientCohort/Patient1/SRR37_38N.R2.fastq.gz /public/home/wumeng01/NeoantigenML/PatientCohort/Patient1/SRR37_38T.R1.fastq.gz /public/home/wumeng01/NeoantigenML/PatientCohort/Patient1/SRR37_38T.R2.fastq.gz 32 hg38 /public/home/wumeng01/NeoantigenML/QBRC-Somatic-Pipeline/genome/hg38/hg38.fa /public/share/yujijun01/wumeng/software/java/jdk1.7.0_80/bin/java /public/home/wumeng01/NeoantigenML/QBRC-Somatic-Pipeline/output/Patient1/ human 1 /public/home/wumeng01/NeoantigenML/QBRC-Somatic-Pipeline/disambiguate_pipeline

I look forward to your suggestions. Thank you for your nice work!
Best regards,
Wu
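One thing worth checking, offered as an assumption rather than a diagnosis: in Slurm, `-n 80` requests 80 tasks, whereas a single multithreaded program such as somatic.pl normally gets its cores through `--cpus-per-task` (`-c`). When that option is set, Slurm exports `SLURM_CPUS_PER_TASK`, so the thread argument (32 in the command above) can be kept consistent with the allocation. The hypothetical Python helper below only shows that bookkeeping.

```python
import os
from typing import Mapping, Optional


def slurm_threads(env: Optional[Mapping[str, str]] = None, default: int = 1) -> int:
    """Thread count for somatic.pl, taken from SLURM_CPUS_PER_TASK when it is set."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK", "")
    # fall back to the default when the variable is unset, non-numeric or zero
    return int(value) if value.isdigit() and int(value) > 0 else default
```

With `#SBATCH -n 1` and `#SBATCH -c 32`, `slurm_threads()` returns 32 inside the job, matching the thread argument passed to somatic.pl.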

What do the normal/tumor input files in somatic.pl mean?

Dear Professors (@wtwt5237),

Recently I have been learning this pipeline and hope to apply it to our own data. However, I have some questions about the input data for somatic.pl. Why does it take both normal and tumor samples at the same time? Does it compare the tumor sample against the normal sample and report the mutations in the tumor relative to the normal?

I also noticed your note: "For tumor-only calling, put "NA NA" in the slots of the normal samples. Results will be written to germline files." So with the tumor alone we get germline files, but what would we get from the normal alone?
I do not understand how the normal/tumor pair should be defined. Are they cells from the normal and tumor tissue of one patient, or cells from a normal person and a cancer patient, respectively?

In other words, if I want to call somatic mutations in one particular tissue of a healthy person to trace its developmental lineage, how should I set up my input files?

I look forward to your suggestions. Thank you for your nice work!
Best regards,
Xiu
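The two somatic.pl command lines quoted in these issues show the positional layout: normal R1 and R2 first, then tumor R1 and R2, followed by the remaining settings (threads, build, reference FASTA, java path, output directory, species, and further options). In that layout a matched tumor/normal pair fills all four FASTQ slots, while tumor-only calling passes `NA NA` in the normal slots. The hypothetical Python helper below only assembles that argument list and passes the trailing settings through unchanged, since their full meaning is not documented here.

```python
from typing import List, Optional, Sequence, Tuple

FastqPair = Tuple[str, str]


def somatic_pl_args(tumor: FastqPair, normal: Optional[FastqPair], rest: Sequence[str]) -> List[str]:
    """Build: perl somatic.pl <normal R1> <normal R2> <tumor R1> <tumor R2> <rest...>.

    normal=None selects tumor-only mode ("NA NA" in the normal slots).
    """
    normal_slots = list(normal) if normal is not None else ["NA", "NA"]
    return ["perl", "somatic.pl", *normal_slots, *tumor, *rest]
```

For example, `somatic_pl_args(("SRR7246238_1.fastq.gz", "SRR7246238_2.fastq.gz"), None, rest)` reproduces the `NA NA` layout of the tumor-only command quoted in the picard.jar issue.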
