brentp / bwa-meth Goto Github PK
View Code? Open in Web Editor NEWfast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
Home Page: https://arxiv.org/abs/1401.1129
License: MIT License
fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
Home Page: https://arxiv.org/abs/1401.1129
License: MIT License
Hi,
it seems that bwameth.py only reverts the mapped reads back to their original state, while unmapped reads are reported in their c2t converted form in the aligned bam file.
Is this the desired behavior? For my application it would be crucial to get the original reads from the bam file also for the unmapped reads.
Is there a way to do this?
Thanks!
Hi,
I have installed bwa-meth and was running the install version on the test examples on the Git page. When I try to create the reference index files I get the following error: Traceback (most recent call last): File "/usr/local/bin/bwameth.py", line 5, in <module> pkg_resources.run_script('bwameth==0.10', 'bwameth.py') File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 487, in run_script File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1337, in run_script File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 45, in <module> checkX('samtools') File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 43, in checkX raise Exception("executable for '%s' not found" % cmd)
I have samtools1.3.1 installed and included in the path and I am wondering how to fix the error?
Thanks in advance for your help.
Hi,
I'm trying to install bwa-meth but I get the following error when trying to run
sudo python setup.py install
Traceback (most recent call last):
File "setup.py", line 5, in
import bwameth
File "/home/ilkka/biologia/bwa-meth-0.09/bwameth.py", line 42, in
checkX('samtools')
File "/home/ilkka/biologia/bwa-meth-0.09/bwameth.py", line 40, in checkX
raise Exception("executable for '%s' not found" % cmd)
Exception: executable for 'samtools' not found
It seems that the installer cannot find samtools. However, I have samtools in my $PATH. Any ideas?
Cheers,
Ilkka
I understand that bwa-meth aligns to the c2t converted reference. Would it be possible to change bwa-meth also to align to the g2a converted reference?
I believe this is what bismark calls "--pbat" option.
If not, are there any other tools out there that will wrap up "bwa mem" for bisulfite sequencing that may have this option added?
It'd be nice to add this to bioconda and Galaxy. For that, it'd be nice to include a version that's not from 2014 :) If you take care of the new release, I'll see to it getting added to bioconda and Galaxy.
Hi,
I used bwa-meth to align rust infected wheat samples to the rust genome. However, when I ran samtools flagstat I get the follow output:
22806328 + 4861610 in total (QC-passed reads + QC-failed reads)
154024 + 1050636 secondary
0 + 0 supplementary
0 + 0 duplicates
22634029 + 4846318 mapped (99.24% : 99.69%)
22652304 + 3810974 paired in sequencing
11326152 + 1905487 read1
11326152 + 1905487 read2
22435338 + 0 properly paired (99.04% : 0.00%)
22467696 + 3780390 with itself and mate mapped
12309 + 15292 singletons (0.05% : 0.40%)
30982 + 232010 with mate mapped to a different chr
13721 + 50746 with mate mapped to a different chr (mapQ>=5)
I didn't know BWA-meth gave secondary alignments. Also, the total number of QC-passed reads + QC-failed reads (27480347) is greater than the total of the pair-end reads (26463278).
Hi Brent,
I was trying to have a testrun with the example from bwa-meth and after running bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 | samtools view -b - > bwa-meth.bam
I get the following error:
Traceback (most recent call last):
File "/home/superste/.local/bin/bwameth.py", line 4, in <module>
__import__('pkg_resources').run_script('bwameth==0.2.1', 'bwameth.py')
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 750, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1527, in run_script
exec(code, namespace, namespace)
File "/home/superste/.local/lib/python2.7/site-packages/bwameth-0.2.1-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 509, in <module>
main(sys.argv[1:])
File "/home/superste/.local/lib/python2.7/site-packages/bwameth-0.2.1-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 506, in main
set_as_failed=args.set_as_failed)
File "/home/superste/.local/lib/python2.7/site-packages/bwameth-0.2.1-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 331, in bwa_mem
as_bam(cmd, fa, set_as_failed)
File "/home/superste/.local/lib/python2.7/site-packages/bwameth-0.2.1-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 341, in as_bam
sam_iter = nopen_keep_parent_stdin(pfile, 'r')
File "/home/superste/.local/lib/python2.7/site-packages/bwameth-0.2.1-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 49, in nopen_keep_parent_stdin
preexec_fn=toolshed.files.prefunc,
AttributeError: 'module' object has no attribute 'prefunc'
I tried to reinstall both bwa-meth and toolshed but the error is still there.
Thank you in advance for the help!
Stefan
[yubau@CNUH-i7 ~]$
python /local_work/bin/bwa-meth/bwameth.py
-t 12
--reference /DataShare/Transcriptome/Homo_sapiens/NCBI/ GRCh38/Sequence/BWAIndex/version0.6.0/genome.fa.bwameth.c2t
/home/yubau/Methylation/26-meth_S44_R1_001.fastq.gz
/home/yubau/Methylation/26-meth_S44_R2_001.fastq.gz > /home/yubau/26-meth_S44.sam
Traceback (most recent call last):
File "/local_work/bin/bwa-meth/bwameth.py", line 509, in
main(sys.argv[1:])
File "/local_work/bin/bwa-meth/bwameth.py", line 506, in main
set_as_failed=args.set_as_failed)
File "/local_work/bin/bwa-meth/bwameth.py", line 315, in bwa_mem
raise BWAMethException("first run bwameth.py index %s" % fa)
main.BWAMethException: first run bwameth.py index /DataShare/Transcriptome/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/v ersion0.6.0/genome.fa.bwameth.c2t
yes i already execute "bwameth.py index $REF"
[yubau@CNUH-i7 ~]$ ls /DataShare/Transcriptome/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/
genome.fa
genome.fa.ann
genome.fa.bwameth.c2t.amb
genome.fa.bwameth.c2t.bwt
genome.fa.bwameth.c2t.sa
genome.fa.fai genome.fa.sa
genome.fa.amb
genome.fa.bwameth.c2t
genome.fa.bwameth.c2t.ann
genome.fa.bwameth.c2t.pac
genome.fa.bwt
genome.fa.pac
[yubau@CNUH-i7 bwa-meth]$ ls
bam-merge-pairs.py
bwameth.py
compare
example
ez_setup.py
LICENSE
paper
README.md
requirements.txt
scripts
setup.py
[yubau@CNUH-i7 bwa-meth]$ pwd
/local_work/bin/bwa-meth
[yubau@CNUH-i7 bwa-meth]$ python -V
Python 2.7.15 :: Anaconda, Inc.
[yubau@CNUH-i7 bwa-meth]$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.
[yubau@CNUH-i7 bwa-meth]$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Why is there a problem?
In version 1.3 samtools deleted some deprecated CLI options for "samtools sort" command (see http://www.htslib.org/doc/samtools-1.3.html) :
Historically samtools sort also accepted a less flexible way of specifying the final and temporary output filenames:
samtools sort [-f] [-o] in.bam out.prefix
This has now been removed. The previous out.prefix argument (and -f option, if any) should be changed to an appropriate combination of -T PREFIX and -o FILE. The previous -o option should be removed, as output defaults to standard output
So bwa-meth.py doesn't work with latest samtools. It can fail with different kinds of errors:
Details:
$ bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix issue18 ./foo.fastq
:unning: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG ID:bm SM:bm' -t 20 /indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./foo.fastq NA'
writing to:
samtools view -bS - | samtools sort -m 2415919104 - issue18
converting reads in ./foo.fastq,NA
WARNING: running bwameth in single-end mode
[M::main_mem] read 10 sequences (900 bp)...
[M::mem_process_seqs] Processed 10 reads in 0.006 CPU sec, 0.002 real sec
[main] Version: 0.7.9a-r786
[main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -R @RG ID:bm SM:bm -t 20 /indexes/hg38.fa.bwameth.c2t </home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./foo.fastq NA
[main] Real time: 13.531 sec; CPU: 12.509 sec
Traceback (most recent call last):
File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
main(sys.argv[1:])
File "/home/user/anaconda/bin/bwameth.py", line 533, in main
set_as_failed=args.set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 243, in as_bam
assert p.wait() == 0
AssertionError
Where foo.fastq is fastq content suggested in #18
2. Exception: bad or empty fastqs
bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix SRR033942 ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra
running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG ID:bm SM:bm' -t 20 /indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra NA'
writing to:
samtools view -bS - | samtools sort -m 2415919104 - SRR033942
converting reads in ./GSM491349/sra/SRX015763/SRR033942/SRR033942.sra,NA
WARNING: running bwameth in single-end mode
[gzclose] buffer error
Traceback (most recent call last):
File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
main(sys.argv[1:])
File "/home/user/anaconda/bin/bwameth.py", line 533, in main
set_as_failed=args.set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 232, in as_bam
raise Exception("bad or empty fastqs")
Exception: bad or empty fastqs
IOError: [Errno 32] Broken pipe
bwameth.py --reference /indexes/hg38.fa --threads 20 --prefix //GSE19418_tmp/GSM491349/SRR033987_hg38 //GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz
running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG ID:bm SM:bm' -t 20 //indexes/hg38.fa.bwameth.c2t '</home/user/anaconda/bin/python /home/user/anaconda/bin/bwameth.py c2t /GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz NA'
writing to:
samtools view -bS - | samtools sort -m 2415919104 - /GSE19418_tmp/GSM491349/SRR033987_hg38
converting reads in /GSE19418_tmp/GSM491349/tmp/SRR033987.fastq.gz,NA
WARNING: running bwameth in single-end mode
[M::main_mem] read 2000000 sequences (200000000 bp)...
[M::mem_process_seqs] Processed 2000000 reads in 7131.578 CPU sec, 896.316 real sec
Traceback (most recent call last):
File "/home/user/anaconda/bin/bwameth.py", line 548, in <module>
main(sys.argv[1:])
File "/home/user/anaconda/bin/bwameth.py", line 533, in main
set_as_failed=args.set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 198, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/home/user/anaconda/bin/bwameth.py", line 238, in as_bam
out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
Solution, just use new API, available at least since 1.0 (http://www.htslib.org/doc/samtools-1.0.html)
My fixes: iromeo@c45210d and iromeo@0a1703b
What does "pct" mean?
how can I tell which site is methylated?
Is it possible to support >=Python2.6?
Update: bellow is a list of Python2.7 features I've spotted in the code
argparse
, which is only available in Python2.7. This can be fixed by adding a conditional dependency on argparse
in setup.py
.toolshed/files.py
and toolshed/pool.py
assume sys.version_info
is a namedtuple
. This is easily fixed by using plain indexing, eg. sys.version_info[0]
.Hi Brent,
I am using following to map single end RRBS-seq data with bwa-meth
python {config[bwmeth_path]} {params.custom} --reference {config[ref_fa]} {input} \
--read-group '{params.rg}' 2> {log.bwa} \
| samtools sort -m 2G -@ 5 -T {output[0]}.tmp -o {output[0]}
samtools index {output[0]}
for RRBS, one expects to see many duplicates at the same CpG (exact restriction enzyme cut site).
MethylDakel has an option --keepDups
to remain the duplicates. Do I need to mark the bam files from bwa-meth and then go with MethylDakel?
Thanks!
Tommy
lines 44-45
GSNAP:
python src/gsnap-meth.py index $REF
I think there needs to be a kmer size specified in this command following $REF
. What kmer size have you used?
Hi everyone,
Thanks for the excellent tool!
(1) Could you update the README so the --prefix
flag isn't shown? I missed the note above following the examples; it would help productivity if you just created a "legacy code" section or eliminate all references to --prefix
all together
(2) I'm a bit confused, as it appears that I do not have an output for bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t X
.
When running this command, I see the output using bwa mem
with the last item as NA
. Normally, if I were to use bwa mem alone, I would pipe this to an output. When running bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t X
as in the example https://github.com/brentp/bwa-meth/tree/master/example/, how exactly do I access the output?
I have encountered a problem with bwameth that pops up when the FASTQ comment contains a read group. In this case, bwameth only outputs the SAM header without any reads.
This is the command I run:
> bwameth.py --reference ../data/arabidopsis_thaliana/genome_assembly/TAIR10.fasta -t 4 data/test/test-line_A-R1.classified.qc.fastq data/test/test-line_A-R2.classified.qc.fastq > data/test/test-line_A.mapped.sam
The stdout/stderr output is here:
running: /home/oender/anaconda3/envs/population-epigenetics/bin/python /home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py c2t data/test/test-line_A-R1.classified.qc.fastq data/test/test-line_A-R2.classified.qc.fastq |bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG\tID:test-line_A-R.classified.qc\tSM:test-line_A-R.classified.qc' -t 4 ../data/arabidopsis_thaliana/genome_assembly/TAIR10.fasta.bwameth.c2t -
converting reads in data/test/test-line_A-R1.classified.qc.fastq,data/test/test-line_A-R2.classified.qc.fastq
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 320080 sequences (40000212 bp)...
[M::process] 0 single-end sequences; 320080 paired-end sequences
WARNING: 1709 reads with length < 80
: this program is designed for long reads
[M::process] read 121626 sequences (15199052 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 97487, 4, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (169, 215, 277)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 493)
[M::mem_pestat] mean and std.dev: (227.57, 79.20)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 601)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 320080 reads in 245.362 CPU sec, 61.338 real sec
['NM:i:21', 'MD:Z:40^GGAATTGTTGATTTGGATTT80G5', 'MC:Z:126M', 'AS:i:97', 'XS:i:83', 'RG:Z:test-line_A-R.classified.qc', 'XA:Z:f3,+14193782,40S86M,1;f3,+14204191,40S86M,1;', 'RG:Z:CB0L6ANXX:1:ATTCCT YS:Z:TTTGGATTTGGAATTGTTGAGAAAAGTTTATCGGGTTTGAGGAATTGTTGAGAAAAGTTTATTGGGTTTGAGGATTTGTTGATTAGGAGTGGAAATTGTTGAGAAAAATTTATTGGGTTTTAGGAA', 'YC:Z:CT']
700523F:121:CB0L6ANXX:1:1103:2712:2482
Traceback (most recent call last):
File "/home/oender/anaconda3/envs/population-epigenetics/bin/bwameth.py", line 4, in <module>
__import__('pkg_resources').run_script('bwameth==0.2.2', 'bwameth.py')
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/pkg_resources/__init__.py", line 664, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1444, in run_script
exec(code, namespace, namespace)
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 509, in <module>
main(sys.argv[1:])
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 506, in main
set_as_failed=args.set_as_failed)
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 331, in bwa_mem
as_bam(cmd, fa, set_as_failed)
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 353, in as_bam
for aln in handle_reads(pair_list, set_as_failed):
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 376, in handle_reads
orig_seq = aln.original_seq
File "/home/oender/anaconda3/envs/population-epigenetics/lib/python3.6/site-packages/bwameth-0.2.2-py2.7.egg-info/scripts/bwameth.py", line 284, in original_seq
return next(x for x in self.other if x.startswith("YS:Z:"))[5:]
StopIteration
[M::process] 0 single-end sequences; 121626 paired-end sequences
As you can see, RG:Z:CB0L6ANXX:1:ATTCCT
is the RG that was part of the FASTQ input:
> head -n1 data/test/test-line_A-R{1,2}.classified.qc.fastq
==> data/test/test-line_A-R1.classified.qc.fastq <==
@700523F:121:CB0L6ANXX:1:1103:2712:2482 RG:Z:CB0L6ANXX:1:ATTCCT
==> data/test/test-line_A-R2.classified.qc.fastq <==
@700523F:121:CB0L6ANXX:1:1103:2712:2482 RG:Z:CB0L6ANXX:1:ATTCCT
I think it is a bug that bwameth adds RG:Z:test-line_A-R.classified.qc
although I did not supply any read group parameter and actually want to pass through the RGs in the FASTQs. Indeed, when I run the command
bwameth.py c2t data/test/test-line_A-R1.classified.qc.fastq data/test/test-line_A-R2.classified.qc.fastq |bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -t 4 ../data/arabidopsis_thaliana/genome_assembly/TAIR10.fasta.bwameth.c2t -
(i.e., explicitly removing -R '...') everything works, although the SAM has to be converted back.
As I see it, the problem arises because of the way in which the read group argument is handled. Probably, you can leave the function bwa_mem
as it is but change how it is called. It is not quite clear but I guess in the call of bwa_mem,
rg=args.read_group or rname(*args.fastqs)
causes the trouble if I do not supply a read group parameter on the command line. Or you have to disentangle the addition of RG to the header from RGs for individual reads.
Hi,
I am trying to create reference genome from hg38 and getting following error:
/bwameth.py index hg38_no_alt.fa
converting c2t in hg38_no_alt.fa to hg38_no_alt.fa.bwameth.c2t
indexing: hg38_no_alt.fa.bwameth.c2t
[bwa_index] Pack FASTA... 53.61 sec
[bwa_index] Reverse the packed sequence... 19.25 sec
[bwa_index] Construct BWT for the packed sequence...
TextLengthFromBytePacked(): text length > 2^32!
cmd was:bwa index -a bwtsw hg38_no_alt.fa.bwameth.c2t
Traceback (most recent call last):
File "/bwa-meth-0.10/bwameth.py", line 601, in
main(sys.argv[1:])
File "/bwa-meth-0.10/bwameth.py", line 548, in main
sys.exit(bwa_index(convert_fasta(args[1])))
File "/bwa-meth-0.10/bwameth.py", line 151, in bwa_index
run("bwa index -a bwtsw %s" % fa)
File "/bwa-meth-0.10/bwameth.py", line 60, in run
list(nopen("|%s" % cmd.lstrip("|")))
File "/anaconda/lib/python2.7/site-packages/toolshed-0.4.0-py2.7.egg/toolshed/files.py", line 53, in process_iter
raise ProcessException(cmd)
toolshed.files.ProcessException: bwa index -a bwtsw hg38_no_alt.fa.bwameth.c2t
Any idea on resolving the issue?
Thanks,
Ali
Hi,
I get the following error when trying to align my reads:
[M::main_mem] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 13941, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (87, 118, 163)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 315)
[M::mem_pestat] mean and std.dev: (128.17, 55.20)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 391)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 100000 reads in 171.495 CPU sec, 171.414 real sec
Traceback (most recent call last):
File "/home/markuss/bin/bwameth.py", line 5, in <module>
pkg_resources.run_script('bwameth==0.10', 'bwameth.py')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 488, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1338, in run_script
File "/pica/h1/markuss/.local/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
main(sys.argv[1:])
File "/pica/h1/markuss/.local/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 586, in main
set_as_failed=args.set_as_failed)
File "/pica/h1/markuss/.local/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 259, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/pica/h1/markuss/.local/lib/python2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 299, in as_bam
out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
I am using bwa-meth version 0.10 and toolshed version 0.4.2 with python 2.7.6. Your provided example data works great. The bam file is created with a header, but no alignments are written.
/Markus
Hi
many thanks for developing this tools, it is super fast! I have a question. Do bwa-meth output unique mapping reads or should I add an additional filter?
Many thanks for your help
Alice
I cannot get bwameth to work.
I tried to run the following:
bwameth.py --threads 16 --reference ESR1_nmasked.fasta.bwameth.c2t DMA11392-22973-TSF-13-05-POM_S84_L001_R1_001_val_1.fq DMA11392-22973-TSF-13-05-POM_S84_L001_R2_001_val_2.fq -t 12 | samtools view -b - > bwa-meth.bam
Regardless of whether I am in the folder where these files are located or I specify the full path for the reference genome and the for/rev reads, I get the following error:
Traceback (most recent call last): File "/nv/hp10/bin/bwa-meth-0.10/bwameth.py", line 601, in <module> main(sys.argv[1:]) File "/nv/hp10/bin/bwa-meth-0.10/bwameth.py", line 586, in main set_as_failed=args.set_as_failed) File "/nv/hp10/bin/bwa-meth-0.10/bwameth.py", line 246, in bwa_mem raise BWAMethException("first run bwameth.py index %s" % fa) __main__.BWAMethException: first run bwameth.py index /nv/hp10/scratch/nmasked.fasta.bwameth.c2t
I'm using bwa-0.7.16a and bwa-meth-0.10 on a computing cluster running x86_64 x86_64 x86_64 GNU/Linux.
It would be nice for bwameth to accept interleaved reads like bwa, so you can adapter trim on a pipe (like bwakit). Would you entertain a PR to that effect?
When paired end files are sent in as
bwameth.py --reference hg19.fasta test_R1.fastq test_R2.fastq > test.sam
The output starts with:
running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p ...
The -p flag is incorrect. According to BWA:
If mates.fq file is absent and option -p is not set, this command regards input reads are single-end. If mates.fq is present, this command assumes the i-th read in reads.fq and the i-th read in mates.fq constitute a read pair. If -p is used, the command assumes the 2i-th and the (2i+1)-th read in reads.fq constitute a read pair (such input file is said to be interleaved). In this case, mates.fq is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.
Just wondering if the bam file generated by bwa-meth file can be used directly in other SNP callers like biscuit? My (naive) self is just curious since bwa-meth performs the C -> T conversion to speed up alignment.
Hi Brent,
I am aware of tseemann/snippy#100
I am using bwa/0.7.10.
but I checked https://github.com/brentp/bwa-meth/blob/master/bwameth.py, it seems you already took care of it https://github.com/brentp/bwa-meth/blob/master/bwameth.py#L293 by
escaping the tab rg = '@RG\\tID:{rg}\\tSM:{rg}'.format(rg=rg)
what's wrong here?
python /scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py --threads 8 --reference /scratch/genomic_med/apps/annot/bwa_meth_index/UCSC_hg19_genome.fa 01trimmed_fastqs/Panc1_Exo1_trimmed.fq --read-group '@RG ID:Panc1_Exo1 SM:Panc1_Exo1'
running: bwa mem -T 40 -B 2 -L 10 -CM -R '@RG ID:Panc1_Exo1 SM:Panc1_Exo1' -t 8 /scratch/genomic_med/apps/annot/bwa_meth_index/UCSC_hg19_genome.fa.bwameth.c2t '</scratch/genomic_med/apps/python/anaconda/default/bin/python /scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py c2t 01trimmed_fastqs/Panc1_Exo1_trimmed.fq NA'
[E::bwa_set_rg] no ID at the read group line
Traceback (most recent call last):
File "/scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py", line 450, in <module>
main(sys.argv[1:])
File "/scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py", line 447, in main
set_as_failed=args.set_as_failed)
File "/scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py", line 273, in bwa_mem
as_bam(cmd, fa, set_as_failed)
File "/scratch/genomic_med/apps/bwa-meth/bwa-meth-0.2.0/bwameth.py", line 290, in as_bam
raise Exception("bad or empty fastqs")
Exception: bad or empty fastqs
thanks.
When I run bwa-meth, it calls bwa-mem with these parameter set:
bwa mem -T 40 -B 2 -L 10 -CM
Would it be possible to pass-through the -T/B/L options to the script to tune it to the desired options by the user?
Morning Brent
I think the README for v0.2.0/master is a little out of date. I'm assuming you dropped the --prefix and bai indexing support from bwameth.py , as when I specify --prefix as the README suggests, this is no longer recognised by bwameth.py. And due to the new opt passthrough functionality it's passed through to bwa mem, as which point I get the following error:
mem: invalid option -- '-'
Also the installation instructions are a little out of date, reference v0.10.0 instead of v0.2.0.
I would have simply issued a pull request on the README, but I wasn't sure about the wider considerations with the latest release, so thought it best to leave it with you.
When I get time, I will try and tighten up the option pass through to make it more explicit using something like --bwa_mem_opts.
Thanks
Nathan
It would be great if the script how to get wig or bedgraph for methylation level in the usage section.
index worked, align failed see below.
python 2.7.6, Centos 6.2, setuptools 3.3,
note: lib/python2.7/site-packages/
contains only a single file: bwameth-0.09-py2.7.egg
It does not contain a directory so named, etc:
bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py
ERROR Report:
bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12
running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@rg ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'
writing to:
samtools view -bS - | samtools sort - bwa-meth
/home/pi/vogelw/bin/python: can't find 'main' module in '/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py'
[gzclose] buffer error
cmd was:bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@rg ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'
Traceback (most recent call last):
File "/home/pi/vogelw/bin/bwameth.py", line 5, in
pkg_resources.run_script('bwameth==0.09', 'bwameth.py')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 528, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1401, in run_script
File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 582, in
File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 579, in main
File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 258, in bwa_mem
File "/home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 286, in as_bam
File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 252, in reader
for toks in line_gen:
File "/home/pi/vogelw/lib/python2.7/site-packages/toolshed-0.3.6-py2.7.egg/toolshed/files.py", line 53, in process_iter
raise ProcessException(cmd)
toolshed.files.ProcessException: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@rg ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</home/pi/vogelw/bin/python /home/pi/vogelw/lib/python2.7/site-packages/bwameth-0.09-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'
I'll work on this later today... sorry I missed it earlier!
Hi,
Can you help me udnerstand this error:
...
running: samtools index Sample_CGATGT_AC4190ANXX_L004_001.bam
[fputs] Broken pipe
Seems like the BAM is ok and indexing just failed but want to be absolutely sure. Any ideas?
thanks for your help.
The current documentation states that it's possible to pass a bunch of paired reads, eg
$ bwameth.py --reference $REF foo_1.fastq,foo_2.fastq bar_1.fastq,bar_2.fastq
Traceback (most recent call last):
File "/home/user/.virtualenvs/env/bin/bwameth.py", line 581, in <module>
main(sys.argv[1:])
File "/home/user/.virtualenvs/env/bin/bwameth.py", line 576, in main
rname(*args.fastqs), calmd=args.calmd,
TypeError: rname() takes at most 2 arguments (3 given)
However, args.fastqs
are handled in way which makes this use-case impossible.
I think it would be nice to split bwa-meth
into separate subcommands, e.g.
+ bwameth
|
+-- index.py
+-- c2t.py
+-- tabulate.py
+-- cnvs.py
+-- __main__.py
Then each subcommand can be called with python -m bwameth.<cmd>
.
Hey Brent :D
So it seems after bwa-meth's bias-plot has finished running, writes it's output and closes, "samtools view -F4" continues to hang around reading the file (presumably until the end) and not close?
I guess the bias plot takes a sample of the data and not all the data when creating it's plot (because it's so quick) - so i guess samtools should close down too once its done :)
Perhaps there could be an --all flag when running the bias-plot to check all the reads? I'm a little nervous about subsamples, particularly on sorted data. All the best, and again thank you so much for this awesome mapper :)
It seems that when running tabulate like so:
/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py tabulate --map-q 60 --bissnp /ex/BisSNP-0.82.2.jar --prefix Mm01.WGBS -t 10 --reference /ref/mm10.fa --trim 3,1 /media/john/DATA1/WGBS/44_Mm01_WEAd_C2_WGBS_E_no_dupes.bam
I get the following output:
java -Xmx24g -jar /ex/BisSNP-0.82.2.jar \
-R /ref/mm10.fa \
-I /media/john/DATA1/WGBS/44_Mm01_WEAd_C2_WGBS_E_no_dupes.bam \
-T BisulfiteGenotyper \
--trim_5_end_bp 3 \
--trim_3_end_bp 1 \
-vfn1 Mm01.WGBS.meth.vcf -vfn2 Mm01.WGBS.snp.vcf \
--non_directional_protocol \
-mbq 12 \
-minConv 0 \
-toCoverage 1000 \
-mmq 60 \
-nt 10
0 T* C
0 A* G
0 A* G
0 C* T
0 T* C
0 C* T
0 C* T
0 C* T
0 T* C
0 G* A
0 T* C
0 T* C
0 C* T
0 G* A
0 A* G
0 T* C
0 C* T
Mm01.WGBS.meth.vcf
Traceback (most recent call last):
File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 601, in <module>
main(sys.argv[1:])
File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 554, in main
sys.exit(tabulate_main(args[1:]))
File "/home/john/toolshed-0.4.0/bwa-meth-0.10/bwameth.py", line 506, in tabulate_main
.format(prefix=a.prefix, sample=sample), "w")
IOError: [Errno 2] No such file or directory: 'Mm01.WGBS.C57BL/6J.meth.bed'
I think the error is about there not being a directory called Mm01.WGBS.C57BL - but it only wants to write to that directory because my RGIDs look like this:
@RG ID:HWI-ST552.C2J56ACXX.1.NA SM:C57BL/6J LB:Mm01.WGBS PL:illumina CN:Essen DS:Circadian Day
@RG ID:HWI-ST552.C2J56ACXX.2.NA SM:C57BL/6J LB:Mm01.WGBS PL:illumina CN:Essen DS:Circadian Day
@RG ID:HWI-ST552.C2J56ACXX.3.NA SM:C57BL/6J LB:Mm01.WGBS PL:illumina CN:Essen DS:Circadian Day
@RG ID:HWI-ST552.D2FWTACXX.1.CAGATCA SM:C57BL/6J LB:Mm01.WGBS PL:illumina CN:Essen DS:Circadian Day
and I think the "/" i the sample name is not being escaped/deleted when writing out the new file. Also, i'm not sure using data from the RG is a good idea regardless, since 1 BAM file can contain many RGIDs (it just so happens that in my file they all have the same SM/LB) I did however get the following outputs:
Mm01.WGBS.meth.vcf
Mm01.WGBS.meth.vcf.MethySummarizeList.txt
Mm01.WGBS.snp.vcf
All the best! :)
Thanks for the tool. One question - is there an easy way to detect from the read itself whether or not it is methylated? MethylDackel calculates on a per-cytosine basis but I'm also looking for essentially the inverse. Bismark does this with the XM tag.
I’ve noticed that running bwa meth results in a higher rate of MAPQ 0 reads than vanilla bwa (I did a test on non-bisulfite converted data). Is there any way to ameliorate this or is it just a natural consequence of needing to distinguish between methylated and nonmethylated loci?
Dear brentp,
Here's my SLURM script:
echo pwd=pwd
module load python/2.7.11
module load samtools
REFERENCE=/lustre/project/hpcstaff/user/OryziasLatipes/GCF_000313675.1_ASM31367v1_genomic.fna.bgz
echo Generating index for $REFERENCE
time bwameth.py index $REFERENCE
echo Analyzing methylation..
time bwameth.py --reference
echo Analysis succeeded.
...and here's the result of my alignment check...
37067603 + 3256037 in total (QC-passed reads + QC-failed reads)
63598 + 967992 secondary
0 + 0 supplimentary
0 + 0 duplicates
35373511 + 3256037 mapped (95.43%:100.00%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Q: This says zero reads paired, true? Is this correct?
Please let me know if you need more info, thanks.
Best,
CB
These two seem to be the bottlenecks during alignment. Have you considered using Cython or C/C++ to speed things up?
Hi Brent,
Thanks for maintaining and answering questions on bwa-meth. I am not sure I understand the YD and YC tags that are set by bwa-meth. My understanding so far:
YD tag makes note of the direction of mapping: whether it maps to the forward (f) or reverse (r) strand of the reference genome.
YC tag tells you whether or not your read mapped back to the C to T or G to A converted genome.
As bwa-meth assumes directional libraries, shouldn't the YD / YC tags always be f + CT and r + GA? When parsing through some bwa-meth aligned files, I noticed that some reads contain r + CT as well as f + GA.
Thanks for your help!
I have tried running the example through bwa-meth, but I fail at the tabulation step, any idea what goes wrong ?
jvh@host:/location_on_disk/bwa-meth/example$ ../bwameth.py index ref.fa
converting c2t in ref.fa to ref.fa.bwameth.c2t
indexing: ref.fa.bwameth.c2t
jvh@host:/location_on_disk/bwa-meth/example$ ../bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12
running: bwa mem -T 40 -B 3 -L 25 -CM -U 100 -p -R '@RG ID:t_R SM:t_R' -t 12 ref.fa.bwameth.c2t '</usr/bin/python ../bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz'
writing to:
samtools view -bS - | samtools sort -@3 - bwa-meth
converting reads in t_R1.fastq.gz,t_R2.fastq.gz
[M::main_mem] read 92726 sequences (9365326 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 44884, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (109, 137, 175)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 307)
[M::mem_pestat] mean and std.dev: (145.08, 48.71)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 373)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[main] Version: 0.7.6a-dev-r434
[main] CMD: bwa mem -T 40 -B 3 -L 25 -CM -U 100 -p -R @RG ID:t_R SM:t_R -t 12 ref.fa.bwameth.c2t </usr/bin/python ../bwameth.py c2t t_R1.fastq.gz t_R2.fastq.gz
[main] Real time: 9.615 sec; CPU: 16.204 sec
running: samtools index bwa-meth.bam
[E::hts_idx_push] chromosome blocks not continuous
jvh@host:/location_on_disk/bwa-meth/example$ samtools flagstat bwa-meth.bam
92791 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
92723 + 0 mapped (99.93%:-nan%)
92791 + 0 paired in sequencing
46399 + 0 read1
46392 + 0 read2
92276 + 0 properly paired (99.44%:-nan%)
92652 + 0 with itself and mate mapped
71 + 0 singletons (0.08%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
jvh@host:/location_on_disk/bwa-meth/example$ ../bwameth.py tabulate --trim 3,3 --map-q 60 --bissnp $BISSNP --prefix ex -t 12 --reference ref.fa bwa-meth.bam
Traceback (most recent call last):
File "../bwameth.py", line 500, in <module>
main(sys.argv[1:])
File "../bwameth.py", line 475, in main
sys.exit(tabulate_main(args[1:]))
File "../bwameth.py", line 407, in tabulate_main
% a.reference)
AssertionError: run samtools faidx ref.fa
Hi Brent,
I am using bwa-meth to align paired end reads with the alignment failing due to an IO error,
I am using
toolshed 0.4.6
bwa-meth-0.10
bwa 0.7.15-r1140
samtools 1.3.1
See the cmd line output below,
Kind Regards
Nicola
running: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@RG ID:BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R SM:BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R' -t 8 hg19_bwameth/human_g1k_v37.fasta.bwameth.c2t '</usr/bin/python /Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py c2t BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R1.fastq BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R2.fastq'
writing to:
samtools view -bS - | samtools sort - BP_1207.lane7
[M::bwa_idx_load_from_disk] read 0 ALT contigs
converting reads in BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R1.fastq,BP_1207.BPR.150912.HiSeq2500.FCB.lane7.R2.fastq
[M::process] read 634922 sequences (80000172 bp)...
[M::process] 0 single-end sequences; 634922 paired-end sequences
[M::process] read 634922 sequences (80000172 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 101430, 14, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (105, 126, 155)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (5, 255)
[M::mem_pestat] mean and std.dev: (131.80, 37.94)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 305)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (198, 445, 1766)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4902)
[M::mem_pestat] mean and std.dev: (824.00, 985.05)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 6470)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 634922 reads in 953.682 CPU sec, 119.192 real sec
Traceback (most recent call last):
File "/usr/local/bin/bwameth.py", line 5, in <module>
pkg_resources.run_script('bwameth==0.10', 'bwameth.py')
File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 487, in run_script
ns.clear()
File "build/bdist.macosx-10.9-intel/egg/pkg_resources.py", line 1337, in run_script
raise AssertionError(
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
main(sys.argv[1:])
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 586, in main
set_as_failed=args.set_as_failed)
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 259, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 299, in as_bam
out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in <module>
main(sys.argv[1:])
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 551, in main
sys.exit(convert_reads(args[1], args[2]))
File "/Library/Python/2.7/site-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 102, in convert_reads
out.write("".join((name, seq, "\n+\n", qual)))
IOError: [Errno 32] Broken pipe
maybe a function to remove the reads for which the bisuflite conversion is failed (multiple methylated CHH)
Hi,
I am running bwameth.py tabulate
on an Ecoli bisulfite treated sample aligned with bismark/bowtie2.
I was getting no methylation calls with the default parameters, so I changed to --context all
and ran a parameter sweep on the --map-q
option. If I run this on the first 1000bp of the Ecoli genome, I get no calls for mapq values ranging from 60-43, but then I get 138 lines in the meth.vcf file from map-q 42 downwards.
Any ideas what would be a good way of running bwameth/BisSNP on Ecoli data like this?
I read a paper that says Ecoli methylation is found in around 1-2 cases every 1000bp for the CmCWGG motif, but I am unsure how to specify that in BisSNP. Is it possible?
Hey Brent,
It's been a while since I used bwa-meth, but I've recently generated a bunch of WGBS data and I was comparing alignment stats from mapping with Bismark vs bwa-meth on our real data (some other mappers too, but I'm unlikely to proceed with those).
For one of the samples Bismark reported an alignment rate of 58.8% in its output log file. My understanding is that Bismark only considers an alignment for downstream mC calling if it is unique and both reads of a pair are aligned.
I aligned the same sample with bwa-meth and the samtools flagstat output indicates that 96.92% are properly paired (both mates mapped to the same chromosome, etc.). This seemed unrealistically high to me. I wasn't sure if samtools flagstat was including multimapped reads so I went ahead and ran samtools view -c -F 0xF00 -f 66. Which should equate to counting the alignments which are NOT supplemental, secondary, QC-failed, or duplicates (MethylDackel uses those same ignore sam flags), and ARE the first mate of a pair, and are aligned in a proper pair. This resulted in an alignment rate of 96.5%. Still suspiciously high.
How would you recommend calculating the alignment rate after mapping with bwa-meth? Or is it really that much better?!
Best,
Jeff
Hello Brent,
I make simulation fastq files and then use bwa-meth to calculate methylation rate. First, i simulate RRBS reads. Second, i trim those RRBS reads using trim_galore. Then i use bwa-meth to calculate the methylation rate. But after running bwameth.py, it says
WARNING: 5355 reads with length < 80
: this program is designed for long reads
Could you be kind enough to help me solve this problem ?
Thanks
Bowtie has --norc
option
If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand.
I'm not aware of a similar option for BWA, but I think it makes sense as a default strategy for bs-seq alignments, because the reads from the reverse strand don't add any extra information in terms of methylation status.
What do you think?
--Hi,
is Bwa-Meth use a similar algorithm as ERNE-BS5 (https://sourceforge.net/projects/erne/) ?
And is somebody has already try to align and tabulate Bs-seq reads with Bwa-Meth and then use to-mr from MethPipe (https://github.com/smithlabcode/methpipe) ?
thank you --
Hello, the master
version seems to be broken after 1eaa967.
Can you please take a look at bwameth.py:291
?
I got this several error when run BWA-METH.
/haifengc/panfs/bireads/SRR1171540_1_1000000.fastq NA'
writing to:
samtools view -bS - | samtools sort - bwa-meth
[M::bwa_idx_load_from_disk] read 0 ALT contigs
converting reads in /home/rcf-40/haifengc/panfs/bireads/SRR1171540_1_1000000.fastq,NA
WARNING: running bwameth in single-end mode
[M::process] read 666668 sequences (60000120 bp)...
[M::process] read 333332 sequences (29999880 bp)...
[M::mem_process_seqs] Processed 666668 reads in 316.005 CPU sec, 319.250 real sec
Traceback (most recent call last):
File "bwameth.py", line 601, in
main(sys.argv[1:])
File "bwameth.py", line 586, in main
set_as_failed=args.set_as_failed)
File "bwameth.py", line 259, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "bwameth.py", line 299, in as_bam
out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
[fputs] Broken pipe
Errors, raised by samtools view ... | samtools sort ...
aren't reported, due to the buffering of sys.stderr
. In my case the error was:
[samopen] SAM header is present: 25 sequences.
Parse error at line 31: sequence and quality are inconsistent
Instead of outputting the error bwa-meth
printed the following traceback:
Traceback (most recent call last):
File "/home/user/.virtualenvs/env/bin/bwameth.py", line 10, in <module>
execfile(__file__)
File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 596, in <module>
main(sys.argv[1:])
File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 581, in main
set_as_failed=args.set_as_failed)
File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 259, in bwa_mem
as_bam(cmd, fa, prefix, calmd, set_as_failed)
File "/home/user/.virtualenvs/env/src/bwameth-head/bwameth.py", line 296, in as_bam
out.write(str(aln) + '\n')
IOError: [Errno 32] Broken pipe
The traceback makes perfect sense, because after reporting the error samtools
exited and the file descriptor behind out
became stale. However, the original error is much more helpful.
I'm ready to submit a patch for this.
It's currently not possible to configure the memory limit for samtools sort
. Can you please add a command line option?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.