csmiller / emirge
EMIRGE reconstructs full length ribosomal genes from short read sequencing data.
I have come across an error while attempting to run EMIRGE (listed below). Does anyone know the possible cause? Thanks!
Traceback (most recent call last):
File "/work/kgwin1/packages/python/bin/emirge.py", line 1616, in <module>
main()
File "/work/kgwin1/packages/python/bin/emirge.py", line 1609, in main
do_iterations(em, max_iter = options.iterations, save_every = options.save_every)
File "/work/kgwin1/packages/python/bin/emirge.py", line 1348, in do_iterations
em.do_iteration(em.current_bam_filename, em.current_reference_fasta_filename)
File "/work/kgwin1/packages/python/bin/emirge.py", line 432, in do_iteration
self.calc_likelihoods()
File "/work/kgwin1/packages/python/bin/emirge.py", line 978, in calc_likelihoods
self.calc_probN() # (handles initial iteration differently within this method)
File "/work/kgwin1/packages/python/bin/emirge.py", line 1139, in calc_probN
bases = numpy.array(self.fastafile.fetch(fastaname), dtype='c')[zero_indices[0]]
IndexError: index 1399 out of bounds 0<=index<0
Hi,
I've been having some difficulties for a while resuming Emirge runs from a completed iteration.
All my last attempts resulted in the same errors. See below:
If you use EMIRGE in your work, please cite these manuscripts, as appropriate.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.
Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.
imported _emirge C functions from: /home/pierre.pericard/anaconda3/envs/py27/lib/python2.7/site-packages/_emirge.so
Command:
/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py emirge_outdir -1 /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/SRS049896.denovo_duplicates_marked.trimmed.1.fastq -2 /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/SRS049896.denovo_duplicates_marked.trimmed.2.fastq -f /workdir/pierre.pericard/paper/16S_rRNA/ref_db/emirge_default_db/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b /workdir/pierre.pericard/paper/16S_rRNA/ref_db/emirge_default_db/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -l 101 -i 150 -s 50 -j 1 -p 0.001 -n 100 -r 9 -a 16 --phred33
EMIRGE started at Fri Jan 13 11:31:25 2017
Resuming EMIRGE from iteration 09 at Fri Jan 13 11:31:25 2017 ...
Starting from information in directory:
/workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/emirge/raw_default_db_j1_p0.001/emirge_outdir/iter.09
DONE with resume initialization at Fri Jan 13 11:31:25 2017...
Starting iteration 9 at Fri Jan 13 11:31:25 2017...
Reading bam file /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/emirge/raw_default_db_j1_p0.001/emirge_outdir/iter.08/bowtie.iter.08.PE.bam at Fri Jan 13 11:31:25 2017...
DONE Reading bam file /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/emirge/raw_default_db_j1_p0.001/emirge_outdir/iter.08/bowtie.iter.08.PE.bam at Fri Jan 13 11:32:32 2017 [0:01:07.117363]...
Calculating likelihood (13671, 378106) for iteration 9 at Fri Jan 13 11:32:35 2017...
Calculating Pr(N=n) for iteration 9 at Fri Jan 13 11:32:35 2017...
Loading probN for resume case from /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/emirge/raw_default_db_j1_p0.001/emirge_outdir/iter.09/probN.pkl
DONE calculating Pr(N=n) for iteration 9 at Fri Jan 13 11:32:37 2017 [0:00:02.734468]...
DONE Calculating likelihood for iteration 9 at Fri Jan 13 11:34:30 2017 [0:01:55.514443]...
Calculating posteriors for iteration 9 at Fri Jan 13 11:34:30 2017...
/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py:1180: RuntimeWarning: invalid value encountered in divide
self.posteriors[-1].data = self.posteriors[-1].data / denom[(self.posteriors[-1].col,)] # index out denom with column indices from coo format.
DONE Calculating posteriors for iteration 9 at Fri Jan 13 11:34:34 2017 [3.878 seconds]...
Writing consensus for iteration 9 at Fri Jan 13 11:34:34 2017...
snp_minor_prob_thresh = 0.100
snp_percentage_thresh = 0.001
Traceback (most recent call last):
File "/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py", line 1697, in <module>
main()
File "/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py", line 1688, in main
do_iterations(em, max_iter = options.iterations, save_every = options.save_every)
File "/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py", line 1444, in do_iterations
os.path.join(subdir, "iter.%02d.cons.fasta"%(em.iteration_i)))
File "/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py", line 499, in do_iteration
self.write_consensus(consensus_filename) # culls and splits
File "/home/pierre.pericard/anaconda3/envs/py27/bin/emirge.py", line 590, in write_consensus
if self.min_depth is not None and self.coverage[seq_i] < self.min_depth: # could also do this only after self.iteration_i > 5 or something
IndexError: list index out of range
Command exited with non-zero status 1
Command being timed: "emirge.py emirge_outdir -1 /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/SRS049896.denovo_duplicates_marked.trimmed.1.fastq -2 /workdir/pierre.pericard/paper/16S_rRNA/human_microbiome_project/SRS049896/SRS049896.denovo_duplicates_marked.trimmed.2.fastq -f /workdir/pierre.pericard/paper/16S_rRNA/ref_db/emirge_default_db/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b /workdir/pierre.pericard/paper/16S_rRNA/ref_db/emirge_default_db/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -l 101 -i 150 -s 50 -j 1 -p 0.001 -n 100 -r 9 -a 16 --phred33"
User time (seconds): 185.75
System time (seconds): 6.40
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:13.40
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 8311924
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 62
Minor (reclaiming a frame) page faults: 5182494
Voluntary context switches: 9793
Involuntary context switches: 765
Swaps: 0
File system inputs: 798784
File system outputs: 140032
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
Rewrites the reads then dies
DONE Rewriting reads with indexes in headers at Thu Apr 18 00:12:57 2013 [0:01:21.308602]...
Number of reads (or read pairs) in input file(s): 3017638
Preallocating reads and quals in memory at Thu Apr 18 00:12:57 2013...
Traceback (most recent call last):
File "/srv/whitlam/bio/apps/12.04/sw//emirge/0.60/bin/emirge_amplicon.py", line 1543, in <module>
main()
File "/srv/whitlam/bio/apps/12.04/sw//emirge/0.60/bin/emirge_amplicon.py", line 1509, in main
rewrite_reads = not options.no_rewrite_reads)
File "/srv/whitlam/bio/apps/12.04/sw//emirge/0.60/bin/emirge_amplicon.py", line 221, in __init__
_emirge.populate_reads_arrays(self)
File "_emirge_amplicon.pyx", line 525, in _emirge_amplicon.populate_reads_arrays (_emirge_amplicon.c:6577)
IndexError: Out of bounds on buffer access (axis 2)
Switch to importing setuptools instead of distutils.core and add
install_requires=[
"BioPython",
"Cython",
"pysam",
"scipy"
],
to the setup() call. The installation will then automatically ensure that these dependencies are installed on the machine.
Many of EMIRGE's largest data structures could be converted to sparse matrices, which should reduce memory usage substantially.
Hi,
We have a project that aims to reconstruct ribosomal genes from a meta-RNA dataset. I know emirge_amplicon can handle up to a few million reads while emirge is used for the normal case, but there is no detailed description of this in your previous paper. I have some questions about these two versions:
Thanks :)
Best regards
Recently received an error when running the emirge_rename_fasta.py script.
$ emirge_rename_fasta.py iter.40 > iter.40.cons.rn.fasta
Traceback (most recent call last):
File "/home/micro/anaconda2/bin/emirge_rename_fasta.py", line 164, in <module>
main()
File "/home/micro/anaconda2/bin/emirge_rename_fasta.py", line 159, in main
rename(wd, options.prob_min, options.record_prefix, options.no_N, options.no_trim_N)
File "/home/micro/anaconda2/bin/emirge_rename_fasta.py", line 123, in rename
for prior, record in sorted(sorted_records, reverse=True):
File "/home/micro/anaconda2/lib/python2.7/site-packages/Bio/SeqRecord.py", line 720, in __eq__
raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.
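One plausible reading of this traceback: `sorted()` is comparing `(prior, record)` tuples, and whenever two priors are equal the tuple comparison falls through to the records, which Biopython refuses to compare. A minimal sketch of sorting on the prior alone is below. `FakeRecord` and `sort_by_prior` are hypothetical names used so the example runs without Biopython; they are not taken from emirge_rename_fasta.py.

```python
from operator import itemgetter


class FakeRecord(object):
    """Hypothetical stand-in for a SeqRecord that refuses comparison."""

    def __init__(self, name):
        self.name = name

    def _refuse(self, other):
        raise NotImplementedError(
            "SeqRecord comparison is deliberately not implemented.")

    __eq__ = __ne__ = __lt__ = __le__ = __gt__ = __ge__ = _refuse


def sort_by_prior(pairs):
    """Sort (prior, record) pairs by descending prior.

    Only the first tuple element is used as the sort key, so the
    records themselves are never compared, even when priors tie.
    """
    return sorted(pairs, key=itemgetter(0), reverse=True)


pairs = [(0.2, FakeRecord("a")), (0.5, FakeRecord("b")), (0.2, FakeRecord("c"))]
ordered = sort_by_prior(pairs)
names = [record.name for _, record in ordered]
```

Because Python's sort is stable, records with equal priors keep their original relative order.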
Hi,
Is EMIRGE still maintained and is it being tested with newer versions of usearch, bowtie and samtools as well as the required python packages? I'm trying to install it centrally on our cluster but hitting segmentation faults:
/tmp/1453995841.5729613: line 8: 19013 Segmentation fault emirge.py /lustre/scratch108/pathogen/maa/emirge/ -1 18512_8#4_1.fastq.gz -f ../silva/filtered_SILVA_123_SSURef_Nr99_tax_silva.fasta -b ../silva/filtered_SILVA_123_SSURef_Nr99_tax_silva.fasta -l 75 --phred33
Thanks,
Martin
The link to download the candidate db file doesn't work anymore.
https://googledrive.com/host/0B7hz7JVEE15dbUtkRmxKVlhtd1U/SSURef_111_candidate_db.fasta.gz
Could you update it, please?
I'd like to use pre-trimmed FASTA files with EMIRGE. I believe this is supported in Bowtie. Is it possible to add this feature?
Apparently, running emirge_makedb.py from behind a proxy server can be problematic.
This could be worked around most easily if users had the option to download the databases from SILVA manually and pass them to emirge_makedb.py as an argument.
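The suggestion could be sketched roughly as follows: prefer a file the user has already downloaded and only fall back to fetching from SILVA when none is given. `resolve_silva_fasta` and its `download` callback are hypothetical names for illustration; they do not exist in emirge_makedb.py.

```python
import os


def resolve_silva_fasta(local_path, download):
    """Return the path to the SILVA FASTA file to use.

    local_path: path supplied by the user, or None.
    download:   zero-argument callable that fetches the file and
                returns its path (hypothetical; only called when
                no local file is supplied).
    """
    if local_path is not None:
        if not os.path.isfile(local_path):
            raise IOError("SILVA file not found: %s" % local_path)
        return local_path
    return download()


# With no local file, the download callback is used.
chosen = resolve_silva_fasta(None, download=lambda: "downloaded.fasta")
```

With this shape, a user behind a proxy never triggers the network code path as long as a local file is supplied.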
Hey there,
I'm using EMIRGE as a preliminary contamination screen for single-cell genomes. Unfortunately, EMIRGE is exhausting the RAM on our compute server (264Gb RAM, running RHEL). I have the 64-bit version of usearch (v8.1) so it should be able to utilize all available memory.
Here is the output:
[fai_load] build FASTA index.
[fai_load] build FASTA index.
usearch command was:
usearch -usearch_global /home/cmorganlang/Hallam_projects/OMZs/ProcessedData/EMIRGE_16S_prediction/minDepth10_outputs/IX0866_D1CP6ACXX_8_AACCCC/iter.05/iter.05.cons.fasta.tmp.fasta --db /home/cmorganlang/Hallam_projects/OMZs/ProcessedData/EMIRGE_16S_prediction/minDepth10_outputs/IX0866_D1CP6ACXX_8_AACCCC/iter.05/iter.05.cons.fasta.tmp.fasta --id 0.800 -quicksort -query_cov 0.5 -target_cov 0.5 -strand plus --userout /home/cmorganlang/Hallam_projects/OMZs/ProcessedData/EMIRGE_16S_prediction/minDepth10_outputs/IX0866_D1CP6ACXX_8_AACCCC/iter.05/iter.05.cons.fasta.tmp.fasta.us.txt --userfields query+target+id+caln+qlo+qhi+tlo+thi -threads 4 --maxaccepts 8 --maxrejects 256
Traceback (most recent call last):
File "/usr/bin/emirge.py", line 4, in <module>
__import__('pkg_resources').run_script('EMIRGE==0.60.3', 'emirge.py')
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 726, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1491, in run_script
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1697, in <module>
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1688, in main
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 1444, in do_iterations
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 500, in do_iteration
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 807, in cluster_sequences
File "/usr/lib64/python2.7/site-packages/EMIRGE-0.60.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/emirge.py", line 871, in cluster_sequences2
File "/usr/lib64/python2.7/subprocess.py", line 537, in check_call
retcode = call(*popenargs, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 524, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1224, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
There are ~50 million reads (HiSeq) per SAG. iter.05.cons.fasta contained 2123 sequences.
Let me know if you need more information. Thanks!
If EMIRGE cannot reconstruct any sequences due to low read coverage, it currently gets to the clustering stage where usearch throws a segfault and emirge exits without an informative error message.
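A guard of the following kind, run just before the clustering step, would turn the silent segfault into a readable message. This is a sketch only: `count_fasta_records` and `check_before_clustering` are hypothetical helpers, not functions that exist in emirge.py.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count FASTA records by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_before_clustering(fasta_path):
    """Raise a descriptive error if there is nothing to cluster."""
    n_records = count_fasta_records(fasta_path)
    if n_records == 0:
        raise RuntimeError(
            "No sequences left in %s; read coverage may be too low to "
            "reconstruct any sequence." % fasta_path)
    return n_records


# Usage: an empty consensus file triggers the descriptive error.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    empty_path = tmp.name
try:
    check_before_clustering(empty_path)
    message = None
except RuntimeError as err:
    message = str(err)
finally:
    os.remove(empty_path)
```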
Hi,
I installed EMIRGE following the instructions. The original 'emirge.py' seems to work well, but the new version 'emirge_amplicon.py', which is meant for large datasets, produces errors:
$ ./emirge_amplicon.py
Traceback (most recent call last):
File "./emirge_amplicon.py", line 61, in <module>
import _emirge_amplicon as _emirge
File "build/bdist.linux-x86_64/egg/_emirge_amplicon.py", line 7, in <module>
File "build/bdist.linux-x86_64/egg/_emirge_amplicon.py", line 6, in __bootstrap__
ImportError: /Home/ii/yaxinx/.python-eggs/EMIRGE-0.60.2a6-py2.7-linux-x86_64.egg-tmp/emirge_amplicon.so: undefined symbol: gzread
Does anyone know how to fix it? Thanks very much.
Hi there,
I'm running Emirge on a dataset that keeps getting to iteration 12 and then failing (see below). I saw someone posted about this error previously, and that it had to do with read length, but the comment below said this should no longer be an issue.
Any ideas?
I've run Emirge successfully on other datasets recently, so I'm not sure why this one is a problem.
Thanks!
-Roxanne-
DONE Reading bam file /n/girguis_lab/Users/rbeinart/Emirge/BC_metatranscriptome_Emirge_output/iter.12/bowtie.iter.12.PE.bam at Tue Aug 23 16:21:28 2016 [1:11:31.149570]...
Calculating likelihood (25714, 8320954) for iteration 13 at Tue Aug 23 16:22:47 2016...
Calculating Pr(N=n) for iteration 13 at Tue Aug 23 16:22:47 2016...
Loading probN for resume case from /n/girguis_lab/Users/rbeinart/Emirge/BC_metatranscriptome_Emirge_output/iter.13/probN.pkl
DONE calculating Pr(N=n) for iteration 13 at Tue Aug 23 16:22:54 2016 [0:00:06.533856]...
Traceback (most recent call last):
File "/n/girguis_lab/Users/rbeinart/Apps/EMIRGE-master/emirge.py", line 1697, in <module>
main()
File "/n/girguis_lab/Users/rbeinart/Apps/EMIRGE-master/emirge.py", line 1688, in main
do_iterations(em, max_iter = options.iterations, save_every = options.save_every)
File "/n/girguis_lab/Users/rbeinart/Apps/EMIRGE-master/emirge.py", line 1444, in do_iterations
os.path.join(subdir, "iter.%02d.cons.fasta"%(em.iteration_i)))
File "/n/girguis_lab/Users/rbeinart/Apps/EMIRGE-master/emirge.py", line 491, in do_iteration
self.calc_likelihoods()
File "/n/girguis_lab/Users/rbeinart/Apps/EMIRGE-master/emirge.py", line 1141, in calc_likelihoods
lik_data)
File "_emirge.pyx", line 132, in _emirge._calc_likelihood (_emirge.c:2661)
s += ( qual2one_minus_p[qualints[i]] * probN_single[pos + i, j] ) # lookup table
IndexError: Out of bounds on buffer access (axis 1)
Hi there,
I'm new to using EMIRGE, and not sure if this is still supported, but when running emirge_makedb.py I ran into the issue that it was by default expecting the SILVA file to be ...SSURef_Nr99_tax..., whereas in the current version it is ...SSURef_NR99_tax....
I modified the code to simply search for NR, and it appears to have run as expected, but perhaps it would be good to add a regex in there to search for either NR or Nr.
Cheers,
Mike.
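A regex along these lines would accept either spelling. It is a sketch: `SILVA_NR99_PATTERN` and `is_silva_nr99_name` are hypothetical names, and the second file name in the usage lines is made up to show the upper-case variant.

```python
import re

# Accept both "Nr99" and "NR99" in the SILVA file name.
SILVA_NR99_PATTERN = re.compile(r"SSURef_N[Rr]99_tax")


def is_silva_nr99_name(filename):
    """Return True if the file name contains either spelling of the tag."""
    return SILVA_NR99_PATTERN.search(filename) is not None


old_style = is_silva_nr99_name(
    "SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta")
new_style = is_silva_nr99_name("SILVA_132_SSURef_NR99_tax_silva.fasta")
unrelated = is_silva_nr99_name("SSURef_111_candidate_db.fasta")
```

A character class keeps the match strict; `re.IGNORECASE` would also work but would accept spellings such as `nr99` that SILVA has not used as far as this thread shows.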
EMIRGE currently uses uclust v3, which is outdated.
Need to move to usearch v4. Should just involve changing the command line subprocess call and testing.
Hello.
I know emirge has only been tested on versions 0.12* or so of bowtie. But we had no issues running it up to version 1.1.2 of bowtie. A recent upgrade to bowtie 1.2, however, broke emirge.
The command generated at do_initial_mapping does not seem to work. We get an error saying the file does not appear to be a FASTQ file.
Running latest build of emirge (0.61.0), python 2.7.13, and latest versions of numpy, scipy, biopython, and pysam.
Right now the only thing using multiple threads/CPUs is the mapping stage.
Several stages could probably be multithreaded:
Hello, I am running emirge on a PE dataset with about 3 million paired end reads. The command I used is:
time emirge.py ../011-Emirge -1 BEI_16SCap_1P.fastq.gz -f ~/Database/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b ~/Database/EMIRGE/SILVA_128_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed -l 151 -2 BEI_16SCap_2P.fastq -i 375 -s 60 -a 2 --phred33 2>&1 | tee emirge_bei_20170223_2.log
I am running it in an Ubuntu virtual machine with 8 GB of RAM and 4 cores.
The program exits without an error message about 25 minutes after starting iteration 1, still at the step [fai_load] build FASTA index. The complete output is attached. The iter.1 folder is empty, so I cannot resume the iteration. I had emirge working on the clusters my university provided last May.
Thank you very much!
emirge_bei_20170223_2.txt
Requires adjustment of internal mapping (easy) and understanding any differences with reported quality values (maybe harder).
Your documentation lists bowtie v0.12.7 or v0.12.8 as dependencies. However, bowtie is currently at version 1.2.0.
Is there a reason for emirge to require/prefer these old versions of bowtie? Or would the new versions of bowtie work just as well with emirge?
Upon running emirge_amplicon.py assembly with the following command:
emirge_amplicon.py assembly --phred33 --iterations 120 --fasta_db /srv/databases/emirge/SSURef_111_candidate_db.fasta --bowtie_db /srv/databases/emirge/SSU_111_candidate_db --mapping TL04arch.mapped.sorted.bam --max_read_length 126 --insert_mean 100 --insert_stddev 10 -1 ...TL02arch-Overland-DNA2.forward.decontam.adapt_trim.qual_trim.fastq.gz -2 ...TL02arch-Overland-DNA2.reverse.decontam.adapt_trim.qual_trim.fastq
the program crashes with:
EMIRGE started at Fri Feb 19 17:02:01 2016
Rewriting reads with indices in headers at Fri Feb 19 17:02:01 2016...
DONE Rewriting reads with indexes in headers at Fri Feb 19 17:03:03 2016 [0:01:02.031644]...
Number of reads (or read pairs) in input file(s): 5806520
Preallocating reads and quals in memory at Fri Feb 19 17:03:03 2016...
DONE Preallocating reads and quals in memory at Fri Feb 19 17:03:47 2016 [0:00:44.743259]...
Beginning initialization at Fri Feb 19 17:03:47 2016...
Reading bam file ...TL04arch.mapped.sorted.bam at Fri Feb 19 17:03:47 2016...
Traceback (most recent call last):
File "/bin/emirge_amplicon.py", line 1576, in <module>
main()
File "/bin/emirge_amplicon.py", line 1565, in main
em.initialize_EM(options.mapping, options.fasta_db, randomize_priors = options.randomize_init_priors)
File "/bin/emirge_amplicon.py", line 384, in initialize_EM
self.read_bam(bam_filename, reference_fasta_filename)
File "/bin/emirge_amplicon.py", line 334, in read_bam
_emirge.process_bamfile(self, BOWTIE_ASCII_OFFSET)
File "_emirge_amplicon.pyx", line 553, in _emirge_amplicon.process_bamfile (_emirge_amplicon.c:7241)
AttributeError: 'EM' object has no attribute 'n_alignments'
I've been investigating the error and this is what seems to be happening:
emirge_amplicon.py
line 1576 calls main()
line 1533 initializes the EM class; no self.n_alignments variable is initialized with the class
line 1565 calls em.initialize_EM
line 384 in initialize_EM(args) calls self.read_bam(bam_filename, reference_fasta_filename)
line 334 in read_bam calls _emirge.process_bamfile(self, BOWTIE_ASCII_OFFSET)
line 553 in the aforementioned function calls bamfile_data = np.empty((em.n_alignments, 6), dtype=np.uint32)
n_alignments appears to be generated by get_n_alignments_from_bowtie on line 1086 which is called on line 1058 by do_mapping_bowtie which is called by do_mapping which is called by do_iteration which is called by do_iterations which is called on line 1568 near the end of main().
In summary, EM does not initialize any n_alignments variable but immediately tries to process the BAM file which attempts to use the n_alignments variable as part of a numpy array. Since n_alignments doesn't exist, the numpy array cannot initialize and the program crashes, i.e., the function ultimately requiring n_alignments is called before the function generating n_alignments. It seems that if no BAM file is provided to EMIRGE then it will derive n_alignments from bowtie's stderr but if you provide a BAM file, there's no way to provide this info and the program crashes.
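Following this analysis, one possible workaround is to make sure `n_alignments` exists before the BAM file is processed. The sketch below uses a reduced, hypothetical stand-in for the EM class and a hypothetical `count_alignments` callback; in practice the count would have to come from the supplied BAM file itself (for example by iterating over its records), which is not shown here.

```python
class EM(object):
    """Hypothetical stand-in for EMIRGE's EM class, reduced to one attribute."""

    def __init__(self):
        # Initialized up front so attribute lookup never fails.
        self.n_alignments = None


def ensure_n_alignments(em, count_alignments):
    """Set em.n_alignments if it has not been set yet.

    count_alignments: zero-argument callable returning the number of
    alignments (hypothetical; only called when the value is missing).
    """
    if getattr(em, "n_alignments", None) is None:
        em.n_alignments = int(count_alignments())
    return em.n_alignments


em = EM()
first = ensure_n_alignments(em, lambda: 1000)
# A value that is already set (e.g. parsed from bowtie's stderr) is kept.
second = ensure_n_alignments(em, lambda: 5)
```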
Hi,
I encountered an unexpected crash in the 17th iteration of an Emirge run.
This occurred only with one dataset and I reproduced the bug on 2 different computers.
I get no message on STDOUT telling me where the problem comes from.
I have attached the run log.
Can you help me, please?
Thanks in advance
python /home/arkg/EMIRGE/emirge.py emirge-output/ -1 ../../JdF_1362A_J2.573-2/reads/SSU_reads/JdF1362AcombinedSSU.fastq -f SILVA_132_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b SILVA_132_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -l 150 -a 16 -m SSUreadsA.bam.sorted
If you use EMIRGE in your work, please cite these manuscripts, as appropriate.
Miller CS, Baker BJ, Thomas BC, Singer SW, Banfield JF (2011)
EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data.
Genome biology 12: R44. doi:10.1186/gb-2011-12-5-r44.
Miller CS, Handley KM, Wrighton KC, Frischkorn KR, Thomas BC, Banfield JF (2013)
Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments.
PloS one 8: e56018. doi:10.1371/journal.pone.0056018.
imported _emirge C functions from: /home/arkg/.cache/Python-Eggs/EMIRGE-0.61.0-py2.7-linux-x86_64.egg-tmp/_emirge.so
Command:
/home/arkg/EMIRGE/emirge.py emirge-output/ -1 ../../JdF_1362A_J2.573-2/reads/SSU_reads/JdF1362AcombinedSSU.fastq -f SILVA_132_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -b SILVA_132_SSURef_Nr99_tax_silva_trunc.ge1200bp.le2000bp.0.97.fixed.fasta -l 150 -a 16 -m SSUreadsA.bam.sorted
EMIRGE started at Sat Jan 27 17:29:17 2018
Beginning initialization at Sat Jan 27 17:29:17 2018...
Reading bam file /mnt/maximus/data1/chan/arkadiy/Juan_de_Fuca/JdF_1362AB/emirge_SSU/SSUreadsA.bam.sorted at Sat Jan 27 17:29:17 2018...
DONE Reading bam file /mnt/maximus/data1/chan/arkadiy/Juan_de_Fuca/JdF_1362AB/emirge_SSU/SSUreadsA.bam.sorted at Sat Jan 27 17:29:22 2018 [0:00:04.452040]...
DONE with initialization at Sat Jan 27 17:29:22 2018...
Starting iteration 0 at Sat Jan 27 17:29:22 2018...
Reading bam file /mnt/maximus/data1/chan/arkadiy/Juan_de_Fuca/JdF_1362AB/emirge_SSU/SSUreadsA.bam.sorted at Sat Jan 27 17:29:22 2018...
DONE Reading bam file /mnt/maximus/data1/chan/arkadiy/Juan_de_Fuca/JdF_1362AB/emirge_SSU/SSUreadsA.bam.sorted at Sat Jan 27 17:29:25 2018 [0:00:03.364967]...
Calculating likelihood (1041, 42776) for iteration 0 at Sat Jan 27 17:29:25 2018...
Calculating Pr(N=n) for iteration 0 at Sat Jan 27 17:29:25 2018...
DONE calculating Pr(N=n) for iteration 0 at Sat Jan 27 17:29:26 2018 [0:00:01.241517]...
Traceback (most recent call last):
File "/home/arkg/EMIRGE/emirge.py", line 1697, in <module>
main()
File "/home/arkg/EMIRGE/emirge.py", line 1688, in main
do_iterations(em, max_iter = options.iterations, save_every = options.save_every)
File "/home/arkg/EMIRGE/emirge.py", line 1439, in do_iterations
em.do_iteration(em.current_bam_filename, em.current_reference_fasta_filename)
File "/home/arkg/EMIRGE/emirge.py", line 491, in do_iteration
self.calc_likelihoods()
File "/home/arkg/EMIRGE/emirge.py", line 1141, in calc_likelihoods
lik_data)
File "_emirge.pyx", line 130, in _emirge._calc_likelihood
if numeric_bases[i] == j: # this is called base, set 1-P
IndexError: Out of bounds on buffer access (axis 0)
I am running on SSU reads pulled out from the raw quality reads using sortmerna. These include read pairs both combined and uncombined with flash.
I heard there might be an EMIRGE "version 2" coming?
EMIRGE is not always easy to use. Two suggestions to improve this:
Right now, EMIRGE uses bowtie for its read mapping, which does not handle indels. Consequently, EMIRGE is not coded to handle indels either. To be able to get more accurate reconstructions, as well as use homopolymer-rich 454/Roche sequencing data, we need to incorporate indels into the mapping and statistical model.
Erin Nuccio pointed out that it would be nice to be able to recover the per-base probabilities calculated by EMIRGE for each reconstructed sequence. These are stored in the probN numpy array, if anyone wants to take a crack at it. Erin suggested converting to a FASTQ format, where quality scores represent the probability of the reported base. I also have code for plotting all four base probabilities at each position, which I should clean up and post as a stand-alone script.
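The FASTQ conversion could look roughly like this. It is a sketch under stated assumptions: each position is given as four probabilities in the order A, C, G, T (the real probN layout may differ), qualities are Phred+33, and scores are capped at 40. `prob_to_phred` and `probs_to_fastq` are hypothetical names.

```python
import math

BASES = "ACGT"   # assumed column order; EMIRGE's probN may differ
MAX_PHRED = 40   # assumed cap so that p == 1.0 stays finite


def prob_to_phred(p, max_q=MAX_PHRED):
    """Phred score for a base whose probability of being correct is p."""
    error = 1.0 - p
    if error <= 0.0:
        return max_q
    return min(max_q, int(round(-10.0 * math.log10(error))))


def probs_to_fastq(name, prob_rows):
    """Build one FASTQ record from per-position base probabilities."""
    seq_chars, qual_chars = [], []
    for row in prob_rows:
        # Report the most probable base at this position.
        best = max(range(len(BASES)), key=lambda i: row[i])
        seq_chars.append(BASES[best])
        # Encode its probability as a Phred+33 quality character.
        qual_chars.append(chr(prob_to_phred(row[best]) + 33))
    return "@%s\n%s\n+\n%s\n" % (name, "".join(seq_chars), "".join(qual_chars))


record = probs_to_fastq("seq1", [
    [0.9, 0.05, 0.03, 0.02],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.6, 0.2, 0.1],
])
```

Here the three positions are called as A, G and C with Phred scores 10, 40 and 4.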
Hello,
I would like to try EMIRGE on metatranscriptome reads. Indeed I have already sorted the putative SSU reads out with cmsearch. Would you suggest using the normal EMIRGE version or the amplicon variant?
Thanks,
Domenico
Hi,
I used emirge_amplicon.py v0.60 with single reads (non paired-end):
emirge_amplicon.py emirge_dir -1 reads.fastq --phred33 --fasta_db SSU_candidate_db.fna --bowtie_db SSU_candidate_db_btindex --max_read_length 302 --processors 10
And got the error:
emirge_amplicon.py: error: --insert_mean is required, but is not specified (try --help)
In the end, I had to add bogus --insert_mean and --insert_stddev values to make EMIRGE happy:
emirge_amplicon.py emirge_dir -1 reads.fastq --phred33 --fasta_db SSU_candidate_db.fna --bowtie_db SSU_candidate_db_btindex --max_read_length 302 --processors 10 --insert_mean 100 --insert_stddev 1
When not using paired-end data (the -2 option), it would be nice if EMIRGE did not require insert parameters.
Other than that, it seems like EMIRGE worked fine with my non-paired-end data. Thanks,
Florent
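The requested behaviour, requiring the insert parameters only when a second read file is given, could be sketched like this. The example uses argparse for brevity and is not EMIRGE's actual option parser; the program name and `dest` names are hypothetical, and only the flags mentioned in this post are modelled.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="emirge_sketch")
    parser.add_argument("-1", dest="fastq_reads_1", required=True)
    parser.add_argument("-2", dest="fastq_reads_2", default=None)
    parser.add_argument("--insert_mean", type=int, default=None)
    parser.add_argument("--insert_stddev", type=int, default=None)
    return parser


def parse_args(argv):
    """Parse argv; insert parameters are mandatory only for paired-end input."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.fastq_reads_2 is not None:
        if args.insert_mean is None or args.insert_stddev is None:
            parser.error("--insert_mean and --insert_stddev are required "
                         "when -2 is specified (try --help)")
    return args


# Single-end input: no insert parameters needed.
single = parse_args(["-1", "reads.fastq"])
```

Calling `parse_args(["-1", "r1.fastq", "-2", "r2.fastq"])` without the insert options exits with the usual argparse error message.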
I am attempting to run EMIRGE and receive the following error (listed below). I am running USEARCH v5.2.32 (64-bit, to increase the available RAM). I am running EMIRGE in an empty directory. Does anyone know what might be happening? Thanks.
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
[samopen] SAM header is present: 21 sequences.
Seeded quality full-index search: 00:12:47
Reported 9956 alignments to 1 output stream(s)
Time searching: 00:12:47
Overall time: 00:12:47
Beginning initialization at Wed Apr 10 10:31:21 2013...
Reading bam file /home/localuser/Desktop/Emirge_runs/emirge_output/initial_mapping/initial_bowtie_mapping.PE.bam at Wed Apr 10 10:31:21 2013...
Segmentation fault