bioruby / bioruby
Home Page: http://bioruby.open-bio.org
License: Other
require 'bio'
ff = Bio::FlatFile.new(Bio::FastaFormat, 'NC_005213.ffn')
ff.each_entry do |f|
  puts "definition : " + f.definition
  puts "nalen : " + f.nalen.to_s
  puts "naseq : " + f.naseq
end
The above code fails with:
NoMethodError: private method `getc' called for "NC_005213.ffn":String
The official tutorial tells you to use the above code, and as it fails,
the tutorial should be updated:
http://thebird.nl/bioruby/Tutorial.rd.html
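The error message suggests that Bio::FlatFile.new expects an already-open, IO-like object rather than a file name. The plain-Ruby sketch below only illustrates that String-versus-IO distinction and does not load BioRuby; `reads_like_io?` is a hypothetical helper and the FASTA text is made up.

```ruby
require 'stringio'

# Hypothetical helper: true when the object can be read character by
# character, which is what the NoMethodError above says was attempted.
def reads_like_io?(source)
  source.respond_to?(:getc)
end

fasta_text = ">seq1 example\natgc\n"

puts reads_like_io?('NC_005213.ffn')          # a file name is only a String
puts reads_like_io?(StringIO.new(fasta_text)) # an IO-like object

# With BioRuby loaded, the analogous choice would be to pass an open stream,
# e.g. File.open('NC_005213.ffn'), or to use a call that is given a path in
# other reports on this page, such as Bio::FlatFile.auto('NC_005213.ffn').
```

Whether the tutorial should switch to an open stream or to a path-accepting call is left to the maintainers; the sketch only shows why a bare file name cannot be read from.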
This is the part in the Tutorial where it then fails:
"For example, in turn, reading FASTA format files:"
Under Ruby 1.9.2 and later, warnings about circular requires are given if $VERBOSE is set to true:
$ ruby -w -e 'require "bio"; Bio::Sequence::NA.new("atgcatgcaaaa")'
/Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15: warning: loading in progress, circular require considered harmful - /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb
from -e:1:in `<main>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb:15:in `<top (required)>'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:13:in `<top (required)>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15:in `<module:Bio>'
$ ruby --version
ruby 2.0.0dev (2012-05-05 trunk 35543) [x86_64-darwin10.8.0]
This also occurs in the current version of bioruby in the master branch.
Submitted by Yannick Wurm via Rubyforge on 2009-08-09
Hello,
To parse a BLAST file, only the third method I tried actually worked. For newcomers this can be quite disappointing.
This worked:
reportsArray = Bio::FlatFile.foreach(path) do |report|
  report.each_iteration do |iter|
    iter.each do |hit| # actually there is only a single hit here #iteration
      print "hit . "
      bestHsp = hit.hsps[0]
      puts bestHsp.query_frame
    end
  end
end
But trying to get the same results using the following approaches always led to crashes:
Bio::FlatFile.open(Bio::Blast::Default::Report,path) do |ff|
ff.each do |report|
...
or:
Bio::Blast.reports(path) do |report|
...
Partially, it looks like ruby is going into the wrong parser. E.g. for the latter:
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:402: warning: useless use of :: in void context
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:265: warning: method redefined; discarding old server=
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format8.rb:70:in `tab_parse_hsp': undefined method `strip' for nil:NilClass (NoMethodError)
My blast output here is -m 0. But reports weren't being parsed properly with -m 7 or -m 8 either.
Is bioruby trying to support too many blast output formats? It could be helpful to document in the blast rdoc which
blast versions and output parameters ruby was tested on.
(my blast output here was generated with -p tblastx -v 1 -b 1 -e 1.0e-4 -m 0 -V T in blast-2.2.15 (but also tried 2.2.10
and 2.2.18)).
ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin]
bioruby 1.3.0
On a PacBio-produced fastq file, the auto-detection failed for the code shown below.
require 'bio'
ff = Bio::FlatFile.new(nil, ARGF)
while fe = ff.next_entry
puts "#{fe.entry_id}\t#{fe.seq.length}"
end
A fastq file may have more than one line of nucleotides, and there is currently no other
format that is identical up to the sequence lines but differs after the second id line.
So the regular expression in autodetection.rb
fastq = RuleRegexp[ 'Bio::Fastq',
/^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/ ],
might be shortened to
fastq = RuleRegexp[ 'Bio::Fastq',
/^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/ ],
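To compare the two patterns outside BioRuby, the sketch below runs both against a made-up multi-line record and against the same record cut off before the `+` line, as might happen if detection only sees the start of a long read. That truncation scenario is an assumption, not a confirmed diagnosis of the PacBio failure; the constant names and sample data are invented.

```ruby
# Patterns copied from the report; the constant names are hypothetical.
FASTQ_ORIGINAL = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/
FASTQ_SHORTENED = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/

# Made-up sample data: one complete record and one cut before the "+" line.
full_record = "@read1\nACGTACGT\nACGTACGT\n+\nIIIIIIII\nIIIIIIII\n"
truncated = "@read1\nACGTACGT\nACGTACGT\n"

{ 'full record' => full_record, 'truncated' => truncated }.each do |label, text|
  puts "#{label}: original=#{FASTQ_ORIGINAL.match?(text)} shortened=#{FASTQ_SHORTENED.match?(text)}"
end
```

The shortened pattern accepts any `@` header followed by at least one line that does not start with `@` or `+`, so it is also less specific than the original.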
Hi!
I've noticed that the tests test_randomize_with_hash_equiprobability and test_randomize_equiprobability from test/unit/bio/sequence/test_common.rb are sometimes failing. Running the tests about 460 times, I got 11 failures. I guess it is normal since they involve probabilistic sampling and statistical tests. However, it is a bit disorienting to have tests failing randomly, if the code seems ok.
On Debian, the test suite is run during the build of the package, and a test failure means that the package is not built. We will thus have to disable these tests. Could you provide a mechanism to easily exclude these randomness-based tests from the test suite, e.g. by moving them to a separate file, so that one can be sure the remaining tests will pass?
Thanks a lot!
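One alternative to excluding the tests, offered only as a sketch, is to make the random source injectable so that a test can pin the seed. `shuffled_sequence` is a hypothetical stand-in for the real randomize method, whose signature is not shown in this report.

```ruby
# Hypothetical stand-in for a sequence-shuffling method: the random number
# generator is passed in so that tests can supply a fixed seed.
def shuffled_sequence(seq, rng = Random.new)
  seq.chars.shuffle(random: rng).join
end

first = shuffled_sequence('atgcatgcaaaa', Random.new(42))
second = shuffled_sequence('atgcatgcaaaa', Random.new(42))

puts first == second                                # same seed, same permutation
puts first.chars.sort == 'atgcatgcaaaa'.chars.sort  # composition is preserved
```

A fixed seed turns an equiprobability test into a deterministic regression check, so it trades statistical coverage for reproducible builds.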
Doing
require "bio/sequence/compat"
will cause a "circular require considered harmful" warning when warnings are on for recent versions of Ruby.
The following will reproduce the warning for the git repo, if you're in the (git root)/lib directory:
$ ruby --disable-gems -w
$: << "."
require "bio/sequence/compat"
[snip]/sandbox/bioruby/lib/bio/sequence.rb:77: warning: loading in progress, circular require considered harmful - [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb
from -:2:in `<main>'
from -:2:in `require'
from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:10:in `<top (required)>'
from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `<module:Bio>'
from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `require'
from [snip]/sandbox/bioruby/lib/bio/sequence.rb:13:in `<top (required)>'
from [snip]/sandbox/bioruby/lib/bio/sequence.rb:62:in `<module:Bio>'
from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `<class:Sequence>'
from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `require'
$ ruby --version
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
If I'm not supposed to be requiring only part of bioruby, let me know.
I'm currently doing this so that I can keep track of what parts of bioruby I'm using in which parts of my program.
Hi,
I think I found a bug in Bio::FastaFormat:
query() calls factory.query(@entry) but @entry is only set upon calling entry() so @entry will be nil if entry() is not called before calling query().
As a result, query() will return no hits because the search is conducted with an empty sequence.
I have created a gist to illustrate the issue:
https://gist.github.com/985332
In my case, the resulting output is:
0
23
[Finished]
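The Bio::FastaFormat report above describes a lazily assigned instance variable being read before it is set. The toy classes below mimic that pattern in plain Ruby. They are not BioRuby code, and every name in them is hypothetical.

```ruby
# Toy factory: returns the length of whatever text it is asked to search with.
class EchoFactory
  def query(text)
    text.to_s.length
  end
end

# Toy record that builds its entry text lazily, like the reported class.
class LazyRecord
  def initialize(definition, data)
    @definition = definition
    @data = data
  end

  # @entry is only assigned here, on first call.
  def entry
    @entry ||= ">#{@definition}\n#{@data}\n"
  end

  # Reported behaviour: reads the instance variable directly.
  def query_via_ivar(factory)
    factory.query(@entry)
  end

  # Possible fix: go through the accessor so the text is always built.
  def query_via_accessor(factory)
    factory.query(entry)
  end
end

record = LazyRecord.new('seq1', 'atgc')
puts record.query_via_ivar(EchoFactory.new)     # searched with nil
puts record.query_via_accessor(EchoFactory.new) # searched with the full entry
```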
Hi,
I just noticed that the GenomeNet path variable in exec_genomenet, appl/blast/genomenet.rb:161, is outdated. The new path should be /tools-bin/blast.
Cheers,
Joao
The UniProtKB data format changes with each release. Please read the recent changes to UniProtKB described at http://www.uniprot.org/docs/sp_news.htm and update Bio::UniProtKB.
Submitted by Yannick Wurm via Rubyforge on 2008-05-21
NCBI's blastall output format changed once again.
Using reportsArray = Bio::FlatFile.foreach(blastReportPath) do |report|
I can parse blastall-2.2.18's output correctly only if -m 7 -V (xml format; use legacy engine) or if nothing (new engine,
"default text output") is specified. Using -m 7, only a single query/hit is found (and it may be incorrect).
(this is dangerous, since no error message is displayed).
It's due to the fact that "old" blastall output when blasting a multi-entry fasta file against a database
was equal to the sum of several single-entry outputs. (ie the BLAST headers were output once for each query sequence
in the input fasta file). "new" blastall output considers each query sequence as another "iteration"
of blast... the Blast headers are listed only once.
I've attached example output.
I am aware that bioruby is an open-source community project, but the frequency at which bugs like this are encountered
makes it very difficult to justify using bioruby in a production environment....
Kind regards,
Yannick Wurm - http://yannick.poulet.org
Submitted by Naohisa Goto via Rubyforge on 2008-08-31
Three failures in test/functional/bio/io/test_ensembl.rb, during running test/runner.rb.
BioRuby version: git commit ID: e86f8d7
% ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]
% uname -a
Linux xxx 2.6.18-6-686 #1 SMP Fri Jun 6 22:22:11 UTC 2008 i686 GNU/Linux
Failure:
test_gff_exportview(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:95]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_gff_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:121]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_tab_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:180]:
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tENSG00000206158\tENST00000382964\tENSE00001494097\tKNOWN_protein_coding\n">
expected but was
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n">.
Submitted by Jan Aerts on Rubyforge site on 2008-02-14
When trying to format the features from a Bio::Sequence (using Bio::Sequence#format_features), the output is not what
it should be. Using the following parameters, part of the expected output for AJ224122 should look like this:
FT source 1..3827
FT /organism="Arabidopsis thaliana"
FT /chromosome="3"
FT /cultivar="Wassilewskija"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:3702"
FT mRNA join(1726..1863,2548..3052,3137..3827)
FT /gene="DAG1"
FT /product="DNA-binding protein"
FT /function="transcription factor"
FT /experiment="experimental evidence, no additional details
FT recorded"
However, the observed output is:
FT source 1..3827FT /organism="Arabidopsis thaliana"FT mRNA
join(1726..1863,2548..3052,3137..3827)FT /gene="DAG1"FT CDS
join(1840..1863,2548..3052,3137..3498)FT /gene="DAG1"FT exon 1726..1863FT
/gene="DAG1"FT intron 1864..2547FT /gene="DAG1"FT exon
2548..3052FT /gene="DAG1"FT intron 3053..3136FT
/gene="DAG1"FT exon 3137..3495FT /gene="DAG1"
For pulls from PubMed without a valid URL, converting to EndNote (and possibly other formats) will fail. The code at fault is line 145 of lib/bio/reference.rb:
@url = hash['url']
should be
@url = hash['url'] || ''
Warren
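The effect of the proposed `|| ''` guard can be seen in isolation with plain hashes. This is only an illustration: `url_field` is a hypothetical formatter, not the EndNote code in reference.rb, and the URL is an example value.

```ruby
# Hypothetical formatter: skips the URL field when none is available.
def url_field(hash)
  url = hash['url'] || '' # proposed guard: fall back to an empty string
  url.empty? ? 'no URL' : "URL: #{url}"
end

puts url_field('url' => 'http://example.org/article')
puts url_field({}) # without the guard, calling empty? on nil would raise
```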
Submitted by Rodrigo Jardim via Rubyforge on 2010-10-14
There are some errors in the esearch method in restncbi.rb. The step value is too large: the NCBI REST service retrieves only 100
records at a time. The loop using 0.step is wrong too. I have already written new code; may I send it to you?
Thanks
The tutorial page says to look at the new tutorial at this url
https://raw.github.com/bioruby/bioruby/master/doc/Tutorial.rd.html
This displays as raw text not as html unfortunately :(
This is test. Please ignore.
fsfasd
Hiya,
I have the feeling that http://www.bioruby.org/rdoc/classes/Bio/FlatFile.html is unable to deal with fasta quality files. (It may be a good idea to add support for reading those as well as fastq files that are commonplace when people are using ultra-high-throughput sequencing)
(perl has http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SeqIO/qual.pm )
cheers,
yannick
Hi! We are happy to see BioRuby on travis-ci.org and have a little favor to ask. One of your forks generates 80 or even 100+ runs per build. This is a little bit unfair to the rest of the travis-ci.org users with Ruby projects, because it takes well over an hour to build 100+ rows for every single push.
I submitted a pull request to reduce the matrix but it was ignored so far. The fork maintainer seems to be a BioRuby org member. If you know how to get in touch with him (her?), please merge that pull request.
Lots of Ruby developers who use travis ci will be very thankful to you.
Thank you. On behalf of the travis-ci.org maintainers team,
MK.
require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')
OpenURI::HTTPError: 404 Not Found
from /usr/lib/ruby/1.8/open-uri.rb:277:in `open_http'
from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/command.rb:625:in `read_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:183:in `_get'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:111:in `fetch'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:128:in `query'
from (irb):12
Hmm. Not sure where the error is. But it would be nice if "OpenURI::HTTPError: 404 Not Found"
errors reported the URL back to the user, so that they can easily check it manually.
Right now I have no idea what is going on.
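One way to surface the URL, sketched here without touching BioRuby or the network, is to wrap the fetch in a helper that re-raises failures with the URL attached. `FetchError`, `fetch_with_context` and the example URL are all hypothetical.

```ruby
# Hypothetical error class carrying the URL that was being fetched.
class FetchError < StandardError
  attr_reader :url

  def initialize(url, cause_message)
    @url = url
    super("#{cause_message} (while fetching #{url})")
  end
end

# Runs the given fetch block and re-raises any failure with the URL attached.
def fetch_with_context(url)
  yield url
rescue StandardError => e
  raise FetchError.new(url, e.message)
end

begin
  # The block simulates a failing HTTP request; no network access happens.
  fetch_with_context('http://example.org/fetch?id=VNG1467G') do |_url|
    raise IOError, '404 Not Found'
  end
rescue FetchError => e
  puts e.message
end
```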
i'd like to update the included REBASE data. does anyone have an objection to this?
currently the source has this page stating the terms:
http://rebase.neb.com/rebase/rebcit.html
Those seeking to distribute REBASE files with their software packages are welcome to do so, providing it is clear to your users that they are not being charged for the REBASE data. It should be transparent that REBASE is a free and independent resource, with the following bibliographical reference: Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2010) REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucl. Acids Res. 38: D234-D236.
could i add that to the LICENSE file?
NCBI has changed their outputformats yet again :(
It seems that they are unparseable again. Perhaps bioruby should only support xml?
all the best,
yannick
software:
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
paper:
http://dx.doi.org/10.1186/1471-2105-10-421
FWIW: here's an example output (default format) with blastplus 2.2.24
http://fourmidable.unil.ch/temp/cleanedESTs.Bx.SI223prot.zip
Because jruby is not recognised as being unable to support fork(), it (using master as of March 19 2010) produces the following error:
NotImplementedError: popen("-") is unimplemented
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:245:in `call_command_fork'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:153:in `call_command'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:209:in `exec_local'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:177:in `query_by_filename'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:151:in `query_string'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:127:in `query_align'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:110:in `query'
When "|java" is added to the regular expression on line 150 in lib/bio/command.rb for non-fork supporting platforms, the error goes away. A similar error occurs with "Ruby Installer" for windows, which has an unrecognised platform of "i386-mingw32".
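As an alternative to extending the platform-name pattern, a capability check could be considered. Ruby's documentation states that `Process.respond_to?(:fork)` is false where fork is unusable. Whether JRuby 1.4.0 or the mingw build answered this correctly has not been verified here, and `fork_available?` is a hypothetical helper, not part of lib/bio/command.rb.

```ruby
# Capability check instead of a platform-name pattern.
def fork_available?
  Process.respond_to?(:fork)
end

# Choose an execution strategy based on what the interpreter supports.
strategy = fork_available? ? 'fork-based command execution' : 'popen/system fallback'
puts "#{RUBY_PLATFORM}: #{strategy}"
```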
Using Ruby 1.9.2 and bio (HEAD):
seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')
blast = Bio::Blast.remote 'blastp', 'swissprot', '-e 0.0001', 'genomenet'
blast.query(seq)
produces:
/Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast/genomenet.rb:251:in `exec_genomenet': cannot understand response (RuntimeError)
from /Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast.rb:368:in `query'
from ./phone_blast.rb:10:in `<main>'
This is a test if the contributors get an email.
In Bio::MEDLINE#reference, doi should be filled.
I'm using bioruby 1.4.0 and ran into a problem with performance of
Bio::RestrictionEnzyme::Analysis - cutting a 37kbp sequence with a
single enzyme takes more than 5 minutes.
I downloaded this GenBank file to disk:
http://www.i-dcc.org/targ_rep/alleles/5682/escell-clone-genbank-file
...and extracted the first sequence:
gb = Bio::GenBank.open( PATH_TO_GBK_FILE ).next_entry
...then asked for a restriction enzyme analysis for BstEII:
cuts = Bio::RestrictionEnzyme::Analysis.cut( gb.seq, "BstEII", { :view_ranges => true } )
It's that call to cut() that takes 5 minutes; running cut() under RubyProf tells us:
Thread ID: 70368668447160
Total: 384.810000
 %self     total      self      wait     child      calls  name
 54.69    210.44    210.44      0.00      0.00  546320457  Fixnum#== (ruby_runtime:0}
 45.06    383.83    173.41      0.00    210.42     148978  Array#include? (ruby_runtime:0}
  0.11    384.22      0.43      0.00    384.22         33  Array#each (ruby_runtime:0}
[SNIP]
So most of the time was spent in 546,320,457 calls to Fixnum#==. Am I
doing something silly, or is the restriction enzyme analysis algorithm
in need of some optimization?
Bio::RestrictionEnzyme::Analysis.cut_without_permutations() is almost
as slow, so it's not the permutations killing it. Is anyone else using
this module with more success?
Submitted by Raoul on 2008-02-13 at Rubyforge
Reading a generic GenBank FILE, the system returns one entry more than expected
data=Bio::FlatFile.auto("AJ561198.gb")
data.each_entry do |entry|
puts entry.entry_id
end
You get
AJ561198
nil
I think the parser sees the "\n" at the end of the GenBank file (after
"//\n") and assumes there is another entry, which is wrong.
Deleting the last line makes it work.
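The suspected cause can be shown on plain text: splitting on the `//` terminator leaves a whitespace-only chunk when the file ends with an extra newline. `split_entries` below is a hypothetical helper, not the BioRuby parser, and the sample record is made up.

```ruby
# Split a flat-file text on GenBank-style "//" terminator lines and drop
# chunks that contain only whitespace. Hypothetical helper, not the parser.
def split_entries(text)
  text.split(%r{^//[ \t]*\r?\n}).reject { |chunk| chunk.strip.empty? }
end

sample = "LOCUS       AJ561198\nORIGIN\n//\n\n"
entries = split_entries(sample)
puts entries.length
```

In the loop from the report, skipping entries whose entry_id is nil would be the analogous workaround until the parser itself ignores trailing blank lines.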
ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT" )
=> Bio::Sequence::NA
ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT\n" )
=> Bio::Sequence::AA
whitespace should not affect sequence determination?
and perhaps Bio::Sequence.guess(" ") should throw an error instead of returning AA?
cheers,
yannick
Submitted by Ben Woodcroft via Rubyforge on 2008-08-29
Using blastxl3 with xml output (see attached), then parsing with bioruby gives this warning:
bio/appl/blast/rexml.rb:70: warning: Float 4.94066e-324 out of range
That number is from the Statistics_entropy value in the XML. That number is out of float range:
Float::MIN
=> 2.2250738585072e-308
The returned float value then becomes 0.0, which is close but wrong in a strict sense.
The same error is given for both edge bioruby:
http://github.com/bioruby/bioruby/commit/85a596da60d0ba0636fdb66e1dbbbd6b16a07a21
and my personal blastxml rexml new format fix branch:
http://github.com/wwood/bioruby/commit/ed03a8f42f64921589cf61884a7953a466e4e60c
Is there a ruby equivalent to the long double which appears to be used in the NCBI blast code?
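Ruby's Float is a C double, and as far as I know the core language has no long double. An exact representation is still possible with Rational (or BigDecimal from the standard library). The sketch below is a suggestion, not what rexml.rb does; it shows that the reported value is positive yet below Float::MIN.

```ruby
tiny = '4.94066e-324'

# Exact rational value of the decimal string, independent of Float limits.
exact = tiny.to_r

puts exact.positive?         # the value is not really zero
puts exact < Float::MIN.to_r # but it lies below the smallest normalized Float
puts tiny.to_f               # what Float conversion yields on this interpreter
```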
Hi. I was trying to parse a kgml file but I found out that the coords field (used in the big maps) is not available!
Regards,
João Cardoso
I'm working with @catfeet to write a Blast pipeline.
The tool used at Cardiff University is Nucleotide BLAST with the nr/nt database from Genbank.
It seems like the only options with bioruby are genomenet and ddbj. However, genomenet.rb references http://www.ncbi.nlm.nih.gov/blast/ in the notes.
Basically we want to be able to do:
blast = Bio::Blast.remote 'blastn', 'nr-nt', '-e 0.05 -m 8', 'genbank'
Does this mean I'll have to write a Bio::Blast::Remote::Genbank module to receive output from that tool?
Bug submitted by Yannick Wurm via Rubyforge on 2008-05-21
Hi,
Just ran into another blast parsing bug.
Using ncbi's blastall 2.2.18, ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin] and bio.rb,v 1.88 2007/12/29
The following code works on almost every default blast 2.2.18 output I throw at it:
blastReportPath = ARGV[0]
outputPath = ARGV[1]
print "begin " +blastReportPath + "\n"
File.open(outputPath, "w") do |outputFile|
outputFile << "hit.target_id" + "\t"+ "report.query_def" + "\t"
However it hangs on one (very very long) protein sequence, FBgn0086906. When I kill it I get this:
^C/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:267:in `format0_parse_query': Interrupt
from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:168:in `query_def'
from /Volumes/Shiva/Users/yannickwurm/ruby/topHitsForQueryFromBlastReport.rb:54
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:520:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:655:in `each'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:519:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:481:in `_open_file'
If I change the following two lines:
Query= FB|FBgn0086906 symbol:sls
(18,141 letters)
to :
Query= FB|FBgn0086906 symbol:sls
(18141 letters)
Then it works again. So somewhere the "," in the query length is confusing the blast parser. Below you can
see the context in which the "Query definition" lines are found. I've attached the complete blast output file
and ruby script fwiw.
Kind regards,
TBLASTN 2.2.18 [Mar-02-2008]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
Query= FB|FBgn0086906 symbol:sls
(18,141 letters)
Database: fourmidable012007
11,864 sequences; 9,098,808 total letters
Searching..................................................done
Score E
Sequences producing significant alignments: (bits) Value
SiJWB02BAW2.scf 299 2e-80
SiJWE01BDQ.scf 165 6e-40
SiJWA04CAU2.scf 103 2e-21
SI.CL.6.cl.653.Contig1 91 2e-17
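For the query-length line in the report above, a pattern that tolerates thousands separators might look like the sketch below. It is not the expression used in format0.rb, which is not shown here; `LETTERS_LINE` and `query_length` are hypothetical names.

```ruby
# Hypothetical pattern for the "(N letters)" line; digits may be grouped
# with commas, which are removed before conversion.
LETTERS_LINE = /\(([\d,]+) letters\)/

# Returns the query length as an Integer, or nil when the line does not match.
def query_length(line)
  match = LETTERS_LINE.match(line)
  match && match[1].delete(',').to_i
end

puts query_length('         (18,141 letters)')
puts query_length('         (18141 letters)')
puts query_length('Query= FB|FBgn0086906 symbol:sls').inspect
```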
Submitted by Juergen Helmers via Rubyforge on 2009-08-31
The sample code in the tutorial for searching NCBI PubMed is not functional. The same is true for the sample script pmsearch.rb
entries = Bio::PubMed.esearch(ARGV.join(' '))
entries.each do |id|
case form
when 'medline'
puts entry = Bio::PubMed.efetch(id)
else
entry = Bio::PubMed.efetch(id)
puts Bio::MEDLINE.new(entry).reference.send(form)
end
end
Patch file is attached. It would be nice if the code could be updated, since new users might be struggling with the broken
code.
Keep up the good work!
Cheers Juergen
Hi,
when I execute the sample code from http://bioruby.open-bio.org/rdoc/classes/Bio/Blast.html I get the following error:
/Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast/genomenet.rb:240:in `exec_genomenet': cannot understand response (RuntimeError)
from /Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast.rb:368:in `query'
from blast_test.rb:12:in `<main>'
My script looks like this:
require 'rubygems'
require 'bio'
seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')
# To run an actual BLAST analysis:
#1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS', '-e 0.0001', 'genomenet')
#2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(seq)
Entrez-delivered MEDLINE records seem to be line-wrapped to 85 columns (for example, see PMID 20146148). This means that some exceptionally long and qualified MeSH headings (e.g., "Motorcycles/classification/legislation & jurisprudence/*statistics & numerical data") don't get parsed properly by MEDLINE#initialize; the parts that were wrapped onto a second line end up as separate MeSH headings when split up by MEDLINE#mh.
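A possible approach, sketched in plain Ruby, is to unfold continuation lines before extracting the MH fields. It assumes continuation lines start with whitespace and that tags look like `MH  - `. The record below is made up apart from the heading quoted above, and `mesh_headings` is a hypothetical helper rather than the BioRuby method.

```ruby
# Join continuation lines (assumed here to start with whitespace) onto the
# preceding tagged line, then collect the MH fields.
def mesh_headings(record_text)
  unfolded = record_text.gsub(/\r?\n[ \t]+/, ' ')
  unfolded.lines.map(&:chomp)
          .select { |line| line.start_with?('MH  - ') }
          .map { |line| line.sub('MH  - ', '') }
end

# Made-up record; the second heading is wrapped onto a continuation line.
record = <<~MEDLINE
  PMID- 00000000
  MH  - Humans
  MH  - Motorcycles/classification/legislation & jurisprudence/*statistics &
        numerical data
MEDLINE

puts mesh_headings(record)
```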
Would it be possible to include the stderr output from ClustalW as an instance variable for Bio::ClustalW?
Here's what I've come up with, but I don't feel confident in my knowledge of processes and IO.
module Bio
  class ClustalW
    # Redefine errorlog to return the newly captured stderr output
    def errorlog
      @data_stderr
    end

    # Redefine exec_local using call_command_open3 so we can get stderr
    def exec_local(opt)
      @command = [ @program, *opt ]
      #STDERR.print "DEBUG: ", @command.join(" "), "\n"
      @data_stdout = nil
      @exit_status = nil
      @data_stderr = nil
      Bio::Command.call_command_open3(@command) do |pin, pout, perr|
        @data_stdout = pout.read
        @data_stderr = perr.read
      end
      @exit_status = $?
    end
  end
end
Thanks
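For comparison, Ruby's standard library offers Open3.capture3, which collects stdout, stderr and the exit status in one call. The sketch below uses a Ruby one-liner as a stand-in for the ClustalW command; how this would fit into Bio::Command is left open.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in command: a Ruby one-liner that writes to both streams. In the
# ClustalW case this array would hold the program name and its options.
command = [RbConfig.ruby, '-e', '$stdout.puts "alignment"; $stderr.puts "warning: demo"']

# capture3 reads both pipes concurrently and returns the exit status too.
stdout_text, stderr_text, status = Open3.capture3(*command)

puts "stdout: #{stdout_text.strip}"
puts "stderr: #{stderr_text.strip}"
puts "exit ok: #{status.success?}"
```

Reading stdout to the end before touching stderr can block if the child fills the stderr pipe first. Reading both concurrently avoids that, and taking the status from the return value avoids relying on `$?`.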
Submitted by Masahide Kikkawa on Rubyforge on 2007-06-21 09:47
Due to the changes of pubmed interface, a method Bio::PubMed.query(pubmed_id) does not work.
Change the following lines
def self.query(id)
host = "www.ncbi.nlm.nih.gov"
path = "/Entrez/query.fcgi?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
to
path = "sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
BioRuby.new.gc_content # => "70.000000"
BioRuby.gc_content
NoMethodError: undefined method `gc_content' for BioRuby:Class
Perhaps we could add class methods for BioRuby? For certain class-methods like .gc_content and so on?
As for title:
in bioruby (github, master)
rake test
(in /usr/local/src/bioruby)
uninitialized constant XML::SaxParser (NameError)
I tested with both the "official" gem and the more recent mumboe-soap4r.
If I uninstall the gem soap4r-ruby1.9, all tests pass (so it is a soap4r/XML SAX parser problem).
Running on OS X 10.7, with ruby2.0 (installed via Fink), tests fail with this error:
[2877/3867] Bio::TestPhyloXML_class_methods#test_new = 0.00 s
1) Error:
test_new(Bio::TestPhyloXML_class_methods):
ArgumentError: invalid byte sequence in US-ASCII
/sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `=~'
/sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `!~'
/sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `initialize'
/sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `new'
/sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `test_new'
/sw/lib/ruby/2.0/minitest/unit.rb:1301:in `run'
/sw/lib/ruby/2.0/test/unit/testcase.rb:17:in `run'
/sw/lib/ruby/2.0/minitest/unit.rb:919:in `block in _run_suite'
/sw/lib/ruby/2.0/minitest/unit.rb:912:in `map'
/sw/lib/ruby/2.0/minitest/unit.rb:912:in `_run_suite'
/sw/lib/ruby/2.0/test/unit.rb:657:in `block in _run_suites'
/sw/lib/ruby/2.0/test/unit.rb:655:in `each'
/sw/lib/ruby/2.0/test/unit.rb:655:in `_run_suites'
/sw/lib/ruby/2.0/minitest/unit.rb:867:in `_run_anything'
/sw/lib/ruby/2.0/minitest/unit.rb:1060:in `run_tests'
/sw/lib/ruby/2.0/minitest/unit.rb:1047:in `block in _run'
/sw/lib/ruby/2.0/minitest/unit.rb:1046:in `each'
/sw/lib/ruby/2.0/minitest/unit.rb:1046:in `_run'
/sw/lib/ruby/2.0/minitest/unit.rb:1035:in `run'
/sw/lib/ruby/2.0/test/unit.rb:21:in `run'
/sw/lib/ruby/2.0/test/unit.rb:774:in `run'
/sw/lib/ruby/2.0/test/unit.rb:834:in `run'
test/runner.rb:36:in `<main>'
This patch fixes it by making sure UTF-8 is used during the test (source: http://wiki.lifesciencedb.jp/mw/index.php/BioRuby ):
--- a/test/unit/bio/db/test_phyloxml.rb
+++ b/test/unit/bio/db/test_phyloxml.rb
@@ -100,6 +100,7 @@ end #end module TestPhyloXMLData
end
def test_new
+ Encoding.default_external="UTF-8"
str = File.read(TestPhyloXMLData.example_xml)
assert_instance_of(Bio::PhyloXML::Parser,
phyloxml = Bio::PhyloXML::Parser.new(str))
Hi,
Today I noticed this very useful method in bioruby. However, I think it perhaps is not working correctly, (maybe for trees that don't have distance?)
tree = Bio::Newick.new('(A,B,(C,(D,G)H)E)F; ').tree
tree.subtree(%w(A B D G).collect{|s| tree.get_node_by_name(s)}).newick
gives
=> "(\n)A;\n"
The underlying tree is
=> #<Bio::Tree:0x92c18f8 @pathway=#<Bio::Pathway:0x92c18d0 @undirected=true, @relations=[],
@graph={(Node:"A")=>{}, (Node:"B")=>{}, (Node:"D")=>{}, (Node:"G")=>{}},
@index={}, @label={}>, @root=nil, @options={}, @cache_parent={}>
This is using both the bioruby 1.4.2 and the current github master. Have I spotted a bug?
Thanks in advance.
ben
Submitted by Nobody via Rubyforge on 2009-06-29
The method 'name' in the Bio::Tree::Node class replaces the underscore '_' with a space.
E.g.
The newick tree
(A_B, X);
The name of the node becomes "A B", which is inconsistent with what is specified by the input and with the Bio::Sequence
class.
Submitted by Jan Aerts via Rubyforge on 2008-07-04
See discussion on mailing list: http://lists.open-bio.org/pipermail/bioruby/2008-June/000653.html
Hi,
When building Debian packages of bioruby, the test suite is run. The test test_cut_symbol.rb is failing because the constant Bio::RestrictionEnzyme::CutSymbol is not initialized. I guess that there may be a problem with the way bio/util/restriction_enzyme/cut_symbol is required by this test. Requiring bio/util/restriction_enzyme instead would ensure that everything is well defined (cut_symbol is then automatically loaded).
Here is the patch applied in Debian to solve this issue:
--- a/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
+++ b/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
@@ -15,7 +15,8 @@
# libraries needed for the tests
require 'test/unit'
-require 'bio/util/restriction_enzyme/cut_symbol'
+require 'bio/util/restriction_enzyme'
+#require 'bio/util/restriction_enzyme/cut_symbol'
module Bio; module TestRestrictionEnzyme #:nodoc:
From the bioruby documentation:
s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content #=> 0.555555555555556
But with ruby 1.9.3 and bioruby 1.4.3, the same code gives:
s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content #=> 5/9
This also appears to affect other methods that should return a float:
puts s.at_content #=> 4/9
puts s.gc_skew #=> 3/5
puts s.at_skew #=> 0/1
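One plausible cause, stated here as an assumption rather than a reading of the BioRuby source, is a division via Integer#quo, which returns a Rational for integer operands on Ruby 1.9 and later. The sketch reproduces the symptom with a hypothetical `gc_fraction` and shows that an explicit `to_f` restores the float form.

```ruby
# Hypothetical re-implementation for illustration; not BioRuby's code.
def gc_fraction(seq)
  gc = seq.count('gcGC')
  total = seq.count('atgcuATGCU')
  return 0.0 if total.zero?

  gc.quo(total) # Integer#quo yields a Rational for integer operands
end

ratio = gc_fraction('atggcgtga')
puts ratio      # prints as a fraction
puts ratio.to_f # explicit conversion gives the float form
puts ratio.class
```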
shevy: do you know why bioruby's wiki doesnt work? http://bioruby.open-bio.org/wiki/
no idea, we have to ask ngoto when he is back
or we could file a bug report on github :)
shall I file one?
yeah
I think it's been down few weeks maybe
ok
Does anyone know why the BioRuby Wiki is down and how to repair it?
The error we get is:
(Can't contact the database server: Unknown database 'biorubywikidb' (localhost))
Possibly the database entry has been removed or something like that?
If you try to call reparse() on a newick tree you will get:
NameError: `tree' is not allowed as an instance variable name
from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `remove_instance_variable'
from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `reparse'
from (irb):5
from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:123
The offending code is line 346 of newick.rb as the error states:
def reparse
remove_instance_variable(:tree)
self.tree
self
end
You can clearly see the incorrect parameter being passed to remove_instance_variable(). The method should read:
def reparse
remove_instance_variable(:@tree)
self.tree
self
end
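The difference between the two symbols can be reproduced with a toy class in plain Ruby. `CachedTree` is hypothetical and not the BioRuby Newick class. The `instance_variable_defined?` guard is an addition beyond the report, covering the case where reparse is called before the tree was ever built.

```ruby
# Toy class reproducing the caching pattern.
class CachedTree
  def initialize(source)
    @source = source
  end

  # Builds the cached value on first use.
  def tree
    @tree ||= @source.upcase
  end

  # As reported: the symbol lacks the leading "@".
  def reparse_broken
    remove_instance_variable(:tree)
    tree
    self
  end

  # Corrected: proper name, and only removed if it has been set.
  def reparse
    remove_instance_variable(:@tree) if instance_variable_defined?(:@tree)
    tree
    self
  end
end

cached = CachedTree.new('(a,b);')
begin
  cached.reparse_broken
rescue NameError => e
  puts "NameError: #{e.message}"
end
puts cached.reparse.tree
```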
PubMed is returning 301 Permanent Redirects for all requests from the Bio::PubMed library.
The path in Bio::PubMed is wrong, '/sites/entrez' should be just '/pubmed' now.
This is an easy fix, which I'll do now.
However it raises the question of whether we should be automatically following redirects.
Current PDB format version is 3.3 but current BioRuby's Bio::PDB only supports PDB format version 2.x which is obsolete.
In PDB format version 3.3, some columns are expanded (e.g. serNum in SEQRES) and current Bio::PDB fails to parse large PDB entries.