require 'bio'

ff =, 'NC_005213.ffn')

ff.each_entry do |f|
  puts "definition : " + f.definition
  puts "nalen : " + f.nalen.to_s
  puts "naseq : " + f.naseq
end

The above code fails with:

NoMethodError: private method `getc' called for "NC_005213.ffn":String

The official tutorial tells you to use the above code. Since it fails,
the tutorial should be updated.

This is the part in the Tutorial where it then fails:

"For example, in turn, reading FASTA format files:"

Circular require warning

Under Ruby 1.9.2 and later, warnings about circular requires are given if $VERBOSE is set to true:

$ ruby -w -e 'require "bio";"atgcatgcaaaa")'
/Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15: warning: loading in progress, circular require considered harmful - /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb
from -e:1:in `<main>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb:15:in `<top (required)>'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:13:in `<top (required)>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15:in `<module:Bio>'
$ ruby --version
ruby 2.0.0dev (2012-05-05 trunk 35543) [x86_64-darwin10.8.0]

This also occurs in the current version of bioruby in the master branch.

Equivalent blast parsing approaches aren't

Submitted by Yannick Wurm via Rubyforge on 2009-08-09


To parse a BLAST file, only the third method I tried actually worked. For newcomers this can be quite disappointing.

This worked:
reportsArray = Bio::FlatFile.foreach(path) do |report|
  report.each_iteration do |iter|
    iter.each do |hit| # actually there is only a single hit here #iteration
      print "hit . "
      bestHsp = hit.hsps[0]
      puts bestHsp.query_frame
    end
  end
end

But trying to get the same results using the following approaches always led to crashes:

,path) do |ff|
ff.each do |report|

Bio::Blast.reports(path) do |report|

In part, it looks like ruby is going into the wrong parser. E.g., for the latter:
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:402: warning: useless use of :: in void context
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:265: warning: method redefined; discarding old server=
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format8.rb:70:in `tab_parse_hsp': undefined method `strip' for nil:NilClass

My BLAST output here is -m 0, but reports weren't being parsed properly with -m 7 or -m 8 either.
Is bioruby trying to support too many BLAST output formats? It could be helpful to document in the blast rdoc which
BLAST versions and output parameters bioruby was tested on.

(my blast output here was generated with -p tblastx -v 1 -b 1 -e 1.0e-4 -m 0 -V T in blast-2.2.15 (but also tried 2.2.10
and 2.2.18)).

ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin]
bioruby 1.3.0

flatfile.rb: file format auto-detection fail

On a PacBio-produced FASTQ file, the format auto-detection failed for the code shown below.

require 'bio'
ff =, ARGF)
while fe = ff.next_entry
  puts "#{fe.entry_id}\t#{fe.seq.length}"
end

Because a FASTQ record may have more than one line of nucleotides, and there is currently no other format that is
identical up to the sequence lines but differs after the second ID line, the regular expression in autodetection.rb

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/ ],

might be shortened to

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/ ],
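A small check in plain Ruby, without BioRuby itself, may make the trade-off concrete. It assumes (this is not confirmed in the report) that the detector only sees a leading chunk of the file, so that for a long multi-line read the "+" separator line has not been read yet. FASTQ_FULL and FASTQ_SHORT are illustrative names for the two expressions quoted above.

```ruby
# The two expressions quoted above, under illustrative constant names.
FASTQ_FULL  = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/
FASTQ_SHORT = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/

# A complete record whose sequence and quality each span two lines.
complete = "@read1\nACGTACGT\nTTGGCCAA\n+\nIIIIIIII\nIIIIIIII\n"

# Assumption: only the head of a long record is available to the detector,
# so the "+" separator line is missing from the text being tested.
truncated = "@read1\nACGTACGT\nTTGGCCAA\n"

puts "full pattern, complete record:   #{FASTQ_FULL.match?(complete)}"
puts "full pattern, truncated record:  #{FASTQ_FULL.match?(truncated)}"
puts "short pattern, truncated record: #{FASTQ_SHORT.match?(truncated)}"
```

The shorter pattern is also less specific: it accepts any text that starts with an "@" header followed by a line that begins with neither "@" nor "+".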

tests using chi2 are randomly failing (rarely, but still)


I've noticed that the tests test_randomize_with_hash_equiprobability and test_randomize_equiprobability from test/unit/bio/sequence/test_common.rb sometimes fail. Running the tests about 460 times, I got 11 failures. I guess this is normal, since they involve probabilistic sampling and statistical tests. However, it is a bit disorienting to have tests failing randomly when the code seems OK.
On Debian, the test suite is run during the build of the package, and a test failure means that the package is not built. We will thus have to disable these tests. Could you provide a mechanism to easily exclude these randomness-based tests from the test suite, e.g. by moving them to a particular file, so that one can be sure the remaining tests will pass?

Thanks a lot!
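One possible mechanism, sketched here in plain Ruby, is to seed the random number generator so that every run draws the same sample. This assumes the code under test can be handed a generator; shuffled_residues is a hypothetical stand-in, not the BioRuby randomize method.

```ruby
# Hypothetical stand-in for a randomizing method: shuffles the characters
# of a sequence using the random number generator passed in.
def shuffled_residues(seq, rng)
  seq.chars.shuffle(random: rng).join
end

seq = "atgcatgcaaaa"

# Two generators created with the same seed produce the same shuffle,
# so a statistical assertion on the result cannot flip between runs.
first  = shuffled_residues(seq, Random.new(12345))
second = shuffled_residues(seq, Random.new(12345))

puts first == second                     # same seed, same order
puts first.chars.sort == seq.chars.sort  # composition is preserved
```

A fixed seed trades statistical coverage for reproducibility, so keeping the unseeded variants in a separate, optional file would still be useful.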

Circular require warning for compat.rb


require "bio/sequence/compat"

will cause a "circular require considered harmful" warning when warnings are on for recent versions of Ruby.

The following will reproduce the warning for the git repo, if you're in the (git root)/lib directory:

$ ruby --disable-gems -w
$: << "."
require "bio/sequence/compat"
[snip]/sandbox/bioruby/lib/bio/sequence.rb:77: warning: loading in progress, circular require considered harmful - [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb
    from -:2:in `<main>'
    from -:2:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:10:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:13:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:62:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `<class:Sequence>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `require'
$ ruby --version
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]

If I'm not supposed to be requiring only part of bioruby, let me know.

I'm currently doing this so that I can keep track of what parts of bioruby I'm using in which parts of my program.

Bio::FastaFormat.query() returns no hits if Bio::FastaFormat.entry() is not called beforehand

I think I found a bug in Bio::FastaFormat:
query() calls factory.query(@entry) but @entry is only set upon calling entry() so @entry will be nil if entry() is not called before calling query().
As a result, query() will return no hits because the search is conducted with an empty sequence.

I have created a gist to illustrate the issue:
In my case, the resulting output is:


BLAST formats

Submitted by Yannick Wurm via Rubyforge on 2008-05-21

NCBI's blastall output format changed once again.

Using reportsArray = Bio::FlatFile.foreach(blastReportPath) do |report|

I can parse blastall-2.2.18's output correctly only if -m 7 -V (xml format; use legacy engine) or if nothing (new engine,
"default text output") is specified. Using -m 7, only a single query/hit is found (and it may be incorrect).
(this is dangerous, since no error message is displayed).

This is because the "old" blastall output, when blasting a multi-entry fasta file against a database,
was equal to the sum of several single-entry outputs (i.e., the BLAST headers were output once for each query sequence
in the input fasta file). The "new" blastall output considers each query sequence as another "iteration"
of BLAST, so the BLAST headers are listed only once.

I've attached example output.

I am aware that bioruby is an open-source community project, but the frequency at which bugs like this are encountered
makes it very difficult to justify using bioruby in a production environment.

Kind regards,

Yannick Wurm

failures of test/functional/bio/io/test_ensembl.rb

Submitted by Naohisa Goto via Rubyforge on 2008-08-31

Three failures occur in test/functional/bio/io/test_ensembl.rb while running test/runner.rb.

BioRuby version: git commit ID: e86f8d7

% ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]
% uname -a
Linux xxx 2.6.18-6-686 #1 SMP Fri Jun 6 22:22:11 UTC 2008 i686 GNU/Linux

  1. Failure:
    test_gff_exportview(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:95]:
    <"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
    exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was

  2. Failure:
    test_gff_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:121]:
    <"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
    exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was

  3. Failure:
    test_tab_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:180]:
    expected but was


Formatting of sequence features broken

Submitted by Jan Aerts on Rubyforge site on 2008-02-14

When trying to format the features from a Bio::Sequence (using Bio::Sequence#format_features), the output is not what
it should be. Using the following parameters, part of the expected output for AJ224122 should look like this:

FT source 1..3827
FT /organism="Arabidopsis thaliana"
FT /chromosome="3"
FT /cultivar="Wassilewskija"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:3702"
FT mRNA join(1726..1863,2548..3052,3137..3827)
FT /gene="DAG1"
FT /product="DNA-binding protein"
FT /function="transcription factor"
FT /experiment="experimental evidence, no additional details
FT recorded"

However, the observed output is:
FT source 1..3827FT /organism="Arabidopsis thaliana"FT mRNA
join(1726..1863,2548..3052,3137..3827)FT /gene="DAG1"FT CDS
join(1840..1863,2548..3052,3137..3498)FT /gene="DAG1"FT exon 1726..1863FT
/gene="DAG1"FT intron 1864..2547FT /gene="DAG1"FT exon
2548..3052FT /gene="DAG1"FT intron 3053..3136FT
/gene="DAG1"FT exon 3137..3495FT /gene="DAG1"

Bio::Reference lib/bio/reference.rb url hash code error

For records pulled from PubMed without a valid URL, converting to EndNote (and possibly other formats) will fail. The code at fault is line 145 of lib/bio/reference.rb:

  @url      = hash['url']

should be

  @url      = hash['url'] || ''
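A minimal illustration of why the default matters, in plain Ruby with a made-up hash. How the EndNote formatter actually uses @url is not shown in the report, so the call to empty? is only an assumed example of a String method being invoked on nil.

```ruby
hash = { 'title' => 'Example without a URL' }  # made-up record, no 'url' key

url_without_default = hash['url']        # nil
url_with_default    = hash['url'] || ''  # empty string

# String methods work on the defaulted value.
puts url_with_default.empty?

# The nil value does not respond to String methods.
failed = begin
  url_without_default.empty?
  false
rescue NoMethodError
  true
end
puts failed
```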



Submitted by Rodrigo Jardim via Rubyforge on 2010-10-14

There are some errors in the esearch method in restncbi.rb. The step value is too large: the NCBI REST service only
retrieves 100 records at a time. The loop with 0.step is wrong too. I have already written new code; may I send it to you?



This is test. Please ignore.

Please reduce your build matrix

Hi! We are happy to see BioRuby on and have a small favor to ask. One of your forks generates 80 or even 100+ runs per build. This is a little bit unfair to the rest of the users with Ruby projects, because it takes well over an hour to build 100+ rows for every single push.

I submitted a pull request to reduce the matrix, but it has been ignored so far. The fork maintainer seems to be a BioRuby org member. If you know how to get in touch with him (her?), please merge that pull request.
Lots of Ruby developers who use Travis CI will be very thankful to you.

Thank you. On behalf of the maintainers team,


Bug - Bio::Fetch.query

require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')

OpenURI::HTTPError: 404 Not Found
from /usr/lib/ruby/1.8/open-uri.rb:277:in `open_http'
from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/command.rb:625:in `read_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:183:in `_get'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:111:in `fetch'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:128:in `query'
from (irb):12

Hmm. Not sure where the error is. But it would be nice if "OpenURI::HTTPError: 404 Not Found"
errors reported the requested URL to the user, so that it can easily be checked manually.
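A rough sketch of what such reporting could look like, in plain Ruby and without any network access. FetchFailed and fetch_reporting_url are hypothetical names, not part of BioRuby or open-uri, and the example.org URL is a placeholder.

```ruby
# Hypothetical error class that carries the URL alongside the original message.
class FetchFailed < StandardError
  attr_reader :url

  def initialize(url, cause_message)
    @url = url
    super("#{cause_message} (while fetching #{url})")
  end
end

# Runs the block (which would perform the real HTTP request) and re-raises
# any failure with the URL attached to the message.
def fetch_reporting_url(url)
  yield url
rescue StandardError => e
  raise FetchFailed.new(url, e.message)
end

begin
  # Stand-in for a request that fails; nothing is actually downloaded.
  fetch_reporting_url("http://example.org/hal/VNG1467G") do |_url|
    raise IOError, "404 Not Found"
  end
rescue FetchFailed => e
  puts e.message
end
```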

Right now I have no idea what is going on.

updating REBASE data

i'd like to update the included REBASE data. does anyone have an objection to this?

currently the source has this page stating the terms:

Those seeking to distribute REBASE files with their software packages are welcome to do so, providing it is clear to your users that they are not being charged for the REBASE data. It should be transparent that REBASE is a free and independent resource, with the following bibliographical reference:
Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2010)
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.
Nucl. Acids Res. 38: D234-D236. 

could i add that to the LICENSE file?

fork() is called on platforms that do not support it

Because jruby is not recognised as being unable to support fork(), it (using master as of March 19 2010) produces the following error:

NotImplementedError: popen("-") is unimplemented

/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `call_command_fork'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `call_command'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `exec_local'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `query_by_filename'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `query_string'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `query_align'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio- `query'

When "|java" is added to the regular expression on line 150 in lib/bio/command.rb for non-fork supporting platforms, the error goes away. A similar error occurs with "Ruby Installer" for windows, which has an unrecognised platform of "i386-mingw32".
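The platform check could be sketched as follows. The pattern here is hypothetical, since the actual expression on line 150 of lib/bio/command.rb is not quoted in this report; the last line shows an alternative that asks the runtime directly instead of maintaining a platform list.

```ruby
# Hypothetical list of platform substrings treated as lacking fork();
# the real expression in lib/bio/command.rb may differ.
NO_FORK_PLATFORMS = /mswin|bccwin|mingw|java/i

def fork_unsupported?(platform = RUBY_PLATFORM)
  NO_FORK_PLATFORMS.match?(platform)
end

puts fork_unsupported?("java")          # JRuby reports "java"
puts fork_unsupported?("i386-mingw32")  # the Ruby Installer platform named above
puts fork_unsupported?("x86_64-linux")

# Alternative without a platform list: Ruby documents that
# Process.respond_to?(:fork) is false where fork is not usable.
puts Process.respond_to?(:fork)
```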

remote BLAST not working

Using Ruby 1.9.2 and bio (HEAD):


blast = Bio::Blast.remote 'blastp', 'swissprot', '-e 0.0001', 'genomenet'



/Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast/genomenet.rb:251:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast.rb:368:in `query'
    from ./phone_blast.rb:10:in `<main>'

test issue

This is a test if the contributors get an email.

Bio::RestrictionEnzyme::Analysis performance

I'm using bioruby 1.4.0 and ran into a problem with performance of
Bio::RestrictionEnzyme::Analysis - cutting a 37kbp sequence with a
single enzyme takes more than 5 minutes.

I downloaded this GenBank file to disk:

...and extracted the first sequence:

gb = PATH_TO_GBK_FILE ).next_entry

...then asked for a restriction enzyme analysis for BstEII:

cuts = Bio::RestrictionEnzyme::Analysis.cut( gb.seq, "BstEII", { :view_ranges => true } )

It's that call to cut() that takes 5 minutes; running cut() under RubyProf tells us:

Thread ID: 70368668447160
Total: 384.810000

 %self     total     self     wait    child    calls  name
 54.69    210.44   210.44     0.00     0.00 546320457  Fixnum#== (ruby_runtime:0}
 45.06    383.83   173.41     0.00   210.42   148978  Array#include? (ruby_runtime:0}
  0.11    384.22     0.43     0.00   384.22       33  Array#each (ruby_runtime:0}

So most of the time was spent in 546,320,457 calls to Fixnum#==. Am I
doing something silly, or is the restriction enzyme analysis algorithm
in need of some optimization?

Bio::RestrictionEnzyme::Analysis.cut_without_permutations() is almost
as slow, so it's not the permutations killing it. Is anyone else using
this module with more success?
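The profile suggests repeated linear membership tests. The BioRuby code responsible is not shown here, so the following is only a generic sketch, with made-up data, of replacing Array#include? with a Set lookup.

```ruby
require 'set'

positions = (0...5_000).step(7).to_a  # made-up cut positions
queries   = (0...5_000).to_a

# Array#include? scans the array, comparing element by element.
hits_array = queries.count { |q| positions.include?(q) }

# Set#include? is a hash lookup, so each query costs roughly constant time.
position_set = positions.to_set
hits_set = queries.count { |q| position_set.include?(q) }

puts hits_array == hits_set
```

Both variants give the same answer; only the cost per lookup changes, which matters once the number of lookups reaches the hundreds of thousands seen in the profile.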

GenBank each_entry, last entry is always nil

Submitted by Raoul on 2008-02-13 at Rubyforge

Reading a generic GenBank FILE, the system returns one entry more than expected"")

data.each_entry do |entry|
  puts entry.entry_id
end

You get


I think the parser sees the "\n" at the end of the GenBank file (after
"//\n") and concludes that there is another entry, which is wrong.
Deleting the last line makes it work.
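The suspected cause can be reproduced in plain Ruby with a made-up, heavily shortened input. This is not the BioRuby parser, just a sketch of how a trailing blank line becomes a phantom entry and how filtering whitespace-only chunks avoids it.

```ruby
# Two made-up, heavily shortened entries followed by a stray blank line.
data = "LOCUS       AAA\n//\nLOCUS       BBB\n//\n\n"

# Split on the "//" terminator line. The trailing "\n" survives as a
# third chunk that contains no entry at all.
chunks = data.split(%r{^//[ \t]*\r?\n})

# Dropping chunks that hold nothing but whitespace removes the phantom entry.
entries = chunks.reject { |chunk| chunk.strip.empty? }

puts chunks.size
puts entries.size
```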

Bio::Sequence.guess issue

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT" )

=> Bio::Sequence::NA

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT\n" )

=> Bio::Sequence::AA

Shouldn't whitespace be ignored when determining the sequence type?
And perhaps Bio::Sequence.guess(" ") should raise an error instead of returning AA?
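A sketch of the behaviour asked for, using a deliberately simplified guesser. guess_type is a hypothetical helper and its 90% threshold is an assumption for illustration; it is not BioRuby's algorithm.

```ruby
# Simplified stand-in for a sequence-type guesser: strips whitespace,
# refuses empty input, and calls the sequence nucleic acid when at least
# 90% of its letters are A, C, G, T, U or N.
def guess_type(str, threshold = 0.9)
  cleaned = str.gsub(/\s+/, "")
  raise ArgumentError, "no sequence letters to guess from" if cleaned.empty?

  na_letters = cleaned.count("acgtunACGTUN")
  na_letters.fdiv(cleaned.length) >= threshold ? :na : :aa
end

puts guess_type("ACGT")
puts guess_type("ACGT\n")
puts guess_type("MKVLAAGIVG")
```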


Blast Database XML Entropy Statistics Can Be so small that it is outside the Float range

Submitted by Ben Woodcroft via Rubyforge on 2008-08-29

Using blastxl3 with xml output (see attached), then parsing with bioruby gives this warning:

bio/appl/blast/rexml.rb:70: warning: Float 4.94066e-324 out of range

That number is from the Statistics_entropy value in the XML. That number is out of float range:

=> 2.2250738585072e-308

The returned float value then becomes 0.0, which is close but wrong in a strict sense.
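For reference, the value printed above is Ruby's Float::MIN, the smallest positive normalized double. Ruby has no built-in long double, but an exact arbitrary-precision type such as Rational can hold the parsed value without rounding it to zero. The snippet below is a sketch of that idea, not a proposed patch to rexml.rb.

```ruby
text = "4.94066e-324"  # the Statistics_entropy value from the warning above

# Smallest positive normalized double.
puts Float::MIN  # prints 2.2250738585072014e-308

# String#to_r understands exponent notation and keeps the exact value.
exact = text.to_r
puts exact > 0
puts exact < Float::MIN.to_r
```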

The same error is given for both edge bioruby:
and my personal blastxml rexml new format fix branch:

Is there a ruby equivalent to the long double which appears to be used in the NCBI blast code?

KGML Parser

Hi. I was trying to parse a kgml file but I found out that the coords field (used in the big maps) is not available!


João Cardoso

Genbank Support

I'm working with @catfeet to write a Blast pipeline.

The tool used at Cardiff University is Nucleotide BLAST with the nr/nt database from Genbank.

It seems like the only options with bioruby are genomenet and ddbj. However, genomenet.rb references in the notes.

Basically we want to be able to do:

blast = Bio::Blast.remote 'blastn', 'nr-nt', '-e 0.05 -m 8', 'genbank'

Does this mean I'll have to write a Bio::Blast::Remote::Genbank module to receive output from that tool?

BLAST parsing - bug with long sequences

Bug submitted by Yannick Wurm via Rubyforge on 2008-05-21


Just ran into another blast parsing bug.
Using ncbi's blastall 2.2.18, ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin] and bio.rb,v 1.88 2007/12/29

The following code works on almost every default blast 2.2.18 output I throw at it:
blastReportPath = ARGV[0]
outputPath = ARGV[1]
print "begin " + blastReportPath + "\n"
, "w") do |outputFile|
    outputFile << "hit.target_id" + "\t" + "report.query_def" + "\t" + "hit.evalue" + "\n"
    i = 0
    print i.to_s + "\n"
    reportsArray = Bio::FlatFile.foreach(blastReportPath) do |report|
    firstHit = TRUE
    print i.to_s + "\n"
    print " " + report.query_def + "\n"
    report.each do |hit|
    if ((hit.evalue < 1.0e-20) || (firstHit == TRUE))
    print " " + hit.target_id + "\n"
    outputFile << hit.target_id + "\t"+ report.query_def + "\t" + hit.evalue.to_s +
    firstHit = FALSE
    i = i +1

However it hangs on one (very very long) protein sequence, FBgn0086906. When I kill it I get this:
^C/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:267:in `format0_parse_query': Interrupt
from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:168:in `query_def'
from /Volumes/Shiva/Users/yannickwurm/ruby/topHitsForQueryFromBlastReport.rb:54
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:520:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:655:in `each'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:519:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:481:in `_open_file'

If I change the following two lines:

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

to :

Query= FB|FBgn0086906 symbol:sls
(18141 letters)

Then it works again. So somewhere the "," in the query length is confusing the blast parser. Below you can
see the context in which the "Query definition" lines are found. I've attached the complete blast output file
and ruby script fwiw.
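A comma-tolerant way to read the length line could look like the sketch below. query_length is a hypothetical helper written for this illustration; it is not the code in format0.rb.

```ruby
# Extracts the query length from a "(N letters)" line, accepting
# thousands separators. Returns nil when the line has another shape.
def query_length(line)
  digits = line[/\(\s*([\d,]+)\s+letters\)/, 1]
  digits && digits.delete(",").to_i
end

puts query_length("(18,141 letters)")
puts query_length("(18141 letters)")
puts query_length("Database: fourmidable012007").inspect
```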

Kind regards,

yannick wurm

TBLASTN 2.2.18 [Mar-02-2008]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

Database: fourmidable012007
11,864 sequences; 9,098,808 total letters


                                                             Score    E

Sequences producing significant alignments: (bits) Value

SiJWB02BAW2.scf 299 2e-80
SiJWE01BDQ.scf 165 6e-40
SiJWA04CAU2.scf 103 2e-21 91 2e-17

Tutorial code for PubMed search is not functioning

Submitted by Juergen Helmers via Rubyforge on 2009-08-31

The sample code in the tutorial for searching NCBI PubMed is not functional. The same is true for the sample script

  1. The sample script is missing the "require 'rubygems'" statement; without it the bioruby gem will not be found.
  2. does not work; it has to be replaced by Bio::PubMed.esearch, as recommended in the code itself.
  3. puts returns empty objects. One first has to fetch the PubMed article
    object, as esearch only returns PubMed IDs and nothing more.

entries = Bio::PubMed.esearch(ARGV.join(' '))
entries.each do |id|
case form
when 'medline'
puts entry = Bio::PubMed.efetch(id)
entry = Bio::PubMed.efetch(id)

Patch file is attached. It would be nice if the code could be updated since new users might be struggling with the false

Keep up the good work!
Cheers Juergen

Remote Blast Fails

when I execute the sample code from I get the following error:

/Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast/genomenet.rb:240:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast.rb:368:in `query'
    from blast_test.rb:12:in `<main>'

My script looks like this:

require 'rubygems'
require 'bio'


# To run an actual BLAST analysis:

#1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS', '-e 0.0001', 'genomenet')

#2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(seq)

Bio::MEDLINE#initialize handles multi-line MeSH terms incorrectly

Entrez-delivered MEDLINE records seem to be line-wrapped to 85 columns (for example, see PMID 20146148). This means that some exceptionally long and qualified MeSH headings (e.g., "Motorcycles/classification/legislation & jurisprudence/*statistics & numerical data") don't get parsed properly by MEDLINE#initialize: the parts that were wrapped onto a second line end up as separate MeSH headings when split up by MEDLINE#mh.
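One way to handle this is to rejoin wrapped lines before splitting on field tags. The sketch below is plain Ruby with a made-up record; it assumes that a continuation line starts with six spaces, and unfold_medline is a hypothetical helper, not part of Bio::MEDLINE.

```ruby
# Joins wrapped lines back onto the field they continue.
# Assumption: a continuation line starts with six spaces, while a new
# field starts with a tag such as "MH  - ".
def unfold_medline(text)
  text.each_line.with_object([]) do |raw, fields|
    line = raw.chomp
    if line.start_with?(" " * 6) && !fields.empty?
      fields[-1] = fields[-1] + " " + line.strip
    else
      fields << line
    end
  end
end

# Made-up record; where the real record wraps is not shown above.
record = <<~MEDLINE
  MH  - Motorcycles/classification/legislation & jurisprudence/*statistics &
        numerical data
  MH  - Humans
MEDLINE

mesh = unfold_medline(record).map { |field| field.sub(/\AMH  - /, "") }
puts mesh.inspect
```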

Bio::ClustalW error output

Would it be possible to include the stderr output from ClustalW as an instance variable for Bio::ClustalW?

Here's what I've come up with, but I don't feel confident in my knowledge of processes and IO.

module Bio
  class ClustalW

    #redefine errorlog with the newly generated stderr output
    def errorlog

    #redefine exec_local using call_command_open3 so we can get stderr
    def exec_local(opt)
      @command = [ @program,  *opt ]
      #STDERR.print "DEBUG: ", @command.join(" "), "\n"
      @data_stdout = nil
      @exit_status = nil
      @data_stderr = nil
      Bio::Command.call_command_open3(@command) do |pin,pout,perr|
        @data_stdout =
        @data_stderr =
      @exit_status = $?



Submitted by Masahide Kikkawa on Rubyforge on 2007-06-21 09:47

Due to the changes of pubmed interface, a method Bio::PubMed.query(pubmed_id) does not work.

Change the following lines

def self.query(id)
host = ""
path = "/Entrez/query.fcgi?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="


  path = "sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

Provide class methods for common actions?

# => "70.000000"

NoMethodError: undefined method `gc_content' for BioRuby:Class

Perhaps we could add class methods for BioRuby? For certain class-methods like .gc_content and so on?

Bio::TestPhyloXML_class_methods test failure

Running on OS X 10.7, with ruby2.0 (installed via Fink), tests fail with this error:

[2877/3867] Bio::TestPhyloXML_class_methods#test_new = 0.00 s                                                                                            
  1) Error:
ArgumentError: invalid byte sequence in US-ASCII
    /sw/ `=~'
    /sw/ `!~'
    /sw/ `initialize'
    /sw/ `new'
    /sw/ `test_new'
    /sw/lib/ruby/2.0/minitest/unit.rb:1301:in `run'
    /sw/lib/ruby/2.0/test/unit/testcase.rb:17:in `run'
    /sw/lib/ruby/2.0/minitest/unit.rb:919:in `block in _run_suite'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `map'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `_run_suite'
    /sw/lib/ruby/2.0/test/unit.rb:657:in `block in _run_suites'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `each'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `_run_suites'
    /sw/lib/ruby/2.0/minitest/unit.rb:867:in `_run_anything'
    /sw/lib/ruby/2.0/minitest/unit.rb:1060:in `run_tests'
    /sw/lib/ruby/2.0/minitest/unit.rb:1047:in `block in _run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `each'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `_run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1035:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:21:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:774:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:834:in `run'
    test/runner.rb:36:in `<main>'

This patch fixes it by making sure UTF-8 is used during the test (source: ):

--- a/test/unit/bio/db/test_phyloxml.rb
+++ b/test/unit/bio/db/test_phyloxml.rb
@@ -100,6 +100,7 @@ end #end module TestPhyloXMLData

     def test_new
+      Encoding.default_external="UTF-8" 
       str =
                          phyloxml =
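The underlying problem can be shown in plain Ruby without touching the global default. The string below is made up; the point is that bytes above 0x7F are invalid when tagged as US-ASCII, which is the condition behind the "invalid byte sequence" error in the trace above.

```ruby
# UTF-8 bytes for "é", tagged with US-ASCII the way text read under an
# ASCII default external encoding would be.
raw = "Syst\xC3\xA9matique".b.force_encoding(Encoding::US_ASCII)
puts raw.valid_encoding?   # bytes above 0x7F are not valid US-ASCII

# Re-tagging the same bytes as UTF-8 makes the string valid again,
# and regular expression matching then works as expected.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)
puts utf8.valid_encoding?
puts utf8.match?(/matique/)
```

Tagging the individual string is a narrower alternative to setting Encoding.default_external, which affects every later read in the process.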

Bio::Tree#subtree not behaving as expected


Today I noticed this very useful method in bioruby. However, I think it perhaps is not working correctly, (maybe for trees that don't have distance?)

tree ='(A,B,(C,(D,G)H)E)F; ').tree
tree.subtree(%w(A B D G).collect{|s| tree.get_node_by_name(s)}).newick


 => "(\n)A;\n" 

The underlying tree is

 => #<Bio::Tree:0x92c18f8 @pathway=#<Bio::Pathway:0x92c18d0 @undirected=true, @relations=[], 
@graph={(Node:"A")=>{}, (Node:"B")=>{}, (Node:"D")=>{}, (Node:"G")=>{}}, 
@index={}, @label={}>, @root=nil, @options={}, @cache_parent={}> 

This is using both the bioruby 1.4.2 and the current github master. Have I spotted a bug?

Thanks in advance.


Submitted by Nobody via Rubyforge on 2009-06-29

The 'name' method of the Bio::Tree::Node class replaces the underscore '_' with a space.


The newick tree

(A_B, X);

The name of the node becomes "A B", which is inconsistent with what is specified by the input and the Bio::Sequence

test_cut_symbol fails because of uninitialized constant Bio::RestrictionEnzyme::CutSymbol


When building Debian packages of bioruby, the test suite is run. The test test_cut_symbol.rb is failing because constant Bio::RestrictionEnzyme::CutSymbol is not initialized. I guess that there may be a problem with the way bio/util/restriction_enzyme/cut_symbol is required by this test. Requiring instead bio/util/restriction_enzyme would ensure that everything is well defined (cut_symbol is then automatically loaded).

Here is the patch applied in Debian to solve this issue:

--- a/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
+++ b/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
@@ -15,7 +15,8 @@

 # libraries needed for the tests
 require 'test/unit'
-require 'bio/util/restriction_enzyme/cut_symbol'
+require 'bio/util/restriction_enzyme'
+#require 'bio/util/restriction_enzyme/cut_symbol'

 module Bio; module TestRestrictionEnzyme #:nodoc:

Bio::Sequence::NA returns Rational not Float

From the bioruby documentation:

s ='atggcgtga')
puts s.gc_content   #=> 0.555555555555556

But doing the same with ruby 1.9.3 and bioruby 1.4.3 gives:

s ='atggcgtga')
puts s.gc_content   #=> 5/9

This also appears to affect other methods that should return a float:

puts s.at_content #=> 4/9
puts s.gc_skew #=> 3/5
puts s.at_skew #=> 0/1
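The difference between the two results can be reproduced with plain integers. Whether BioRuby computes the ratio with quo is an assumption here; the sketch only shows that Integer#quo yields a Rational on Ruby 1.9 and later, while fdiv (or to_f) yields a Float.

```ruby
gc_count = 5  # number of g and c in "atggcgtga"
length   = 9

as_rational = gc_count.quo(length)   # exact fraction (Rational)
as_float    = gc_count.fdiv(length)  # always a Float

puts as_rational  # prints 5/9
puts as_float
puts as_rational.to_f
```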

BioRuby Wiki seems to be down

shevy: do you know why bioruby's wiki doesnt work?
no idea, we have to ask ngoto when he is back
or we could file a bug report on github :)
shall I file one?
I think it's been down few weeks maybe

Does anyone know why the BioRuby Wiki is down and how to repair it?

The error we get is:

(Can't contact the database server: Unknown database 'biorubywikidb' (localhost))

Possibly the database entry has been removed or something like that?

Bug in bio/db/newick.rb

If you try to call reparse() on a newick tree you will get:

NameError: `tree' is not allowed as an instance variable name
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `remove_instance_variable'
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `reparse'
    from (irb):5
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:123

The offending code is line 346 of newick.rb as the error states:

def reparse

You can clearly see the incorrect parameter being passed to remove_instance_variable(). The method should read:

def reparse
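The method bodies are missing above, but the error message itself can be reproduced in isolation. CachedParser below is a minimal stand-in class, not Bio::Newick; it shows that remove_instance_variable rejects a name without the leading "@" and accepts one with it.

```ruby
# Minimal stand-in class (not Bio::Newick) holding a cached @tree.
class CachedParser
  def initialize
    @tree = :parsed
  end

  # A name without the leading "@" is rejected with NameError.
  def reset_with_bare_name
    remove_instance_variable("tree")
  end

  # The name must include the "@" sigil.
  def reset_with_sigil
    remove_instance_variable(:@tree)
  end

  def cached?
    instance_variable_defined?(:@tree)
  end
end

parser = CachedParser.new
begin
  parser.reset_with_bare_name
rescue NameError => e
  puts e.class
end

parser.reset_with_sigil
puts parser.cached?
```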

Bio::Pubmed returning no results for searches

PubMed is returning 301 Permanent Redirects for all requests from the Bio::PubMed library.

The path in Bio::PubMed is wrong, '/sites/entrez' should be just '/pubmed' now.

This is an easy fix, which I'll do now.

However it raises the question of whether we should be automatically following redirects.

support PDB format version 3.3

Current PDB format version is 3.3 but current BioRuby's Bio::PDB only supports PDB format version 2.x which is obsolete.

In PDB format version 3.3, some columns are expanded (e.g. serNum in SEQRES) and current Bio::PDB fails to parse large PDB entries.

