bioruby / bioruby


bioruby

Home Page: http://bioruby.open-bio.org

License: Other

Ruby 99.81% Shell 0.01% Gnuplot 0.07% Perl 0.01% Parrot 0.01% HTML 0.11% Batchfile 0.01%

bioruby's Introduction

README.rdoc - README for BioRuby

Copyright: Copyright © 2001-2007 Toshiaki Katayama <[email protected]>, Copyright © 2008 Jan Aerts <[email protected]>, Copyright © 2011-2019 Naohisa Goto <[email protected]>
License: The Ruby License

The above statement is limited to this file. See below about BioRuby’s copyright and license.

BioRuby

BioRuby is an open source Ruby library for developing bioinformatics software. Ruby, an object-oriented scripting language, has many features suited to bioinformatics research: clear syntax for expressing complex objects, regular expressions for text handling as powerful as Perl’s, and a wide variety of libraries, including ones for web services. Because Ruby’s syntax is simple and clean, we believe it is easy for beginners to learn, easy for biologists to use, and powerful enough for software developers.

In BioRuby, you can retrieve biological database entries from flat files, internet web servers and local relational databases, and parse them to extract the information you need. Biological sequences can be manipulated with the rich set of methods in Ruby’s String class and with regular expressions. Everyday tools such as Blast, Fasta, Hmmer and many other biological analysis packages can be run from within a BioRuby script, and their results can be fully parsed to extract the portion you need. BioRuby supports the major biological database formats and provides many ways of accessing them, including flatfile indexing and web services. Various web services can also be used easily from BioRuby.

FOR MORE INFORMATION

See RELEASE_NOTES.rdoc for news and important changes in this version.

Documents in this distribution

Release notes, important changes and issues

README.rdoc: This file. General information and installation procedure.
RELEASE_NOTES.rdoc: News and important changes in this release.
KNOWN_ISSUES.rdoc: Known issues and bugs in BioRuby.
doc/RELEASE_NOTES-*.rdoc: Release notes for old versions.
doc/Changes-1.3.rdoc: News and incompatible changes from 1.2.1 to 1.3.0.
doc/Changes-0.7.rd: News and incompatible changes from 0.6.4 to 1.2.1.

Tutorials and other useful information

doc/Tutorial.rd: BioRuby Tutorial.
doc/Tutorial.rd.html: HTML version of Tutorial.rd.

BioRuby development

ChangeLog: History of changes.
doc/ChangeLog-*: ChangeLog for old versions.
doc/ChangeLog-before-1.4.2: changes before 1.4.2.
doc/ChangeLog-before-1.3.1: changes before 1.3.1.
README_DEV.rdoc: Describes ways to contribute to the BioRuby project, including coding styles and documentation guidelines.

Documents written in Japanese

doc/Tutorial.rd.ja: BioRuby Tutorial written in Japanese.
doc/Tutorial.rd.ja.html: HTML version of Tutorial.rd.ja.

Sample codes

There are many sample codes and demo scripts in sample/.

WWW

BioRuby’s official website is at bioruby.org/. You will find links to related resources, including downloads, mailing lists and Wiki documentation, on the top page.

bioruby.org/

A mirror site is available, hosted by the Open Bioinformatics Foundation (OBF).

bioruby.open-bio.org/

WHERE TO OBTAIN

WWW

The stable release is freely available from the BioRuby website.

bioruby.org/archive/

RubyGems

A RubyGems (the packaging system for Ruby) version of the BioRuby package is also available for easy installation.

rubygems.org/gems/bio

git

If you need the latest development version, it is provided at

github.com/bioruby/bioruby

and can be obtained by the following procedure:

% git clone https://github.com/bioruby/bioruby.git

REQUIREMENTS

Ruby 2.0.0 or later – www.ruby-lang.org/
- Ruby 2.7.8, 3.0.6, 3.1.4, 3.2.2 or later is recommended.
- See KNOWN_ISSUES.rdoc for Ruby version specific problems.

OPTIONAL REQUIREMENTS

Some optional libraries can be used to extend BioRuby’s functionality. If your needs meet the following conditions, install them by using RubyGems, or download and install them from the following web sites.

Creating faster flatfile index using Berkeley DB:

GitHub:ruby-bdb (which took over bdb) (No RubyGems available)
- Oracle Berkeley DB and C compiler will be required.

INSTALL

INSTALL by using RubyGems (recommended)

If you are using RubyGems, just type

% gem install bio

Alternatively, manually download bio-X.X.X.gem from bioruby.org/archive/ and install it by using the gem command.

Running self-test

Self-test codes are bundled to check whether BioRuby works correctly on a machine. Note that some tests may need an internet connection.

To run tests,

% ruby test/runner.rb

For those familiar with Rake,

% rake test

also works.

Before reporting test failure, please check KNOWN_ISSUES.rdoc about known platform-dependent issues. We are happy if you write patches to solve the issues.

SETUP

If you want to use the OBDA (Open Bio Database Access) to obtain database entries, copy the sample configuration file from the BioRuby distribution

bioruby-x.x.x/etc/bioinformatics/seqdatabase.ini

to one of the following locations

/etc/bioinformatics/seqdatabase.ini	(system wide configuration)

~/.bioinformatics/seqdatabase.ini	(personal configuration)

and change the contents according to your preference. For more information on the OBDA, see obda.open-bio.org/ .

USAGE

You can load all BioRuby classes just by requiring ‘bio.rb’. All the BioRuby classes and modules are located under the module name ‘Bio’ to separate the name space.

#!/usr/bin/env ruby
require 'bio'

You can also read other documentation in the ‘doc’ directory.

bioruby-x.x.x/doc/

PLUGIN (Biogem)

Many plugins (called Biogems) are now available. See biogems.info/ for a list of plugins and related software utilizing BioRuby.

biogems.info/

The plugins (Biogems) listed below were formerly included in BioRuby and were split into separate packages to reduce complexity and external dependencies.

bio-shell
bio-executables
bio-blast-xmlparser
bioruby-phyloxml
- NOTE: Please uninstall bio-phyloxml, which was created in 2012 as a preliminary trial of splitting out a module and has not been maintained since.
bio-biosql

Plugins (Biogems) listed below may be useful for running existing codes.

bio-old-biofetch-emulator – Emulates deprecated BioRuby’s BioFetch server by using other existing web services.

To develop your own plugin, see “Plugins” pages of BioRuby Wiki.

bioruby.open-bio.org/wiki/Plugins

Recommended Plugins (gems)

For existing BioRuby users, it is recommended to install the following gems:

bio-shell: If you use the BioRuby Shell.
bio-executables: If you use br_bio* commands.
bio-old-biofetch-emulator: If you run existing codes using BioFetch, including sample and demo codes in sample/.
bio-blast-xmlparser: If you handle BLAST XML result files and the Expat XML parser (with development files) is installed on your system.
bioruby-phyloxml: If you use Bio::PhyloXML and Libxml2 (with development files) is installed on your system.

Note that installing bio-biosql is NOT recommended unless you have actually used Bio::SQL, because it depends on older versions of ActiveRecord and ActiveSupport that may not run on recent Ruby versions.

LICENSE

BioRuby can be freely distributed under the same terms as Ruby. See the file COPYING (or COPYING.ja written in Japanese).

As written in the file COPYING, see the file LEGAL for files distributed under different licenses.

REFERENCE

If you use BioRuby in academic research, please consider citing the following publication.

BioRuby: Bioinformatics software for the Ruby programming language. Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama. Bioinformatics (2010) 26(20): 2617-2619.
- doi: 10.1093/bioinformatics/btq475
- PMID: 20739307

CONTACT

Current staff of the BioRuby project can be reached by sending e-mail to <[email protected]>.


bioruby's Issues

Blast::Remote - GenomeNet url path is outdated

Hi,

I just noticed that the GenomeNet path variable at exec_genomenet, appl/blast/genomenet.rb:161, is outdated. The new path should be /tools-bin/blast.

Cheers,

Joao

Tutorial pages don't display correctly

The tutorial page says to look at the new tutorial at this url

https://raw.github.com/bioruby/bioruby/master/doc/Tutorial.rd.html

This displays as raw text not as html unfortunately :(

BLAST parsing - bug with long sequences

Bug submitted by Yannick Wurm via Rubyforge on 2008-05-21

Hi,

Just ran into another blast parsing bug.
Using ncbi's blastall 2.2.18, ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin] and bio.rb,v 1.88 2007/12/29

The following code works on almost every default blast 2.2.18 output I throw at it:
    require 'bio'

    blastReportPath = ARGV[0]
    outputPath = ARGV[1]
    print "begin " + blastReportPath + "\n"

    File.open(outputPath, "w") do |outputFile|
      # Header row of the tab-separated output
      outputFile << "hit.target_id" + "\t" + "report.query_def" + "\t" + "hit.evalue" + "\n"
      i = 0
      print i.to_s + "\n"
      Bio::FlatFile.foreach(blastReportPath) do |report|
        firstHit = true
        print i.to_s + "\n"
        print " " + report.query_def + "\n"
        report.each do |hit|
          # Always keep the first hit; keep later hits only if significant
          if (hit.evalue < 1.0e-20) || firstHit
            print " " + hit.target_id + "\n"
            outputFile << hit.target_id + "\t" + report.query_def + "\t" + hit.evalue.to_s + "\n"
          end
          firstHit = false
          i = i + 1
        end
      end
    end

However it hangs on one (very very long) protein sequence, FBgn0086906. When I kill it I get this:
^C/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:267:in `format0_parse_query': Interrupt
from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:168:in `query_def'
from /Volumes/Shiva/Users/yannickwurm/ruby/topHitsForQueryFromBlastReport.rb:54
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:520:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:655:in `each'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:519:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:481:in `_open_file'

If I change the following two lines:

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

to :

Query= FB|FBgn0086906 symbol:sls
(18141 letters)

Then it works again. So somewhere the "," in the query length is confusing the blast parser. Below you can
see the context in which the "Query definition" lines are found. I've attached the complete blast output file
and ruby script fwiw.
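
To illustrate the kind of tolerance that would be needed, here is a minimal sketch of reading the length line with or without a thousands separator. The helper name `parse_query_length` is hypothetical and this is not the code in format0.rb; it only shows one way the comma could be accepted.

```ruby
# Hypothetical helper, not BioRuby's actual parser code.
# Extracts the query length from a BLAST text-report line such as
# "(18,141 letters)", tolerating thousands separators.
def parse_query_length(line)
  match = line.match(/\(([\d,]+)\s+letters\)/)
  return nil unless match

  match[1].delete(',').to_i
end

puts parse_query_length('(18,141 letters)')  # length line with separator
puts parse_query_length('(18141 letters)')   # length line without separator
```

Stripping the commas before conversion keeps both report styles on the same code path.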

Kind regards,

yannick wurm

TBLASTN 2.2.18 [Mar-02-2008]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

Database: fourmidable012007
11,864 sequences; 9,098,808 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

SiJWB02BAW2.scf                                                   299   2e-80
SiJWE01BDQ.scf                                                    165   6e-40
SiJWA04CAU2.scf                                                   103   2e-21
SI.CL.6.cl.653.Contig1                                             91   2e-17

Please update UniProtKB parser

The UniProtKB data format changes with each release. Please read the recent changes of UniProtKB described at http://www.uniprot.org/docs/sp_news.htm and update Bio::UniProtKB.

Bio::TestPhyloXML_class_methods test failure

Running on OS X 10.7, with ruby2.0 (installed via Fink), tests fail with this error:

[2877/3867] Bio::TestPhyloXML_class_methods#test_new = 0.00 s                                                                                            
  1) Error:
test_new(Bio::TestPhyloXML_class_methods):
ArgumentError: invalid byte sequence in US-ASCII
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `=~'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `!~'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `initialize'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `new'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `test_new'
    /sw/lib/ruby/2.0/minitest/unit.rb:1301:in `run'
    /sw/lib/ruby/2.0/test/unit/testcase.rb:17:in `run'
    /sw/lib/ruby/2.0/minitest/unit.rb:919:in `block in _run_suite'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `map'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `_run_suite'
    /sw/lib/ruby/2.0/test/unit.rb:657:in `block in _run_suites'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `each'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `_run_suites'
    /sw/lib/ruby/2.0/minitest/unit.rb:867:in `_run_anything'
    /sw/lib/ruby/2.0/minitest/unit.rb:1060:in `run_tests'
    /sw/lib/ruby/2.0/minitest/unit.rb:1047:in `block in _run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `each'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `_run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1035:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:21:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:774:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:834:in `run'
    test/runner.rb:36:in `<main>'

This patch fixes it by making sure UTF-8 is used during the test (source: http://wiki.lifesciencedb.jp/mw/index.php/BioRuby ):

--- a/test/unit/bio/db/test_phyloxml.rb
+++ b/test/unit/bio/db/test_phyloxml.rb
@@ -100,6 +100,7 @@ end #end module TestPhyloXMLData
     end

     def test_new
+      Encoding.default_external="UTF-8" 
       str = File.read(TestPhyloXMLData.example_xml)
       assert_instance_of(Bio::PhyloXML::Parser,
                          phyloxml = Bio::PhyloXML::Parser.new(str))

installing soap4r-ruby1.9 generate: uninitialized constant XML::SaxParser during test/runner.rb

As for title:

in bioruby (github, master)

rake test
(in /usr/local/src/bioruby)

uninitialized constant XML::SaxParser (NameError)

I tested both the "official" gem and with more recent mumboe-soap4r.

If I uninstall the gem soap4r-ruby1.9, all tests pass (so it is a soap4r/XML SAX parser problem).

Blast Database XML Entropy Statistics Can Be so small that it is outside the Float range

Submitted by Ben Woodcroft via Rubyforge on 2008-08-29

Using blastxl3 with xml output (see attached), then parsing with bioruby gives this warning:

bio/appl/blast/rexml.rb:70: warning: Float 4.94066e-324 out of range

That number is from the Statistics_entropy value in the XML. That number is out of float range:

Float::MIN
=> 2.2250738585072e-308

The returned float value then becomes 0.0, which is close but wrong in a strict sense.

The same error is given for both edge bioruby:
http://github.com/bioruby/bioruby/commit/85a596da60d0ba0636fdb66e1dbbbd6b16a07a21
and my personal blastxml rexml new format fix branch:
http://github.com/wwood/bioruby/commit/ed03a8f42f64921589cf61884a7953a466e4e60c

Is there a ruby equivalent to the long double which appears to be used in the NCBI blast code?
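
Ruby's Float is a double and the core language has no long double type. One possible option, sketched here as an assumption about what might suit the parser, is to keep such values as BigDecimal from the standard library, which is not limited to the Float range. The literal is the value quoted in the warning above.

```ruby
require 'bigdecimal'

# Float::MIN is the smallest positive normalized double.
puts Float::MIN

# Keeping the XML text as an arbitrary-precision decimal avoids the
# Float range entirely.
entropy = BigDecimal('4.94066e-324')
puts entropy.to_s
puts entropy < BigDecimal('2.2250738585072e-308')
```

The trade-off is that callers expecting a Float would receive a different numeric class, so this would be a behaviour change rather than a drop-in fix.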

Rest NCBI

Submitted by Rodrigo Jardim via Rubyforge on 2010-10-14

There are some errors in the esearch method in restncbi.rb. The step value is too large: the NCBI REST service returns only 100
records at a time. The loop using 0.step is wrong too. I have already written new code; may I send it to you?
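
For illustration only, a minimal sketch of how batch offsets could be generated with 0.step. The helper `each_batch` is hypothetical and is not the code in restncbi.rb; the batch size of 100 is taken from the report, and `retstart`/`retmax` are the E-utilities paging parameter names.

```ruby
# Hypothetical helper: yields [retstart, retmax] pairs that cover
# `total` records in batches of at most `batch_size`.
def each_batch(total, batch_size = 100)
  return enum_for(:each_batch, total, batch_size) unless block_given?

  0.step(total - 1, batch_size) do |retstart|
    yield retstart, [batch_size, total - retstart].min
  end
end

each_batch(250) { |start, max| puts "retstart=#{start} retmax=#{max}" }
```

Each pair would be passed to one request, so no request asks for more records than the service returns at once.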

Thanks

Can not use .getc -> .each_entry for Bio::FlatFile.new() fails thus

require 'bio'

ff = Bio::FlatFile.new(Bio::FastaFormat, 'NC_005213.ffn')

ff.each_entry do |f|
  puts "definition : " + f.definition
  puts "nalen : " + f.nalen.to_s
  puts "naseq : " + f.naseq
end

The above code fails with:

NoMethodError: private method `getc' called for "NC_005213.ffn":String

The official tutorial tells you to use the above code, and as it fails,
the tutorial should be updated:

http://thebird.nl/bioruby/Tutorial.rd.html

This is the part in the Tutorial where it then fails:

"For example, in turn, reading FASTA format files:"

Bug in bio/db/newick.rb

If you try to call reparse() on a newick tree you will get:

NameError: `tree' is not allowed as an instance variable name
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `remove_instance_variable'
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `reparse'
    from (irb):5
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:123

The offending code is line 346 of newick.rb as the error states:

def reparse
    remove_instance_variable(:tree)
    self.tree
    self
end

You can clearly see the incorrect parameter being passed to remove_instance_variable(). The method should read:

def reparse
    remove_instance_variable(:@tree)
    self.tree
    self
end
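
The difference can be reproduced without BioRuby. The class below is a hypothetical stand-in for a parser that caches its result; it is only a sketch of the behaviour of remove_instance_variable, not the newick.rb code.

```ruby
# Hypothetical stand-in for a parser that caches its parsed tree.
class CachedParser
  attr_reader :parse_count

  def initialize
    @parse_count = 0
  end

  # Parses lazily and caches the result in @tree.
  def tree
    @tree ||= begin
      @parse_count += 1
      "tree-#{@parse_count}"
    end
  end

  # Drops the cache (if any) so the next call to #tree parses again.
  def reparse
    remove_instance_variable(:@tree) if instance_variable_defined?(:@tree)
    tree
    self
  end
end

parser = CachedParser.new
parser.tree
parser.reparse
puts parser.parse_count

begin
  # Without the "@" prefix the symbol is not a valid instance variable name.
  parser.instance_eval { remove_instance_variable(:tree) }
rescue NameError => e
  puts e.message
end
```

The instance_variable_defined? guard is an addition in this sketch: it lets reparse be called before the tree has ever been built.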

failures of test/functional/bio/io/test_ensembl.rb

Submitted by Naohisa Goto via Rubyforge on 2008-08-31

Three failures in test/functional/bio/io/test_ensembl.rb, during running test/runner.rb.

BioRuby version: git commit ID: e86f8d7

% ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]
% uname -a
Linux xxx 2.6.18-6-686 #1 SMP Fri Jun 6 22:22:11 UTC 2008 i686 GNU/Linux

Failure:
test_gff_exportview(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:95]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_gff_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:121]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_tab_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:180]:
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n4\tEns
embl\tGene\t1148366\t1151952\t.\t+\t1\tENSG00000206158\tENST00000382964\tENSE00001494097\tKNOWN_protein_coding\n">
expected but was
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n">.

KGML Parser

Hi. I was trying to parse a kgml file but I found out that the coords field (used in the big maps) is not available!

Regards,

João Cardoso

GenBank each_entry, last entry is always nil

Submitted by Raoul on 2008-02-13 at Rubyforge

Reading a generic GenBank FILE, the system returns one entry more than expected

data=Bio::FlatFile.auto("AJ561198.gb")

data.each_entry do |entry|
  puts entry.entry_id
end

You get

AJ561198
nil

I think the parser sees the "\n" at the end of the GenBank file (after
"//\n") and assumes there is another entry, which is wrong.
Deleting the last line makes it work.
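
A minimal sketch of the idea, assuming entries are delimited by a "//" line. The function `split_entries` is hypothetical and is not BioRuby's FlatFile implementation; it only shows whitespace-only trailing chunks being dropped.

```ruby
# Hypothetical splitter, not BioRuby's FlatFile implementation.
# Splits flat-file text on the "//" terminator line and drops chunks
# that contain only whitespace, such as a blank line after the last "//".
def split_entries(text)
  text.split(%r{^//[ \t]*\r?\n}).reject { |chunk| chunk.strip.empty? }
end

text = "LOCUS       AJ561198\nORIGIN\n//\n\n"
entries = split_entries(text)
puts entries.size
```

With this filter, a file ending in "//\n\n" and one ending in "//\n" yield the same number of entries.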

(test)

This is test. Please ignore.

BLAST formats

Submitted by Yannick Wurm via Rubyforge on 2008-05-21

NCBI's blastall output format changed once again.

Using reportsArray = Bio::FlatFile.foreach(blastReportPath) do |report|

I can parse blastall-2.2.18's output correctly only if -m 7 -V (xml format; use legacy engine) or if nothing (new engine,
"default text output") is specified. Using -m 7, only a single query/hit is found (and it may be incorrect).
(this is dangerous, since no error message is displayed).

It's due to the fact that "old" blastall output when blasting a multi-entry fasta file against a database
was equal to the sum of several single-entry outputs. (ie the BLAST headers were output once for each query sequence
in the input fasta file). "new" blastall output considers each query sequence as another "iteration"
of blast... the Blast headers are listed only once.

I've attached example output.

I am aware that bioruby is an open-source community project, but the frequency at which bugs like this are encountered
makes it very difficult to justify using bioruby in a production environment....

Kind regards,

Yannick Wurm - http://yannick.poulet.org

Bio::ClustalW error output

Would it be possible to include the stderr output from ClustalW as an instance variable for Bio::ClustalW?

Here's what I've come up with, but I don't feel confident in my knowledge of processes and IO.

module Bio
  class ClustalW

    # redefine errorlog with the newly generated stderr output
    def errorlog
      @data_stderr
    end

    #redefine exec_local using call_command_open3 so we can get stderr
    def exec_local(opt)
      @command = [ @program,  *opt ]
      #STDERR.print "DEBUG: ", @command.join(" "), "\n"
      @data_stdout = nil
      @exit_status = nil
      @data_stderr = nil
      Bio::Command.call_command_open3(@command) do |pin,pout,perr|
        @data_stdout = pout.read
        @data_stderr = perr.read
      end
      @exit_status = $?
    end
  end
end
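
As a point of comparison, Ruby's standard library offers Open3.capture3, which collects stdout, stderr and the exit status in one call. The sketch below uses the current Ruby interpreter as a stand-in command, since ClustalW may not be installed; it is not a patch for Bio::ClustalW.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in command: the current Ruby interpreter writing to both streams.
# A real call would pass the ClustalW program and its options instead.
command = [RbConfig.ruby, '-e', '$stdout.print "aligned"; $stderr.print "warning"']

stdout_text, stderr_text, status = Open3.capture3(*command)

puts stdout_text
puts stderr_text
puts status.success?
```

capture3 reads both streams concurrently, whereas reading stdout to the end before touching stderr can block if the child process fills the stderr pipe first.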

Thanks

Untitled

Hiya,
I have the feeling that http://www.bioruby.org/rdoc/classes/Bio/FlatFile.html is unable to deal with fasta quality files. (It may be a good idea to add support for reading those as well as fastq files that are commonplace when people are using ultra-high-throughput sequencing)
(perl has http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SeqIO/qual.pm )
cheers,
yannick

updating REBASE data

i'd like to update the included REBASE data. does anyone have an objection to this?

currently the source has this page stating the terms:
http://rebase.neb.com/rebase/rebcit.html

Those seeking to distribute REBASE files with their software packages are welcome to do so, providing it is clear to your users that they are not being charged for the REBASE data. It should be transparent that REBASE is a free and independent resource, with the following bibliographical reference:
Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2010)
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.
Nucl. Acids Res. 38: D234-D236. 

could i add that to the LICENSE file?

Bio::Sequence.guess issue

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT" )

=> Bio::Sequence::NA

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT\n" )

=> Bio::Sequence::AA

whitespace should not affect sequence determination?
and perhaps Bio::Sequence.guess(" ") should throw an error instead of returning AA?
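
To make the suggestion concrete, here is a hedged sketch of a guess that ignores whitespace and rejects empty input. The function `guess_sequence_type`, its 0.9 threshold and its nucleotide alphabet are assumptions for illustration and are not how Bio::Sequence.guess is implemented.

```ruby
# Hypothetical heuristic, not Bio::Sequence.guess itself: the 0.9 threshold
# and the nucleotide alphabet are assumptions chosen for illustration.
def guess_sequence_type(text, threshold = 0.9)
  residues = text.gsub(/\s+/, '')
  raise ArgumentError, 'no sequence characters given' if residues.empty?

  nucleotide_like = residues.count('ACGTUNacgtun')
  nucleotide_like.to_f / residues.length >= threshold ? :na : :aa
end

puts guess_sequence_type("ACGT")
puts guess_sequence_type("ACGT\n")
```

Removing whitespace before counting means a trailing newline no longer changes the ratio that drives the decision.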

cheers,
yannick

blastplus

NCBI has changed their outputformats yet again :(
It seems that they are unparseable again. Perhaps bioruby should only support xml?

all the best,
yannick

software:
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
paper:
http://dx.doi.org/10.1186/1471-2105-10-421

FWIW: here's an example output (default format) with blastplus 2.2.24
http://fourmidable.unil.ch/temp/cleanedESTs.Bx.SI223prot.zip

Tutorial code for PubMed search is not functioning

Submitted by Juergen Helmers via Rubyforge on 2009-08-31

The sample code in the tutorial for searching NCBI PubMed is not functional. The same is true for the sample script
pmsearch.rb:

- The sample script is missing the "require 'rubygems'" statement; without it the bioruby gem will not be found.
- Bio::PubMed.search does not work; it has to be replaced by Bio::PubMed.esearch, as recommended in the code itself.
- puts Bio::MEDLINE.new(entry).reference.send(form) returns empty objects. One first has to fetch the PubMed article,
as esearch only returns PubMed IDs and nothing more.

entries = Bio::PubMed.esearch(ARGV.join(' '))
entries.each do |id|
  case form
  when 'medline'
    puts entry = Bio::PubMed.efetch(id)
  else
    entry = Bio::PubMed.efetch(id)
    puts Bio::MEDLINE.new(entry).reference.send(form)
  end
end

Patch file is attached. It would be nice if the code could be updated, since new users might be struggling with the
incorrect code.

Keep up the good work!
Cheers Juergen

pubmed.rb

Submitted by Masahide Kikkawa on Rubyforge on 2007-06-21 09:47

Due to the changes of pubmed interface, a method Bio::PubMed.query(pubmed_id) does not work.

Change the following lines

def self.query(id)
host = "www.ncbi.nlm.nih.gov"
path = "/Entrez/query.fcgi?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

  path = "sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

Bio::MEDLINE#initialize handles multi-line MeSH terms incorrectly

Entrez-delivered MEDLINE records seem to be line-wrapped to 85 columns (for example, see PMID 20146148). This means that some exceptionally long and qualified MeSH headings (e.g., "Motorcycles/classification/legislation & jurisprudence/*statistics & numerical data") don't get parsed properly by MEDLINE#initialize: the parts that were wrapped onto a second line end up as separate MeSH headings when split up by MEDLINE#mh.
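
One way to handle this, sketched under the assumption that continuation lines begin with whitespace while field lines begin with a tag such as "MH  - ", is to join wrapped lines before splitting fields. The function `unwrap_medline` is hypothetical and is not part of Bio::MEDLINE.

```ruby
# Hypothetical pre-processing step, assuming continuation lines start with
# whitespace while field lines start with a tag such as "MH  - ".
def unwrap_medline(text)
  text.each_line.with_object([]) do |line, fields|
    line = line.chomp
    if line =~ /\A\s+\S/ && !fields.empty?
      # Continuation: append to the previous field with a single space.
      fields[-1] = "#{fields[-1]} #{line.strip}"
    else
      fields << line
    end
  end
end

record = <<~MEDLINE
  MH  - Motorcycles/classification/legislation & jurisprudence/*statistics &
        numerical data
  MH  - Humans
MEDLINE

puts unwrap_medline(record)
```

After unwrapping, each MeSH heading occupies exactly one element, so splitting by tag no longer produces a stray "numerical data" heading.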

a test post for github issue tracking system

fsfasd

Tree::Bio::Node

Submitted by Nobody via Rubyforge on 2009-06-29

The method 'name' in Tree::Bio::Node class replaces the underscore '_' by space.

E.g.

The newick tree

(A_B, X);

The name of the node becomes "A B", which is inconsistent with what is specified in the input and with the Bio::Sequence
class.

Bio::Sequence::NA returns Rational not Float

From the bioruby documentation:

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content   #=> 0.555555555555556

But doing the same with ruby 1.9.3 and bioruby 1.4.3 gives:

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content   #=> 5/9

also appears to affect other methods that should return float:

puts s.at_content #=> 4/9
puts s.gc_skew #=> 3/5
puts s.at_skew #=> 0/1
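
The arithmetic behind the two results can be shown without BioRuby. This stand-alone sketch computes the GC fraction of the same sequence once as a Rational and once as a Float; it does not call gc_content.

```ruby
# Stand-alone sketch of the arithmetic; it does not call BioRuby.
sequence = 'atggcgtga'
gc_count = sequence.count('gc')

as_rational = Rational(gc_count, sequence.length)
as_float    = gc_count.to_f / sequence.length

puts as_rational
puts as_float
puts as_rational.to_f
```

Converting one operand with to_f before dividing, or calling to_f on the Rational, is enough to obtain the documented Float form.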

Bio::Tree#subtree not behaving as expected

Hi,

Today I noticed this very useful method in bioruby. However, I think it perhaps is not working correctly, (maybe for trees that don't have distance?)

tree = Bio::Newick.new('(A,B,(C,(D,G)H)E)F; ').tree
tree.subtree(%w(A B D G).collect{|s| tree.get_node_by_name(s)}).newick

gives

 => "(\n)A;\n"

The underlying tree is

 => #<Bio::Tree:0x92c18f8 @pathway=#<Bio::Pathway:0x92c18d0 @undirected=true, @relations=[], 
@graph={(Node:"A")=>{}, (Node:"B")=>{}, (Node:"D")=>{}, (Node:"G")=>{}}, 
@index={}, @label={}>, @root=nil, @options={}, @cache_parent={}>

This is using both the bioruby 1.4.2 and the current github master. Have I spotted a bug?

Thanks in advance.
ben

Please reduce your travis-ci.org build matrix

Hi! We are happy to see BioRuby on travis-ci.org and have a little favor to ask. One of your forks generates 80 or even 100+ runs per build. This is a little bit unfair to the rest of the travis-ci.org users with Ruby projects, because it takes well over an hour to build 100+ rows for every single push.

I submitted a pull request to reduce the matrix but it was ignored so far. The fork maintainer seems to be a BioRuby org member. If you know how to get in touch with him (her?), please merge that pull request.
Lots of Ruby developers who use travis ci will be very thankful to you.

Thank you. On behalf of the travis-ci.org maintainers team,

MK.

fork() is called on platforms that do not support it

Because jruby is not recognised as being unable to support fork(), it (using master as of March 19 2010) produces the following error:

NotImplementedError: popen("-") is unimplemented

/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:245:in `call_command_fork'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:153:in `call_command'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:209:in `exec_local'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:177:in `query_by_filename'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:151:in `query_string'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:127:in `query_align'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:110:in `query'

When "|java" is added to the regular expression on line 150 in lib/bio/command.rb for non-fork supporting platforms, the error goes away. A similar error occurs with "Ruby Installer" for windows, which has an unrecognised platform of "i386-mingw32".
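
A sketch of the check described above. The function name and the platform list are assumptions for illustration and are not copied from lib/bio/command.rb; the final line shows the runtime's own answer, which avoids maintaining a list at all.

```ruby
# Hypothetical check in the spirit of the report; the platform list is an
# assumption and is not copied from lib/bio/command.rb.
def fork_unsupported?(platform = RUBY_PLATFORM)
  !!(platform =~ /mswin|mingw|bccwin|java/i)
end

puts fork_unsupported?('i386-mingw32')
puts fork_unsupported?('java')
puts fork_unsupported?('x86_64-linux')

# Asking the runtime directly avoids maintaining a platform list.
puts Process.respond_to?(:fork)
```

Process.respond_to?(:fork) returns false where fork is not usable, so it can serve as a fallback for platforms the pattern does not know about.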

Doi field in Bio::MEDLINE#reference

In Bio::MEDLINE#reference, doi should be filled.

pjotrp@6cedcf1

Bio::RestrictionEnzyme::Analysis performance

I'm using bioruby 1.4.0 and ran into a problem with performance of
Bio::RestrictionEnzyme::Analysis - cutting a 37kbp sequence with a
single enzyme takes more than 5 minutes.

I downloaded this GenBank file to disk:

http://www.i-dcc.org/targ_rep/alleles/5682/escell-clone-genbank-file

...and extracted the first sequence:

gb = Bio::GenBank.open( PATH_TO_GBK_FILE ).next_entry

...then asked for a restriction enzyme analysis for BstEII:

cuts = Bio::RestrictionEnzyme::Analysis.cut( gb.seq, "BstEII", { :view_ranges => true } )

It's that call to cut() that takes 5 minutes; running cut() under RubyProf tells us:

Thread ID: 70368668447160
Total: 384.810000

 %self     total     self     wait    child    calls  name
 54.69    210.44   210.44     0.00     0.00 546320457  Fixnum#== (ruby_runtime:0}
 45.06    383.83   173.41     0.00   210.42   148978  Array#include? (ruby_runtime:0}
  0.11    384.22     0.43     0.00   384.22       33  Array#each (ruby_runtime:0}
[SNIP]

So most of the time was spent in 546,320,457 calls to Fixnum#==. Am I
doing something silly, or is the restriction enzyme analysis algorithm
in need of some optimization?

Bio::RestrictionEnzyme::Analysis.cut_without_permutations() is almost
as slow, so it's not the permutations killing it. Is anyone else using
this module with more success?
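
The profile is consistent with repeated linear scans: Array#include? compares element by element, which would explain the very large number of Fixnum#== calls. As a generic illustration, and not a patch for the restriction enzyme code, the sketch below contrasts Array and Set membership tests on hypothetical integer positions.

```ruby
require 'set'

# Hypothetical data: positions are plain integers standing in for cut sites.
positions = (0...37_000).step(7).to_a
queries   = [0, 7, 8, 36_995, 36_999]

# Array#include? scans linearly, comparing element by element.
slow_hits = queries.select { |q| positions.include?(q) }

# Set#include? uses hashing, so each lookup takes roughly constant time.
position_set = positions.to_set
fast_hits = queries.select { |q| position_set.include?(q) }

puts slow_hits == fast_hits
```

Whether a Set fits depends on where include? is called in the module, which the profile excerpt does not show.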

flatfile.rb: file format auto-detection fail

On a PacBio-produced fastq file, auto-detection fails for the code shown below.

require 'bio'
ff = Bio::FlatFile.new(nil, ARGF)
while fe = ff.next_entry
  puts "#{fe.entry_id}\t#{fe.seq.length}"
end

This is because a fastq file may have more than one line of nucleotides. There is also currently no other
format that is identical up to the sequence lines but differs after the second id line.

So the regular expression in autodetection.rb

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/ ],

might be shortened to

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/ ],
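
The two patterns can be compared directly. In this sketch both regular expressions are copied from the report; the truncated input is a hypothetical case in which the text available for detection ends before the "+" separator line, which is an assumption and is not stated in the report.

```ruby
# Both patterns are copied from the report above.
original_rule  = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/
shortened_rule = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/

complete  = "@read1\nACGT\nTTGA\n+\nIIII\nIIII\n"
# Hypothetical case: the text handed to detection stops before the "+" line.
truncated = "@read1\nACGT\nTTGA\n"

[complete, truncated].each do |text|
  puts "original=#{original_rule.match?(text)} shortened=#{shortened_rule.match?(text)}"
end
```

The shortened pattern accepts both inputs, while the original needs the separator and a quality line to be present in the text it is given.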

GFF3 functionality is incomplete

Submitted by Jan Aerts via Rubyforge on 2008-07-04

See discussion on mailing list: http://lists.open-bio.org/pipermail/bioruby/2008-June/000653.html

Bio::FastaFormat.query() returns no hits if Bio::FastaFormat.entry() is not called beforehand

Hi,
I think I found a bug in Bio::FastaFormat:
query() calls factory.query(@entry) but @entry is only set upon calling entry() so @entry will be nil if entry() is not called before calling query().
As a result, query() will return no hits because the search is conducted with an empty sequence.

I have created a gist to illustrate the issue:
https://gist.github.com/985332
In my case, the resulting output is:

0
23
[Finished]
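
The pattern described above can be reproduced without BioRuby. In this minimal sketch the class LazyEntryExample and the lambda factory are hypothetical stand-ins, not the real Bio::FastaFormat or a real search factory. It only shows why reading the instance variable directly gives nil until the accessor has run, and that going through the accessor avoids the problem.

```ruby
# Hypothetical stand-in; NOT the real Bio::FastaFormat.
class LazyEntryExample
  def initialize(text)
    @text = text
    @entry = nil
  end

  # Builds @entry on first use and caches it.
  def entry
    @entry ||= @text.strip
  end

  # Buggy variant: reads @entry directly, which is nil until entry() has run.
  def query_buggy(factory)
    factory.call(@entry)
  end

  # Fixed variant: goes through the accessor, so @entry is always set.
  def query_fixed(factory)
    factory.call(entry)
  end
end

# Pretend "search" that just reports the length of the query it received.
factory = ->(seq) { seq.to_s.length }

buggy_first = LazyEntryExample.new(">id\nACGT\n").query_buggy(factory)
fixed_first = LazyEntryExample.new(">id\nACGT\n").query_fixed(factory)

puts buggy_first  # the search received nil, i.e. an empty query
puts fixed_first  # the search received the entry text
```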

Circular require warning

Under Ruby 1.9.2 and later, warnings about circular requires are given if $VERBOSE is set to true:

$ ruby -w -e 'require "bio"; Bio::Sequence::NA.new("atgcatgcaaaa")'
/Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15: warning: loading in progress, circular require considered harmful - /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb
from -e:1:in `<main>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb:15:in `<top (required)>'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:13:in `<top (required)>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15:in `<module:Bio>'
$ ruby --version
ruby 2.0.0dev (2012-05-05 trunk 35543) [x86_64-darwin10.8.0]

This also occurs in the current version of bioruby in the master branch.

support PDB format version 3.3

The current PDB format version is 3.3, but BioRuby's Bio::PDB currently supports only PDB format version 2.x, which is obsolete.

In PDB format version 3.3, some columns are expanded (e.g. serNum in SEQRES) and current Bio::PDB fails to parse large PDB entries.

tests using chi2 are randomly failing (rarely, but still)

Hi!

I've noticed that the tests test_randomize_with_hash_equiprobability and test_randomize_equiprobability from test/unit/bio/sequence/test_common.rb are sometimes failing. Running the tests about 460 times, I got 11 failures. I guess it is normal since they involve probabilistic sampling and statistical tests. However, it is a bit disorienting to have tests failing randomly, if the code seems ok.
On Debian, the test suite is run during the build of the package, and a test failure means that the package is not built. We will thus have to disable these tests. Could you provide a mechanism to easily exclude these randomness-based tests from the test suite, e.g. by moving them to a particular file, so that one can be sure the tests will pass?

Thanks a lot!

Formatting of sequence features broken

Submitted by Jan Aerts on Rubyforge site on 2008-02-14

When trying to format the features from a Bio::Sequence (using Bio::Sequence#format_features), the output is not what
it should be. Using the following parameters, part of the expected output for AJ224122 should look like this:

FT source 1..3827
FT /organism="Arabidopsis thaliana"
FT /chromosome="3"
FT /cultivar="Wassilewskija"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:3702"
FT mRNA join(1726..1863,2548..3052,3137..3827)
FT /gene="DAG1"
FT /product="DNA-binding protein"
FT /function="transcription factor"
FT /experiment="experimental evidence, no additional details
FT recorded"

However, the observed output is:
FT source 1..3827FT /organism="Arabidopsis thaliana"FT mRNA
join(1726..1863,2548..3052,3137..3827)FT /gene="DAG1"FT CDS
join(1840..1863,2548..3052,3137..3498)FT /gene="DAG1"FT exon 1726..1863FT
/gene="DAG1"FT intron 1864..2547FT /gene="DAG1"FT exon
2548..3052FT /gene="DAG1"FT intron 3053..3136FT
/gene="DAG1"FT exon 3137..3495FT /gene="DAG1"

Circular require warning for compat.rb

Doing

require "bio/sequence/compat"

will cause a "circular require considered harmful" warning when warnings are on for recent versions of Ruby.

The following will reproduce the warning for the git repo, if you're in the (git root)/lib directory:

$ ruby --disable-gems -w
$: << "."
require "bio/sequence/compat"
[snip]/sandbox/bioruby/lib/bio/sequence.rb:77: warning: loading in progress, circular require considered harmful - [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb
    from -:2:in `<main>'
    from -:2:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:10:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:13:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:62:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `<class:Sequence>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `require'
$ ruby --version
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]

If I'm not supposed to be requiring only part of bioruby, let me know.

I'm currently doing this so that I can keep track of what parts of bioruby I'm using in which parts of my program.

Bio::Pubmed returning no results for searches

PubMed is returning 301 Permanent Redirects for all requests from the Bio::PubMed library.

The path in Bio::PubMed is wrong, '/sites/entrez' should be just '/pubmed' now.

This is an easy fix, which I'll do now.

However it raises the question of whether we should be automatically following redirects.
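
As a starting point for that discussion, here is a minimal sketch of following redirects with a hop limit, using only Ruby's standard Net::HTTP. The method name fetch_following_redirects and the fetcher: keyword are hypothetical and not part of BioRuby. The fetcher is injectable only so the logic can be exercised without network access.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not part of BioRuby.
# Follows HTTP redirects up to `limit` hops and returns the final response.
# `fetcher` is any callable taking a URI and returning a Net::HTTPResponse.
def fetch_following_redirects(url, limit: 5, fetcher: nil)
  raise ArgumentError, "too many redirects for #{url}" if limit <= 0

  fetcher ||= ->(uri) { Net::HTTP.get_response(uri) }
  response = fetcher.call(URI.parse(url))

  if response.is_a?(Net::HTTPRedirection) && response['location']
    # Location may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_following_redirects(next_url, limit: limit - 1, fetcher: fetcher)
  else
    response
  end
end
```

A call such as fetch_following_redirects('http://www.ncbi.nlm.nih.gov/sites/entrez') would then return whatever the final location answers, or raise ArgumentError once the hop limit is used up. Silently following redirects can also hide the fact that a hard-coded path has gone stale, which is an argument for at least logging each hop.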

Provide class methods for common actions?

BioRuby.new.gc_content # => "70.000000"

BioRuby.gc_content
NoMethodError: undefined method `gc_content' for BioRuby:Class

Perhaps we could add class methods for BioRuby? For certain class-methods like .gc_content and so on?

Remote Blast Fails

Hi,
when I execute the sample code from http://bioruby.open-bio.org/rdoc/classes/Bio/Blast.html I get the following error:

/Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast/genomenet.rb:240:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast.rb:368:in `query'
    from blast_test.rb:12:in `<main>'

My script looks like this:

require 'rubygems'
require 'bio'

seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')

# To run an actual BLAST analysis:

#1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS', '-e 0.0001', 'genomenet')

#2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(seq)

remote BLAST not working

Using Ruby 1.9.2 and bio (HEAD):

seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')

blast = Bio::Blast.remote 'blastp', 'swissprot', '-e 0.0001', 'genomenet'

blast.query(seq)

produces:

/Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast/genomenet.rb:251:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast.rb:368:in `query'
    from ./phone_blast.rb:10:in `<main>'

Bug - Bio::Fetch.query

require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')

OpenURI::HTTPError: 404 Not Found
from /usr/lib/ruby/1.8/open-uri.rb:277:in `open_http'
from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/command.rb:625:in `read_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:183:in `_get'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:111:in `fetch'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:128:in `query'
from (irb):12

Hmm. Not sure where the error is. But it would be nice if OpenURI::HTTPError: 404 Not Found
errors could report the URL to the user, so that they can easily check it manually.

Right now I have no idea what is going on.
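
One way to get the URL into the message is to rescue the error at the point where the URL is known and re-raise it with the URL appended. This is only a sketch: with_url_in_error is a hypothetical helper name, and it does not reflect how Bio::Command.read_uri is actually written. It relies on OpenURI::HTTPError accepting a message and an io object, as in Ruby's standard open-uri library.

```ruby
require 'open-uri'

# Hypothetical wrapper, not part of BioRuby.
# Runs the given block with the URL and, if open-uri reports an HTTP error,
# re-raises it with the offending URL added to the message.
def with_url_in_error(url)
  yield url
rescue OpenURI::HTTPError => e
  raise OpenURI::HTTPError.new("#{e.message} (URL: #{url})", e.io)
end
```

A caller would wrap the actual request in the block, for example with_url_in_error(url) { |u| URI.parse(u).read }, and a 404 would then surface with the URL attached to the message.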

Bio::Reference lib/bio/reference.rb url hash code error

For pulls from pubmed without a valid URL, converting to endnote (and possibly other formats) will fail. The code at fault is line 145 of lib/bio/reference.rb

  @url      = hash['url']

should be

  @url      = hash['url'] || ''

Warren

Genbank Support

I'm working with @catfeet to write a Blast pipeline.

The tool used at Cardiff University is Nucleotide BLAST with the nr/nt database from Genbank.

It seems like the only options with bioruby are genomenet and ddbj. However, genomenet.rb references http://www.ncbi.nlm.nih.gov/blast/ in the notes.

Basically we want to be able to do:

blast = Bio::Blast.remote 'blastn', 'nr-nt', '-e 0.05 -m 8', 'genbank'

Does this mean I'll have to write a Bio::Blast::Remote::Genbank module to receive output from that tool?

test_cut_symbol fails because of uninitialized constant Bio::RestrictionEnzyme::CutSymbol

Hi,

When building Debian packages of bioruby, the test suite is run. The test test_cut_symbol.rb is failing because constant Bio::RestrictionEnzyme::CutSymbol is not initialized. I guess that there may be a problem with the way bio/util/restriction_enzyme/cut_symbol is required by this test. Requiring instead bio/util/restriction_enzyme would ensure that everything is well defined (cut_symbol is then automatically loaded).

Here is the patch applied in Debian to solve this issue:

--- a/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
+++ b/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
@@ -15,7 +15,8 @@

 # libraries needed for the tests
 require 'test/unit'
-require 'bio/util/restriction_enzyme/cut_symbol'
+require 'bio/util/restriction_enzyme'
+#require 'bio/util/restriction_enzyme/cut_symbol'

 module Bio; module TestRestrictionEnzyme #:nodoc:

Equivalent blast parsing approaches aren't

Submitted by Yannick Wurm via Rubyforge on 2009-08-09

Hello,

to parse a blast file, only the 3rd method I tried actually worked. For newcomers it can be quite disappointing

But trying to get the same results using the following approaches always led to crashes:
Bio::FlatFile.open(Bio::Blast::Default::Report,path) do |ff|
ff.each do |report|
...

or:
Bio::Blast.reports(path) do |report|
...

Partially, it looks like ruby is going into the wrong parser. Eg for the latter:
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:402: warning: useless use of :: in void context
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:265: warning: method redefined; discarding old server=
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format8.rb:70:in `tab_parse_hsp': undefined method `strip' for nil:NilClass (NoMethodError)

My blast output here is -m 0, but reports weren't being parsed properly with -m 7 or -m 8 either.
Is bioruby trying to support too many blast output formats? It could be helpful to document in the blast rdoc which
blast versions and output parameters bioruby was tested on.

(my blast output here was generated with -p tblastx -v 1 -b 1 -e 1.0e-4 -m 0 -V T in blast-2.2.15 (but also tried 2.2.10
and 2.2.18)).

ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin]
bioruby 1.3.0

test issue

This is a test if the contributors get an email.

BioRuby Wiki seems to be down

shevy: do you know why bioruby's wiki doesnt work? http://bioruby.open-bio.org/wiki/
no idea, we have to ask ngoto when he is back
or we could file a bug report on github :)
shall I file one?
yeah
I think it's been down few weeks maybe
ok

Does anyone know why the BioRuby Wiki is down and how to repair it?

The error we get is:

(Can't contact the database server: Unknown database 'biorubywikidb' (localhost))

Possibly the database entry has been removed or something like that?
