bioruby / bioruby

bioruby

Home Page: http://bioruby.open-bio.org

License: Other

Ruby 99.81% Shell 0.01% Gnuplot 0.07% Perl 0.01% Parrot 0.01% HTML 0.11% Batchfile 0.01%

bioruby's Issues

Bio::FlatFile.new() with a filename fails in .each_entry (private method .getc called on a String)

require 'bio'

ff = Bio::FlatFile.new(Bio::FastaFormat, 'NC_005213.ffn')

ff.each_entry do |f|
puts "definition : " + f.definition
puts "nalen : " + f.nalen.to_s
puts "naseq : " + f.naseq
end

The above code fails with:

NoMethodError: private method `getc' called for "NC_005213.ffn":String

The official tutorial tells you to use the above code, and as it fails,
the tutorial should be updated:

http://thebird.nl/bioruby/Tutorial.rd.html

This is the part in the Tutorial where it then fails:

"For example, in turn, reading FASTA format files:"

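The error comes from passing a filename String where `Bio::FlatFile.new(dbclass, stream)` expects an already-open, IO-like stream. For a filename, `Bio::FlatFile.open(dbclass, filename)` is the constructor intended for that case. The distinction can be sketched in plain Ruby with a hypothetical helper (`to_stream`, not part of BioRuby) that accepts either kind of argument:

```ruby
require 'stringio'

# Hypothetical helper, not part of BioRuby: return an IO-like object
# for either an already-open stream or a filename String.
def to_stream(source)
  # IO, File and StringIO all respond to #gets; a plain String does not.
  return source if source.respond_to?(:gets)

  File.open(source.to_s)
end

# Read FASTA definition lines from either kind of source.
def fasta_definitions(source)
  io = to_stream(source)
  io.each_line.select { |line| line.start_with?('>') }.map { |line| line[1..-1].chomp }
ensure
  # Close only the stream this method opened itself.
  io.close if io && !source.respond_to?(:gets)
end

p fasta_definitions(StringIO.new(">gene1 test\nACGT\n>gene2\nTT\n"))
```

In the tutorial code, the corresponding change would be to call `Bio::FlatFile.open(Bio::FastaFormat, 'NC_005213.ffn')` instead of `Bio::FlatFile.new`.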
Circular require warning

Under Ruby 1.9.2 and later, warnings about circular requires are given if $VERBOSE is set to true:

$ ruby -w -e 'require "bio"; Bio::Sequence::NA.new("atgcatgcaaaa")'
/Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15: warning: loading in progress, circular require considered harmful - /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb
from -e:1:in `<main>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence.rb:15:in `<top (required)>'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/rubies/ruby-head/lib/ruby/site_ruby/2.0.0/rubygems/custom_require.rb:36:in `require'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:13:in `<top (required)>'
from /Users/agrimm/.rvm/gems/ruby-head/gems/bio-1.4.2/lib/bio/sequence/compat.rb:15:in `<module:Bio>'
$ ruby --version
ruby 2.0.0dev (2012-05-05 trunk 35543) [x86_64-darwin10.8.0]

This also occurs in the current version of bioruby in the master branch.

Equivalent blast parsing approaches aren't

Submitted by Yannick Wurm via Rubyforge on 2009-08-09

Hello,

To parse a BLAST file, only the third method I tried actually worked. For newcomers this can be quite disappointing.

But trying to get the same results using the following approaches always led to crashes:
Bio::FlatFile.open(Bio::Blast::Default::Report,path) do |ff|
ff.each do |report|
...

or:
Bio::Blast.reports(path) do |report|
...

In part, it looks like Ruby is choosing the wrong parser. E.g., for the latter:
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:402: warning: useless use of :: in void context
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast.rb:265: warning: method redefined; discarding old server=
/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format8.rb:70:in `tab_parse_hsp': undefined method `strip' for nil:NilClass (NoMethodError)

My BLAST output here is -m 0, but reports weren't being parsed properly with -m 7 or -m 8 either.
Is bioruby trying to support too many BLAST output formats? It could be helpful to document in the blast rdoc which
BLAST versions and output parameters bioruby was tested with.

(my blast output here was generated with -p tblastx -v 1 -b 1 -e 1.0e-4 -m 0 -V T in blast-2.2.15 (but also tried 2.2.10
and 2.2.18)).

ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin]
bioruby 1.3.0

flatfile.rb: file format auto-detection fail

On a FASTQ file produced by PacBio, auto-detection fails for the code shown below.

require 'bio'
ff = Bio::FlatFile.new(nil, ARGF)
while fe = ff.next_entry
  puts "#{fe.entry_id}\t#{fe.seq.length}"
end

This is because a FASTQ record may have more than one line of nucleotides. Since there is currently no other format that
looks the same up to the sequence lines but differs after the second ID ("+") line,
the regular expression in autodetection.rb

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/ ],

might be shortened to

      fastq  = RuleRegexp[ 'Bio::Fastq',
        /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/ ],
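The difference between the two expressions can be checked directly. The sketch below assumes (this is not stated in the report) that the detector may see a FASTQ record only up to its sequence lines, for example when a long PacBio read is cut off; the original pattern then finds no "+" line and fails, while the shortened one still matches:

```ruby
# The current rule from autodetection.rb and the proposed shortened rule.
ORIGINAL_FASTQ  = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+\+.*(?:\r|\r?\n).+(?:\r|\r?\n)/
SHORTENED_FASTQ = /^\@.+(?:\r|\r?\n)(?:[^\@\+].*(?:\r|\r?\n))+/

# A complete two-line-sequence record, and one cut off before its "+" line.
complete  = "@read1\nACGTACGT\nACGT\n+\nIIIIIIII\nIIII\n"
truncated = "@read1\nACGTACGTACGTACGT\n"

{ 'complete' => complete, 'truncated' => truncated }.each do |name, text|
  puts format('%-9s original=%-5s shortened=%s', name,
              !!(ORIGINAL_FASTQ =~ text), !!(SHORTENED_FASTQ =~ text))
end
```

The trade-off is that the shorter rule inspects less of the record, so it relies on no other supported format starting with an "@" header followed by such lines.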

tests using chi2 are randomly failing (rarely, but still)

Hi!

I've noticed that the tests test_randomize_with_hash_equiprobability and test_randomize_equiprobability from test/unit/bio/sequence/test_common.rb are sometimes failing. Running the tests about 460 times, I got 11 failures. I guess it is normal since they involve probabilistic sampling and statistical tests. However, it is a bit disorienting to have tests failing randomly, if the code seems ok.
On Debian, the test suite is run during the build of the package, and a test failure means that the package is not built. We will thus have to disable these tests. Could you provide a mechanism to easily exclude these randomness-based tests from the test suite, e.g. by moving them to a separate file, so that one can be sure the remaining tests will pass?

Thanks a lot!
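One common way to keep such tests while removing the random failures is to drive them from a seeded random number generator, so every run draws the same sample. The sketch below is plain Ruby and does not assume anything about BioRuby's `randomize` API; `shuffled`, `chi_square` and `first_base_counts` are hypothetical helpers:

```ruby
# Hypothetical helper: shuffle a sequence using a caller-supplied RNG.
def shuffled(seq, rng)
  seq.chars.shuffle(random: rng).join
end

# Pearson's chi-square statistic for observed vs. expected counts.
def chi_square(observed, expected)
  observed.zip(expected).sum { |o, e| (o - e)**2 / e.to_f }
end

# Count which base lands in position 0 over many shuffles from one seeded RNG.
def first_base_counts(seq, trials, seed)
  rng = Random.new(seed)
  counts = Hash.new(0)
  trials.times { counts[shuffled(seq, rng)[0]] += 1 }
  counts
end

counts = first_base_counts('acgt', 4000, 12345)
stat = chi_square(%w[a c g t].map { |b| counts[b] }, [1000] * 4)
puts "chi-square = #{stat.round(3)}"
```

Because the seed is fixed, the statistic is identical on every run, so a threshold that passes once keeps passing; the cost is that the test no longer explores fresh random samples.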

Circular require warning for compat.rb

Doing

require "bio/sequence/compat"

will cause a "circular require considered harmful" warning when warnings are on for recent versions of Ruby.

The following will reproduce the warning for the git repo, if you're in the (git root)/lib directory:

$ ruby --disable-gems -w
$: << "."
require "bio/sequence/compat"
[snip]/sandbox/bioruby/lib/bio/sequence.rb:77: warning: loading in progress, circular require considered harmful - [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb
    from -:2:in `<main>'
    from -:2:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:10:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence/compat.rb:12:in `require'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:13:in `<top (required)>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:62:in `<module:Bio>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `<class:Sequence>'
    from [snip]/sandbox/bioruby/lib/bio/sequence.rb:77:in `require'
$ ruby --version
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]

If I'm not supposed to be requiring only part of bioruby, let me know.

I'm currently doing this so that I can keep track of what parts of bioruby I'm using in which parts of my program.

Bio::FastaFormat.query() returns no hits if Bio::FastaFormat.entry() is not called beforehand

Hi,
I think I found a bug in Bio::FastaFormat:
query() calls factory.query(@entry) but @entry is only set upon calling entry() so @entry will be nil if entry() is not called before calling query().
As a result, query() will return no hits because the search is conducted with an empty sequence.

I have created a gist to illustrate the issue:
https://gist.github.com/985332
In my case, the resulting output is:

0
23
[Finished]
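The fix the report points toward is for query() to go through the entry() accessor instead of reading @entry directly. A minimal sketch of that pattern in plain Ruby, using a hypothetical `MiniFasta` class and a stub factory rather than BioRuby's actual source:

```ruby
# Hypothetical stand-in for a FASTA entry class; not BioRuby's code.
class MiniFasta
  def initialize(str)
    definition, *lines = str.sub(/\A>/, '').lines.map(&:chomp)
    @definition = definition
    @data = lines.join
  end

  # Builds the FASTA text on first use, so it is never nil.
  def entry
    @entry ||= ">#{@definition}\n#{@data}\n"
  end

  # Uses #entry rather than @entry, so no earlier call to entry() is needed.
  def query(factory)
    factory.query(entry)
  end
end

# Stub factory that simply returns the text it was asked to search with.
echo_factory = Object.new
def echo_factory.query(text)
  text
end

puts MiniFasta.new(">seq1\nACGT\nTTGA\n").query(echo_factory)
```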

Blast::Remote - GenomeNet url path is outdated

Hi,

I just noticed that the GenomeNet path variable in exec_genomenet (appl/blast/genomenet.rb:161) is outdated. The new path should be /tools-bin/blast.

Cheers,

Joao

Please update UniProtKB parser

The UniProtKB data format changes with each release. Please read the recent changes to UniProtKB described at http://www.uniprot.org/docs/sp_news.htm and update Bio::UniProtKB.

BLAST formats

Submitted by Yannick Wurm via Rubyforge on 2008-05-21

NCBI's blastall output format changed once again.

Using reportsArray = Bio::FlatFile.foreach(blastReportPath) do |report|

I can parse blastall-2.2.18's output correctly only if -m 7 -V (xml format; use legacy engine) or if nothing (new engine,
"default text output") is specified. Using -m 7, only a single query/hit is found (and it may be incorrect).
(this is dangerous, since no error message is displayed).

This is because the "old" blastall output, when blasting a multi-entry FASTA file against a database,
was equal to the sum of several single-entry outputs (i.e. the BLAST headers were output once for each query sequence
in the input FASTA file). The "new" blastall output treats each query sequence as another "iteration"
of BLAST, so the BLAST headers are listed only once.

I've attached example output.

I am aware that bioruby is an open-source community project, but the frequency at which bugs like this are encountered
makes it very difficult to justify using bioruby in a production environment...

Kind regards,

Yannick Wurm - http://yannick.poulet.org

failures of test/functional/bio/io/test_ensembl.rb

Submitted by Naohisa Goto via Rubyforge on 2008-08-31

Three failures in test/functional/bio/io/test_ensembl.rb, during running test/runner.rb.

BioRuby version: git commit ID: e86f8d7

% ruby -v
ruby 1.8.5 (2006-08-25) [i486-linux]
% uname -a
Linux xxx 2.6.18-6-686 #1 SMP Fri Jun 6 22:22:11 UTC 2008 i686 GNU/Linux

Failure:
test_gff_exportview(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:95]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_gff_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:121]:
<"4\tEnsembl\tGene\t1148366\t1151952\t.\t+\t1\tgene_id=ENSG00000206158; transcript_id=ENST00000382964;
exon_id=ENSE00001494097; gene_type=KNOWN_protein_coding\n"> expected but was
<"">.
Failure:
test_tab_exportview_with_named_args(Bio::FuncTestEnsemblHuman) [./test/functional/bio/io/test_ensembl.rb:180]:
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n4\tEns
embl\tGene\t1148366\t1151952\t.\t+\t1\tENSG00000206158\tENST00000382964\tENSE00001494097\tKNOWN_protein_coding\n">
expected but was
<"seqname\tsource\tfeature\tstart\tend\tscore\tstrand\tframe\tgene_id\ttranscript_id\texon_id\tgene_type\n"

.

Formatting of sequence features broken

Submitted by Jan Aerts on Rubyforge site on 2008-02-14

When trying to format the features from a Bio::Sequence (using Bio::Sequence#format_features), the output is not what
it should be. Using the following parameters, part of the expected output for AJ224122 should look like this:

FT   source          1..3827
FT                   /organism="Arabidopsis thaliana"
FT                   /chromosome="3"
FT                   /cultivar="Wassilewskija"
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon:3702"
FT   mRNA            join(1726..1863,2548..3052,3137..3827)
FT                   /gene="DAG1"
FT                   /product="DNA-binding protein"
FT                   /function="transcription factor"
FT                   /experiment="experimental evidence, no additional details
FT                   recorded"

However, the observed output is:
FT source 1..3827FT /organism="Arabidopsis thaliana"FT mRNA
join(1726..1863,2548..3052,3137..3827)FT /gene="DAG1"FT CDS
join(1840..1863,2548..3052,3137..3498)FT /gene="DAG1"FT exon 1726..1863FT
/gene="DAG1"FT intron 1864..2547FT /gene="DAG1"FT exon
2548..3052FT /gene="DAG1"FT intron 3053..3136FT
/gene="DAG1"FT exon 3137..3495FT /gene="DAG1"
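The observed output looks as if the individual feature-table lines were joined without line breaks. For reference, a small sketch of the EMBL FT layout (feature key starting in column 6, location and qualifiers starting in column 22), one record per line; `ft_lines` is a hypothetical helper, not `Bio::Sequence#format_features`, and it does not wrap long qualifier values:

```ruby
# Hypothetical helper: render one feature as EMBL-style FT lines.
def ft_lines(key, location, qualifiers)
  # "FT" + 3 spaces, key padded to 16 columns, so the location starts in column 22.
  lines = [format('FT   %-16s%s', key, location)]
  qualifiers.each do |name, value|
    # "FT" + 19 spaces, so each qualifier also starts in column 22.
    lines << format('FT%19s/%s="%s"', '', name, value)
  end
  lines
end

features = [
  ['source', '1..3827', { 'organism' => 'Arabidopsis thaliana', 'chromosome' => '3' }],
  ['mRNA', 'join(1726..1863,2548..3052,3137..3827)', { 'gene' => 'DAG1' }]
]

# Join with newlines, which is exactly what the observed output is missing.
puts features.flat_map { |f| ft_lines(*f) }.join("\n")
```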

Bio::Reference lib/bio/reference.rb url hash code error

For records pulled from PubMed without a valid URL, converting to EndNote (and possibly other formats) will fail. The faulty code is line 145 of lib/bio/reference.rb:

  @url      = hash['url']

should be

  @url      = hash['url'] || ''

Warren
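A short illustration of why the `|| ''` default matters: building a string with `+` raises a TypeError when the value is nil. The `endnote_url_line` helper and its "%U " prefix are hypothetical, not the code in lib/bio/reference.rb:

```ruby
# Hypothetical reduction of the problem: one output line built with String#+.
def endnote_url_line(hash, default: true)
  url = default ? (hash['url'] || '') : hash['url']
  '%U ' + url
end

# No 'url' key, as for some PubMed records.
record = { 'title' => 'Example without a URL' }

puts endnote_url_line(record).inspect        # works with the '' default
begin
  endnote_url_line(record, default: false)   # the current behaviour
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```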

Rest NCBI

Submitted by Rodrigo Jardim via Rubyforge on 2010-10-14

There are some errors in the esearch method in restncbi.rb. The step value is too large: the NCBI REST interface only retrieves 100
records at a time. The loop using 0.step is also wrong. I have already written new code; may I send it to you?

Thanks
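The paging the report describes can be sketched as batching: if the service returns at most 100 records per request (the limit stated above), the loop needs one request per 100-record window. `batch_windows` is a hypothetical helper; `retstart` and `retmax` are the E-utilities paging parameter names:

```ruby
# Hypothetical helper: split `total` records into [retstart, retmax] windows
# of at most `per_request` records each.
def batch_windows(total, per_request = 100)
  return [] if total <= 0

  0.step(total - 1, per_request).map do |start|
    [start, [per_request, total - start].min]
  end
end

batch_windows(250).each do |retstart, retmax|
  puts "retstart=#{retstart} retmax=#{retmax}"
end
```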

Tutorial pages don't display correctly

The tutorial page says to look at the new tutorial at this url

https://raw.github.com/bioruby/bioruby/master/doc/Tutorial.rd.html

This displays as raw text not as html unfortunately :(

(test)

This is test. Please ignore.

a test post for github issue tracking system

fsfasd

Untitled

Hiya,
I have the feeling that http://www.bioruby.org/rdoc/classes/Bio/FlatFile.html is unable to deal with fasta quality files. (It may be a good idea to add support for reading those as well as fastq files that are commonplace when people are using ultra-high-throughput sequencing)
(perl has http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SeqIO/qual.pm )
cheers,
yannick
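To make the request concrete: a FASTA quality file uses ">" headers like FASTA, followed by whitespace-separated integer scores that may span several lines. A sketch of reading that layout in plain Ruby; `read_qual` is hypothetical and is not an existing BioRuby class:

```ruby
# Hypothetical reader for FASTA-style quality files: each record is a ">" header
# followed by whitespace-separated integer scores, possibly over several lines.
def read_qual(text)
  records = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if line.start_with?('>')
      # Use the first word of the header as the record ID.
      current = line[1..-1].split(/\s+/, 2).first
      records[current] = []
    elsif current
      records[current].concat(line.split.map { |v| Integer(v) })
    end
  end
  records
end

p read_qual(">r1 desc\n40 40 30\n20\n>r2\n10 12\n")
```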

Please reduce your travis-ci.org build matrix

Hi! We are happy to see BioRuby on travis-ci.org and have a small favor to ask. One of your forks generates 80 or even 100+ runs per build. This is a little bit unfair to the rest of the travis-ci.org users with Ruby projects, because it takes well over an hour to build 100+ rows for every single push.

I submitted a pull request to reduce the matrix but it was ignored so far. The fork maintainer seems to be a BioRuby org member. If you know how to get in touch with him (her?), please merge that pull request.
Lots of Ruby developers who use travis ci will be very thankful to you.

Thank you. On behalf of the travis-ci.org maintainers team,

MK.

Bug - Bio::Fetch.query

require 'bio'
entry = Bio::Fetch.query('hal', 'VNG1467G')

OpenURI::HTTPError: 404 Not Found
from /usr/lib/ruby/1.8/open-uri.rb:277:in `open_http'
from /usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
from /usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
from /usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
from /usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/command.rb:625:in `read_uri'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:183:in `_get'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:111:in `fetch'
from /usr/lib/ruby/site_ruby/1.8/bio/io/fetch.rb:128:in `query'
from (irb):12

Hmm. Not sure where the error is. But it would be nice if OpenURI::HTTPError: 404 Not Found
errors could feedback the URL to the user, so that he can easily check manually.

Right now I have no idea what is going on.
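A sketch of the requested behaviour: catch the open-uri error and re-raise it with the requested URL in the message. `with_url_in_errors` is a hypothetical wrapper, not part of Bio::Fetch:

```ruby
require 'open-uri'

# Hypothetical wrapper: run a fetch block and, if it fails with an HTTP error,
# re-raise the same error class with the requested URL appended to the message.
def with_url_in_errors(url)
  yield url
rescue OpenURI::HTTPError => e
  raise OpenURI::HTTPError.new("#{e.message} (URL: #{url})", e.io)
end

# Against a real server this would be used as:
#   with_url_in_errors(url) { |u| URI.open(u).read }
```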

updating REBASE data

i'd like to update the included REBASE data. does anyone have an objection to this?

currently the source has this page stating the terms:
http://rebase.neb.com/rebase/rebcit.html

Those seeking to distribute REBASE files with their software packages are welcome to do so, providing it is clear to your users that they are not being charged for the REBASE data. It should be transparent that REBASE is a free and independent resource, with the following bibliographical reference:
Roberts, R.J., Vincze, T., Posfai, J., Macelis, D. (2010)
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.
Nucl. Acids Res. 38: D234-D236. 

could i add that to the LICENSE file?

blastplus

NCBI has changed their outputformats yet again :(
It seems that they are unparseable again. Perhaps bioruby should only support xml?

all the best,
yannick

software:
http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
paper:
http://dx.doi.org/10.1186/1471-2105-10-421

FWIW: here's an example output (default format) with blastplus 2.2.24
http://fourmidable.unil.ch/temp/cleanedESTs.Bx.SI223prot.zip

fork() is called on platforms that do not support it

Because jruby is not recognised as being unable to support fork(), it (using master as of March 19 2010) produces the following error:

NotImplementedError: popen("-") is unimplemented

/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:245:in `call_command_fork'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/command.rb:153:in `call_command'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:209:in `exec_local'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:177:in `query_by_filename'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:151:in `query_string'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:127:in `query_align'
/Users/agrimm/ruby/jruby/jruby-1.4.0/lib/ruby/gems/1.8/gems/bio-1.4.0.5000/lib/bio/appl/clustalw.rb:110:in `query'

When "|java" is added to the regular expression on line 150 in lib/bio/command.rb for non-fork supporting platforms, the error goes away. A similar error occurs with "Ruby Installer" for windows, which has an unrecognised platform of "i386-mingw32".
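A sketch of the kind of platform check described above, with "java" added alongside the Windows platforms. The constant name and exact pattern are hypothetical, not the regular expression in lib/bio/command.rb:

```ruby
# Hypothetical pattern for platforms where fork() / popen("-") is unavailable:
# native Windows builds (mswin, bccwin, mingw) and JRuby ("java").
NO_FORK_PLATFORM = /mswin|bccwin|mingw|java/

def fork_available?(platform = RUBY_PLATFORM)
  NO_FORK_PLATFORM !~ platform
end

%w[x86_64-darwin10.8.0 i386-mingw32 java].each do |p|
  puts "#{p}: #{fork_available?(p) ? 'fork' : 'no fork'}"
end
```

A bare `win` alternative is deliberately avoided, since it would also match `darwin`.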

remote BLAST not working

Using Ruby 1.9.2 and bio (HEAD):

seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')

blast = Bio::Blast.remote 'blastp', 'swissprot', '-e 0.0001', 'genomenet'

blast.query(seq)

produces:

/Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast/genomenet.rb:251:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/audy/.rvm/gems/ruby-1.9.2-p290/bundler/gems/bioruby-c552aa3a6773/lib/bio/appl/blast.rb:368:in `query'
    from ./phone_blast.rb:10:in `<main>'

test issue

This is a test if the contributors get an email.

Doi field in Bio::MEDLINE#reference

In Bio::MEDLINE#reference, doi should be filled.

pjotrp@6cedcf1

Bio::RestrictionEnzyme::Analysis performance

I'm using bioruby 1.4.0 and ran into a problem with performance of
Bio::RestrictionEnzyme::Analysis - cutting a 37kbp sequence with a
single enzyme takes more than 5 minutes.

I downloaded this GenBank file to disk:

http://www.i-dcc.org/targ_rep/alleles/5682/escell-clone-genbank-file

...and extracted the first sequence:

gb = Bio::GenBank.open( PATH_TO_GBK_FILE ).next_entry

...then asked for a restriction enzyme analysis for BstEII:

cuts = Bio::RestrictionEnzyme::Analysis.cut( gb.seq, "BstEII", { :view_ranges => true } )

It's that call to cut() that takes 5 minutes; running cut() under RubyProf tells us:

Thread ID: 70368668447160
Total: 384.810000

 %self     total     self     wait    child    calls  name
 54.69    210.44   210.44     0.00     0.00 546320457  Fixnum#== (ruby_runtime:0}
 45.06    383.83   173.41     0.00   210.42   148978  Array#include? (ruby_runtime:0}
  0.11    384.22     0.43     0.00   384.22       33  Array#each (ruby_runtime:0}
[SNIP]

So most of the time was spent in 546,320,457 calls to Fixnum#==. Am I
doing something silly, or is the restriction enzyme analysis algorithm
in need of some optimization?

Bio::RestrictionEnzyme::Analysis.cut_without_permutations() is almost
as slow, so it's not the permutations killing it. Is anyone else using
this module with more success?
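The profile shows nearly all the time in Array#include? and the Fixnum#== calls it makes, which is the signature of repeated linear membership tests. A common remedy is a hash-based Set. The sketch below shows the general pattern on made-up position data and is not a patch to Bio::RestrictionEnzyme::Analysis:

```ruby
require 'set'

# Made-up data: candidate positions along a sequence and positions to exclude.
candidates = (0...10_000).step(7).to_a
excluded   = (0...10_000).step(11).to_a

# Linear scan per lookup: about candidates.size * excluded.size comparisons.
slow = candidates.reject { |pos| excluded.include?(pos) }

# Hash-based lookup: roughly constant time per membership test.
excluded_set = excluded.to_set
fast = candidates.reject { |pos| excluded_set.include?(pos) }

puts "same result: #{slow == fast}, kept #{fast.size} positions"
```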

GenBank each_entry, last entry is always nil

Submitted by Raoul on 2008-02-13 at Rubyforge

Reading a generic GenBank FILE, the system returns one entry more than expected

data=Bio::FlatFile.auto("AJ561198.gb")

data.each_entry do |entry|
puts entry.entry_id
end

You get

AJ561198
nil

I think the parser identifies the "\n" at the end of the GenBank file (after
"//\n") and thinks there is another entry, but that's wrong.
Deleting the last line works.
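The diagnosis above, that trailing whitespace after the final "//" is read as an extra record, can be illustrated by splitting on the terminator with and without discarding blank chunks. This is a plain-Ruby sketch on made-up text, not BioRuby's flat-file splitter:

```ruby
# Made-up two-record GenBank-like text with a trailing blank line after "//".
text = "LOCUS       SEQ1\n//\nLOCUS       SEQ2\n//\n\n"

# Naive split: the "\n" left after the last "//" becomes a third, empty entry.
naive = text.split(%r{^//\n})

# Ignore chunks that contain only whitespace.
entries = naive.reject { |chunk| chunk.strip.empty? }

puts "naive: #{naive.size}, filtered: #{entries.size}"
```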

Bio::Sequence.guess issue

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT" )

=> Bio::Sequence::NA

ruby-1.9.2-preview1 > Bio::Sequence.guess("ACGT\n" )

=> Bio::Sequence::AA

whitespace should not affect sequence determination?
and perhaps Bio::Sequence.guess(" ") should throw an error instead of returning AA?

cheers,
yannick
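The report suggests two changes: ignore whitespace before measuring composition, and reject input that has no residues at all. A hypothetical `guess_type` showing both; the 0.9 threshold and the ACGTU letter set are assumptions for illustration, not necessarily what Bio::Sequence.guess uses. (With a 0.9 threshold, counting the newline as a residue would give 4/5 = 0.8 for "ACGT\n", which would explain the AA result; that is an inference, not checked against the BioRuby source.)

```ruby
# Hypothetical classifier: :na if at least `threshold` of the non-whitespace
# characters are nucleotide letters, otherwise :aa.
def guess_type(str, threshold = 0.9)
  residues = str.gsub(/\s+/, '')
  raise ArgumentError, 'no residues to classify' if residues.empty?

  nucleotides = residues.count('ACGTUacgtu')
  nucleotides.fdiv(residues.length) >= threshold ? :na : :aa
end

p guess_type("ACGT")
p guess_type("ACGT\n")
p guess_type("MFRTKRSALV")
```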

Blast Database XML Entropy Statistics Can Be so small that it is outside the Float range

Submitted by Ben Woodcroft via Rubyforge on 2008-08-29

Using blastxl3 with xml output (see attached), then parsing with bioruby gives this warning:

bio/appl/blast/rexml.rb:70: warning: Float 4.94066e-324 out of range

That number is from the Statistics_entropy value in the XML. That number is out of float range:

Float::MIN
=> 2.2250738585072e-308

The returned float value then becomes 0.0, which is close but wrong in a strict sense.

The same error is given for both edge bioruby:
http://github.com/bioruby/bioruby/commit/85a596da60d0ba0636fdb66e1dbbbd6b16a07a21
and my personal blastxml rexml new format fix branch:
http://github.com/wwood/bioruby/commit/ed03a8f42f64921589cf61884a7953a466e4e60c

Is there a ruby equivalent to the long double which appears to be used in the NCBI blast code?
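Ruby's Float is an IEEE 754 double and there is no built-in long double. If the exact value matters, one option is to keep the decimal string, or convert it to an exact Rational instead of a Float. This is a sketch of that option, not what BioRuby's XML parser currently does:

```ruby
raw = '4.94066e-324' # Statistics_entropy value from the report

# Float::MIN is the smallest *normalized* double; smaller magnitudes are
# either subnormal (with reduced precision) or underflow to 0.0.
puts "Float::MIN = #{Float::MIN}"

# String#to_r keeps the decimal value exactly, with no range limit.
exact = raw.to_r
puts "exact is positive: #{exact > 0}"
puts "exact is below Float::MIN: #{exact < Float::MIN}"
```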

KGML Parser

Hi. I was trying to parse a kgml file but I found out that the coords field (used in the big maps) is not available!

Regards,

João Cardoso

Genbank Support

I'm working with @catfeet to write a Blast pipeline.

The tool used at Cardiff University is Nucleotide BLAST with the nr/nt database from Genbank.

It seems like the only options with bioruby are genomenet and ddbj. However, genomenet.rb references http://www.ncbi.nlm.nih.gov/blast/ in the notes.

Basically we want to be able to do:

blast = Bio::Blast.remote 'blastn', 'nr-nt', '-e 0.05 -m 8', 'genbank'

Does this mean I'll have to write a Bio::Blast::Remote::Genbank module to receive output from that tool?

BLAST parsing - bug with long sequences

Bug submitted by Yannick Wurm via Rubyforge on 2008-05-21

Hi,

Just ran into another blast parsing bug.
Using ncbi's blastall 2.2.18, ruby 1.8.6 (2007-03-13 patchlevel 0) [powerpc-darwin] and bio.rb,v 1.88 2007/12/29

The following code works on almost every default blast 2.2.18 output I throw at it:
require 'bio'

blastReportPath = ARGV[0]
outputPath = ARGV[1]
print "begin " + blastReportPath + "\n"

File.open(outputPath, "w") do |outputFile|
  # header line (previously split over two lines, so the last column was dropped)
  outputFile << "hit.target_id" + "\t" + "report.query_def" + "\t" + "hit.evalue" + "\n"
  i = 0
  print i.to_s + "\n"
  Bio::FlatFile.foreach(blastReportPath) do |report|
    firstHit = true
    print i.to_s + "\n"
    print " " + report.query_def + "\n"
    report.each do |hit|
      # always keep the first hit, plus any hit with a very small E-value
      if hit.evalue < 1.0e-20 || firstHit
        print " " + hit.target_id + "\n"
        outputFile << hit.target_id + "\t" + report.query_def + "\t" + hit.evalue.to_s + "\n"
      end
      firstHit = false
      i += 1
    end
  end
end

However it hangs on one (very very long) protein sequence, FBgn0086906. When I kill it I get this:
^C/sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:267:in `format0_parse_query': Interrupt
from /sw/lib/ruby/site_ruby/1.8/bio/appl/blast/format0.rb:168:in `query_def'
from /Volumes/Shiva/Users/yannickwurm/ruby/topHitsForQueryFromBlastReport.rb:54
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:520:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:655:in `each'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:519:in `foreach'
from /sw/lib/ruby/site_ruby/1.8/bio/io/flatfile.rb:481:in `_open_file'

If I change the following two lines:

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

to :

Query= FB|FBgn0086906 symbol:sls
(18141 letters)

Then it works again. So somewhere the "," in the query length is confusing the blast parser. Below you can
see the context in which the "Query definition" lines are found. I've attached the complete blast output file
and ruby script fwiw.

Kind regards,

yannick wurm

TBLASTN 2.2.18 [Mar-02-2008]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.

Query= FB|FBgn0086906 symbol:sls
(18,141 letters)

Database: fourmidable012007
11,864 sequences; 9,098,808 total letters

Searching..................................................done

                                                             Score    E

Sequences producing significant alignments: (bits) Value

SiJWB02BAW2.scf 299 2e-80
SiJWE01BDQ.scf 165 6e-40
SiJWA04CAU2.scf 103 2e-21
SI.CL.6.cl.653.Contig1 91 2e-17
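The hang appears tied to the thousands separator in "(18,141 letters)". A sketch of a length pattern that tolerates commas; `query_length` and its regular expression are hypothetical, not the code in format0.rb:

```ruby
# Hypothetical parser for the BLAST text "(N letters)" line; accepts digits
# with optional thousands separators, e.g. "(18,141 letters)".
def query_length(line)
  m = line.match(/\(\s*([\d,]+)\s+letters\)/)
  m && m[1].delete(',').to_i
end

p query_length('(18,141 letters)')
p query_length('(18141 letters)')
p query_length('Query= FB|FBgn0086906 symbol:sls')
```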

Tutorial code for PubMed search is not functioning

Submitted by Juergen Helmers via Rubyforge on 2009-08-31

The sample code in the tutorial for searching NCBI PubMed is not functional. The same is true for the sample script
pmsearch.rb:

the sample script is missing the "require 'rubygems'" statement, without which the bioruby gem will not be found;
Bio::PubMed.search does not work; it has to be replaced by Bio::PubMed.esearch, as recommended in the code itself;
puts Bio::MEDLINE.new(entry).reference.send(form) returns empty objects. One first has to fetch the PubMed article
object, as esearch only returns PubMed IDs and nothing more.

entries = Bio::PubMed.esearch(ARGV.join(' '))
entries.each do |id|
  case form
  when 'medline'
    puts entry = Bio::PubMed.efetch(id)
  else
    entry = Bio::PubMed.efetch(id)
    puts Bio::MEDLINE.new(entry).reference.send(form)
  end
end

Patch file is attached. It would be nice if the code could be updated, since new users might be struggling with the incorrect code.

Keep up the good work!
Cheers Juergen

Remote Blast Fails

Hi,
when I execute the sample code from http://bioruby.open-bio.org/rdoc/classes/Bio/Blast.html I get the following error:

/Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast/genomenet.rb:240:in `exec_genomenet': cannot understand response (RuntimeError)
    from /Users/philipp/.rvm/gems/ruby-1.9.2-p180/gems/bio-1.4.1/lib/bio/appl/blast.rb:368:in `query'
    from blast_test.rb:12:in `<main>'

My script looks like this:

require 'rubygems'
require 'bio'

seq = Bio::Sequence::AA.new('MFRTKRSALVRRLWRSRAPGGEDEEEGAGGGGGGGELRGE')

# To run an actual BLAST analysis:

#1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'SWISS', '-e 0.0001', 'genomenet')

#2. run the actual BLAST by querying the factory
report = remote_blast_factory.query(seq)

Bio::MEDLINE#initialize handles multi-line MeSH terms incorrectly

Entrez-delivered MEDLINE records seem to be line-wrapped to 85 columns (for example, see PMID 20146148). This means that some exceptionally long and qualified MeSH headings (e.g., "Motorcycles/classification/legislation & jurisprudence/*statistics & numerical data") aren't parsed properly by MEDLINE#initialize: the parts that were wrapped to a second line end up as separate MeSH headings when split up by MEDLINE#mh.
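In the MEDLINE display format, a long field continues on following lines that are indented with six spaces, so those lines belong to the preceding tag's value rather than starting a new heading. A sketch of that joining rule; `medline_fields` is hypothetical, not Bio::MEDLINE#initialize, and the sample record is made up:

```ruby
# Hypothetical parser: collect MEDLINE tag values, joining continuation lines
# (which start with six spaces) onto the previous value with a single space.
def medline_fields(text)
  fields = Hash.new { |hash, tag| hash[tag] = [] }
  current = nil
  text.each_line do |raw|
    line = raw.chomp
    if (m = line.match(/\A([A-Z]{2,4})\s*- (.*)\z/))
      current = m[1]
      fields[current] << m[2]
    elsif current && (m = line.match(/\A {6}(.*)\z/))
      fields[current][-1] = "#{fields[current][-1]} #{m[1]}"
    end
  end
  fields
end

record = [
  'TI  - Example title',
  'MH  - Accidents, Traffic',
  'MH  - Motorcycles/classification/legislation &',
  '      jurisprudence/*statistics & numerical data'
].join("\n")

p medline_fields(record)['MH']
```

The sketch assumes wrapping happens at a space, so rejoining with a single space restores the heading.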

Bio::ClustalW error output

Would it be possible to include the stderr output from ClustalW as an instance variable of Bio::ClustalW?

Here's what I've come up with, but I don't feel confident in my knowledge of processes and IO.

module Bio
  class ClustalW

    # redefine errorlog to return the captured stderr output
    def errorlog
      @data_stderr
    end

    # redefine exec_local using call_command_open3 so we can get stderr
    def exec_local(opt)
      @command = [ @program,  *opt ]
      #STDERR.print "DEBUG: ", @command.join(" "), "\n"
      @data_stdout = nil
      @exit_status = nil
      @data_stderr = nil
      Bio::Command.call_command_open3(@command) do |pin,pout,perr|
        # read stderr in a separate thread so that a full stderr pipe
        # cannot block the child process while stdout is being read
        err_reader = Thread.new { perr.read }
        @data_stdout = pout.read
        @data_stderr = err_reader.value
      end
      @exit_status = $?
    end
  end
end

Thanks
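For comparison, Ruby's standard library can collect both streams in one call. A sketch using Open3.capture3; `run_capturing_stderr` is a hypothetical name, a small Ruby child process stands in for clustalw, and this is not how Bio::Command is implemented:

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command and keep stdout, stderr and the exit
# status separately. Open3.capture3 reads both pipes concurrently, so a
# program that writes a lot to stderr cannot block on a full pipe.
def run_capturing_stderr(*command)
  stdout, stderr, status = Open3.capture3(*command)
  { stdout: stdout, stderr: stderr, status: status }
end

result = run_capturing_stderr(RbConfig.ruby, '-e', 'puts "alignment"; warn "a warning"')
puts result[:stdout]
puts "stderr: #{result[:stderr]}"
puts "success: #{result[:status].success?}"
```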

pubmed.rb

Submitted by Masahide Kikkawa on Rubyforge on 2007-06-21 09:47

Due to changes in the PubMed interface, the method Bio::PubMed.query(pubmed_id) does not work.

Change the following lines

def self.query(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/Entrez/query.fcgi?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

to

  path = "/sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

Provide class methods for common actions?

BioRuby.new.gc_content # => "70.000000"

BioRuby.gc_content
NoMethodError: undefined method `gc_content' for BioRuby:Class

Perhaps we could add class methods to BioRuby for common actions such as .gc_content?

Installing soap4r-ruby1.9 causes "uninitialized constant XML::SaxParser" during test/runner.rb

As for title:

in bioruby (github, master)

rake test
(in /usr/local/src/bioruby)

uninitialized constant XML::SaxParser (NameError)

I tested both the "official" gem and with more recent mumboe-soap4r.

If I uninstall the soap4r-ruby1.9 gem, all tests pass (so it is a soap4r/XML SAX parser problem).

Bio::TestPhyloXML_class_methods test failure

Running on OS X 10.7, with ruby2.0 (installed via Fink), tests fail with this error:

[2877/3867] Bio::TestPhyloXML_class_methods#test_new = 0.00 s                                                                                            
  1) Error:
test_new(Bio::TestPhyloXML_class_methods):
ArgumentError: invalid byte sequence in US-ASCII
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `=~'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `!~'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/lib/bio/db/phyloxml/phyloxml_parser.rb:326:in `initialize'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `new'
    /sw/build.build/bioruby-rb20-1.4.3.0001-1/bioruby-1.4.3.0001/test/unit/bio/db/test_phyloxml.rb:105:in `test_new'
    /sw/lib/ruby/2.0/minitest/unit.rb:1301:in `run'
    /sw/lib/ruby/2.0/test/unit/testcase.rb:17:in `run'
    /sw/lib/ruby/2.0/minitest/unit.rb:919:in `block in _run_suite'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `map'
    /sw/lib/ruby/2.0/minitest/unit.rb:912:in `_run_suite'
    /sw/lib/ruby/2.0/test/unit.rb:657:in `block in _run_suites'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `each'
    /sw/lib/ruby/2.0/test/unit.rb:655:in `_run_suites'
    /sw/lib/ruby/2.0/minitest/unit.rb:867:in `_run_anything'
    /sw/lib/ruby/2.0/minitest/unit.rb:1060:in `run_tests'
    /sw/lib/ruby/2.0/minitest/unit.rb:1047:in `block in _run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `each'
    /sw/lib/ruby/2.0/minitest/unit.rb:1046:in `_run'
    /sw/lib/ruby/2.0/minitest/unit.rb:1035:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:21:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:774:in `run'
    /sw/lib/ruby/2.0/test/unit.rb:834:in `run'
    test/runner.rb:36:in `<main>'

This patch fixes it by making sure UTF-8 is used during the test (source: http://wiki.lifesciencedb.jp/mw/index.php/BioRuby ):

--- a/test/unit/bio/db/test_phyloxml.rb
+++ b/test/unit/bio/db/test_phyloxml.rb
@@ -100,6 +100,7 @@ end #end module TestPhyloXMLData
     end

     def test_new
+      Encoding.default_external="UTF-8" 
       str = File.read(TestPhyloXMLData.example_xml)
       assert_instance_of(Bio::PhyloXML::Parser,
                          phyloxml = Bio::PhyloXML::Parser.new(str))

Bio::Tree#subtree not behaving as expected

Hi,

Today I noticed this very useful method in bioruby. However, I think it perhaps is not working correctly, (maybe for trees that don't have distance?)

tree = Bio::Newick.new('(A,B,(C,(D,G)H)E)F; ').tree
tree.subtree(%w(A B D G).collect{|s| tree.get_node_by_name(s)}).newick

gives

 => "(\n)A;\n"

The underlying tree is

 => #<Bio::Tree:0x92c18f8 @pathway=#<Bio::Pathway:0x92c18d0 @undirected=true, @relations=[], 
@graph={(Node:"A")=>{}, (Node:"B")=>{}, (Node:"D")=>{}, (Node:"G")=>{}}, 
@index={}, @label={}>, @root=nil, @options={}, @cache_parent={}>

This is using both the bioruby 1.4.2 and the current github master. Have I spotted a bug?

Thanks in advance.
ben

Bio::Tree::Node

Submitted by Nobody via Rubyforge on 2009-06-29

The method 'name' in the Bio::Tree::Node class replaces the underscore '_' with a space.

E.g.

The newick tree

(A_B, X);

The name of the node becomes "A B", which is inconsistent with what is specified in the input and with the Bio::Sequence
class.
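For context, the Newick convention as documented with PHYLIP treats an underscore in an unquoted label as a blank, while a label in single quotes is taken literally, so 'A_B' keeps its underscore. A sketch of that rule; `newick_label` is hypothetical and not BioRuby's parser:

```ruby
# Hypothetical label decoder following the Newick convention:
# - quoted labels: strip the quotes and turn '' into a single quote
# - unquoted labels: underscores stand for blanks
def newick_label(token)
  if token.length >= 2 && token.start_with?("'") && token.end_with?("'")
    token[1...-1].gsub("''", "'")
  else
    token.tr('_', ' ')
  end
end

p newick_label('A_B')
p newick_label("'A_B'")
p newick_label("'it''s'")
```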

GFF3 functionality is incomplete

Submitted by Jan Aerts via Rubyforge on 2008-07-04

See discussion on mailing list: http://lists.open-bio.org/pipermail/bioruby/2008-June/000653.html

test_cut_symbol fails because of uninitialized constant Bio::RestrictionEnzyme::CutSymbol

Hi,

When building Debian packages of bioruby, the test suite is run. The test test_cut_symbol.rb is failing because constant Bio::RestrictionEnzyme::CutSymbol is not initialized. I guess that there may be a problem with the way bio/util/restriction_enzyme/cut_symbol is required by this test. Requiring instead bio/util/restriction_enzyme would ensure that everything is well defined (cut_symbol is then automatically loaded).

Here is the patch applied in Debian to solve this issue:

--- a/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
+++ b/test/unit/bio/util/restriction_enzyme/test_cut_symbol.rb
@@ -15,7 +15,8 @@

 # libraries needed for the tests
 require 'test/unit'
-require 'bio/util/restriction_enzyme/cut_symbol'
+require 'bio/util/restriction_enzyme'
+#require 'bio/util/restriction_enzyme/cut_symbol'

 module Bio; module TestRestrictionEnzyme #:nodoc:

Bio::Sequence::NA returns Rational not Float

From the bioruby documentation:

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content   #=> 0.555555555555556

But when I do the same using ruby 1.9.3 and bioruby 1.4.3:

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content   #=> 5/9

also appears to affect other methods that should return float:

puts s.at_content #=> 4/9
puts s.gc_skew #=> 3/5
puts s.at_skew #=> 0/1
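Callers can convert a Rational result with to_f, or the computation can use Integer#fdiv, which always returns a Float. A sketch with hypothetical `composition`, `gc_content` and `gc_skew` helpers (only a, c, g, t are counted here), not BioRuby's implementation:

```ruby
# Hypothetical composition helpers using Integer#fdiv, which returns a Float.
def composition(seq)
  s = seq.downcase
  { a: s.count('a'), t: s.count('t'), g: s.count('g'), c: s.count('c') }
end

def gc_content(seq)
  n = composition(seq)
  (n[:g] + n[:c]).fdiv(n.values.sum)
end

def gc_skew(seq)
  n = composition(seq)
  (n[:g] - n[:c]).fdiv(n[:g] + n[:c])
end

s = 'atggcgtga'
puts gc_content(s)
puts gc_skew(s)
puts Rational(5, 9).to_f # converting an existing Rational result
```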

BioRuby Wiki seems to be down

shevy: do you know why bioruby's wiki doesnt work? http://bioruby.open-bio.org/wiki/
no idea, we have to ask ngoto when he is back
or we could file a bug report on github :)
shall I file one?
yeah
I think it's been down few weeks maybe
ok

Anyone knows why BioRuby Wiki is down and how to repair it?

The error we get is:

(Can't contact the database server: Unknown database 'biorubywikidb' (localhost))

Possibly the database entry has been removed or something like that?

Bug in bio/db/newick.rb

If you try to call reparse() on a newick tree you will get:

NameError: `tree' is not allowed as an instance variable name
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `remove_instance_variable'
    from /usr/local/lib/ruby/gems/1.8/gems/bio-1.4.1/lib/bio/db/newick.rb:346:in `reparse'
    from (irb):5
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:123

The offending code is line 346 of newick.rb as the error states:

def reparse
    remove_instance_variable(:tree)
    self.tree
    self
end

You can clearly see the incorrect parameter being passed to remove_instance_variable(). The method should read:

def reparse
    remove_instance_variable(:@tree)
    self.tree
    self
end
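The difference between :tree and :@tree can be seen in a small stand-in class that parses lazily. `LazyTree` is hypothetical, not Bio::Newick; it also guards the removal so reparse works even before the tree has been built:

```ruby
# Minimal stand-in for a class that builds its tree lazily (not BioRuby's code).
class LazyTree
  attr_reader :parse_count

  def initialize
    @parse_count = 0
  end

  def tree
    @tree ||= build_tree
  end

  def reparse
    # The symbol must include the "@"; :tree raises NameError.
    remove_instance_variable(:@tree) if instance_variable_defined?(:@tree)
    tree
    self
  end

  private

  def build_tree
    @parse_count += 1
    "tree built #{@parse_count} time(s)"
  end
end

t = LazyTree.new
t.tree
t.reparse
puts t.tree
```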

Bio::Pubmed returning no results for searches

PubMed is returning 301 Permanent Redirects for all requests from the Bio::PubMed library.

The path in Bio::PubMed is wrong, '/sites/entrez' should be just '/pubmed' now.

This is an easy fix, which I'll do now.

However it raises the question of whether we should be automatically following redirects.
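If redirects were followed automatically, one conventional approach with Net::HTTP is a loop with a request limit. A sketch with a hypothetical `get_following_redirects` and an injectable fetcher, so it can be exercised without network access; the fake server below is made up:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET `uri`, following 3xx responses; `limit` caps the
# total number of requests. `fetch` can be replaced (e.g. in tests).
def get_following_redirects(uri, limit = 5, &fetch)
  fetch ||= ->(u) { Net::HTTP.get_response(u) }
  raise ArgumentError, 'too many redirects' if limit <= 0

  response = fetch.call(uri)
  if response.is_a?(Net::HTTPRedirection)
    # Location may be relative (e.g. "/pubmed"), so resolve it against uri.
    target = URI.join(uri.to_s, response['location'])
    get_following_redirects(target, limit - 1, &fetch)
  else
    response
  end
end

# Offline demonstration: a fake server that moved /sites/entrez to /pubmed.
requested = []
fake_server = lambda do |u|
  requested << u.path
  if u.path == '/sites/entrez'
    moved = Net::HTTPMovedPermanently.new('1.1', '301', 'Moved Permanently')
    moved['location'] = '/pubmed'
    moved
  else
    Net::HTTPOK.new('1.1', '200', 'OK')
  end
end

response = get_following_redirects(URI('https://www.ncbi.nlm.nih.gov/sites/entrez'), &fake_server)
puts "#{response.code} after requesting #{requested.join(' -> ')}"
```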

support PDB format version 3.3

Current PDB format version is 3.3 but current BioRuby's Bio::PDB only supports PDB format version 2.x which is obsolete.

In PDB format version 3.3, some columns are expanded (e.g. serNum in SEQRES) and current Bio::PDB fails to parse large PDB entries.
