biopython / biopython

Official git repository for Biopython (originally converted from CVS)

License: Other

Python 93.81% C 2.90% HTML 0.34% Prolog 0.01% Nu 0.04% Gnuplot 0.02% Parrot 2.04% Roff 0.59% QMake 0.01% Visual Basic 6.0 0.21% AngelScript 0.03% CAP CDS 0.01%

python bioinformatics genomics biopython protein protein-structure sequence-alignment phylogenetics dna

biopython's Introduction

Biopython README file

The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.

This README file is intended primarily for people interested in working with the Biopython source code, either from one of the releases on the http://biopython.org website, or from our repository on GitHub: https://github.com/biopython/biopython

Our user-centric documentation, The Biopython Tutorial and Cookbook, and API documentation, is generated from our repository using Sphinx.

The NEWS file summarises the changes in each release of Biopython, alongside the DEPRECATED file which notes API breakages.

The Biopython package is open source software made available under generous terms. Please see the LICENSE file for further details.

If you use Biopython in work contributing to a scientific publication, we ask that you cite our application note (below) or one of the module specific publications (listed on our website):

Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009 Jun 1; 25(11) 1422-3 https://doi.org/10.1093/bioinformatics/btp163 pmid:19304878

For the impatient

Python includes the package management system "pip" which should allow you to install Biopython (and its dependency NumPy if needed), upgrade or uninstall with just one terminal command:

pip install biopython
pip install --upgrade biopython
pip uninstall biopython

Since Biopython 1.70 we have provided pre-compiled binary wheel packages on PyPI for Linux, macOS and Windows. This means pip install should be quick, and not require a compiler.
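
To confirm which version pip installed, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of Biopython.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "biopython" is the distribution name used with pip above.
print(installed_version("biopython"))
```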

As a developer or potential contributor, you may wish to download, build and install Biopython yourself. This is described below.

Python Requirements

We currently recommend using Python 3.11 from http://www.python.org

Biopython is currently supported and tested on the following Python implementations:

Python 3.8, 3.9, 3.10, 3.11 and 3.12 -- see http://www.python.org
PyPy3.8 v7.3.11 -- or later, see http://www.pypy.org
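
A script that depends on Biopython could check the interpreter version up front. The sketch below assumes that Python 3.8, the oldest version listed above, is the minimum; `MINIMUM_SUPPORTED` and `is_supported` are hypothetical names chosen for illustration.

```python
import sys

# Oldest CPython release listed as supported in this README.
MINIMUM_SUPPORTED = (3, 8)


def is_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least the minimum listed version."""
    return tuple(version_info[:2]) >= MINIMUM_SUPPORTED


if not is_supported():
    print("Warning: this Python is older than", MINIMUM_SUPPORTED)
```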

Optional Dependencies

Biopython requires NumPy (see http://www.numpy.org) which will be installed automatically if you install Biopython with pip (see below for compiling Biopython yourself).

Depending on which parts of Biopython you plan to use, there are a number of other optional Python dependencies, which can be installed later if needed:

ReportLab, see http://www.reportlab.com/opensource/ (optional) This package is only used in Bio.Graphics, so if you do not need this functionality, you will not need to install this package.
matplotlib, see http://matplotlib.org/ (optional) Bio.Phylo uses this package to plot phylogenetic trees.
networkx, see https://networkx.github.io/ (optional) and pygraphviz or pydot, see https://pygraphviz.github.io/ and http://code.google.com/p/pydot/ (optional) These packages are used for certain niche functions in Bio.Phylo.
rdflib, see https://github.com/RDFLib/rdflib (optional) This package is used in the CDAO parser under Bio.Phylo.
psycopg2, see http://initd.org/psycopg/ (optional) or PyGreSQL (pgdb), see http://www.pygresql.org/ (optional) These packages are used by BioSQL to access a PostgreSQL database.
MySQL Connector/Python, see http://dev.mysql.com/downloads/connector/python/ This package is used by BioSQL to access a MySQL database, and is supported on PyPy too.
mysqlclient, see https://github.com/PyMySQL/mysqlclient-python (optional) This is a fork of the older MySQLdb and is used by BioSQL to access a MySQL database. It is supported by PyPy.
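
To see which of the optional packages above are already available, something like the following could be used. It is a rough sketch: the import names in `OPTIONAL_MODULES` are assumptions based on each project's usual module name, and the helper functions are not part of Biopython.

```python
import importlib.util

# Import names are assumptions based on each project's usual module name.
OPTIONAL_MODULES = {
    "ReportLab": "reportlab",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "rdflib": "rdflib",
    "psycopg2": "psycopg2",
    "mysqlclient": "MySQLdb",
}


def is_importable(module_name):
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def report_optional_dependencies():
    """Map each optional package name to whether its module is importable."""
    return {name: is_importable(module) for name, module in OPTIONAL_MODULES.items()}


for name, available in report_optional_dependencies().items():
    print(f"{name}: {'installed' if available else 'missing'}")
```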

In addition there are a number of useful third party tools you may wish to install such as standalone NCBI BLAST, EMBOSS or ClustalW.

Installation From Source

We recommend using the pre-compiled binary wheels available on PyPI using:

pip install biopython

However, if you need to compile Biopython yourself, the following are required at compile time:

Python including development header files like python.h, which on Linux are often not installed by default (try looking for and installing a package named python-dev or python-devel as well as the python package).
Appropriate C compiler for your version of Python, for example GCC on Linux, MSVC on Windows. For Mac OS X, or as it is now branded, macOS, use Apple's command line tools, which can be installed with the terminal command:
```
xcode-select --install
```
This will offer to install Apple's Xcode development suite - you can, but it is not needed and takes a lot of disk space.

Then either download and decompress our source code, or fetch it using git. Now change directory to the Biopython source code folder and run:

pip install -e .
python setup.py test
sudo python setup.py install

Substitute python with your specific version if required, for example python3, or pypy3.

To exclude tests that require an internet connection (and which may take a long time), use the --offline option:

python setup.py test --offline

If you need to do additional configuration, e.g. changing the install directory prefix, please type python setup.py.

Testing

Biopython includes a suite of regression tests to check if everything is running correctly. To run the tests, go to the biopython source code directory and type:

pip install -e .
python setup.py test

If you want to skip the online tests (which is recommended when doing repeated testing), use:

python setup.py test --offline

Do not panic if you see messages warning of skipped tests:

test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.

This most likely means that a package is not installed. You can ignore this if it occurs in the tests for a module that you were not planning on using. If you did want to use that module, please install the required dependency and re-run the tests.

Some of the tests may fail due to network issues; this is often down to chance or a service outage. If the problem does not go away on re-running the tests, you can use the --offline option.

There is more testing information in the Biopython Tutorial & Cookbook.

Experimental code

Biopython 1.61 introduced a new warning, Bio.BiopythonExperimentalWarning, which is used to mark any experimental code included in the otherwise stable Biopython releases. Such 'beta' level code is ready for wider testing, but still likely to change, and should only be tried by early adopters in order to give feedback via the biopython-dev mailing list.

We'd expect such experimental code to reach stable status within one or two releases, at which point our normal policies about trying to preserve backwards compatibility would apply.
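
Early adopters who have already read the warning may want to silence it around specific calls. The following is a minimal sketch using the standard `warnings` module; `call_quietly` is a hypothetical helper, and the fallback class exists only so the example runs where Biopython is not installed.

```python
import warnings

try:
    from Bio import BiopythonExperimentalWarning
except ImportError:
    # Stand-in so the sketch runs without Biopython installed (assumption:
    # the real class is a Warning subclass, as its name suggests).
    class BiopythonExperimentalWarning(Warning):
        pass


def call_quietly(func, *args, **kwargs):
    """Call func while ignoring only experimental-code warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", BiopythonExperimentalWarning)
        return func(*args, **kwargs)
```

Other warning categories still reach the caller, since the filter names only the experimental warning class.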

Bugs

While we try to ship a robust package, bugs inevitably pop up. If you are having problems that might be caused by a bug in Biopython, it is possible that it has already been identified. Update to the latest release if you are not using it already, and retry. If the problem persists, please search our bug database and our mailing lists to see if it has already been reported (and hopefully fixed), and if not please do report the bug. We can't fix problems we don't know about ;)

Issue tracker: https://github.com/biopython/biopython/issues

If you suspect the problem lies within a parser, it is likely that the data format has changed and broken the parsing code. (The text BLAST and GenBank formats seem to be particularly fragile.) Thus, the parsing code in Biopython is sometimes updated faster than we can build Biopython releases. You can get the most recent parser by pulling the relevant files (e.g. the ones in Bio.SeqIO or Bio.Blast) from our git repository. However, be careful when doing this, because the code on GitHub is not as well-tested as released code, and may contain new dependencies.

In any bug report, please let us know:

Which operating system and hardware (32 bit or 64 bit) you are using
Python version
Biopython version (or git commit/date)
Traceback that occurs (the full error message)

And also ideally:

Example code that breaks
A data file that causes the problem
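
Most of the environment details requested above can be gathered with a short script. This is only a sketch: `environment_report` is a hypothetical helper, and it reports the bitness of the running Python interpreter rather than of the hardware itself.

```python
import platform
import struct


def environment_report():
    """Collect the environment details requested for a bug report."""
    try:
        import Bio
        biopython_version = Bio.__version__
    except ImportError:
        biopython_version = "not installed"
    return {
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        # Pointer size in bits: 32 or 64 for the running interpreter.
        "python_bits": struct.calcsize("P") * 8,
        "python_version": platform.python_version(),
        "biopython_version": biopython_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```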

Contributing, Bug Reports

Biopython is run by volunteers from all over the world, with many types of backgrounds. We are always looking for people interested in helping with code development, web-site management, documentation writing, technical administration, and whatever else comes up.

If you wish to contribute, please first read CONTRIBUTING.rst, visit our web site http://biopython.org and join our mailing list: http://biopython.org/wiki/Mailing_lists

Distribution Structure

README.rst -- This file.
NEWS.rst -- Release notes and news.
LICENSE.rst -- What you can do with the code.
CONTRIB.rst -- An (incomplete) list of people who helped Biopython in one way or another.
CONTRIBUTING.rst -- An overview about how to contribute to Biopython.
DEPRECATED.rst -- Contains information about modules in Biopython that were removed or no longer recommended for use, and how to update code that uses those modules.
MANIFEST.in -- Configures which files to include in releases.
setup.py -- Installation file.
Bio/ -- The main code base.
BioSQL/ -- Code for using Biopython with BioSQL databases.
Doc/ -- Documentation.
Scripts/ -- Miscellaneous, possibly useful, standalone scripts.
Tests/ -- Regression testing code including sample data files.


biopython's Issues

plans to implement k-means++ initiation?

Are there plans to include the ability to use smart initiation of clusters with k-means++ or similar yet? A search on this repo's code/issues yielded no results. I hope this is not a duplicate post!

Gus

Bio/SeqIO/SffIO: assert in _check_eof

Seems there is one more place where biopython-1.63 is still too strict while writing out SFF files (see issue #219).

/usr/lib64/python2.7/site-packages/Bio/SeqIO/SffIO.py:941: BiopythonParserWarning: Your SFF file is invalid, post index 4 byte null padding region contained data.

  File "myapp", line 18609, in write_clipped_sff_and_fasta_qual
    _wrote1 = SeqIO.write(_fixed_sff_records, _filename, "sff")
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/__init__.py", line 463, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/SffIO.py", line 1027, in write_file
    for record in records:
  File "/usr/share/SFF_inspector/squid2", line 18196, in fix_SFF_records
    for _i, _record in enumerate(SeqIO.parse(prefix + ".sff", "sff")):
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/__init__.py", line 582, in parse
    for r in i:
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/SffIO.py", line 896, in SffIterator
    _check_eof(handle, index_offset, index_length)
  File "/usr/lib64/python2.7/site-packages/Bio/SeqIO/SffIO.py", line 944, in _check_eof
    assert offset % 8 == 0
AssertionError

The input file was SRR653464.sra converted to SFF. It seems records in both datasets GM2U7V101 and GM2U7V102 of the merged SRA file have the issue.

Incorrect Alignment Returned by Bio.pairwise2.align.localds

I ran across this issue while testing my implementation of Smith-Waterman against BioPython's (version 1.63 on OSX 10.9). The following test case illustrates a non-optimal alignment returned by BioPython. I was following the docs from http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html. I didn't see any open issues documenting this, hence I decided to open this one.

Thanks!

import Bio.pairwise2
from Bio.SubsMat.MatrixInfo import blosum62

print ('K','Q'), 'blosum62 score is',blosum62[('K','Q')]
print ('A','A'), 'blosum62 score is',blosum62[('A','A')]
print ('H','H'), 'blosum62 score is',blosum62[('H','H')]

alignments = Bio.pairwise2.align.localds('VKAHGKKV', 'FQAHCAGV', blosum62, -4, -4)
for a in alignments:
    print Bio.pairwise2.format_alignment(*a)

print "However the score 13 is possible by adding the pairing K-Q..."

which for convenience when run produces

('K', 'Q') blosum62 score is 1
('A', 'A') blosum62 score is 4
('H', 'H') blosum62 score is 8
VKAHGKKV
  ||
FQAHCAGV
  Score=12

However the score 13 is possible by adding the pairing K-Q...
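
The arithmetic behind the claim can be checked independently of Biopython. The sketch below uses only the three pair scores printed in the output above; `ungapped_score` is a hypothetical helper that sums scores along a gap-free segment, not a full Smith-Waterman implementation.

```python
# Pair scores copied from the output in this report; no other BLOSUM62
# entries are assumed here.
REPORTED_SCORES = {("K", "Q"): 1, ("A", "A"): 4, ("H", "H"): 8}


def ungapped_score(seq_a, seq_b, scores):
    """Sum the substitution scores of two equal-length, gap-free segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have equal length")
    return sum(scores[pair] for pair in zip(seq_a, seq_b))


print(ungapped_score("AH", "AH", REPORTED_SCORES))    # 12, the reported alignment
print(ungapped_score("KAH", "QAH", REPORTED_SCORES))  # 13, with K-Q included
```

Since K and Q sit directly before the aligned AH in both sequences, extending the local alignment by one column needs no gap and raises the score from 12 to 13.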

missing DTD

Just letting you know: I ran setup.py test again and it gave me this error:

test_Entrez_online ... /Users/katzlab32/Downloads/biopython-1.63/build/lib.macosx-10.9-intel-2.7/Bio/Entrez/Parser.py:525: UserWarning: Unable to load DTD file einfo.dtd.

Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez.
Though most of NCBI's DTD files are included in the Biopython distribution,
sometimes you may find that a particular DTD file is missing. While we can
access the DTD file through the internet, the parser is much faster if the
required DTD files are available locally.

For this purpose, please download einfo.dtd from

http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd

and save it either in directory

/Users/katzlab32/Downloads/biopython-1.63/build/lib.macosx-10.9-intel-2.7/Bio/Entrez/DTDs

or in directory

/Users/katzlab32/.biopython/Bio/Entrez/DTDs

in order for Bio.Entrez to find it.

Alternatively, you can save einfo.dtd in the directory
Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython.

Please also inform the Biopython developers about this missing DTD, by
reporting a bug on https://github.com/biopython/biopython/issues or sign
up to our mailing list and emailing us, so that we can include it with the
next release of Biopython.

Proceeding to access the DTD file through the internet...
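
The manual step described in the warning could be scripted roughly as follows. The URL and the per-user directory layout are taken from the message above; `dtd_destination` and `fetch_dtd` are hypothetical helper names, and `fetch_dtd` needs network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL and directory layout are taken from the warning text above.
DTD_URL = "http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd"


def dtd_destination(filename, home=None):
    """Return the per-user path where Bio.Entrez looks for a DTD file."""
    home = Path.home() if home is None else Path(home)
    return home / ".biopython" / "Bio" / "Entrez" / "DTDs" / filename


def fetch_dtd(url=DTD_URL, home=None):
    """Download a DTD into the per-user directory (requires network access)."""
    destination = dtd_destination(url.rsplit("/", 1)[-1], home)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(destination))
    return destination


print(dtd_destination("einfo.dtd"))
```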

Possible crash location Bio/Blast/NCBIXML.py, line 106 in endElement or Python-2.7.5-r2/Modules/pyexpat.c:616

Oh Dear,
I am experiencing some crashes. Thanks to Python being configured at install time with "configure --with-pydebug", and thanks to https://pypi.python.org/pypi/faulthandler, I have much better stack traces in gdb and on STDERR.
It looks like the route took me again to legacy BLASTN and to an old bug in BLAST. NCBI answered me in the past that they won't fix legacy blastn. So we have to fix the Biopython blastn parser, and now it even seems expat/Biopython is crashing.

It will take me a while to get through all of the output, but unless the bug is elsewhere the stack trace below should be enough. Most likely this is the bug I already saw in biopython-1.59, but right now I have biopython-1.62b (pre-release beta) installed; see the line numbers below.

Fatal Python error: Segmentation fault

Current thread 0x00007f9316072700:
File "/usr/lib64/python2.7/site-packages/Bio/Blast/NCBIXML.py", line 106 in endElement
File "/mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/pyexpat.c", line 618 in EndElement
File "/usr/lib64/python2.7/site-packages/Bio/Blast/NCBIXML.py", line 654 in parse
File "blah.py", line 19469 in parse_blastn_XML_and_write_csv
...
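
The "Fatal Python error" traceback above is the kind of output faulthandler produces. The report used the PyPI backport on Python 2.7; from Python 3.3 onwards the module is in the standard library. A minimal sketch of enabling it before running a parser follows, where `enable_crash_tracebacks` is a hypothetical wrapper name.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Dump Python tracebacks of all threads to stream on a fatal signal.

    The stream must stay open for as long as faulthandler is enabled.
    """
    faulthandler.enable(file=sys.stderr if stream is None else stream,
                        all_threads=True)


# Typical use at the top of a script, before any parsing starts:
#     enable_crash_tracebacks()
```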

(gdb) where
#0 0x00007f9315810acb in raise () from /lib64/libpthread.so.0
#1 0x00007f93149365f6 in faulthandler_fatal_error (signum=11) at faulthandler.c:321
#2
#3 0x00007f9315bc6e40 in visit_decref (op=<unknown at remote 0x46966a0>, data=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:360
#4 0x00007f9315abc37c in list_traverse (o=0x6998150, visit=0x7f9315bc6e02 <visit_decref>, arg=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/listobject.c:2362
#5 0x00007f9315bc6f32 in subtract_refs (containers=0x7f9315e789c0 <generations+96>) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:385
#6 0x00007f9315bc7fb3 in collect (generation=2) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:925
#7 0x00007f9315bc830c in collect_generations () at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:1050
#8 0x00007f9315bc8fc3 in _PyObject_GC_Malloc (basicsize=408) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:1511
#9 0x00007f9315bc9064 in _PyObject_GC_NewVar (tp=0x7f9315e4f120 <PyFrame_Type>, nitems=1) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/gcmodule.c:1531
#10 0x00007f9315aae3e3 in PyFrame_New (tstate=0x20000a0, code=0x77d9d50, globals=

{'xml': <module at remote 0x448b268>, 'BlastParser': <classobj at remote 0x4d549e0>, '__builtins__': {'bytearray': <type at remote 0x7f9315e43520>, 'IndexError': <type at remote 0x7f9315e49a00>, 'all': <built-in function all>, 'help': <_Helper at remote 0x22091b0>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7f9315e49380>, 'unicode': <type at remote 0x7f9315e60be0>, 'UnicodeDecodeError': <type at remote 0x7f9315e4a320>, 'memoryview': <type at remote 0x7f9315e54640>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2013 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x21bdc30>, 'NameError'...(truncated), locals=
{'self': <BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A'...(truncated)) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/frameobject.c:682

I suspect the bug happens because of one of:
positives=(None, None)
identities=(None, None)
strand=(None, None)

I know these funny tuples were already reported for gaps and identities if I remember right .... so there may be more? :(
https://redmine.open-bio.org/issues/3363
https://redmine.open-bio.org/issues/3354

Why the expat crash on "Hsp_bit-score" (see rows #23 and #26 from gdb below)?

Nevertheless, I think Biopython should sanitize its values if the XML entry is crap. If you find out why expat crashes, then all the better. ;-)
#13 0x00007f9315baca72 in run_mod (mod=0x29b97b8, filename=0x7f9315c0fdd5 "",

globals={'xml': <module at remote 0x448b268>, 'BlastParser': <classobj at remote 0x4d549e0>, '__builtins__': {'bytearray': <type at remote 0x7f9315e43520>, 'IndexError': <type at remote 0x7f9315e49a00>, 'all': <built-in function all>, 'help': <_Helper at remote 0x22091b0>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7f9315e49380>, 'unicode': <type at remote 0x7f9315e60be0>, 'UnicodeDecodeError': <type at remote 0x7f9315e4a320>, 'memoryview': <type at remote 0x7f9315e54640>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2013 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x21bdc30>, 'NameError'...(truncated),
locals={'self': <BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A'...(truncated), flags=0x7fff7c311d50, arena=0x4137e40)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/pythonrun.c:1365

#14 0x00007f9315bac923 in PyRun_StringFlags (str=0x77d0fc4 "self._end_Hsp_bit_score()", start=258,

globals={'xml': <module at remote 0x448b268>, 'BlastParser': <classobj at remote 0x4d549e0>, '__builtins__': {'bytearray': <type at remote 0x7f9315e43520>, 'IndexError': <type at remote 0x7f9315e49a00>, 'all': <built-in function all>, 'help': <_Helper at remote 0x22091b0>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7f9315e49380>, 'unicode': <type at remote 0x7f9315e60be0>, 'UnicodeDecodeError': <type at remote 0x7f9315e4a320>, 'memoryview': <type at remote 0x7f9315e54640>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2013 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x21bdc30>, 'NameError'...(truncated),
locals={'self': <BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A'...(truncated), flags=0x7fff7c311d50)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/pythonrun.c:1328

#15 0x00007f9315b658b5 in builtin_eval (self=0x0, args=(u'self._end_Hsp_bit_score()',)) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/bltinmodule.c:695
#16 0x00007f9315ad5006 in PyCFunction_Call (func=, arg=(u'self._end_Hsp_bit_score()',), kw=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/methodobject.c:81
#17 0x00007f9315b7b1d4 in call_function (pp_stack=0x7fff7c311f90, oparg=1) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:4021
#18 0x00007f9315b75cd8 in PyEval_EvalFrameEx (

f=Frame 0x3edea80, for file /usr/lib64/python2.7/site-packages/Bio/Blast/NCBIXML.py, line 106, in endElement (self=<BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') a...(truncated), throwflag=0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:2666

#19 0x00007f9315b7870e in PyEval_EvalCodeEx (co=0x50a4bf0,

globals={'xml': <module at remote 0x448b268>, 'BlastParser': <classobj at remote 0x4d549e0>, '__builtins__': {'bytearray': <type at remote 0x7f9315e43520>, 'IndexError': <type at remote 0x7f9315e49a00>, 'all': <built-in function all>, 'help': <_Helper at remote 0x22091b0>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0x7f9315e49380>, 'unicode': <type at remote 0x7f9315e60be0>, 'UnicodeDecodeError': <type at remote 0x7f9315e4a320>, 'memoryview': <type at remote 0x7f9315e54640>, 'isinstance': <built-in function isinstance>, 'copyright': <_Printer(_Printer__data='Copyright (c) 2001-2013 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', _Printer__lines=None, _Printer__name='copyright', _Printer__dirs=(), _Printer__files=(...)) at remote 0x21bdc30>, 'NameError'...(truncated), locals=0x0, args=0x4172718, argcount=2, kws=0x0, kwcount=0, defs=0x0, defcount=0,
closure=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:3253

#20 0x00007f9315ab0f2e in function_call (func=<function at remote 0x67ce8e8>,

arg=(<BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A', access...(truncated), kw=0x0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/funcobject.c:526

#21 0x00007f9315a6f840 in PyObject_Call (func=<function at remote 0x67ce8e8>,

arg=(<BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A', access...(truncated), kw=0x0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/abstract.c:2529

#22 0x00007f9315a8ba6f in instancemethod_call (func=<function at remote 0x67ce8e8>,

arg=(<BlastParser(_parser=<ExpatParser(_namespaces=0, _parser=None, _external_ges=0, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0x6998600>, _bufsize=65516, _cont_handler=<...>, _dtd_handler=<DTDHandler() at remote 0x6998e70>, _entity_stack=[], _err_handler=<ErrorHandler() at remote 0x6998858>, _lex_handler_prop=None, _parsing=0, _ent_handler=<EntityResolver() at remote 0x6998948>, _interning=None) at remote 0x6998b28>, _mult_al=<MultipleAlignment(alignment=[]) at remote 0x77d0f40>, _debug=0, _hsp=<HSP(sbjct_end=None, sbjct='', bits=None, frame=(), query_end=None, score=None, gaps=(None, None), expect=None, query='', sbjct_start=None, positives=(None, None), align_length=None, num_alignments=None, identities=(None, None), query_start=None, strand=(None, None), match='') at remote 0x77d0ed0>, _descr=<Description(e=<float at remote 0x68fa3c0>, title=u'gnl|BL_ORD_ID|14 poly_A', access...(truncated), kw=0x0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/classobject.c:2602

#23 0x00007f9315a6f840 in PyObject_Call (func=<instancemethod at remote 0x415f460>, arg=(u'Hsp_bit-score',), kw=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/abstract.c:2529
#24 0x00007f9315b7a900 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x415f460>, arg=(u'Hsp_bit-score',), kw=0x0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:3890
#25 0x00007f930c412705 in call_with_frame (c=0x67d5bf0, func=<instancemethod at remote 0x415f460>, args=(u'Hsp_bit-score',), self=0x50a2ba0) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/pyexpat.c:355
#26 0x00007f930c4135bd in my_EndElementHandler (userData=0x50a2ba0, name=0x5367d60 "Hsp_bit-score") at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/pyexpat.c:616
#27 0x00007f930c1ef2d2 in doContent () from /usr/lib64/libexpat.so.1
#28 0x00007f930c1f01b4 in contentProcessor () from /usr/lib64/libexpat.so.1
#29 0x00007f930c1eae2a in XML_ParseBuffer () from /usr/lib64/libexpat.so.1
#30 0x00007f930c416199 in xmlparse_Parse (self=0x50a2ba0,

args=('n>\n              <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>\n              <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq>\n              <Hsp_midline>|||||||||| |||||||||| ||||||||| ||</Hsp_midline>\n            </Hsp>\n            <Hsp>\n              <Hsp_num>728</Hsp_num>\n              <Hsp_bit-score>49.9773</Hsp_bit-score>\n              <Hsp_score>54</Hsp_score>\n              <Hsp_evalue>2.68758e-09</Hsp_evalue>\n              <Hsp_query-from>97</Hsp_query-from>\n              <Hsp_query-to>130</Hsp_query-to>\n              <Hsp_hit-from>728</Hsp_hit-from>\n              <Hsp_hit-to>761</Hsp_hit-to>\n              <Hsp_query-frame>1</Hsp_query-frame>\n              <Hsp_hit-frame>1</Hsp_hit-frame>\n              <Hsp_identity>31</Hsp_identity>\n              <Hsp_positive>31</Hsp_positive>\n              <Hsp_align-len>34</Hsp_align-len>\n              <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>\n              <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq',...(truncated)) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Modules/pyexpat.c:902

#31 0x00007f9315ad5006 in PyCFunction_Call (func=<built-in method Parse of pyexpat.xmlparser object at remote 0x50a2ba0>,

arg=('n>\n              <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>\n              <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq>\n              <Hsp_midline>|||||||||| |||||||||| ||||||||| ||</Hsp_midline>\n            </Hsp>\n            <Hsp>\n              <Hsp_num>728</Hsp_num>\n              <Hsp_bit-score>49.9773</Hsp_bit-score>\n              <Hsp_score>54</Hsp_score>\n              <Hsp_evalue>2.68758e-09</Hsp_evalue>\n              <Hsp_query-from>97</Hsp_query-from>\n              <Hsp_query-to>130</Hsp_query-to>\n              <Hsp_hit-from>728</Hsp_hit-from>\n              <Hsp_hit-to>761</Hsp_hit-to>\n              <Hsp_query-frame>1</Hsp_query-frame>\n              <Hsp_hit-frame>1</Hsp_hit-frame>\n              <Hsp_identity>31</Hsp_identity>\n              <Hsp_positive>31</Hsp_positive>\n              <Hsp_align-len>34</Hsp_align-len>\n              <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>\n              <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq',...(truncated), kw=0x0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Objects/methodobject.c:81

#32 0x00007f9315b7b1d4 in call_function (pp_stack=0x7fff7c312a70, oparg=2) at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:4021
#33 0x00007f9315b75cd8 in PyEval_EvalFrameEx (

f=Frame 0x58dab40, for file /usr/lib64/python2.7/site-packages/Bio/Blast/NCBIXML.py, line 654, in parse (handle=<file at remote 0x533db80>, debug=0, expat=<module at remote 0x67d2130>, BLOCK=1024, MARGIN=10, XML_START='<?xml', text='n>\n              <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>\n              <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq>\n              <Hsp_midline>|||||||||| |||||||||| ||||||||| ||</Hsp_midline>\n            </Hsp>\n            <Hsp>\n              <Hsp_num>728</Hsp_num>\n              <Hsp_bit-score>49.9773</Hsp_bit-score>\n              <Hsp_score>54</Hsp_score>\n              <Hsp_evalue>2.68758e-09</Hsp_evalue>\n              <Hsp_query-from>97</Hsp_query-from>\n              <Hsp_query-to>130</Hsp_query-to>\n              <Hsp_hit-from>728</Hsp_hit-from>\n              <Hsp_hit-to>761</Hsp_hit-to>\n              <Hsp_query-frame>1</Hsp_query-frame>\n              <Hsp_hit-frame>1</Hsp_hit-frame>\n              <Hsp_identity>31</Hsp_identity>\n        ...(truncated), throwflag=0)
at /mnt/1TB/var/tmp/portage/dev-lang/python-2.7.5-r2/work/Python-2.7.5/Python/ceval.c:2666

It seems the crash happened because there were TWO broken XML entries in the XML stream; the parser then crashed on the third (non-bogus) entry, deep inside it at <Hsp_num>728</Hsp_num>:

<Iteration>
  <Iteration_iter-num>5195</Iteration_iter-num>
  <Iteration_query-ID>lcl|5195_0</Iteration_query-ID>
  <Iteration_query-def>EYI1BW404I60E4 length=245 xy=3653_1102 region=4 run=R_2007_11_06_15_29_46_</Iteration_query-def>
  <Iteration_query-len>253</Iteration_query-len>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>30</Statistics_db-num>
      <Statistics_db-len>20176</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>0</Statistics_eff-space>
      <Statistics_kappa>0.41</Statistics_kappa>
      <Statistics_lambda>0.625</Statistics_lambda>
      <Statistics_entropy>0.78</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
  <Iteration_iter-num>5196</Iteration_iter-num>
  <Iteration_query-ID>lcl|5196_0</Iteration_query-ID>
  <Iteration_query-def>EYI1BW404I5AGB length=255 xy=3633_2713 region=4 run=R_2007_11_06_15_29_46_</Iteration_query-def>
  <Iteration_query-len>259</Iteration_query-len>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>30</Statistics_db-num>
      <Statistics_db-len>20176</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>0</Statistics_eff-space>
      <Statistics_kappa>0.41</Statistics_kappa>
      <Statistics_lambda>0.625</Statistics_lambda>
      <Statistics_entropy>0.78</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
  <Iteration_iter-num>5197</Iteration_iter-num>
  <Iteration_query-ID>lcl|5197_0</Iteration_query-ID>
  <Iteration_query-def>EYI1BW404IB6HP length=88 xy=3302_0331 region=4 run=R_2007_11_06_15_29_46_</Iteration_query-def>
  <Iteration_query-len>166</Iteration_query-len>
  <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gnl|BL_ORD_ID|14</Hit_id>
      <Hit_def>poly_A</Hit_def>
      <Hit_accession>14</Hit_accession>
      <Hit_len>960</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>2.68758e-09</Hsp_evalue>
          <Hsp_query-from>97</Hsp_query-from>
          <Hsp_query-to>130</Hsp_query-to>
          <Hsp_hit-from>1</Hsp_hit-from>
          <Hsp_hit-to>34</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>31</Hsp_identity>
          <Hsp_positive>31</Hsp_positive>
          <Hsp_align-len>34</Hsp_align-len>
          <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>
          <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq>
          <Hsp_midline>|||||||||| |||||||||| ||||||||| ||</Hsp_midline>
        </Hsp>

         ... plenty more matches, and finally:

        <Hsp>
          <Hsp_num>728</Hsp_num>
          <Hsp_bit-score>49.9773</Hsp_bit-score>
          <Hsp_score>54</Hsp_score>
          <Hsp_evalue>2.68758e-09</Hsp_evalue>
          <Hsp_query-from>97</Hsp_query-from>
          <Hsp_query-to>130</Hsp_query-to>
          <Hsp_hit-from>728</Hsp_hit-from>
          <Hsp_hit-to>761</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>31</Hsp_identity>
          <Hsp_positive>31</Hsp_positive>
          <Hsp_align-len>34</Hsp_align-len>
          <Hsp_qseq>AAAAAAAAAACAAAAAAAAAANAAAAAAAAACAA</Hsp_qseq>
          <Hsp_hseq>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</Hsp_hseq>
          <Hsp_midline>|||||||||| |||||||||| ||||||||| ||</Hsp_midline>
        </Hsp>

Do you want the "bt full" output from gdb instead? ;-)))))) This is likely the longest bug report I have ever written, and it is 4 A.M.

biopython install problem OS X mavericks

After the Mac OS X Mavericks install my Python install was reverted, and I've started to reinstall all modules. I've installed the needed Xcode command line tools and the numpy package, and then proceeded to the Biopython install.

However, the install (either from easy_install or compiling from the source tarball) produces the following warnings, appears to stall, and never completes:

Bio/cpairwise2module.c:313:12: warning: implicit conversion loses integer precision: 'Py_ssize_t' (aka 'long') to 'int'
[-Wshorten-64-to-32]
lenA = PySequence_Length(py_sequenceA);
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/abstract.h:1065:27: note: expanded from macro
'PySequence_Length'

#define PySequence_Length PySequence_Size

Bio/cpairwise2module.c:314:12: warning: implicit conversion loses integer precision: 'Py_ssize_t' (aka 'long') to 'int'
[-Wshorten-64-to-32]
lenB = PySequence_Length(py_sequenceB);
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/abstract.h:1065:27: note: expanded from macro
'PySequence_Length'

#define PySequence_Length PySequence_Size

2 warnings generated.
Any idea on how to solve this problem? Thanks in advance

ProteinAlphabet to ThreeLetterProtein converter

This is a small issue but I haven't found a way to easily convert from the single-letter Protein alphabet to the three-letter alphabet and vice-versa.

It would be nice to have the possibility to specify the protein alphabet to translate to as such:

   from Bio.Seq import Seq
   from Bio.Alphabet import IUPAC, ThreeLetterProtein
   coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
   protein = coding_dna.translate(alphabet=ThreeLetterProtein)

or define a converter function from one alphabet to the other.
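
As a rough illustration of the converter function suggested above, here is a minimal, self-contained sketch. The table and the names ONE_TO_THREE, to_three_letter and to_one_letter are hypothetical and hand-written for this example; they are not part of Biopython's API.

```python
# Hypothetical stand-alone converter; not Biopython API.
ONE_TO_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Ter",  # stop codon, as produced by translate()
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(protein):
    """Convert a one-letter protein string to concatenated three-letter codes."""
    return "".join(ONE_TO_THREE[letter] for letter in protein.upper())


def to_one_letter(protein3):
    """Convert concatenated three-letter codes back to one-letter codes."""
    if len(protein3) % 3 != 0:
        raise ValueError("Length of a three-letter sequence must be a multiple of 3")
    chunks = (protein3[i:i + 3].capitalize() for i in range(0, len(protein3), 3))
    return "".join(THREE_TO_ONE[chunk] for chunk in chunks)


print(to_three_letter("MAIVMGR*"))
```

Unknown letters raise a KeyError in this sketch; a real converter would also need a policy for ambiguous residues such as B, Z and X.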

Medline parser and multi-line MeSH terms

Hi, when I use Entrez.efetch(db="pubmed", id=l, rettype="medline", retmode="text")
to download PubMed citations, I get an error in the result.
For example, for PMID "23039619" the resulting MeSH terms are "['Blood Circulation', 'High-Intensity Focused Ultrasound Ablation/adverse', 'effects/instrumentation/_methods', 'Humans', 'Models, Biological', 'Sonication', 'Temperature', 'Time Factors', 'Transducers']";
the term 'effects/instrumentation/_methods' is not a MeSH term.
I think this kind of error happens when a MeSH term has a subheading that contains a space: Biopython wrongly splits the phrase at that space.
Finally, I want to thank you for your work; Biopython is very helpful for my research.
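
To make the multi-line case in the title concrete, here is a minimal sketch of how continuation lines in MEDLINE text could be folded back into the preceding field. It assumes the usual layout (a tag of up to four characters, then "- ", with continuation lines indented by six spaces). The sample record is abbreviated and hypothetical, and the function names are made up for this example rather than taken from Bio.Medline.

```python
def parse_medline_fields(lines):
    """Return a list of (tag, value) pairs, joining continuation lines."""
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("      ") and fields:
            # Continuation line: append to the value of the previous field.
            tag, value = fields[-1]
            fields[-1] = (tag, value + " " + line.strip())
        else:
            fields.append((line[:4].strip(), line[6:].strip()))
    return fields


def mesh_terms(lines):
    """Collect the MH (MeSH heading) values from MEDLINE text lines."""
    return [value for tag, value in parse_medline_fields(lines) if tag == "MH"]


# Hypothetical, abbreviated record modelled on the report above.
sample = [
    "PMID- 23039619",
    "MH  - Blood Circulation",
    "MH  - High-Intensity Focused Ultrasound Ablation/adverse",
    "      effects/instrumentation",
    "MH  - Humans",
]
print(mesh_terms(sample))
```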

Track labels above graph in GenomeDiagram?

It would be great to have an option to put the names of tracks above the respective tracks, not below.

Improve data representation styles in GenomeDiagram?

Currently there is no way to create a bar-style diagram without odd colouring: in the lower part one can assign a colour only to the area not covered by data points. In some cases this type of colouring might be good, but far from always. Sometimes the best representation would be like in the UCSC Genome Browser: points with a colour-filled contour under them (approximately equivalent to a bar chart with the same colour throughout the bars; at genome-wide scale it is actually impossible to tell which one it is). If you need example pictures, I can easily make one from each system to compare the styles.

.

pairwise

Lack of deltablast Wrapper

There isn't a built-in deltablast wrapper (Bio/Blast/Applications.py). I wrote one, based on the blastp wrapper, which seems to work for me. I don't fully understand the parameters (specifically, self.parameters appears to be a subset of the command-line parameters taken), but all of the parameters of blastp are valid for deltablast as well, so this shouldn't cause any problems. Additionally, I've been using my wrapper for blastp and I haven't seen any problems.

The modified applications.py can be found at:
http://www.filedropper.com/applications_1
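
Independently of the linked file, the idea can be sketched without any wrapper class by building the argument list directly. This is only an illustration: the function name is hypothetical, and it assumes (as stated above) that deltablast accepts the same -query, -db, -out, -evalue and -outfmt options as blastp.

```python
def build_deltablast_command(query, db, out, evalue=None, outfmt=None,
                             executable="deltablast"):
    """Build an argument list suitable for subprocess.run() or Popen()."""
    command = [executable, "-query", query, "-db", db, "-out", out]
    if evalue is not None:
        command += ["-evalue", str(evalue)]
    if outfmt is not None:
        command += ["-outfmt", str(outfmt)]
    return command


# File and database names here are placeholders.
print(" ".join(build_deltablast_command("query.fasta", "nr", "result.xml",
                                        evalue=0.001, outfmt=5)))
```

Passing the list to subprocess avoids shell quoting problems; the sketch does not run the program itself.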

Phylo.read: [ causing parsing problems

...this isn't a huge thing, but I thought I should post it in case it's of interest.

from Bio import Phylo
from StringIO import StringIO
#Works fine
Phylo.read(StringIO("(A, (B, C), (D, E))"), "newick")
#Error!
Phylo.read(StringIO("[&U] (A, (B, C), (D, E))"), "newick")
...traceback...
Bio.Phylo.NewickIO.NewickError: Parentheses do not match in (sub)tree: &U] (A, (B, C), (D, E)

The error seems to imply that the square brackets are parsed as though they were regular parentheses. Maybe you don't like all this '[&U]' stuff in Newick phylogenies; that's fair enough, but other packages sometimes put these into their output so it might be worth checking for. For example:

import dendropy
tree = dendropy.Tree.get_from_string("(A, (B, C), (D, E))", "newick")
tree.write_to_path("test.tre", "newick")
Phylo.read("test.tre", "newick")
open("test.tre").readlines()
...shows that the file contains...
['[&U] (A,(B,C),(D,E));\n']

Thanks for BioPython; it's a great set of software, and makes my life significantly more pleasant :D
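
Until the parser handles such prefixes, one possible workaround is to strip square-bracket comments from the Newick text before parsing it. The sketch below is an assumption-laden illustration (the helper name is made up): it removes every [...] block, so it would also drop bracketed annotations that carry real information, and it does not handle brackets inside quoted labels.

```python
import re

# Matches a square-bracket comment such as "[&U]" (no nesting assumed).
_BRACKET_COMMENT = re.compile(r"\[[^\]]*\]")


def strip_newick_comments(text):
    """Remove [...] comments from a Newick string and trim whitespace."""
    return _BRACKET_COMMENT.sub("", text).strip()


print(strip_newick_comments("[&U] (A,(B,C),(D,E));\n"))
```

The cleaned string could then be wrapped in a StringIO handle and passed to Phylo.read as in the first example above.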

Scale "below" plot in GenomeDiagram?

It would be very nice to have an option to draw graph's scale "below" the data ("below" in both meanings: as in layers, and as in page coordinates - at zero-level, not at mid-level). Both in linear and circular formats of course.

Currently the scale can hide quite a lot of the data in some cases.

No way to set max and min of scale in GenomeDiagram?

I couldn't find any way to set the max/min value for the scale in GenomeDiagram graphs, but it seems to be a rather important feature: when plotting more than one region of a genome, it is very difficult to compare the values on the graphs when the max values on their scales differ.

If I am wrong again and there is a way to do that, I am sorry; please tell me where it is documented :)

Bio.Phylo.draw_graphviz needs to be updated

Error sample:

In [1]: from Bio import Phylo
In [2]: tree = Phylo.read("VK-uniq-prot.tree", "newick")
In [3]: Phylo.draw_graphviz(tree)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-11557e5fc0f3> in <module>()
----> 1 Phylo.draw_graphviz(tree)

/usr/lib/python3.3/site-packages/Bio/Phylo/_utils.py in draw_graphviz(tree, label_func, prog, args, node_color, **kwargs)
    138 
    139     G = to_networkx(tree)
--> 140     Gi = networkx.convert_node_labels_to_integers(G, discard_old_labels=False)
    141     try:
    142         posi = networkx.graphviz_layout(Gi, prog, args=args)

TypeError: convert_node_labels_to_integers() got an unexpected keyword argument 'discard_old_labels'

Newer networkx versions use a different argument specification:
In [4]: import networkx
In [5]: help(networkx.convert_node_labels_to_integers)

Return a copy of the graph G with the nodes relabeled with integers.

    Parameters
    ----------
    G : graph
       A NetworkX graph

    first_label : int, optional (default=0)
       An integer specifying the offset in numbering nodes.
       The n new integer labels are numbered first_label, ..., n-1+first_label.

    ordering : string
       "default" : inherit node ordering from G.nodes()
       "sorted"  : inherit node ordering from sorted(G.nodes())
       "increasing degree" : nodes are sorted by increasing degree
       "decreasing degree" : nodes are sorted by decreasing degree

    label_attribute : string, optional (default=None)
       Name of node attribute to store old label.  If None no attribute
       is created.

    Notes
    -----
    Node and edge attribute data are copied to the new (relabeled) graph.

    See Also
    --------
    relabel_nodes
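
A caller that has to work with both argument specifications could check the function signature before choosing the keyword. The helper below is a hypothetical sketch, not the fix applied in Biopython. It only uses the two keyword names that appear in the traceback and the help text above, and the two keywords may not store the old labels in the same place, so the calling code would still need to read them back accordingly.

```python
import inspect


def relabel_keeping_old_labels(convert, graph, attribute="label"):
    """Call a convert_node_labels_to_integers-style function.

    Uses label_attribute when the function supports it, otherwise
    falls back to the older discard_old_labels keyword.
    """
    parameters = inspect.signature(convert).parameters
    if "label_attribute" in parameters:
        return convert(graph, label_attribute=attribute)
    return convert(graph, discard_old_labels=False)
```

With networkx installed, the first argument would be networkx.convert_node_labels_to_integers and the second the graph returned by to_networkx(tree).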

BED/WIG/... files in Biopython?

I haven't found any issues here about that, but it seems a very obvious thing: Biopython needs a parser for files with genomic data, like WIG or BED. As far as I know, there is no such thing in Biopython.
There are a few existing packages out there, such as pybedtools for BED and wiggelen for WIG, but I think the most comprehensive in terms of number of formats is track (https://github.com/xapple/track). Is it possible for Biopython to adopt it (or any other package)?
It would be a great improvement for Biopython in my opinion.
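
To give a sense of how small a basic BED reader could be, here is a minimal sketch. It assumes tab-separated lines with at least the three required columns (chrom, 0-based start, exclusive end) and keeps the optional name, score and strand columns as strings. The names BedInterval and parse_bed are hypothetical and not part of Biopython.

```python
from collections import namedtuple

BedInterval = namedtuple("BedInterval", "chrom start end name score strand")


def parse_bed(lines):
    """Yield BedInterval records from an iterable of BED lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError("BED line needs at least 3 columns: %r" % line)
        optional = fields[3:6] + [None] * (3 - len(fields[3:6]))
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]), *optional)


example = ["track name=demo\n", "chr1\t0\t100\tfeatA\t0\t+\n", "chr2\t5\t10\n"]
for interval in parse_bed(example):
    print(interval)
```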

Minimal MEME format support in Bio.motifs.MEME parser

The MEME parser fails to even load X.meme files downloaded FROM MEME.

The complaint is:

ValueError("Unexpected end of stream: 'TRAINING SET' not found.")

Looking at the code, it will fail again even if that worked, because it expects **** to follow.

It seems the parser only understands the FULL MEME output format, not the minimal format that seems to be what MEME actually stores its motifs in. This seems a MAJOR oversight. At the very least it limits the usefulness of the module greatly.

Is there any plan to support the minimal MEME format? Is it supported? Am I just being a dunce? I am good at being a dunce.

Anyway thanks.
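
For reference, a reader for the minimal format does not need the "TRAINING SET" section at all. The sketch below is a simplified illustration, not Bio.motifs code: it assumes each motif is introduced by a "MOTIF name" line followed by a "letter-probability matrix:" line that includes a "w=" value, and it ignores the alphabet, strands and background sections. The function name and the sample text are made up for this example.

```python
def parse_minimal_meme(lines):
    """Return {motif_name: [[p_1, ..., p_k], ...]} from minimal MEME text."""
    motifs = {}
    name = None
    rows_left = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith("MOTIF "):
            name = line.split()[1]
            motifs[name] = []
        elif line.startswith("letter-probability matrix:") and name is not None:
            # Header tokens come in pairs such as "w=" "19".
            tokens = line.split(":", 1)[1].split()
            header = dict(zip(tokens[0::2], tokens[1::2]))
            rows_left = int(header["w="])
        elif rows_left and line:
            motifs[name].append([float(value) for value in line.split()])
            rows_left -= 1
    return motifs


sample_text = """MEME version 4

ALPHABET= ACGT

MOTIF example
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
 1.0 0.0 0.0 0.0
 0.0 0.5 0.5 0.0
"""
print(parse_minimal_meme(sample_text.splitlines()))
```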

run_tests.py doctest doesn't report problems

If one of the doctests fails, then "python run_tests.py" reports an error, but "python run_tests.py doctest" does not. With "python run_tests.py doctest" it seems that all tests, including the doctests, passed.

typo in _index.py results in IndexError: list index out of range if Fastq record has length 0

Dear Biopython developers,

I think there is a typo in _index.py which is exposed when a user tries to index_db() a fastq file which contains a 0 length record. Note that SeqIO.parse() handles these records gracefully.

On line 566 a ValueError is constructed but never raised (the raise keyword is missing). This leaves the "handle" pointing at the empty qual line, causing an IndexError when the function attempts to process the next ID.

Here is the relevant code in _index.py:
565 if line and line[0:1] != at_char:
566 ValueError("Problem with line %s" % repr(line))
567 break
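
The difference is easy to demonstrate in isolation. The stand-alone function below is a hypothetical reduction of the quoted check with the raise keyword added; it is not the surrounding _index.py code, and it says nothing about whether zero-length records should be accepted by index_db().

```python
def check_record_start(line, at_char=b"@"):
    """Raise ValueError if a non-empty line does not start a FASTQ record."""
    if line and line[0:1] != at_char:
        # Without "raise", the exception object would be created and discarded.
        raise ValueError("Problem with line %s" % repr(line))


check_record_start(b"@EMWLCP001C1YIG\n")  # passes silently
try:
    check_record_start(b"+\n")
except ValueError as error:
    print("Raised as expected:", error)
```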

Thanks for your awesome work on Biopython!

Sam

P.S.
A couple of Fastq formatted records which will recreate the crash:

@EMWLCP001DET6P
CCCATCATTTATAATTTTAACACGTCCTAGCGTGTAATCTACGGTAT
+
NA+<?;8M@*?=C4UE6'J<:=>??K=????;>?>K==????K>?>;
@EMWLCP001CB6TP
CAGAGTTGCAAGTGCCGGTAATCGCCCTCTCACAGCTGAACCGTGGGCCAGAGCAGCGAGCCGACAAGATGCCGGCCCTGAGCGACCTGCGTGAGTCGGGTTCGCTCGAGCAAGACGCCGACCTAGT
+
?;??>K>?>K>=??K>K>?K>?><RD1??????><?>>K>K>>?RD1K>??>>??????>J=??<K>?><=K>J=PC.??>?>?=I<?>=>>?;=<?PC.J=>:=?;>?>?K=>4<<K=>>F8>>?=
@EMWLCP001DHOHL

+

@EMWLCP001C1YIG
CAGAGGCAAAATGAGAAGGCAGAAAGGGATAGGAGCTG
+
>??;I;?TD4%>>>?E7G9=>?PC.PC.<??K>>?>?K
@EMWLCP001AOVPY
CAGTCGGACAGGCGCAGTTCATGGCGGCGCGAGCACACCGTTTCGCCCAGGTAGACGATGTCCACAGG
+
>>><?K>???K>??>??K><=?K>?K>????=<???<K=>RD1><RD1?K>>?=>?=????K>?>>K=

GenomeDiagram draws unnecessary regions without data.

If I try to draw some data which is not a full genome, but some specific region, GenomeDiagram draws all the genome before specified data. I get long tracks, mostly blank, and a short region, containing my data (look at the picture). Probably the problem is caused by genomic positions in my data: the first one is not 1.
I have not tried to do that in circular format.

database created by SeqIO.index_db() function only accessible from working directory

If a FASTA file is indexed on disk (in an SQLite3 database) using the Bio.SeqIO.index_db() function, opening that SQLite3 database via its absolute path does not work.

When we try to retrieve sequences from the stored index using Bio.SeqIO.index_db(), initialization works fine but sequence retrieval fails, because it looks for the ".fasta" file in the working directory rather than in the directory where the ".idx" file is present.

Currently the workaround is to set the current working directory to the location of the original file. But changing the current working directory multiple times inside scripts is not very productive.

Illustrated example attached in screenshot.

bioinfoboy@bifx-cli:~/db/biomart$ pwd
/homes/bioinfoboy/db/biomart
bioinfoboy@bifx-cli:~/db/biomart$ python3
Python 3.3.3 (default, Dec  3 2013, 17:27:12) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> dbHandler = SeqIO.index_db("hKC.idx")
>>> dbHandler["ENST00000577736_snoU13.464-201#snoRNA"]
SeqRecord(seq=Seq('GTCCTTTTATAGTTGATGAGCATGATGATTGGGTGTTCACACGCATGTGTGAAA...ACA', SingleLetterAlphabet()), id='ENST00000577736_snoU13.464-201#snoRNA', name='ENST00000577736_snoU13.464-201#snoRNA', description='ENST00000577736_snoU13.464-201#snoRNA', dbxrefs=[])
>>> 
bioinfoboy@bifx-cli:~/db/biomart$ cd
bioinfoboy@bifx-cli:~$ pwd
/homes/bioinfoboy
bioinfoboy@bifx-cli:~$ python3
Python 3.3.3 (default, Dec  3 2013, 17:27:12) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import SeqIO
>>> dbHandler = SeqIO.index_db("/homes/bioinfoboy/db/biomart/hKC.idx")
>>> dbHandler["ENST00000577736_snoU13.464-201#snoRNA"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/homes/bioinfoboy/python/lib/python3.3/site-packages/Bio/File.py", line 615, in __getitem__
    proxy = self._proxy_factory(self._format, self._filenames[file_number])
  File "/homes/bioinfoboy/python/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 885, in proxy_factory
    return _FormatToRandomAccess[format](filename, format, alphabet)
  File "/homes/bioinfoboy/python/lib/python3.3/site-packages/Bio/SeqIO/_index.py", line 152, in __init__
    SeqFileRandomAccess.__init__(self, filename, format, alphabet)
  File "/homes/bioinfoboy/python/lib/python3.3/site-packages/Bio/SeqIO/_index.py", line 39, in __init__
    self._handle = _open_for_random_access(filename)
  File "/homes/bioinfoboy/python/lib/python3.3/site-packages/Bio/File.py", line 89, in _open_for_random_access
    handle = open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'hKC.fasta'
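
One way such a lookup could be made independent of the working directory is to resolve relative filenames stored in the index against the directory of the index file itself. The helper below is only a sketch of that idea with a hypothetical name; it is not how Bio.File is implemented.

```python
import os


def resolve_indexed_filename(index_path, stored_name):
    """Resolve a filename stored in an index relative to the index location."""
    if os.path.isabs(stored_name):
        return stored_name
    index_dir = os.path.dirname(os.path.abspath(index_path))
    return os.path.join(index_dir, stored_name)


print(resolve_indexed_filename("/homes/bioinfoboy/db/biomart/hKC.idx", "hKC.fasta"))
```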

greytrack_font_rotation in GenomeDiagram circular format doesn't work.

If you generate a circular diagram and specify greytrack_font_rotation for your tracks nothing changes, labels are still radial (I wanted to locate them along the tracks with greytrack_font_rotation=90 or 270).

Bio pairwise local alignment

pairwise2.align.localxx(seq1,seq2) is too slow!
I'm using EMBOSS water, and it is much faster. Can you incorporate water into Biopython?

[please delete]

I just realised my problem was with importing the wrong package -- please disregard/delete

Missing esearch.dtd when compiling from source (Python 3.3)

When trying to use the Bio.Entrez module, I encountered this message:

Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez.
Though most of NCBI's DTD files are included in the Biopython distribution,
sometimes you may find that a particular DTD file is missing. While we can
access the DTD file through the internet, the parser is much faster if the
required DTD files are available locally.

For this purpose, please download esearch.dtd from

http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd

and save it either in directory

/usr/local/lib/python3.3/site-packages/Bio/Entrez/DTDs

or in directory

/home/rmcclosk/.biopython/Bio/Entrez/DTDs

in order for Bio.Entrez to find it.

Alternatively, you can save esearch.dtd in the directory
Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython.

Please also inform the Biopython developers about this missing DTD, by
reporting a bug on https://github.com/biopython/biopython/issues or sign
up to our mailing list and emailing us, so that we can include it with the
next release of Biopython.

Proceeding to access the DTD file through the internet...

I'm using Python3.3, and I compiled BioPython from the version 1.63 tarball. I notice that in build/lib.linux-x86_64-3.3, there is a file called "eSearch_020511.dtd" - perhaps this just needs to be renamed. Thanks!

Error using Bio.Unigene parser

I have encountered a problem trying to parse Hs.data using the Bio.Unigene parser.

I have provided a description of my problem here: http://www.biostars.org/p/90622/ (I hope it's OK to provide just the link)

I am using Biopython version 1.63 and Python version 2.7.6.

Thanks for your time!

Parsing error in SearchIO HMMer 2 parser

Some input files seem to trigger a crash in the HSP parser, with an IndexError on
otherseq += self.line[19:].split()[0].strip()
I'll try to reproduce this outside of my software pipeline and will provide a fix.
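
For context, an expression of that shape raises IndexError whenever nothing but whitespace remains after column 19, because split() then returns an empty list. The guarded helper below is a hypothetical sketch of the failure mode, not the eventual fix; returning an empty string in that case is an assumption.

```python
def sequence_chunk(line, offset=19):
    """Return the first whitespace-separated token after the given column."""
    tokens = line[offset:].split()
    return tokens[0].strip() if tokens else ""


print(repr(sequence_chunk(" " * 19 + "ACDEFGHIK    42")))
print(repr(sequence_chunk("too short")))
```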

CodonAdaptationIndex math domain error

I get an error on certain sequences when running the CodonAdaptationIndex module from Bio.SeqUtils.CodonUsage. The error is thrown when using the cai_for_gene method. The error is:

File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqUtils/CodonUsage.py", line 119, in cai_for_gene
    caiValue += math.log(self.index[codon])
ValueError: math domain error

I have figured out that the issue arises when a codon in the tested sequence has a value of 0.00 in the index: math.log(0.00) then raises the error. For example, if CTA is 0.00 in the generated index and CTA is present in the tested sequence, the error is thrown. Modifying the code of CodonUsage.py to be:

if codon != 'ATG' and codon != 'TGG':  # these two codons are always one, exclude them
    if self.index[codon] != 0.00:  # math.log(0) raises an exception, so count the codon in the length but do not increase caiValue
        caiValue += math.log(self.index[codon])
    LengthForCai += 1

appears to fix the issue but I am unsure if this is the correct way to fix it.
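
The effect of that change can be seen in a stand-alone sketch. The function below is not the Biopython implementation: it simply computes the geometric mean of the index values over the counted codons and mirrors the approach above, where a zero-valued codon adds nothing to the log sum but still counts towards the length. The index values in the example are made up.

```python
import math


def cai_for_gene(dna_sequence, index):
    """Geometric mean of codon index values, skipping log(0) as described above."""
    dna_sequence = dna_sequence.upper()
    log_sum = 0.0
    counted = 0
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i + 3]
        if codon in ("ATG", "TGG"):  # always 1 in the index, so excluded
            continue
        weight = index[codon]
        if weight != 0.0:
            log_sum += math.log(weight)
        counted += 1
    if counted == 0:
        raise ValueError("No codons to score")
    return math.exp(log_sum / counted)


# Hypothetical index in which CTA has the value 0.00.
print(cai_for_gene("ATGGCCCTA", {"GCC": 0.5, "CTA": 0.0}))
```

Note that skipping the log term is numerically the same as treating the zero-valued codon as if its index were 1, which raises the score. Whether that, excluding the codon from the length as well, or raising a clearer error is the right behaviour is a design decision for the maintainers.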

Hope this helps.
Cheers,
Conor

Potential error in mass calculations for RNA/DNA?

In Bio/Data/IUPACData.py the molecular weights of unambiguous DNA are listed as:

unambiguous_dna_weights = {
    "A": 347.,
    "C": 323.,
    "G": 363.,
    "T": 322.,
    }

As far as I can tell these are the molecular weights for the non-deoxy bases instead of the deoxy bases. For example, AMP (347.22) instead of dAMP (331.22) is listed.

I've looked at the original BioPerl code that these numbers were taken from, and I think they were just copied incorrectly. I have also looked at the code that uses this dict, molecular_weight() in Bio/SeqUtils/__init__.py, and it just takes the sum of these values over the sequence (no correction made).

So, is this an error or am I missing something basic?
Thanks
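
To show what a corrected calculation might look like, here is a small worked sketch. It assumes a linear single strand with a 5' monophosphate, sums deoxynucleotide monophosphate masses, and subtracts one water per phosphodiester bond. Only the dAMP value comes from the report above; the other masses and the water mass are approximate values supplied for this example, and the names are hypothetical.

```python
# Approximate average masses of the deoxynucleotide monophosphates (g/mol).
# Only the dAMP value is quoted in the report; the rest are assumptions.
DNMP_WEIGHTS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}
WATER = 18.015  # lost once per phosphodiester bond formed


def ssdna_weight(sequence):
    """Approximate weight of a linear single-stranded DNA molecule."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    total = sum(DNMP_WEIGHTS[base] for base in sequence)
    return total - (len(sequence) - 1) * WATER


print(round(ssdna_weight("AT"), 3))
```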

More Windows line ending issues with SearchIO Index

This is a report of (more) Windows line ending issues. There is also another relevant issue (#197) opened by Konstantin.

The current results for buildbot tests on Windows are inconsistent for SearchIO Index tests. On Windows XP the tests pass, but they fail on Windows 7.

This is probably not dependent on the Windows version, but on the fact that the Windows 7 setup for github converts the text files to local newlines (\r\n) whereas the XP github does not.

When the test is done there is an assert comparing a docstring to a file. The docstring part converts the newlines to \n whereas (on Windows 7) the file has newlines as (\r\n), thus the test fails.
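
One way to make such a comparison independent of the checkout settings is to normalise the line endings on both sides before asserting equality. This is a generic sketch with a hypothetical helper name, not the change made to the test suite.

```python
def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


from_docstring = "line one\nline two\n"
from_file = "line one\r\nline two\r\n"
print(normalize_newlines(from_file) == normalize_newlines(from_docstring))
```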

Heavy files SVG export in GenomeDiagram?

Until now I had been exporting all my diagrams to PDF. But once I got the picture I needed, I decided to export it to SVG so I could easily modify a few things in Inkscape. I tried it with the same diagram that I had been able to export to PDF (although that took a while), but after more than 40 minutes it was still writing to disk... And it is only 10% of the E. coli genome (meaning every 10th nucleotide in circular format), not all of it, and not the human genome!
Is it purely a format issue? Or is it an issue of how efficiently GenomeDiagram exports to different formats? Or is it an issue in ReportLab?

Integration testing on online code

While extensive testing of online code is taxing on servers, at least some sporadic testing should be done (e.g. once a week or so)?

Phylo.CDAOIO Broken?

The test for Phylo.CDAOIO is failing (see below for a trace). The failure seems to occur inside the module and not on the test code.

The speculation is that the interface of rdflib (which the module uses) might have changed since this code was written.

ERROR: test_parse_0 (__main__.ParseTests)

Parse the phylogenies in test.cdao.

Traceback (most recent call last):
File "test_Phylo_CDAO.py", line 43, in test_parse
trees = list(bp._io.parse(filename, 'cdao'))
File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/_io.py", line 53, in parse
for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 63, in parse
return Parser(handle).parse(**kwargs)
File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 94, in parse
self.parse_handle_to_graph(**kwargs)
File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 115, in parse_handle_to_graph
graph.parse(file=self.handle, publicID=base_uri, format=parse_format)
TypeError: parse() takes at least 2 arguments (3 given)

ExPASy.sprot_search_ful and ExPASy.sprot_search_de do not work

Both return only pages stating "Please update your links/bookmarks", with the correct link at the end of the page.
The functions use http://www.expasy.ch/cgi-bin/sprot-search-ful or http://www.expasy.ch/cgi-bin/sprot-search-de, pages which no longer exist.

Problem with hit.id and hit.description in SearchIO parsing of blast xml results

I am using Bio.SearchIO to parse a BLAST-generated XML file. I am getting inconsistent results for the values returned by hit.id and hit.description. On some hits within a single query, hit.id returns the <Hit_id>...</Hit_id> field from the input file and hit.description returns the <Hit_def>...</Hit_def> field, as one would expect. On other hits, hit.description returns the part of the <Hit_def>...</Hit_def> field after the first whitespace, and hit.id returns the part of that field before the first whitespace. I do not see any differences in the input file that relate to this. Thus, it appears to be a bug in SearchIO. A small program that demonstrates the problem, along with sample input and output files, is at http://umcsdept.ftml.net/SearchIO/

MuscleCommandline example using stdin and stdout broken on Python 3

Following the wiki example for using the MUSCLE commandline interface under Python 3 fails at the stage in which you write the SeqRecords to the child process's stdin. In Python 3, string data cannot be written to this buffer; only bytes can:

>>> SeqIO.write(records, child.stdin, "fasta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/__init__.py", line 463, in write
    count = writer_class(fp).write_file(sequences)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/Interfaces.py", line 266, in write_file
    count = self.write_records(records)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/Interfaces.py", line 251, in write_records
    self.write_record(record)
  File "/usr/lib/python3.3/site-packages/Bio/SeqIO/FastaIO.py", line 189, in write_record
    self.handle.write(">%s\n" % title)
TypeError: 'str' does not support the buffer interface
>>> child.stdin.write("blah")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
>>> child.stdin.write("blah".encode())
4

However, trying to get around this by writing encoded bytes to a Seq object (understandably) fails:

>>> SeqRecord(Seq("ATCG".encode()), id="foo", description="bar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/Bio/Seq.py", line 106, in __init__
    raise TypeError("The sequence data given to a Seq object should "
TypeError: The sequence data given to a Seq object should be a string (not another Seq object etc)

I'm not sure whether this should be fixed in Biopython or whether the example should be removed or updated in the Wiki. I haven't yet found a workaround, other than giving up on keeping everything in memory and using intermediate files.
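
One possible workaround, sketched here under assumptions rather than taken from the wiki: open the child's pipes in text mode so that str data can be written to them. The sketch below uses a small Python child process as a stand-in for MUSCLE (so it runs without MUSCLE installed); the helper name run_with_text_pipes is hypothetical.

```python
import subprocess
import sys


def run_with_text_pipes(command, text):
    """Send str data to a child process and return its str output."""
    child = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipes: accept and return str
    )
    output, _ = child.communicate(text)
    return output


# Stand-in child process (NOT muscle): copies its stdin to stdout unchanged.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read())",
]

fasta = ">foo bar\nATCG\n"
echoed = run_with_text_pipes(ECHO_COMMAND, fasta)
```

With pipes opened this way, a text-writing function should in principle be able to write to child.stdin directly, although that has not been checked against the wiki example here.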

import Bio.Seq fails in python3

Is the Seq module available under Python 3?
I installed a fresh Biopython from the repo (the version shows 1.62+) under Python 3 and tried the first example from the documentation, which starts with the line
from Bio.Seq import Seq
It fails with this error message:

from Bio.Seq import Seq
Traceback (most recent call last):
  File "", line 1, in <module>
  File "Bio/Seq.py", line 24, in <module>
    from Bio.Alphabet import IUPAC
  File "Bio/Alphabet/IUPAC.py", line 11, in <module>
    from Bio.Data import IUPACData
  File "Bio/Data/IUPACData.py", line 37, in <module>
    'U': 'Sel', 'O': 'Pyl',
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'
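
The TypeError above is what Python 3 raises when two dict.items() views are added together, which worked in Python 2 because items() returned lists there. A minimal sketch of the pattern and a version that runs on both follows; the dictionary names and the contents of base are illustrative assumptions, not the actual contents of IUPACData.py.

```python
# Hypothetical stand-ins for the dictionaries being combined.
base = {"A": "Ala", "C": "Cys"}
extra = {"U": "Sel", "O": "Pyl"}

# Python 2 only: items() returned lists, so "+" concatenated them.
#   merged = dict(base.items() + extra.items())

# Works on Python 2 and 3: turn the views into lists before concatenating.
merged = dict(list(base.items()) + list(extra.items()))
```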

GenomeDiagram: in circular format label of a feature, which crosses 0, is drawn in a wrong place.

If you create a circular Diagram with a feature that starts near the end of a sequence and ends near the beginning (that is, it crosses zero) and draw its label in the middle, the label is drawn in the wrong place: on the opposite side of the circle.
If you need my code, please say so, but it is quite an obvious thing.
I also noticed that complicated sigils (OCTO and JAGGY at least) don't work with such features either, probably for the same reason.

Looking at the code for _CircularDrawer, it seems that the problem lies in calculating the angle to the middle point of such features: averaging the start and end positions gives the wrong place in such cases (see _circularDrawer.get_feature_sigil, line 378).
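
To illustrate why a plain average fails, here is a minimal sketch of a wrap-aware midpoint. It assumes a feature that crosses zero is represented with start greater than end; the function name feature_mid_position is hypothetical and is not part of GenomeDiagram.

```python
def feature_mid_position(start, end, length):
    """Return the midpoint of a feature on a circular sequence.

    Assumes a feature crossing the origin is given with start > end.
    """
    if start <= end:
        return (start + end) / 2.0
    # The feature wraps past the origin: measure its span going
    # forward from start, then take half of that span.
    span = (length - start) + end
    return (start + span / 2.0) % length


naive = (950 + 50) / 2.0                        # plain average
wrapped = feature_mid_position(950, 50, 1000)   # wrap-aware midpoint
```

For a feature from 950 to 50 on a sequence of length 1000, the plain average lands at 500, the opposite side of the circle, while the wrap-aware midpoint lands at the origin.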

Scientific notation in branch lengths for Newick parser

The Newick parser does not handle branch lengths in scientific notation. For example:

"(foo:1e-1,bar:0.1)"

gets parsed as "[Clade(branch_length=1.0, name='e-1'), Clade(branch_length=0.1, name='bar')]".

Changing line 29 in NewickIO.py as follows is a fix:

(r":[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)?",'edge length'),
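
A quick way to check the proposed pattern outside the parser is sketched below, with the decimal point escaped so it matches only a literal dot. The helper branch_lengths is hypothetical and exists only for this check; it is not how NewickIO tokenizes a tree.

```python
import re

# The proposed edge-length pattern: a colon, optional integer part,
# optional literal dot, digits, and an optional exponent.
BRANCH_LENGTH = re.compile(r":[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)?")


def branch_lengths(newick):
    """Return every branch length found in a Newick string as a float."""
    return [float(m.group(0)[1:]) for m in BRANCH_LENGTH.finditer(newick)]


lengths = branch_lengths("(foo:1e-1,bar:0.1)")
```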

Incorrect parsing of hit and query strand in BlatIO.

Based on the below-linked document describing the PSL format (as well as my own PSL files), it seems that most PSL files report only a single character for the strand (+ or -) indicating the strand of the query.

http://uswest.ensembl.org/info/website/upload/psl.html

However, the current version of BlatIO assumes that both the query and hit strand are reported in the PSL file with a two character sequence (e.g. +-, --, etc.) Here is the relevant code. Please fix!

https://github.com/biopython/biopython/blob/master/Bio/SearchIO/BlatIO.py#L289
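
A minimal sketch of strand handling that accepts both forms is given below. It assumes that a single character is the query strand with the hit strand taken as '+', and that in the two-character form the first character is the query strand and the second the hit strand. The function name parse_psl_strand is hypothetical and is not part of BlatIO.

```python
def parse_psl_strand(field):
    """Split a PSL strand column into (query_strand, hit_strand)."""
    if not field or any(char not in "+-" for char in field):
        raise ValueError("Unexpected PSL strand value: %r" % field)
    if len(field) == 1:
        # Only the query strand is reported; assume the hit is on '+'.
        return field, "+"
    if len(field) == 2:
        # Query strand first, hit strand second.
        return field[0], field[1]
    raise ValueError("Unexpected PSL strand value: %r" % field)


single = parse_psl_strand("-")
double = parse_psl_strand("+-")
```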

Missing DTD files

Fetching from Pubmed works great, but some DTD files seem to be missing. Here are the warnings I get:

/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py:501: UserWarning: Unable to load DTD file pubmed_130501.dtd.

Bio.Entrez uses NCBI's DTD files to parse XML files returned by NCBI Entrez.
Though most of NCBI's DTD files are included in the Biopython distribution,
sometimes you may find that a particular DTD file is missing. While we can
access the DTD file through the internet, the parser is much faster if the
required DTD files are available locally.

For this purpose, please download pubmed_130501.dtd from

http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_130501.dtd

and save it either in directory

/usr/lib/pymodules/python2.7/Bio/Entrez/DTDs

or in directory

/home/kml/.biopython/Bio/Entrez/DTDs

in order for Bio.Entrez to find it.

Alternatively, you can save pubmed_130501.dtd in the directory
Bio/Entrez/DTDs in the Biopython distribution, and reinstall Biopython.

Please also inform the Biopython developers about this missing DTD, by
reporting a bug on http://bugzilla.open-bio.org/ or sign up to our mailing
list and emailing us, so that we can include it with the next release of
Biopython.

Proceeding to access the DTD file through the internet...

  warnings.warn(message)

And the same warning for these two DTDs:

/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py:501: UserWarning: Unable to load DTD file nlmmedlinecitationset_130501.dtd.
/usr/lib/pymodules/python2.7/Bio/Entrez/Parser.py:501: UserWarning: Unable to load DTD file bookdoc_130101.dtd.

This is with Biopython 1.58 on Ubuntu 12 (the repository package).
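
The manual step described in the warning (saving the DTD under the per-user directory) could be scripted roughly as follows. This is a sketch under assumptions: the directory layout is taken from the warning text, the helper names local_dtd_path and fetch_dtd are hypothetical, and whether the NCBI URL still serves the file has not been checked.

```python
import os
from urllib.request import urlretrieve


def local_dtd_path(filename, home=None):
    """Return the per-user location named in the warning for a DTD file."""
    home = home or os.path.expanduser("~")
    return os.path.join(home, ".biopython", "Bio", "Entrez", "DTDs", filename)


def fetch_dtd(url, home=None):
    """Download a DTD from the given URL into the per-user DTD directory."""
    target = local_dtd_path(os.path.basename(url), home)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    urlretrieve(url, target)
    return target


# Example call (performs a network request, so it is left commented out):
# fetch_dtd("http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_130501.dtd")
```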

Phylo module - Problem when writing Nexus file

Hello,

There is a small issue when writing a Nexus file. I use the Phylo.write() function to write a Nexus file with one tree, and I always get two ';' characters at the end of the tree. This can be a problem for some software: for example, FigTree raises an error when reading such a file.

Example code:
from cStringIO import StringIO
from Bio import Phylo
tree = Phylo.read(StringIO("(A, (B, C), (D, E))"), "newick")
Phylo.write(tree, "test.nex", "nexus")

Thanks

missing DTD's for esearch and esummary

Biopython warns about missing DTDs that need to be downloaded.

To replicate:

Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>> import Bio
>>> Bio.__version__
'1.63'
>>> Entrez.email = "[email protected]"
>>> data = Entrez.read(Entrez.esearch("nucleotide", "48165[BioProject]"))
/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py:525: UserWarning: Unable to load 
DTD file esearch.dtd.
[...]
>>> summary = Entrez.read(Entrez.esummary(db="nucleotide", id=data['IdList'][0]))
/Library/Python/2.7/site-packages/Bio/Entrez/Parser.py:525: UserWarning: Unable to load 
DTD file esummary-v1.dtd.
[...]

Keep up the good work! =)

Entrez error

I am running the following code using version 1.62

from Bio import Entrez
Entrez.email = "[email protected]"
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)

and I get the following error. This problem just started happening today.

Traceback (most recent call last):
  File "C:/Python33/protein/test.py", line 4, in <module>
    record = Entrez.read(handle)
  File "C:\Python33\lib\site-packages\Bio\Entrez\__init__.py", line 367, in read
    record = handler.read(handle)
  File "C:\Python33\lib\site-packages\Bio\Entrez\Parser.py", line 184, in read
    self.parser.ParseFile(handle)
  File "C:\Python33\lib\site-packages\Bio\Entrez\Parser.py", line 522, in externalEntityRefHandler
    warnings.warn(message)
  File "C:\Python33\lib\idlelib\PyShell.py", line 60, in idle_showwarning
    file.write(warnings.formatwarning(message, category, filename,
AttributeError: 'NoneType' object has no attribute 'write'
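
The traceback shows the failure happening while a warning is being displayed, not in the parsing itself. One possible workaround, offered as a sketch and not verified under IDLE, is to record warnings in a list so that nothing tries to write them to a missing stream. The helper call_recording_warnings and the stand-in noisy_parse are hypothetical; in real use the wrapped call would be Entrez.read(handle).

```python
import warnings


def call_recording_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]


def noisy_parse():
    """Stand-in for a parser call that emits a warning."""
    warnings.warn("Unable to load DTD file esearch.dtd.")
    return {"Count": "0"}


record, messages = call_recording_warnings(noisy_parse)
```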

TogoWS pubmed headers

The test code for TogoWS refers to a ti field on PubMed records. PubMed records on TogoWS do not have ti fields (this test runs against an online database that changes over time).

The current solution was to change the TogoWS test by removing the references to the ti field, but we need to check whether this is the best solution.

References

Current test code (with ti removed):
https://github.com/biopython/biopython/blob/45686a0fa43c9c5f4f425fa4d23f153684eb3d33/Tests/test_TogoWS.py

Previous test (with ti):
https://github.com/biopython/biopython/blob/14977ce6667176cdaf622991a2f997e9fa688f06/Tests/test_TogoWS.py

Dealing with Timeouts during the Blast

Sometimes Biopython raises a timeout error when running a BLAST search. Is there a good way to have the program catch the exception and run the BLAST request again, or do something else to recover from the timeout?

Thanks
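
One common approach is a small retry wrapper around the request. The sketch below is generic Python, not a Biopython API: the retry helper is hypothetical, and which exception type a BLAST timeout actually raises is an assumption that should be checked against the real traceback.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func(), retrying on the given exceptions with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * attempt)


# Simulated request that times out twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "result"


outcome = retry(flaky_request, attempts=3, delay=0, exceptions=(TimeoutError,))
```

In real use, flaky_request would be replaced by a function that submits the BLAST query, and a non-zero delay keeps the retries from hammering the server.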

SubsMat.MatrixInfo update

I've updated Bio.SubsMat.MatrixInfo.py with a new substitution matrix (PHAT) and written a little function to output the matrices in a more useable format. I've been using the new version for months, and figured I should polish it up and submit it to be included in the official package. Anyone interested in taking a look?
-Steve

bug in module Bio.Phylo.BaseTree.BranchColor.to_hex()

This one-line function is supposed to return an HTML-compatible color, but it currently produces incorrect strings in two ways:

It has an "L" appended to results.
It does not zero-pad to 6 hexits because of the "L", so any code with zero red intensity returns a 5-hexit value whose results are unpredictable in downstream software.

I think the uninitialized function works correctly, but the fully-loaded function does not, perhaps because of overloading of str somewhere.

A fix would be to replace the current code of
return '#' + hex(
    self.red * (16**4)
    + self.green * (16**2)
    + self.blue)[2:].zfill(6)

with
return '#%02x%02x%02x' % (self.red, self.green, self.blue)
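
For reference, the proposed formatting can be checked in isolation. The standalone function below is a sketch that takes the three channel values as plain integers; the real method reads them from self. In Python 2, hex() on a long returns a string ending in "L", which is consistent with the symptom described above, whereas the format string never produces that suffix.

```python
def to_hex(red, green, blue):
    """Format 0-255 channel values as an HTML color such as '#00ff10'."""
    return '#%02x%02x%02x' % (red, green, blue)


no_red = to_hex(0, 255, 16)
white = to_hex(255, 255, 255)
```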

