nickjcroucher / gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

Home Page: http://nickjcroucher.github.io/gubbins/

License: GNU General Public License v2.0

Gubbins

Genealogies Unbiased By recomBinations In Nucleotide Sequences

Introduction
Installation
Usage
License
Feedback/Issues
Citation
Further Information

Introduction

Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions, which are marked as recombinations, while concurrently constructing a phylogeny based on the putative point mutations outside of these regions. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of short-term bacterial evolution, and can be run in only a few hours on alignments of hundreds of bacterial genome sequences.

Installation

Before starting your analysis, please have a look at the Gubbins webpage, manual, tutorial, plotting advice and/or publication.

Required dependencies

Phylogenetic software:

Python modules:

Biopython (>1.59)
DendroPy (>=4.0)
Scipy
Numpy
Multiprocessing
Numba

See environment.yml for details. These are in addition to standard build environment tools (e.g. python >=3.8, pip3, make, autoconf, libtool, gcc, check). There are several ways to install Gubbins; details are provided below. If you encounter an issue when installing Gubbins, please contact your local system administrator.

Recommended installation method - conda

Install conda and enable the bioconda channels. This can be done using the normal command line (Linux), with Terminal (OSX) or the Powershell (Windows versions >=10).

conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install gubbins

Linux - Ubuntu Xenial (16.04) & Debian (unstable)

Gubbins has been packaged by the Debian Med team and is trivial to install using apt.

sudo apt-get install gubbins

OSX/Linux - from source

Install the dependencies and include them in your PATH. Clone or download the source code from GitHub and run the following commands to install Gubbins:

autoreconf -i
./configure [--prefix=$PREFIX]
make
[sudo] make install
cd python
[sudo] python3 -m pip install [--prefix=$PREFIX] .

Use sudo to install Gubbins system-wide. If you don't have the permissions, run configure with a prefix to install Gubbins in your home directory.

OSX/Linux - installing from the repository

The easiest way to install the latest version of the code from this repository is to set up a conda environment with the packages needed for installation, then remove the packaged gubbins from that environment:

conda create -c bioconda -n gubbins_git gubbins python=3.9
conda activate gubbins_git
conda install -c conda-forge libtool autoconf-archive automake pkg-config check pytest
conda remove --force gubbins

Then download and install the repository in the same environment:

git clone https://github.com/nickjcroucher/gubbins
cd gubbins
autoreconf -i
chmod +x configure 
./configure --prefix=$CONDA_PREFIX
make
sudo make install
cd python
python3 -m pip install .

You may encounter an issue with clang versions not being able to link to library files during make check. If you have installed into a conda environment (such as gubbins_git above) and that environment is active, this can be solved with the command:

FILES=`ls -l1 $CONDA_PREFIX/lib/clang/*/lib/darwin/*`; for clang_dir in $(ls -d1 $CONDA_PREFIX/lib/clang/*); do if [[ ! -d "$clang_dir/lib/darwin" ]]; then mkdir -p $clang_dir/lib/darwin; for clang_file in $(echo $FILES); do ln -s $clang_file $clang_dir/lib/darwin/; done; fi;  done

OSX/Linux/Windows - Virtual Machine

Gubbins can be run through the Powershell in Windows versions >=10. We have also created a virtual machine which has all of the software setup, along with the test datasets from the paper. It is based on Bio-Linux 8. You need to first install VirtualBox, then load the virtual machine, using the 'File -> Import Appliance' menu option. The root password is 'manager'.

ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova

Running the tests

The tests can be run from the top-level directory:

make check

Usage

To run Gubbins with default settings:

run_gubbins.py [FASTA alignment]

Information on further options can be found in the manual.
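As an illustration of what "FASTA alignment" means here, the sketch below checks that a multi-FASTA input contains at least two sequences of identical length, which is the basic property of an alignment. The function names `read_fasta` and `is_alignment` are hypothetical helpers written for this example; they are not part of Gubbins, and Gubbins may apply additional checks of its own.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            name = parts[0]  # use the first word of the header as the name
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line  # sequences may be wrapped over several lines
    return records


def is_alignment(records: Dict[str, str]) -> bool:
    """Return True if there are at least two sequences, all of equal length."""
    lengths = {len(seq) for seq in records.values()}
    return len(records) >= 2 and len(lengths) == 1


if __name__ == "__main__":
    example = [">isolate_a", "ACGT-ACGT", ">isolate_b", "ACGTNACGA"]
    print(is_alignment(read_fasta(example)))
```

To check a file, pass an open file handle to `read_fasta`, for example `read_fasta(open("my_alignment.fasta"))`, where the file name is a placeholder.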

License

Gubbins is free software, licensed under GPLv2.

Feedback/Issues

There is no specific support for development or maintenance of Gubbins. However, we will try to help you out if you report any issues about usage of the software to the issues page.

Development plan

Version 3 incorporates a number of features that were explicitly requested by users (e.g. plotting functions), that improve the algorithm's accuracy (e.g. using joint ancestral reconstruction), or that were commonly used in published analyses (e.g. using IQTREE2 for phylogeny construction).

Future development will prioritise:

More efficient phylogenetic processing with modern python libraries
Parallelisation of recombination searches
Faster sequence reconstruction through hardware acceleration
Extension of existing analyses using phylogenetic placement

If you believe there are other improvements that could be added, please describe them on the issues page and tag the suggestion as an "enhancement".

Citation

If you use this software please cite: Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". doi:10.1093/nar/gku1196, Nucleic Acids Research, 2014.

Further Information

For more information on this software see the Gubbins webpage.

Data from the paper

Midpoint rerooting

From version 1.3.5 (25/6/15) to version 1.4.6 (29/2/16) trees were not midpoint rerooted by default. This does not have any effect on the recombination detection, but the output trees may not look as expected. Users are advised to upgrade to the latest version.

Ancestral sequence reconstruction

From version 3.0.0 onwards, Gubbins will use joint ancestral reconstructions with a modified version of pyjar by default. Version 2 used marginal ancestral reconstruction with RAxML; this is still available in version 3, using the --mar flag (IQtree can also be used for reconstruction in version >3.0.0). This may be useful in cases where memory use is limiting. Version 1 used joint ancestral reconstruction with fastML.

gubbins's Issues

slow fastml

Hi,
Gubbins used to be pretty fast (v1.4.5, I believe), but for an unknown reason it stopped working (it was installed on biolinux 8 using apt-get), so I updated to the latest version using homebrew. Now it's about 10x slower; I'm guessing this is caused by the rather slow fastml3. How can I revert to the older version, or are there compelling reasons not to do so?
Aldert

removing identical isolates

Is there any way to stop gubbins removing "identical" isolates?

Input format is not clear

Hi, is it possible to get a clearer description of the expected input format?
run_gubbins.py [FASTA alignment]
I'm not clear what a FASTA alignment is exactly.

gubbins_drawer.py help improvement

Can you explain in -h what args should be?

Usage: gubbins_drawer.py [options] args

Options:
  -h, --help            show this help message and exit

  Output Options:
    -o FILE, --output=FILE
                        output file name [default= test.pdf]
    -t TREE, --tree=TREE
                        tree file to align tab files to

Error running gubbins_drawer.py with v. 2.2.0

Hi there,

I hope you can help? I keep getting the error "AttributeError: 'SeqFeature' object has no attribute 'sub_features'" when attempting to run gubbins_drawer.py.

The command used was:
gubbins_drawer.py -o test_gub.pdf -t core.full.final_tree.tre core.full.branch_base_reconstruction.embl

Full output:
28360 features found for core.full.branch_base_reconstruction.embl
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/gubbins_drawer.py", line 685, in <module>
    new_tracks=add_ordered_tab_to_diagram(arg)
  File "/home/linuxbrew/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/gubbins_drawer.py", line 250, in add_ordered_tab_to_diagram
    new_tracks=add_ordered_embl_to_diagram(record, incfeatures=["i", "d", "li", "del", "snp", "misc_feature", "core", "cds", "insertion", "deletion", "recombination", "feature", "blastn_hit", "fasta_record", "variation"], emblfile=False)
  File "/home/linuxbrew/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/gubbins_drawer.py", line 204, in add_ordered_embl_to_diagram
    locations=iterate_subfeatures(feature, locations)
  File "/home/linuxbrew/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/gubbins_drawer.py", line 142, in iterate_subfeatures
    if len(feature.sub_features)>0:
AttributeError: 'SeqFeature' object has no attribute 'sub_features'

I wonder if it might be related to this: chapmanb/bcbb#110

Thanks.
Best,
Robyn

Is there minimum sample size to use gubbins?

Hello @andrewjpage
Is there a minimum sample size to use gubbins? Are 8 genomes sufficient?

error while loading shared libraries: libgubbins.so.0

I was able to install gubbins on my Ubuntu 14.04 machine yesterday using the manual install. Tried the linuxbrew option but ran into some problems. Anyway, I am able to execute the python script:

christian@juliano:~/Desktop$ run_gubbins.py
usage: run_gubbins.py [-h] [--outgroup OUTGROUP]
                      [--starting_tree STARTING_TREE] [--use_time_stamp]
                      [--verbose] [--no_cleanup] [--tree_builder TREE_BUILDER]
                      [--iterations ITERATIONS] [--min_snps MIN_SNPS]
                      [--filter_percentage FILTER_PERCENTAGE]
                      [--prefix PREFIX] [--threads THREADS]
                      [--converge_method CONVERGE_METHOD] [--version]
                      [--min_window_size MIN_WINDOW_SIZE]
                      [--max_window_size MAX_WINDOW_SIZE]
                      alignment_filename
run_gubbins.py: error: the following arguments are required: alignment_filename

However, when I try running the sample data provided on the website I run into this problem:

christian@juliano:~/Desktop$ run_gubbins.py ST239.aln 
Trying PTHREADS version of raxml because no single threaded version of raxml could be found. Just to warn you, this requires 2 threads.

Using FastML 3 with GTR model

gubbins: error while loading shared libraries: libgubbins.so.0: cannot open shared object file: No such file or directory
Gubbins crashed, please ensure you have enough free memory

I found that src/libgubbins.la references this missing libgubbins.so.0 shared library, and I added this src/ directory to my LD_LIBRARY_PATH variable, as this post suggests, but I still run into the same problem. I'm quite a bit out of my depth, so any help would be much appreciated.

not enough mem

Dear Sir,
I get the following error when I run gubbins on a dataset that includes a lot of Ns (more than 25%). It aborts with this message:
Failed while running Gubbins. Please ensure you have enough free memory
Gubbins run finished

which I think is unlikely, since I run gubbins on a server with 256G of RAM and from "top" it seems that not even a fraction of this is being used. If I remove the sequences that contain lots of Ns by hand, or filter with gubbins at 25%, the run finishes. Note that I set, and would like to use, the "--filter_percentage 95" flag.
Any help would be very welcome.
best, Falk Hildebrand
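To see which sequences sit near a given filter threshold before running Gubbins, the proportion of missing data per sequence can be computed directly. The sketch below is only an illustration: which characters Gubbins counts as missing, and whether its threshold comparison is inclusive, are assumptions made here, and `missing_percentage` and `exceeds_filter` are hypothetical helper names.

```python
def missing_percentage(sequence: str, missing: str = "N-") -> float:
    """Percentage of positions in `sequence` that are N or gap characters.

    Treating exactly 'N' and '-' as missing is an assumption for this sketch.
    """
    if not sequence:
        raise ValueError("empty sequence")
    n_missing = sum(1 for base in sequence.upper() if base in missing)
    return 100.0 * n_missing / len(sequence)


def exceeds_filter(sequence: str, filter_percentage: float) -> bool:
    """True if the sequence has more missing data than the threshold.

    The strict '>' comparison is an assumption, not confirmed Gubbins behaviour.
    """
    return missing_percentage(sequence) > filter_percentage


if __name__ == "__main__":
    for name, seq in {"sample_1": "ACGTNNNN", "sample_2": "ACGTACGN"}.items():
        print(name, missing_percentage(seq), exceeds_filter(seq, 25))
```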

gubbins + snippy

I am a bit confused about the snippy + gubbins combination you suggest. If I understand correctly (which I might not), using snippy-core for alignments against a reference genome identifies SNPs by mapping short reads to the reference, and the individual alignments are then merged to create a core alignment. Since these are short reads, and the alignment only keeps "universally" conserved positions, I am having a hard time seeing how recombination events would be present in that core alignment. Am I missing something here?

Forgot to make a 2.2.1 release?

The versions and changelog seem to be there?
But no release.

Warning while parsing tree: non-numeric label N38 for internal node

When I use the -t fasttree option I get these warnings:

Warning while parsing tree: non-numeric label N38 for internal node
Warning while parsing tree: non-numeric label N37 for internal node
Warning while parsing tree: non-numeric label N22 for internal node
Warning while parsing tree: non-numeric label N21 for internal node
Warning while parsing tree: non-numeric label N48 for internal node
Warning while parsing tree: non-numeric label N47 for internal node
<snip>

I've never seen them when using FastTree in a regular manner.

snippy+ gubbins

I have questions when using snippy + gubbins:

When I use a reference in snippy, does it have to be a genome from the same dataset or a well-annotated genome?
Snippy includes the reference in the alignment, so should I remove it before running gubbins, or leave it in and root the tree with it?

Thanks

No ./configure in github - need to run autoconf?

The INSTALL says to run ./configure but there is no script. There was no autogen.sh either, so I ran autoconf and it did create the configure script with a few warnings (Ubuntu 12 server):

% autoconf
configure.ac:3: error: possibly undefined macro: AM_INIT_AUTOMAKE
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
configure.ac:10: error: possibly undefined macro: AC_PROG_LIBTOOL
configure.ac:21: error: possibly undefined macro: AM_PATH_PYTHON
configure.ac:23: error: possibly undefined macro: AC_MSG_WARN

Support raxml-AVX2 if available

class RAxMLExecutable(object):
	def __init__(self, threads, model = 'GTRCAT', verbose = False ):
		self.verbose = verbose
		self.threads = threads
		self.single_threaded_executables = ['raxmlHPC-AVX','raxmlHPC-SSE3','raxmlHPC']
		self.multi_threaded_executables = ['raxmlHPC-PTHREADS-AVX','raxmlHPC-PTHREADS-SSE3','raxmlHPC-PTHREADS']
		self.model = model

Can you add 'raxmlHPC-PTHREADS-AVX2' to the start of the list?

It will be installed by Brew if your CPU supports it and you build from source.

But it may not exist for some people (I assume you fall back until one exists?).
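The fallback behaviour assumed in this request can be sketched as follows: walk a preference-ordered list and take the first executable found on the PATH. This is an illustrative sketch using the standard-library `shutil.which`, not the actual Gubbins implementation; `first_available` is a hypothetical helper, and the list is the one quoted above with the requested AVX2 build placed first.

```python
import shutil
from typing import Callable, Optional, Sequence

# Preference order: the multi-threaded list quoted above,
# with the requested AVX2 build added at the front.
MULTI_THREADED = [
    "raxmlHPC-PTHREADS-AVX2",
    "raxmlHPC-PTHREADS-AVX",
    "raxmlHPC-PTHREADS-SSE3",
    "raxmlHPC-PTHREADS",
]


def first_available(
    candidates: Sequence[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on the PATH, or None if none exist.

    `which` is injectable so the lookup can be tested without RAxML installed.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    print(first_available(MULTI_THREADED))
```

With this pattern, adding a faster build to the front of the list is harmless on machines where it is absent, because the search simply moves on to the next name.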

Failing test on 2.1.0

The recently released version 2.1.0 does not complete its tests successfully:

======================================================================
FAIL: test_robinson_foulds_convergence (test_external_dependancies.TestExternalDependancies)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/vagrant/gubbins/python/gubbins/tests/test_external_dependancies.py", line 82, in test_robinson_foulds_convergence
    assert common.GubbinsCommon.has_tree_been_seen_before(['RAxML_result.multiple_recombinations.iteration_1','RAxML_result.multiple_recombinations.iteration_2','RAxML_result.multiple_recombinations.iteration_3','RAxML_result.multiple_recombinations.iteration_4','RAxML_result.multiple_recombinations.iteration_5','RAxML_result.multiple_recombinations.iteration_6','RAxML_result.multiple_recombinations.iteration_7','RAxML_result.multiple_recombinations.iteration_8','RAxML_result.multiple_recombinations.iteration_9','RAxML_result.multiple_recombinations.iteration_10','multiple_recombinations.final_tree.tre'],'robinson_foulds') == 1
AssertionError

I've been running the tests on Debian stretch, using RaXML 8.2.8.

brew install gubbins errors - no formula for pillow

Hi all,

Having a few issues:
Thanks for any suggestions. Python3 installed with brew, pip3 install pillow was successful in case that matters. Using Ubuntu 14.04

$ brew install gubbins
==> Installing gubbins from homebrew/science
Error: No available formula with the name "pillow" (dependency of homebrew/science/gubbins)
==> Searching for a previously deleted formula...
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
Error: No similarly named formulae found.
==> Searching taps...
Error: No formulae found in taps.

When searching for 'pil' (brew search pillow gives no results)

$ brew search pil
bluepill homebrew/science/piler pilosa
cfr-decompiler homebrew/science/pilercr procyon-decompiler
closure-compiler homebrew/science/pilon
homebrew/science/omcompiler pillar

If you meant "pil" specifically:
Instead of PIL, consider pip install pillow or brew install Homebrew/science/pillow.

$ brew install Homebrew/science/pillow

homebrew/science/pillow was deleted from homebrew/science in commit 052866d:
pillow: migrate to homebrew/core

To show the formula before removal run:
git -C "$(brew --repo homebrew/science)" show 052866d^:pillow.rb

Gubbins with SNP Alignment?

Hello,

Would I be correct in assuming that running Gubbins with a SNP alignment (only polymorphic sites) is inappropriate?

Or is it possible to correct for ascertainment bias in Gubbins?

Thank you,
Conrad Izydorczyk

Running example files

When I try to run the example files as described in the gubbins manual, I get the message "failed while building the tree". Might anyone know how to fix it?
Thanks!

python setup.py ignores prefix set with top level ./configure

See title

Source install doesn't install python wrappers

After make install I only have

bin:
total 12
-rwxr-xr-x 1 anthony ngs_users 11227 Jul 22 10:44 gubbins

lib:
total 188
-rw-r--r-- 1 anthony ngs_users 108818 Jul 22 10:44 libgubbins.a
-rwxr-xr-x 1 anthony ngs_users    969 Jul 22 10:44 libgubbins.la
lrwxrwxrwx 1 anthony ngs_users     19 Jul 22 10:44 libgubbins.so -> libgubbins.so.0.0.1
lrwxrwxrwx 1 anthony ngs_users     19 Jul 22 10:44 libgubbins.so.0 -> libgubbins.so.0.0.1
-rwxr-xr-x 1 anthony ngs_users  77633 Jul 22 10:44 libgubbins.so.0.0.1

Segmentation fault with some input files

Hi,

I'm getting segfaults with gubbins when running with certain input files.
It's been working fine with other input files though.

I did an strace on my run, and here are the last few lines before it crashes.
One thing that seems weird to me is that mmap is invoked with the file descriptor -1.

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe0b1708000
lseek(3, 0, SEEK_SET) = 0
read(3, "##fileformat=VCFv4.2\n##contig=<I"..., 4096) = 4096
lseek(3, 4096, SEEK_SET) = 4096
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fff56d26e98} ---
+++ killed by SIGSEGV (core dumped) +++

gubbins_strace.31071.txt

Thanks

segmentation fault

I've been trying to run this on Mac OSX and get a seg fault

....
subprocess.CalledProcessError: Command 'gubbins -r -v snps_vs_HO.fas.gaps.vcf -t snps_vs_HO.fas.iteration_1 -p snps_vs_HO.fas.gaps.phylip -m 3 snps_vs_HO.fas' returned non-zero exit status -11

Any ideas why this may be?

Linuxbrew installs 1.4.7 still

I just installed via linuxbrew and got v1.4.7 rather than v2.

Run a screening test to reject alignments that will likely crash

The iqtree tool runs a simple "composition test" on any alignment you give it, and then complains if it is too simple. What you want is the negative version of this: test whether an alignment is too divergent, and reject it at the outset rather than segfaulting later.

https://github.com/Cibiv/IQ-TREE/wiki/Frequently-Asked-Questions#what-is-the-purpose-of-composition-test

python 3.5.0 not recognised by 'configure'

Hello,

I'm trying to install gubbins and have all the prerequisites available as described in the documentation, however configure is failing when trying to identify the python version. This is python 3.5.0, which is normally accessed as 'python3' on this machine, however I have also aliased it to 'python' in an attempt to persuade configure to find it.

[bss-admin@codon ~]$ which python   
alias python='python3'
        /usr/local/python/3.5.0/bin/python3
[bss-admin@codon ~]$ python --version
Python 3.5.0

However, despite this, configure does not like it:

checking for a Python interpreter with version >= 3.0... none
configure: WARNING: Python not found. Python is required to build presage python binding. Python can be obtained from http://www.python.org

The relevant section from config.log is pasted below. All my attempts at coercing it to accept this python version have failed. Do you have any suggestions on how to make this behave?

Many thanks
James

configure:20393: checking for a Python interpreter with version >= 3.0
configure:20408: python -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
configure:20411: $? = 1
configure:20408: python2 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
configure:20411: $? = 1
configure:20408: python2.5 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python2.5: command not found
configure:20411: $? = 127
configure:20408: python2.4 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
configure:20411: $? = 1
configure:20408: python2.3 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python2.3: command not found
configure:20411: $? = 127
configure:20408: python2.2 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python2.2: command not found
configure:20411: $? = 127
configure:20408: python2.1 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python2.1: command not found
configure:20411: $? = 127
configure:20408: python2.0 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python2.0: command not found
configure:20411: $? = 127
configure:20408: python1.6 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python1.6: command not found
configure:20411: $? = 127
configure:20408: python1.5 -c import sys, string # split strings by '.' and convert to numeric. Append some zeros # because we need at least 4 digits for the hex conversion. minver = map(int, string.split('3.0', '.')) + [0, 0, 0] minverhex = 0 for i in xrange(0, 4): minverhex = (minverhex << 8) + minver[i] sys.exit(sys.hexversion < minverhex)
./configure: line 20409: python1.5: command not found
configure:20411: $? = 127
configure:20417: result: none
configure:20469: WARNING: Python not found. Python is required to build presage python binding. Python can be obtained from http://www.python.org
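The probe shown in the log relies on `string.split`, `xrange`, and adding a list to the result of `map`, all of which work only under Python 2. A Python 3 interpreter would therefore exit with an error before the version comparison runs, while a Python 2 interpreter runs the comparison and is rejected for being too old, which is consistent with the `$? = 1` results above. The sketch below shows the same hexversion comparison written for Python 3; it is an illustration of the check, not the macro that configure actually uses, and the function names are hypothetical.

```python
import sys


def min_version_hex(minimum: str) -> int:
    """Pack a dotted version such as '3.0' into the sys.hexversion layout."""
    # Pad with zeros so there are always at least four fields to pack.
    parts = [int(p) for p in minimum.split(".")] + [0, 0, 0]
    value = 0
    for i in range(4):
        value = (value << 8) + parts[i]
    return value


def interpreter_is_new_enough(minimum: str, hexversion: int = sys.hexversion) -> bool:
    """True if the interpreter's hexversion is at least the packed minimum."""
    return hexversion >= min_version_hex(minimum)


if __name__ == "__main__":
    print(hex(min_version_hex("3.0")), interpreter_is_new_enough("3.0"))
```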

Retaining recombinant free non-polymorphic sites

Is there a way in gubbins to generate a fasta file that retains the recombination-free regions of both non-polymorphic and polymorphic sites from the alignment? At present gubbins seems to only output the recombination-free polymorphic sites in the fasta file. Thanks

gubbins would not run

Dear Sir

I had a copy of Gubbins installed in BioLinux8 (Ubuntu 14.04). When I tried to run the program, I got:
yue@PC-027[yue] run_gubbins.py -h [ 1:03PM]
/home/yue/.linuxbrew/bin/run_gubbins.py: /home/yue/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/run_gubbins.py: /home/yue/.linuxbrew/opt/python3/bin/python3.5: bad interpreter: No such file or directory
/home/yue/.linuxbrew/bin/run_gubbins.py: line 2: /home/yue/.linuxbrew/Cellar/gubbins/2.2.0/libexec/bin/run_gubbins.py: Success
It says success but it is not quite right as I could not run the program.

Please help,
Yue

multithreading

Hi,

I was wondering if there is multithreaded/MPI version of gubbins?

INSTALL file not obvious

The INSTALL file is really nice, and written in Markdown, but it's not obvious where it is and that it isn't the standard GNU INSTALL autoconf document.

Could you append it into the README.md or give it a .md extension ?

Inconsistencies in branch stats file

Hi,
I'm running Gubbins 2.0.0 and seeing some inconsistencies in my branch stats output files, including outputs for distinct analyses using different sets of sequences. For example...

Node Total_SNPs Num_of_SNPs_inside_recombinations Num_of_SNPs_outside_recombinations Num_of_Recombination_Blocks Bases_in_Recombinations r/m rho/theta Genome_Length Bases_in_Clonal_Frame
573.1446 98 0 98 0 79847 0.000000 0.000000 5343462 5263613

So there are 0 recombination blocks on the branch leading to node 573.1446 but somehow there are 79847 bases in recombinant regions? Is the number of bases in recombinant regions counting everything in the history of that node rather than just the preceding branch?

Thanks in advance for your clarification of this,
Kelly
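The inconsistency described above can be checked mechanically across a whole stats file. The sketch below pairs the header with the example row from this report and flags any branch that has zero recombination blocks but a non-zero count of bases in recombinations. Splitting on whitespace is an assumption based on how the row appears here, and the helper names are hypothetical.

```python
from typing import Dict

HEADER = (
    "Node Total_SNPs Num_of_SNPs_inside_recombinations "
    "Num_of_SNPs_outside_recombinations Num_of_Recombination_Blocks "
    "Bases_in_Recombinations r/m rho/theta Genome_Length Bases_in_Clonal_Frame"
)
ROW = "573.1446 98 0 98 0 79847 0.000000 0.000000 5343462 5263613"


def parse_row(header: str, row: str) -> Dict[str, str]:
    """Pair each column name with its value (whitespace-delimited, by assumption)."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} fields, found {len(values)}")
    return dict(zip(names, values))


def has_bases_without_blocks(record: Dict[str, str]) -> bool:
    """True if a branch reports recombinant bases but no recombination blocks."""
    blocks = int(record["Num_of_Recombination_Blocks"])
    bases = int(record["Bases_in_Recombinations"])
    return blocks == 0 and bases > 0


if __name__ == "__main__":
    record = parse_row(HEADER, ROW)
    print(record["Node"], has_bases_without_blocks(record))
```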

FastTree doesn't honour --num_threads due to OpenMP vars

FastTree uses OpenMP and respects the $OMP_NUM_THREADS variable.

Can you override that variable if the user specifies --num_threads ?

It should still respect the overall $OMP_THREAD_LIMIT variable.
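One way to implement this request is to build the environment passed to the FastTree subprocess so that OMP_NUM_THREADS reflects the user's requested thread count, capped by OMP_THREAD_LIMIT when that is set. The sketch below is a hedged illustration of that idea, not existing Gubbins code; `fasttree_env` is a hypothetical helper, and the resulting dictionary would be handed to `subprocess.run(..., env=env)`.

```python
import os
from typing import Dict, Mapping, Optional


def fasttree_env(
    requested_threads: int,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with OMP_NUM_THREADS set.

    The requested count overrides any existing OMP_NUM_THREADS, but is
    capped by OMP_THREAD_LIMIT when that holds a positive integer.
    """
    env = dict(os.environ if base_env is None else base_env)
    threads = max(1, requested_threads)
    limit = env.get("OMP_THREAD_LIMIT")
    if limit is not None and limit.isdigit() and int(limit) > 0:
        threads = min(threads, int(limit))
    env["OMP_NUM_THREADS"] = str(threads)
    return env


if __name__ == "__main__":
    env = fasttree_env(4)
    print(env["OMP_NUM_THREADS"])
```

Copying the environment rather than mutating `os.environ` keeps the override local to the child process.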

A few questions on better practices for SNPs alignments

Dear Authors,

I am using Gubbins to detect recombination in my alignment; my dataset consists of protein-coding SNPs only (protozoan genome, multichromosome).

Can I use the SNP alignment as input, rather than the core genome?
If that would cause biases in the analyses, would it be correct to build a "core genome" with only the genes presenting SNPs? Maybe separated by Ns (especially in between chromosomes)? I did not actually call variants in non-coding regions (I know the people at Sanger are very familiar with the problems related to calling variants in certain protozoan intergenic regions...)
Should I compare only "subclades" of my single-species tree? Or would that only be necessary in case of high intra-species diversity? -EDIT- I re-read the original article; from what I understand, I should probably compare isolates in subpopulations that actually have the chance to come into contact and recombine with each other.

I hope my questions are detailed enough, please let me know if there is any more information you need.

Looking forward to hearing from you,
Max

numpy issues

Hi,

Please can you help with an installation issue? I have tried to troubleshoot this but am having no luck. On my linux system, brew install python works fine, but when I run brew install gubbins it is not picking up numpy, which is definitely installed (see below).

dk@TCD50[dk] brew install gubbins [ 9:43PM]
==> Installing gubbins from homebrew/homebrew-science
==> Using a fortran compiler found at /usr/bin/gfortran.
This may be changed by setting the FC environment variable.
==> Downloading https://github.com/sanger-pathogens/gubbins/archive/v1.4.1.tar.gz
Already downloaded: /home/dk/.cache/Homebrew/gubbins-1.4.1.tar.gz
==> Downloading https://pypi.python.org/packages/source/n/nose/nose-1.3.7.tar.gz
Already downloaded: /home/dk/.cache/Homebrew/gubbins--nose-1.3.7.tar.gz
==> python3 -c import setuptools... --no-user-cfg install --prefix=/home/dk/.linuxbrew/Cellar/g
==> Downloading https://pypi.python.org/packages/source/b/biopython/biopython-1.65.tar.gz
Already downloaded: /home/dk/.cache/Homebrew/gubbins--biopython-1.65.tar.gz
==> python3 -c import setuptools... --no-user-cfg install --prefix=/home/dk/.linuxbrew/Cellar/g
Last 15 lines from /home/dk/.cache/Homebrew/Logs/gubbins/02.python3:
--prefix=/home/dk/.linuxbrew/Cellar/gubbins/1.4.1_1/libexec/vendor
--single-version-externally-managed
--record=installed.txt

running install

Numerical Python (NumPy) is not installed.

This package is required for many Biopython features. Please install
it before you install Biopython. You can install Biopython anyway, but
anything dependent on NumPy will not work. If you do this, and later
install NumPy, you should then re-install Biopython.

You can find NumPy at http://www.numpy.org

I have tried the following solution from issue 1975, but it still doesn't work
mkdir -p /home/xxx/.local/lib/python3.5/site-packages
echo 'import site; site.addsitedir("/home/xxx/.linuxbrew/lib/python3.5/site-packages")' >> /home/xxx/.local/lib/python3.5/site-packages/homebrew.pth

Thanks
Dan

x

Direct output to a directory

Is it possible to direct the output files to a specific directory instead of generating them on the working directory? The --prefix option doesn't seem to have that behavior. The same question for the tmp directory.
Thanks!
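A possible workaround, assuming (as this report describes) that output is written to the current working directory, is to launch the command from inside the target directory and pass the alignment as an absolute path. The sketch below uses only the `--prefix` option and the positional alignment argument shown in the usage message earlier on this page; `build_gubbins_call` and `run_in_directory` are hypothetical helpers, not part of Gubbins.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def build_gubbins_call(alignment: str, out_dir: str, prefix: str) -> Tuple[List[str], Path]:
    """Return the command line and the directory it should be run from.

    The alignment path is made absolute so it still resolves after the
    working directory changes.
    """
    cwd = Path(out_dir).resolve()
    cmd = ["run_gubbins.py", "--prefix", prefix, str(Path(alignment).resolve())]
    return cmd, cwd


def run_in_directory(alignment: str, out_dir: str, prefix: str) -> subprocess.CompletedProcess:
    """Create the output directory if needed and run the command inside it."""
    cmd, cwd = build_gubbins_call(alignment, out_dir, prefix)
    cwd.mkdir(parents=True, exist_ok=True)
    return subprocess.run(cmd, cwd=cwd, check=True)
```

For example, `run_in_directory("my_alignment.fasta", "results/run1", "run1")` would run the analysis from `results/run1`; the file and directory names are placeholders.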

TypeError: reroot_at_midpoint() got an unexpected keyword argument 'update_splits'

I'm running into the attached error.
slurm-2236.txt

I don't have PAUP in my path.

gubbins_drawer graph

Hello,
I used gubbins drawers to visualize the output of gubbins. However, the generated graph does not specify the genes where the recombinations are found. How can I fix it?
Thank you

'INSTALL' doesn't link to INSTALL.md

Install
Please see the INSTALL file for detailed instructions.

it should be a link?

Number of SNPs is more than input alignment

Hi,

My issue is that the resulting phylip file has more SNPs than the input. Here is a brief of my analysis:

I have ~100 isolates of the same species. I mapped the reads onto the reference genome, and created an alignment of 'pseudogenomes', which was used as input to Gubbins. Each pseudogenome is composed of the reference genome, but with the substitution SNPs predicted for each strain integrated into it, as described in http://mbio.asm.org/content/7/2/e00347-16.full. The point is, I know exactly the number of substitution SNPs provided in the input. The resulting phylip file, however, had ~1000 more SNPs even after filtering recombination, which doesn't make sense because the program predicted several recombined regions for that dataset.

Initially, I thought that the program is counting the N's in the phylip file so I removed all the columns where an N is found. The final number of SNPs was still more by ~100.
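
For anyone repeating this check, here is a minimal sketch of counting variable columns and dropping N-containing columns. It works on plain strings rather than a real phylip parser, and the function names and toy sequences are illustrative assumptions, not part of Gubbins.

```python
def variable_columns(seqs, ignore="N-"):
    """Return indices of columns holding more than one distinct base.

    Symbols listed in `ignore` (assumed here to be N and gap) are not
    counted as bases, so a column such as G/N/G is not reported as a SNP.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    columns = []
    for i in range(length):
        bases = {s[i].upper() for s in seqs} - set(ignore)
        if len(bases) > 1:
            columns.append(i)
    return columns


def drop_columns_with(seqs, symbol="N"):
    """Remove every column in which at least one sequence carries `symbol`."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i].upper() != symbol for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment (hypothetical data): the third column contains an N.
toy = ["ACGTA", "ACNTA", "ATGTC"]
print(variable_columns(toy))
print(drop_columns_with(toy))
```

Comparing the count from `variable_columns` on the input and on the output alignment, with the same `ignore` set, keeps the two numbers comparable.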

I also ran Gubbins on 5 strains from the same dataset (using the same outgroup and the same method of pseudogenomes) and the final number of SNPs was less by ~1000 SNPs, which seems about right.

My question is: why are there still N's in the phy file, which is an alignment of all core SNPs? If the N's are there to mask recombination, then why aren't these regions removed from the phy file (and the tree)? They are quite confusing. Most importantly, why am I getting more SNPs?

Thank you,
Areej

ltmain.sh should be removed?

I can't claim any experience with autoconf and friends, but perhaps the ltmain.sh in the top-level directory should not be there? After ./configure I was seeing this:

milou-b: /sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2 $ make
make  all-recursive
make[1]: Entering directory `/pica/sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2'
Making all in src
make[2]: Entering directory `/pica/sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2/src'
/bin/sh ../libtool --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT Newickform.lo -MD -MP -MF .deps/Newickform.Tpo -c -o Newickform.lo Newickform.c
libtool: Version mismatch error.  This is libtool 2.4.2 Debian-2.4.2-1ubuntu1, but the
libtool: definition of this LT_INIT comes from libtool 2.2.6b.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.2 Debian-2.4.2-1ubuntu1
libtool: and run autoconf again.
make[2]: *** [Newickform.lo] Error 63
make[2]: Leaving directory `/pica/sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/pica/sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2'
make: *** [all] Error 2

My autotools are rather old (Scientific Linux 6.7), but the local libtool created by ./configure doesn't match my autotools version (2.2.6b):

milou-b: /sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2 $ ./libtool --version
libtool (GNU libtool) 2.4.2

If I remove ltmain.sh and reconfigure, the versions match:

milou-b: /pica/sw/apps/bioinfo/gubbins/1.4.2/src/gubbins-1.4.2 $ ./libtool --version
ltmain.sh (GNU libtool) 2.2.6b

from gubbins import common ImportError: cannot import name common

Can't find RAxML in path

I have RAxML in path

type raxmlHPC
raxmlHPC is hashed (/phengs/hpc_software/RAxML/8.2.8/multithread/raxmlHPC)

However gubbins reports

No usable version of RAxML could be found, please ensure one of these executables is in your PATH:
raxmlHPC-PTHREADS-AVX
raxmlHPC-PTHREADS-SSE3
raxmlHPC-PTHREADS
raxmlHPC-AVX
raxmlHPC-SSE3
raxmlHPC
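
To check what a Python process actually sees on its PATH (which can differ from the interactive shell, for example inside a batch job), a minimal sketch follows. It is not Gubbins's own lookup code; the helper name `first_on_path` is a hypothetical one, and the candidate list is copied from the message above.

```python
import shutil

# Executable names taken from the error message above, in the same order.
CANDIDATES = [
    "raxmlHPC-PTHREADS-AVX",
    "raxmlHPC-PTHREADS-SSE3",
    "raxmlHPC-PTHREADS",
    "raxmlHPC-AVX",
    "raxmlHPC-SSE3",
    "raxmlHPC",
]


def first_on_path(names, path=None):
    """Return (name, full_path) for the first executable found, else None.

    `path` defaults to the PATH of the current Python process.
    """
    for name in names:
        found = shutil.which(name, path=path)
        if found:
            return name, found
    return None


print(first_on_path(CANDIDATES))
```

If this prints `None` when run in the same environment as Gubbins, the executable is not visible to Python even though the shell reports it.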

"Binary package - For 64 bit Linux" links to nowhere

At the github.io website, the link "Binary package - For 64 bit Linux" is a non-existent HTML anchor.

Is it possible to use gubbins to remove recombination from a core gene alignment?

I was wondering whether gubbins would be able to accurately detect recombination in a core gene alignment produced by Roary?

Ns generated in filtered .phylip file

Hi,

Is there a reason behind the filtered_polymorphic_sites.phylip file containing 'N' bases in some of the sequences? The input file does not contain any, so I'm unsure why N's are generated.

Thanks,
Cam

Question: gubbins input?

I have a WGA aligned using mugsy and converted to fasta in galaxy. @andrewjpage or others, have you tested mugsy WGAs as Gubbins input? Is it a suitable alternative to snippy or not?

Thank you in advance

.embl output file not opening in Artemis

Hi,
I used gubbins on my whole genome alignment containing 9 whole genome sequences of Helicobacter pylori. When I tried to load the .gff and .embl files onto any sequence from the alignment in Artemis, I got an error message like this:
read failed: one of the features in the entry has out of range location: 1623848..1624333
Can you please help me fix this issue?
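
One way to narrow this down is to list the features whose coordinates fall outside the sequence being viewed. The sketch below assumes a standard tab-separated GFF layout (start in column 4, end in column 5). The function name, the toy feature lines and the sequence length of 1600000 are hypothetical; only the coordinates 1623848..1624333 come from the error above.

```python
def out_of_range_features(gff_lines, seq_length):
    """Return (start, end) pairs that do not fit in a sequence of seq_length."""
    bad = []
    for line in gff_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[3]), int(fields[4])
        if start < 1 or end > seq_length:
            bad.append((start, end))
    return bad


# Toy GFF content (hypothetical records).
toy_gff = [
    "##gff-version 3",
    "SEQ\tGUBBINS\tCDS\t100\t900\t0.000\t.\t0\tnote=example",
    "SEQ\tGUBBINS\tCDS\t1623848\t1624333\t0.000\t.\t0\tnote=example",
]
print(out_of_range_features(toy_gff, seq_length=1600000))
```

If features are reported here, the sequence loaded in Artemis is shorter than the coordinate system the annotation was written against, which would be the case if, for instance, the coordinates refer to alignment columns while the loaded sequence has had its gaps removed.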

Some options have default listed twice in --help

E.g. "default 10000 (default: 10000)". Is this because of the long and short options?

  --tree_builder TREE_BUILDER, -t TREE_BUILDER
                        Application to use for tree building
                        [raxml|fasttree|hybrid], default RAxML (default:
                        raxml)
  --iterations ITERATIONS, -i ITERATIONS
                        Maximum No. of iterations, default is 5 (default: 5)
  --min_snps MIN_SNPS, -m MIN_SNPS
                        Min SNPs to identify a recombination block, default is
                        3 (default: 3)
  --filter_percentage FILTER_PERCENTAGE, -f FILTER_PERCENTAGE
                        Filter out taxa with more than this percentage of
                        gaps, default is 25 (default: 25)
  --min_window_size MIN_WINDOW_SIZE, -a MIN_WINDOW_SIZE
                        Minimum window size, default 100 (default: 100)
  --max_window_size MAX_WINDOW_SIZE, -b MAX_WINDOW_SIZE
                        Maximum window size, default 10000 (default: 10000)
  --raxml_model RAXML_MODEL, -r RAXML_MODEL
                        RAxML model [GTRGAMMA|GTRCAT], default GTRCAT
                        (default: GTRCAT)
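
This pattern can be reproduced with Python's argparse alone, independent of long and short options. The sketch below assumes (it is not confirmed from the Gubbins source here) that the parser uses `argparse.ArgumentDefaultsHelpFormatter`, which appends "(default: ...)" to every help string that does not already contain `%(default)`. The `demo` parser is a hypothetical stand-in.

```python
import argparse


def build_parser(formatter):
    """Build a one-option parser whose help text already states the default."""
    parser = argparse.ArgumentParser(prog="demo", formatter_class=formatter)
    parser.add_argument("--iterations", "-i", type=int, default=5,
                        help="Maximum No. of iterations, default is 5")
    return parser


# With ArgumentDefaultsHelpFormatter the default is stated twice:
# once by the hand-written help text and once by the formatter.
doubled = build_parser(argparse.ArgumentDefaultsHelpFormatter).format_help()

# With the plain HelpFormatter only the hand-written text remains.
single = build_parser(argparse.HelpFormatter).format_help()

print(doubled)
print(single)
```

Under that assumption, removing the hand-written "default is ..." phrases from the help strings, or switching formatter, would leave a single statement of each default.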

Gubbins - Ns in output alignment

Dear Gubbins Team,

I am trying to use Gubbins to scan a SNP alignment, which contains no missing data, for recombination. The input alignment has 21,134 sites; the output file (my_output.filtered_polymorphic_sites.fasta) contains 20,174 sites. I initially thought that this was due to the potential recombinant sites being excluded; however, I noticed that there are 'Ns' in the output alignment. What are these due to?
I apologize if this is a duplicate topic, as I found a similar question here: #182, but there is no answer yet.

Thank you for your kind attention,
Max Tagliamonte

Losing taxa?

Hi,
I have been using Gubbins for a few studies and it has worked nicely. However, in a recent study I have experienced a problem where I lose a few taxa "during" the Gubbins process. Is there some way to obtain information from the generated files as to why this occurred?

Thank you!
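
One possible cause worth ruling out is the --filter_percentage option listed in the help output earlier on this page ("Filter out taxa with more than this percentage of gaps, default is 25"). The sketch below computes the share of missing positions per taxon in the input alignment. Treating both '-' and 'N' as gaps is an assumption, as are the function names and the toy data; this is not Gubbins's own filtering code.

```python
def gap_percentages(alignment, missing="-N"):
    """Map each taxon name to the percentage of its positions that are missing."""
    result = {}
    for name, seq in alignment.items():
        n_missing = sum(1 for base in seq.upper() if base in missing)
        result[name] = 100.0 * n_missing / len(seq)
    return result


def taxa_over_threshold(alignment, threshold=25.0, missing="-N"):
    """Return the sorted names of taxa whose missing share exceeds threshold."""
    percentages = gap_percentages(alignment, missing)
    return sorted(name for name, pct in percentages.items() if pct > threshold)


# Toy alignment (hypothetical data): taxon "b" is 50% missing.
toy = {"a": "ACGTACGT", "b": "AC--NNGT", "c": "ACGTAC-T"}
print(gap_percentages(toy))
print(taxa_over_threshold(toy))
```

If the taxa that disappear are the ones flagged here, raising --filter_percentage would be the first thing to try.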