magus's Introduction

MAGUS

Multiple Sequence Alignment using Graph Clustering


Purpose and Functionality

MAGUS is a tool for piecewise large-scale multiple sequence alignment.
The dataset is divided into subsets, which are independently aligned with a base method (currently MAFFT -linsi). These subalignments are merged together with the Graph Clustering Merger (GCM). GCM builds the final alignment by clustering an alignment graph, which is constructed from a set of backbone alignments. This process allows MAGUS to effectively boost MAFFT -linsi to over a million sequences.

The basic procedure is outlined below. Steps 4-7 are GCM.

  1. The input is a set of unaligned sequences. Alternatively, the user can provide a set of multiple sequence alignments and skip the next two steps.
  2. The dataset is decomposed into subsets.
  3. The subsets are aligned with MAFFT -linsi.
  4. A set of backbone alignments is generated with MAFFT -linsi (or provided by the user).
  5. The backbones are compiled into an alignment graph.
  6. The graph is clustered with MCL.
  7. The clusters are resolved into a final alignment.

Installing MAGUS

Deepest thanks to Baqiao Liu for setting up the MAGUS PyPI package (https://pypi.org/project/magus-msa/).
This is currently the easiest way to get started with MAGUS.
The package can be installed with

pip3 install magus-msa

and executed with

magus <arguments>

Alternatively, you can download and extract the code from this repository to a directory of your choice.
Then, you can run MAGUS with

python3 <directory_path>/magus.py


Dependencies

MAGUS requires

  • Python 3
  • MAFFT (linux version is included)
  • MCL (linux version is included)
  • FastTree and Clustal Omega are needed if using these guide trees (linux versions included)

If you would like to use a different version of MAFFT and/or MCL (for instance, if you're on a Mac), you will need to edit the MAFFT/MCL paths in configuration.py.
(I'll pull these out into a separate config file to make this simpler.)
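
One way to avoid hard-coding paths is to look each tool up on PATH first and fall back to the bundled binary. The sketch below is illustrative only: resolve_tool, the tools_dir location and the bundled MCL path are assumptions, not part of MAGUS. Only shutil.which is standard-library behavior.

```python
import os
import shutil

def resolve_tool(name, bundled_path):
    """Return the executable found on PATH, or the bundled fallback path."""
    found = shutil.which(name)  # None when the tool is not on PATH
    return found if found is not None else bundled_path

# Assumed layout: bundled Linux binaries live under ./tools (hypothetical here).
tools_dir = os.path.join(os.getcwd(), "tools")
mafft_path = resolve_tool("mafft", os.path.join(tools_dir, "mafft", "mafft"))
mcl_path = resolve_tool("mcl", os.path.join(tools_dir, "mcl", "bin", "mcl"))

print("MAFFT:", mafft_path)
print("MCL:", mcl_path)
```

The resulting strings could then be assigned to the corresponding path settings in configuration.py.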


Getting Started

Please navigate your terminal to the "example" directory to get started with some sample data.
A few basic ways of running MAGUS are shown below.
Run "magus.py -h" to view the full list of arguments.

Align a set of unaligned sequences from scratch
python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt

-o specifies the output alignment path
-d (optional) specifies the working directory for GCM's intermediate files, like the graph, clusters, log, etc.

Merge a prepared set of alignments
python3 ../magus.py -d outputs -s subalignments -o magus_result.txt

-s specifies the directory with subalignment files. Alternatively, you can pass a list of file paths.
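
If you drive MAGUS from a Python script, the two invocations above can be assembled as argument lists. This is a minimal sketch using only the -d, -i, -s and -o flags described here; build_magus_command is a hypothetical helper, and requiring exactly one of -i or -s is an assumption made for this example.

```python
import sys

def build_magus_command(magus_script, output, workdir=None,
                        sequences=None, subalignments=None):
    """Build the argument list for one MAGUS run (hypothetical helper)."""
    # Assumption for this sketch: a run starts from unaligned sequences (-i)
    # or from prepared subalignments (-s), not both.
    if (sequences is None) == (subalignments is None):
        raise ValueError("provide exactly one of sequences or subalignments")
    cmd = [sys.executable, magus_script]
    if workdir is not None:
        cmd += ["-d", workdir]
    if sequences is not None:
        cmd += ["-i", sequences]
    else:
        cmd += ["-s", subalignments]
    cmd += ["-o", output]
    return cmd

cmd = build_magus_command("../magus.py", "magus_result.txt",
                          workdir="outputs",
                          sequences="unaligned_sequences.txt")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```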


Controlling the pipeline

Specify subset decomposition behavior
python3 ../magus.py -d outputs -i unaligned_sequences.txt -t fasttree --maxnumsubsets 100 --maxsubsetsize 50 -o magus_result.txt

-t specifies the guide tree method to use, and is the main way to set the decomposition strategy.
Available options are fasttree (default), parttree, clustal (recommended for very large datasets), and random.
--maxnumsubsets sets the desired number of subsets to decompose into (default 25).
--maxsubsetsize sets the size threshold below which subsets are no longer decomposed (default 50).
Decomposition proceeds until maxnumsubsets is reached OR all subsets are below maxsubsetsize.
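
The interaction of the two limits can be illustrated with a toy model. The sketch below only tracks subset sizes and halves the largest subset at each step; MAGUS itself decomposes along the guide tree, so the split sizes here are not what MAGUS would produce, and treating "below" as "at or below" is a guess.

```python
def decompose_sizes(total, max_num_subsets=25, max_subset_size=50):
    """Toy model: halve the largest subset until either stopping rule fires."""
    subsets = [total]
    # Stop when the subset count is reached OR every subset is small enough.
    while len(subsets) < max_num_subsets and max(subsets) > max_subset_size:
        largest = max(subsets)
        subsets.remove(largest)
        subsets += [largest // 2, largest - largest // 2]
    return sorted(subsets, reverse=True)

# Defaults: the subset-count limit is hit before all subsets are small.
default_run = decompose_sizes(1000)
print(len(default_run), max(default_run))

# With --maxnumsubsets 100 the size threshold becomes the binding rule.
larger_run = decompose_sizes(1000, max_num_subsets=100, max_subset_size=50)
print(len(larger_run), max(larger_run))
```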

Specify backbones for alignment graph
python3 ../magus.py -d outputs -i unaligned_sequences.txt -r 10 -m 200 -o magus_result.txt
python3 ../magus.py -d outputs -s subalignments -b backbones -o magus_result.txt

-r and -m specify the number of MAFFT backbones and their maximum size, respectively. They default to 10 and 200.
Alternatively, the user can provide their own backbones; -b can be used to provide a directory or a list of files.

Specify graph trace method
python3 ../magus.py -d outputs -i unaligned_sequences.txt --graphtracemethod mwtgreedy -o magus_result.txt

--graphtracemethod is the flag that governs the graph trace method. Options are minclusters (default and recommended), fm, mwtgreedy (recommended for very large graphs), rg, or mwtsearch.

Unconstrained alignment
python3 ../magus.py -d outputs -i unaligned_sequences.txt -c false -o magus_result.txt

By default, MAGUS constrains the merged alignment to induce all subalignments. This constraint can be disabled with -c false.
Disabling the constraint drastically slows MAGUS and is strongly discouraged above 200 sequences.


Things to Keep in Mind

  • MAGUS will not overwrite existing backbone, graph and cluster files.
    Please delete them/specify a different working directory to perform a clean run.
  • Related issue: if MAGUS is stopped while running MAFFT, MAFFT's output backbone files will be empty.
    This will cause errors if MAGUS reruns and finds these empty files.
  • A large number of subalignments (>100) will start to significantly slow down the ordering phase, especially for very heterogeneous data.
    I would generally advise against using more than 100 subalignments, unless the data is expected to be well-behaved.
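
Before rerunning after an interrupted run, it can help to list zero-byte files left in the working directory. The following is a small sketch, not a MAGUS feature; find_empty_files is a hypothetical helper and "outputs" is the working directory used in the examples above.

```python
import os

def find_empty_files(workdir):
    """Return sorted paths of all zero-byte files under workdir."""
    empty = []
    for root, _dirs, files in os.walk(workdir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)

# Report leftovers; review the list before deleting anything.
if os.path.isdir("outputs"):
    for path in find_empty_files("outputs"):
        print("empty:", path)
```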

Related Publications

magus's People

Contributors

lrauschning, mgnute, runeblaze, vlasmirnov

magus's Issues

StopIteration error

I am trying to run the example python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt and getting this error:

(magus-env) root@39db7c458bba:/wd/MAGUS/example# python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt
MAGUS was run with: ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt
Running a task, output file: /wd/MAGUS/example/magus_result.txt
Aligning sequences /wd/MAGUS/example/unaligned_sequences.txt
Read 1000 sequences from /wd/MAGUS/example/unaligned_sequences.txt ..
Building PASTA-style FastTree initial tree on /wd/MAGUS/example/unaligned_sequences.txt with skeleton size 300..
Running a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt
Running an external tool, command: /miniconda3/envs/magus-env/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --quiet --thread 128 --anysymbol /wd/MAGUS/example/outputs/decomposition/initial_tree/skeleton_sequences.txt > /wd/MAGUS/example/outputs/decomposition/initial_tree/temp_initial_align.txt
Completed a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt
Running a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/skeleton_hmm/hmm_model.txt
Running an external tool, command: /miniconda3/envs/magus-env/bin/hmmbuild --ere 0.59 --cpu 1 --symfrac 0.0 --informat afa /wd/MAGUS/example/outputs/decomposition/initial_tree/skeleton_hmm/temp_hmm_model.txt /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt
Completed a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/skeleton_hmm/hmm_model.txt
Read 700 sequences from /wd/MAGUS/example/outputs/decomposition/initial_tree/queries.txt ..
Running a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/chunks_queries/queries_chunk_1_aligned.txt
Running an external tool, command: /miniconda3/envs/magus-env/bin/hmmalign -o /wd/MAGUS/example/outputs/decomposition/initial_tree/chunks_queries/temp_queries_chunk_1_aligned.txt /wd/MAGUS/example/outputs/decomposition/initial_tree/skeleton_hmm/hmm_model.txt /wd/MAGUS/example/outputs/decomposition/initial_tree/chunks_queries/queries_chunk_1.txt
Completed a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/chunks_queries/queries_chunk_1_aligned.txt
Read 1000 sequences from /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt ..
Found 100% ACGT-N, assuming DNA..
Data type wasn't specified. Inferred data type DNA from /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt
Running a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_tree.tre
Running an external tool, command: /miniconda3/envs/magus-env/bin/fasttree -nt -gtr -fastest -nosupport /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_align.txt > /wd/MAGUS/example/outputs/decomposition/initial_tree/temp_initial_tree.tre
Completed a task, output file: /wd/MAGUS/example/outputs/decomposition/initial_tree/initial_tree.tre
Built initial tree on /wd/MAGUS/example/unaligned_sequences.txt in 183.0174605846405 sec..
Using target subset size of 50, and maximum number of subsets 25..
Read 1000 sequences from /wd/MAGUS/example/unaligned_sequences.txt ..
Task for /wd/MAGUS/example/magus_result.txt threw an exception:
generator raised StopIteration
Traceback (most recent call last):
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/newickreader.py", line 306, in tree_iter
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/wd/MAGUS/tasks/task.py", line 59, in run
    func(**self.taskArgs)
  File "/wd/MAGUS/align/aligner.py", line 45, in runAlignmentTask
    decomposeSequences(context)
  File "/wd/MAGUS/align/decompose/decomposer.py", line 44, in decomposeSequences
    buildDecomposition(context, subsetsDir)
  File "/wd/MAGUS/align/decompose/decomposer.py", line 66, in buildDecomposition
    context.subsetPaths = treeutils.decomposeGuideTree(subsetsDir, context.sequencesPath, guideTreePath,
  File "/wd/MAGUS/helpers/treeutils.py", line 96, in decomposeGuideTree
    guideTree = dendropy.Tree.get(path=guideTreePath, schema="newick", preserve_underscores=True)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/treemodel.py", line 2732, in get
    return cls._get_from(**kwargs)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/basemodel.py", line 155, in _get_from
    return cls.get_from_path(src=src, schema=schema, **kwargs)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/basemodel.py", line 216, in get_from_path
    return cls._parse_and_create_from_stream(stream=fsrc,
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/treemodel.py", line 2633, in _parse_and_create_from_stream
    tree_lists = reader.read_tree_lists(
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/ioservice.py", line 357, in read_tree_lists
    product = self._read(stream=stream,
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/newickreader.py", line 322, in _read
    for tree in self.tree_iter(stream=stream,
RuntimeError: generator raised StopIteration

MAGUS aborted with an exception..
Task manager found a failed task: /wd/MAGUS/example/magus_result.txt
Traceback (most recent call last):
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/newickreader.py", line 306, in tree_iter
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "../magus.py", line 29, in main
    mainAlignmentTask()
  File "/wd/MAGUS/align/aligner.py", line 30, in mainAlignmentTask
    task.awaitTask()
  File "/wd/MAGUS/tasks/task.py", line 47, in awaitTask
    awaitTasks([self])
  File "/wd/MAGUS/tasks/task.py", line 94, in awaitTasks
    controller.awaitTasks(tasks)
  File "/wd/MAGUS/tasks/controller.py", line 34, in awaitTasks
    observeTaskManager()
  File "/wd/MAGUS/tasks/controller.py", line 53, in observeTaskManager
    runTask(task)
  File "/wd/MAGUS/tasks/manager.py", line 219, in runTask
    task.run()
  File "/wd/MAGUS/tasks/task.py", line 59, in run
    func(**self.taskArgs)
  File "/wd/MAGUS/align/aligner.py", line 45, in runAlignmentTask
    decomposeSequences(context)
  File "/wd/MAGUS/align/decompose/decomposer.py", line 44, in decomposeSequences
    buildDecomposition(context, subsetsDir)
  File "/wd/MAGUS/align/decompose/decomposer.py", line 66, in buildDecomposition
    context.subsetPaths = treeutils.decomposeGuideTree(subsetsDir, context.sequencesPath, guideTreePath,
  File "/wd/MAGUS/helpers/treeutils.py", line 96, in decomposeGuideTree
    guideTree = dendropy.Tree.get(path=guideTreePath, schema="newick", preserve_underscores=True)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/treemodel.py", line 2732, in get
    return cls._get_from(**kwargs)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/basemodel.py", line 155, in _get_from
    return cls.get_from_path(src=src, schema=schema, **kwargs)
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/basemodel.py", line 216, in get_from_path
    return cls._parse_and_create_from_stream(stream=fsrc,
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/datamodel/treemodel.py", line 2633, in _parse_and_create_from_stream
    tree_lists = reader.read_tree_lists(
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/ioservice.py", line 357, in read_tree_lists
    product = self._read(stream=stream,
  File "/miniconda3/envs/magus-env/lib/python3.8/site-packages/dendropy/dataio/newickreader.py", line 322, in _read
    for tree in self.tree_iter(stream=stream,
RuntimeError: generator raised StopIteration

Waiting for 0 tasks to finish..
MAGUS finished in 183.12643766403198 seconds..

I had some trouble with dependencies and python versions, so I am running MAGUS in a conda environment, which is specified with the following environment.yml:

name: magus-env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.8.0
  - dendropy=4.2.0
  - clustalo=1.2.4
  - mafft=7.490
  - mcl=14.137
  - hmmer=3.3.2
  - fasttree=2.1.10
  - raxml-ng=1.0.3

I also modified the paths in configuration.py as follows:

clustalPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/clustalo")
mafftPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/mafft")
mclPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/mcl")
mlrmclPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "tools/mlrmcl/mlrmcl")
hmmalignPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/hmmalign")
hmmbuildPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/hmmbuild")
hmmsearchPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/hmmsearch")
fasttreePath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/fasttree")
raxmlPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/miniconda3/envs/magus-env/bin/raxml-ng")

(I couldn't find a package for mlrmcl, but that doesn't seem to have anything to do with the error, as far as I can tell).
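
For context, the RuntimeError: generator raised StopIteration at the bottom of the traceback matches the behavior defined by PEP 479, which is on by default from Python 3.7: a raise StopIteration inside a generator body is converted to RuntimeError. The snippet below reproduces that in isolation with stand-in generators and no dendropy involved. Whether a different dendropy release avoids the pattern is not something this sketch verifies.

```python
def old_style_iter(items):
    """Generator that ends with the pre-PEP 479 idiom."""
    for item in items:
        yield item
    raise StopIteration  # becomes RuntimeError on Python 3.7+

def new_style_iter(items):
    """Same generator, ended with a plain return."""
    for item in items:
        yield item
    return

message = None
try:
    list(old_style_iter([1, 2]))
except RuntimeError as err:
    message = str(err)
    print("RuntimeError:", message)

print(list(new_style_iter([1, 2])))
```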

setup.py installs non-namespaced modules

Hi, I'm reviewing the Bioconda recipe for MAGUS over at bioconda/bioconda-recipes#46022 and briefly looked at the repository. I noticed a relatively typical problem that I thought I'd mention.

setup.py contains the line py_modules = py_modules=['magus', 'magus_configuration', 'version'],. The third entry in that list makes it so that a version module is available "site-wide" (not under the magus package). That is, I can write this to get the magus version:

python3 -c 'import version; print(version.__version__)'

This could easily conflict with other packages and result in weirdness. You may want to consider not using py_modules at all and instead arrange the code so that everything is located under a magus package (=directory with __init__.py).

Incompatible with python >3.8?

I encountered this error when using python v3.10.0

(magus-env) root@39db7c458bba:/wd/MAGUS/example# python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt
Traceback (most recent call last):
  File "/wd/MAGUS/example/../magus.py", line 12, in <module>
    from align.aligner import mainAlignmentTask
  File "/wd/MAGUS/align/aligner.py", line 11, in <module>
    from align.decompose.decomposer import decomposeSequences 
  File "/wd/MAGUS/align/decompose/decomposer.py", line 11, in <module>
    from align.decompose import initial_tree, kmh
  File "/wd/MAGUS/align/decompose/initial_tree.py", line 13, in <module>
    from helpers import sequenceutils, hmmutils, treeutils
  File "/wd/MAGUS/helpers/treeutils.py", line 7, in <module>
    import dendropy
  File "/miniconda3/envs/magus-env/lib/python3.10/site-packages/dendropy/__init__.py", line 24, in <module>
    from dendropy.dataio.nexusprocessing import get_rooting_argument
  File "/miniconda3/envs/magus-env/lib/python3.10/site-packages/dendropy/dataio/__init__.py", line 20, in <module>
    from dendropy.dataio import newickreader
  File "/miniconda3/envs/magus-env/lib/python3.10/site-packages/dendropy/dataio/newickreader.py", line 29, in <module>
    from dendropy.dataio import nexusprocessing
  File "/miniconda3/envs/magus-env/lib/python3.10/site-packages/dendropy/dataio/nexusprocessing.py", line 30, in <module>
    from dendropy.utility import container
  File "/miniconda3/envs/magus-env/lib/python3.10/site-packages/dendropy/utility/container.py", line 356, in <module>
    class CaseInsensitiveDict(collections.MutableMapping):
AttributeError: module 'collections' has no attribute 'MutableMapping'

It seems similar to this issue, and it went away when I used python 3.8. So I'm guessing it's due to the same problem (use of deprecated collections).
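
The abstract base class aliases such as collections.MutableMapping were removed in Python 3.10; the classes themselves live in collections.abc. A minimal sketch of the usual fix follows, using a stand-in class rather than dendropy's actual code:

```python
from collections.abc import MutableMapping  # works on every supported Python 3

class CaseInsensitiveDict(MutableMapping):
    """Stand-in mapping whose string keys compare case-insensitively."""

    def __init__(self, *args, **kwargs):
        self._data = {}  # lower-cased key -> (original key, value)
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return (original for original, _value in self._data.values())

    def __len__(self):
        return len(self._data)

d = CaseInsensitiveDict(Taxon_A=1)
print(d["taxon_a"], d["TAXON_A"], list(d))
```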

Wrong number of characters ERROR

Hi,

I'm getting:

Output: FastTree Version 2.1.11 SSE3, OpenMP (64 threads)
Alignment: /user/work/tk19812/software/WITCH/examples/MAGUS/example/OBP.AA.outputs/decomposition/initial_tree/initial_align.txt
Amino acid distances: BLOSUM45 Joins: balanced Support: none
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Le-Gascuel 2008, CAT approximation with 20 rate categories
Wrong number of characters for Wallacei_Hege.OBPloc26: expected 351 but have 345 instead.
This sequence may be truncated, or another sequence may be too long.

when trying to align proteins. I don't understand why.
Thanks
F
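
FastTree's message says that one row of the alignment it received has a different length from what it expected. A small check like the one below can show which rows of initial_align.txt differ in length. This is a diagnostic sketch on toy data, not part of MAGUS; the helper names are hypothetical, and it compares against the most common row length, which may not be the reference FastTree uses.

```python
from collections import Counter

def read_fasta(lines):
    """Parse FASTA text lines into {name: sequence}."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def find_length_mismatches(records):
    """Return {name: length} for rows that differ from the most common length."""
    if not records:
        return {}
    lengths = {n: len(s) for n, s in records.items()}
    expected = Counter(lengths.values()).most_common(1)[0][0]
    return {n: length for n, length in lengths.items() if length != expected}

toy_alignment = """>seq1
MKV-LLA
A
>seq2
MKVALLA-
>seq3
MKV-LL
""".splitlines()

mismatches = find_length_mismatches(read_fasta(toy_alignment))
print(mismatches)
```

To check a real file, pass open(path) instead of the toy lines.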

Receiving TypeError

Hello there,

I am planning to run MAGUS on 600k sequences (S protein of SARS-CoV-2). I wanted to give it a quick try, so I downloaded influenza PB1 proteins and ran MAGUS with python ./MAGUS/magus.py -d outputs1 -i PB1.fasta -o aligned1.txt, but it keeps failing with the error TypeError: unsupported operand type(s) for +: 'int' and 'NoneType', although the example file ran without errors. You can find the log files and the fasta file at the link below. Could you help me with this?

link to files

Thank you

The PermissionError

Hello again. :)

I am having trouble that I could not troubleshoot. I am receiving PermissionError: [Errno 13] Permission denied when I run MAGUS on my file, which has nearly 600k sequences. MAGUS is running on WSL2 with 40 cores and 400GB RAM. The command I am using is python ./MAGUS-master/magus.py -d outputs1 -i SGNRDiscarded.fasta -o aligned1.txt -t clustal. It runs without problems on the provided sample file (in the same directory). I have tried to chown the directory and also to run with sudo. What I noticed is that the first time it threw the error the output directory was around 1.10GB; when I ran it again with sudo and it threw the same error, the output directory was 1.45GB. So it is the same error, but I think at different stages. At the moment I have opened WSL2 with admin privileges (right click and select run with admin privileges) and am running it again. In the link below you can find the captured error.png and log.txt.

PermissionError

specify number of threads?

Hi,
Currently, when I run MAGUS, it uses all available cores, and starts maxing out the memory. Is it possible to specify the max number of threads to use? I see in magus_configuration.py numCores = 1, but clearly it is not limiting itself to 1 core. Maybe this is specified elsewhere?

Thanks!
-Pascal

Open source license, and also packaging MAGUS

Hi Vlad,

Will this project consider adding an open source license? (I am already modifying the code in a public fork, and it feels wrong to use your code without a license saying that I can do so.)

Also, on the packaging MAGUS part (for all the permission/dependencies problems), do you welcome efforts to publish MAGUS on things like bioconda (or throwing MAGUS in a container, etc. I am not sure how to best package research software. I am assuming it is bioconda that is the best)? If so, there is a slight chance that I can help during my spare time.

Example fails invoking mafft

Cloned the git repo today on a clean Ubuntu install (AWS c5a.4xlarge instance with Ubuntu 20.04).
Installed dendropy dependency.

cd example
python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt

# ...some output deleted...

subprocess.CalledProcessError: 
Command '/home/ubuntu/magus/MAGUS-master/tools/mafft/mafft --localpair --maxiterate 1000 --ep 0.123 --quiet --thread 16 --anysymbol /home/ubuntu/magus/MAGUS-master/example/outputs/decomposition/initial_tree/skeleton_sequences.txt > /home/ubuntu/magus/MAGUS-master/example/outputs/decomposition/initial_tree/temp_initial_align.txt' 
returned non-zero exit status 126.
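
Exit status 126 is the shell's conventional code for a command that was found but could not be executed. One common cause is a missing execute permission on the binary; another is a binary built for a different platform. The sketch below covers only the first case; ensure_executable is a hypothetical helper, and the relative path assumes you run it from the repository root.

```python
import os
import stat

def ensure_executable(path):
    """Add owner execute permission if the file is not executable.

    Returns True when the mode was changed, False otherwise.
    """
    if os.access(path, os.X_OK):
        return False
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return True

bundled_mafft = os.path.join("tools", "mafft", "mafft")  # path as seen in the error above
if os.path.exists(bundled_mafft):
    changed = ensure_executable(bundled_mafft)
    print("fixed permissions" if changed else "already executable")
```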

unsupported operand type(s) for +: 'int' and 'NoneType'

Python version: 3.7.0
Java version: 1.8

When running tasks for 16S.M dataset pulled from https://crw-site.chemistry.gatech.edu/DAT/3C/Alignment/Files/16S/16S.M.alnfasta.zip (unaligned sequences pulled from the alignment file), the following error occurred after getting the info Using 10 MAFFT backbones...:

Traceback (most recent call last):
  File "/home/chengze5/scratch/softwares/MAGUS/magus.py", line 24, in main
    mainAlignmentTask()
  File "/scratch/users/chengze5/softwares/MAGUS/align/aligner.py", line 25, in mainAlignmentTask
    task.awaitTask()
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/task.py", line 39, in awaitTask
    awaitTasks([self])
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/task.py", line 82, in awaitTasks
    controller.awaitTasks(tasks)
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/controller.py", line 29, in awaitTasks
    observeTaskManager()
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/controller.py", line 48, in observeTaskManager
    runTask(task)
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/manager.py", line 204, in runTask
    task.run()
  File "/scratch/users/chengze5/softwares/MAGUS/tasks/task.py", line 47, in run
    func(**self.taskArgs)
  File "/scratch/users/chengze5/softwares/MAGUS/align/aligner.py", line 43, in runAlignmentTask
    mergeSubalignments(context)
  File "/scratch/users/chengze5/softwares/MAGUS/align/merge/merger.py", line 22, in mergeSubalignmen
    buildGraph(context)
  File "/scratch/users/chengze5/softwares/MAGUS/align/merge/graph_build/graph_builder.py", line 29,
    context.graph.initializeMatrix()
  File "/scratch/users/chengze5/softwares/MAGUS/align/merge/alignment_graph.py", line 45, in initial
    self.matrixSize = sum(self.subalignmentLengths)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Before that, I noticed all the subsets contain 0/200 sequences so MAFFT was basically aligning empty files. Maybe it is an issue with how sequences are read in from the file?
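
The TypeError itself is what Python's sum() raises as soon as the list contains a None, which would be consistent with a subalignment whose length was never set. The sketch below reproduces the message and shows a guard that reports which entries are missing; total_matrix_size and the sample lengths are hypothetical and not taken from the MAGUS code.

```python
def total_matrix_size(subalignment_lengths):
    """Sum the lengths, naming any missing entries instead of failing in sum()."""
    missing = [i for i, n in enumerate(subalignment_lengths) if n is None]
    if missing:
        raise ValueError(f"subalignments with unknown length: {missing}")
    return sum(subalignment_lengths)

# Reproduce the reported error with made-up lengths.
try:
    sum([120, None, 95])
except TypeError as err:
    raw_error = str(err)
    print("TypeError:", raw_error)

# The guarded version points at the offending subalignment instead.
try:
    total_matrix_size([120, None, 95])
except ValueError as err:
    print("ValueError:", err)
```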

Running on millions of protein sequences

Hi,

For a large-scale protein MSA test, I compiled the 5 largest Pfam families, with 1-3 million sequences each, and tried to align them with MAGUS.
Unfortunately, MAGUS failed apparently due to a recursion-limit error when reading the tree (see attached error log). I tried:

python3 magus.py -t clustal
python3 magus.py -t clustal --recurse True
python3 magus.py -t clustal --maxnumsubsets 100 --recurse True

All gave the same error. You wrote "clustal" is the recommended option for large scale data, however, another tree algorithm might be a solution? Can you help?
log_errors.txt
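
As general background, guide trees over millions of sequences can be very deep, and recursive traversals then exceed CPython's default recursion limit. The sketch below is generic Python, not MAGUS or dendropy code: it walks a deliberately unbalanced toy tree with an explicit stack, which sidesteps the limit. Whether the failure in the attached log comes from a traversal of this kind is an assumption.

```python
def make_caterpillar(num_leaves):
    """Build a maximally unbalanced binary tree as {internal_node: [children]}.

    Internal nodes are integers starting at 0 (the root); leaves are strings.
    Requires num_leaves >= 2.
    """
    children = {}
    for i in range(num_leaves - 1):
        last_internal = (i == num_leaves - 2)
        nxt = f"leaf{num_leaves - 1}" if last_internal else i + 1
        children[i] = [f"leaf{i}", nxt]
    return children

def count_leaves(children, root=0):
    """Count leaves iteratively with an explicit stack instead of recursion."""
    stack, leaves = [root], 0
    while stack:
        node = stack.pop()
        kids = children.get(node)
        if kids:
            stack.extend(kids)
        else:
            leaves += 1
    return leaves

tree = make_caterpillar(100_000)  # far deeper than the default recursion limit
print(count_leaves(tree))
```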

where to find mlrmcl

Hi,
As I am working on a mac, I've been installing the dependencies myself. I have been able to track down all of them except for mlrmcl. Can you please provide a link?

Thanks!
