rcedgar / muscle

Multiple sequence alignment with top benchmark scores scalable to thousands of sequences. Generates replicate alignments, enabling assessment of downstream analyses such as trees and predicted structures.

License: GNU General Public License v3.0

Makefile 0.18% C++ 95.86% C 3.85% Shell 0.07% Batchfile 0.03%
bioinformatics biology algorithms nucleotide-alignment protein-alignment sequence-clustering sequence-search

muscle's Introduction

Muscle5

MUSCLE is widely used software for making multiple alignments of biological sequences.

Version 5 of MUSCLE achieves the highest scores on the Balibase, Bralibase and Balifam benchmark tests and scales to thousands of sequences on a commodity desktop computer.

This version supports generating an ensemble of alternative alignments with the same high accuracy obtained with default parameters. By comparing downstream predictions from different alignments, such as trees, a biologist can evaluate the robustness of conclusions against alignment errors.

Downloads

Binary files are self-contained and have no dependencies.

https://github.com/rcedgar/muscle/releases

Documentation

Muscle v5 home page
Manual

Building MUSCLE from source

https://github.com/rcedgar/muscle/wiki/Building-MUSCLE

Reference

Edgar, Robert C. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nature Communications 13.1 (2022): 6968.
https://www.nature.com/articles/s41467-022-34630-w.pdf

muscle's People

Contributors

lukaszsobala, martin-g, nsoranzo, rcedgar

muscle's Issues

Error: Library not loaded: /usr/local/opt/gcc/lib/gcc/11/libgomp.1.dylib

Can you help out? I just tried to execute Muscle on a fresh macOS installation (version 12.1; dev tools enabled).

MBP-de-Nicolas:~ nicolas$ chmod +x /Users/nicolas/Desktop/muscle
MBP-de-Nicolas:~ nicolas$ /Users/nicolas/Desktop/muscle
dyld[995]: Library not loaded: /usr/local/opt/gcc/lib/gcc/11/libgomp.1.dylib
  Referenced from: /Users/nicolas/Desktop/muscle
  Reason: tried: '/usr/local/opt/gcc/lib/gcc/11/libgomp.1.dylib' (no such file), '/usr/local/lib/libgomp.1.dylib' (no such file), '/usr/lib/libgomp.1.dylib' (no such file)
Abort trap: 6

Segmentation fault while running muscle

muscle -in combined.fasta -out align.fasta

MUSCLE v3.8.1551 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

combined 74665 seqs, lengths min 38, max 11532, avg 307
Segmentation fault

Can you help me out with this?

error when running

dyld: Symbol not found: __ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm
Referenced from: /Users/wang/src/BLCA/muscle
Expected in: /usr/local/lib/libstdc++.6.dylib
in /Users/wang/src/BLCA/muscle
Abort trap: 6

Does anyone know how to fix it? Thank you.

Does not build on non-Intel architectures

Hi,
I upgraded the Debian package to muscle 5.1. It builds nicely for Intel architectures. However, Debian builds for several other architectures, and this was successful for version 3.8.1551. The autobuilder logs can be seen here. Looking, for instance, at the arm64 log, you can find the failure at the end:

g++ -Wdate-time -D_FORTIFY_SOURCE=2 -DNDEBUG -pthread -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -O3 -fopenmp -ffast-math -c -o Linux/transaln.o transaln.cpp
In file included from testlog.cpp:2:
timing.h: In function ‘void cmd_testlog()’:
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
timing.h:26:9: error: impossible constraint in ‘asm’
   26 |         __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      |         ^~~~~~~
make[2]: *** [Makefile:53: Linux/testlog.o] Error 1

This assembly is specific to x86-based systems. On arm64 you would want something like this. It would be great if muscle were portable to other architectures as well.

Kind regards, Andreas.

source files are missing

ScoreType.h and FileBuffer.h are referenced in the code but not included with the release. They seem to come from amap, which is not listed as a dependency.

This is the first barrier that anybody wishing to build the code will run into.

Minor: interface change from 3.8 to 5.1 may benefit from more helpful error message

This is very minor: Using the old syntax muscle -in input.fasta -out output.aln is not compatible anymore with version 5, but the error message does not point in the right direction. Instead of

Invalid command line
Unknown option in

It may be helpful to show something like

The option `--in` is no longer supported. Please use `--align` instead.
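The suggestion above could be prototyped in a small wrapper before it is built into the program. The sketch below is in Python and is only an illustration: the function name and the mapping table are hypothetical, and the mapping covers just the two option pairs that appear in commands quoted on this page (`-in`/`-out` for 3.8, `-align`/`-output` for 5), spelled with the single dash used in those commands.

```python
# Hypothetical mapping from MUSCLE v3.8 options to their v5 counterparts.
# Only the pairs that appear in commands quoted on this page are listed.
LEGACY_OPTIONS = {
    "-in": "-align",
    "-out": "-output",
}


def legacy_option_hints(argv):
    """Return one human-readable hint per legacy option found in argv."""
    hints = []
    for arg in argv:
        replacement = LEGACY_OPTIONS.get(arg)
        if replacement is not None:
            hints.append(
                f"The option `{arg}` is no longer supported. "
                f"Please use `{replacement}` instead."
            )
    return hints


if __name__ == "__main__":
    old_command = ["muscle", "-in", "input.fasta", "-out", "output.aln"]
    for hint in legacy_option_hints(old_command):
        print(hint)
```

A command line that already uses the version 5 options produces no hints, so the check can run unconditionally before the normal option parsing.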

M1 chip support

Hello Robert,

Any plans for M1/ARM architecture support?

Thanks,

Jianshu

linux binary will not run

Hello! I downloaded the latest linux binary but cannot get it to run. The steps I used are as follows:

chmod +x muscle
./muscle
zsh: exec format error:

The result is the same using the absolute path to the binary.

automatically reverse-complement nt sequences

Hi @rcedgar,
thanks a ton for your efforts to provide useful tools to the community!

I was interested in switching from MAFFT to MUSCLE5 for large alignments of virus genomes.
There is a feature in MAFFT (option --adjustdirection) which can reverse-complement some sequences (quite useful because some similar virus genomes may have been deposited in different orientations in NCBI nuccore, for example).
Do you think it could be implemented as well in MUSCLE5?

Best regards,

Thomas
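Until something like this exists in MUSCLE5, the orientation could be fixed in a preprocessing step. The Python sketch below is an assumption-laden illustration, not MAFFT's algorithm: `orient_like` is a hypothetical helper that compares shared k-mers with a chosen reference sequence, and the k-mer length of 8 is an arbitrary default.

```python
# IUPAC nucleotide codes and their complements (A<->T, R<->Y, K<->M, ...).
_BASES = "ACGTRYSWKMBDHVN"
_COMPS = "TGCAYRSWMKVHDBN"
_TABLE = str.maketrans(_BASES + _BASES.lower(), _COMPS + _COMPS.lower())


def reverse_complement(seq):
    """Return the reverse complement; unknown characters are left as-is."""
    return seq.translate(_TABLE)[::-1]


def _kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, seq, k=8):
    """Return seq or its reverse complement, whichever shares more k-mers
    with the reference. A crude heuristic chosen for this sketch."""
    ref_kmers = _kmers(reference.upper(), k)
    flipped = reverse_complement(seq)
    forward_hits = len(ref_kmers & _kmers(seq.upper(), k))
    reverse_hits = len(ref_kmers & _kmers(flipped.upper(), k))
    return flipped if reverse_hits > forward_hits else seq
```

Each input sequence would be passed through `orient_like` against one trusted reference genome before the FASTA file is handed to the aligner. Divergent sequences may share too few exact k-mers for this heuristic to decide reliably.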

Fatal error UngappedSeq != m_UngappedSeqs[0] from efa_explode on resampled MSAs

This is a bug in the current code, I hope to post a fix in the next few days. Work-around is to write a script in python or similar -- it's pretty simple to read through the EFA file one line at a time. If the first character in the line is '<', start writing to a new file where the filename is the rest of this line (i.e. with '<' deleted).
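The work-around described above can be sketched in a few lines of Python. The function name and the `out_dir` parameter are assumptions made for this example; the only rule taken from the description is that a line starting with '<' opens a new output file named after the rest of that line.

```python
from pathlib import Path


def explode_efa(efa_path, out_dir="."):
    """Split an EFA file into one file per alignment.

    A line starting with '<' starts a new output file whose name is the
    rest of that line; every other line is copied to the current file.
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current = None  # handle of the alignment file being written
    try:
        with open(efa_path) as efa:
            for line in efa:
                if line.startswith("<"):
                    if current is not None:
                        current.close()
                    # Assumption: keep only the base name so that all
                    # output files stay inside out_dir.
                    name = Path(line[1:].strip()).name
                    target = out_dir / name
                    current = open(target, "w")
                    written.append(target)
                elif current is not None:
                    current.write(line)
    finally:
        if current is not None:
            current.close()
    return written
```

Calling `explode_efa("ensemble.efa", "msas")` (both names are placeholders) would write one FASTA file per alignment into the `msas` directory.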

Fatal error mpcflat.cpp assert failed

muscle 5.0.1428_linux64 791Gb RAM, 24 cores
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Hello Robert,

I get the following error when running muscle. The super5 run works fine.

muscle -align ASVs1.fa -output aln.afa -threads 24
Elapsed time 01:27:16
Max memory 0.0b

---Fatal error---
mpcflat.cpp(170) assert failed: PairCount == PairCount2
5140.18user 87.93system 1:27:24elapsed 99%CPU (0avgtext+0avgdata 194959560maxresident)k
62000inputs+24outputs (11major+41593028minor)pagefaults 0swaps

Thanks,

Jianshu

Cannot open xxx.fas , errno=2 No such file or directory

Hi, I have successfully run Muscle 5.1.win64 on Windows once before. But this time, all files (including the one that succeeded before) fail to run with the following error:
---Fatal Error---
Cannot open xxx.fas , errno=2 No such file or directory

I have checked the .fas and .fasta files. They all contain the correct information and can be opened by other editors such as BioEdit.

Can anyone help me? Thanks a lot.

Muscle &File address:
C:\Users\USER\Desktop\software\desktop\Muscle\muscle-5.1\muscle5.1.win64.exe

Code:
C:\Users\USER\Desktop\software\desktop\Muscle\muscle-5.1\muscle5.1.win64.exe -align ES_Mitos_match_unPaired_73_cob.fas -output test.fasta

.fas file is like:

SDHF_76M_cob
ATGAATAAATCTATACGAACTTATCACCCATTATTTAAAATTGCCAATAATGCTTAATTGAT????????????????????????????????????????GATCATTATTAGGAATTTGTTTAGTAAAACAAATCCTAACAGGAATATTYTTAGCCATACATTATAGCCCCAATATTKAACAAGCATTTTCTAGAGTAGCAMATATTTGTCGAGACGTAAATWATGGRTGACTACTTCGAGTCTTGCATGCCAATGGAGCATCTGTATTTTTTCTT????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????CTTGAGGACAAATATCTTTTTCTAGGGTGCCGGTGTCATAGTTTTTTGGAGCAGCCGTTCCATATTTAGGTACTGATCTAGTACAATGAGTATGAGGKGGCTTCGCTGTAGACAAYGCYACAYTAAACCGATTCTTTACTTTACATTTTTTAYTACCATTTATCATTACWGCACTTGTATTAATWCACTTAKTATTTCTTCACCAAACAGGATCAAAYAAMCCCATAGSAATAAACAGAAACATTGATAAAATTCCCTTCCATCCATACTTCACAACAAAAGACATTGTAGGRTTYATTATTATAWTTATAWTATTAKSCWTTTTATCACTCAAAGAACCATATATTCTTGGAGATCCTGACAATTTCACACCAGCTAATCCCTTAGTTACTCCTGTTCACATTCAACCTGAATGATATTTTCTATTTGCATATGCAATTTTACGATCAATTCCTAAYAAATTAGGRGGAGTAATCGCYYTAGTAATATCAATTGCYATCTTATTTATTATACCATTATATAAATCAAATTTTCGAAGAATACAATTTTACCCAATTAATCAAGCATACTTCTGAACAATAACAAATACTGTAATTTTATTAACATGAATTGGAGCACGACCAGTAGAAGAACCTTATATCATCACAGGCCAATTATTAACTACAATTTATTTTAGATATTATCTTATAATTCCTTTTATTACCAAAACATGAGAAACATAGACTTATG
SDHF_76R_cob
ATGAATAAAYCTATACGAACTCATCAYCCATTATTTAAAATTGCTAATAATGATTTAKTTGACCTTCCAGCCCCTTCAAATATTACAACCTGATGAAATTTTGAATCACTGTTAGGAATCTGCTTAATAATACAAATTTTAACGGGAATATTTCTAGCTATACATTATAGCCCAAACATTKAGCAAGCGTTTTCTAGAGTAACACACATTTGTCGAG???????????????????????????????????????????????????????????????????????????????TGGTCGASGAKWWYRSWAKRCRWSAWTMAAATTAATACATACCTGATTTGTAGGTGTAATAATTTTATTCATCACTATAGCAACAGCTTTTGTAGGGTATGTCTTACCTTGAGGACAAATATCTTTTTGAGGGGCCACTGTAATTACTAACCTTTTA?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????CTGTTTCTTCACYAWMYARSATCAAAYAACCCCTTTAGGAATAAATAGAAACATCGACAAAATCCCCTTCCATCCATATTTCACAACAAAAGACATTGTTGGATTCATCATTATATTTATAATATTATCTATTTTATCACTAAAAGAACCTTATATTCTAGGAGAYCCCGACAATTTCACACCAGCYAATCCTTTAGTTACYCCYATYCACATTCAACCTGAATGATATTTTCTATTTGCATACGCAATTTTACGATCAATTCCTAATAAATTAGGGGGAGTAATAGCCTTAGTGATATCAATTGCCATCTTATTTATTATACCATTATATAAATCAAATTTCCGAAGAATTCAATYCTACCCGATTAATCAAATATACTTCTGAATAATAGTAAACACTGTAATCTTATTAACATGAATTGGAGCACGACCAGTAGNNNNNNNNNNNTACAGGTCAATTATTAACTACASTMTATKTCAGATACTACMTWATWATTYCYTCAATTACTAAAATATGAGAAAATCTTAYCAAAT
SDHF_76S_cob
ATGAATAAATCTATACGAAYTWATCATCCATTATTTAAAATCGCCAATAATGCTTTAATTGATCTTCCGATCTCGAATAATTGAACATGATGAAACTTTGGATCATTATTAGGAATTTGTTTAGTAATACAAATCCTAACCGGAATATTCTTAGCCATACATTATAGACCTAATATCGAACAAGCATTTTCTAGAGTAGCACACATTTGTCGAGACGTAAATTATGGATGACTACTACGAACCTTACATGCTAATGGAGCMTCTATATTTTTTCTTTGTATTTACTTACA??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????CRRYKGSATCAAGATAACCCTTTAGGAATAAATAGAAACATCGACAAAATCCCCTTYCAYCCATATTWCACAACAAAAGAYATTGTTGGATTCATYATTATATTYATAVTAYTRTCHATTTTATCACTMAAAGAACCNNNNTTATATTCTAGGAGAYCCYGACAATTTTACACCGGCTAATCCTTTAGTTACCCCTGTTCACATTCAACCTGAATGGTACTTCCTATTTGCATATGCAATTTTACGATCAATTCCTAACAAATTAGGAGGAGTAATCGCCTTAGCAATATCAATTGCTATTCTATTTATCATACCATTACATAAATCAAATTTCCGAAGAATTCAATTTTACCCCAATT???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
SDHF_76Z_cob
?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????ATAGTGTAAGACATCATTCAAATTAATACAGACCTGATTTGTAGGTGTAATAATTTTATTCATCACTATAGCAACAGCTTTTGTAGGATATGTCTTACCTTGAGGACAAATATCTTTTTGAGGGGCCACTGTAATTACTAACCTTTTATC???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????AATTCTCCATGGGAACGTACTTATATAATAATTCCCTCAATTACTAAAATATGAGAAAATCTTACCAAAT

makefile is nonstandard and redundant

The default Makefile uses ccache, which is nonstandard and rarely installed on modern systems. This is the first problem that someone who wishes to build the code will encounter. Makefile_osx does not have this issue, although it does not build for me on up-to-date OSX with the standard clang-based Apple developer tools installed. That's a different issue, though.

There are many lines of rules for building cpp object, all of which can be replaced by the simple standard rule:

%.o: %.cpp $(DEPS)
	$(CPP) -c -o $@ $< $(CPPFLAGS)

muscle5: $(OBJS)
	$(LNK) -o muscle5 $(LNKOPTS) $(OBJS)

if one follows the standard practice of building object files in place.

Muscle5.1 alignment error

muscle 5.1.linux64 [12f0e2] 32.8Gb RAM, 16 cores
Built Jan 13 2022 23:17:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 74665 seqs, avg length 308, max 11532

WARNING: >1k sequences, may be slow or use excessive memory, consider using -super5

Killed

build not fully static with -fopenmp -pthread

OpenMP comes from GCC and pthread is part of glibc. The makefile specifies -static for the link step, but the resulting binaries will not be fully static. I believe this is also true of the distributed binaries, so users of those binaries will run into it.

/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/libgomp.a(oacc-profiling.o): in function `goacc_profiling_initialize':
/var/tmp/portage/sys-devel/gcc-11.1.0-r1/work/gcc-11.1.0/libgomp/oacc-profiling.c:137: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

The simple thing is to remove -static from the Makefile link. There are ways to build fully static for distribution, but they are more complicated than reflected in the makefile.

Floating point exception error (core dumped) running super5

Hello,
I am trying to align my protein sequences and it runs well for a while but then it throws this error:

Floating point exception (core dumped)
Floating point exception (core dumped)
Floating point exception (core dumped)
etc.

Does anyone know why/what this means? Thanks!

double free or corruption (!prev)eriors

I was using muscle5 for alignment of around 10k sequences, I am getting the following error:

muscle 5.1.linux64 [12f0e2]  528Gb RAM, 56 cores
Built Jan 13 2022 23:17:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 9988 seqs, avg length 745, max 3209


WARNING: >1k sequences, may be slow or use excessive memory, consider using -super5

00:40 4.9Gb  CPU has 56 cores, running 36 threads
double free or corruption (!prev)eriors

Failing to compile 5.1 on Linux with GCC 11

Hello,
I'm sorry, but I fail to compile Muscle 5.1 on Linux with GCC 11.2.1:

This is for the Git snapshot:

$ make
bash ./gitver.bash
"7630cd"
mkdir -p Linux/
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/addconfseq.o addconfseq.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/align.o align.cpp
...
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/usage.o usage.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/usorter.o usorter.cpp
g++  -O3 -fopenmp -pthread -lpthread -static Linux/addconfseq.o Linux/align.o Linux/alignpairflat.o Linux/allocflat.o Linux/alnalnsflat.o Linux/alnmsasflat.o Linux/alnmsasflat3.o Linux/alpha.o Linux/alpha3.o Linux/assertsameseqs.o Linux/buildposterior3flat.o Linux/buildpostflat.o Linux/bwdflat3.o Linux/calcalnflat.o Linux/calcalnscoreflat.o Linux/calcalnscoresparse.o Linux/calcposteriorflat.o Linux/colscoreefa.o Linux/consflat.o Linux/conspairflat.o Linux/defaulthmmparams.o Linux/derep.o Linux/diagbox.o Linux/disperse.o Linux/dividetree.o Linux/eacluster.o Linux/eadistmx.o Linux/eadistmxmsas.o Linux/eesort.o Linux/efabestcols.o Linux/efabestconf.o Linux/efaexplode.o Linux/efastats.o Linux/ensemble.o Linux/fasta.o Linux/fasta2.o Linux/fa2efa.o Linux/fwdflat3.o Linux/getconsseq.o Linux/getpairs.o Linux/getpostpairsalignedflat.o Linux/globalinputms.o Linux/guidetreejoinorder.o Linux/heatmapcolors.o Linux/help.o Linux/hmmdump.o Linux/hmmparams.o Linux/jalview.o Linux/jointrees.o Linux/letterconf.o Linux/letterconfhtml.o Linux/logaln.o Linux/logdistmx.o Linux/logmx.o Linux/main.o Linux/make_a2m.o Linux/maxcc.o Linux/mpcflat.o Linux/msa.o Linux/msastats.o Linux/msa2.o Linux/multisequence.o Linux/mysparsemx.o Linux/myutils.o Linux/pairhmm.o Linux/permutetree.o Linux/perturbhmm.o Linux/pprog.o Linux/pprogt.o Linux/pprog2.o Linux/probcons.o Linux/progalnflat.o Linux/project.o Linux/qscore.o Linux/qscoreefa.o Linux/qscorer.o Linux/qscore2.o Linux/quarts.o Linux/randomchaintree.o Linux/refineflat.o Linux/relabel.o Linux/relaxflat.o Linux/resample.o Linux/seb8.o Linux/seq.o Linux/sequence.o Linux/setprobconsparams.o Linux/stripgappycols.o Linux/stripgappyrows.o Linux/super4.o Linux/super5.o Linux/testfb.o Linux/testlog.o Linux/testscoretype.o Linux/textfile.o Linux/totalprobflat.o Linux/tracebackflat.o Linux/transaln.o Linux/transq.o Linux/tree.o Linux/treefromfile.o Linux/treeperm.o Linux/treesplitter.o Linux/treesubsetnodes.o Linux/treetofile.o Linux/tree2.o Linux/tree4.o 
Linux/trimtoref.o Linux/trimtorefefa.o Linux/uclust.o Linux/upgma5.o Linux/usage.o Linux/usorter.o -o Linux/muscle
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: cannot find -lm
/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
make: *** [Makefile:41: Linux/muscle] Error 1

Compiling the released version ends with the same error; only the beginning is different:

$ make
bash ./gitver.bash
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
""
mkdir -p Linux/
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/addconfseq.o addconfseq.cpp
g++  -DNDEBUG -pthread  -O3 -fopenmp -ffast-math -c -o Linux/align.o align.cpp
...

It looks similar to an error I already got (discussed in #4) and looks a bit low-level, but I'm sure I have all compilation tools correctly installed. I might be missing some dependency (although I think I checked everything), but I cannot find out which one it could be... When I succeed, I'll create an RPM package for openSUSE Linux.

Missing counter.h file

Trying to compile v5.0.1428:

$ make -C src/ CC=gcc CPP=g++ LNK=g++
...
g++ -fopenmp -ffast-math -msse -mfpmath=sse -O3 -DNDEBUG -c -o o/msa.o msa.cpp
g++ -fopenmp -ffast-math -msse -mfpmath=sse -O3 -DNDEBUG -c -o o/msa2.o msa2.cpp
g++ -fopenmp -ffast-math -msse -mfpmath=sse -O3 -DNDEBUG -c -o o/myutils.o myutils.cpp
myutils.cpp:2058:10: fatal error: ../ver/counter.h: No such file or directory
 2058 | #include "../ver/counter.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:381: o/myutils.o] Error 1

dispersion value meaning

The dispersion calculation gives me two values, and I don't know which one is the value mentioned on the website (where <0.05 means the alignment is likely fine):
@disperse file=ensemble_mafft.efa D_LP=0.005485 D_Cols=1

What is D_LP and D_Cols?

Improve multi-threading efficiency?

Hello,

I use MUSCLE quite often, but I think the speed of the program suffers a bit from the multi-threading algorithm.

It is quite common for me that after some time (e.g. 50% of the total time needed for one pass of the Consistency step) more than half of the threads will have finished, and it takes a long time for the last 2-3 threads to finish the work. I attach a btop screenshot with the pattern of core usage of a muscle (5.1.linux64) run:
[Screenshot from 2022-10-27 11-37-57]
The two large "blocks" of activity on the right are the Consistency steps.

For short jobs it does not matter that much, but for alignments that take hours, 2 h vs 4 h makes a difference. And the longer the job, the larger the proportion of time spent waiting for the trailing threads seems to be. Is there a way to dynamically reassign parts of the work to cores that sit idle, or maybe, before assigning the work, to use some method to assess the "difficulty" of the alignment and weigh the work accordingly?

Thank you,
Lukasz
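The two ideas raised in this request (handing work to whichever core is idle, and weighing tasks by estimated difficulty) can be illustrated with a small scheduling simulation in Python. This is only a sketch: the task costs are made-up numbers, the function names are hypothetical, and nothing here reflects how MUSCLE actually divides the Consistency step among threads.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when task i is pre-assigned to worker i % n_workers."""
    loads = [0] * n_workers
    for i, cost in enumerate(costs):
        loads[i % n_workers] += cost
    return max(loads)


def queue_makespan(costs, n_workers):
    """Finish time when each task, taken in the given order, goes to
    whichever worker becomes idle first (a shared work queue)."""
    loads = [0] * n_workers
    heapq.heapify(loads)
    for cost in costs:
        earliest = heapq.heappop(loads)  # worker that frees up first
        heapq.heappush(loads, earliest + cost)
    return max(loads)


if __name__ == "__main__":
    costs = [1, 1, 1, 1, 8]  # invented costs: one hard task arrives last
    print(static_makespan(costs, 2))
    print(queue_makespan(costs, 2))
    # Scheduling the estimated-hardest tasks first shortens the tail.
    print(queue_makespan(sorted(costs, reverse=True), 2))
```

In this toy case a shared queue alone does not help, because the expensive task is still started last; sorting by estimated cost first lets the cheap tasks fill in around it. Whether task cost can be estimated cheaply for real alignments is a separate question.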

Aarch64 muscle built from source fails

Hello,

Since only a Mac arm64 version is provided, I tried to build my own for a distribution on aarch64 ubuntu.

The build finished with zero errors, but when I ran muscle with no parameters I got this:

---Fatal error---
myutils.cpp(122) assert failed: sizeof(void *) == 4

Tag a release

Hi @rcedgar,
thanks for publishing MUSCLE on GitHub! I'm working on updating the MUSCLE package on Bioconda. Bioconda strongly recommends using releases (or at least Git tags) in package recipes, which is also general good practice for reproducibility. Would you please be able to tag a release on GitHub?

Segfault with long sequences

To align a pair of sequences of length L, muscle requires at least 5 x L^2 bytes of memory.

The upper limit on L is currently a bit less than 30,000 letters. Unfortunately, this is close to the length of a Cov-2 genome, and an attempt to align Cov-2 genomes with muscle usually fails for this reason.

Currently, muscle segfaults if this limit is exceeded. This is a bug: it should give a graceful error message and exit.
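As a quick illustration of the 5 x L^2 estimate, the Python sketch below computes the lower bound and flags sequences that are too long before an alignment is attempted. The function names are hypothetical, and the default limit of 30,000 is an assumption standing in for the "bit less than 30,000" figure, whose exact value is not given here.

```python
def min_pair_memory_bytes(length):
    """Lower bound on memory for aligning two sequences of this length,
    using the 5 x L^2 bytes estimate."""
    return 5 * length * length


def sequences_over_limit(lengths_by_label, limit=30_000):
    """Return labels whose length reaches the limit.

    The default limit is an assumption; the real threshold is described
    only as a bit less than 30,000 letters.
    """
    return [label for label, length in lengths_by_label.items()
            if length >= limit]


if __name__ == "__main__":
    for length in (1_000, 10_000, 30_000):
        gigabytes = min_pair_memory_bytes(length) / 1e9
        print(f"L={length}: at least {gigabytes:.3f} GB")
```

At L = 30,000 the bound is 4.5 GB per pair, which shows how quickly the quadratic term dominates for genome-length sequences.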

Segmentation fault issue while running Muscle 5.1.linux_intel64

Hello!

I have downloaded muscle version 5.1 (linux_intel64) and ran an alignment of this sequence file.

Command: ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -output corr_analysis/aligned_aligned_sequences.fa -refineiters 2
Output:

❯ ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -output corr_analysis/aligned_aligned_sequences.fa -refineiters 2

muscle 5.1.linux64 [12f0e2]  32.6Gb RAM, 12 cores
Built Jan 13 2022 23:17:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 35 seqs, avg length 29834, max 31491

00:00 5.1Mb  CPU has 12 cores, running 12 threads
[1]    22676 segmentation fault (core dumped)  ./muscle5.1.linux_intel64 -align corr_analysis/data/35_sequences.fasta -outpu

Same issue happens with muscle_v5.0.1428_linux.

Is there any known workaround for the issue?

version, tag, and create releases

The version info for muscle5 is contained in ver_counter.h (referenced incorrectly in the code itself as ../ver/counter.h). This file contains a version string of 1219, while the executable in the binaries directory shows 1278. There are no tags or releases.

Please fix the file reference, preferably renaming it to the standard "version.h". Then tag and create a release.

The release is where the binaries should live, not in the repo. You will care if you are creating releases frequently and run out of push bandwidth.

Where did profile-profile alignment go in v5?

There is still ample documentation on profile-profile alignment from v3.8, but searching the source code for v5, I'm not finding any references to it. Has it been made obsolete or replaced by a new command?

Thanks a lot for your work.

maxgapfract not working in Muscle 5.1.linux64 [12f0e2]

Hello,

just a quick issue - the maxgapfract option doesn't seem to work:

muscle5 -resample sequences.ensemble.efa -maxgapfract 0.3 -output sequences.ensemble.gap03.efa

the output:


Invalid command line
Unknown option maxgapfract

Am I doing something wrong?

Thanks,
Lukasz

Multiple fasta files error

Error: Command-line option "ABC.fasta" must start with '-'

I am trying to align multiple fasta files together in one output file, but it is giving me the error above.

OSX binary not working

Hello! I downloaded the OSX binary but can not get it to run. My OSX version is macOS Monterey 12.0.1. Following the commands recommended in a closed issue, I get the following:

wget https://github.com/rcedgar/muscle/releases/download/v5.0.1428/muscle_v5.0.1428_osx
md5 muscle_v5.0.1428_osx
MD5 (muscle_v5.0.1428_osx) = d00d9f586476704f1a0f5a4fa1703b8c
./muscle_v5.0.1428_osx --version
dyld[88714]: missing symbol called
Abort trap: 6

allocflat.cpp(15) assert failed: uint64(Size) == Size64

Dear Robert,

I am running muscle5 (downloaded binary or self-compiled binary) on an Ubuntu 18.04 machine, and this error message appeared when trying to align more than one sequence:

---Fatal error---
allocflat.cpp(15) assert failed: uint64(Size) == Size64---Fatal error---
allocflat.cpp(15) assert failed: uint64(Size) == Size64

The command is as follows:
muscle5 -super5 allde.fasta -output aln.afa

Best regards,
Runsheng

Unknown option addconfseqs

Hej Robert!
I'm running muscle5 on a computing cluster and all seems to work super smooth, but I've run into a small issue when trying to calculate the column confidence.

muscle -addconfseqs Gene_0107_diversified_ensemble.efa -output Gene_0107_diversified_ensemble_confseqs.efa

Below the output from the log file:

Invalid command line
Unknown option addconfseqs

muscle 5.1.linux64 [] 131Gb RAM, 20 cores
Built Mar 29 2022 15:27:13
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Is this an issue at my end or yours?
Cheers,
Karin

showing Fatal Error i dont know much about it sorry new guy

it happened after it was joining the
01:51:25 901Mb 100.0% Join 81 / 250 [1 x 717, 717 pairs]
01:51:54 901Mb 100.0% Join 82 / 250 [2 x 718, 1436 pairs]
01:52:05 901Mb 100.0% Join 83 / 250 [1 x 720, 720 pairs]
01:52:13 901Mb 100.0% Join 84 / 250 [1 x 721, 721 pairs]
01:52:14 901Mb 100.0% Join 85 / 250 [1 x 722, 722 pairs]
01:52:25 917Mb 100.0% Join 86 / 250 [2 x 723, 1446 pairs]
01:52:34 910Mb 100.0% Join 87 / 250 [2 x 725, 1450 pairs]
01:52:50 911Mb 100.0% Join 88 / 250 [5 x 727, 2000 pairs]
01:52:58 911Mb 100.0% Join 89 / 250 [1 x 732, 732 pairs]
01:53:10 910Mb 100.0% Join 90 / 250 [1 x 733, 733 pairs]
01:53:53 910Mb 100.0% Join 91 / 250 [5 x 734, 2000 pairs]
01:54:00 910Mb 100.0% Join 92 / 250 [1 x 739, 739 pairs]
01:54:05 910Mb 100.0% Join 93 / 250 [1 x 740, 740 pairs]
01:54:09 910Mb 100.0% Join 94 / 250 [1 x 741, 741 pairs]
01:54:14 910Mb 100.0% Join 95 / 250 [1 x 742, 742 pairs]
01:54:18 910Mb 100.0% Join 96 / 250 [1 x 743, 743 pairs]
01:54:46 910Mb 100.0% Join 97 / 250 [24 x 744, 2000 pairs]
01:54:53 911Mb 100.0% Join 98 / 250 [1 x 768, 768 pairs]
01:54:57 910Mb 100.0% Join 99 / 250 [1 x 769, 769 pairs]
01:55:05 910Mb 100.0% Join 100 / 250 [1 x 770, 770 pairs]
01:55:23 910Mb 100.0% Join 101 / 250 [1 x 771, 771 pairs]
01:55:39 912Mb 100.0% Join 102 / 250 [7 x 772, 2000 pairs]
01:56:11 910Mb 100.0% Join 103 / 250 [2 x 779, 1558 pairs]
01:56:26 910Mb 100.0% Join 104 / 250 [1 x 781, 781 pairs]
01:56:44 910Mb 100.0% Join 105 / 250 [2 x 782, 1564 pairs]
01:56:44 910Mb 100.0% Join 106 / 250 [1 x 2, 2 pairs]
01:56:44 910Mb 100.0% Join 107 / 250 [1 x 1, 1 pairs]
01:56:44 910Mb 100.0% Join 108 / 250 [1 x 2, 2 pairs]
01:56:44 929Mb 100.0% Join 109 / 250 [3 x 4, 12 pairs]
01:56:44 916Mb 100.0% Join 110 / 250 [2 x 7, 14 pairs]
01:56:44 916Mb 100.0% Join 111 / 250 [1 x 9, 9 pairs]
01:56:44 910Mb 100.0% Join 112 / 250 [1 x 2, 2 pairs]
01:56:57 910Mb 100.0% Join 113 / 250 [93 x 240, 2000 pairs]
01:57:11 910Mb 100.0% Join 114 / 250 [160 x 333, 2000 pairs]

muscle -super5 1_newfile.fasta -output aln.afa
Elapsed time 01:57:11
Max memory 1.2Gb

---Fatal error---
AssertSeqsEq E:\src\muscle5\pprog2.cpp:32

Error while using super5 command

Dear developer,

From the manual https://drive5.com/muscle5/manual/cmd_super5.html, I noticed there is an option - Super5 - for aligning large sets of sequences. However, while testing with this option with the command muscleWin64.exe -super5 test.fasta -output alignedTest.fasta, it gave an error.

Error:

Invalid command line option "super5"

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.


Basic usage

    muscle -in <inputfile> -out <outputfile>

Common options (for a complete list please see the User Guide):

    -in <inputfile>    Input file in FASTA format (default stdin)
    -out <outputfile>  Output alignment in FASTA format (default stdout)
    -diags             Find diagonals (faster for similar sequences)
    -maxiters <n>      Maximum number of iterations (integer, default 16)
    -maxhours <h>      Maximum time to iterate in hours (default no limit)
    -html              Write output in HTML format (default FASTA)
    -msf               Write output in GCG MSF format (default FASTA)
    -clw               Write output in CLUSTALW format (default FASTA)
    -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
    -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
    -quiet             Do not write progress messages to stderr
    -version           Display version information and exit

Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
Fastest possible (nucleotides): -maxiters 1 -diags

I have tested a normal run with this command: `muscleWin64.exe -in test.fasta -out alignedTest.fasta`. The tool runs properly. So, do you have any suggestions to solve this problem?

Best regards,
Li Chuin Chong

allocflat.cpp(15) assert failed: uint64(Size) == Size64

I have installed the most recent version of muscle with conda and from source and have run into the same issue.

I have a couple hundred thousand sequences so will potentially need to run with the -super5 argument. However, when I run with the -super5 algorithm, I get the following error:

muscle 5.1.linux64 [] 132Gb RAM, 24 cores

Input: 374249 seqs, length avg 1935 max 102417

WARNING: Sequence length >5k may require excessive memory
00:13 1.4Gb 100.0% Derep 335593 uniques, 38655 dupes
00:14 1.5Gb CPU has 24 cores, defaulting to 20 threads

muscle -super5 combined_seq.fasta -output super5_iter2.afa
Elapsed time 00:14

Max memory 1.5Gb
---Fatal error---
allocflat.cpp(15) assert failed: uint64(Size) == Size64

I should note that I am also running with -align, which has not produced any errors.

Request: don't rearrange sequence order

Hi Robert,

Is it possible to keep the order of the sequences in the output file the same as in the input? (I've been getting this with the latest windows and linux pre-compiled binaries, with -align, with each producing a different, seemingly random order).

Cheers,
Seth

p.s. I've been really happy with the quality of alignments from muscle5!
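In the meantime, the input order can be restored after the fact. The Python sketch below is one possible post-processing step, assuming that sequence labels are unique and that the aligner writes them unchanged; both function names are made up for this example.

```python
def read_fasta(path):
    """Read a FASTA file into a list of (label, sequence) tuples."""
    records = []
    label, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if label is not None:
                    records.append((label, "".join(chunks)))
                label, chunks = line[1:], []
            elif label is not None:
                chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def restore_input_order(input_path, aligned_path, out_path):
    """Rewrite the alignment with rows in the order of the input file.

    Assumes labels are unique and identical in both files.
    """
    aligned = dict(read_fasta(aligned_path))
    with open(out_path, "w") as out:
        for label, _ in read_fasta(input_path):
            out.write(f">{label}\n{aligned[label]}\n")
```

A missing label raises `KeyError`, which is deliberate: it signals that the two files do not describe the same set of sequences.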
