
usearch12's Introduction

Usearch implements several popular biological sequence search and clustering algorithms, including USEARCH, UCLUST, UPARSE, UCHIME, UNOISE and SINTAX.

Version 12 is the first open-source version of usearch. Compared to earlier versions, functionality that is sufficiently covered by other open-source projects has been removed. In particular, there is no support for OTU table manipulation or diversity analysis, which are well supported by other tools such as QIIME and DADA2. The goal here is to simplify the package as much as reasonably possible to encourage collaborators to join the open-source project.

Documentation

Docs. web site: https://rcedgar.github.io/usearch12_documentation/

Docs. source: https://github.com/rcedgar/usearch12_documentation

Installation

Download the binary (executable) file for your operating system. There are no dependencies, so that is typically all you need to do. Make sure the execute bit is set if you are using Linux or OSX, and make sure the binary is in your PATH (or, you can type the full path name). For more details see https://rcedgar.github.io/usearch12_documentation/install.html.
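The two checks above (execute bit set, binary reachable through PATH) can be scripted. The following is a minimal sketch, not part of usearch itself; the binary name `usearch12` and the helper names are assumptions made for illustration.

```python
import shutil
import stat
from pathlib import Path


def ensure_executable(path):
    """Add the execute bits to an existing file (relevant on Linux and OSX)."""
    p = Path(path)
    mode = p.stat().st_mode
    p.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return p


def find_on_path(name="usearch12", search_path=None):
    """Return the full path of `name` if it is found on PATH, else None.

    `name` is an assumed file name; use whatever the downloaded binary is called.
    `search_path` overrides the PATH environment variable when given.
    """
    return shutil.which(name, path=search_path)
```

If `find_on_path()` returns `None`, either add the directory that holds the binary to PATH or call the binary by its full path name.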

Building from source

Windows

To build using Microsoft Visual C++ (MSVC), load the solution file usearch12.sln and select Build then Rebuild Solution from the main menu bar.

To build from the command line, run ./build_win.bash from a command prompt. This requires that the MSVC build tools are in your PATH. The build_win.bash script (1) checks that there are no uncommitted changes to the repo, (2) overwrites gitver.txt with the latest commit hash, and (3) runs MSBuild to compile and link usearch12.exe.

Linux

The primary development environment is MSVC. The Linux Makefile is generated automatically by build_linux.py from the MSVC project file usearch12.vcxproj. To build on Linux you need gcc, ccache and make.
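To illustrate the idea of deriving a Makefile from the MSVC project, here is a minimal sketch of reading the list of source files from a .vcxproj file. It is not the actual build_linux.py; the function names are hypothetical, and it assumes only the standard MSBuild layout in which each source file appears as a `ClCompile` element with an `Include` attribute.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

# XML namespace used by MSBuild project files.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"


def list_sources(vcxproj_path):
    """Return the sorted source files named by ClCompile Include attributes."""
    root = ET.parse(vcxproj_path).getroot()
    return sorted(
        item.attrib["Include"]
        for item in root.iter(MSBUILD_NS + "ClCompile")
        # ClCompile also appears as a settings element without Include.
        if "Include" in item.attrib
    )


def object_names(sources):
    """Map each source file to an object file name, e.g. a.cpp -> a.o."""
    return [PureWindowsPath(src).stem + ".o" for src in sources]
```

A generator could then write one compile rule per entry; the details of the real Makefile are not reproduced here.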

A pre-generated Makefile is included. This means that you can run make in the usual way. Generally, this Makefile should not be manually edited because changes will be lost the next time it is generated.

Alternatively you can run the ./build_linux.py script, which (1) checks that there are no uncommitted changes to the repo, (2) overwrites gitver.txt with the latest commit hash, (3) generates a new Makefile from usearch12.vcxproj, and (4) runs make. To run this you need python3 as well.
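Steps (1) and (2) can be sketched as follows. This is an illustration under stated assumptions, not the code of build_linux.py: the helper names are hypothetical, the exact content and format of gitver.txt are assumed (here, the short commit hash on one line), and untracked files are treated as uncommitted changes, which the real script may or may not do.

```python
import subprocess
from pathlib import Path


def run_git(args, repo="."):
    """Run a git command in `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def is_clean(porcelain_output):
    """`git status --porcelain` prints nothing when the working tree is clean."""
    return porcelain_output.strip() == ""


def write_gitver(commit_hash, path="gitver.txt"):
    """Overwrite the version file with the commit hash (assumed format)."""
    Path(path).write_text(commit_hash.strip() + "\n")


def prepare_build(repo="."):
    """Refuse to build from a dirty tree, then record the latest commit hash."""
    if not is_clean(run_git(["status", "--porcelain"], repo)):
        raise SystemExit("Uncommitted changes found; commit or stash them first.")
    commit = run_git(["rev-parse", "--short", "HEAD"], repo).strip()
    write_gitver(commit, Path(repo) / "gitver.txt")
    return commit
```

Checking for a clean tree first means the hash written to gitver.txt identifies exactly the source that was compiled.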

OSX

Building for OSX x86 and M-chip is supported by GitHub Actions.

Older versions of usearch

Binaries for usearch versions 5 through 11 are provided at https://github.com/rcedgar/usearch_old_binaries/, licensed under CC0-1.0 (public domain). There are no plans to provide source code for the older versions.

Citing usearch

Please cite the appropriate paper(s) listed here: https://rcedgar.github.io/usearch12_documentation/citation.html

usearch12's People

Contributors

igortru, rcedgar


usearch12's Issues

QIIME integration

Volunteers solicited to tackle integration of usearch into QIIME. I'm guessing this will be straightforward by starting from the VSEARCH modules. If you'd like to take this on, please reply below.

CMake build system

Hi!

It would be good to have a CMake-based build system for usearch. CMake is a cross-platform build system that makes it quite easy to build and install usearch.

PS: If this would be appreciated, I can create a merge request with a CMake build system.

no ublast?

Hello Robert,

usearch -ublast is not open source, right? But I did see usearch_local there. With some tuning of the default parameters for usearch_local, would we then have ublast?

Thanks,

Jianshu

naive bayesian classifier also not in open source v12

Hello @rcedgar,

I noticed that the NBC algorithm, which is widely used for amplicon/marker gene classification, is also not in v12. Again, I understand that we can rely on the binary for v11.0.667, but it cannot be built into other pipelines or software. SINTAX is available, though, as an alternative to NBC.

I just wanted to ask, and I will close this issue soon.

Jianshu

-fastq_mergepairs not accepting wildcards?

Hi @rcedgar,
I'm testing the new version to draft a q2 wrapper.

With USEARCH 11 the following command works:

usearch  -fastq_mergepairs *R1*fastq -relabel @ -fastqout ../usearch/merge.fq

With usearch12 the wildcard expansion is not accepted:

Command line error, unexpected 'sample4_S0_L000_R1_001.fastq'

However, wildcards appear to be supported according to the docs.
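Until this is resolved, one possible workaround for a wrapper is to expand the pattern itself and run one merge per forward-read file. The sketch below only builds the command lines, using the options from the command above. It assumes that usearch12 accepts a single forward-read file and locates the matching reverse-read file on its own, which has not been verified here; the function name and the per-sample output file naming are hypothetical.

```python
from pathlib import Path


def build_merge_commands(r1_files, out_dir, exe="usearch"):
    """Build one -fastq_mergepairs command per forward-read (R1) file.

    Each command writes to its own output file in `out_dir`; the naming
    scheme (<R1 stem>.merged.fq) is an assumption made for illustration.
    """
    commands = []
    for r1 in sorted(r1_files):
        out = Path(out_dir) / (Path(r1).stem + ".merged.fq")
        commands.append(
            [exe, "-fastq_mergepairs", r1, "-relabel", "@", "-fastqout", str(out)]
        )
    return commands
```

A wrapper could pass `glob.glob("*R1*fastq")` as `r1_files`, run each command with `subprocess.run(cmd, check=True)`, and then concatenate the per-sample outputs into a single merge.fq.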

Request to reintegrate the `calc_distmx` command

Hello Robert,

First and foremost, I would like to extend my deepest gratitude for your decision to open-source USEARCH. This move is greatly appreciated by the bioinformatics community and will undoubtedly foster an environment of increased collaboration.

I am writing to inquire about the possibility of reintegrating the calc_distmx command into the open-source version of USEARCH. While I understand and support the rationale behind simplifying the project, I believe that the calc_distmx command is quite unique and is not covered by other projects. It has been particularly useful for many users, including our team.

Thank you for considering this request.
With kind regards,
Vladimir

-otutab command zotus in -otutabout missing in -dbmatched output

I previously tested on a smaller dataset and assumed that the zotus in the -dbmatched output have a 1:1 relationship with the zotus in the -otutabout output. Recently I processed a larger dataset and found that the -dbmatched output occasionally contains more zotus than the zotu table, and that the FASTA file is mixed case even though the input is all upper case:

v11_zotutab_df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 18309 entries, 1d3c1dcc435355d1e30bb3282bff6963 to 2eee628a3d5f319163ee02cff26adc65
v11_mapped_zotus_se.info()
<class 'pandas.core.series.Series'>
Index: 18318 entries, b458921b1baa8e102d81619091df6569 to 9d366aecf793acdffcb91091d39af3eb
v12_zotutab_df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 18294 entries, 72eedd8f95b2df0ddab8e81fa2d547fa to eeb6b8f2d9cf5484f7b90db77e8cd926
v12_mapped_zotus_se.info()
<class 'pandas.core.series.Series'>
Index: 18298 entries, b458921b1baa8e102d81619091df6569 to 9d366aecf793acdffcb91091d39af3eb

 cat zotus.fasta | grep -v '^>' | grep 'a'

 cat matched_zotus_v11.fasta | grep -v '^>' | grep 'a' | head
CCCCGGAACGGCCTCCAAAACTATCAGTCTAGAGTTCGAgagagGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
CCCCGGAACGGCCTCCAAAACTATCGGTCTAGAGTTCGAgagagGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
 cat matched_zotus_v12.fasta | grep -v '^>' | grep 'a' | head
CCCCGGAACGGCCTCCAAAACTATCAGTCTAGAGTTCGAgagagGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA
CCCCGGAACGGCCTCCAAAACTATCGGTCTAGAGTTCGAgagagGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTA

In version 12, the -dbmatched output also seems to occasionally miss zotus that are in the table. I've only encountered this behavior with this specific dataset:

v12_mapped_zotus_se.reindex(v12_zotutab_df.index).loc[v12_mapped_zotus_se.reindex(v12_zotutab_df.index).isna()]
#OTU ID
d23398118462c6ede342aa4a5b451af2 NaN
 cat zotus.fasta | grep 'd23398118462c6ede342aa4a5b451af2'
d23398118462c6ede342aa4a5b451af2
 cat matched_zotus_v12.fasta | grep 'd23398118462c6ede342aa4a5b451af2'
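For anyone who wants to reproduce these checks without pandas, the comparison can be sketched in plain Python. This is a minimal illustration; the function names are hypothetical, and it assumes that the FASTA labels are exactly the zotu IDs used in the table.

```python
def read_fasta(lines):
    """Yield (label, sequence) pairs from an iterable of FASTA lines."""
    label, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def compare_ids(table_ids, fasta_records):
    """Return (IDs only in the table, IDs only in the FASTA file), both sorted."""
    table = set(table_ids)
    matched = {label for label, _ in fasta_records}
    return sorted(table - matched), sorted(matched - table)


def has_lowercase(sequence):
    """True if any letter in the sequence is lower case."""
    return any(ch.islower() for ch in sequence)
```

With the table IDs read from the first column of the -otutabout file and the records read from the -dbmatched file, the first list corresponds to zotus missing from -dbmatched and the second to zotus missing from the table.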

The binary executable version

packages in environment at /anaconda3/envs/q2-usearch-97:
q2-usearch 0+untagged.43.g761d7f9.dirty pypi_0 pypi
usearch 12.0_beta h9ee0642_1 bioconda

The commands I ran

usearch -otutab merged.fastq \
    -zotus zotus.fasta \
    -otutabout zotu_tab_v11.tsv \
    -dbmatched matched_zotus_v11.fasta \
    -notmatched unmapped_v11.fasta \
    -id 0.97 \
    -log otutab_v11.log \
    -threads 10

usearch -otutab merged.fastq \
    -zotus zotus.fasta \
    -otutabout zotu_tab_v12.tsv \
    -dbmatched matched_zotus_v12.fasta \
    -notmatched unmapped_v12.fasta \
    -id 0.97 \
    -log otutab_v12.log \
    -threads 10
