Comments (4)
A related observation is that, for the example above, the behavior disappears when the --maxaccepts option is above 1. The program correctly identifies both sequences as matches and sorts them by decreasing raw score.
vsearch --usearch_global query_sequence.txt --db target_database.txt --userout sample.out --userfields query+target+id+alnlen+mism+gaps+qilo+qihi+tilo+tihi+evalue+bits+raw+pairs+qrow+trow --id 0.65 --maxaccepts 2
from vsearch.
Hi
The search algorithm is heuristic and is not guaranteed to find the best match. It just finds the first match that is good enough, given the identity threshold (65% in your example) and the other criteria specified. The heuristics are based on the number of kmers shared between the sequences, so the match is usually quite good, but occasionally it is not. If you increase --maxaccepts from 1, it will consider more matches and choose the one with the highest identity. That's why the results are better for you in that case. If you set both --maxaccepts and --maxrejects to zero, it will scan the entire database for the best match, but it will take much more time.
Indeed, this is the way usearch behaves too: kmer profile filtering can sometimes favor longer sequences.
As mentioned, increasing --maxaccepts solves the issue. I've created a toy example where t2 has two mismatches with the query Q, and t1 only one, but t2 is longer and has more kmers in common with the query, so it is placed at the top of the list of possible matches. If --maxaccepts is set to 1 (the default value), then t2 is selected.
t1   AGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAATCCAGGATAAGGCTAGATGCGCCCTGGGTCCCGGAGTA
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Q    AGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAAACCAGGATAAGGCTAGATGCGCCCTGGGTCCCGGAGTA
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||||||
t2 AAAGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAATCCAGGATAAGGCTACATGCGCCCTGGGTCCCGGAGTAG
Q="AGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAAACCAGGATAAGGCTAGATGCGCCCTGGGTCCCGGAGTA"
t1="AGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAATCCAGGATAAGGCTAGATGCGCCCTGGGTCCCGGAGTA"
t2="AAAGATAGGGACGTGTACCAATCAGCGTTGTTCTGCCTCGTGAATCCGAACATAGGCACTTATTTCGAATCCAGGATAAGGCTACATGCGCCCTGGGTCCCGGAGTAG"
vsearch \
--usearch_global <(printf ">q1\n%s\n" "${Q}") \
--db <(printf ">t1\n%s\n>t2\n%s\n" "${t1}" "${t2}") \
--wordlength 3 \
--id 0.9 \
--quiet \
--userfields query+target+id \
--userout -
(see frederic-mahe/vsearch-tests@9a875b0 for more details on the selection of Q and more tests)
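To see why t2 can outrank t1 before any alignment is computed, here is a rough sketch that counts how many distinct kmers of the query also occur in a target. The function name shared_kmers is made up for this illustration, and counting each distinct query kmer once is an assumption about the general idea, not a copy of vsearch's actual filtering code.

```shell
# Hypothetical helper, not part of vsearch: count the distinct k-mers of a
# query sequence that also occur somewhere in a target sequence.
# Usage: shared_kmers QUERY TARGET K
shared_kmers() {
    awk -v q="$1" -v t="$2" -v k="$3" 'BEGIN {
        # index every k-mer present in the target
        for (i = 1; i + k - 1 <= length(t); i++)
            in_target[substr(t, i, k)] = 1
        # count each distinct query k-mer once if the target contains it
        n = 0
        for (i = 1; i + k - 1 <= length(q); i++) {
            w = substr(q, i, k)
            if ((w in in_target) && !(w in seen)) {
                seen[w] = 1
                n++
            }
        }
        print n
    }'
}
```

With the variables from the toy example, compare `shared_kmers "${Q}" "${t1}" 3` with `shared_kmers "${Q}" "${t2}" 3`. Under this counting scheme the value should come out higher for t2, because the 3-mer AAA occurs in Q and at the start of t2 but nowhere in t1.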
This makes sense, thanks for the responsiveness!