jermp / lphash
Fast and compact locality-preserving minimal perfect hashing for k-mer sets.
License: MIT License
Hello,
Would it be possible to add multithreading to the query-p command? A very simple implementation would have each thread process a different subset of the sequences in the input FASTA file. Thanks for any assistance.
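A minimal sketch of the per-sequence split suggested above, assuming each sequence can be processed independently. process_in_parallel and the per-sequence callback are hypothetical names, not part of lphash; in a real query tool the callback would stream the sequence's k-mers through the MPHF. Contiguous blocks of sequences go to separate std::threads, and each thread writes only its own result slots, so no locking is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-sequence work (e.g. querying every k-mer of the sequence);
// it returns one number per sequence.
using per_sequence_fn = std::function<std::uint64_t(const std::string&)>;

// Splits `seqs` into at most `num_threads` contiguous blocks and processes each
// block in its own thread. Threads write to disjoint slots of `results`.
std::vector<std::uint64_t> process_in_parallel(const std::vector<std::string>& seqs,
                                               std::size_t num_threads,
                                               const per_sequence_fn& work) {
    std::vector<std::uint64_t> results(seqs.size(), 0);
    if (seqs.empty()) return results;
    num_threads = std::max<std::size_t>(1, std::min(num_threads, seqs.size()));
    const std::size_t block = (seqs.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * block;
        const std::size_t end = std::min(seqs.size(), begin + block);
        if (begin >= end) break;  // fewer blocks than threads
        workers.emplace_back([&results, &seqs, &work, begin, end] {
            for (std::size_t i = begin; i != end; ++i) results[i] = work(seqs[i]);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

If sequences are read from a FASTA file one at a time, they would have to be loaded (or their offsets indexed) before being split this way; a shared work queue is a common alternative that keeps memory bounded.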
Hello and thank you for this great tool! I was wondering if it would be possible to associate metadata with the k-mers. The paper mentions abundance counts, reference identifiers, or contig identifiers as possible satellite values. How would one go about querying your lphash database for this information? Thanks for any insights.
(I understand that the database size and database construction time would not be optimized for this use case, but I am most interested in keeping a fast lookup time.)
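One standard way to attach satellite values to an MPHF: because a minimal perfect hash maps the n indexed k-mers one-to-one onto [0, n), the values can be stored in a plain array at position h(k-mer), so a lookup costs one hash evaluation plus one array access. The sketch below assumes nothing about lphash's actual API: sorted_rank_mphf is a hypothetical stand-in (the rank of a key in a sorted array is a minimal perfect hash, just not a compact one), and satellite_store is likewise illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MPHF (NOT lphash's API): the rank of a key in
// a sorted, duplicate-free array is a minimal perfect hash onto [0, n),
// although storing the keys makes it far from compact.
class sorted_rank_mphf {
public:
    explicit sorted_rank_mphf(std::vector<std::string> keys) : m_keys(std::move(keys)) {
        std::sort(m_keys.begin(), m_keys.end());
        m_keys.erase(std::unique(m_keys.begin(), m_keys.end()), m_keys.end());
    }
    std::size_t size() const { return m_keys.size(); }
    // Like any MPHF, it returns some value in [0, n) even for keys outside the set.
    std::uint64_t operator()(const std::string& key) const {
        if (m_keys.empty()) return 0;
        auto it = std::lower_bound(m_keys.begin(), m_keys.end(), key);
        const std::size_t pos = static_cast<std::size_t>(it - m_keys.begin());
        return pos == m_keys.size() ? pos - 1 : pos;
    }

private:
    std::vector<std::string> m_keys;
};

// Satellite values (counts, reference or contig ids, ...) live in a plain array
// indexed by the hash value.
template <typename Value>
class satellite_store {
public:
    satellite_store(sorted_rank_mphf f,
                    const std::vector<std::pair<std::string, Value>>& kmer_values)
        : m_f(std::move(f)), m_values(m_f.size()) {
        for (const auto& kv : kmer_values) m_values[m_f(kv.first)] = kv.second;
    }
    // For k-mers outside the indexed set this returns the value of an arbitrary slot.
    const Value& lookup(const std::string& kmer) const { return m_values[m_f(kmer)]; }

private:
    sorted_rank_mphf m_f;
    std::vector<Value> m_values;
};
```

With a locality-preserving MPHF such as lphash, consecutive k-mers of a query sequence tend to receive consecutive hash values, so the array accesses for a streamed query should also tend to be cache-friendly.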
Hello,
The build time I observe is much higher than the times reported in Table 5 of the paper. What value of k do the results in Table 5 correspond to? I downloaded both the k=31 and k=63 human files from Zenodo and ran lphash to build the database.
Specifically, I ran:
lphash build-p -i human.k63.unitigs.fa.ust.fa.gz -c 5 -k 63 -m 28 --check -o human_c5_k63_m28.lph --verbose -t 4
How long is this expected to take? Thank you.
Hello,
I ran lphash on the S. cerevisiae reference genome and received the following error:
[Error] different hashes, maybe there were some Ns in the input (not supported as of now)
Specifically, I first ran bcalm with -kmer-size 31 and -abundance-min 1, then I ran ust with -k 31, then I ran lphash build-p with -k 31 and -m 15. As far as I can tell, there are no Ns in the genome.
Is this an error I should be concerned about? Thanks for your assistance.
I do notice that the genome has upper and lower case letters. Does lphash convert everything to uppercase for both building and querying?
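Whether lphash itself uppercases its input is not settled here. As a hedged pre-processing step, the sketch below uppercases soft-masked (lowercase) bases and reports every remaining character outside A, C, G, T, such as N or IUPAC ambiguity codes like R or Y. This offers a quick way to check whether the input really contains only ACGT. normalize_and_find_non_acgt is a hypothetical helper, not part of lphash.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Uppercases `seq` in place (soft-masked "acgt" becomes "ACGT") and returns the
// positions of characters that are still not A, C, G or T afterwards.
std::vector<std::size_t> normalize_and_find_non_acgt(std::string& seq) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i != seq.size(); ++i) {
        const char c = static_cast<char>(std::toupper(static_cast<unsigned char>(seq[i])));
        seq[i] = c;
        if (c != 'A' && c != 'C' && c != 'G' && c != 'T') bad.push_back(i);
    }
    return bad;
}
```

For example, "acgtNNacRT" becomes "ACGTNNACRT", with non-ACGT characters at positions 4, 5 and 8.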
Hello,
Based on my testing, lphash doesn't convert k-mers into their canonical representation. I would expect a k-mer and its reverse complement to hash to the same value.
For example, BCALM 2 converts all k-mers into their canonical representation with respect to reverse complements.
I believe that the k-mers will need to be converted to their canonical form during both the building and the querying steps. Thanks for any insight.
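A minimal sketch of canonicalization, assuming the convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement. Other tools may order bases differently (for example, under a different 2-bit encoding), so this is not claimed to match BCALM 2 or lphash exactly; what matters is applying the same rule at build time and at query time. reverse_complement and canonical are illustrative helpers.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Reverse complement of a DNA string over {A, C, G, T}.
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: throw std::invalid_argument("non-ACGT character in k-mer");
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of the k-mer and its reverse
// complement, so both strands map to the same string (and hence the same hash).
std::string canonical(const std::string& kmer) {
    const std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}
```

For example, canonical("TTT") and canonical("AAA") both give "AAA", so a k-mer and its reverse complement would then receive the same hash.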
Also, what is the expected output when you query lphash for a k-mer that is not in the database? Does lphash return a particular value? I'm not 100% sure, but I think it currently still outputs a hash value, which collides with the hash of some k-mer that is actually in the database. Thanks for any insights.
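A common way to detect k-mers outside the indexed set, sketched under the assumption that some MPHF is available as a callable. An MPHF returns some value in [0, n) for any input, so a k-mer that was never indexed silently collides with an indexed one. Storing a small fingerprint of each key at its hash position lets most absent keys be rejected; with 8-bit fingerprints, and assuming the fingerprint hash behaves like a random function, roughly 1 in 256 absent keys is still accepted. checked_mphf is hypothetical and not part of lphash.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: pairs any MPHF, given as a callable mapping the n
// indexed keys one-to-one onto [0, n), with an 8-bit fingerprint per slot.
class checked_mphf {
public:
    using mphf_fn = std::function<std::uint64_t(const std::string&)>;

    checked_mphf(mphf_fn mphf, const std::vector<std::string>& keys)
        : m_mphf(std::move(mphf)), m_fingerprints(keys.size(), 0) {
        for (const auto& key : keys) m_fingerprints[m_mphf(key)] = fingerprint(key);
    }

    // Returns the hash value if the stored fingerprint matches, std::nullopt
    // otherwise. Indexed keys always match (given a truly perfect hash); an
    // absent key is wrongly accepted only if its fingerprint happens to equal
    // the one stored in the slot it lands on.
    std::optional<std::uint64_t> lookup(const std::string& key) const {
        const std::uint64_t h = m_mphf(key);
        if (h >= m_fingerprints.size() || m_fingerprints[h] != fingerprint(key)) return std::nullopt;
        return h;
    }

private:
    // Low byte of std::hash; any hash independent of the MPHF would do.
    static std::uint8_t fingerprint(const std::string& key) {
        return static_cast<std::uint8_t>(std::hash<std::string>{}(key) & 0xFF);
    }

    mphf_fn m_mphf;
    std::vector<std::uint8_t> m_fingerprints;
};
```

More fingerprint bits lower the false-positive rate at the cost of extra space per k-mer.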
- Update PTHash to the latest commit, so that cmd_line_parser does not take any positional arguments.
- Use the #pragma once directive instead of if-define guards.
- Move external dependencies, such as kseq.h and BooPHF.hpp, to external/ (or another directory).
- … prettyprint.hpp …
- Format the code according to .clang-format. For example, do not use if-else without braces; only one-liner ifs may omit braces.
- src/ should contain only the executables for building and performing queries. We could also have a single driver program named lphash that takes as input an argument, build or query, specifying the sub-tool to use. See an example here: https://github.com/jermp/sshash-lite/blob/main/src/sshash-lite.cpp. This also implies having a single tool to build both data structures, partitioned and un-partitioned (currently called -alt).
- … tests/ …
- Use a build_configuration class to build the data structures with default parameters, as used here and here for example. A build_configuration object is then passed as input to the constructor of the mphf.
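The build_configuration and single-driver items above can be sketched as follows. Everything here is hypothetical: the fields, their defaults, and the run_build/run_query placeholders are illustrative, and the -k, -m and -t options only mirror the build-p command quoted earlier on this page. A default-constructed build_configuration holds the defaults, parse_build_args overrides them, and the resulting object is passed to the mphf constructor. dispatch selects the sub-tool from the first argument.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical configuration: field names and defaults are illustrative only.
struct build_configuration {
    std::uint64_t k = 31;            // k-mer length
    std::uint64_t m = 15;            // minimizer length
    std::uint64_t num_threads = 1;
};

// Hypothetical MPHF type whose constructor receives the configuration object.
class mphf {
public:
    explicit mphf(const build_configuration& config) : m_config(config) {}
    const build_configuration& config() const { return m_config; }

private:
    build_configuration m_config;
};

// Starts from the defaults and overrides them with "-k", "-m" and "-t" options.
build_configuration parse_build_args(const std::vector<std::string>& args) {
    if (args.size() % 2 != 0) throw std::invalid_argument("each option needs a value");
    build_configuration config;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::uint64_t value = std::stoull(args[i + 1]);
        if (args[i] == "-k") {
            config.k = value;
        } else if (args[i] == "-m") {
            config.m = value;
        } else if (args[i] == "-t") {
            config.num_threads = value;
        } else {
            throw std::invalid_argument("unknown option: " + args[i]);
        }
    }
    return config;
}

int run_build(const std::vector<std::string>& args) {
    mphf f(parse_build_args(args));  // reading k-mers and saving the index omitted
    return f.config().k > 0 ? 0 : 1;
}

int run_query(const std::vector<std::string>& /*args*/) {
    return 0;  // placeholder: load the index and stream the queries
}

// Single driver: the first argument selects the sub-tool, the rest is forwarded.
int dispatch(const std::vector<std::string>& argv) {
    if (argv.size() < 2) {
        std::cerr << "usage: lphash <build|query> [options]\n";
        return 1;
    }
    const std::vector<std::string> rest(argv.begin() + 2, argv.end());
    if (argv[1] == "build") return run_build(rest);
    if (argv[1] == "query") return run_query(rest);
    std::cerr << "unknown sub-tool: " << argv[1] << '\n';
    return 1;
}
```

A real main would simply forward its arguments, e.g. return dispatch({argv, argv + argc});, so adding a new sub-tool only means adding one more branch in dispatch.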