Comments (9)
thanks for the quick reply! I think I changed the wrong line (https://github.com/jermp/sshash/blob/master/include/kmer.hpp#L6), will let you know if this works.
Just testing fulgor vs themisto v3 with mSWEEP for a certain downstream application for now.
from fulgor.
I forgot to mention that this happens after fulgor enters the following step:
building minimizers MPHF (PTHash) with 8 threads...
Hi,
yes, it means there are more than 1B minimizers for m=20, hence the minimal perfect hash function used in SSHash (which is PTHash) would require 128-bit hash codes to be safe.
It should be enough to change the default hasher in SSHash for minimizers, here: https://github.com/jermp/sshash/blob/master/include/hash_util.hpp#L51.
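To illustrate why 64-bit hash codes become risky at this scale, here is a minimal sketch of the standard birthday-bound estimate. It assumes uniformly distributed hash codes and is not the check that PTHash or SSHash actually performs; the function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(num_keys: int, hash_bits: int) -> float:
    """Birthday-bound estimate of the chance that at least two of
    `num_keys` uniformly random `hash_bits`-bit hash codes collide.

    Illustrative approximation only, not code from PTHash or SSHash.
    """
    # Expected number of colliding pairs: C(n, 2) / 2^bits.
    expected_pairs = num_keys * (num_keys - 1) / 2 / 2**hash_bits
    # P(at least one collision) ~= 1 - exp(-expected_pairs);
    # expm1 keeps precision when the value is tiny.
    return -math.expm1(-expected_pairs)


if __name__ == "__main__":
    for bits in (64, 128):
        p = collision_probability(10**9, bits)
        print(f"{bits}-bit codes, 1e9 keys: P(collision) ~ {p:.3g}")
```

Under these assumptions, one billion keys with 64-bit codes give a collision chance of a few percent, while with 128-bit codes it is negligible.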
Should you have some update on the Themisto index, I'd love to know. Thanks!
Thanks for the info @tmaklin :). One thing (of which I'm certain you are aware) is that the current approaches for reporting the pseudoalignment results are far from space optimal.
I'm aware this is something you've worked on in alignment-writer. If it makes sense to design and converge on a more standard and compact output format for these tools, this is something that we'd certainly be interested in. For example, we have a binary format (the RAD format) that we use in alevin-fry that addresses a variant of this problem. However, it would be nice to generalize and to understand what information makes sense for different use cases.
Cheers,
Rob
Indexing the data overnight worked so closing this. Thanks!
Re the file format, a standardized and compact output for all the different tools sounds great! Alignment-writer (think I need to figure out a better name) is a wrapper around BitMagic and achieves anything from 10x to 100x compression on the test cases I used while developing, but the efficiency naturally depends on the complexity of the alignment. I'll be adding support for the format fulgor currently uses soon.
Some issues I've noticed with the formats while developing and using the various tools, roughly in order of headaches caused:
- Total number of alignment targets can't be inferred with certainty from the format.
- Not printing empty lines for no alignments.
- Fragment names instead of the position of the read in the fastq files (makes sorting the file difficult and slows down matching the alignments with the reads if the file is not sorted).
- Total number of reads can't be inferred.
- Multiple files to store the results (for example unique alignments + their counts).
- I tend to prefer formats that support streaming the results rather than having to wait for the whole alignment to finish, or conversely read in the whole file before processing the results.
Would be nice to discuss/work on this at some point.
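As a rough illustration of the points in the list above, here is a minimal sketch of a line-based layout that would avoid most of them. Everything in it is hypothetical: the header keys `targets` and `reads`, the tab-separated layout, and the function names are assumptions for illustration, not the format used by fulgor, Themisto, or alignment-writer. It also assumes the read count is known before writing; a truly streaming writer might have to put it in a trailer instead.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple


def write_alignments(out: TextIO, num_targets: int, num_reads: int,
                     hits_per_read: Iterable[List[int]]) -> None:
    """Write one header line, then one line per read in input order."""
    # The header makes the total target and read counts explicit.
    out.write(f"#targets={num_targets}\treads={num_reads}\n")
    for read_pos, hits in enumerate(hits_per_read):
        # Reads are identified by position, not fragment name, and a
        # read with no hits still gets a line holding only its position.
        out.write("\t".join(str(x) for x in [read_pos, *sorted(hits)]) + "\n")


def read_alignments(inp: TextIO) -> Tuple[int, int, Iterator[Tuple[int, List[int]]]]:
    """Parse the header eagerly and yield records lazily (streaming)."""
    header = inp.readline().rstrip("\n").lstrip("#")
    fields = dict(kv.split("=") for kv in header.split("\t"))

    def records() -> Iterator[Tuple[int, List[int]]]:
        for line in inp:
            parts = line.rstrip("\n").split("\t")
            yield int(parts[0]), [int(p) for p in parts[1:]]

    return int(fields["targets"]), int(fields["reads"]), records()


if __name__ == "__main__":
    buf = io.StringIO()
    write_alignments(buf, num_targets=4, num_reads=3,
                     hits_per_read=[[2, 0], [], [3]])
    buf.seek(0)
    n_targets, n_reads, recs = read_alignments(buf)
    print(n_targets, n_reads, list(recs))
```

A single file with a header and position-keyed records keeps the output sortable and lets a consumer process reads as they arrive; compression, as in alignment-writer, would be layered on top.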
Hi @tmaklin,
yes, a common alignment format would be very nice. Happy to work on this together if you like.
Likewise. I have some thoughts on this as well :). Shall we open a separate issue for it? Or, even better, @jermp, if you can enable “discussions” on this repo.
Good idea, discussions enabled!