Comments (7)
The implementation may differ from one software package to another. The end goal is to incorporate a Bloom filter into our containers to filter the data in a pipe.
- BIGSI is one example, although it works backwards from what we're hoping to do. A container is already available for it, so we would only have to modify it to accept data from a stream and/or incorporate it into the serratus-dl containers.
I am honestly not well versed in all the available options, so they need to be researched. We then need to compare the speed and efficiency of each Bloom filter to ensure it doesn't slow down the whole pipeline. We can therefore break this task down into the following objectives:
- Curate a list of available Bloom filter software. For each entry, include the license, whether it has a command-line interface, the language it's written in, the website, and an academic paper reference if available.
- Build each software package in a container and test how long it takes to process a benchmark set of RNA-sequencing libraries against a small test filter.
- If needed, determine whether we can alter the program to accept RNA-sequencing reads (in FASTQ format) from a data pipe.
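To make the last objective concrete, here is a rough sketch of what "filter FASTQ reads from a pipe" could look like. It is not taken from any of the tools discussed in this issue: the function names, the k-mer length, and the `min_hits` threshold are all illustrative assumptions, and a plain Python `set` stands in for the Bloom filter (anything that supports `in` would work).

```python
from typing import Container, Iterator, TextIO, Tuple

# header, sequence, separator ("+"), quality
FastqRecord = Tuple[str, str, str, str]


def read_fastq(stream: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from a text stream.

    Assumes the common one-line-per-field layout (no wrapped sequences).
    """
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = stream.readline()
        plus = stream.readline()
        qual = stream.readline()
        if not qual:
            raise ValueError("truncated FASTQ record")
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               plus.rstrip("\n"), qual.rstrip("\n"))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def filter_fastq(src: TextIO, dst: TextIO, index: Container[str],
                 k: int, min_hits: int = 1) -> int:
    """Copy reads with at least `min_hits` k-mers found in `index`.

    `index` is a hypothetical stand-in for a Bloom filter: any object
    that supports the `in` operator on k-mer strings.
    Returns the number of reads written to `dst`.
    """
    kept = 0
    for header, seq, plus, qual in read_fastq(src):
        hits = sum(1 for kmer in kmers(seq.upper(), k) if kmer in index)
        if hits >= min_hits:
            dst.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept
```

Inside a container this would be called as `filter_fastq(sys.stdin, sys.stdout, index, k)` so that it sits between two pipe stages. Because reads are handled one record at a time, memory use stays flat regardless of library size; whether pure Python is fast enough for the pipeline is exactly what the benchmark objective would have to answer.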
See Also: Wikipedia
from serratus.
@emreerhan we could use your sage advice on this issue when you get a chance.
@jefftaylor42 @ababaian
- What exactly do you mean by software implementation? A library? A standalone program? An AWS functionality?
- In which programming language should it be?
@jefftaylor42 @ababaian I suggest adding the tag Bioinformatics to this issue.
My first thought is the bloom filter library developed by the Birol lab and used in BioBloom tools: https://github.com/bcgsc/biobloom
License: GPLv3 (free for academic use; not sure if this is an issue)
Language: C++; also has a command-line interface
doi: 10.1093/bioinformatics/btu558
BIGSI has an incredibly easy-to-use Bloom filter class: https://github.com/Phelimb/BIGSI/blob/master/bigsi/bloom/bloomfilter.py
License: MIT
Language: Python
doi: 10.1038/s41587-018-0010-1
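For anyone new to the data structure, here is a minimal, standard-library-only sketch of a Bloom filter. It is not BIGSI's or BioBloom's API; the class name, the double-hashing scheme built on BLAKE2b, and the `optimal_parameters` helper are assumptions made for illustration. The sizing helper uses the textbook formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.

```python
import hashlib
import math
from typing import Iterator, Tuple


def optimal_parameters(expected_items: int,
                       false_positive_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) from the textbook sizing formulas."""
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive all bit positions from two 64-bit values.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )
```

Usage would look like `bf = BloomFilter(*optimal_parameters(1_000_000, 0.01))`, then `bf.add(kmer)` while building the filter and `kmer in bf` while querying. A production filter for read screening would more likely use one of the compiled libraries listed above; this sketch is only meant to show the moving parts.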
We'll tentatively shut down Bloom Filters until there is a good reason to re-visit them.