peterjoel / needle
Search algorithms written in Rust
License: MIT License
Compiling on the stable release channel gives this error:
error[E0554]: #![feature] may not be used on the stable release channel
 --> C:\Users\pdaniels.cargo\registry\src\github.com-1ecc6299db9ec823\needle-0.1.1\src\lib.rs:2:1
  |
2 | #![feature(test)]
  | ^^^^^^^^^^^^^^^^^

error: aborting due to previous error

error: Could not compile needle.
Accessing the good suffix table can have a detrimental effect on the speed of the search. We may be able to detect when a search can be done without it. We may need to introduce a trait to capture the algorithm variants in order to keep it fast.
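One well-known variant that never touches a good suffix table is Horspool's simplification of Boyer-Moore, which shifts using a bad-character table only. The sketch below is an illustration of that variant, not needle's implementation; `horspool_find_all` is a hypothetical name.

```rust
/// Horspool-style search: uses only a bad-character table, no good suffix table.
/// Returns the start index of every match, including overlapping ones.
fn horspool_find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    let m = needle.len();
    let mut matches = Vec::new();
    if m == 0 || haystack.len() < m {
        return matches;
    }
    // shift[b] = distance from the last occurrence of b in needle[..m - 1]
    // to the end of the needle; bytes not in the needle shift by m.
    let mut shift = [m; 256];
    for (i, &b) in needle[..m - 1].iter().enumerate() {
        shift[b as usize] = m - 1 - i;
    }
    let mut pos = 0;
    while pos + m <= haystack.len() {
        if &haystack[pos..pos + m] == needle {
            matches.push(pos);
        }
        // Shift by the entry for the haystack byte under the needle's last position.
        pos += shift[haystack[pos + m - 1] as usize];
    }
    matches
}
```

Whether this beats the full algorithm depends on the needle and the alphabet, so it would need benchmarking before choosing between variants.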
The search is currently implemented with random access into the haystack. Allow inputs to be iterators as well. If possible, use a common trait, but a separate API is OK too.
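A rough sketch of what iterator input could look like, assuming a plain sliding-window comparison rather than Boyer-Moore (skipping ahead is harder without random access). `find_in_iter` is a hypothetical name, not part of needle.

```rust
use std::collections::VecDeque;

/// Lazily yields the start index of each match of `needle` in the items
/// produced by `haystack`. Only the last `needle.len()` items are buffered.
fn find_in_iter<'a, T, I>(haystack: I, needle: &'a [T]) -> impl Iterator<Item = usize> + 'a
where
    T: PartialEq + 'a,
    I: IntoIterator<Item = T>,
    I::IntoIter: 'a,
{
    let mut window: VecDeque<T> = VecDeque::with_capacity(needle.len());
    haystack.into_iter().enumerate().filter_map(move |(i, item)| {
        if needle.is_empty() {
            return None;
        }
        // Keep the window at most needle.len() items long.
        if window.len() == needle.len() {
            window.pop_front();
        }
        window.push_back(item);
        if window.len() == needle.len() && window.iter().eq(needle.iter()) {
            Some(i + 1 - needle.len())
        } else {
            None
        }
    })
}
```

Because the result is itself an iterator, the haystack is consumed only as far as the caller asks for matches.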
Searching shouldn't be limited to &[u8]. We should support &[T] (possibly with some constraints), or even use an Iterator<Item = T> as input.
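To make "some constraints" concrete: a bad-character shift table is usually a 256-entry array indexed by byte, which does not generalise to arbitrary `T`. One possible substitute, assuming `T: Eq + Hash`, is a `HashMap` keyed by element. This is only a sketch of the table, and `bad_char_shifts` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Builds a sparse bad-character table for any hashable element type.
/// Each element in needle[..len - 1] maps to the distance from its last
/// occurrence to the end of the needle. Elements missing from the map
/// would shift by the full needle length.
fn bad_char_shifts<T: Eq + Hash>(needle: &[T]) -> HashMap<&T, usize> {
    let m = needle.len();
    needle
        .iter()
        .take(m.saturating_sub(1))
        .enumerate()
        // Later (rightmost) occurrences overwrite earlier ones on collect.
        .map(|(i, item)| (item, m - 1 - i))
        .collect()
}
```

Hashing on every lookup costs more than indexing an array, so a `u8` fast path would probably still be worth keeping.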
Add an optional max_threads attribute to split the search out into multiple threads.
This might be complicated (or not really make sense at all) for an iterator- or stream-based API. For finding all matches of a known-length string, it could work though.
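For the known-length, slice-based case, a minimal sketch could split the candidate start positions across scoped threads. Each thread may read up to `needle.len() - 1` bytes past the end of its own range, so matches that straddle a boundary are not lost. The function name and the naive per-position comparison are assumptions for illustration only.

```rust
use std::thread;

/// Finds all (possibly overlapping) matches using at most `max_threads` threads.
fn find_all_parallel(haystack: &[u8], needle: &[u8], max_threads: usize) -> Vec<usize> {
    let m = needle.len();
    if m == 0 || haystack.len() < m {
        return Vec::new();
    }
    // Number of positions at which a match could start.
    let starts = haystack.len() - m + 1;
    let threads = max_threads.max(1).min(starts);
    let per_thread = (starts + threads - 1) / threads; // ceiling division
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let lo = t * per_thread;
                let hi = ((t + 1) * per_thread).min(starts);
                s.spawn(move || {
                    (lo..hi)
                        .filter(|&i| &haystack[i..i + m] == needle)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the result sorted.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

A real implementation would run the fast search inside each chunk instead of the naive comparison used here.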
#[test]
fn needle_genome() {
    let haystack = b"CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA";
    let needle = needle::BoyerMoore::new(&b"GAAGA"[..]);
    assert_eq!(
        vec![16, 31, 52, 57],
        needle.find_overlapping_in(haystack).collect::<Vec<usize>>()
    );
}
This test fails; needle finds [16, 31, 52].
Am I misusing something, or is this just not finding all the matches? The last one isn't even overlapping.
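The expected positions can be checked independently of needle with a brute-force scan. The helper below (`naive_find_all`, a hypothetical name) uses only the standard library and assumes a non-empty needle.

```rust
/// Brute-force reference: compares the needle against every window.
/// Panics on an empty needle, because `windows(0)` is not allowed.
fn naive_find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(i, _)| i)
        .collect()
}
```

For the haystack and needle in the test above this returns [16, 31, 52, 57]. The match at 52 covers indices 52 to 56, so the one at 57 starts immediately after it and does not overlap.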
These can both be optimised better than find, especially for types like String, because UTF-8 can be treated as [u8] when you don't need to know the index.
Similarly, an iterator-based API could possibly implement fast search-and-replace on UTF-8 because it wouldn't need to count the characters. This is more complicated though, and might not work, depending on whether it would offer an improvement over brute force.
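To illustrate the UTF-8 point: because UTF-8 is self-synchronising, a valid UTF-8 needle can only match a valid UTF-8 haystack on character boundaries, so queries that never report a position can run directly on the bytes. The two helpers below are hypothetical examples of such queries and use a naive scan in place of the real search.

```rust
/// True if `needle` occurs in `haystack`, compared byte-wise.
fn contains_bytes(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    n.is_empty() || h.windows(n.len()).any(|w| w == n)
}

/// Number of (possibly overlapping) occurrences, compared byte-wise.
/// An empty needle is counted as zero occurrences here.
fn count_bytes(haystack: &str, needle: &str) -> usize {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return 0;
    }
    h.windows(n.len()).filter(|w| *w == n).count()
}
```

No char decoding or character counting happens at any point, which is where the saving over an index-returning search on String would come from.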
Add an attribute to disallow overlapping matches.
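The difference in behaviour is small: after a match, an overlapping search advances by one position, while a non-overlapping search resumes after the end of the match. A naive sketch, with a hypothetical `allow_overlap` flag standing in for the attribute:

```rust
/// Finds matches left to right. With `allow_overlap == false`, the search
/// resumes after the end of each match instead of one position later.
fn find_all(haystack: &[u8], needle: &[u8], allow_overlap: bool) -> Vec<usize> {
    let m = needle.len();
    let mut matches = Vec::new();
    if m == 0 {
        return matches;
    }
    let mut pos = 0;
    while pos + m <= haystack.len() {
        if &haystack[pos..pos + m] == needle {
            matches.push(pos);
            pos += if allow_overlap { 1 } else { m };
        } else {
            pos += 1;
        }
    }
    matches
}
```

For example, searching for "aa" in "aaaa" gives [0, 1, 2] with overlaps allowed and [0, 2] without.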
Currently, every item has to be marked #[cfg(test)]. It would be better to put them in a module.
find_first() is very limited. Instead, return an iterator from find(), which will allow the same behaviour as find_first(), using find().next(), but also allow finding all (or some) matches ergonomically.
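A minimal sketch of the shape this could take, with a naive scan standing in for the real search. `Matches` and the free function `find` are hypothetical names and not needle's current API.

```rust
/// Lazy iterator over match positions.
struct Matches<'a> {
    haystack: &'a [u8],
    needle: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for Matches<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let m = self.needle.len();
        if m == 0 {
            return None;
        }
        // Scan forward only until the next match; the rest is left untouched.
        while self.pos + m <= self.haystack.len() {
            let start = self.pos;
            self.pos += 1;
            if &self.haystack[start..start + m] == self.needle {
                return Some(start);
            }
        }
        None
    }
}

fn find<'a>(haystack: &'a [u8], needle: &'a [u8]) -> Matches<'a> {
    Matches { haystack, needle, pos: 0 }
}
```

With this shape, find(h, n).next() does the work of find_first(), find(h, n).take(2) stops after two matches, and collect() gathers all of them.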
Instead of always using ==, there could be opportunities to match on other relations. For example, case-insensitive string matching, or wildcards.
For example, when matching a DNA sequence using the A C G T nucleotide alphabet, a user might want to match X for unknown/any, or even implement this full list of partially known matches. To do that, they would need the ability to redefine "equality".
This might be a bit weird though: this type of matching relation would not be a partial equivalence relation, which is what PartialEq expects, since it would not be transitive when wildcard elements are involved.
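One way to sidestep the PartialEq question is to pass the relation in as a closure, so the element type keeps its ordinary equality. The sketch below uses a naive scan, since shift tables built on exact equality would not be valid for an arbitrary relation. `find_all_by` and `nucleotide_matches` are hypothetical names.

```rust
/// Finds all positions where every haystack element relates to the
/// corresponding needle element under the user-supplied relation `eq`.
fn find_all_by<T, F>(haystack: &[T], needle: &[T], eq: F) -> Vec<usize>
where
    F: Fn(&T, &T) -> bool,
{
    let m = needle.len();
    if m == 0 || haystack.len() < m {
        return Vec::new();
    }
    (0..=haystack.len() - m)
        .filter(|&i| haystack[i..i + m].iter().zip(needle).all(|(h, n)| eq(h, n)))
        .collect()
}

/// Nucleotide relation where X on either side matches anything.
/// Not transitive: A matches X and X matches C, but A does not match C.
fn nucleotide_matches(h: &u8, n: &u8) -> bool {
    *h == b'X' || *n == b'X' || h == n
}
```

The same function covers case-insensitive matching by passing a closure such as |a: &u8, b: &u8| a.eq_ignore_ascii_case(b).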
It should be possible to make a fast function for replacing subsequences from an iterator. Approximately:
for x in bytes.into_iter().replace_seq(&[1,2,3], &[9,9,9,9,9]) {
    // ....
}
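A naive sketch of such an adaptor, to pin down the intended behaviour rather than the fast version. It buffers at most `from.len()` items and replaces non-overlapping matches left to right. `ReplaceSeq` and `ReplaceSeqExt` are hypothetical names; only the `replace_seq` method name comes from the snippet above.

```rust
use std::collections::VecDeque;
use std::iter::Fuse;

/// Iterator adaptor that replaces each occurrence of `from` with `to`.
struct ReplaceSeq<'a, I: Iterator> {
    inner: Fuse<I>,
    from: &'a [I::Item],
    to: &'a [I::Item],
    window: VecDeque<I::Item>, // look-ahead buffer, at most from.len() items
    pending: usize,            // index of the next item of `to` to emit
}

impl<'a, I> Iterator for ReplaceSeq<'a, I>
where
    I: Iterator,
    I::Item: PartialEq + Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        loop {
            // Finish emitting a replacement that is in progress.
            if self.pending < self.to.len() {
                self.pending += 1;
                return Some(self.to[self.pending - 1].clone());
            }
            // Refill the window (at least one item, so an empty `from` passes through).
            let want = self.from.len().max(1);
            while self.window.len() < want {
                match self.inner.next() {
                    Some(item) => self.window.push_back(item),
                    None => break,
                }
            }
            let is_match = !self.from.is_empty()
                && self.window.len() == self.from.len()
                && self.window.iter().eq(self.from.iter());
            if is_match {
                self.window.clear();
                self.pending = 0; // start emitting `to` on the next loop turn
            } else {
                return self.window.pop_front();
            }
        }
    }
}

/// Extension trait providing `.replace_seq(from, to)` on any iterator.
trait ReplaceSeqExt: Iterator + Sized {
    fn replace_seq<'a>(self, from: &'a [Self::Item], to: &'a [Self::Item]) -> ReplaceSeq<'a, Self> {
        ReplaceSeq { inner: self.fuse(), from, to, window: VecDeque::new(), pending: to.len() }
    }
}

impl<I: Iterator> ReplaceSeqExt for I {}
```

This version compares the whole window at every position, so it is O(n * m) in the worst case. The open question in this issue is whether the skip tables can be applied while still streaming.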