signalsciences / ac Goto Github PK
View Code? Open in Web Editor NEWgolang Aho-Corasick for byte strings
License: BSD 3-Clause "New" or "Revised" License
golang Aho-Corasick for byte strings
License: BSD 3-Clause "New" or "Revised" License
see https://golang.org/pkg/regexp/ for details
and add this to tests.
We are testing the String versions of the functions but not the []byte
Should be easy to add to test harness
The old Match
function returned a []int
an list of indexes to the original dictionary input.
Redo so it returns the raw [][]byte or a []string or the exact list of what it found. For a few reasons
The original code has a number of minor corrections for latest golang style.
Current Match
returns a list of indexes into the original dictionary. We'll change that name to something like FindAll
and FindAllString
in a different ticket
Match
with return a bool if the input []byte
has a match.MatchString
will return a bool if the input string
has a matchTo make this as close to regex as possible:
The current node
struct does something like this
a bool
b int
c bool
when it could be
a bool
c bool
b int
which saves 8 bytes per instance.
Add TravisCI for automated build/test
acascii version crashes on non-ASCII dictionary. Should return error.
This functions knows the size of the array, but creates an empty holder and uses append.
We can just set the array to the correct size, skipped any re-allocations
func CompileString(dictionary []string) (*Matcher, error) {
m := new(Matcher)
var d [][]byte
for _, s := range dictionary {
d = append(d, []byte(s))
}
The dumb way to do case-insentive match is convert to upper-case when creating dictionary, and upper-case when doing a search. However, this using the ToUpper
make a copy (even for []byte
) and does full UTF-8 case folding which is slow and inappropriate for this use case.
The old assert style looks like something I might have done in 2013 too ;-)
Table-driven is the way to go now days.
Very good coverage however.
the original has comparison between itself vs. regex vs. linear search
maybe a bit excessive now days.
If we do this right, we can use an interface and reuse some code for this vs. regex
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.