
ac's People

Contributors

client9, lsmith500, shawnps, shawnps-sigsci

ac's Issues

Add []byte versions to test

We test the string versions of the functions but not the []byte versions.

This should be easy to add to the test harness.

Create FindAll and FindAllString from old Match function

The old Match function returned a []int, a list of indexes into the original dictionary input.

Redo it so that it returns the raw [][]byte or []string, i.e. the exact list of what it found, for a few reasons:

  • matches regexp
  • easier to test
  • better API -- a different implementation may not use the original list or keep it around.
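
As a rough sketch of the proposed API shape only: the Matcher layout below is hypothetical, and the nested scan is a naive stand-in for the real automaton. It reports every dictionary entry found, ordered by start position, including overlapping matches.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Matcher is a hypothetical stand-in for the real type, which would hold
// an automaton rather than the raw dictionary.
type Matcher struct {
	dict [][]byte
}

// FindAll returns every dictionary entry found in the input, ordered by
// start position. Overlapping matches are all reported.
func (m *Matcher) FindAll(in []byte) [][]byte {
	var found [][]byte
	for i := range in {
		for _, w := range m.dict {
			if len(w) > 0 && bytes.HasPrefix(in[i:], w) {
				found = append(found, in[i:i+len(w)])
			}
		}
	}
	return found
}

// FindAllString is the string counterpart of FindAll.
func (m *Matcher) FindAllString(in string) []string {
	var found []string
	for i := 0; i < len(in); i++ {
		for _, w := range m.dict {
			if len(w) > 0 && strings.HasPrefix(in[i:], string(w)) {
				found = append(found, in[i:i+len(w)])
			}
		}
	}
	return found
}

func main() {
	m := &Matcher{dict: [][]byte{[]byte("he"), []byte("she"), []byte("hers")}}
	fmt.Println(m.FindAllString("ushers")) // [she he hers]
	fmt.Printf("%q\n", m.FindAll([]byte("ushers")))
}
```

Because the result is the matched text itself, a test can compare it directly against an expected []string without knowing anything about the dictionary's internal order.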

Create "Match" and "MatchString" to match regex package

The current Match returns a list of indexes into the original dictionary. We'll rename it to something like FindAll and FindAllString in a different ticket.

  • Match will return a bool indicating whether the input []byte has a match.
  • MatchString will return a bool indicating whether the input string has a match.
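
A minimal sketch of the two boolean methods, mirroring the regexp method names. The Matcher layout is hypothetical, and the Contains loops are a naive, allocation-unaware stand-in for the real search.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Matcher is a hypothetical stand-in that keeps the dictionary as strings.
type Matcher struct {
	dict []string
}

// Match reports whether the input contains any dictionary entry.
func (m *Matcher) Match(in []byte) bool {
	for _, w := range m.dict {
		if bytes.Contains(in, []byte(w)) {
			return true
		}
	}
	return false
}

// MatchString is the string counterpart of Match.
func (m *Matcher) MatchString(in string) bool {
	for _, w := range m.dict {
		if strings.Contains(in, w) {
			return true
		}
	}
	return false
}

func main() {
	m := &Matcher{dict: []string{"he", "she", "hers"}}
	fmt.Println(m.MatchString("ushers")) // true
	fmt.Println(m.Match([]byte("xyz")))  // false
}
```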

Fix constructors to match package regex

To make this as close to the regexp package as possible:

  • switch NewMatcher and NewStringMatcher over to Compile and CompileString
  • return both the object and an error. Right now there are no errors, but this will future-proof the API
    (for example, an error could be returned if duplicate items are passed)
  • add MustCompile and MustCompileString for ease of testing and to match regexp
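
The constructor set above could be sketched as follows. The Matcher layout and the ErrDuplicate value are hypothetical, and rejecting duplicates is just one possible error policy taken from the example in the list, not a decided behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Matcher is a hypothetical stand-in for the real type.
type Matcher struct {
	dict [][]byte
}

// ErrDuplicate is a hypothetical error, used here only to show why the
// constructors return an error at all.
var ErrDuplicate = errors.New("ac: duplicate dictionary entry")

// Compile builds a Matcher, rejecting duplicate entries in this sketch.
func Compile(dictionary [][]byte) (*Matcher, error) {
	seen := make(map[string]struct{}, len(dictionary))
	for _, w := range dictionary {
		if _, dup := seen[string(w)]; dup {
			return nil, ErrDuplicate
		}
		seen[string(w)] = struct{}{}
	}
	return &Matcher{dict: dictionary}, nil
}

// CompileString converts the dictionary to [][]byte and calls Compile.
func CompileString(dictionary []string) (*Matcher, error) {
	d := make([][]byte, len(dictionary))
	for i, s := range dictionary {
		d[i] = []byte(s)
	}
	return Compile(d)
}

// MustCompile is like Compile but panics on error, as in regexp.
func MustCompile(dictionary [][]byte) *Matcher {
	m, err := Compile(dictionary)
	if err != nil {
		panic(err)
	}
	return m
}

// MustCompileString is like CompileString but panics on error.
func MustCompileString(dictionary []string) *Matcher {
	m, err := CompileString(dictionary)
	if err != nil {
		panic(err)
	}
	return m
}

func main() {
	m := MustCompileString([]string{"he", "she", "hers"})
	fmt.Println(len(m.dict)) // 3

	_, err := CompileString([]string{"he", "he"})
	fmt.Println(err) // ac: duplicate dictionary entry
}
```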

Fix CompileStrings to use pre-size array, not append

This function knows the size of the array, but it creates an empty holder and uses append.

We can just allocate the array at the correct size, skipping any reallocations.

func CompileString(dictionary []string) (*Matcher, error) {
        m := new(Matcher)

        var d [][]byte
        for _, s := range dictionary {
                d = append(d, []byte(s))
        }
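
A presized version of the conversion loop might look like the sketch below. The helper name toByteSlices is hypothetical and only isolates the loop from the rest of CompileString, which is not shown in this ticket.

```go
package main

import "fmt"

// toByteSlices is a hypothetical helper that converts the dictionary
// with a single allocation for the outer slice.
func toByteSlices(dictionary []string) [][]byte {
	d := make([][]byte, len(dictionary)) // sized up front, so no append growth
	for i, s := range dictionary {
		d[i] = []byte(s)
	}
	return d
}

func main() {
	d := toByteSlices([]string{"he", "she"})
	fmt.Println(len(d), cap(d), string(d[1])) // 2 2 she
}
```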

Create memory-free ASCII case-insensitive matching

The dumb way to do a case-insensitive match is to convert to upper case when creating the dictionary and again when doing a search. However, ToUpper makes a copy (even for []byte) and does full UTF-8 case folding, which is slow and inappropriate for this use case.
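
One way to avoid the copy is to fold each byte at comparison time. This is a minimal sketch with hypothetical helper names. It folds to lower case rather than upper case, which is equivalent for ASCII, and the search loop is naive rather than automaton-based.

```go
package main

import "fmt"

// lowerASCII folds A-Z to a-z and leaves every other byte untouched,
// including the bytes of multi-byte UTF-8 sequences.
func lowerASCII(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// equalFoldASCII compares two byte slices without allocating.
func equalFoldASCII(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}
	return true
}

// indexFoldASCII returns the first index of word in the input, ignoring
// ASCII case, or -1 if it is absent.
func indexFoldASCII(in, word []byte) int {
	for i := 0; i+len(word) <= len(in); i++ {
		if equalFoldASCII(in[i:i+len(word)], word) {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(equalFoldASCII([]byte("HeLLo"), []byte("hello")))  // true
	fmt.Println(indexFoldASCII([]byte("uSHErs"), []byte("she")))   // 1
	fmt.Println(equalFoldASCII([]byte("\u00e9"), []byte("\u00c9"))) // false
}
```

The last line shows the intended limitation: non-ASCII letters such as é and É are compared byte for byte and are not treated as equal.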

Convert old tests to table-driven

The old assert style looks like something I might have done in 2013 too ;-)
Table-driven is the way to go nowadays.

Very good coverage, however.
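
For reference, the table-driven shape looks roughly like this. It is written as a plain program so it runs on its own. In a real _test.go file the loop body would call t.Run and t.Errorf instead of collecting strings. The matchString function and the cases are hypothetical stand-ins, not the existing tests.

```go
package main

import (
	"fmt"
	"strings"
)

// matchString is a hypothetical stand-in for the function under test.
func matchString(dict []string, in string) bool {
	for _, w := range dict {
		if strings.Contains(in, w) {
			return true
		}
	}
	return false
}

// runTable checks every row of the table and returns failure messages.
func runTable() []string {
	dict := []string{"he", "she", "hers"}
	cases := []struct {
		name string
		in   string
		want bool
	}{
		{"empty input", "", false},
		{"no match", "xyz", false},
		{"single match", "hello", true},
		{"overlapping matches", "ushers", true},
	}

	var failures []string
	for _, tc := range cases {
		if got := matchString(dict, tc.in); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("%s: matchString(%q) = %v, want %v", tc.name, tc.in, got, tc.want))
		}
	}
	return failures
}

func main() {
	fmt.Println(len(runTable()), "failures") // 0 failures
}
```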

Add in benchmarks

The original has a comparison between itself, regex, and linear search.

Maybe a bit excessive nowadays.

If we do this right, we can use an interface and reuse some code for this vs. regex.
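
A sketch of the interface idea: *regexp.Regexp already has a MatchString(string) bool method, so a one-method interface lets the same driver exercise each implementation. The names StringMatcher, linear, newRegexpMatcher, and countMatches are all hypothetical. In a real benchmark the driver would sit inside a testing.B loop.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// StringMatcher is a hypothetical interface shared by all implementations.
type StringMatcher interface {
	MatchString(s string) bool
}

// linear is the naive baseline: try each word in turn.
type linear []string

func (l linear) MatchString(s string) bool {
	for _, w := range l {
		if strings.Contains(s, w) {
			return true
		}
	}
	return false
}

// newRegexpMatcher builds an alternation of the quoted dictionary words.
func newRegexpMatcher(dict []string) StringMatcher {
	quoted := make([]string, len(dict))
	for i, w := range dict {
		quoted[i] = regexp.QuoteMeta(w)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

// countMatches is the shared driver a benchmark could call per iteration.
func countMatches(m StringMatcher, inputs []string) int {
	n := 0
	for _, in := range inputs {
		if m.MatchString(in) {
			n++
		}
	}
	return n
}

func main() {
	dict := []string{"he", "she", "hers"}
	inputs := []string{"ushers", "xyz", "hello"}
	fmt.Println(countMatches(newRegexpMatcher(dict), inputs), countMatches(linear(dict), inputs)) // 2 2
}
```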
