intxeger's People

Contributors

k15z, moreati

intxeger's Issues

Support countably infinite regular expressions

The goal is to support array-based indexing for regular expressions with unbounded repeats. Currently, the max_repeat parameter caps the number of times any sequence can be repeated, so only a finite number of strings can ever be generated from a regex.

After this change, the user will be able to choose between (1) specifying max_repeat and having a finite set of strings or (2) not specifying max_repeat and having an infinite set of strings they can iterate over and/or index into.

Repeat

Modify the Repeat class to use Cantor's pairing function (inverted, so that a single index is split into two values) when max_repeat is not specified.

x -> (a, b) # decompose the index into two values
b -> interpret this as the length of repeated sequence
a -> (a1, a2, a3, ... a_b) # convert it into `b` values
a_i -> the integer index of the `i`th element in the sequence

Note that the length attribute will be set to float("-inf").
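The decomposition above can be sketched as follows. This is only an illustration of the idea, not code from intxeger: the names cantor_pair, cantor_unpair and split_index are hypothetical, and the sketch assumes every element of the repeated sequence can itself take any non-negative index.

```python
import math


def cantor_pair(a, b):
    """Cantor's pairing function: map a pair of naturals to one natural."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(x):
    """Inverse of cantor_pair: split one index x >= 0 into (a, b)."""
    w = (math.isqrt(8 * x + 1) - 1) // 2  # diagonal that contains x
    t = w * (w + 1) // 2                  # first index on that diagonal
    b = x - t
    a = w - b
    return a, b


def split_index(a, n):
    """Split a into n non-negative integers by unpairing n - 1 times.

    Hypothetical helper; for a fixed n >= 1 this is a bijection.
    """
    if n == 0:
        return ()
    parts = []
    for _ in range(n - 1):
        a, last = cantor_unpair(a)
        parts.append(last)
    parts.append(a)
    return tuple(parts)


# x -> (a, b), then a -> (a_1, ..., a_b)
a, b = cantor_unpair(41)
print(a, b, split_index(a, b))
```

One thing the sketch glosses over: when b is 0 the sequence is empty, so every value of a maps to the same string, and a real implementation would need to handle that case (and children with a finite length) separately.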

Choice

Modify the Choice class to handle both finite nodes and infinite nodes. It should assign the smallest integers to the finite nodes; then, once those are all assigned, it should start handling the infinite nodes by rotating between them.
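A minimal sketch of that index assignment, under assumptions that are not part of the issue: route_index is a hypothetical name, child sizes are passed in as a plain list, and unbounded children are marked with math.inf purely for illustration (the sentinel actually stored in the length attribute may differ).

```python
import math


def route_index(index, lengths):
    """Map a global index to (child position, index within that child).

    lengths holds one size per child; math.inf marks an unbounded child.
    Finite children receive the smallest indices, in order; the remaining
    indices are dealt out to the unbounded children in rotation.
    """
    infinite = [i for i, n in enumerate(lengths) if n == math.inf]
    for i, n in enumerate(lengths):
        if n == math.inf:
            continue
        if index < n:
            return i, index
        index -= n  # skip past this finite child
    if not infinite:
        raise IndexError("index out of range")
    return infinite[index % len(infinite)], index // len(infinite)


lengths = [2, math.inf, 3, math.inf]
for i in range(9):
    print(i, route_index(i, lengths))
```

Rotating between the unbounded children means each of them is reached after finitely many steps, which is what keeps the whole set enumerable.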

Strings may not be unique if the regex is ambiguous

For example, if your regex is:

(abc)|(abc)

Then it will report length=2 and generate ["abc", "abc"], since the two strings are generated by different nodes in the tree. It's not clear what the solution is, but this is not a problem unique to intxeger; other libraries such as exrex also have this issue.
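Until there is a real fix, one possible workaround for small finite regexes is to filter duplicates on the caller's side. The sketch below is not part of intxeger: unique_strings is a hypothetical helper that only assumes a known length and some way to fetch the string at a given index, and it keeps every string seen so far in memory.

```python
def unique_strings(length, get):
    """Yield each distinct string once, in index order.

    length: number of indices to visit.
    get:    callable returning the string at a given index.
    """
    seen = set()
    for i in range(length):
        s = get(i)
        if s not in seen:
            seen.add(s)
            yield s


# Stand-in for the two strings generated by (abc)|(abc)
generated = ["abc", "abc"]
print(list(unique_strings(len(generated), generated.__getitem__)))
```

This does not repair the reported length, which would still count both branches.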

ValueError raised

Hi there, I'm evaluating this library as a replacement for the alternatives since it looks quite nice, but I am encountering some issues.

For example, given this input:

from intxeger import build

regex = "a$"
result = build(regex)

I am getting this:

op = AT, args = AT_END, max_repeat = 10

    def _to_node(op, args, max_repeat):
        if op == sre_parse.IN:
            nodes = []
            for op, args in args:
                nodes.append(_to_node(op, args, max_repeat))
            if nodes[0] == "NEGATE":
                values = [c[i] for c in nodes[1:] for i in range(c.length)]
                nodes = [Constant(c) for c in string.printable if c not in values]
            return Choice(nodes)
        elif op == sre_parse.RANGE:
            min_value, max_value = args
            return Choice(
                [Constant(chr(value)) for value in range(min_value, max_value + 1)]
            )
        elif op == sre_parse.LITERAL:
            return Constant(chr(args))
        elif op == sre_parse.NEGATE:
            return "NEGATE"
        elif op == sre_parse.CATEGORY:
            return Choice([Constant(c) for c in CATEGORY_MAP[args]])
        elif op == sre_parse.ANY:
            return Choice([Constant(c) for c in string.printable])
        elif op == sre_parse.ASSERT:
            nodes = []
            for op, args in args[1]:
                nodes.append(_to_node(op, args, max_repeat))
            return Concatenate(nodes)
        elif op == sre_parse.BRANCH:
            nodes = []
            for group in args[1]:
                subnodes = []
                for op, args in group:
                    subnodes.append(_to_node(op, args, max_repeat))
                nodes.append(Concatenate(subnodes))
            return Choice(nodes)
        elif op == sre_parse.SUBPATTERN:
            nodes = []
            ref_id = args[0]
            for op, args in args[3]:
                nodes.append(_to_node(op, args, max_repeat))
            return Group(Concatenate(nodes), ref_id)
        elif op == sre_parse.GROUPREF:
            return GroupRef(ref_id=args)
        elif op == sre_parse.MAX_REPEAT or op == sre_parse.MIN_REPEAT:
            min_, max_, args = args
            op, args = args[0]
            if max_ == sre_parse.MAXREPEAT:
                max_ = max_repeat
            return Repeat(_to_node(op, args, max_repeat), min_, max_)
        elif op == sre_parse.NOT_LITERAL:
            return Choice([Constant(c) for c in string.printable if c != chr(args)])
        else:
>           raise ValueError(f"{op} {args}")
E           ValueError: AT AT_END
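The traceback shows that _to_node has no branch for the AT op, which is what the parser emits for anchors such as $. The sketch below reproduces the parse step on its own and shows one possible way to sidestep the error. drop_anchors is a hypothetical helper, not something intxeger provides, and it only looks at top-level tokens.

```python
try:
    import re._parser as sre_parse  # Python 3.11+
except ImportError:
    import sre_parse  # older Pythons


def drop_anchors(tokens):
    """Hypothetical helper: remove top-level zero-width AT tokens (^, $)."""
    return [(op, args) for op, args in tokens if op != sre_parse.AT]


tokens = list(sre_parse.parse("a$"))
print(tokens)                # a LITERAL token followed by an AT token
print(drop_anchors(tokens))  # only the LITERAL token remains
```

Dropping the anchor is reasonable for a generator when it sits at the very start or end of the pattern, since it matches no characters; anchors in the middle of a pattern would need more careful treatment.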

Expand user API

  • Add an intxeger.sample(regex, N) method which builds the tree, optimizes it, and uses it to generate N samples.
  • Add an intxeger.iterator(regex, ordered=False) generator which yields random or ordered samples.
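A rough sketch of how the two helpers might behave. To stay self-contained it takes an already built node instead of a regex and uses a stand-in class, Letters, in place of a real tree; the only things assumed about a node are a length attribute and integer indexing. The build and optimize steps named in the proposal are left out.

```python
import random


class Letters:
    """Stand-in for a built tree: exposes .length and integer indexing."""

    length = 3

    def __getitem__(self, i):
        return "abc"[i]


def sample(node, n):
    """Return the strings at n distinct, randomly chosen indices."""
    return [node[i] for i in random.sample(range(node.length), n)]


def iterator(node, ordered=False):
    """Yield every string of a finite node, in index or random order."""
    indices = list(range(node.length))
    if not ordered:
        random.shuffle(indices)
    for i in indices:
        yield node[i]


print(sample(Letters(), 2))
print(list(iterator(Letters(), ordered=True)))
```

Materializing the full index list is only practical for small finite sets; a production version would need a lazier way to produce a random ordering.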
