k15z / intxeger
Generate unique strings from regular expressions.
Home Page: https://k15z.github.io/IntXeger/
License: MIT License
The goal is to support array-based indexing for regular expressions with unbounded repeats. Currently, the max_repeat parameter limits the number of times any sequence can be repeated, so only a finite number of strings can be generated from a regex.
After this change, the user will be able to choose between (1) specifying max_repeat and having a finite set of strings or (2) not specifying max_repeat and having an infinite set of strings they can iterate over and/or index into.
Modify the Repeat class to apply Cantor's pairing function when max_repeat is not specified.
x -> (a, b) # decompose the index into two values
b -> interpret this as the length of repeated sequence
a -> (a1, a2, a3, ... a_b) # convert it into `b` values
a_i -> the integer index of the `i`th element in the sequence
Note that the length attribute will be set to float(-inf).
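The decomposition above can be sketched in plain Python. This is a minimal illustration and not intxeger's implementation: `cantor_pair`, `cantor_unpair`, and `split_index` are hypothetical names, and splitting `a` into `b` values by repeated unpairing is only one assumed way to carry out the third step (it needs `b >= 1`).

```python
from math import isqrt


def cantor_pair(a, b):
    """Map a pair of non-negative integers to a single non-negative integer."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(x):
    """Invert cantor_pair: recover (a, b) from the index x."""
    # w is the index of the diagonal containing x; isqrt keeps this exact.
    w = (isqrt(8 * x + 1) - 1) // 2
    b = x - w * (w + 1) // 2
    a = w - b
    return a, b


def split_index(a, count):
    """Split a into `count` non-negative integers (assumes count >= 1)."""
    values = []
    for _ in range(count - 1):
        a, last = cantor_unpair(a)
        values.append(last)
    values.append(a)
    return values[::-1]


a, b = cantor_unpair(4)  # x -> (a, b)
```

Because every step is a bijection on non-negative integers, distinct values of `a` yield distinct tuples for a fixed `count`, which is the property indexing relies on.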
Modify the Choice class to handle both finite nodes and infinite nodes. It should assign the smallest integers to the finite nodes; then, once those are all assigned, it should start handling the infinite nodes by rotating between them.
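One way to read the proposed index assignment is sketched below. The function `locate` and its `lengths` argument are hypothetical and not part of intxeger; each child is represented only by its length, and any infinite float value marks an unbounded child.

```python
import math


def locate(index, lengths):
    """Map a Choice-level index to (child position, index within that child).

    Finite children receive the smallest indices, in order. The remaining
    indices are dealt out to the infinite children in round-robin fashion.
    """
    infinite = []
    for position, length in enumerate(lengths):
        if math.isinf(length):
            infinite.append(position)
        elif index < length:
            return position, index
        else:
            index -= length
    if not infinite:
        raise IndexError("index out of range for a finite choice")
    return infinite[index % len(infinite)], index // len(infinite)
```

With `lengths = [3, math.inf, 2, math.inf]`, indices 0 to 4 land in the two finite children, and indices 5, 6, 7 alternate between the infinite children at positions 1 and 3.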
For example, if your regex is:
(abc)|(abc)
Then it will say that length=2 and generate ["abc", "abc"], since they're generated by different nodes in the tree. It's not clear what the solution is, but this is not a problem unique to intxeger; other libraries such as exrex also have this issue.
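To make the double counting concrete, here is a toy model that does not use intxeger at all: each branch of the alternation is just a list of the strings it can produce. The `iter_unique` helper is a hypothetical workaround for iteration only; it keeps every emitted string in memory and does nothing for index-based access.

```python
def iter_unique(branches):
    """Yield each distinct string once, in first-seen order."""
    seen = set()
    for branch in branches:
        for value in branch:
            if value not in seen:
                seen.add(value)
                yield value


# Toy stand-in for (abc)|(abc): two branches, each producing "abc".
branches = [["abc"], ["abc"]]
naive_length = sum(len(branch) for branch in branches)  # counts per branch
unique = list(iter_unique(branches))
```

Here `naive_length` is 2 while `unique` holds a single string, which mirrors the mismatch described above.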
Hi there, I'm evaluating this library as a replacement for the alternatives since it looks quite nice, but I am encountering some issues.
For example, given this input:
from intxeger import build
regex = "a$"
result = build(regex)
I am getting this:
op = AT, args = AT_END, max_repeat = 10

    def _to_node(op, args, max_repeat):
        if op == sre_parse.IN:
            nodes = []
            for op, args in args:
                nodes.append(_to_node(op, args, max_repeat))
            if nodes[0] == "NEGATE":
                values = [c[i] for c in nodes[1:] for i in range(c.length)]
                nodes = [Constant(c) for c in string.printable if c not in values]
            return Choice(nodes)
        elif op == sre_parse.RANGE:
            min_value, max_value = args
            return Choice(
                [Constant(chr(value)) for value in range(min_value, max_value + 1)]
            )
        elif op == sre_parse.LITERAL:
            return Constant(chr(args))
        elif op == sre_parse.NEGATE:
            return "NEGATE"
        elif op == sre_parse.CATEGORY:
            return Choice([Constant(c) for c in CATEGORY_MAP[args]])
        elif op == sre_parse.ANY:
            return Choice([Constant(c) for c in string.printable])
        elif op == sre_parse.ASSERT:
            nodes = []
            for op, args in args[1]:
                nodes.append(_to_node(op, args, max_repeat))
            return Concatenate(nodes)
        elif op == sre_parse.BRANCH:
            nodes = []
            for group in args[1]:
                subnodes = []
                for op, args in group:
                    subnodes.append(_to_node(op, args, max_repeat))
                nodes.append(Concatenate(subnodes))
            return Choice(nodes)
        elif op == sre_parse.SUBPATTERN:
            nodes = []
            ref_id = args[0]
            for op, args in args[3]:
                nodes.append(_to_node(op, args, max_repeat))
            return Group(Concatenate(nodes), ref_id)
        elif op == sre_parse.GROUPREF:
            return GroupRef(ref_id=args)
        elif op == sre_parse.MAX_REPEAT or op == sre_parse.MIN_REPEAT:
            min_, max_, args = args
            op, args = args[0]
            if max_ == sre_parse.MAXREPEAT:
                max_ = max_repeat
            return Repeat(_to_node(op, args, max_repeat), min_, max_)
        elif op == sre_parse.NOT_LITERAL:
            return Choice([Constant(c) for c in string.printable if c != chr(args)])
        else:
>           raise ValueError(f"{op} {args}")
E           ValueError: AT AT_END
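The `AT AT_END` in the error comes from how Python's regex parser tokenizes the `$` anchor. The sketch below uses only the standard library's internal parser (an undocumented module whose import path changed in Python 3.11); `top_level_ops` is a hypothetical helper, not part of intxeger.

```python
try:
    from re import _parser as sre_parse  # Python 3.11 and later
except ImportError:
    import sre_parse  # older Python versions


def top_level_ops(regex):
    """Return the top-level (op, args) tokens of a regex as strings."""
    return [(str(op), str(args)) for op, args in sre_parse.parse(regex)]


tokens = top_level_ops("a$")
# The first token is the literal "a"; the second is the AT / AT_END pair
# that falls through to the final `else` branch in the traceback above.
```

Since none of the branches in `_to_node` shown in the traceback handles the `AT` op, any pattern containing `^` or `$` would presumably reach the same `ValueError`.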
Negation Group
r"[^abc]"
Lookahead
r"hello(?=world)"
Lookbehind
r"(?<=hello)world"
Backreference
r"(echo|kali)-\1"
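For reference, this is how the four constructs behave when matching with Python's standard `re` module. The check says nothing about which strings intxeger generates for them; it only shows what each pattern accepts.

```python
import re

# Negation group: any single character except a, b, or c.
negation_ok = re.fullmatch(r"[^abc]", "d") is not None
negation_rejects = re.fullmatch(r"[^abc]", "a") is None

# Lookahead: "hello" matches only when "world" follows; "world" is not consumed.
lookahead = re.match(r"hello(?=world)", "helloworld").group()

# Lookbehind: "world" matches only when "hello" comes right before it.
lookbehind = re.search(r"(?<=hello)world", "helloworld").group()
lookbehind_rejects = re.search(r"(?<=hello)world", "world") is None

# Backreference: \1 must repeat whatever the first group matched.
backref_ok = re.fullmatch(r"(echo|kali)-\1", "echo-echo") is not None
backref_rejects = re.fullmatch(r"(echo|kali)-\1", "echo-kali") is None
```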
intxeger.sample(regex, N): method which builds the tree, optimizes it, and uses it to generate N samples.
intxeger.iterator(regex, ordered=False): generator which yields random or ordered samples.
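The idea behind both helpers can be sketched on a toy indexable set. Nothing here calls intxeger: `get`, `sample`, and `iterator` are hypothetical stand-ins hard-coded for the regex `[a-c]{2}`, assuming that sampling and iteration reduce to choosing integer indices and looking them up.

```python
import random

LETTERS = "abc"
LENGTH = len(LETTERS) ** 2  # number of strings matching [a-c]{2}


def get(index):
    """Return the index-th string of [a-c]{2} in lexicographic order."""
    return LETTERS[index // len(LETTERS)] + LETTERS[index % len(LETTERS)]


def sample(n):
    """Draw n distinct strings by sampling n distinct indices."""
    return [get(i) for i in random.sample(range(LENGTH), n)]


def iterator(ordered=False):
    """Yield every string once, in index order or in a random order."""
    order = list(range(LENGTH))
    if not ordered:
        random.shuffle(order)
    for i in order:
        yield get(i)
```

Sampling indices instead of strings guarantees uniqueness without generating the whole set first, as long as distinct indices map to distinct strings.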