noprompt / frak
Transform collections of strings into regular expressions.
As soon as a string in the file goes beyond 205 characters, a stack overflow error is reported:
StackOverflowError clojure.lang.AFn.applyToHelper (AFn.java:155)
Do you have a workaround or solution?
Is there some static allocation that makes frak accept only strings below a particular size?
When I replace [frak "0.1.2"] with [frak "0.1.3"], I get an exception:
; project.clj
(defproject regard "1.0.0-SNAPSHOT"
:description "Spits out regexps, given keywords"
:dependencies [[org.clojure/clojure "1.3.0"]
[frak "0.1.3"]]
:main regard.core)
; core.clj
(ns regard.core)
(require 'frak)
(defn -main [& args]
"I don't do a whole lot."
(println (apply str args)))
$ lein repl
REPL started; server listening on localhost port 15186
CompilerException java.lang.RuntimeException: Unable to resolve symbol: mapv in this context, compiling:(frak.clj:157)
clojure.core=>
This:
(frak/pattern ["bits" "bite" "bit"])
produces bit(?:[se])? but should produce bit[se]? instead.
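The two patterns should accept exactly the same strings; the non-capturing group around a single character class is redundant rather than wrong. A minimal sketch of that check in JavaScript, where `sameMatches` and the sample list are hypothetical and chosen only for illustration:

```javascript
// Hypothetical helper: do two patterns agree on every sample when each
// must match the whole string?
function sameMatches(a, b, samples) {
  const fullA = new RegExp(`^(?:${a})$`);
  const fullB = new RegExp(`^(?:${b})$`);
  return samples.every((s) => fullA.test(s) === fullB.test(s));
}

const current = "bit(?:[se])?"; // what frak emits, per the report above
const desired = "bit[se]?";     // the shorter form requested

// Hand-picked samples, including strings neither pattern should accept.
const samples = ["bit", "bits", "bite", "bi", "bitse", "bitx", ""];

console.log(sameMatches(current, desired, samples)); // true
```

Agreement on a handful of samples is not a proof, but here the equivalence also follows directly from the syntax: `(?:X)?` and `X?` mean the same thing whenever `X` is a single atom.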
It would be useful to have the Node version published on npm, so people can just run npm install frak -g and then use the frak binary.
In the future, a non-binary version (one that exports a frak function for use in other scripts) could be made available through require('frak').
https://twitter.com/_munter_/status/388050796681891841 /cc @Munter
Nice project.
If I understand correctly, frak will generate a regex that matches the exact set of strings it was fed, right? It would also be nice to add a learning/generalization ability, so that if you feed it "foo1", "foo2", etc., it can generalize to /foo\d+/ rather than the strict /foo[12]/.
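To make the request concrete, here is a rough sketch of the simplest possible generalization: treat every run of digits as \d+ and keep everything else literal. The `generalize` and `escapeRegex` functions are hypothetical and are not part of frak; a real implementation would need to decide how far to generalize.

```javascript
// Escape regex metacharacters so literal text stays literal.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical generalizer: each run of digits becomes \d+, and inputs
// that end up with the same shape collapse into one alternative.
function generalize(strings) {
  const shapes = new Set(
    strings.map((s) =>
      s
        .split(/(\d+)/) // the capture group keeps digit runs as pieces
        .filter((piece) => piece !== "")
        .map((piece) => (/^\d+$/.test(piece) ? "\\d+" : escapeRegex(piece)))
        .join("")
    )
  );
  return new RegExp(`^(?:${[...shapes].join("|")})$`);
}

const pattern = generalize(["foo1", "foo2"]);
console.log(pattern.source);        // ^(?:foo\d+)$
console.log(pattern.test("foo42")); // true
console.log(pattern.test("bar1"));  // false
```

Unlike frak's exact-match output, this accepts strings that were never in the input, which is the whole point of the request and also its main risk.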
At my previous job, I developed such a system, but unfortunately the code never got out of the corporate walls. I'd feed the system a set of many thousands of legal citation strings (like "726 N.W.2d 852" or "42 U.S.C. § 405(c)(2)(C)"; see http://www.law.cornell.edu/citation/ for many more such examples) and it would create a regular expression to recognize these strings and any strings "in the same family". The idea was to combine a lexer (something that knew about the lexemes that could occur in the language, e.g. numbers, Roman numerals, publication names, state names, etc.) with a regex generator based on the lexeme classes rather than raw characters.
Some of the difficulties involved figuring out just how much to generalize, which of course depends on the problem domain. There were also ambiguous lexemes: for example, "MD" is both a Roman numeral and a state abbreviation, so I had to work out which reading made more sense in context.
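The lexer-plus-generator idea described above can be sketched roughly as follows. This is not the original system, which was never released; the lexeme table, `shapeOf`, and `generalizeCitations` are all hypothetical, and the sketch sidesteps ambiguity by simply taking the first class that matches.

```javascript
// Hypothetical lexeme classes, tried in order; the first match wins.
// Illustrative only: a real table would be far larger.
const LEXEMES = [
  { name: "NUMBER", match: /^\d+/, emit: "\\d+" },
  {
    name: "REPORTER",
    match: /^(?:N\.W\.2d|U\.S\.C\.)/,
    emit: "(?:N\\.W\\.2d|U\\.S\\.C\\.)",
  },
  { name: "SPACE", match: /^\s+/, emit: "\\s+" },
];

// Escape regex metacharacters so unclassified text stays literal.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn one example into a regex fragment built from lexeme classes,
// falling back to a literal character when no class applies.
function shapeOf(example) {
  let rest = example;
  let shape = "";
  while (rest.length > 0) {
    const lexeme = LEXEMES.find((l) => l.match.test(rest));
    if (lexeme) {
      shape += lexeme.emit;
      rest = rest.slice(lexeme.match.exec(rest)[0].length);
    } else {
      shape += escapeRegex(rest[0]);
      rest = rest.slice(1);
    }
  }
  return shape;
}

// One alternative per distinct shape seen in the examples.
function generalizeCitations(examples) {
  const shapes = [...new Set(examples.map(shapeOf))];
  return new RegExp(`^(?:${shapes.join("|")})$`);
}

const citations = generalizeCitations(["726 N.W.2d 852"]);
console.log(citations.test("512 N.W.2d 100")); // true
console.log(citations.test("726 F.3d 852"));   // false
```

Because the generated pattern is built from classes rather than characters, a single example stands in for its whole family; resolving ambiguous lexemes such as "MD" would need context that this first-match loop does not model.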
The regexes got pretty big, even with the generalization (which tends to make them smaller), so I made one FSM per jurisdiction and fed them to GraphViz for some killer pictures. =)
Anyway, food for thought if you ever want to take this project in another direction.
Hi Joel,
Nice work! To make this code more widely usable, could you compile it into a command-line tool?
Much appreciated!
Ye
It would be nice if frak builds were available from a public repository, so I could easily add a dependency on frak in my projects.
Here is a guide that explains how to upload artifacts to Maven Central: http://maven.apache.org/guides/mini/guide-central-repository-upload.html
Update: Maybe Clojars is a more natural fit for a Clojure library: https://clojars.org
I've created a pull request for Maven support: #5
Firstly, an epic tool, very nice!
What engine are you using?
I just want to confirm whether the expression generated from /usr/share/dict/words works. I've tried it twice, but it has never matched anything.
I tried to integrate frak into a regular Clojure project (https://gist.github.com/xaccrocheur/6333714#file-frak-command-line-attempts), but I cannot seem to get it to take my arguments, which are a clean vector of strings.