madisonmay / commonregex
A collection of common regular expressions bundled with an easy to use interface.
License: MIT License
Just a quick check on whether you still keep an eye on this repo.
This library is great, but it was last published to PyPI on August 29, 2014. Any chance we can get the latest version published there?
Is it possible to maintain a .md file with verbose regexes so contributors can understand them better and maybe improve them or add features?
There seems to be an issue parsing prices. The extracted price is sometimes truncated for values in the thousands or greater.
Example:
cr.CommonRegex("$6000").prices returns ["$600"] instead of ["$6000"]
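For comparison, here is a minimal sketch of a price pattern that accepts both comma-grouped and ungrouped amounts. `PRICE_RE` and `find_prices` are hypothetical names for illustration and are not part of commonregex; the pattern is an assumption, not the one the library ships.

```python
import re

# Hypothetical pattern, not the one shipped in commonregex.
# Accept either comma-grouped thousands (1,234,567) or a plain run of
# digits (6000), followed by optional cents.
PRICE_RE = re.compile(r"\$\s?[+-]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{1,2})?")


def find_prices(text):
    """Return every price-like substring found in text."""
    return PRICE_RE.findall(text)


print(find_prices("$6000 and $1,234.50 and $5"))
```

Trying the grouped form first and falling back to a plain digit run keeps an ungrouped value such as 6000 from being cut after three digits.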
@madisonmay Check out this rust version of CommonRegex!
https://github.com/hskang9/CommonRegexRust
https://crates.io/crates/commonregex
Can I link to this in the README?
Thanks for building commonregex; it is very convenient for our work, but I still have a question.
Why does it take so much time? My CPU temperature also rises rapidly. (My content is 105851 bytes long.)
I've been testing CommonRegex on several different card types and it seems like Diners Club cards don't get detected.
Currently this isn't in the latest pip version that's published it seems.
I am adding support for identifying tags and their attributes, as well as replacing attribute values, to make it more awesome. I am halfway there and it needs some testing.
I am attempting to parse the following test-data.txt with version commonregex==1.5.4:
2523088780
social security number: 428-34-4474
this is far less expensive than the alternative
114 jeffery street
usa
from commonregex import CommonRegex
with open('./test-data.txt') as data:
    parsed_text = CommonRegex(data.read())
and receiving the error:
>>> parsed_text.ssn_number
AttributeError: 'CommonRegex' object has no attribute 'ssn_number'
no problems with emails, phones, etc:
>>> parsed_text.emails
Appreciate it
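Independently of the library, the SSN line in the test data above can be matched with a standalone pattern. This is only a sketch: `SSN_RE` is a hypothetical name, and the pattern checks the NNN-NN-NNNN shape without validating the number itself.

```python
import re

# Hypothetical pattern for US social security numbers written as NNN-NN-NNNN.
# It checks the shape only, not whether the number was ever issued.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

sample = (
    "2523088780\n"
    "social security number: 428-34-4474\n"
    "114 jeffery street\n"
)

print(SSN_RE.findall(sample))
```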
I got this error on the most recent version of Commonregex (1.5.4):
AttributeError: 'CommonRegex' object has no attribute 'po_boxes'
My code just creates a parser object and tries to retrieve po_boxes.
This does not account for dates such as 6/21 for June 21st.
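A rough sketch of a month/day pattern without a year, assuming US month-first order. `MONTH_DAY_RE` is a hypothetical name and not part of commonregex; the lookarounds are there so that the pattern does not pick a fragment out of a full date such as 6/21/2014.

```python
import re

# Hypothetical month/day pattern (US order, no year), not from commonregex.
MONTH_DAY_RE = re.compile(
    r"(?<![\d/])"                  # not inside a longer number or date
    r"(?:0?[1-9]|1[0-2])"          # month 1-12
    r"/"
    r"(?:0?[1-9]|[12]\d|3[01])"    # day 1-31
    r"(?![\d/])"                   # not followed by more digits or a year part
)

print(MONTH_DAY_RE.findall("Due 6/21, not 6/21/2014 or 12/31."))
```

The pattern does not check that the day exists in the given month (2/31 would still match), so a caller would need a separate validation step for that.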
Cool as this already is, it would be even cooler if it supported non-US phone numbers. I'd try and do it myself, but given how little I currently know about regular expressions I'd probably be more of a hindrance than a help.
Instead of returning an array of literals, return an array of objects holding the matched text and its start position inside the original parsed text. Also, it would be good to have a list of all matched texts sorted by their position in the original text.
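A minimal sketch of what that could look like, using `re.finditer` to keep the start offset of each match. The `Hit` type, the `PATTERNS` table, and `find_all` are hypothetical names, and the two patterns are deliberately simple stand-ins rather than the library's own.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    kind: str    # which pattern matched
    text: str    # the matched substring
    start: int   # offset of the match in the original text


# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "time": re.compile(r"\b\d{1,2}:\d{2}\b"),
}


def find_all(text):
    """Return hits from every pattern, sorted by position in text."""
    hits = [
        Hit(kind, m.group(), m.start())
        for kind, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda hit: hit.start)


for hit in find_all("Call at 10:30 or mail bob@example.com before 17:00"):
    print(hit)
```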
While a hostname is not supposed to contain an underscore, a DNS entry can, which I think ends up being more relevant to what valid email addresses exist.
I suggest MIT.
test = CommonRegex('')
test.dates
<function regex.__call__.<locals>.regex_method at 0x000000000389F048>
test.dates()
[]
test = CommonRegex('asdasd')
test.dates
[]
test.dates()
Traceback (most recent call last):
File "", line 1, in
TypeError: 'list' object is not callable
This is on Python 3.3.3 on Windows 7 x64.
I won't be surprised if this is my failure to understand a concept :)
Obvious workaround to detect empty strings before passing to CommonRegex is obvious.
3015 POE RD., HOUSTON, TX 77051
Do I need to do something first, or is this something that can be corrected in the street parser?
Oh, the call I am making is
for sa in street_address.finditer(res):
    res = sa.string
    break
This works for other addresses, but this is the first one it failed on. Also, unlike re, it stopped at a \n and ignored the next line. Do I need to pre-parse the text?
You apply every regex as soon as an instance of the class is created. IMO it is much better to do it lazily, using something like cached properties.
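A minimal sketch of the lazy approach, assuming Python 3.8+ for `functools.cached_property`. `LazyRegex` and its two patterns are hypothetical and only illustrate the idea; they are not the library's code.

```python
import re
from functools import cached_property  # available from Python 3.8


class LazyRegex:
    """Hypothetical wrapper that runs each pattern only on first access."""

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
    _TIME = re.compile(r"\b\d{1,2}:\d{2}\b")

    def __init__(self, text):
        self.text = text  # no regex work happens here

    @cached_property
    def emails(self):
        return self._EMAIL.findall(self.text)

    @cached_property
    def times(self):
        return self._TIME.findall(self.text)


parsed = LazyRegex("Meet at 10:30, reply to bob@example.com")
print(parsed.emails)  # computed on this access, then cached on the instance
```

With this layout a caller who only needs emails never pays for the other patterns, which matters most on large inputs.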
Traceback (most recent call last):
  File "csv_parse.py", line 14, in <module>
    parsed_text = CommonRegex({row[2]})
  File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 53, in __init__
    setattr(self, key, method())
  File "/Users/mviraktamath/.pyenv/versions/3.8.6/lib/python3.8/site-packages/commonregex.py", line 39, in regex_method
    return [x.strip() for x in self.regex.findall(text or self.obj.text)]
TypeError: expected string or bytes-like object
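The traceback shows the call `CommonRegex({row[2]})`, where the braces build a set containing the cell instead of passing the string itself. A small sketch with the standard `re` module reproduces the same kind of error and shows a guard; `WORD_RE`, `find_words`, and the sample `row` are hypothetical.

```python
import re

WORD_RE = re.compile(r"\w+")


def find_words(value):
    """Run the pattern only on real strings; fail early otherwise."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return WORD_RE.findall(value)


row = ["id", "name", "some text"]  # hypothetical CSV row

try:
    WORD_RE.findall({row[2]})      # {row[2]} is a set, not a string
except TypeError as exc:
    print("set literal:", exc)

print(find_words(row[2]))          # pass the cell itself
```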
Can we get IPv6 support on .ip? Wouldn't mind if it were a .ipv6, actually, to keep it separate.
CommonRegex("2001:0db8::ff00:0042:8329").ip
[]
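One possible way to get IPv6 matches without a hand-written regex is to collect candidate tokens and let the standard library's `ipaddress` module decide which ones are valid. This is a sketch under that assumption; `CANDIDATE_RE` and `find_ipv6` are hypothetical names, not part of commonregex.

```python
import ipaddress
import re

# Candidate tokens: runs of hex digits, colons, and dots
# (dots allow an embedded IPv4 tail such as ::ffff:192.0.2.1).
CANDIDATE_RE = re.compile(r"[0-9A-Fa-f:.]+")


def find_ipv6(text):
    """Return the tokens in text that parse as IPv6 addresses."""
    found = []
    for token in CANDIDATE_RE.findall(text):
        token = token.strip(".")   # drop sentence punctuation
        if ":" not in token:
            continue               # cannot be IPv6 without a colon
        try:
            ipaddress.IPv6Address(token)
        except ValueError:
            continue               # looked similar, but is not a valid address
        found.append(token)
    return found


print(find_ipv6("router at 2001:0db8::ff00:0042:8329 and 192.168.0.1"))
```

Delegating validation to `ipaddress` avoids maintaining a long pattern for every compressed form, at the cost of a looser first pass over the text.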
www.google.com works great, but google.com does not: []
I made a PHP port of this: https://github.com/james2doyle/CommonRegexPHP
Can I link to it in the README?
Hi, nice tool. I have been testing it and it works well. I'd love to find out whether it is just me or there is a bug somewhere when using the zip_codes and ssn_number options. This is my script, and it works fine as long as the last two lines are commented out.
from commonregex import CommonRegex
print("STARTING COMMON REGEX")
parsed_text = CommonRegex("""SOME TEXT HERE""")
print(parsed_text.times)
print(parsed_text.emails)
print(parsed_text.links)
print(parsed_text.phones)
print(parsed_text.street_addresses)
print(parsed_text.btc_addresses)
print(parsed_text.credit_cards)
print(parsed_text.prices)
print(parsed_text.ipv6s)
print(parsed_text.ips)
print(parsed_text.dates)
#print(parsed_text.zip_codes)
#print(parsed_text.ssn_number)
Thank you!