A Ruby binding to re2, an "efficient, principled regular expression library".
Current version: 0.6.0
Supported Ruby versions: 1.8.7, 1.9.2, 1.9.3, 2.0.0, 2.1.0, Rubinius 2.2
You will need re2 installed as well as a C++ compiler such as gcc (on Debian and Ubuntu, this is provided by the build-essential package). If you are using Mac OS X, I recommend installing re2 with Homebrew by running the following:
$ brew install re2
If you are using Debian, you can install the libre2-dev package like so:
$ sudo apt-get install libre2-dev
If you are using a packaged Ruby distribution, make sure you also have the Ruby header files installed such as those provided by the ruby-dev package on Debian and Ubuntu.
You can then install the library via RubyGems with:
$ gem install re2
or, if re2 is not installed in the default location of /usr/local/:
$ gem install re2 -- --with-re2-dir=/opt/local/re2
Full documentation automatically generated from the latest version is available at http://rubydoc.info/github/mudge/re2.
Bear in mind that re2's regular expression syntax differs from PCRE's; see the official syntax page for more details.
You can use re2 as a mostly drop-in replacement for Ruby's own Regexp and MatchData classes:
$ irb -rubygems
> require 're2'
> r = RE2::Regexp.new('w(\d)(\d+)')
=> #<RE2::Regexp /w(\d)(\d+)/>
> m = r.match("w1234")
=> #<RE2::MatchData "w1234" 1:"1" 2:"234">
> m[1]
=> "1"
> m.string
=> "w1234"
> r =~ "w1234"
=> true
> r !~ "bob"
=> true
> r.match("bob")
=> nil
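In the session above, RE2::Regexp#=~ returns true or false. Ruby's Regexp#=~ instead returns the index of the match or nil, so code that only tests whether something matches is the easiest to switch over. The following is a minimal sketch, assuming you want to fall back to the built-in Regexp when the re2 gem can't be loaded. RE2_AVAILABLE, compile_pattern and matches? are hypothetical names and are not part of the gem.

```ruby
# Try to load the re2 gem; fall back to Ruby's Regexp if it is unavailable.
begin
  require 're2'
  RE2_AVAILABLE = true
rescue LoadError
  RE2_AVAILABLE = false
end

# Hypothetical helper: compile with RE2 when possible, otherwise with Regexp.
def compile_pattern(pattern)
  RE2_AVAILABLE ? RE2::Regexp.new(pattern) : Regexp.new(pattern)
end

# Hypothetical helper: RE2's =~ returns true/false while Regexp's returns
# an index or nil, so normalise both to a boolean.
def matches?(pattern, text)
  !!(compile_pattern(pattern) =~ text)
end

puts matches?('w(\d)(\d+)', "w1234") # prints true
puts matches?('w(\d)(\d+)', "bob")   # prints false
```

Code that uses the numeric result of =~, such as a match offset, would need to change when moving to RE2.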
As RE2::Regexp.new (or RE2::Regexp.compile) can be quite verbose, a helper method has been defined against Kernel so you can use a shorter version to create regular expressions:
> RE2('(\d+)')
=> #<RE2::Regexp /(\d+)/>
Note the use of single quotes: double quotes will interpret \d as d, as in the following example:
> RE2("(\d+)")
=> #<RE2::Regexp /(d+)/>
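The difference comes from Ruby's string literals rather than from re2. Inside double quotes, \d is not a recognised escape sequence, so Ruby drops the backslash before the pattern ever reaches RE2. This plain-Ruby sketch (no re2 needed) shows the literals side by side, and shows that doubling the backslash works as an alternative to single quotes:

```ruby
single  = '(\d+)'   # single quotes keep the backslash
double  = "(\d+)"   # "\d" is not a recognised escape, so the backslash is dropped
escaped = "(\\d+)"  # a doubled backslash survives double quotes

puts single            # prints (\d+)
puts double            # prints (d+)
puts escaped           # prints (\d+)
puts single == escaped # prints true
```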
As of 0.3.0, you can use named groups:
> r = RE2::Regexp.new('(?P<name>\w+) (?P<age>\d+)')
=> #<RE2::Regexp /(?P<name>\w+) (?P<age>\d+)/>
> m = r.match("Bob 40")
=> #<RE2::MatchData "Bob 40" 1:"Bob" 2:"40">
> m[:name]
=> "Bob"
> m["age"]
=> "40"
As of 0.6.0, you can use RE2::Regexp#scan to incrementally scan text for matches (similar in purpose to Ruby's String#scan). Calling scan will return an RE2::Scanner, which is enumerable, meaning you can use each to iterate through the matches (and even use Enumerator::Lazy):
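require 're2'

re = RE2('(\w+)')
scanner = re.scan("It is a truth universally acknowledged")

# Each match is an array of the captured groups, e.g. ["It"]
scanner.each do |match|
  puts match
end

# Reset the scanner to the start of the text before reusing it
scanner.rewind
enum = scanner.to_enum
enum.next #=> ["It"]
enum.next #=> ["is"]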
Features:
- Pre-compiling regular expressions with RE2::Regexp.new(re), RE2::Regexp.compile(re) or RE2(re) (including specifying options, e.g. RE2::Regexp.new("pattern", :case_sensitive => false))
- Extracting matches with re2.match(text) (and an exact number of matches with re2.match(text, number_of_matches) such as re2.match("123-234", 2))
- Extracting matches by name (both with strings and symbols)
- Checking for matches with re2 =~ text, re2 === text (for use in case statements) and re2 !~ text
- Incrementally scanning text with re2.scan(text)
- Checking regular expression compilation with re2.ok?, re2.error and re2.error_arg
- Checking regular expression "cost" with re2.program_size
- Checking the options for an expression with re2.options or individually with re2.case_sensitive?
- Performing a single string replacement with pattern.replace(replacement, original)
- Performing a global string replacement with pattern.replace_all(replacement, original)
- Escaping regular expressions with RE2.escape(unquoted) and RE2.quote(unquoted)
All feedback should go to the mailing list: mailto:[email protected]