google / re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

License: BSD 3-Clause "New" or "Revised" License

Perl 0.49% Shell 0.31% Makefile 0.97% C++ 91.18% Python 4.97% CMake 0.71% Starlark 1.10% TypeScript 0.21% HTML 0.02% JavaScript 0.06%

re2's Introduction

This is the source code repository for RE2, a regular expression library.

For documentation about how to install and use RE2,
visit https://github.com/google/re2/.

The short version is:

make
make test
make install
make testinstall

Building RE2 requires Abseil (https://github.com/abseil/abseil-cpp)
to be installed on your system. Building the tests for RE2 requires
GoogleTest (https://github.com/google/googletest) and Benchmark
(https://github.com/google/benchmark) to be installed as well.

There is a fair amount of documentation (including code snippets) in
the re2.h header file.

More information can be found on the wiki:
https://github.com/google/re2/wiki

Issue tracker:
https://github.com/google/re2/issues

Mailing list:
https://groups.google.com/group/re2-dev

Unless otherwise noted, the RE2 source files are distributed
under the BSD-style license found in the LICENSE file.

RE2's native language is C++.

The Python wrapper is at https://github.com/google/re2/tree/main/python
and on PyPI (https://pypi.org/project/google-re2/).

A C wrapper is at https://github.com/marcomaggi/cre2/.
A D wrapper is at https://github.com/ShigekiKarita/re2d/ and on DUB (code.dlang.org).
An Erlang wrapper is at https://github.com/dukesoferl/re2/ and on Hex (hex.pm).
An Inferno wrapper is at https://github.com/powerman/inferno-re2/.
A Node.js wrapper is at https://github.com/uhop/node-re2/ and on NPM (npmjs.com).
An OCaml wrapper is at https://github.com/janestreet/re2/ and on OPAM (opam.ocaml.org).
A Perl wrapper is at https://github.com/dgl/re-engine-RE2/ and on CPAN (cpan.org).
An R wrapper is at https://github.com/girishji/re2/ and on CRAN (cran.r-project.org).
A Ruby wrapper is at https://github.com/mudge/re2/ and on RubyGems (rubygems.org).
A WebAssembly wrapper is at https://github.com/google/re2-wasm/ and on NPM (npmjs.com).

re2's People

Contributors

allen-webb, ariccio, bencsikandrei, cedk, ckennelly, damienmg, daoswald, deansturtevant1, dneto0, donis-, dougkwan, dvyukov, eel76, ehsannas, hannahshisfb, hanwen, hjmallon, jserv, junyer, kellerb, legrosbuffle, lluixhi, nico, pkasting, randomascii, rsc, shahms, stefanor, tfarina, wdv4758h

re2's Issues

build failure: Mac OS X before 10.9

Commit 59d9612 made a change to the symbols list on Darwin in order to fix an issue preventing re2 from compiling on Mac OS X 10.9, which defaults to the libc++ C++ stdlib. Since the new symbol that was replaced only occurs in libc++, this has the effect of breaking re2 on older versions of OS X by default and on any non-clang compiler on 10.9.

This issue is reposted from https://code.google.com/p/re2/issues/detail?id=99.

Bad link to docs in description

The link to the brief docs in the markdown description still points to code.google.com,
which no longer exists. Where are the docs?

Missing constructor definition for RE2::Arg::Arg(T*, Parser parser)

This constructor is declared but never defined:
https://code.google.com/p/re2/source/browse/re2/re2.h#799

What steps will reproduce the problem?

  1. Define bool ParseMyType(const char* str, int n, void* dest) { ... }
  2. Call RE2::FullMatch(<string>, <regex>, RE2::Arg(&my_type, ParseMyType));
  3. Get linker error

What is the expected output? What do you see instead?

Constructor should be defined somewhere, something like this:

template <typename T>
RE2::Arg::Arg(T* p, RE2::Arg::Parser parser) : arg_(p), parser_(parser) { }

`vsnprintf_s` in `StringAppendV`

In util.h, vsnprintf is #defined as vsnprintf_s:

#define vsnprintf vsnprintf_s

...and it's used twice, both in re2::StringAppendV.

The first use:

  // First try with a small fixed size buffer
  char space[1024];

  // It's possible for methods that use a va_list to invalidate
  // the data in it upon use.  The fix is to make a copy
  // of the structure before using it and use that copy instead.
  va_list backup_ap;
  va_copy(backup_ap, ap);
  int result = vsnprintf(space, sizeof(space), format, backup_ap);
  va_end(backup_ap);

...uses a stack-allocated buffer, so template parameter inference works its magic and infers the size of the buffer.

The second use:

  // Repeatedly increase buffer size until it fits
  int length = sizeof(space);
  while (true) {
    if (result < 0) {
      // Older behavior: just try doubling the buffer size
      length *= 2;
    } else {
      // We need exactly "result+1" characters
      length = result+1;
    }
    char* buf = new char[length];

    // Restore the va_list before we use it again
    va_copy(backup_ap, ap);
    result = vsnprintf(buf, length, format, backup_ap);
    va_end(backup_ap);

...uses a dynamically allocated buffer (char* buf = new char[length]), and thus template parameter inference fails. MSVC generates:

error C2660: 'vsnprintf_s' : function does not take 4 arguments

The correct code looks something like:

result = vsnprintf(buf, length, _TRUNCATE, format, backup_ap);
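For comparison, the same two-pass pattern can be written against the standard `std::vsnprintf`, which on a C99/C++11-conforming runtime returns the length it needed, so no `_TRUNCATE` argument is involved. This is only a minimal sketch and not RE2's code; `AppendFormatted` and `Format` are hypothetical names, and the 64-byte buffer is chosen just to make the fallback path easy to trigger.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: appends printf-style formatted text to *dst.
void AppendFormatted(std::string* dst, const char* format, va_list ap) {
  char space[64];  // deliberately small first-try buffer

  // A va_list may be consumed by use, so always format from a copy.
  va_list backup_ap;
  va_copy(backup_ap, ap);
  int result = std::vsnprintf(space, sizeof(space), format, backup_ap);
  va_end(backup_ap);

  if (result < 0) return;  // formatting error: append nothing
  if (static_cast<std::size_t>(result) < sizeof(space)) {
    dst->append(space, static_cast<std::size_t>(result));
    return;
  }

  // Output was truncated; result is the length needed, excluding the NUL.
  std::vector<char> buf(static_cast<std::size_t>(result) + 1);
  va_copy(backup_ap, ap);
  result = std::vsnprintf(buf.data(), buf.size(), format, backup_ap);
  va_end(backup_ap);
  if (result >= 0) dst->append(buf.data(), static_cast<std::size_t>(result));
}

// Hypothetical variadic wrapper around AppendFormatted.
std::string Format(const char* format, ...) {
  va_list ap;
  va_start(ap, format);
  std::string out;
  AppendFormatted(&out, format, ap);
  va_end(ap);
  return out;
}
```

Calling `Format("%d-%s", 42, "x")` takes the small-buffer path, while a 200-character argument exercises the heap-allocated retry.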

ld: fatal: unrecognized option '--version-script=libre2.symbols'

With REBUILD_FILES=1 set:

$ make
...
g++ -shared -Wl,-soname,libre2.so.0,--version-script=libre2.symbols -pthread -o obj/so/libre2.so.0 obj/so/util/hash.o obj/so/util/rune.o obj/so/util/stringprintf.o obj/so/util/strutil.o obj/so/util/valgrind.o obj/so/re2/bitstate.o obj/so/re2/compile.o obj/so/re2/dfa.o obj/so/re2/filtered_re2.o obj/so/re2/mimics_pcre.o obj/so/re2/nfa.o obj/so/re2/onepass.o obj/so/re2/parse.o obj/so/re2/perl_groups.o obj/so/re2/prefilter.o obj/so/re2/prefilter_tree.o obj/so/re2/prog.o obj/so/re2/re2.o obj/so/re2/regexp.o obj/so/re2/set.o obj/so/re2/simplify.o obj/so/re2/stringpiece.o obj/so/re2/tostring.o obj/so/re2/unicode_casefold.o obj/so/re2/unicode_groups.o
ld: fatal: unrecognized option '--version-script=libre2.symbols'
ld: fatal: use the -z help option for usage information
collect2: error: ld returned 1 exit status
make: *** [obj/so/libre2.so] Error 1
rm re2/unicode_groups.cc re2/unicode_casefold.cc

Not support GBK encoding?

When I tried to use GBK-encoded characters, it told me:
re2/re2.cc:201: Error parsing '好': invalid UTF-8
re2/re2.cc:795: Invalid RE2: invalid UTF-8
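The error is consistent with RE2 interpreting the pattern as UTF-8, which is its default (its options also allow Latin-1). In GBK, '好' is the byte pair 0xBA 0xC3, and 0xBA cannot start a UTF-8 sequence; the UTF-8 encoding of the same character is 0xE5 0xA5 0xBD. The sketch below is a stand-alone well-formedness check, not RE2's parser; `IsValidUtf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if s is well-formed UTF-8.
bool IsValidUtf8(const std::string& s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const unsigned int kMin[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned int cp;
    if (c < 0x80) { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;  // stray continuation byte or invalid lead byte

    if (i + len > s.size()) return false;  // sequence cut off at the end
    for (std::size_t j = 1; j < len; ++j) {
      unsigned char cc = static_cast<unsigned char>(s[i + j]);
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }
    // Reject overlong forms, surrogates, and values past U+10FFFF.
    if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    i += len;
  }
  return true;
}
```

A common workaround, assuming the input really is GBK, is to convert it to UTF-8 before handing it to the regular expression library.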

Replace function fails for some characters

I want to replace a URL in a string. The following is a basic scenario that reproduces the bug.

urlToReplace is a string variable that contains the following data (ignore the whitespace just before the a tag; without it, GitHub does not show the example):
< a href = "http://example.com/secure/ViewProfile.jspa?name=bug">

url is a string variable that contains the following data:
http://example.com/secure/ViewProfile.jspa?name=bug

The failure occurs at this re2 function call:
bool result = re2::RE2::Replace(&urlToReplace, url, "Data to write...");

The result is false, but it should be true. As far as I can tell, the problem is caused by the ? character, which appears in both urlToReplace and url.
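The behavior described is what one would expect when a literal string is used as a pattern: `?` is a regex operator (it makes the preceding `a` optional), so the pattern no longer matches the literal `?` in the text. RE2 provides `RE2::QuoteMeta` for escaping such input. The sketch below illustrates the idea with `std::regex` as a stand-in so that it is self-contained; `EscapeRegexMeta` and `Contains` are hypothetical helpers, not RE2 functions.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes regex metacharacters so that
// the resulting pattern matches the input text literally.
std::string EscapeRegexMeta(const std::string& literal) {
  static const std::string kMeta = "\\^$.|?*+()[]{}";
  std::string out;
  out.reserve(literal.size() * 2);
  for (char c : literal) {
    if (kMeta.find(c) != std::string::npos) out.push_back('\\');
    out.push_back(c);
  }
  return out;
}

// Hypothetical helper: true if pattern matches somewhere in text.
// Uses std::regex (ECMAScript syntax) as a stand-in for RE2.
bool Contains(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}
```

With the URL from the report, `Contains(text, url)` is false while `Contains(text, EscapeRegexMeta(url))` is true.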

Syntax page neglects several flags

Originally reported Sep 24, 2014.

What steps will reproduce the problem?

  1. Observe the list of flags supported in Perl
  2. Observe the list of flags documented for RE2

What is the expected output? What do you see instead?

I expected the Perl flags to be mentioned in RE2's documentation. In particular, I was interested in /x because it is so useful.

Create releases in Github

Hi, we've been using re2 at Double Negative for some time. Previously we pulled tarballs down from Google Code into our build system for each version of re2. Now that you've switched to GitHub, though, we can't pull a tarball until a release is made.

Would it be possible to have a release made sometime soon? The previous versioning system like 20150428 would be fine.

Thanks, Shane.

Get pattern from pre-compiled re2 object.

I have a requirement to get the pattern (in string format) from a pre-compiled object. I create the pre-compiled object in the main thread and pass it on to code that uses the FullMatch function to do the matching, and at that point I need to see the pattern I am using. Is there a way to get it?

thanks
Nitin

RE2 fails to compile under MSVC 2015 due to sprintf

Util.h, around line 68, redefines sprintf to sprintf_s:

#ifdef _WIN32
#define snprintf _snprintf_s
#define sprintf sprintf_s

However, sprintf_s has different parameters. The call to sprintf in strutil.cc at line 42 causes a compile error:
sprintf(dest + used, "%03o", c);

C2664 'int sprintf_s(char *const ,const size_t,const char *const ,...)': cannot convert argument 2 from 'const char [6]' to 'const size_t' re2 c:\src\re2\util\strutil.cc 41

Some possible ways to correct this are:

  1. Do not redefine sprintf. Build MSVC project with macro _CRT_SECURE_NO_DEPRECATE defined.
  2. Do not redefine sprintf. Use something like:
    #ifdef _WIN32
    sprintf_s(dest + used, 5, "%03o", c);
    #else
    sprintf(dest + used, "%03o", c);
    #endif
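A third possibility, sketched here under the assumption that a conforming `snprintf` is available (MSVC 2015 and later are understood to ship one), is to pass the buffer size explicitly on every platform and drop the `#ifdef`. The `const char [6]` in the compiler error suggests the real format string also writes a leading backslash, so the sketch includes one. `OctalEscape` is a hypothetical helper, not code from strutil.cc.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders one byte as a backslash followed by
// three octal digits, e.g. '\n' becomes the four characters \012.
std::string OctalEscape(unsigned char c) {
  char buf[5];  // backslash + 3 octal digits + terminating NUL
  std::snprintf(buf, sizeof(buf), "\\%03o", static_cast<unsigned int>(c));
  return std::string(buf);
}
```

Because the size is always passed, the call cannot overrun `buf` even if the format string is later changed.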

Valgrind memory error

This doesn't seem to cause any problems (that I'm aware of), but I noticed a few warnings when running my code through Valgrind on OS X:

==20082== Conditional jump or move depends on uninitialised value(s)
==20082==    at 0x30D96AA: pthread_rwlock_init (in /usr/lib/system/libsystem_c.dylib)
==20082==    by 0xE45D6: re2::Mutex::Mutex() 
==20082==    by 0xE0164: re2::Mutex::Mutex() 
==20082==    by 0xD66A1: re2::Prog::Prog() 
==20082==    by 0xD6624: re2::Prog::Prog() 
==20082==    by 0xB3666: re2::Compiler::Compiler() 
==20082==    by 0xB35E4: re2::Compiler::Compiler() 
==20082==    by 0xB6016: re2::Compiler::Compile(re2::Regexp*, bool, long long) 
==20082==    by 0xB6B53: re2::Regexp::CompileToProg(long long) 
==20082==    by 0xD8506: re2::RE2::Init(re2::StringPiece const&, re2::RE2::Options const&) 
==20082==    by 0xD88E1: re2::RE2::RE2(std::string const&) 
==20082==    by 0xD882C: re2::RE2::RE2(std::string const&) 


==20085== Conditional jump or move depends on uninitialised value(s)
==20085==    at 0x30D96AA: pthread_rwlock_init (in /usr/lib/system/libsystem_c.dylib)
==20085==    by 0xE45D6: re2::Mutex::Mutex() 
==20085==    by 0xE0164: re2::Mutex::Mutex() 
==20085==    by 0xD7F39: re2::RE2::Init(re2::StringPiece const&, re2::RE2::Options const&) 
==20085==    by 0xD88E1: re2::RE2::RE2(std::string const&) 
==20085==    by 0xD882C: re2::RE2::RE2(std::string const&) 

missing #include for gettimeofday()

bazel build :all
INFO: Reading 'startup' options from /usr/local/google/home/hanwen/.bazelrc: --watchfs
......
INFO: Found 26 targets...
INFO: From Compiling util/benchmark.cc:
util/benchmark.cc: In function 're2::int64 nsec()':
util/benchmark.cc:29:24: error: 'gettimeofday' was not declared in this scope
if(gettimeofday(&tv, 0) < 0)
^
ERROR: /usr/local/google/home/hanwen/vc/re2/BUILD:73:1: C++ compilation of rule '//:bench-main' failed: gcc failed: error executing command /usr/bin/gcc -iquote . -iquote bazel-out/local_linux-fastbuild/genfiles -isystem . -isystem bazel-out/local_linux-fastbuild/genfiles -isystem tools/cpp/gcc3 '-std=c++0x' ... (remaining 9 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
INFO: Elapsed time: 4.742s, Critical Path: 3.51s
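On POSIX systems `gettimeofday` is declared in `<sys/time.h>`, so the error suggests that header was not included. Where C++11 is available, a timer can also be written without any platform header. The following is a minimal sketch of that alternative, not the benchmark's actual code; `NowNanos` is a hypothetical name.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical replacement for a gettimeofday()-based timer: returns
// nanoseconds read from a monotonic clock.
std::int64_t NowNanos() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
             steady_clock::now().time_since_epoch())
      .count();
}
```

The absolute value is only meaningful as a difference between two calls, which is all a benchmark timer needs.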

Disabling long long use with RE2_HAVE_LONGLONG does not work.

In 376ee99 RE2_HAVE_LONGLONG was introduced for disabling methods that use C++11 long long when using a C++03 compiler.

This patch is faulty: at the top of re2/re2.h, RE2_HAVE_LONGLONG is defined as 0 if it is not already defined. As a result, every #ifdef RE2_HAVE_LONGLONG evaluates to true, because #ifdef tests whether the macro is defined, not whether its value is non-zero.
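The difference between the two preprocessor tests can be shown in isolation. This sketch uses a hypothetical macro, `DEMO_HAVE_LONGLONG`, defaulted to 0 in the way the issue describes; it is not taken from re2.h.

```cpp
// Hypothetical macro standing in for RE2_HAVE_LONGLONG, defaulted to 0
// when the build has not defined it.
#ifndef DEMO_HAVE_LONGLONG
#define DEMO_HAVE_LONGLONG 0
#endif

// #ifdef asks "is the name defined?", which is true even for the value 0.
#ifdef DEMO_HAVE_LONGLONG
constexpr bool kEnabledByIfdef = true;
#else
constexpr bool kEnabledByIfdef = false;
#endif

// #if asks "is the value non-zero?", which is what the guard intends.
#if DEMO_HAVE_LONGLONG
constexpr bool kEnabledByIf = true;
#else
constexpr bool kEnabledByIf = false;
#endif
```

Switching the guards to `#if` (or leaving the macro undefined instead of defining it as 0) would make the two agree.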

Test failure on Cygwin

What steps will reproduce the problem?

  1. Install re2 prerequisites on a Cygwin64 environment
  2. Download re2 and run "make && make install"
  3. Make fixes specified at http://stackoverflow.com/a/8393475/4027566 and run "make test".

What is the expected output? What do you see instead?
All tests are expected to pass; however, obj/dbg/test/re2_test fails.

What version of the product are you using? On what operating system?
The latest from mercurial as of 2014-12-03 04:00UTC.

Please provide any additional information below.
The output from the logfile is as follows:
$ cat obj/dbg/test/re2_test.log
re2/testing/re2_test.cc:688: Memory at 0x6fffffe0000
re2/testing/re2_test.cc:826: Check failed: (v) == (6700000000081920.1f)6.6999997e+15 != 6.7000003e+15
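A `float` carries 24 significant bits, so near 6.7e15 consecutive representable values are 2^29 (536870912) apart. The two values in the log differ by roughly that amount, which suggests, though it does not prove, that the parsed value landed on the float next to the one the compiler chose for the literal. The sketch below only measures that spacing; `GapAbove` is a hypothetical helper.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance from v to the next representable
// float above it.
float GapAbove(float v) {
  return std::nextafter(v, std::numeric_limits<float>::infinity()) - v;
}
```

For `6.7e15f` the gap is 536870912, so an exact-equality check on a parsed value at this magnitude is sensitive to how the platform's parsing routine rounds.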

FullMatchN crashes

I encountered a crash when calling FullMatchN.
In the code below, the first and third blocks work, but the second block (below the comment) crashes. Is there a problem with my code or a bug in re2?

#include <re2/re2.h>
#include <iostream>

using namespace re2;

int main() {
  {
    std::string s;
    int i = 0;
    if (RE2::FullMatch("hello123436", "h(e)l+o([0-9]+)", &s, &i)) {
      std::cout << "1. match\n";
      std::cout << "s = " << s << "\n";
      std::cout << "i = " << i << "\n";
    }
  }

  {
    std::string s;
    int i = 0;
    RE2::Arg* args = new RE2::Arg[2];
    args[0] = &s;
    args[1] = &i;

    /* the line below will crash, why? */
    if (RE2::FullMatchN("hello123436", "h(e)l+o([0-9]+)", &args, 2)) {
      std::cout << "2. match\n";
      std::cout << "s = " << s << "\n";
      std::cout << "i = " << i << "\n";
    }
    delete[] args;
  }

  {

    std::string s;
    int i = 0;

    RE2::Arg arg1 = &s;
    RE2::Arg arg2 = &i;
    const RE2::Arg* args[2] = {&arg1, &arg2};

    if (RE2::FullMatchN("hello123436", "h(e)l+o([0-9]+)", args, 2)) {
      std::cout << "3. match\n";
      std::cout << "s = " << s << "\n";
      std::cout << "i = " << i << "\n";
    }
  }
}
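The difference between the second and third blocks appears to be the shape of the argument: FullMatchN takes an array of pointers to RE2::Arg, whereas `&args` in the second block is a pointer to a single `RE2::Arg*`, so reading its element 1 touches unrelated memory. The sketch below shows how to build a pointer array from contiguous storage. It uses stand-in types so that it compiles without RE2; `Arg`, `SumDestinations`, and `PointersTo` are all hypothetical.

```cpp
#include <vector>

// Hypothetical stand-in for RE2::Arg: wraps a pointer to an int destination.
struct Arg {
  int* dest;
};

// Hypothetical stand-in for an API that, like FullMatchN, takes an array
// of pointers to arguments rather than an array of arguments.
int SumDestinations(const Arg* const args[], int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) sum += *args[i]->dest;
  return sum;
}

// Builds the array of pointers from contiguous storage. The storage must
// outlive the returned vector.
std::vector<const Arg*> PointersTo(const std::vector<Arg>& storage) {
  std::vector<const Arg*> ptrs;
  ptrs.reserve(storage.size());
  for (const Arg& a : storage) ptrs.push_back(&a);
  return ptrs;
}
```

Passing `ptrs.data()` together with the element count then matches the calling convention used by the third block.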

Bug in util/util.h and -maybe- in Android.mk

Hi there,

I've discovered a problem when trying to build re2 for M-Dessert, where stlport is not used any more.
In particular, re2 doesn't build because of the following lines in util.h:

#if defined(ANDROID)

#if defined(_STLPORT_VERSION)
#include <unordered_set>      // using stlport
#else
#include <tr1/unordered_set>  // using gnustl
#endif
using std::tr1::unordered_set;

#elif defined(__GNUC__) && !defined(USE_CXX0X)

#include <tr1/unordered_set>
using std::tr1::unordered_set;

#else

#include <unordered_set>
using std::unordered_set;

#endif

And because of this directive in Android.mk
LOCAL_SDK_VERSION := 14.

The fix "seems to be" the following:

#if defined(ANDROID) 

#if defined(_STLPORT_VERSION) || defined(_USING_LIBCXX)
#include <unordered_set>      // using stlport
using std::unordered_set;
#else
#include <tr1/unordered_set>  // using gnustl
using std::tr1::unordered_set;
#endif

#elif defined(__GNUC__) && !defined(USE_CXX0X)

#include <tr1/unordered_set>
using std::tr1::unordered_set;

#else

#include <unordered_set>
using std::unordered_set;

#endif

It is also necessary to comment out "LOCAL_SDK_VERSION := 14" in the makefile.

Please elaborate.
C

Compiling without pcre in MSVC 2013 causes link errors

If I try to compile the test suite in MSVC 2013 without pcre, (removing pcre.cc from project test and project benchmark) I get link errors because entry points from pcre.cc are not found:

1>------ Build started: Project: benchmark, Configuration: Release x64 ------
2>------ Build started: Project: dfa_test, Configuration: Release x64 ------
2> dfa_test.cc
1> benchmark.cc
1>E:\src\re2\util\benchmark.cc(29): error C2079: 'tv' uses undefined struct 'nsec::timeval'
1>E:\src\re2\util\benchmark.cc(30): error C3861: 'gettimeofday': identifier not found
1>E:\src\re2\util\benchmark.cc(32): error C2228: left of '.tv_sec' must have class/struct/union
1> type is 'int'
1>E:\src\re2\util\benchmark.cc(32): error C2228: left of '.tv_usec' must have class/struct/union
1> type is 'int'
1>E:\src\re2\util\benchmark.cc(115): warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
1>E:\src\re2\util\benchmark.cc(117): warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
2>E:\src\re2\util/thread.h(25): error C2146: syntax error : missing ';' before identifier 'pid_'
2>E:\src\re2\util/thread.h(25): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
2>E:\src\re2\re2/prog.h(99): warning C4244: 'return' : conversion from '__int64' to 'int', possible loss of data
2>E:\src\re2\re2\testing\dfa_test.cc(111): warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
3>------ Build started: Project: exhaustive1_test, Configuration: Release x64 ------
4>------ Build started: Project: exhaustive2_test, Configuration: Release x64 ------
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
3>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
3>E:\src\re2\Release\exhaustive1_test.exe : fatal error LNK1120: 7 unresolved externals
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
4>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
4>E:\src\re2\Release\exhaustive2_test.exe : fatal error LNK1120: 7 unresolved externals
5>------ Build started: Project: exhaustive3_test, Configuration: Release x64 ------
6>------ Build started: Project: exhaustive_test, Configuration: Release x64 ------
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
5>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
5>E:\src\re2\Release\exhaustive3_test.exe : fatal error LNK1120: 7 unresolved externals
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
6>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
6>E:\src\re2\Release\exhaustive_test.exe : fatal error LNK1120: 7 unresolved externals
7>------ Build started: Project: random_test, Configuration: Release x64 ------
8>------ Build started: Project: re2_test, Configuration: Release x64 ------
8> re2_test.cc
8>E:\src\re2\re2\testing\re2_test.cc(9): fatal error C1083: Cannot open include file: 'sys/mman.h': No such file or directory
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
7>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
7>E:\src\re2\Release\random_test.exe : fatal error LNK1120: 7 unresolved externals
9>------ Build started: Project: regexp_benchmark, Configuration: Release x64 ------
10>------ Build started: Project: search_test, Configuration: Release x64 ------
9>LINK : fatal error LNK1181: cannot open input file 'Release\benchmark.lib'
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::PCRE(class std::basic_string<char,struct std::char_traits,class std::allocator > const &,class re2::PCRE_Options const &)" (??0PCRE@re2@@qeaa@AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@AEBVPCRE_Options@1@@z) referenced in function "public: __cdecl re2::TestInstance::TestInstance(class re2::StringPiece const &,enum re2::Prog::MatchKind,enum re2::Regexp::ParseFlags)" (??0TestInstance@re2@@qeaa@AEBVStringPiece@1@W4MatchKind@Prog@1@W4ParseFlags@Regexp@1@@z)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: __cdecl re2::PCRE::~PCRE(void)" (??1PCRE@re2@@qeaa@XZ) referenced in function "public: __cdecl re2::TestInstance::~TestInstance(void)" (??1TestInstance@re2@@qeaa@XZ)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::HitLimit(void)" (?HitLimit@PCRE@re2@@QEAA_NXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: void __cdecl re2::PCRE::ClearHitLimit(void)" (?ClearHitLimit@PCRE@re2@@QEAAXXZ) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "public: bool __cdecl re2::PCRE::DoMatch(class re2::StringPiece const &,enum re2::PCRE::Anchor,int *,class re2::PCRE::Arg const * const *,int)const " (?DoMatch@PCRE@re2@@QEBA_NAEBVStringPiece@2@W4Anchor@12@PEAHPEBQEBVArg@12@H@Z) referenced in function "private: void __cdecl re2::TestInstance::RunSearch(enum re2::Engine,class re2::StringPiece const &,class re2::StringPiece const &,enum re2::Prog::Anchor,struct re2::TestInstance::Result *)" (?RunSearch@TestInstance@re2@@AEAAXW4Engine@2@AEBVStringPiece@2@1W4Anchor@Prog@2@PEAUResult@12@@z)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_null(char const *,int,void *)" (?parse_null@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(void)" (??0Arg@PCRE@re2@@qeaa@XZ)
10>test.lib(tester.obj) : error LNK2019: unresolved external symbol "private: static bool __cdecl re2::PCRE::Arg::parse_stringpiece(char const *,int,void *)" (?parse_stringpiece@Arg@PCRE@re2@@CA_NPEBDHPEAX@Z) referenced in function "public: __cdecl re2::PCRE::Arg::Arg(class re2::StringPiece *)" (??0Arg@PCRE@re2@@qeaa@PEAVStringPiece@2@@z)
10>E:\src\re2\Release\search_test.exe : fatal error LNK1120: 7 unresolved externals
11>------ Skipped Build: Project: ALL_BUILD, Configuration: Release x64 ------
11>Project not selected to build for this solution configuration
========== Build: 0 succeeded, 10 failed, 15 up-to-date, 1 skipped ==========

Compiling Unicode regex is significantly slower

Not sure if this is expected, but I'm seeing significantly slower compile times (~200x) for regexes that use Unicode classes. For example:

\pL\pL\pL\pN\pN -- Single regex compile Time: 8.09727ms.

versus:

[A-Za-z][A-Za-z][A-Za-z][0-9][0-9] -- Single regex compile Time: 0.035597ms

threadwin.cc gets compiler errors in MSVC 2013

If I compile threadwin.cc in MSVC2013 I get a compiler error in thread.h:
e:\src\re2\util/thread.h(8): fatal error C1083: Cannot open include file: 'pthread.h': No such file or directory

If I put ifdef's around the include of pthread.h, I get no errors in thread.h but I get errors in compiling threadwin.cc

in thread.h:

#ifndef _WIN32
#include <pthread.h>
#endif

e:\src\re2\util/thread.h(22): error C2146: syntax error : missing ';' before identifier 'pid_'
1>e:\src\re2\util/thread.h(22): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
1>e:\src\re2\util\threadwin.cc(11): error C2065: 'pid_' : undeclared identifier
1>e:\src\re2\util\threadwin.cc(27): error C2065: 'pid_' : undeclared identifier
1>e:\src\re2\util\threadwin.cc(31): error C2065: 'pid_' : undeclared identifier
1>e:\src\re2\util\threadwin.cc(32): error C2065: 'pid_' : undeclared identifier
1>e:\src\re2\util\threadwin.cc(39): error C2065: 'pid_' : undeclared identifier
1>e:\src\re2\util\threadwin.cc(41): error C2065: 'pid_' : undeclared identifier

If I change this line in thread.h:
pthread_t pid_;
to:

#ifdef _WIN32
HANDLE pid_;
#else
pthread_t pid_;
#endif

then threadwin.cc compiles with no errors.

This is all with source code cloned from the code-review repository as of about 9:30 PM on Monday, September 18, with project files built with CMake.

Provide access to capturing group matching ranges.

A feature that I would really like to see in re2 is the ability to extract capturing group ranges. I can already get at the contents of named capturing groups by combining NamedCapturingGroups(), CapturingGroupNames() and NumberOfCapturingGroups() with the args filled in by FullMatchN(), but I would really like to get the range of each match as well, i.e. its start and end offsets within the input string. This would be very useful for replacing capturing groups in a string. Currently, I'd have to make multiple re2 calls to achieve this.

One way to do this without breaking the current API would be to extend RE2::Arg with int start and int end members that hold the starting and ending offsets of the match, or -1 for no match.
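
For what it's worth, the offsets can be derived by pointer arithmetic whenever the matcher hands back a view that points into the original input. As far as I can tell from re2.h, RE2::Match() fills an array of StringPiece submatches that behave this way, but treat that as an assumption to verify against your version of the header. The sketch below is library-independent: OffsetsIn and Range are hypothetical names, and the (data, size) pair stands in for a submatch's data() and size().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical types and helper, not part of RE2.
// A Range is a half-open [start, end) pair of byte offsets.
typedef std::pair<std::ptrdiff_t, std::ptrdiff_t> Range;

// Returns the offsets of a submatch within `text`, or (-1, -1) when the
// group did not participate in the match (null data pointer).
// Precondition: a non-null piece must point into `text`.
Range OffsetsIn(const std::string& text, const char* piece_data,
                std::size_t piece_size) {
  if (piece_data == nullptr) return Range(-1, -1);
  const char* begin = text.data();
  assert(piece_data >= begin &&
         piece_data + piece_size <= begin + text.size());
  std::ptrdiff_t start = piece_data - begin;
  return Range(start, start + static_cast<std::ptrdiff_t>(piece_size));
}
```

With ranges in hand, a group can be replaced with a single std::string::replace(start, end - start, ...) call on a copy of the input.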

Minor leaks at initialisation

Hi all,

I've noticed some minor leaks caused by RE2's initialisation mechanism:

// static empty things for use as const references.
// To avoid global constructors, initialized on demand.
GLOBAL_MUTEX(empty_mutex);
static const string *empty_string;
static const map<string, int> *empty_named_groups;
static const map<int, string> *empty_group_names;

static void InitEmpty() {
  GLOBAL_MUTEX_LOCK(empty_mutex);
  if (empty_string == NULL) {
    empty_string = new string;
    empty_named_groups = new map<string, int>;
    empty_group_names = new map<int, string>;
  }
  GLOBAL_MUTEX_UNLOCK(empty_mutex);
}

Those three allocated objects are never freed, or maybe I did not find a way to do it...
Wouldn't it be better to keep those three objects in the RE2 object, so that they can be properly freed in the destructor?
I know that it would increase the size of RE2 objects a little, but in my opinion it is worthwhile.
It would also remove the need for a global mutex here.

I can provide a patch if needed :)
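
A different option from the per-object one proposed above, sketched here only for comparison: function-local statics. Since C++11 their initialisation is thread-safe, so no explicit mutex is needed, and the objects are destroyed at program exit rather than leaked. The accessor names are hypothetical. Note that this reintroduces objects with static storage and exit-time destructors, which the "To avoid global constructors" comment suggests the original code was deliberately steering clear of.

```cpp
#include <map>
#include <string>

// Hypothetical accessors, not RE2's actual code. Each object is constructed
// on first use and destroyed during normal program termination.
const std::string& EmptyString() {
  static const std::string empty;
  return empty;
}

const std::map<std::string, int>& EmptyNamedGroups() {
  static const std::map<std::string, int> empty;
  return empty;
}

const std::map<int, std::string>& EmptyGroupNames() {
  static const std::map<int, std::string> empty;
  return empty;
}
```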

Question: Obtaining matches

Hey.

I have not found any other place to ask questions, so if this is not the right one, I'd be happy if you could point me in the right direction.

Currently I am working on replacing the RegExp extension of a scripting language ( https://github.com/unitpoint/objectscript ), which was formerly based on pcrelib, with one based on re2. So I downloaded re2, test-compiled it and ran all of its tests, and I had no issues at all on OS X 10.10.

But now I want to write the actual C++ wrapper but got a little bit stuck:

  static bool FullMatchN(const StringPiece& text, const RE2& re,
                         const Arg* const args[], int argc);
  static const VariadicFunction2<
      bool, const StringPiece&, const RE2&, Arg, RE2::FullMatchN> FullMatch;

I see that there is const Arg* const args[]. But I see no conversion operator within the Arg class to the parsed type...or at least, no way to convert the matches.

What I would like to do would be something like this:

{type} matches = RE2::FullMatchN(string, RegExp, &args, &argc);
for(int i=0; i<argc; i++) {
    // Add value to a return array to the script language.
    // Get the type within args[i] and add the correct value type.
}

I am not familiar with Python or any other language for which a re2 wrapper exists, so I have a lot of difficulties understanding how they work.

All that I want to do for now is to pass in a content string and a regular expression and get the results back in an array.

How would I go about this?

Kind regards,
Ingwie

RFE: lookahead and lookbehind

Automata theory allows this without any asymptotic overhead (the overall time complexity can stay linear).

The question is how to do it without much extra space. Any ideas?

configure some PREFIX or special CONFIG something

Hi, the usual Linux workflow is ./configure --prefix=xxxx && make && make install. I want to compute the --prefix location with some commands in an outer wrapper Makefile. Do you know how to configure this (for example, through a configuration file)? Thanks very much!

stringpiece.h is not self contained?

In function compare:

  int compare(const StringPiece& x) const {
    int r = memcmp(ptr_, x.ptr_, std::min(length_, x.length_));
    if (r == 0) {
      if (length_ < x.length_) r = -1;
      else if (length_ > x.length_) r = +1;
    }
    return r;
  }

...re2 uses std::min, but stringpiece.h never includes <algorithm>, the header that declares std::min.
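
A minimal self-contained sketch of the same compare logic with the headers it depends on spelled out. Piece is a hypothetical stand-in for StringPiece (with std::size_t lengths for simplicity), not the real class; the point is only that <algorithm> and <cstring> are included directly rather than picked up through some other header.

```cpp
#include <algorithm>  // std::min
#include <cstddef>    // std::size_t
#include <cstring>    // std::memcmp

// Hypothetical stand-in for StringPiece, reduced to what compare() needs.
struct Piece {
  const char* ptr_;
  std::size_t length_;

  // Three-way comparison: bytewise over the common prefix, then by length.
  int compare(const Piece& x) const {
    int r = std::memcmp(ptr_, x.ptr_, std::min(length_, x.length_));
    if (r == 0) {
      if (length_ < x.length_) r = -1;
      else if (length_ > x.length_) r = +1;
    }
    return r;
  }
};
```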

`HAVE_PTHREAD` vs `_POSIX_THREADS`? (ongoing windows compatibility work)

In mutex.h, re2 #defines HAVE_PTHREAD and HAVE_RWLOCK:

/*
 * A simple mutex wrapper, supporting locks and read-write locks.
 * You should assume the locks are *not* re-entrant.
 */

#ifndef RE2_UTIL_MUTEX_H_
#define RE2_UTIL_MUTEX_H_

#include <stdlib.h>

namespace re2 {

#define HAVE_PTHREAD 1
#define HAVE_RWLOCK 1

...without actually checking if we have Pthreads or not.

I think that the POSIX standard guarantees that _POSIX_THREADS is defined when Pthreads is present.

On Windows (VS2013), Pthreads is not present by default, and thus this breaks compilation.

Maybe re2 should check for _POSIX_THREADS instead of HAVE_PTHREAD? (I don't have access to a unix build system at the moment)

I'm not sure about HAVE_RWLOCK.

Simply commenting out the two #defines allows compilation to proceed a bit further, until it hits the #include <pthread.h> in re2.cc.

If I change the HAVE_PTHREAD to:

#if defined(_POSIX_THREADS)
#define HAVE_PTHREAD 1
#endif

...and then guard the #include <pthread.h>:

#if defined(HAVE_PTHREAD)
#include <pthread.h>
#endif

...that eliminates most (maybe all) Pthread issues with compiling on Windows. There are, of course, other issues, but this deals with the Pthreads issues.
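
One detail worth adding: POSIX specifies that _POSIX_THREADS is defined by <unistd.h>, so the check only works after that header has been included, and MSVC does not ship <unistd.h>. A sketch of the combined guard follows, under the assumption that _WIN32 is a good enough proxy for "no <unistd.h>" (not true of every Windows toolchain). HavePthread() is a hypothetical helper that only reports which branch was taken.

```cpp
// Assumption: toolchains that define _WIN32 have no <unistd.h>.
#if !defined(_WIN32)
#include <unistd.h>  // defines _POSIX_THREADS where POSIX threads exist
#endif

// "+ 0" keeps the expression valid even if the macro is defined empty.
#if defined(_POSIX_THREADS) && (_POSIX_THREADS + 0) > 0
#define HAVE_PTHREAD 1
#include <pthread.h>
#endif

// Hypothetical helper: reports which branch the preprocessor selected.
bool HavePthread() {
#if defined(HAVE_PTHREAD)
  return true;
#else
  return false;
#endif
}
```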

Compiling benchmark.cc in MSVC 2013 gets compile errors.

Compiling benchmark.cc in MSVC 2013 gets compile errors.
1>------ Build started: Project: benchmark, Configuration: Release x64 ------
1> benchmark.cc
1>E:\src\re2\util\benchmark.cc(29): error C2079: 'tv' uses undefined struct 'nsec::timeval'
1>E:\src\re2\util\benchmark.cc(30): error C3861: 'gettimeofday': identifier not found
1>E:\src\re2\util\benchmark.cc(32): error C2228: left of '.tv_sec' must have class/struct/union
1> type is 'int'
1>E:\src\re2\util\benchmark.cc(32): error C2228: left of '.tv_usec' must have class/struct/union
1> type is 'int'
1>E:\src\re2\util\benchmark.cc(115): warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
1>E:\src\re2\util\benchmark.cc(117): warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

struct timeval and gettimeofday are not defined in MSVC

The following code compiles cleanly and I think it is correct (untested):

static int64 nsec() {
#if defined(_MSC_VER)
  // Needs <windows.h>. The counter ticks freq.QuadPart times per second.
  LARGE_INTEGER freq, val;
  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&val);
  // Scale seconds to nanoseconds (1e9), matching the other branches.
  return (int64)(1e9 * (double)val.QuadPart / (double)freq.QuadPart);
#elif defined(__APPLE__) || defined(_WIN32)
  struct timeval tv;
  if(gettimeofday(&tv, 0) < 0)
    return -1;
  return (int64)tv.tv_sec*1000*1000*1000 + tv.tv_usec*1000;
#else
  struct timespec tp;
  if(clock_gettime(CLOCK_REALTIME, &tp) < 0)
    return -1;
  return (int64)tp.tv_sec*1000*1000*1000 + tp.tv_nsec;
#endif
}
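
If a C++11 standard library is available, the platform branches could in principle be replaced by std::chrono. This is a swapped-in technique rather than what benchmark.cc does, and NowNanos is a hypothetical name. Clock resolution varies between standard library implementations, so it is worth measuring before relying on it for benchmarks.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical portable alternative to nsec(): nanoseconds on a monotonic
// clock. The epoch is unspecified, so only differences are meaningful.
std::int64_t NowNanos() {
  using namespace std::chrono;
  return duration_cast<nanoseconds>(
             steady_clock::now().time_since_epoch())
      .count();
}
```

Elapsed time is then the difference between two NowNanos() readings.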

make testinstall fails on osx 10.9 mavericks (probably Yosemite too)

What steps will reproduce the problem?

  1. run install steps as per wiki
  2. in last step, make testinstall, it fails.

What is the expected output? What do you see instead?

Expected: to work :)
Actual result:

$ sudo make testinstall
cp testinstall.cc obj
(cd obj && c++ -I/usr/local/include -L/usr/local/lib testinstall.cc -lre2 -pthread -o testinstall)
Undefined symbols for architecture x86_64:
"re2::FilteredRE2::FirstMatch(re2::StringPiece const&, std::__1::vector<int, std::__1::allocator<int> > const&) const", referenced from:
_main in testinstall-025vi1.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [testinstall] Error 1

What version of the product are you using? On what operating system?
Latest from source. The OS is Mavericks (10.9.2).

Thanks!

Compilation failure against -std=c++11 or similar

Since C++11, narrowing in {} is forbidden. Thus I got the following error:

$ make CXXFLAGS="-Wall -O3 -g -pthread -std=c++11" test
g++ -o obj/dbg/re2/testing/re2_test.o -fPIC  -Wall -O3 -g -pthread -std=c++11 -Wsign-compare -c -I.    re2/testing/re2_test.cc
re2/testing/re2_test.cc: In function ‘void re2::RE2Bug18391750()’:
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘252’ from ‘int’ to ‘char’ inside { }
               0x2a, 0x29, 0x0};
                              ^
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘252’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘194’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘155’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘197’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘197’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘212’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘143’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘143’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘231’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘174’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘243’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘174’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1438:30: error: narrowing conversion of ‘174’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc: In function ‘void re2::RE2Bug18458852()’:
re2/testing/re2_test.cc:1454:66: error: narrowing conversion of ‘245’ from ‘int’ to ‘char’ inside { }
               0xf5, 0x87, 0x87, 0x90, 0x29, 0x5d, 0x29, 0x29, 0x0};
                                                                  ^
re2/testing/re2_test.cc:1454:66: error: narrowing conversion of ‘135’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1454:66: error: narrowing conversion of ‘135’ from ‘int’ to ‘char’ inside { }
re2/testing/re2_test.cc:1454:66: error: narrowing conversion of ‘144’ from ‘int’ to ‘char’ inside { }
Makefile:163: recipe for target 'obj/dbg/re2/testing/re2_test.o' failed
make: *** [obj/dbg/re2/testing/re2_test.o] Error 1

-std=gnu++11 causes the same errors. I encountered such errors when compiling facebook/hhvm, which uses -std=gnu++11 for the whole project. See facebook/hhvm#5209
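
The errors arise because char is signed on this platform, so values above 127 do not fit and the braced initializer rejects them. One possible fix, sketched below, is to declare the array as unsigned char and cast the pointer where a const char* is needed; another is to wrap each large value in static_cast<char>(...). The bytes are only the tail of the array visible in the second error message, and kBytes and BytesAsString are hypothetical names.

```cpp
#include <string>

// Illustrative data: the visible tail of the second array in the log.
// unsigned char holds 0..255, so no narrowing conversion occurs.
static const unsigned char kBytes[] = {0xf5, 0x87, 0x87, 0x90, 0x29,
                                       0x5d, 0x29, 0x29, 0x0};

// Builds a std::string from the NUL-terminated byte array. The cast only
// changes the pointer type; the bytes themselves are untouched.
std::string BytesAsString() {
  return std::string(reinterpret_cast<const char*>(kBytes));
}
```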

Sometimes DFA table doesn't lead to state 1.

I am using this library to generate DFA tables for download to an FPGA. I modified the BuildAllStates method in dfa.cc and walk the states using the RunStateOnByteUnlocked method. For regexps such as "h.*o", ".*o" or "h.+o", there is no transition to the first state. In other cases everything works as expected.

testinstall hangs when compiled with --static

On Ubuntu 14.10 64 or 32-bit, which has gcc version 4.9.1,
compiling testinstall.cc with --static results in a program that
hangs. I don't see this problem on other platforms.

git clone https://github.com/google/re2.git
cd re2
make clean
make
g++ --static testinstall.cc -L obj -lre2 -I . -pthread
./a.out

The a.out program hangs, but if you take away --static, it does not hang.

For windows, re2_test.cc should not include <unistd.h>

Compiling re2_test.cc on Windows fails because of the #include <unistd.h>. With the most recent changes, unistd.h is not needed to compile on Windows and should not be included if _WIN32 is defined. With this one exception, re2_test.cc now compiles cleanly on Windows.

re2 supports \x construct in a replacement string?

Hi all,
I have encountered a problem using re2 regarding the \x construct in a replacement string.
The re2 library behaves differently from its counterpart in Perl. In Perl one can use the \x construct in the replacement string. Is this possible in re2?

It would be really handy if escape sequences of the form \x6D could be used within a rewrite string.
If re2 does not provide something like that, would you be interested in such an enhancement?
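
Until something like this exists in the library, a caller-side workaround is to expand the escapes before handing the rewrite string over. The sketch below assumes the rewrite syntax only gives meaning to \0 through \9 and \\ (my reading of re2.h; please verify). ExpandHexEscapes is a hypothetical helper, not an re2 function. It decodes \xHH with exactly two hex digits, leaves every other escape untouched, and re-escapes a decoded backslash so the result stays a valid rewrite string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of re2: expands \xHH escapes in a rewrite
// string and passes all other backslash sequences through unchanged.
std::string ExpandHexEscapes(const std::string& rewrite) {
  auto hex_value = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    // Ordinary characters and a trailing lone backslash are copied as-is.
    if (rewrite[i] != '\\' || i + 1 >= rewrite.size()) {
      out += rewrite[i];
      continue;
    }
    char next = rewrite[i + 1];
    if (next == 'x' && i + 3 < rewrite.size()) {
      int hi = hex_value(rewrite[i + 2]);
      int lo = hex_value(rewrite[i + 3]);
      if (hi >= 0 && lo >= 0) {
        char decoded = static_cast<char>(hi * 16 + lo);
        if (decoded == '\\') out += "\\\\";  // keep the backslash escaped
        else out += decoded;
        i += 3;  // skip 'x' and both hex digits
        continue;
      }
    }
    // Any other escape (\\, \0..\9, malformed \x): copy both characters.
    out += rewrite[i];
    out += next;
    ++i;
  }
  return out;
}
```

Because the function consumes escapes in pairs, an already escaped backslash followed by x41 is left alone instead of being decoded.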

Include room for null character when using sprintf (Patch included)

In util/strutil.cc, the buffer passed to sprintf must be large enough to hold the trailing null character as well as the four-character escape. Otherwise sprintf can overflow the buffer.

--- a/util/strutil.cc
+++ b/util/strutil.cc
@@ -36,9 +36,13 @@ int CEscapeString(const char* src, int src_len, char* dest,
         // digit then that digit must be escaped too to prevent it being
         // interpreted as part of the character code by C.
         if (c < ' ' || c > '~') {
-          if (dest_len - used < 4) // need space for 4 letter escape
+          if (dest_len - used < 5) // need space for 4 letter escape and null
             return -1;
+#ifndef _WIN32
           sprintf(dest + used, "\\%03o", c);
+#else
+          sprintf_s(dest + used, 5, "\\%03o", c);
+#endif
           used += 4;
         } else {
           dest[used++] = c; break;
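
As an alternative to branching on _WIN32, the same bound can be enforced with snprintf, which takes the remaining buffer size explicitly. This is only a sketch of the idea, not the code in util/strutil.cc: AppendOctalEscape is a hypothetical helper, and it assumes a toolchain that provides a conforming snprintf.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes a 4-character octal escape such as \001 at
// dest + used. Returns the new `used` value, or -1 if the escape plus its
// terminating NUL does not fit in the remaining space.
int AppendOctalEscape(char* dest, int dest_len, int used, unsigned char c) {
  const int kEscapeLen = 4;  // backslash plus three octal digits
  if (dest_len - used < kEscapeLen + 1)  // +1 for the trailing NUL
    return -1;
  std::snprintf(dest + used, static_cast<std::size_t>(dest_len - used),
                "\\%03o", static_cast<unsigned>(c));
  return used + kEscapeLen;
}
```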
