
node-pcre's Introduction

Description

A pcre binding for node.js with UTF8 and Unicode properties support.

Requirements

  • node.js -- v0.8.0 or newer
  • Windows, Linux, or OSX
    • BSD or OpenSolaris support is possible -- just need to generate and submit a config.h for PCRE 8.32 with these options:
./configure --enable-utf8 --enable-unicode-properties --enable-static --disable-shared --enable-jit --disable-cpp --enable-pcre8 --disable-pcre16 --disable-pcre32

Install

npm install pcre

Examples

  • Simple one-off regexp execution:
var inspect = require('util').inspect;
var PCRE = require('pcre').PCRE;

console.log(inspect(PCRE.exec("(?<nodejsrules>o)", "foo", 0), false, Infinity));

// output:
// [ 1, 2, 1, 2, named: { nodejsrules: [ 1, 2 ] } ]
  • Simple one-off regexp execution returning all matches:
var inspect = require('util').inspect;
var PCRE = require('pcre').PCRE;

console.log(inspect(PCRE.execAll("(?<nodejsrules>o)", "foo", 0), false, Infinity));

// output:
// [ [ 1, 2, 1, 2, named: { nodejsrules: [ 1, 2 ] } ],
//   [ 2, 3, 2, 3, named: { nodejsrules: [ 2, 3 ] } ] ]
  • Instantiate a regexp and test it:
var PCRE = require('pcre').PCRE;

var re = new PCRE("o");
console.log(re.test("foo", 0));
console.log(re.test("bar", 0));
console.log(re.test("node.js rules", 2));

// output:
// true
// false
// false
  • Instantiate a regexp, JIT compile it, and execute it, returning all matches:
var inspect = require('util').inspect;
var PCRE = require('pcre').PCRE;

var re = new PCRE("o");
re.study(PCRE.PCRE_STUDY_JIT_COMPILE);
console.log(inspect(re.execAll("fooooo", 0), false, Infinity));

// output:
// [ [ 1, 2 ], [ 2, 3 ], [ 3, 4 ], [ 4, 5 ], [ 5, 6 ] ]

API

PCRE static constants

All static constants for regexp flags/options and errors can be found in lib/pcre.js.

PCRE static methods

  • exec(< string >pattern, < mixed >subject, < integer >offset[, < integer >flags]) - mixed - Compiles pattern and executes it on subject starting at offset in subject. subject can be a string or Buffer. The return value is either null in case of no match, an integer error code in case of error, or an array on success containing offsets in the subject for the first match. The first two offsets reference the entirety of the matched part of the subject. Any additional offsets reference capture groups in order from left to right. Offsets for named capture groups are additionally available on the named object.

  • execAll(< string >pattern, < mixed >subject, < integer >offset[, < integer >flags]) - mixed - Same as exec() except that, on success, an array of match arrays (one per match) is returned.

  • test(< string >pattern, < mixed >subject, < integer >offset[, < integer >flags]) - boolean - Similar to exec(), but used merely to test if pattern matches at least once.

  • version() - string - Returns the version and date of the PCRE library used (e.g. "8.32 2012-11-30").
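Because exec() can return three different shapes, callers may want to normalize the result before using it. The following is a minimal sketch that relies only on the return contract described above (null, an integer error code, or an array of offsets). `classifyResult` is a hypothetical helper and not part of the module, and the sample values are hand-written rather than produced by PCRE.

```javascript
// Hypothetical helper: not part of the pcre module.
// Normalizes the documented return shapes of exec() into a tagged object.
function classifyResult(result) {
  if (result === null) {
    return { status: 'nomatch' };
  }
  if (typeof result === 'number') {
    // An integer means an error was reported; the code can be compared
    // against the error constants listed in lib/pcre.js.
    return { status: 'error', code: result };
  }
  if (Array.isArray(result)) {
    return { status: 'match', offsets: result };
  }
  throw new TypeError('Unexpected exec() result: ' + String(result));
}

// Hand-written sample values shaped like the documented results.
// The error code -3 is arbitrary and chosen only for illustration.
console.log(classifyResult(null).status);         // nomatch
console.log(classifyResult(-3).status);           // error
console.log(classifyResult([1, 2, 1, 2]).status); // match
```

In real use the argument would be the value returned by PCRE.exec() or an instance's exec().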

PCRE methods

  • (constructor)(< string >pattern[, < integer >flags]) - Compiles pattern and returns a new PCRE instance.

  • study([< integer >flags][, < integer >jitStackStart=1, < integer >jitStackMax=32KB]) - boolean - Analyzes the compiled regexp in order to optimize it. jitStackStart and jitStackMax are the custom starting and maximum JIT stack sizes (in bytes), respectively, used when one of the JIT flags is passed in. The return value indicates whether the analysis succeeded.

  • set(< string >pattern[, < integer >flags]) - (void) - Compiles a new pattern and replaces the existing regexp.

  • save() - Buffer - Returns the internal state object representing the compiled regexp. Note: this does not save the result of any previous optimizations performed by study().

  • load(< Buffer >state) - (void) - Loads previously saved internal state data from save().

  • exec(< mixed >subject, < integer >offset[, < integer >flags]) - mixed - Similar to PCRE.exec().

  • execAll(< mixed >subject, < integer >offset[, < integer >flags]) - mixed - Same as exec() except that, on success, an array of match arrays (one per match) is returned.

  • test(< mixed >subject, < integer >offset[, < integer >flags]) - boolean - Similar to exec(), but used merely to test if pattern matches at least once.

node-pcre's People

Contributors

mscdex


node-pcre's Issues

Weird PCRE_ERROR_BADOPTION

Setting these options PCRE.PCRE_MULTILINE | PCRE.PCRE_DOTALL causes execAll() not to work, but exec() works.

var PCRE = require('pcre').PCRE;

var content = '/**asd\nqwe\r\nzxc\rpoi*/'

var re = new PCRE( '/\\*\\*.*?\\*/', PCRE.PCRE_MULTILINE | PCRE.PCRE_DOTALL );
console.log( re.exec( content ) ); // works
console.log( re.execAll( content ) ); // PCRE_ERROR_BADOPTION

var re = new PCRE( '/\\*\\*.*?\\*/' );
console.log( re.execAll( content ) ); // works (no match)

Provide libpcre binaries

It is very inconvenient to download gigabytes of applications (Visual Studio, Python) just to produce a binary that is ~400 KB, not to mention that one needs to find the build instructions elsewhere and that installing all of this on one's system may be undesirable. Since you are presumably testing and using this yourself, I guess it won't be a problem to provide the binaries as well. I can contribute the Windows one since I have already built it.

Segmentation fault when using JIT for some regexes

Reproducible with the following test code on node v0.10.30 on both Arch x86_64 and Debian x86_64:

var PCRE = require('pcre').PCRE;

var a = new Array(10000).join('a');
var rx = new Array(10000).join('a?') + a;

var p = new PCRE(rx);
p.study(PCRE.PCRE_STUDY_JIT_COMPILE);
p.exec(a);

A 20,000 character evil regex is a pretty extreme case, but I'm worried there may be strange corner cases that could cause this with a shorter regex.

The issue does not occur using the same test code, but removing the line where the JIT is applied, which leads me to believe it might be a memory allocation error. Memory is finite so it may not make sense to raise the limit to accommodate such a corner case, but I think the error condition should be identified and handled gracefully (perhaps throw a JS Exception?) rather than causing my node process to segfault.

Thoughts? I'm trying to investigate the issue locally right now.

Incorrect group offsets returned

To my knowledge, if an array is returned by PCRE.exec, it must be of even length. I'm encountering some weird behavior when dealing with nested groups. Test case:

PCRE.exec('((t))+', 't', 0) // returns [0, 1, 0, 0, 1], which has odd length

This seems like a bug. If not, what do the offsets in the array mean?
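The even-length expectation can be checked mechanically. The sketch below assumes only what the README states, namely that a successful result is a flat list of start/end offsets. `pairOffsets` is a hypothetical helper and not part of the module.

```javascript
// Hypothetical helper: groups a flat offsets array into [start, end] pairs
// and rejects arrays that cannot be paired up.
function pairOffsets(offsets) {
  if (offsets.length % 2 !== 0) {
    throw new RangeError('Odd number of offsets: ' + offsets.length);
  }
  var pairs = [];
  for (var i = 0; i < offsets.length; i += 2) {
    pairs.push([offsets[i], offsets[i + 1]]);
  }
  return pairs;
}

// A well-formed result (taken from the README example) pairs up cleanly.
console.log(JSON.stringify(pairOffsets([1, 2, 1, 2]))); // [[1,2],[1,2]]

// The array reported above has five entries and is rejected.
try {
  pairOffsets([0, 1, 0, 0, 1]);
} catch (err) {
  console.log(err.message); // Odd number of offsets: 5
}
```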

Constant names have too much PCRE in them

PCRE.PCRE.PCRE_CASELESS = 0x00000001; /* C1 */
PCRE.PCRE.PCRE_MULTILINE = 0x00000002; /* C1 */
...

Why PCRE.PCRE_* ? PCRE.* without the PCRE_ prefix would be just fine.
Regards.

Error on install

I am getting the following error while installing:


> node-gyp rebuild

  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_byte_order.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_chartables.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_compile.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_config.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_exec.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_fullinfo.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_get.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_globals.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_jit_compile.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_maketables.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_newline.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_ord2utf8.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_refcount.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_string_utils.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_study.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_tables.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_ucd.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_valid_utf8.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_version.o
  CC(target) Release/obj.target/libpcre/deps/libpcre/pcre_xclass.o
  LIBTOOL-STATIC Release/pcre.a
  CXX(target) Release/obj.target/pcre/src/binding.o
../src/binding.cc:138:21: error: expected class name
class PCRE : public ObjectWrap {
                    ^
../src/binding.cc:159:36: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> New(const Arguments& args) {
                                   ^~~~~~~~~
                                   v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:184:37: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Load(const Arguments& args) {
                                    ^~~~~~~~~
                                    v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:229:37: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Save(const Arguments& args) {
                                    ^~~~~~~~~
                                    v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:251:36: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Set(const Arguments& args) {
                                   ^~~~~~~~~
                                   v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:298:38: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Study(const Arguments& args) {
                                     ^~~~~~~~~
                                     v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:350:38: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Exec_(const Arguments& args, int what = WHAT_EXEC) {
                                     ^~~~~~~~~
                                     v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:647:37: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Exec(const Arguments& args) {
                                    ^~~~~~~~~
                                    v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:651:40: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> ExecAll(const Arguments& args) {
                                       ^~~~~~~~~
                                       v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:655:37: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Test(const Arguments& args) {
                                    ^~~~~~~~~
                                    v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:659:40: error: unknown type name 'Arguments'; did you mean 'v8::internal::Arguments'?
    static Handle<Value> Version(const Arguments& args) {
                                       ^~~~~~~~~
                                       v8::internal::Arguments
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: 'v8::internal::Arguments' declared here
class Arguments;
      ^
../src/binding.cc:160:19: error: calling a protected constructor of class 'v8::HandleScope'
      HandleScope scope;
                  ^
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:816:13: note: declared protected here
  V8_INLINE HandleScope() {}
            ^
../src/binding.cc:162:16: error: member access into incomplete type 'const v8::internal::Arguments'
      if (!args.IsConstructCall()) {
               ^
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: forward declaration of 'v8::internal::Arguments'
class Arguments;
      ^
../src/binding.cc:164:13: error: no member named 'New' in 'v8::String'; did you mean simply 'New'?
            String::New("Use `new` to create instances of this object."))
            ^~~~~~~~~~~
            New
../src/binding.cc:159:26: note: 'New' declared here
    static Handle<Value> New(const Arguments& args) {
                         ^
../src/binding.cc:164:25: error: reference to type 'const v8::internal::Arguments' could not bind to an lvalue of type 'const char [46]'
            String::New("Use `new` to create instances of this object."))
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/binding.cc:159:47: note: passing argument to parameter 'args' here
    static Handle<Value> New(const Arguments& args) {
                                              ^
../src/binding.cc:169:12: error: no member named 'Wrap' in 'PCRE'
      obj->Wrap(args.This());
      ~~~  ^
../src/binding.cc:169:21: error: member access into incomplete type 'const v8::internal::Arguments'
      obj->Wrap(args.This());
                    ^
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: forward declaration of 'v8::internal::Arguments'
class Arguments;
      ^
../src/binding.cc:171:15: error: member access into incomplete type 'const v8::internal::Arguments'
      if (args.Length() >= 1) {
              ^
/Users/username/.node-gyp/0.12.4/deps/v8/include/v8.h:127:7: note: forward declaration of 'v8::internal::Arguments'
class Arguments;
      ^
../src/binding.cc:172:37: error: type 'const v8::internal::Arguments' does not provide a subscript operator
        if (Buffer::HasInstance(args[0]))
                                ~~~~^~
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [Release/obj.target/pcre/src/binding.o] Error 1
gyp ERR! build error
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:269:23)
gyp ERR! stack     at ChildProcess.emit (events.js:110:17)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (child_process.js:1074:12)
gyp ERR! System Darwin 14.3.0
gyp ERR! command "node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/username/Sites/node-root/css-parser/node_modules/pcre
gyp ERR! node -v v0.12.4
gyp ERR! node-gyp -v v2.0.1
gyp ERR! not ok
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "pcre"
npm ERR! node v0.12.4
npm ERR! npm  v2.12.0
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script 'node-gyp rebuild'.
npm ERR! This is most likely a problem with the pcre package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node-gyp rebuild
npm ERR! You can get their info via:
npm ERR!     npm owner ls pcre
npm ERR! There is likely additional logging output above.

My $PATH:

echo $PATH;
/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/local/git/bin:/usr/bin

Provide API compliance with the native JS RegExp methods

Having PCRE is really awesome. But it's preferred that it would work as more of a drop-in replacement for the native RegExp, with the ability to use the full PCRE syntax.

The biggest issue with the current implementation is that it returns a series of offsets rather than returning the string as you might expect.

What are the chances that your PCRE module will behave more like the native JS RegExp.exec?

The page for RegExp on MDN

It seems like execAll is the same as the global flag.
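Until something like this exists in the module, the conversion can be done in user code. The following is a minimal sketch of an adapter that turns the documented offsets array into an array shaped like the result of RegExp.prototype.exec. `toRegExpLikeMatch` is a hypothetical name. The sketch assumes a string subject whose offsets index directly into the JavaScript string (as in the ASCII examples in the README) and assumes that unset groups are reported with negative offsets, which is PCRE's own convention but is not confirmed for this binding.

```javascript
// Hypothetical adapter: not part of the pcre module.
// Assumes a string subject and offsets that index directly into it.
function toRegExpLikeMatch(subject, offsets) {
  if (!Array.isArray(offsets)) {
    // null (no match) or an integer error code; a real wrapper would
    // probably want to throw on error codes instead of hiding them.
    return null;
  }
  var match = [];
  for (var i = 0; i + 1 < offsets.length; i += 2) {
    // Assumption: negative offsets mark a group that did not participate.
    match.push(offsets[i] < 0 ? undefined
                              : subject.slice(offsets[i], offsets[i + 1]));
  }
  match.index = offsets[0];
  match.input = subject;
  if (offsets.named) {
    match.groups = {};
    Object.keys(offsets.named).forEach(function (name) {
      var pair = offsets.named[name];
      match.groups[name] = subject.slice(pair[0], pair[1]);
    });
  }
  return match;
}

// Sample value copied from the README output for
// PCRE.exec("(?<nodejsrules>o)", "foo", 0).
var offsets = [1, 2, 1, 2];
offsets.named = { nodejsrules: [1, 2] };

var m = toRegExpLikeMatch('foo', offsets);
console.log(m[0], m[1], m.index, m.groups.nodejsrules); // o o 1 o
```

Mapping the same function over each entry of an execAll() result would give roughly what a loop over a native regexp with the global flag produces.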

Use libuv

Can you add support for using the libuv thread pool so these regexes can run in parallel? We don't want parsing a 20MB file to happen in the main thread.

error when install

I'm getting the node-gyp rebuild error:

LIBTOOL-STATIC Release/pcre.a
libtool: unrecognized option `-static'
libtool: Try `libtool --help' for more information.
make: *** [Release/pcre.a] Error 1
gyp ERR! build error
gyp ERR! stack Error: make failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:267:23)
gyp ERR! stack at ChildProcess.EventEmitter.emit (events.js:98:17)
gyp ERR! stack at Process.ChildProcess._handle.onexit (child_process.js:797:12)
gyp ERR! System Darwin 13.3.0
gyp ERR! command "node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/xiaofanyang/workspace/booking/node_modules/pcre
gyp ERR! node -v v0.10.26
gyp ERR! node-gyp -v v0.12.2
gyp ERR! not ok
npm ERR! [email protected] install: node-gyp rebuild
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is most likely a problem with the pcre package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node-gyp rebuild
npm ERR! You can get their info via:
npm ERR! npm owner ls pcre
npm ERR! There is likely additional logging output above.

npm ERR! System Darwin 13.3.0
npm ERR! command "node" "/usr/local/bin/npm" "install" "pcre"
npm ERR! cwd /Users/xiaofanyang/workspace/booking
npm ERR! node -v v0.10.26
npm ERR! npm -v 1.4.3
npm ERR! code ELIFECYCLE
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR! /Users/xiaofanyang/workspace/booking/npm-debug.log
npm ERR! not ok code 0

what can I do to fix this?

Weird behavior of null character

Please see the following test cases for some weird behavior involving null characters. I haven't looked through your code yet, but I suspect it has something to do with how the strings are passed to the pcre library (null-terminated).

PCRE.exec('\\0', '\0', 0); // returns null, should return [0, 1]
PCRE.exec('a\\0a', 'a\0a', 0); // returns null, should return [0, 3]
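To illustrate the suspicion, the sketch below simulates in plain JavaScript what a C routine that relies on a terminating NUL would see when handed these subjects. `asCString` is a hypothetical helper written for this illustration. It is not code from the binding, and whether the binding actually measures subjects this way is an assumption.

```javascript
// Simulates a strlen-style view of a JavaScript string: everything from the
// first NUL character onward is invisible to the consumer.
function asCString(str) {
  var end = str.indexOf('\0');
  return end === -1 ? str : str.slice(0, end);
}

console.log('a\0a'.length);                     // 3
console.log(JSON.stringify(asCString('a\0a'))); // "a"
console.log(asCString('\0').length);            // 0
```

Under that assumption the first subject would be seen as empty and the second as just "a", so neither pattern could match, which is consistent with both calls returning null.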

segfault with execAll

doing

var re = new Pcre( '.' );
re.execAll( "foobar" );

gives me

node(1636,0x7fff7375e180) malloc: *** error for object 0x100000000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

on NodeJS 0.10.32 and 0.10.33
