pire's People

Contributors

agryosha, amdmi3, davenger, dprokoptsev, gotthit, grphil, jakovenko-dm, karina-usmanova, kv75, moskupols, orivej, pg83, pilot7747, sergey-v-galtsev, sorc1, starius

pire's Issues

Compilation error on Windows

How to build it on Windows? It always fails with the following error:
pire\re_lexer.cpp(29): fatal error C1083: Cannot open include file: 're_parser.h': No such file or directory

Which regexp syntax is used?

Hello, I am considering replacing the RE2 library with PIRE in a high-load application that works with a stream. I am stuck on the regexp syntax; could you clarify which one I need to use?

Encoding issues

I have a very simple regexp:

std::string r;
r  = "(post|get|put|delete).*http/1\\.(1|0)\r\n.*\r\n\r\n";

The test data is:

GET / HTTP/1.1
User-Agent: chrome
Host: ya.ru
Accept: */*
Proxy-Connection: Keep-Alive

The data is properly formatted, i.e. it contains \r\n after each line and an extra \r\n after the headers. The scanner is created with the following function:

Pire::Scanner flow::scannerFor( std::string regexp ){
    if ( !regexp.size() ) {
        throw common::error( _ERR_EMPTY_REGEXP );
    }

    std::vector<Pire::wchar32> pattern;
    Pire::Scanner s;

    try {
        Pire::Encodings::Utf8().FromLocal( regexp.c_str(), regexp.c_str() + regexp.size(), std::back_inserter(pattern) );

        s = Pire::Lexer( pattern.begin(), pattern.end() )
            .SetEncoding( Pire::Encodings::Utf8() )
            .AddFeature( Pire::Features::CaseInsensitive() )
            .Parse()
            .Surround()
            .Compile<Pire::Scanner>();

    } catch ( ... ) {
        throw common::error( _ERR_REGEXP_COMPILE
            .arg( regexp ) 
        );
    }

    return s;
}

When I run the scanner compiled from this regexp against the test data, it fails to match. If I comment out the line

.SetEncoding( Pire::Encodings::Utf8() )

in the scanner's creation function, the scanner starts to match.
Could you explain this behaviour?

Compile Error

yaoweibin@ubuntu:~/test/pire$ uname -a
Linux ubuntu 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux
yaoweibin@ubuntu:~/test/pire$ g++ -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.3.3-5ubuntu4' --with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)
yaoweibin@ubuntu:~/test/pire$ make
make all-recursive
make[1]: Entering directory `/home/yaoweibin/test/pire'
Making all in pire
make[2]: Entering directory `/home/yaoweibin/test/pire/pire'
make all-am
make[3]: Entering directory `/home/yaoweibin/test/pire/pire'
/bin/bash ../ylwrap inline.lpp .c inline.cpp -- :
make[3]: *** [inline.cpp] Error 1
make[3]: Leaving directory `/home/yaoweibin/test/pire/pire'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/yaoweibin/test/pire/pire'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/yaoweibin/test/pire'
make: *** [all] Error 2

add LettersCount() to SlowScanner

Other scanner types have this method, and SlowScanner can implement it easily. I am writing template code parameterized by scanner type, and it needs LettersCount().
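To illustrate why the missing method matters for generic code, here is a minimal sketch. The two mock types and the helper TransitionTableCells are hypothetical stand-ins invented for this example; they are not Pire's classes and only mimic the presence or absence of LettersCount().

```cpp
#include <cstddef>

// Hypothetical stand-in for a scanner type that exposes LettersCount().
// This is NOT Pire's class; the value 4 is arbitrary.
struct FastScannerMock {
    constexpr std::size_t LettersCount() const { return 4; }
};

// Hypothetical stand-in for a scanner type without LettersCount(),
// mirroring the gap described in this issue.
struct SlowScannerMock {
    constexpr std::size_t Size() const { return 10; }
};

// Generic helper that compiles only for scanner types
// providing a LettersCount() member function.
template <class TScanner>
constexpr std::size_t TransitionTableCells(const TScanner& scanner,
                                           std::size_t states) {
    return states * scanner.LettersCount();
}
```

Calling TransitionTableCells(FastScannerMock{}, 3) compiles, while instantiating the same template with SlowScannerMock is a compile error, so any template written against the common scanner interface cannot accept such a type.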

fix build instructions for "On *nix, from the tarball"

GitHub now has Downloads and Releases as two separate entities. There are no links to Downloads from the main page of the repo. "Releases" are currently simple snapshots of tags. In particular, files in releases are not autoreconf'ed, so they lack the "configure" file. Downloads are outdated.

The instructions say:

On *nix, from the tarball:

    $ ./configure && make && sudo make install

Obviously, these instructions do not work for a tarball from Releases.

How to solve:

  • change INSTALL to add an autoreconf step, or
  • make a complete release on GitHub, which allows uploading a tarball (with autoreconf'ed files)

Make pire_inline capable of cross-compiling regexps

The scanner representation is not portable between systems with different byte order and, potentially, word size; hence once a regexp is compiled and inlined, the resulting .cpp must be compiled for the same system on which pire_inline was run. As a result, no cross-compiling is possible.

We need to invent a way to specify target platform capabilities and serialize the regexp for that platform.
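As a rough illustration of what byte-order-independent serialization could look like, the sketch below writes a 32-bit value in a fixed little-endian layout. The function names EncodeU32LE and DecodeU32LE are made up for this example, and nothing here reflects how Pire actually stores scanners.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value as four bytes in little-endian order.
// The shifts operate on the value, not on its in-memory representation,
// so the output is identical on big-endian and little-endian hosts.
constexpr std::array<std::uint8_t, 4> EncodeU32LE(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v & 0xFF),
            static_cast<std::uint8_t>((v >> 8) & 0xFF),
            static_cast<std::uint8_t>((v >> 16) & 0xFF),
            static_cast<std::uint8_t>((v >> 24) & 0xFF)};
}

// Decode four little-endian bytes back into a 32-bit value.
constexpr std::uint32_t DecodeU32LE(const std::array<std::uint8_t, 4>& b) {
    return static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}
```

Using fixed-width types such as std::uint32_t also sidesteps the word-size difference, at the cost of choosing one field width for every target.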

Make a new release

Hi!

Thank you for such a cool library! I'm interested in packaging the library for some dependency managers. It would be much easier if you made a release on GitHub, so that a package recipe could rely on a "stable" version instead of a specific commit. For example, it is much easier to create a package for Conan from a release.

I see that there have been a lot of changes since the last release (in 2013, 7 years ago). It would be great if they were released.

Thank you!

pattern without ^ and $ matches nothing unless Surrounded

  1. There are two functions called Matches.
  • from pire/run.h:
bool Matches(const Scanner& scanner, const char* begin, const char* end)
{
        return Runner(scanner).Run(begin, end);
}
  • from README:
bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len)
{
        return Pire::Runner(scanner)
                .Begin()        // '^'
                .Run(ptr, len)  // the text 
                .End();         // '$'
                // implicitly cast to bool
}

Which one is correct?

If Begin() and End() are to be called, then patterns without ^ and $ match nothing:

Graph for pattern 'abc'

When Begin() is called, it feeds the scanner a special begin character, moving it to dead state 1.

Compare this graph with the graph produced for same pattern surrounded and optimised:

Graph for pattern 'abc' surrounded and optimised

Does this mean that all patterns must begin with ^ and end with $? Are Begin() and End() calls required? It should be clarified and documented.

  2. pigrep

The pigrep program behaves like the latter Matches, calling Begin() and End(). It also surrounds its patterns. I removed the surrounding (by the way, it would be a useful option; grep has it as -x, --line-regexp) and got the following results:

$ echo -n 'abc' | pigrep 'abc'
$ echo -n 'abc' | pigrep '^abc'
$ echo -n 'abc' | pigrep 'abc$'
$ echo -n 'abc' | pigrep '^abc$'
abc

Summary of problems here:

  • pattern without ^ and $ matches nothing unless Surrounded
  • two different functions called Matches
  • provide pigrep option equivalent to grep --line-regexp
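The effect of the begin and end marks described in this issue can be reproduced with a toy model. The sketch below is not Pire code: the symbol values kBeginMark and kEndMark and the functions WithMarks, MatchesExact and MatchesSurrounded are invented for illustration, and the "surrounded" matcher assumes that the implicit leading and trailing wildcard absorbs the marks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy alphabet: bytes 0..255 plus two special marks (illustrative values).
constexpr int kBeginMark = 256;
constexpr int kEndMark = 257;

// Wrap text the way a Begin() / Run() / End() sequence would feed it.
std::vector<int> WithMarks(const std::string& text) {
    std::vector<int> out;
    out.push_back(kBeginMark);
    for (unsigned char c : text) out.push_back(c);
    out.push_back(kEndMark);
    return out;
}

// Unsurrounded literal pattern: each symbol must equal the next pattern
// character, so a leading begin mark sends the matcher to a dead state.
bool MatchesExact(const std::string& pattern, const std::vector<int>& input) {
    std::size_t state = 0;
    for (int sym : input) {
        assert(sym >= 0 && sym <= kEndMark);  // symbol must be in the toy alphabet
        if (state >= pattern.size() ||
            sym != static_cast<unsigned char>(pattern[state])) {
            return false;  // dead state: no transition leads back
        }
        ++state;
    }
    return state == pattern.size();
}

// Surrounded literal pattern (modelled as .*pattern.*): symbols outside
// the match, including the marks, are skipped.
bool MatchesSurrounded(const std::string& pattern, const std::vector<int>& input) {
    if (pattern.empty()) return true;
    for (std::size_t start = 0; start + pattern.size() <= input.size(); ++start) {
        std::size_t i = 0;
        while (i < pattern.size() &&
               input[start + i] == static_cast<unsigned char>(pattern[i])) {
            ++i;
        }
        if (i == pattern.size()) return true;
    }
    return false;
}
```

In this model MatchesExact("abc", WithMarks("abc")) is false because the first symbol is the begin mark, while MatchesSurrounded("abc", WithMarks("abc")) is true, which mirrors the difference between the two graphs discussed above.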

Conan package

Hello,
Do you know about Conan?
Conan is a modern dependency manager for C++. It would be great if your library were available to other developers via a package manager.

Here you can find an example of how to create a package for the library.

If you have any questions, just ask :-)

can't build from the package produced by `make distcheck` because of missing file re_parser.y

How to reproduce:

git clone https://github.com/yandex/pire
cd pire
autoreconf --install
./configure
make all check
make distcheck
tar -xf pire-0.0.5.tar.gz
cd pire-0.0.5/
./configure
make

Error:

$ make
make  all-recursive
make[1]: Entering directory `.../pire/pire-0.0.5'
Making all in pire
make[2]: Entering directory `.../pire/pire-0.0.5/pire'
make[2]: *** No rule to make target `re_parser.y', needed by `re_parser.cpp'.  Stop.
make[2]: Leaving directory `.../pire/pire-0.0.5/pire'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `.../pire/pire-0.0.5'
make: *** [all] Error 2

I have flex and bison installed.
OS: Ubuntu 14.04.2 LTS

Lexer.Parse called multiple times results in errors

#include <iostream>

#include <pire/pire.h>

int main() {
    Pire::Lexer lexer("abc");
    Pire::Scanner s1 = lexer.Parse().Compile<Pire::Scanner>();
    Pire::Scanner s2 = lexer.Parse().Compile<Pire::Scanner>();
    const char* text = "abc";
    std::cout << "abc "
              << Pire::Matches(s1, text, text + 3) << ' '
              << Pire::Matches(s2, text, text + 3) << std::endl;
    std::cout << "ab  "
              << Pire::Matches(s1, text, text + 2) << ' '
              << Pire::Matches(s2, text, text + 2) << std::endl;
    std::cout << "    "
              << Pire::Matches(s1, text, text + 0) << ' '
              << Pire::Matches(s2, text, text + 0) << std::endl;
}

Output:

abc 1 0
ab  0 0
    0 1

Expected output:

abc 1 1
ab  0 0
    0 0

typo in docs

pire/fsm.h:
/// Creates an FSM which matches any suffix of any word current FSM matches.
void MakePrefix();
/// Creates an FSM which matches any suffix of any word current FSM matches.
void MakeSuffix();
