skvadrik / re2c Goto Github PK

Lexer generator for C, C++, Go and Rust.

License: Other

Makefile 0.05% C 86.35% C++ 11.11% Shell 0.04% Haskell 0.08% M4 0.01% Roff 0.82% Reason 0.01% CMake 0.09% Go 0.01% Ragel 0.11% Python 0.06% PHP 0.01% Java 1.19% JavaScript 0.01% Rust 0.03% TeX 0.03% Starlark 0.01%


re2c's Issues

It fixes some of the annoying warnings in the generated code

It fixes some of the annoying warnings in the generated
code. I picked up the patch from
http://ufo2000.lxnt.info/pmwiki/index.php/Re2c/Patches

Original comment by: nuffer

re2c scanner has buffering bug.

The scanner generated for the following expression has
faulty buffering (v0.9.3):

/*!re2c

\+ “cc” { return 1; }
[\000-\377] { return 0; }

The scanner is missing a call to YYFILL() when it gets to
the “cc” part of the match. I was using re2c to generate
a scanner to run on a microcontroller. My buffer is only
16 bytes so I ran into this bug pretty quickly.

I will attach a test program that illustrates the bug.

Kevin Collins
[email protected]

Original comment by: *anonymous
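
The report above depends on YYFILL being called whenever fewer than n characters remain. As a minimal sketch of what such a refill might look like for a 16-byte buffer (the SmallBuffer type, its members and the in-memory "source" string are all hypothetical, not part of re2c):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-size scanner buffer; every name here is illustrative.
struct SmallBuffer {
    static constexpr std::size_t kSize = 16;
    char data[kSize];
    char *token;           // start of the token being scanned (must survive a refill)
    char *cursor;          // next character to read
    char *limit;           // one past the last valid character
    std::string source;    // stands in for the real input stream
    std::size_t consumed;  // how much of source has been copied in so far

    explicit SmallBuffer(const std::string &src)
        : token(data), cursor(data), limit(data), source(src), consumed(0) {}

    // Try to make at least n characters available at cursor.
    // Returns false if the input is exhausted or the token leaves no room.
    bool fill(std::size_t n) {
        // Shift the live part of the buffer (from token onward) to the front.
        std::size_t shift = static_cast<std::size_t>(token - data);
        std::size_t live = static_cast<std::size_t>(limit - token);
        std::memmove(data, token, live);
        token -= shift;
        cursor -= shift;
        limit -= shift;
        // Top up with as much input as fits.
        std::size_t room = kSize - live;
        std::size_t avail = source.size() - consumed;
        std::size_t take = room < avail ? room : avail;
        std::memcpy(limit, source.data() + consumed, take);
        consumed += take;
        limit += take;
        return static_cast<std::size_t>(limit - cursor) >= n;
    }
};
```

If the generated scanner skips the check before reading the "cc" part, a buffer this small runs out before fill() is ever asked for more, which matches the symptom described.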

expose YYCTXMARKER

In my initial patch I missed the case where a YYFILL
implementation shifts an accumulated token and a partially
accumulated trailing context within the buffer. To
allow YYFILL to handle such “shifts” properly we need to
expose YYCTXMARKER in the same way we have always
exposed YYMARKER.

I attached patches for re2c and htdocs, and a new ctx.re.

To test that everything still works I reran the ctx.re
(obviously) and c.re tests, the latter to ensure that
existing specs that do not use trailing context won’t
be affected even though they do not define YYCTXMARKER.

Original comment by: alder

negated char classes

Hi,

I think one big addition would be negated char classes,
whose syntax would be like Perl or PHP:
[^abc]

With this, you could build an easier comment parser,
like:
Ccomments = "/*" ([^*] | "*" [^/])* "*/";
CPPcomments = "//" [^\r\n]*

A dot-all match would also be useful:
anychar = .
instead of the char class you use in the examples:
anychar = [\000-\377]

Both these two additions should be easy to add, and
should also have great performance.

Regards,
Nuno Lopes

Original comment by: *anonymous
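
One way to see why the request above should be cheap: over an 8-bit alphabet a negated class is just the complement of the listed members. A minimal sketch, assuming a 256-entry bitset representation (CharClass, make_class and negate_class are hypothetical names, not re2c internals):

```cpp
#include <bitset>
#include <string>

// A character class over an 8-bit alphabet, one bit per character value.
typedef std::bitset<256> CharClass;

// Build the class containing exactly the characters in 'members'.
CharClass make_class(const std::string &members) {
    CharClass cls;
    for (unsigned char c : members) cls.set(c);
    return cls;
}

// [^...] is the complement of [...] within the 256 possible values.
CharClass negate_class(const CharClass &cls) {
    return ~cls;
}
```

With this representation, [^abc] is negate_class(make_class("abc")), and the dot-all class is simply a bitset with every bit set.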

underestimation of n in YYFILL(n)

The scanner below fails to match “0.eL” if YYFILL reads
exactly n characters (not more). Same (or similar) problem
with test/cnokw.re (if modified so that YYFILL reads
exactly n characters), with input “0.e+1L”.

I believe it is due to an underestimation of ‘n’ at
state yy00, probably caused by a change that made
maxDist() store its result with ‘t->depth = maxDist(t)’
to avoid recalculation. This seems to conflict with
void calcDepth(State *head), which marks non-key states
(with s->link = NULL) and calls maxDist in the same loop.
Moving the marking of non-key states to
another loop before doing the calculations appears to help:

E.g.:
// mark non-key states by s->link = NULL
for (dfa_state_t* s = head; s; s = s->next) {
if ((s != head) &&
!SCC::state_is_in_non_trivial_SCC(s)) {
s->link = NULL;
}
}

// calculate max number of transitions before
// guaranteed to reach a key state.
for (dfa_state_t* s = head; s; s = s->dfa_state_next()) {
SCC::maxDist(s);
}

Affected versions:

0.9.1-6 from ubuntu hoary: no
0.9.9 yes
0.9.10 yes

int scan(Scanner *s){
uchar *cursor = s->cur;
std:
s->tok = cursor;

/*!re2c
any = [\000-\377];

/*!re2c

("0"* "." "e"? "L"?) |
("0"+ "." "e"? "L"?)
{ RET; }

"\n"
{
if(cursor == s->eof) RET;
s->pos = cursor; s->line++;
goto std;
}

any
{
RET;
}
*/

}
-——————————————————————————

Original comment by: *anonymous

Addition to man on flag -f

patch against
$Id: re2c.1.in,v 1.25 2005/11/11 07:39:53 helly Exp $

Notes:

a) the discussion under (2.) in the patch is probably too
long. May need to be cut back a little.

b) I am not sure that I discovered all the needed
changes. The “The -f option inhibits declaration of
yych and yyaccept.” part surprised me. It is
probably worth documenting.

Original comment by: antalk

variable length trailing context is included in tokens

Example:

/*!re2c

("a"|"b")/[1] { return KEYWORD; }
("a"|"b")/[0-9]+ { return KEYWORD; }
[0-9]+ { return NUMBER; }

Scanning over “a77 a1 b8 b1” one would expect the
following:
1: KEYWORD = “a”
2: NUMBER = “77”
3: KEYWORD = “a”
4: NUMBER = “1”
5: KEYWORD = “b”
6: NUMBER = “8”
7: KEYWORD = “b”
8: NUMBER = “1”

Instead re2c scanner returns:
1: KEYWORD = “a77” (!)
2: KEYWORD = “a”
3: NUMBER = “1”
4: KEYWORD = “b8” (!)
5: KEYWORD = “b”
6: NUMBER = “1”

Full test case is attached.

Original comment by: alder
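
The expected behaviour above amounts to remembering where the trailing context starts and moving the cursor back there once the rule has matched. A hand-written sketch of that idea for the rule ("a"|"b")/[0-9]+ follows; match_keyword and its index-based cursor are hypothetical and only illustrate the technique, not re2c's generated code:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Match ("a"|"b") followed by one or more digits of trailing context.
// Returns the token length (context excluded), or 0 if the rule does not match.
std::size_t match_keyword(const std::string &input, std::size_t pos) {
    std::size_t cursor = pos;
    if (cursor >= input.size() ||
        (input[cursor] != 'a' && input[cursor] != 'b'))
        return 0;
    ++cursor;
    std::size_t ctxmarker = cursor;  // the trailing context begins here
    std::size_t digits = 0;
    while (cursor < input.size() &&
           std::isdigit(static_cast<unsigned char>(input[cursor]))) {
        ++cursor;
        ++digits;
    }
    if (digits == 0) return 0;       // context is required for a match
    cursor = ctxmarker;              // hand the context back to the input
    return cursor - pos;
}
```

For the input “a77 a1 b8 b1”, a call at position 0 yields a token of length 1, leaving “77” to be scanned as a NUMBER.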

minor cosmetic problem

In my opinion, the version should be maintained for
output, too.

Using re2c-0.9.1 (extracted from the Windows installer
package, which is really nice :-), the code it generates
begins with
/* Generated by re2c 0.5 on Tue May 25 15:15:57 2004
*/
This sort of baffled me, because I was using re2c 0.9.1,
not re2c 0.5.
I think the guilty line is
o << "/* Generated by re2c 0.5 on ";
line 134 in parser.y

Might be useful to maintain this for all future releases.

Original comment by: *anonymous

homepage should tell what the project does

The homepage could use a blurb saying what re2c is, like:

re2c is a great tool for writing fast and flexible lexers. Unlike
other such tools, re2c concentrates solely on generating efficient
code for matching regular expressions. Not only does this singleness
make re2c more suitable for a wider variety of applications, it
allows us to generate scanners which approach hand-crafted ones in
terms of size and speed.

(from the debian package)

Original comment by: *anonymous

Fix warnings from MSVC.NET

fix warnings.

Original comment by: nuffer

savable state support for multiple re2c blocks

With the -f flag specified, every /*!re2c */ block starts with the resume
switch statement, and a yyNext: label. However, only the last one is
correct; the other ones (as might be expected) only contain
yyFillLabels up to that point. In addition, the multiple yyNext: labels
are inappropriate.

I took a look at the code, but I couldn’t see an obvious way to fix it
other than generating the yyNext: label only at the first block, and
the switch statement only at the end, perhaps triggered by
something like:

/*!re2c:emit_switch */

I think that’s a bit ugly, so I’m hoping that someone has a better idea.

A couple more comments about the feature:

1) It’s not clear to me why -f should suppress the declaration of
yyaccept and yych. I understand that the submitter of the patch had
a different idea of how these should be declared, but that may not
be general.

2) I’d really like to be able to hook into the yyFillLabel mechanism
myself. Not everything falls nicely into re2c tokenising (the case I’m
looking at here involves base-64 decoding.) I’d be quite content with
the following:

/*!fill:3 */ ==>

YYSETSTATE;
if (YYLIMIT - YYCURSOR < 3) YYFILL;
yyFillLabel<something>:

This could be achieved by adding the following snippet to
scanner.re (where the max:re2c patch went, although I think re2c:
match would have been a bit better)

"/*!fill:" [ \t]* [0-9]+ {
int n = atoi(tok + 8);
fill_index(out, n);
tok = pos = cursor;
ignore_eoc = true;
goto echo;
}

and modifying need() in code.cc:

static void need(std::ostream &o, uint n, bool & readCh)
{
fill_index(o, n);
o << "\tyych = *YYCURSOR;\n";
readCh = false;
++oline;
}

with the rest of it going into fill_index:

void fill_index(std::ostream &o, uint n)
{
uint fillIndex;
bool hasFillIndex = (0<=vFillIndexes);
if ( hasFillIndex == true )
{
fillIndex = vFillIndexes++;
o << "\tYYSETSTATE(" << fillIndex << ");\n";
++oline;
}

if (n == 1)
{
o << "\tif(YYLIMIT == YYCURSOR) YYFILL;\n";
++oline;
}
else
{
o << "\tif((YYLIMIT - YYCURSOR) < " << n << ") YYFILL(" <<
n << ");\n";
++oline;
}

if ( hasFillIndex == true )
{
o << "yyFillLabel" << fillIndex << ":\n";
++oline;
}
}

I haven’t actually tried that yet, and I’m probably missing some
details about #line numbering.

3) I don’t understand the point of YYGETSTATE() instead of just
YYGETSTATE. I guess tastes differ. But it could not be implemented
as a function unless you used a global variable, and that seems
unlikely in the context of control inversion.

4) On the other hand, while we’re on the subject, it would be really
handy for me to have re2c emit:

++YYCURSOR; SAVECURSOR;
and
RESTORECURSOR;

instead of
YYMARKER = ++YYCURSOR;
and
YYCURSOR = YYMARKER;

Perhaps that’s not useful to anyone else, and I know how to do it. In
the particular buffering environment I’m working inside, the saved
cursor state cannot easily be represented as a pointer. I notice that
re2c is capable of simply doing YYCURSOR -= k; for some constant
k; that wouldn’t work either but I haven’t run into it in the wild yet.
Maybe I’ve just been lucky.

[email protected]

Original comment by: ricilake

Make generated if comparisons work with characters above 127

On writing a lexer dealing with UTF-8 characters, I
came across a problem with generated if’s. Here is a
code snippet:

if(yych <= '\277'){
//blah blah
}

The above comparison always fails (at least on MSVC
2003) since '\277' is negative. I believe it’s
conformant to the C++ spec that the type of a character
literal is 'char'.

The patch changes code.cc so the above snippet becomes
if(yych <= L'\277'){
//blah blah
}

In fact, it might not be a bad idea to prefix L to
every character literal generated by RE2C.

Original comment by: limethief
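
The signedness problem described above can also be sidestepped by comparing byte values as unsigned. This is a different technique from the L prefix used in the patch; the helper name at_most is hypothetical and the sketch only illustrates the comparison itself:

```cpp
// Compare an input byte against an upper bound without depending on
// whether plain 'char' is signed on the target compiler.
inline bool at_most(char yych, unsigned char bound) {
    return static_cast<unsigned char>(yych) <= bound;
}
```

Here at_most(c, 0277) behaves the same whether char is signed or unsigned, because both sides are reduced to the range 0..255 before the comparison.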

Invalid code with -b option in Visual .NET

When compiling the last example from the re2c manual
generated with the -b option in Visual .NET 2003 ( with
option /TC – compile as C code) , I get several errors :

simple.c(222) : error C2065: ‘yybm’ : undeclared identifier
simple.c(222) : error C2109: subscript requires array
or pointer type
simple.c(248) : error C2109: subscript requires array
or pointer type
simple.c(440) : error C2109: subscript requires array
or pointer type
simple.c(570) : error C2109: subscript requires array
or pointer type
simple.c(586) : error C2109: subscript requires array
or pointer type
simple.c(638) : error C2109: subscript requires array
or pointer type

etc…

It seems that this is caused by the fact that the bit vector
array is declared outside the scanner’s main block in
the scan function.

Original comment by: noirotm

Windows support (takes only a few minutes)

Hi!
I have just started using re2c on windows and therefore
I needed to make a few changes to create a running
version. As I don’t know how to use CVS I would be glad
if you applied the following changes to the source:

- Use istream (ifstream or cin) instead of unix
open/read/close
- Rename the .cc files to .cpp (makes compiling on
windows easier, but that’s not necessary)

I appended “my version” of re2c, but I’m not sure
whether this is the newest version because I didn’t get
it via CVS but by the sourceforge download page.

Thanks a lot,
Jakob

Original comment by: *anonymous

do not generate goto next state

This is arguably redundant, as optimizers will remove
these gotos anyway, but the generated code becomes much
more readable (and maybe it’ll help the optimizer as well :-).

The attached patch removes gotos that were generated
from the current state to the next one, like in the
following example:

yy37:
++YYCURSOR;
if((yych = *YYCURSOR) == '=') goto yy96;
goto yy38;

yy38:
{ RET; }

Patched re2c generates instead:

yy37:
++YYCURSOR;
if((yych = *YYCURSOR) == '=') goto yy96;

{ RET; }

Original comment by: alder

CharSet initialization fix

The latest revision in CVS (rev. 1.32 as of this writing) does not
initialize CharSet completely: char sets that are not yet allocated
need their cardinality set to 0. As a result, re2c crashes.
Patch is attached.

Original comment by: alder

Invalid options prefixed with two dashes cause program crash

If invalid command line options are prefixed with two
dashes (i.e. --) the program aborts.

Example:
re2c --xyz

Original comment by: hattrick

automake build patch

I made a patch to get re2c built with automake. Maybe I’m not the
only one who has been longing for a familiar “configure” build
system in RE2C. Enjoy!

Patching instruction:

$ cd re2c (your CVS working directory)
$ patch -p1 < re2c-automake-patch.diff
$ chmod 755 autogen.sh cvsclean.sh
$ ./autogen.sh

Original comment by: moriyoshi

readsome with MSVC

The way MS implemented readsome makes it useless for
reading files (ifstream::open does not fill in the
buffer, and the first readsome sees that the buffer is
empty). Replacing readsome with a read/gcount pair makes it
work with both VC and g++.
Patch is attached.

Original comment by: alder

enable builds outside the source directory

Build outside the source directory fails.

First, there are changes needed to Makefile.am regarding
scanner.cc and parser.cc, the patchfile is what I used
now, but there may be more changes needed to get a
useful logic here.

Regards,
Gerrit

Original comment by: siebenschlaefer

symbol table reimplementation

use std::map instead of list

Note: for a small number of symbols and symbol references
it is slightly slower; at about 100/100 it becomes faster.

Original comment by: *anonymous
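
As an illustration of the std::map approach (not the attached patch itself), a symbol table keyed by name might look like the sketch below. Symbol, SymbolTable and find_or_add are hypothetical names chosen for the example:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Illustrative symbol record: a name plus a reference counter.
struct Symbol {
    std::string name;
    int refs;
};

class SymbolTable {
public:
    // Look the name up in O(log n); create the symbol on first use.
    Symbol &find_or_add(const std::string &name) {
        std::map<std::string, Symbol>::iterator it = table_.find(name);
        if (it == table_.end())
            it = table_.insert(std::make_pair(name, Symbol{name, 0})).first;
        ++it->second.refs;
        return it->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::map<std::string, Symbol> table_;
};
```

The tree lookup has a higher constant cost than walking a very short list, which is consistent with the note that the map only wins once the table grows.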

Add case insensitive string literals

Add case insensitive string literals (we can use single
quotes for them, as single quotes are not used by re2c
currently)

Original comment by: nuffer
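
One way to picture the feature requested above is as a rewrite of each letter into a two-character class. The sketch below shows that rewrite as a string transformation; expand_case_insensitive is a hypothetical helper, it assumes the default "C" locale, and it does not escape metacharacters:

```cpp
#include <cctype>
#include <string>

// Rewrite a literal so that each letter matches either case,
// e.g. "if" becomes "[iI][fF]". Non-letters are emitted as quoted characters.
std::string expand_case_insensitive(const std::string &literal) {
    std::string out;
    for (unsigned char c : literal) {
        if (std::isalpha(c)) {
            out += '[';
            out += static_cast<char>(std::tolower(c));
            out += static_cast<char>(std::toupper(c));
            out += ']';
        } else {
            out += '"';
            out += static_cast<char>(c);
            out += '"';
        }
    }
    return out;
}
```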

--output=output option does not work as documented

Using the --output=output option causes the program to
abort.

Original comment by: hattrick

Piece of code saving a backtracking point not generated

re2c generates wrong code that doesn’t initialize
YYMARKER when the backtrack point is the very first
character of the input. The attached patch fixes this
problem.

Use the following script to reproduce the problem.

--- test.re ---
/*!re2c
ALNUM = [0-9a-zA-Z];
ANY = [\000\377];

"?!" ALNUM* {
}

"?" ALNUM+ {
}

(ANY\"?")* {
}
*/
--- test.re ---

Before patching (original):
$ re2c test.re | grep "YYMARKER"
YYCURSOR = YYMARKER;

After patching:
$ re2c test.re | grep "YYMARKER"
YYMARKER = YYCURSOR + 1;
YYCURSOR = YYMARKER;

Original comment by: moriyoshi

incorrect code generated with -b option

it looks like the code generated for the following
fragment with ‘-b’ option is incorrect:

/*!re2c
any = [\001-\377];
'<a' { RET; }
[<][A-Za-z]+ { RET; }
[\000] { RET; }
any { goto cont; }
*/

I suspect that the cause of trouble is the code around
the ``yy7:'' label:

yy7: ++YYCURSOR;
if(yybm[0+(yych = *YYCURSOR)] & 128) yych =
*YYCURSOR;
goto yy9;
goto yy8;

here, ``goto yy8'' is unreachable (and ``yych =
*YYCURSOR'' is performed twice).

The full code generated by re2c is attached.
(the platform is Mac OS X, version 10.3, but I hope it
is not important).

Thanks!

#line 6 "HTML_Lexer.cpp"
{
YYCTYPE yych;
unsigned int yyaccept;
static unsigned char yybm[] = {
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 0, 0, 0, 0, 0,
0, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
};
goto yy0;
yy1: ++YYCURSOR;
yy0:
if((YYLIMIT - YYCURSOR) < 3) YYFILL;
yych = *YYCURSOR;
if(yych <= '\000') goto yy4;
if(yych != '<') goto yy6;
goto yy2;
yy2: ++YYCURSOR;
if((yych = *YYCURSOR) <= 'Z'){
if(yych <= '@') goto yy3;
if(yych <= 'A') goto yy7;
goto yy9;
} else {
if(yych <= '`') goto yy3;
if(yych <= 'a') goto yy7;
if(yych <= 'z') goto yy9;
goto yy3;
}
yy3:
#line 63 "HTML_Lexer.re2c"
{ fprintf(stderr,"."); goto tag; }
#line 66 "HTML_Lexer.cpp"
yy4: ++YYCURSOR;
goto yy5;
yy5:
#line 62 "HTML_Lexer.re2c"
{ RET; }
#line 72 "HTML_Lexer.cpp"
yy6: yych = *YYCURSOR;
goto yy3;
yy7: ++YYCURSOR;
if(yybm[0+(yych = *YYCURSOR)] & 128) yych =
*YYCURSOR;
goto yy9;
goto yy8;
yy8:
#line 60 "HTML_Lexer.re2c"
{ RET; }
#line 81 "HTML_Lexer.cpp"
yy9: ++YYCURSOR;
if(YYLIMIT == YYCURSOR) YYFILL;
yych = *YYCURSOR;
goto yy10;
yy10: if(yybm[0+yych] & 128) goto yy9;
goto yy11;
yy11:
#line 61 "HTML_Lexer.re2c"
{ RET; }
#line 92 "HTML_Lexer.cpp"
}
#line 64 "HTML_Lexer.re2c"

}

Original comment by: *anonymous

re2c hangs when processing valid re-file

// tested with win32 build of RE2C 0.9.4

#define YYCTYPE unsigned char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL

bool DetectBinHex(const char *text)
{
YYCTYPE *start = (YYCTYPE *)text;
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:
YYCTYPE *token = cursor;
/*!re2c
'(This file must be converted with BinHex 4.0)'
{ if (token == start || *(token - 1) == '\n')
return true; else goto next; }
[\001-\377]
{ goto next; }
[\000]
{ return false; }
*/
return false;
}

Original comment by: ssvb

Re: [ 1297658 ] underestimation of n in YYFILL(n)

> It seems you analyzed this in more detail, could you
> perhaps create a patch also?

This patch appears to work. I only ran the tests that come
with re2c-0.9.10, but

$ diff cnokw.c cnokw.temp
136c136
< if((YYLIMIT - YYCURSOR) < 4) YYFILL;
---
> if((YYLIMIT - YYCURSOR) < 5) YYFILL;

suggests it is OK. (Other tests show no difference).

Original comment by: antalk

re2c does not emit last line if '\n' missing

If the last line in the .re file does not terminate with \n,
it is not copied to the .c file.

Original comment by: *anonymous

Unicode patch for 0.9.7

We wanted to enable 16-bit character support to allow
construction of Unicode scanners.

We didn’t find a simple way to support the EBCDIC
stuff, so we removed it. Further, for some reason the
use of bit vectors failed in testing, so we disabled them as
well.

We added a -w option to enable the Unicode processing
so as to keep backward compatibility with 8-bit use (the
stuff mentioned above is still disabled though).

Thomas Rask Thomsen

Original comment by: thomas_rask

braced quantifiers: {\d+(,|,\d+)?} style

0.9.1 supports nongreedy * and + closure quantifiers.

This patch allows for Perl-style {\d+(,|,\d+)?}
closures. Rather than describe it myself, here is a clip
from the perlre man page:

{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times

Example re2c pattern:

"a" "b"{4,8} "c" { code here; }

As with the + and * quantifiers, it modifies a
“primary” grammar element.

Original comment by: yourjesus
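
The {n,m} forms above can be understood as shorthand for repeated and optional copies of the primary. The sketch below expands them textually; expand_repeat is a hypothetical helper, a negative max stands for "unbounded", and this is not how the patch itself is implemented:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Expand x{min,max} into plain concatenation:
//   min mandatory copies, then either x* (unbounded) or (max - min) optional copies.
std::string expand_repeat(const std::string &x, int min, int max) {
    std::vector<std::string> parts;
    for (int i = 0; i < min; ++i) parts.push_back(x);
    if (max < 0) {
        parts.push_back(x + "*");
    } else {
        for (int i = min; i < max; ++i) parts.push_back(x + "?");
    }
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i) out += ' ';
        out += parts[i];
    }
    return out;
}
```

Under this reading, x{0,} is the same as x*, x{1,} is x followed by x*, and x{0,1} is x?.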

Input buffer overrun.

Input buffer overrun. Scanner, generated by re2c reads
one byte after the end of input buffer.

This bug taken from
http://ufo2000.lxnt.info/pmwiki/index.php/Re2c/Bugs

test.re:

#include <stdio.h>
#include <string.h>

#define YYCTYPE char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL

int test(const char *text)
{
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:;
YYCTYPE *token = cursor;
/*!re2c
[\001-\377] { return 0; }
[\000] { return 1; }
*/
}

int main()
{
char *text = new char[1];
strcpy(text, "");
test(text);
}

test.cpp:

/* Generated by re2c 0.5 on Thu Nov 6 13:57:12 2003 */
#line 1 "test.re"
#include <stdio.h>
#include <string.h>

#define YYCTYPE char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL

int test(const char *text)
{
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:;
YYCTYPE *token = cursor;
{
YYCTYPE yych;
unsigned int yyaccept;
goto yy0;
yy1: ++YYCURSOR;
yy0:
if(YYLIMIT == YYCURSOR) YYFILL;
yych = *YYCURSOR;
if(yych <= '\000') goto yy4;
yy2: yych = *YYCURSOR;
yy3:
#line 17
{ return 0; }
yy4: yych = *++YYCURSOR;
yy5:
#line 18
{ return 1; }
}
#line 19

}

int main()
{
char *text = new char[1];
strcpy(text, "");
test(text);
}

valgrind.log:

==1781== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==1781== Copyright (C) 2002, and GNU GPL'd, by Julian Seward.
==1781== Using valgrind-1.9.6, a program instrumentation system for x86-linux.
==1781== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==1781== Estimated CPU clock rate is 962 MHz
==1781== For more details, rerun with: -v
==1781==
==1781== Invalid read of size 1
==1781== at 0x80484B0: test(char const*) (in /home/serge/test/test)
==1781== by 0x80484FE: main (in /home/serge/test/test)
==1781== by 0x403077A6: __libc_start_main (in /lib/libc-2.3.2.so)
==1781== by 0x80483D0: (within /home/serge/test/test)
==1781== Address 0x41116059 is 0 bytes after a block of size 1 alloc'd
==1781== at 0x40161948: __builtin_vec_new (in /usr/lib/valgrind/valgrind.so)
==1781== by 0x40161990: operator new[](unsigned) (in /usr/lib/valgrind/valgrind.so)
==1781== by 0x80484DD: main (in /home/serge/test/test)
==1781== by 0x403077A6: __libc_start_main (in /lib/libc-2.3.2.so)
==1781==
==1781== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1781== malloc/free: in use at exit: 1 bytes in 1 blocks.
==1781== malloc/free: 2 allocs, 1 frees, 3 bytes allocated.
==1781== For a detailed leak analysis, rerun with: --leak-check=yes
==1781== For counts of detected errors, rerun with: -v

Original comment by: nuffer

re2c generate some invalid #line on WIN32

Here is an excerpt of code generated on WIN32 with the
prebuilt 0.9.9 binary:

yy4:
#line 179 "e:\\prg\\vc\\cppparser4\\src\\lib_cpppreprocessor\\lexer.re"
{ RET; }
#line 295 "e:\prg\vc\cppparser4\src\lib_cpppreprocessor\\lexer.cpp"

Notice that on the second #line, the backslash '\'
directory separators are not escaped. This causes a
forest of warnings.

\ Baptiste.

Original comment by: blep
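
The fix this report implies is to escape the file name before writing it into a #line directive. A minimal sketch of such an escaping step (escape_for_line_directive is a hypothetical name, not the function used in re2c):

```cpp
#include <string>

// Escape a file name so it can be placed inside the double-quoted
// string of a #line directive: backslashes and quotes get a backslash prefix.
std::string escape_for_line_directive(const std::string &path) {
    std::string out;
    for (char c : path) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}
```

Applying the same function to both the input and the output file name would keep the two #line forms consistent.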

re2c can not be built on fedora-5

After downloading it (v. 0.10.1) and untarring, at
the end of ./configure && make I get:

make all-am
make[1]: Entering directory
`/var/spool/ecbuild/RPM/BUILD/re2c-0.10.1'
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT
code.o -MD -MP -MF ".deps/code.Tpo" -c -o code.o code.cc; \
then mv -f ".deps/code.Tpo" ".deps/code.Po"; else rm -f
".deps/code.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT dfa.o
-MD -MP -MF ".deps/dfa.Tpo" -c -o dfa.o dfa.cc; \
then mv -f ".deps/dfa.Tpo" ".deps/dfa.Po"; else rm -f
".deps/dfa.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT
main.o -MD -MP -MF ".deps/main.Tpo" -c -o main.o main.cc; \
then mv -f ".deps/main.Tpo" ".deps/main.Po"; else rm -f
".deps/main.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT
parser.o -MD -MP -MF ".deps/parser.Tpo" -c -o parser.o
parser.cc; \
then mv -f ".deps/parser.Tpo" ".deps/parser.Po"; else
rm -f ".deps/parser.Tpo"; exit 1; fi
parser.y: In function 'int yyparse()':
parser.y:87: error: 'mkAlt' was not declared in this scope
parser.y:114: error: 'mkAlt' was not declared in this scope
parser.y:138: error: 'mkAlt' was not declared in this scope
make[1]: *** [parser.o] Error 1
make[1]: Leaving directory
`/var/spool/ecbuild/RPM/BUILD/re2c-0.10.1'
make: *** [all] Error 2

Original comment by: sergey57

re2c creates an infinite loop

The following code produces an infinite loop.
(YYCURSOR is not incremented)

/*!re2c
[^\n\000]* '\n' { do_this(); }
[^\n\000]* '\000' { do_that(); }
*/

while this equivalent notation works:

/*!re2c
[^\n\000]* '\n' { do_this(); }
[^\n\000]+ '\000' { do_that(); }
'\000' { do_that(); }
*/

Original comment by: *anonymous

Fix compile on FreeBSD

unistd.h and ctype.h headers missing.

Original comment by: yourjesus

-e (EBCDIC cross compile) broken

test1.re --

/*!re2c
[a-z] { printf("Small letter"); }
[\000-\377] { printf( "anyting else" ); }
*/
--------
re2c -e test1.re

[a-z] matches some non-letters.
Cause: range boundaries translated to EBCDIC, then
interpreted as a range.

test2.re --

/*!re2c
. { printf("Anything but newline"); }
[\000-\377] { printf("newline"); }
*/
--------
re2c -e test2.re
Takes 0x0A to be newline, but this is not the case in
EBCDIC.
Cause: mkDot() uses ASCII \n where EBCDIC \n is expected.

test3a.re --

/*!re2c
'abcdefghijklmnopqrstuvwxyz' { printf("Small or
capital letter"); }
[\000-\377] { printf( "anyting else" ); }
*/
--------

test3b.re --

/*!re2c
'ABCDEFGHIJKLMNOPQRSTUVWXYZ' { printf("Small or
capital letter"); }
[\000-\377] { printf( "anyting else" ); }
*/
--------

re2c -e fails to generate case insensitivity.
Cause: (ASCII) toupper and tolower used on
EBCDIC encoded letters.

Original comment by: *anonymous
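
The first cause above (translating only the range boundaries) can be illustrated by translating every member of the class instead. The sketch assumes an ASCII host and the common EBCDIC layout in which a-i, j-r and s-z occupy 0x81-0x89, 0x91-0x99 and 0xA2-0xA9; the function names are hypothetical:

```cpp
#include <bitset>

// Map an ASCII lowercase letter to its EBCDIC code point.
// Assumes the host character set is ASCII (a cross-compile setting).
unsigned char ascii_lower_to_ebcdic(char c) {
    if (c >= 'a' && c <= 'i') return static_cast<unsigned char>(0x81 + (c - 'a'));
    if (c >= 'j' && c <= 'r') return static_cast<unsigned char>(0x91 + (c - 'j'));
    return static_cast<unsigned char>(0xA2 + (c - 's'));  // 's'..'z'
}

// Build the EBCDIC class for [a-z] by translating each letter on its own,
// rather than translating 'a' and 'z' and treating the result as one range.
std::bitset<256> ebcdic_lower_class() {
    std::bitset<256> cls;
    for (char c = 'a'; c <= 'z'; ++c) cls.set(ascii_lower_to_ebcdic(c));
    return cls;
}
```

Treating 0x81..0xA9 as a single range would cover 41 code points, so 15 non-letters would match; the per-character translation keeps exactly 26.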

Cleanup, Fix warnings, allow build with bison 1.875, makerpm

Just as a reminder here is my version that does:

- a little cleanup
- allow build with newer g++ versions and bison 1.875
(re2c should now build with all current and older g++
and yacc or bison versions)
- fix some warnings
- putting in cvs tags
- added a makerpm script
- added re2c --version

hope to get the changes integrated :-)

When you apply the patch you either need to do make
clean and remove the now missing files from cvs or do it
manually with these files:
README
parser.cc
parser.tab.h
re2c.1
scanner.cc
version.h
y.tab.h

(the patch turns those into temp files).

The patch is against today’s version (2004.01.27).

You may also want to verify your README changes to
the new file README.in (and look out for how the
version info is inserted there).

regards
marcus

Original comment by: helly

integrate storable state patch

The changeset can be found here:

http://www.mgix.com/re2c-0.9.5-salvable-state-unified-patch-v2.bz2

Original comment by: *anonymous


incorrect code generated when using -b

This bug was reported by Matt Sergeant (msergeant at
startechgroup.co.uk):

Anyway take a look at this. It produces the correct
code if run without the -b flag, but incorrect code
(i.e. it doesn’t match) if you run with -b.

(this will probably wrap, unfortunately I don’t have a
way to send it without wrapping right now)

#define NULL ((char*) 0)
#define YYCTYPE char
#define YYCURSOR p
#define YYLIMIT p
#define YYMARKER q
#define YYFILL
#define YYDEBUG debug

#include <stdio.h>

void debug(int state, char curr) {
printf("State: %d, Curr: %c\n", state, curr);
}

char *scan281(char *p){
char *q;
start:
/*!re2c
( "adsl" | "adslppp" | "bdsl" | "cdsl" |
"dslgw4pool" | "dslppp" | "edsl" | "fdsl" | "ldsl" |
"pool" | "pppdsl" | "premiumC" | "vdsl" | "xsttldsl"
)[0-9a-k\-]+ ".".+ ".uswest.net" {return
"dsl";}
( "dhcp" | "dorms" | "rh" ).* "-"[0-9]+ "-"[0-9]+
".".+ ".resnet.pitt.edu" {return "resnet";}
[0-9]+ "-"[0-9]+ "-"[0-9]+ "-"[0-9]+ "."( "dhcp" |
.* "modem" | "bothell" ).* ".washington.edu"
{return "edu";}
[\001-\377] { goto start; }
[\000] {return NULL; }
*/
}

int main(int argc, char **argv) {
char *v = "D-128-208-46-51.dhcp4.washington.edu";
printf("%s\n", scan281(v));
return 0;
}

Original comment by: nuffer

variable length trailing context

This patch adds to re2c the ability to handle trailing
contexts of variable length (fixes a known “bug” :-)

Test case is attached as well.

For “no trailing context” test case a C scanner was
generated. It matches line by line the expected “test/c.c”.

Original comment by: alder

reorganized sources

For some time I have been trying to modify re2c to my
needs. Although I did not finish that, for some time I
will not have time to continue. Basically, I
reorganized the sources (more modular and probably
easier to understand), added some comments (mostly for
doxygen) and implemented some changes that I hope
others might find useful, too. Description of changes
is in README.in,
together with some comments on the directories. Please
take a look at it.

Note on compiling: for production #define NDEBUG to
turn off assert()’s. Otherwise just ./autogen.sh;
./configure; make; make check; sudo make install;
should work.

The resulting program is named re3c-re2c; it is
compatible with re2c (at least to the extent exercised by
make check), although the code emitted is not
identical. (See also the -c flag.)

make doxy : runs doxygen (HTML output under doxy/)

Unfortunately cannot upload everything:
Error: Uploaded file must be >20 and <256000 bytes.

Removed doc/, test/, examples/ to accommodate.
(Kept test/trailing-var.* and examples/cmmap_re.c)

Original comment by: antalk

unused variable `yyaccept'

The generated code exhibits a (gcc) compiler warning.

unused variable `yyaccept’

Is this a hard thing to get rid of?

thanks,
Jason

Original comment by: *anonymous

Missing forward declaration

Getting the following error while trying to compile
re2c on Fedora Core 5.

parser.y: In function ‘int yyparse()’:
parser.y:87: error: ‘mkAlt’ was not declared in this
scope
parser.y:114: error: ‘mkAlt’ was not declared in this
scope
parser.y:138: error: ‘mkAlt’ was not declared in this
scope
make[1]: *** [parser.o] Error 1
make[1]: Leaving directory `/tmp/re2c-0.10.1'
make: *** [all] Error 2

Original comment by: czachary

label_list reimplementation

re2c-0.9.10 label_list implementation is slow.
Attachment includes a reimplementation (as patch).

Original comment by: *anonymous

cannot compile with gcc-3.3.1 on cygwin

Hi,

I’m the Cygwin gcc maintainer and wanted to build and
distribute re2c for Cygwin.

But I cannot compile the provided scanner.cc, getting
this error from g++:

g++ -o scanner.o -O2 -Wall -I. -Wno-unused -Wno-parentheses
-Wno-deprecated -c scanner.cc
scanner.re: In member function `int Scanner::echo
(std::ostream&)':
scanner.re:79: error: invalid conversion from `uchar*' to
`const char*'
scanner.re:79: error: initializing argument 1 of
`std::basic_ostream<_CharT,
_Traits>& std::basic_ostream<_CharT, _Traits>::write
(const _CharT*, int)
[with _CharT = char, _Traits = std::char_traits<char>]'
scanner.re:75: error: invalid conversion from `uchar*' to
`const char*'
scanner.re:75: error: initializing argument 1 of
`std::basic_ostream<_CharT,
_Traits>& std::basic_ostream<_CharT, _Traits>::write
(const _CharT*, int)
[with _CharT = char, _Traits = std::char_traits<char>]'
make: *** [scanner.o] Error 1

Any hints please,

Gerrit

Original comment by: siebenschlaefer

Computational Reuse Optimization

In the automata emitter, the maxDist method is called
iteratively and recursively on a DAG. No effort is
made to cache the computations.

This patch caches the distance computations for better
computational reuse.

On complicated sets of regular expression tokens re2c
compile times of 10+ hours can be dropped to several
seconds.

Original comment by: yourjesus
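
The caching idea described above is ordinary memoization of a longest-path computation on an acyclic graph. A minimal sketch follows; State, max_dist and the cache vector are hypothetical stand-ins, not the data structures used in the patch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative DFA state: just the indices of its successor states.
struct State {
    std::vector<std::size_t> next;
};

// Longest number of transitions from state s to a state with no successors.
// cache[s] < 0 means "not computed yet"; the graph is assumed to be acyclic.
int max_dist(const std::vector<State> &dfa, std::size_t s, std::vector<int> &cache) {
    if (cache[s] >= 0) return cache[s];  // reuse an earlier result
    int best = 0;
    for (std::size_t t : dfa[s].next)
        best = std::max(best, 1 + max_dist(dfa, t, cache));
    cache[s] = best;
    return best;
}
```

Without the cache, shared sub-paths are recomputed for every route that reaches them, which can grow exponentially; with it, each state is evaluated once.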

Re2c fails to generate valid code

The following re2c code does not get processed correctly:

/*!re2c
(["] ([\000-\255]\["])+ ["]) { RET; }
*/

The generated result includes:
case 0x9F:
case 0xA7:
case 0xA8:
case 0xA9:
case 0xA::
case 0xA;:
case 0xA<:
case 0xA=:
case 0xA>:
case 0xA?:
case 0xA@:
case 0xAA:
case 0xAB:

Needless to say, the compiler isn’t too happy about this.

—Aaron ([email protected])

Original comment by: *anonymous

Avoid rebuilds of re2c when running subtargets

$ diff -u Makefile.am~ Makefile.am
--- Makefile.am~ 2005-10-27 15:18:12.325708800 +0200
+++ Makefile.am 2005-10-27 15:18:23.992484800 +0200
@@ -12,7 +12,7 @@
 #CXXFLAGS = -O2 -Wall -I. -Wno-unused -Wno-parentheses -Wno-deprecated
 YFLAGS = -d

-RE2C = re2c
+RE2C = re2c$(EXEEXT)
 RE2CFLAGS = -s

 CLEANFILES = parser.cc y.tab.c y.tab.h parser.cc re2c.1 .version

Original comment by: siebenschlaefer

re2c cannot accept {0,}

It actually crashes if this form is ever used.

I attached a patch and a test case for this bug.

With this branch working it might be possible to
express one of the “factor” alternatives in “parser.y” as:

| primary close
{
switch($2){
case '*':
$$ = new CloseVOp($1, 0, -1);
break;
case '+':
$$ = new CloseVOp($1, 1, -1);
break;
case '?':
$$ = new CloseVOp($1, 0, 1);
break;
}
}

IMHO, this might be easier to read.

Original comment by: alder
