skvadrik / re2c
Lexer generator for C, C++, Go and Rust.
Home Page: https://re2c.org
License: Other
It fixes some of the annoying warnings in the generated
code. I picked up the patch from
http://ufo2000.lxnt.info/pmwiki/index.php/Re2c/Patches
Original comment by: nuffer
The scanner generated for the following expression has
faulty buffering (v0.9.3):
/*!re2c
\+ "cc" { return 1; }
[\000-\377] { return 0; }
*/
The scanner is missing a call to YYFILL() when it gets to
the “cc” part of the match. I was using re2c to generate
a scanner to run on a microcontroller. My buffer is only
16 bytes so I ran into this bug pretty quickly.
I will attach a test program that illustrates the bug.
Kevin Collins
Original comment by: *anonymous
In my initial patch I missed the case where a YYFILL
implementation shifts an accumulated token and a partially
accumulated trailing context within the buffer. To
allow YYFILL to handle such “shifts” properly we need to
expose YYCTXMARKER in the same way we have always
exposed YYMARKER.
I attached patches for re2c and htdocs, and a new ctx.re.
To test that everything still works I reran the ctx.re
(obviously) and c.re tests. The latter to ensure that
existing specs that do not use trailing context won’t
be affected even though they do not define YYCTXMARKER.
Original comment by: alder
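A minimal sketch of the kind of shift described above, not re2c's own code. The Input struct, the refill function and the in-memory src string are assumptions made for illustration; the pointer fields simply mirror YYCURSOR, YYMARKER, YYCTXMARKER and YYLIMIT. The point is that every saved pointer, the context marker included, has to move by the same offset when the buffer contents move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical scanner state; field names mirror the re2c macros.
struct Input {
    static const std::size_t SIZE = 16;  // small buffer, as on a microcontroller
    char buf[SIZE];
    char *tok;        // start of the token being accumulated
    char *cursor;     // YYCURSOR
    char *marker;     // YYMARKER
    char *ctxmarker;  // YYCTXMARKER
    char *limit;      // YYLIMIT
    std::string src;  // stand-in for the real data source (assumption)
    std::size_t src_pos;

    explicit Input(const std::string &s) : src(s), src_pos(0) {
        tok = cursor = marker = ctxmarker = limit = buf;
    }
};

// Move the live bytes (tok..limit) to the front of the buffer, adjust all
// saved pointers by the same amount, then read more data behind them.
// Returns false if fewer than `need` characters are available afterwards.
inline bool refill(Input &in, std::size_t need) {
    assert(in.tok >= in.buf && in.tok <= in.limit);
    const std::size_t shift = static_cast<std::size_t>(in.tok - in.buf);
    const std::size_t live = static_cast<std::size_t>(in.limit - in.tok);
    std::memmove(in.buf, in.tok, live);
    in.tok -= shift;
    in.cursor -= shift;
    in.marker -= shift;
    in.ctxmarker -= shift;  // forgetting this leaves it pointing at stale bytes
    in.limit -= shift;

    const std::size_t room = Input::SIZE - live;
    const std::size_t avail = in.src.size() - in.src_pos;
    const std::size_t n = room < avail ? room : avail;
    std::memcpy(in.limit, in.src.data() + in.src_pos, n);
    in.src_pos += n;
    in.limit += n;
    return static_cast<std::size_t>(in.limit - in.cursor) >= need;
}
```

A scanner would call refill from its YYFILL definition; with the 16-byte buffer above, a token plus its trailing context must still fit in 16 bytes.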
Hi,
I think one big addition would be negated char classes,
whose syntax would be like Perl or PHP:
[^abc]
With this, you could build an easier comment parser,
like:
Ccomments = "/*" ([^*] | "*" [^/])* "*/";
CPPcomments = "//" [^\r\n]*
A dot-all match would also be useful:
anychar = .
instead of the char class you use in the examples:
anychar = [\000-\377]
Both these two additions should be easy to add, and
should also have great performance.
Regards,
Nuno Lopes
Original comment by: *anonymous
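For what it is worth, negating a class such as [^abc] comes down to complementing a set of ranges within 0..255. The sketch below is only an illustration of that idea; Range and negate_class are hypothetical names and say nothing about how re2c stores character classes internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<unsigned, unsigned> Range;  // inclusive [first, second]

// Complement a set of character ranges within 0..255. The input may be
// unsorted and overlapping; the result is sorted and disjoint.
inline std::vector<Range> negate_class(std::vector<Range> in) {
    std::sort(in.begin(), in.end());
    std::vector<Range> out;
    unsigned next = 0;  // lowest code not yet covered by the input
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(in[i].first <= in[i].second && in[i].second <= 255u);
        if (in[i].first > next) out.push_back(Range(next, in[i].first - 1));
        if (in[i].second + 1 > next) next = in[i].second + 1;
    }
    if (next <= 255u) out.push_back(Range(next, 255u));
    return out;
}
```

For [^abc] the input is the single range 'a'..'c' and the result is the two ranges on either side of it.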
The scanner below fails to match “0.eL” if YYFILL reads
exactly n characters (not more). Same (or similar) problem
with test/cnokw.re (if modified so that YYFILL reads
exactly n characters), with input “0.e+1L”.
I believe it is due to an underestimation of ‘n’ at
state yy00. Probably caused by a change that made
maxDist() to store its result with ‘t->depth = maxDist(t)’
and avoiding recalculation. It seems to conflict with
void calcDepth(State *head) marking non-key states
(with s->link = NULL) and calling maxDist in the same loop.
Moving the marking of non-key states to
another loop before doing the calculations appears to help:
E.g.:
// mark non-key states by s->link = NULL
for (dfa_state_t* s = head; s; s = s->next) {
    if ((s != head) && !SCC::state_is_in_non_trivial_SCC(s)) {
        s->link = NULL;
    }
}
// calculate the max number of transitions before being
// guaranteed to reach a key state.
for (dfa_state_t* s = head; s; s = s->dfa_state_next()) {
    SCC::maxDist(s);
}
Affected versions:
0.9.1-6 from ubuntu hoary: no
0.9.9 yes
0.9.10 yes
int scan(Scanner *s){
uchar *cursor = s->cur;
std:
s->tok = cursor;
/*!re2c
any = [\000-\377];
*/
/*!re2c
("0"* "." "e"? "L"?) |
("0"+ "." "e"? "L"?)
{ RET; }
"\n"
{
if(cursor == s->eof) RET;
s->pos = cursor; s->line++;
goto std;
}
any
{
RET;
}
*/
}
-——————————————————————————
Original comment by: *anonymous
patch against
$Id: re2c.1.in,v 1.25 2005/11/11 07:39:53 helly Exp $
Notes:
a) the discussion under (2.) in the patch is probably too
long. May need to be cut back a little.
b) I am not sure that I discovered all the needed
changes. The “The -f option inhibits declaration of
yych and yyaccept.” part surprised me. It is
probably worth documenting.
Original comment by: antalk
Example:
/*!re2c
("a"|"b")/1 { return KEYWORD; }
("a"|"b")/[0-9]+ { return KEYWORD; }
[0-9]+ { return NUMBER; }
*/
Scanning over “a77 a1 b8 b1” one would expect the
following:
1: KEYWORD = “a”
2: NUMBER = “77”
3: KEYWORD = “a”
4: NUMBER = “1”
5: KEYWORD = “b”
6: NUMBER = “8”
7: KEYWORD = “b”
8: NUMBER = “1”
Instead re2c scanner returns:
1: KEYWORD = “a77” (!)
2: KEYWORD = “a”
3: NUMBER = “1”
4: KEYWORD = “b8” (!)
5: KEYWORD = “b”
6: NUMBER = “1”
Full test case is attached.
Original comment by: alder
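To make the expected token sequence concrete, here is a small hand-written reference scanner. It is not re2c output, and the names Token and scan_reference are made up for this sketch. It treats a keyword as "a" or "b" followed by at least one digit, where the digits are looked at but not consumed, which is the behaviour the report expects from a trailing context.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Token;  // (kind, text)

// Reference behaviour: KEYWORD covers only the letter; the digits after it
// are left in the input and come back as a separate NUMBER token.
inline std::vector<Token> scan_reference(const std::string &in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const bool digit_follows = i + 1 < in.size() &&
            std::isdigit(static_cast<unsigned char>(in[i + 1]));
        if ((c == 'a' || c == 'b') && digit_follows) {
            out.push_back(Token("KEYWORD", in.substr(i, 1)));
            i += 1;  // the cursor goes back to the end of the keyword
        } else if (std::isdigit(c)) {
            std::size_t j = i;
            while (j < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[j]))) ++j;
            assert(j > i);  // a NUMBER token is never empty
            out.push_back(Token("NUMBER", in.substr(i, j - i)));
            i = j;
        } else {
            ++i;  // skip anything else, e.g. the separating spaces
        }
    }
    return out;
}
```

Run on "a77 a1 b8 b1" this yields the eight tokens listed as expected in the report, which gives a fixed point to compare a generated scanner against.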
In my opinion, the version should be maintained for
output, too.
Using re2c-0.9.1 (extracted from the windows installer
package which is really nice :-), the code it generates
begins with
/* Generated by re2c 0.5 on Tue May 25 15:15:57 2004
*/
This sort of baffled me, because I was using re2c 0.9.1,
not re2c 0.5.
I think the guilty line is
o << "/* Generated by re2c 0.5 on ";
line 134 in parser.y
Might be useful to maintain this for all future releases.
Original comment by: *anonymous
The homepage could use a blurb saying what re2c is, like:
re2c is a great tool for writing fast and flexible lexers. Unlike
other such tools, re2c concentrates solely on generating efficient
code for matching regular expressions. Not only does this singleness
make re2c more suitable for a wider variety of applications, it
allows us to generate scanners which approach hand-crafted ones in
terms of size and speed.
(from the debian package)
Original comment by: *anonymous
fix warnings.
Original comment by: nuffer
With the -f flag specified, every /*!re2c */ block starts with the resume
switch statement, and a yyNext: label. However, only the last one is
correct; the other ones (as might be expected) only contain
yyFillLabels up to that point. In addition, the multiple yyNext: labels
are inappropriate.
I took a look at the code, but I couldn’t see an obvious way to fix it
other than generating the yyNext: label only at the first block, and
the switch statement only at the end, perhaps triggered by
something like:
/*!re2c:emit_switch */
I think that’s a bit ugly, so I’m hoping that someone has a better idea.
A couple more comments about the feature:
1) It’s not clear to me why -f should suppress the declaration of
yyaccept and yych. I understand that the submitter of the patch had
a different idea of how these should be declared, but that may not
be general.
2) I’d really like to be able to hook into the yyFillLabel mechanism
myself. Not everything falls nicely into re2c tokenising (the case I’m
looking at here involves base-64 decoding.) I’d be quite content with
the following:
/*!fill:3 */ ==>
YYSETSTATE;
if (YYLIMIT - YYCURSOR < 3) YYFILL;
yyFillLabel<something>:
This could be achieved by adding the following snippet to
scanner.re (where the max:re2c patch went, although I think re2c:
match would have been a bit better)
"/*!fill:" [ \t]* [0-9]+ {
    int n = atoi(tok + 8);
    fill_index(out, n);
    tok = pos = cursor;
    ignore_eoc = true;
    goto echo;
}
and modifying need() in code.cc:
static void need(std::ostream &o, uint n, bool & readCh)
{
    fill_index(o, n);
    o << "\tyych = *YYCURSOR;\n";
    readCh = false;
    ++oline;
}
with the rest of it going into fill_index:
void fill_index(std::ostream &o, uint n)
{
    uint fillIndex;
    bool hasFillIndex = (0<=vFillIndexes);
    if ( hasFillIndex == true )
    {
        fillIndex = vFillIndexes++;
        o << "\tYYSETSTATE(" << fillIndex << ");\n";
        ++oline;
    }
    if (n == 1)
    {
        o << "\tif(YYLIMIT == YYCURSOR) YYFILL;\n";
        ++oline;
    }
    else
    {
        o << "\tif((YYLIMIT - YYCURSOR) < " << n << ") YYFILL(" << n << ");\n";
        ++oline;
    }
    if ( hasFillIndex == true )
    {
        o << "yyFillLabel" << fillIndex << ":\n";
        ++oline;
    }
}
I haven’t actually tried that yet, and I’m probably missing some
details about #line numbering.
3) I don’t understand the point of YYGETSTATE() instead of just
YYGETSTATE. I guess tastes differ. But it could not be implemented
as a function unless you used a global variable, and that seems
unlikely in the context of control inversion.
4) On the other hand, while we’re on the subject, it would be really
handy for me to have re2c emit:
++YYCURSOR; SAVECURSOR;
and
RESTORECURSOR;
instead of
YYMARKER = ++YYCURSOR;
and
YYCURSOR = YYMARKER;
Perhaps that’s not useful to anyone else, and I know how to do it. In
the particular buffering environment I’m working inside, the saved
cursor state cannot easily be represented as a pointer. I notice that
re2c is capable of simply doing YYCURSOR -= k; for some constant
k; that wouldn’t work either but I haven’t run into it in the wild yet.
Maybe I’ve just been lucky.
Original comment by: ricilake
On writing a lexer dealing with UTF-8 characters, I
came across a problem with generated if’s. Here is a
code snippet:
if(yych <= '\277'){
//blah blah
}
The above comparison always fails (at least on MSVC
2003) since '\277' is negative. I believe this
conforms to the C++ spec, where the type of a character
literal is 'char'.
The patch changes code.cc so the above snippet becomes
if(yych <= L'\277'){
//blah blah
}
In fact, it might not be a bad idea to prefix L to
every character literal generated by RE2C.
Original comment by: limethief
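The signedness problem above can also be avoided without wide literals by comparing through unsigned char. The sketch below is one possible way to write such a check by hand, assuming the goal is to test for a UTF-8 continuation byte (octal \200..\277); the function names are made up, and this is not what the attached patch does.

```cpp
#include <cassert>

// True when the byte lies in 0x80..0xBF (octal \200..\277). Converting to
// unsigned char first makes the result independent of whether plain char
// is signed. Where char is signed, '\277' itself has a negative value, so
// comparing an unsigned yych against it directly can never succeed.
inline bool is_continuation_byte(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return u >= 0x80u && u <= 0xBFu;
}

// Small self-check that can be called from a test driver.
inline void check_continuation_bytes() {
    assert(is_continuation_byte('\200'));
    assert(is_continuation_byte('\277'));
    assert(!is_continuation_byte('A'));
    assert(!is_continuation_byte('\300'));
}
```

Emitting numeric constants (for example 0xBF) in generated comparisons would be another way to sidestep the issue.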
When compiling the last example from the re2c manual,
generated with the -b option, in Visual .NET 2003 (with
option /TC, compile as C code), I get several errors:
simple.c(222) : error C2065: 'yybm' : undeclared identifier
simple.c(222) : error C2109: subscript requires array or pointer type
simple.c(248) : error C2109: subscript requires array or pointer type
simple.c(440) : error C2109: subscript requires array or pointer type
simple.c(570) : error C2109: subscript requires array or pointer type
simple.c(586) : error C2109: subscript requires array or pointer type
simple.c(638) : error C2109: subscript requires array or pointer type
etc…
It seems that this is caused by the fact that the bit vector
array is declared outside the scanner’s main block in
the scan function.
Original comment by: noirotm
Hi!
I have just started using re2c on windows and therefore
I needed to make a few changes to create a running
version. As I don’t know how to use CVS I would be glad
if you applied the following changes to the source:
- Use istream (ifstream or cin) instead of unix
open/read/close
- Rename the .cc files to .cpp (makes compiling on
windows easier, but that’s not necessary)
I appended “my version” of re2c, but I’m not sure
whether this is the newest version because I didn’t get
it via CVS but by the sourceforge download page.
Thanks a lot,
Jakob
Original comment by: *anonymous
This is arguably redundant, as optimizers will remove
these gotos anyway, but the generated code becomes much
more readable (and maybe it’ll help the optimizer as well :-).
The attached patch removes gotos that were generated
from the current state to the next one, like in the
following example:
yy37:
++YYCURSOR;
if((yych = *YYCURSOR) == '=') goto yy96;
goto yy38;
yy38:
{ RET; }
Patched re2c generates instead:
yy37:
++YYCURSOR;
if((yych = *YYCURSOR) == '=') goto yy96;
{ RET; }
Original comment by: alder
The latest (as of this writing) rev. 1.32 in CVS does not
initialize CharSet completely: char sets that are not yet
allocated need their cardinality set to 0. As a result re2c crashes.
Patch is attached.
Original comment by: alder
If invalid command line options are prefixed with two
dashes (i.e. --) the program aborts.
Example:
re2c --xyz
Original comment by: hattrick
I made a patch to get re2c built with automake. Maybe I’m not the
only one who has been longing for a familiar “configure” build
system in RE2C. Enjoy!
Patching instruction:
$ cd re2c (your CVS working directory)
$ patch -p1 < re2c-automake-patch.diff
$ chmod 755 autogen.sh cvsclean.sh
$ ./autogen.sh
Original comment by: moriyoshi
The way MS implemented readsome makes it useless for
reading files (ifstream::open does not fill in the
buffer, and the first readsome sees that the buffer is
empty). Replacing readsome with a read/gcount pair makes it
work with both VC and g++.
Patch is attached.
Original comment by: alder
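The read/gcount pair mentioned above looks roughly like this. This is a sketch, not the attached patch; fill_from and slurp are hypothetical helper names, and slurp exists only so the helper can be tried on in-memory data.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read up to `cap` bytes into `buf` and return how many were actually read.
// read() sets eofbit/failbit on a short read, so the byte count is taken
// from gcount() rather than inferred from the stream state.
inline std::size_t fill_from(std::istream &in, char *buf, std::size_t cap) {
    assert(buf != 0);
    in.read(buf, static_cast<std::streamsize>(cap));
    return static_cast<std::size_t>(in.gcount());
}

// Convenience wrapper: pull a whole in-memory stream through fill_from in
// chunks of the given size and return everything that was read.
inline std::string slurp(const std::string &data, std::size_t chunk) {
    assert(chunk > 0);
    std::istringstream in(data);
    std::vector<char> buf(chunk);
    std::string out;
    std::size_t n;
    while ((n = fill_from(in, &buf[0], buf.size())) > 0) out.append(&buf[0], n);
    return out;
}
```

Unlike readsome, read blocks until the requested amount is available or the input ends, so it does not depend on what happens to be in the stream buffer already.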
Build outside the source directory fails.
First, there are changes needed to Makefile.am regarding
scanner.cc and parser.cc, the patchfile is what I used
now, but there may be more changes needed to get a
useful logic here.
Regards,
Gerrit
Original comment by: siebenschlaefer
Use std::map instead of a list.
Note: for a small number of symbols and symbol references
it is slightly slower; at about 100/100 it becomes faster.
Original comment by: *anonymous
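A rough idea of what a map-based symbol lookup can look like. The Symbol and SymbolTable types here are assumptions for the sake of the example; the report does not show re2c's actual Symbol class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical symbol record, only here to illustrate the lookup structure.
struct Symbol {
    std::string name;
    int uses;
    Symbol() : uses(0) {}
};

class SymbolTable {
public:
    // Find the symbol with this name, creating it on first use.
    // Lookup costs O(log n) comparisons instead of a linear list walk.
    Symbol &find(const std::string &name) {
        assert(!name.empty());
        Symbol &s = table_[name];
        if (s.name.empty()) s.name = name;
        ++s.uses;
        return s;
    }
    std::size_t size() const { return table_.size(); }

private:
    std::map<std::string, Symbol> table_;
};
```

The trade-off matches the note above: the tree has more overhead per lookup than a short list, so the gain only shows once there are enough symbols and references.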
Add case insensitive string literals (we can use single
quotes for them, as single quotes are not used by re2c
currently)
Original comment by: nuffer
Using the --output=output option causes the program to
abort.
Original comment by: hattrick
re2c generates wrong code that does not initialize
YYMARKER when the backtrack point is the very first
character of the input. The attached patch fixes this
problem.
Use the following script to reproduce the problem.
-test.re-
/*!re2c
ALNUM = [0-9a-zA-Z];
ANY = [\000-\377];
"?!" ALNUM* {
}
"?" ALNUM+ {
}
(ANY\"?")* {
}
*/
-test.re-
Before patching (original):
$ re2c test.re | grep "YYMARKER"
YYCURSOR = YYMARKER;
After patching:
$ re2c test.re | grep "YYMARKER"
YYMARKER = YYCURSOR + 1;
YYCURSOR = YYMARKER;
Original comment by: moriyoshi
It looks like the code generated for the following
fragment with the '-b' option is incorrect:
/*!re2c
any = [\001-\377];
'<a' { RET; }
[<][A-Za-z]+ { RET; }
[\000] { RET; }
any { goto cont; }
*/
I suspect that the cause of trouble is the code around
the ``yy7:'' label:
yy7: ++YYCURSOR;
if(yybm[0+(yych = *YYCURSOR)] & 128) yych = *YYCURSOR;
goto yy9;
goto yy8;
Here, ``goto yy8;'' is unreachable (and ``yych =
*YYCURSOR'' is performed twice).
The full code generated by re2c is attached.
(the platform is Mac OS X, version 10.3, but I hope it
is not important).
Thanks!
#line 6 "HTML_Lexer.cpp"
{
YYCTYPE yych;
unsigned int yyaccept;
static unsigned char yybm[] = {
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 0, 0, 0, 0, 0,
0, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 128, 128, 128, 128, 128,
128, 128, 128, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
};
goto yy0;
yy1: ++YYCURSOR;
yy0:
if((YYLIMIT - YYCURSOR) < 3) YYFILL;
yych = *YYCURSOR;
if(yych <= '\000') goto yy4;
if(yych != '<') goto yy6;
goto yy2;
yy2: ++YYCURSOR;
if((yych = *YYCURSOR) <= 'Z'){
if(yych <= '@') goto yy3;
if(yych <= 'A') goto yy7;
goto yy9;
} else {
if(yych <= '`') goto yy3;
if(yych <= 'a') goto yy7;
if(yych <= 'z') goto yy9;
goto yy3;
}
yy3:
#line 63 "HTML_Lexer.re2c"
{ fprintf(stderr,"."); goto tag; }
#line 66 "HTML_Lexer.cpp"
yy4: ++YYCURSOR;
goto yy5;
yy5:
#line 62 "HTML_Lexer.re2c"
{ RET; }
#line 72 "HTML_Lexer.cpp"
yy6: yych = *YYCURSOR;
goto yy3;
yy7: ++YYCURSOR;
if(yybm[0+(yych = *YYCURSOR)] & 128) yych = *YYCURSOR;
goto yy9;
goto yy8;
yy8:
#line 60 "HTML_Lexer.re2c"
{ RET; }
#line 81 "HTML_Lexer.cpp"
yy9: ++YYCURSOR;
if(YYLIMIT == YYCURSOR) YYFILL;
yych = *YYCURSOR;
goto yy10;
yy10: if(yybm[0+yych] & 128) goto yy9;
goto yy11;
yy11:
#line 61 "HTML_Lexer.re2c"
{ RET; }
#line 92 "HTML_Lexer.cpp"
}
#line 64 "HTML_Lexer.re2c"
}
Original comment by: *anonymous
// tested with win32 build of RE2C 0.9.4
#define YYCTYPE unsigned char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL
bool DetectBinHex(const char *text)
{
YYCTYPE *start = (YYCTYPE *)text;
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:
YYCTYPE *token = cursor;
/*!re2c
'(This file must be converted with BinHex 4.0)'
{ if (token == start || *(token - 1) == '\n')
return true; else goto next; }
[\001-\377]
{ goto next; }
[\000]
{ return false; }
*/
return false;
}
Original comment by: ssvb
> It seems you analyzed this in more detail, could you perhaps
> create a patch also?
This patch appears to work. I only ran the tests that come
with re2c-0.9.10, but
$ diff cnokw.c cnokw.temp
136c136
< if((YYLIMIT - YYCURSOR) < 4) YYFILL;
---
> if((YYLIMIT - YYCURSOR) < 5) YYFILL;
suggests it is OK. (Other tests show no difference).
Original comment by: antalk
If the last line in the .re file does not terminate with \n,
it is not copied to the .c file.
Original comment by: *anonymous
We wanted 16-bit character support in order to enable
construction of Unicode scanners.
We didn’t find a simple way to support the EBCDIC
stuff, so we removed it. Further, for some reason the
use of bit vectors failed in testing, so we disabled them as
well.
We added a -w option to enable the Unicode processing
so as to keep backward compatibility with 8-bit use (the
stuff mentioned above is still disabled though).
Thomas Rask Thomsen
Original comment by: thomas_rask
0.9.1 supports nongreedy * and + closure quantifiers.
This patch allows for perl-style {\d+(,|,\d+)?}
closures. Why describe it myself, a clip from the
perlre man page:
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m
times
Example re2c pattern:
"a" "b"{4,8} "c" { code here; }
As with the + and * quantifiers, it modifies a
“primary” grammar element.
Original comment by: yourjesus
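To spell out what the {4,8} in the example pattern is expected to accept, here is a hand-written matcher for exactly that pattern. It only illustrates the semantics quoted from perlre; it is not taken from the patch, and count_run and matches_a_b4to8_c are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count how many times `c` repeats starting at `pos`, stopping at `hi`.
inline std::size_t count_run(const std::string &s, std::size_t pos, char c,
                             std::size_t hi) {
    std::size_t n = 0;
    while (n < hi && pos + n < s.size() && s[pos + n] == c) ++n;
    return n;
}

// Matcher for "a" "b"{4,8} "c", anchored at both ends of the input:
// one 'a', then at least 4 and at most 8 'b', then one 'c'.
inline bool matches_a_b4to8_c(const std::string &s) {
    const std::size_t lo = 4, hi = 8;
    assert(lo <= hi);
    if (s.empty() || s[0] != 'a') return false;
    const std::size_t n = count_run(s, 1, 'b', hi);
    if (n < lo) return false;
    return 1 + n + 1 == s.size() && s[1 + n] == 'c';
}
```

Three b's are rejected for being under the minimum, and nine are rejected because only eight are consumed and the ninth is not the required 'c'.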
Input buffer overrun. Scanner, generated by re2c reads
one byte after the end of input buffer.
This bug taken from
http://ufo2000.lxnt.info/pmwiki/index.php/Re2c/Bugs
test.re:
#include <stdio.h>
#include <string.h>
#define YYCTYPE char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL
int test(const char *text)
{
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:;
YYCTYPE *token = cursor;
/*!re2c
[\001-\377] { return 0; }
[\000] { return 1; }
*/
}
int main()
{
char *text = new char[1];
strcpy(text, "");
test(text);
}
test.cpp:
/* Generated by re2c 0.5 on Thu Nov 6 13:57:12 2003 */
#line 1 "test.re"
#include <stdio.h>
#include <string.h>
#define YYCTYPE char
#define YYCURSOR cursor
#define YYLIMIT cursor
#define YYMARKER marker
#define YYFILL
int test(const char *text)
{
YYCTYPE *cursor = (YYCTYPE *)text;
YYCTYPE *marker = (YYCTYPE *)text;
next:;
YYCTYPE *token = cursor;
{
YYCTYPE yych;
unsigned int yyaccept;
goto yy0;
yy1: ++YYCURSOR;
yy0:
if(YYLIMIT == YYCURSOR) YYFILL;
yych = *YYCURSOR;
if(yych <= '\000') goto yy4;
yy2: yych = *YYCURSOR;
yy3:
#line 17
{ return 0; }
yy4: yych = *++YYCURSOR;
yy5:
#line 18
{ return 1; }
}
#line 19
}
int main()
{
char *text = new char[1];
strcpy(text, "");
test(text);
}
valgrind.log:
==1781== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==1781== Copyright (C) 2002, and GNU GPL'd, by Julian Seward.
==1781== Using valgrind-1.9.6, a program instrumentation system for x86-linux.
==1781== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
==1781== Estimated CPU clock rate is 962 MHz
==1781== For more details, rerun with: -v
==1781==
==1781== Invalid read of size 1
==1781== at 0x80484B0: test(char const*) (in /home/serge/test/test)
==1781== by 0x80484FE: main (in /home/serge/test/test)
==1781== by 0x403077A6: __libc_start_main (in /lib/libc-2.3.2.so)
==1781== by 0x80483D0: (within /home/serge/test/test)
==1781== Address 0x41116059 is 0 bytes after a block of size 1 alloc'd
==1781== at 0x40161948: __builtin_vec_new (in /usr/lib/valgrind/valgrind.so)
==1781== by 0x40161990: operator new[](unsigned) (in /usr/lib/valgrind/valgrind.so)
==1781== by 0x80484DD: main (in /home/serge/test/test)
==1781== by 0x403077A6: __libc_start_main (in /lib/libc-2.3.2.so)
==1781==
==1781== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==1781== malloc/free: in use at exit: 1 bytes in 1 blocks.
==1781== malloc/free: 2 allocs, 1 frees, 3 bytes allocated.
==1781== For a detailed leak analysis, rerun with: --leak-check=yes
==1781== For counts of detected errors, rerun with: -v
Original comment by: nuffer
Here is an excerpt of code generated on WIN32 with the
prebuilt 0.9.9 binary:
yy4:
#line 179 "e:\\prg\\vc\\cppparser4\\src\\lib_cpppreprocessor\\lexer.re"
{ RET; }
#line 295 "e:\prg\vc\cppparser4\src\lib_cpppreprocessor\\lexer.cpp"
Notice that on the second #line the backslash '\'
directory separators are not escaped. This causes a
forest of warnings.
Baptiste.
Original comment by: blep
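Escaping a file name before it goes into a #line directive is a small string transformation. The helper below is a sketch under the assumption that backslashes and double quotes are the only characters needing treatment; escape_for_line_directive is a made-up name and not a function from the re2c sources.

```cpp
#include <cassert>
#include <string>

// Escape a file name for use inside the quotes of a #line directive:
// each backslash and double quote gets a leading backslash.
inline std::string escape_for_line_directive(const std::string &path) {
    std::string out;
    out.reserve(path.size());
    for (std::string::size_type i = 0; i < path.size(); ++i) {
        if (path[i] == '\\' || path[i] == '"') out += '\\';
        out += path[i];
    }
    assert(out.size() >= path.size());
    return out;
}
```

Applied to both the input and the output file name, this would make the two #line directives in the excerpt consistent.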
After just downloading it (v. 0.10.1), untarring, in
the end of ./configure && make I’m receiving:
make all-am
make[1]: Entering directory `/var/spool/ecbuild/RPM/BUILD/re2c-0.10.1'
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT code.o -MD -MP -MF ".deps/code.Tpo" -c -o code.o code.cc; \
then mv -f ".deps/code.Tpo" ".deps/code.Po"; else rm -f ".deps/code.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT dfa.o -MD -MP -MF ".deps/dfa.Tpo" -c -o dfa.o dfa.cc; \
then mv -f ".deps/dfa.Tpo" ".deps/dfa.Po"; else rm -f ".deps/dfa.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT main.o -MD -MP -MF ".deps/main.Tpo" -c -o main.o main.cc; \
then mv -f ".deps/main.Tpo" ".deps/main.Po"; else rm -f ".deps/main.Tpo"; exit 1; fi
if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT parser.o -MD -MP -MF ".deps/parser.Tpo" -c -o parser.o parser.cc; \
then mv -f ".deps/parser.Tpo" ".deps/parser.Po"; else rm -f ".deps/parser.Tpo"; exit 1; fi
parser.y: In function 'int yyparse()':
parser.y:87: error: 'mkAlt' was not declared in this scope
parser.y:114: error: 'mkAlt' was not declared in this scope
parser.y:138: error: 'mkAlt' was not declared in this scope
make[1]: *** [parser.o] Error 1
make[1]: Leaving directory `/var/spool/ecbuild/RPM/BUILD/re2c-0.10.1'
make: *** [all] Error 2
Original comment by: sergey57
The following code produces an infinite loop.
(YYCURSOR is not incremented)
/*!re2c
[^\n\000] '\n' { do_this(); }
[^\n\000]* '\000' { do_that(); }
*/
while this equivalent notation works:
/*!re2c
[^\n\000] '\n' { do_this(); }
[^\n\000]+ '\000' { do_that(); }
'\000' { do_that(); }
*/
Original comment by: *anonymous
unistd.h and ctype.h headers missing.
Original comment by: yourjesus
/*!re2c
[a-z] { printf("Small letter"); }
[\000-\377] { printf("anything else"); }
*/
re2c -e test1.re
[a-z] matches some non-letters.
Cause: range boundaries are translated to EBCDIC, then
interpreted as a range.
/*!re2c
. { printf("Anything but newline"); }
[\000-\377] { printf("newline"); }
*/
----------
re2c -e test2.re
Takes 0x0A to be newline, but this is not the case in
EBCDIC.
Cause: mkDot() uses ASCII \n where EBCDIC \n is expected.
/*!re2c
'abcdefghijklmnopqrstuvwxyz' { printf("Small or capital letter"); }
[\000-\377] { printf("anything else"); }
*/
----------
/*!re2c
'ABCDEFGHIJKLMNOPQRSTUVWXYZ' { printf("Small or capital letter"); }
[\000-\377] { printf("anything else"); }
*/
----------
re2c -e fails to generate case insensitivity.
Cause: (ASCII) toupper and tolower are used on
EBCDIC encoded letters.
Original comment by: *anonymous
Just as a reminder, here is my version, which does:
- a little clean up
- allow build with newer g++ versions and bison 1.875
(re2c should now build with all current and older g++
and yacc or bison versions)
- fix some warnings
- putting in cvs tags
- added a makerpm script
- added re2c --version
I hope to get the changes integrated :-)
When you apply the patch you either need to do make
clean and remove the now missing files from cvs or do it
manually with these files:
README
parser.cc
parser.tab.h
re2c.1
scanner.cc
version.h
y.tab.h
(the patch turns those into temp files).
The patch is against today’s version (2004.01.27).
You may also want to verify your README changes to
the new file README.in (and look out for how the
version info is inserted there).
regards
marcus
Original comment by: helly
The changeset can be found here:
http://www.mgix.com/re2c-0.9.5-salvable-state-unified-patch-v2.bz2
Original comment by: *anonymous
This bug was reported by Matt Sergeant (msergeant at
startechgroup.co.uk):
Anyway take a look at this. It produces the correct
code if run without the -b flag, but incorrect code
(i.e. it doesn’t match) if you run with -b.
(this will probably wrap, unfortunately I don’t have a
way to send it without wrapping right now)
#define NULL ((char*) 0)
#define YYCTYPE char
#define YYCURSOR p
#define YYLIMIT p
#define YYMARKER q
#define YYFILL
#define YYDEBUG debug
#include <stdio.h>
void debug(int state, char curr) {
printf("State: %d, Curr: %c\n", state, curr);
}
char *scan281(char *p){
char *q;
start:
/*!re2c
( "adsl" | "adslppp" | "bdsl" | "cdsl" |
"dslgw4pool" | "dslppp" | "edsl" | "fdsl" | "ldsl" |
"pool" | "pppdsl" | "premiumC" | "vdsl" | "xsttldsl"
)[0-9a-k\-]+ ".".+ ".uswest.net" {return
"dsl";}
( "dhcp" | "dorms" | "rh" ).* "-"[0-9]+ "-"[0-9]+
".".+ ".resnet.pitt.edu" {return "resnet";}
[0-9]+ "-"[0-9]+ "-"[0-9]+ "-"[0-9]+ "."( "dhcp" |
.* "modem" | "bothell" ).* ".washington.edu"
{return "edu";}
[\001-\377] { goto start; }
[\000] {return NULL; }
*/
}
int main(int argc, char **argv) {
char *v = "D-128-208-46-51.dhcp4.washington.edu";
printf("%s\n", scan281(v));
return 0;
}
Original comment by: nuffer
This patch adds to re2c the ability to handle trailing
contexts of variable length (fixes a known “bug” :-)
Test case is attached as well.
For “no trailing context” test case a C scanner was
generated. It matches line by line the expected “test/c.c”.
Original comment by: alder
For some time I have been trying to modify re2c to my
needs. Although I did not finish that, for some time I
will not have time to continue. Basically, I
reorganized the sources (more modular and probably
easier to understand), added some comments (mostly for
doxygen) and implemented some changes that I hope
others might find useful, too. Description of changes
is in README.in,
together with some comments on the directories. Please
take a look at it.
Note on compiling: for production #define NDEBUG to
turn off assert()’s. Otherwise just ./autogen.sh;
./configure; make; make check; sudo make install;
should work.
The resulting program is named re3c-re2c; it is
compatible with re2c (at least to the extent exercised by
make check), although the code emitted is not
identical. (See also -c flag)
make doxy : runs doxygen (HTML output under doxy/)
Unfortunately I cannot upload everything:
Error: Uploaded file must be >20 and <256000 bytes.
Removed doc/, test/, examples/ to accommodate.
(Kept test/trailing-var.* and examples/cmmap_re.c)
Original comment by: antalk
The generated code exhibits a (gcc) compiler warning.
unused variable `yyaccept'
Is this a hard thing to get rid of?
thanks,
Jason
Original comment by: *anonymous
Getting the following error while trying to compile
re2c on Fedora Core 5.
parser.y: In function 'int yyparse()':
parser.y:87: error: 'mkAlt' was not declared in this scope
parser.y:114: error: 'mkAlt' was not declared in this scope
parser.y:138: error: 'mkAlt' was not declared in this scope
make[1]: *** [parser.o] Error 1
make[1]: Leaving directory `/tmp/re2c-0.10.1'
make: *** [all] Error 2
Original comment by: czachary
re2c-0.9.10 label_list implementation is slow.
Attachment includes a reimplementation (as patch).
Original comment by: *anonymous
Hi,
I’m the Cygwin gcc maintainer and wanted to build and
distribute re2c for Cygwin.
But I cannot compile the provided scanner.cc, getting
this error from g++:
g++ -o scanner.o -O2 -Wall -I. -Wno-unused -Wno-parentheses -Wno-deprecated -c scanner.cc
scanner.re: In member function `int Scanner::echo(std::ostream&)':
scanner.re:79: error: invalid conversion from `uchar*' to `const char*'
scanner.re:79: error: initializing argument 1 of `std::basic_ostream<_CharT, _Traits>& std::basic_ostream<_CharT, _Traits>::write(const _CharT*, int) [with _CharT = char, _Traits = std::char_traits<char>]'
scanner.re:75: error: invalid conversion from `uchar*' to `const char*'
scanner.re:75: error: initializing argument 1 of `std::basic_ostream<_CharT, _Traits>& std::basic_ostream<_CharT, _Traits>::write(const _CharT*, int) [with _CharT = char, _Traits = std::char_traits<char>]'
make: *** [scanner.o] Error 1
Any hints please,
Gerrit
Original comment by: siebenschlaefer
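The conversion error above comes from passing an unsigned char pointer where ostream::write expects const char*. One common way around it is an explicit cast, sketched below. The helper names are hypothetical, and this is not necessarily how scanner.re was fixed in the end.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

typedef unsigned char uchar;

// ostream::write takes const char*, and there is no implicit conversion
// from unsigned char*, so the cast has to be spelled out.
inline void write_bytes(std::ostream &out, const uchar *begin, const uchar *end) {
    assert(begin <= end);
    out.write(reinterpret_cast<const char *>(begin),
              static_cast<std::streamsize>(end - begin));
}

// Small helper for trying write_bytes on an in-memory stream.
inline std::string bytes_to_string(const uchar *begin, const uchar *end) {
    std::ostringstream out;
    write_bytes(out, begin, end);
    return out.str();
}
```

Older compilers accepted the implicit conversion in some situations, which is probably why the shipped scanner.cc built elsewhere.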
In the automata emitter, the maxDist method is called
iteratively and recursively on a DAG. No effort is
made to cache the computations.
This patch caches the distance computations for better
computational reuse.
On complicated sets of regular expression tokens re2c
compile times of 10+ hours can be dropped to several
seconds.
Original comment by: yourjesus
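The effect of caching can be seen on a plain longest-path computation over an acyclic graph. This is a generic sketch, not the patch itself: Graph and max_dist are made-up names, and the adjacency-list representation is an assumption rather than re2c's State structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: succ[v] lists the successors of state v.
// The graph is assumed to be acyclic.
typedef std::vector<std::vector<std::size_t> > Graph;

// Longest path (counted in edges) from `v` to any sink. memo[v] holds the
// cached result, or -1 if it has not been computed yet, so every state is
// expanded once instead of once per path that reaches it.
inline long max_dist(const Graph &succ, std::size_t v, std::vector<long> &memo) {
    assert(v < succ.size() && memo.size() == succ.size());
    if (memo[v] >= 0) return memo[v];
    long best = 0;
    for (std::size_t i = 0; i < succ[v].size(); ++i) {
        const long d = 1 + max_dist(succ, succ[v][i], memo);
        if (d > best) best = d;
    }
    memo[v] = best;
    return best;
}
```

Without the memo the recursion revisits shared successors along every path, which can grow exponentially with the number of states; with it the work is linear in states plus edges.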
The following re2c code does not get processed correctly:
/*!re2c
(["] ([\000-\255]\["])+ ["]) { RET; }
*/
The generated result includes:
case 0x9F:
case 0xA7:
case 0xA8:
case 0xA9:
case 0xA::
case 0xA;:
case 0xA<:
case 0xA=:
case 0xA>:
case 0xA?:
case 0xA@:
case 0xAA:
case 0xAB:
Needless to say, the compiler isn’t too happy about this.
--Aaron
Original comment by: *anonymous
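The report does not say where the bad digits come from. The characters that appear after 9 (: ; < = > ? @) are exactly the ASCII codes following '9', which suggests arithmetic on character codes somewhere in the hex formatting; that is a guess, not something the report confirms. A table-driven formatter avoids the whole class of problem. hex_literal is a hypothetical helper, not a function from code.cc.

```cpp
#include <cassert>
#include <string>

// Format a byte as a C hex literal such as "0xAB". Indexing a digit table
// avoids the '0' + d shortcut, which is only valid for d < 10.
inline std::string hex_literal(unsigned int byte) {
    assert(byte <= 0xFFu);
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "0x";
    out += digits[(byte >> 4) & 0xF];
    out += digits[byte & 0xF];
    return out;
}
```

With this, the labels after 0xA9 come out as 0xAA, 0xAB and so on, which the compiler accepts.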
$ diff -u Makefile.am~ Makefile.am
--- Makefile.am~ 2005-10-27 15:18:12.325708800 +0200
+++ Makefile.am 2005-10-27 15:18:23.992484800 +0200
@@ -12,7 +12,7 @@
 #CXXFLAGS = -O2 -Wall -I. -Wno-unused -Wno-parentheses -Wno-deprecated
 YFLAGS = -d
-RE2C = re2c
+RE2C = re2c$(EXEEXT)
 RE2CFLAGS = -s
 CLEANFILES = parser.cc y.tab.c y.tab.h parser.cc re2c.1 .version
Original comment by: siebenschlaefer
It actually crashes if this form is ever used.
I attached a patch and a test case for this bug.
With this branch working it might be possible to
express one of the “factor” alternatives in “parser.y” as:
| primary close
{
    switch($2){
    case '*':
        $$ = new CloseVOp($1, 0, -1);
        break;
    case '+':
        $$ = new CloseVOp($1, 1, -1);
        break;
    case '?':
        $$ = new CloseVOp($1, 0, 1);
        break;
    }
}
IMHO, this might be easier to read.
Original comment by: alder