orangeduck / mpc

A Parser Combinator library for C

License: Other

Makefile 1.64% C 98.36%

mpc's Issues

Can you please help me understand the tree structure generated?

Hi,

First of all, thank you for such an awesome tutorial and library. I have reached the Polish notation evaluation portion, and I did the evaluation with the grammar
expr : <number> | '(' <operator> <expr>+ ')' ;

Now I want to change the grammar to this:
expr : <number> | <operator> <expr>+ ;

But I am confused: I am unable to visualise what the AST will look like, or to understand the tree structure generated.

(+ 2 3)(+ 2 3) <- For the first grammar
-> lispy|>
  regex
  operator|char:1:1 '*'
  expr|>
    char:1:3 '('
    operator|char:1:5 '+'
    expr|number|regex:1:7 '2'
    expr|number|regex:1:9 '3'
    char:1:10 ')'
  expr|>
    char:1:12 '('
    operator|char:1:13 '+'
    expr|number|regex:1:15 '2'
    expr|number|regex:1:17 '3'
    char:1:18 ')'
  regex
- 2 3 + 2 3 <- for the second grammar.
lispy|>
  regex
  operator|char:1:1 '*'
  expr|>
    operator|char:1:3 '+'
    expr|number|regex:1:5 '2'
    expr|number|regex:1:7 '3'
  expr|>
    operator|char:1:9 '+'
    expr|number|regex:1:11 '2'
    expr|number|regex:1:13 '3'
  regex

Hoping that you can help me.

[Question] Regex string for comments

Hi,

I am testing the mpc library and I have a question about regex rules in mpca_lang. I want to implement comments like in C: /* ..... */. I found a possible regex string:

/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/

The site where I found it is this:

http://ostermiller.org/findcomment.html

When I add it to the mpca_lang grammar, compile, and run it, every command shows:

<stdin>: error: Parser Undefined!

Is there something I need to do or understand about this?

P.S. Good library, and amazing work.

Segfault on mpca_lang instantiation

Hi!

I've been working through the examples in the readme (which, by the way, are not really up to date) and buildyourownlisp. Updating mpc today led to a segfault on my mpca_lang instantiation.

It seems to be a problem with a NULL pointer manipulation, but I don't really have the time to dig into it right now. Any clue?

example source

#include <stdio.h>
#include <stdlib.h>

#include "mpc.h"

int                     main(int argc, char** argv) {
        mpc_parser_t    *Expr = mpc_new("expr");
        mpc_parser_t    *Value = mpc_new("value");
        mpc_parser_t    *Maths = mpc_new("maths");
        mpc_result_t    result;
        char*           input = "123";

        mpca_lang(MPCA_LANG_DEFAULT,
        "                                       \
        expression  : <value> ;                 \
        value       : /[0-9]+/ | <expression> ; \
        maths       : /^/ <expression> /$/ ;    \
        ",
        Expr, Value, Maths);

        if (mpc_parse("input", input, Value, &result)) {
                mpc_ast_print(result.output);
                mpc_ast_delete(result.output);
        } else {
                mpc_err_print(result.error);
                mpc_err_delete(result.error);
        }

        mpc_cleanup(3, Expr, Value, Maths);
}

gdb session

(gdb) b main
Breakpoint 1 at 0x400dfc: file src/main.c, line 7.
(gdb) r
Starting program: /home/oleiade/Dev/Sandbox/C/config/./bin/debug/config 

Breakpoint 1, main (argc=1, argv=0x7fffffffe148) at src/main.c:7
7           mpc_parser_t    *Expr = mpc_new("expr");
(gdb) n
8           mpc_parser_t    *Value = mpc_new("value");
(gdb) 
9           mpc_parser_t    *Maths = mpc_new("maths");
(gdb) 
11          char*           input = "123";
(gdb) 
13          mpca_lang(MPCA_LANG_DEFAULT,
(gdb) 

Program received signal SIGSEGV, Segmentation fault.
0x0000000000409e46 in mpca_grammar_find_parser (x=0x618c10 "expression", st=0x7fffffffdf40) at src/mpc.c:2886
2886          if (p->name && strcmp(p->name, x) == 0) { return p; }
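
One possible reading of the backtrace, offered as an assumption rather than a confirmed diagnosis: the grammar refers to a rule named expression, while the parser was created with mpc_new("expr"), and the argument list ends at Maths with no terminating NULL (other calls on this page do pass one). A name lookup over variadic arguments that never finds a match and never meets a sentinel has nowhere safe to stop. The sketch below is not mpc's code; named_t and find_by_name are hypothetical stand-ins that only show how a NULL sentinel lets such a lookup end cleanly.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named parser; not an mpc type. */
typedef struct {
  const char *name;
} named_t;

/*
 * Look up `wanted` among a variadic list of named_t pointers.
 * The list MUST end with a NULL pointer: the sentinel is the only
 * way the loop can tell where the arguments stop.
 * Returns the matching entry, or NULL if no entry has that name.
 */
named_t *find_by_name(const char *wanted, ...) {
  va_list args;
  named_t *p;
  named_t *found = NULL;

  va_start(args, wanted);
  while ((p = va_arg(args, named_t *)) != NULL) {
    if (p->name != NULL && strcmp(p->name, wanted) == 0) {
      found = p;
      break;
    }
  }
  va_end(args);
  return found;
}
```

With the sentinel in place, a lookup for a name that was never registered returns NULL instead of reading past the last argument. For the report above, that suggests trying matching names (expr versus expression) and a trailing NULL after Maths.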

Grammar quirk

I'm having an issue with this grammar:

        " number    : /-?[0-9]+(\\.[0-9]*)?/ ;                                                                      \n"
        " character : /'.' | \".\"/ ;                                                                               \n"
        " string    : /\"(\\\\.|[^\"])*\"/ ;                                                                        \n"
        " boolean   : /\"true\" | \"false\"/ ;                                                                      \n"
        "                                                                                                           \n"
        " factor    : '(' <lexp> ')'                                                                                \n"
        "           | <number>                                                                                      \n"
        "           | <character>                                                                                   \n"
        "           | <string>                                                                                      \n"
        "           | <ident> '(' <lexp>? (',' <lexp>)* ')'                                                         \n"
        "           | <ident> ;                                                                                     \n"
        "                                                                                                           \n"
        " term      : <factor> (('*' | '/' | '%') <factor>)* ;                                                      \n"
        " lexp      : <term> <index>* (('+' | '-') <term> <index>* )* ;                                             \n"
        "                                                                                                           \n"
        " index     : '[' <number> ']' ;                                                                            \n"
        " stmt      : '{' <stmt>* '}'                                                                               \n"
        "           | \"while\" '(' <exp> <index>* ')' <stmt>                                                       \n"
        "           | \"for\" '(' <exp> <index>* ')' <stmt>                                                         \n"
        "           | \"if\"    '(' <exp> ')' <stmt>                                                                \n"
        "           | \"loop\" <stmt>                                                                               \n"
        "           | <ident> '=' <lexp> <index>* ';'                                                               \n"
        "           | \"return\" <lexp>? ';'                                                                        \n"
        "           | <ident> <index>* ';'                                                                          \n"
        "           | <ident> '(' <ident>? (',' <ident>)* ')' <index>* ';';                                         \n"
        "                                                                                                           \n"
        " exp       : <lexp> '>' <lexp>                                                                             \n"
        "           | <lexp> '<' <lexp>                                                                             \n"
        "           | <lexp> \">=\" <lexp>                                                                          \n"
        "           | <lexp> \"<=\" <lexp>                                                                          \n"
        "           | <lexp> \"!=\" <lexp>                                                                          \n"
        "           | <lexp> \"==\" <lexp>                                                                          \n"
        "           | <lexp> \"in\" <lexp> ;                                                                        \n"
        "                                                                                                           \n"
        " typeident : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) <ident> ;                              \n"
        " decls     : (<typeident> '=' ( <number> | <character> | <string> | <boolean> | <term> ) <index>* ';')* ;  \n"
        " args      : <typeident>? (',' <typeident>)* ;                                                             \n"
        " body      : '{' <decls> <stmt>* '}' ;                                                                     \n"
        " procedure : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ':' <ident> '(' <args> ')' <body> ;    \n"
        " use       : (\"use\" <string>)* ;                                                                         \n"
        " xenon     : /^/ <use> <decls> <procedure>* /$/ ;                                                          \n"

The issue is at this part: body: '{' <decls> <stmt>* '}';. The grammar is fine, but when parsing a file, it complains when I have a decls after a stmt. How do I make it so that MPC doesn't care about the order in which they occur?

Segfault on parse error

I wrote the following simple test program, test.c:

#include <stdio.h>
#include "mpc/mpc.h"

int main(int argc, char **argv) {
   mpc_result_t result;

   if (argc != 2) {
      fprintf(stderr, "Usage: test filename\n");
      exit(1);
   }

   mpc_parser_t *asdf = mpc_new("asdf");
   mpc_parser_t *jkl = mpc_new("jkl");
   mpc_parser_t *line = mpc_new("line");
   mpc_parser_t *braceLine = mpc_new("braceLine");
   mpc_parser_t *program = mpc_new("program");

   mpca_lang(MPCA_LANG_DEFAULT,
      "asdf : \"asdf\";"
      "jkl : \"jkl;\";"
      "line : <asdf>* <jkl>;"
      "braceLine : <line> | ('{' <program> '}');"
      "program : <braceLine>*;",
      asdf, jkl, line, braceLine, program, NULL);

   if (mpc_parse_contents(argv[1], program, &result)) {
      mpc_ast_print(result.output);
      mpc_ast_delete(result.output);
   }
   else {
      fprintf(stderr, "Error!\n");

      mpc_err_print(result.error);
      mpc_err_delete(result.error);
   }

   mpc_cleanup(5, asdf, jkl, line, braceLine, program);

   return 0;
}

This program runs correctly when the input is valid, and crashes with a segfault when the input is invalid, before reaching the "Error!" fprintf, so the segfault is in mpc_parse_contents. I ran gdb and got the following result:

Program received signal SIGSEGV, Segmentation fault.
0x080502c3 in mpc_ast_print_depth (a=0x0, d=0, fp=0xb374e0) at mpc/mpc.c:2604
2604      if (strlen(a->contents)) {

Let me know if you need any more information.

Parser acting unexpectedly.

Hello, I've been trying to learn to use this library for a personal project so this problem is probably just a result of this learning process, but it's had me scratching my head for a while now.

I modified the maths.c example to the following grammar:

...
mpc_parser_t *New_Expr = mpc_new("new_expr");

mpca_lang(MPCA_LANG_PREDICTIVE,
    " expression : <product> (('+' | '-') <product>)*;                                                 "
    " product : <value>   (('*' | '/')   <value>)*;                                                    "
    " value : /[0-9]+/ | <new_expr> | \"repeat\" | \"rand\" | \"sin\" <new_expr> | \"cos\" <new_expr>; "
    " new_expr : '(' <expression> ')';                                                                 "
    " maths : /^/ <expression> /$/;                                                                    ",
    Expr, Prod, Value, Maths, New_Expr, NULL);
...
mpc_cleanup(5, Expr, Prod, Value, Maths, New_Expr);

That is, I added a few more valid values (repeat, rand, sin <new_expr>, cos <new_expr>) and added a new rule new_expr.

It works correctly for inputs that contain repeat, such as 1 + 2 + repeat, but doesn't work for others such as 1 + 2 + rand. I have no idea why this is happening. The error message (for the second expression) reads:

test.txt:1:10: error: expected "repeat", "rand", "sin", "cos" or end of input at 'a'

Thanks in advance.

Create a compiler

I am reading your book and I would like to know whether this parser could be used to build a compiler, or whether this library is only useful for interpreting a language and not for compiling it.

set a tag please?

Hi orangeduck,

I'm a reader of your Build Your Own Lisp. Thank you for this great book first :)

I am also a user of clib.

When I try to install mpc via clib in a clean C project, everything works fine.

Then I created a package.json according to clib's spec, like:

{
  "dependencies": {
    "orangeduck/mpc": "0.8.6"
  }
}

But when I execute clib install with the package.json above, an error shows up:

~> clib install
       fetch : orangeduck/mpc:package.json
       error : unable to fetch orangeduck/mpc:package.json

So I dug into the problem and found that clib tries to fetch the package.json from

[https://raw.githubusercontent.com/orangeduck/mpc/0.8.6/package.json HTTP/1.1]

check this out: https://github.com/clibs/clib/blob/d05a64a6c40add19d314b2fc639ec58c9a014309/deps/clib-package/clib-package.c#L460

Personally, I don't think this is a bug in clib, but the problem can easily be fixed by setting a git tag for mpc.

What do you think?

2 warnings when compiling on Windows

When I tried compiling MPC for Windows using Visual Studio 2015 Update 2, I got a few warnings regarding the usage of functions like strcpy. I'm not particularly worried about those warnings, so I just use a few flags to suppress them. But two other warnings still show up.

One of the warnings is in this function. The function strtod returns a double, but the result is stored in a float. To solve this we can just use the function strtof.

The other is in this line. There is a size_t being implicitly converted to char. This is not incorrect in terms of usage, but the compiler still complains. To remove this warning we can use an explicit cast like so:

range[strlen(range) + 0] = (char) j;

That will work unless there is an error in the function and j should be a char.
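
To make the second warning concrete, here is a minimal standalone sketch of the pattern, assuming a loop that counts with a size_t and appends each value to a string as a character. The function fill_range is hypothetical and is not taken from mpc.c; it only illustrates where the explicit cast goes.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not mpc code: write the characters from
 * `start` to `end` (inclusive) into `out` and NUL-terminate it.
 * `out` must have room for (end - start + 2) bytes.
 */
void fill_range(char *out, char start, char end) {
  size_t first = (size_t)(unsigned char) start;
  size_t last = (size_t)(unsigned char) end;
  size_t n = 0;
  size_t j;

  for (j = first; j <= last; j++) {
    /* j is a size_t; the cast states the narrowing to char on purpose,
       which is what silences the compiler's conversion warning. */
    out[n++] = (char) j;
  }
  out[n] = '\0';
}
```

The cast does not change behaviour as long as the counter stays within the range of a character, which holds here because both bounds start out as char values.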

First example doesn't work - print_ast is passed a value_t

Looks like r.output isn't an mpc_ast_t*, it is an mpc_val_t*.

Not only that, I'm not sure how to convert a val_t to an ast_t.

Am I missing something obvious, or does the very first example not compile?

mpc_result_t r;

if(mpc_parse("input", input, Maths, &r)) {
mpc_ast_print(r.output);
mpc_ast_delete(r.output);
} else {

error: implicit declaration of function ‘strtof’

cheng@ada ~/mpc $ make
gcc -ansi -pedantic -O3 -g -Wall -Werror -Wextra -Wformat=2 -Wshadow -Wno-long-long -Wno-overlength-strings -Wno-format-nonliteral -Wcast-align -Wwrite-strings -Wstrict-prototypes -Wold-style-definition -Wredundant-decls -Wnested-externs -Wmissing-include-dirs -Wswitch-default examples/doge.c mpc.c -lm -o examples/doge
mpc.c: In function ‘mpcf_float’:
mpc.c:2259:3: error: implicit declaration of function ‘strtof’ [-Werror=implicit-function-declaration]
   *y = strtof(x, NULL);
   ^
mpc.c:2259:3: error: nested extern declaration of ‘strtof’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [examples/doge] Error 1

After cloning the repo with git clone, I got this error even though <stdlib.h> is included. I don't know why.
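
A likely explanation, stated as an assumption about this build rather than a verified diagnosis: strtof was added to the C standard library in C99, and the -ansi flag in the command line above asks gcc for C89, where <stdlib.h> is not required to declare it. A minimal sketch of a C89-compatible alternative is to call strtod, which C89 does provide, and narrow the result explicitly. The name parse_float_c89 is hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical C89-friendly replacement for a strtof call.
 * strtod is part of C89, so this compiles under -ansi -pedantic;
 * the cast makes the double-to-float narrowing explicit.
 */
float parse_float_c89(const char *text) {
  return (float) strtod(text, NULL);
}
```

The result can differ from strtof in the last bit for values that are rounded twice (first to double, then to float), which is usually acceptable for a parser helper but worth knowing about.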

SIGSEGV when creating grammar.

Hi,

I've been expanding lispy (from your book, which is awesome, btw), to include support for doubles. However, when I finished my implementation, I suddenly get SIGSEGV errors when creating the grammar. Is there something I'm doing wrong?

I'm using the latest (master) version on the repo.

Grammar:

#define GRAMMAR "                                          \
                                                           \
      long     : /-?[0-9]+/ ;                              \
      double   : /-?[0-9]+\\.?[0-9]+/;                     \
      symbol   : /[a-zA-Z0-9_+\\-*\\/\\\\=<>!&\\|\\:]+/ ;  \
      string   : /\"(\\\\.|[^\"])*\"/ ;                    \
      comment  : /;[^\\r\\n]*/ ;                           \
      sexpr    : '(' <expr>* ')' ;                         \
      qexpr    : '{' <expr>* '}' ;                         \
      expr     : <number>  | <symbol> | <string>           \
               | <comment> | <sexpr>  | <qexpr> ;          \
      lispy    : /^/ <expr>* /$/ ;                         \
                                                           \
   "

static mpc_parser_t* number_l;
static mpc_parser_t* number_d;
static mpc_parser_t* symbol;
static mpc_parser_t* string;
static mpc_parser_t* comment;
static mpc_parser_t* sexpr;
static mpc_parser_t* qexpr;
static mpc_parser_t* expr;
static mpc_parser_t* lispy;

mpc_parser_t* grammar_create() {
  number_l = mpc_new("long");
  number_d = mpc_new("double");
  symbol = mpc_new("symbol");
  string = mpc_new("string");
  comment = mpc_new("comment");
  sexpr = mpc_new("sexpr");
  qexpr = mpc_new("qexpr");
  expr = mpc_new("expr");
  lispy = mpc_new("lispy");

  mpca_lang(MPCA_LANG_DEFAULT, GRAMMAR,  
    number_l, number_d, symbol, string, comment,
    sexpr, qexpr, expr, lispy);

  return lispy;
}

GDB backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x74538eb4 in strcmp () from C:\WINDOWS\SysWOW64\msvcrt.dll
(gdb) backtrace
#0  0x74538eb4 in strcmp () from C:\WINDOWS\SysWOW64\msvcrt.dll
#1  0x0040de42 in mpca_grammar_find_parser ()
#2  0x0040de74 in mpcaf_grammar_id ()
#3  0x004081f5 in mpc_parse_apply_to ()
#4  0x00408607 in mpc_parse_run ()
#5  0x00408d22 in mpc_parse_run ()
#6  0x00408e6d in mpc_parse_run ()
#7  0x00408a66 in mpc_parse_run ()
#8  0x00408e6d in mpc_parse_run ()
#9  0x00408e6d in mpc_parse_run ()
#10 0x0040891a in mpc_parse_run ()
#11 0x004086d8 in mpc_parse_run ()
#12 0x00408e6d in mpc_parse_run ()
#13 0x00408e6d in mpc_parse_run ()
#14 0x004085db in mpc_parse_run ()
#15 0x00408feb in mpc_parse_input ()
#16 0x0040e8df in mpca_lang_st ()
#17 0x0040ea59 in mpca_lang ()
#18 0x00403e5c in grammar_create ()
#19 0x00401451 in main ()
(gdb)

Infinite loop - why?

I've been trying to write some code to ignore anything inside a scope (including nested brackets)
The grammar looks like this:

ignore_body "Body"  :  '{' <ignore_body>* '}' | /[^{}]*/ ;
func "Function"  :  <func_decl> '{' <ignore_body>* '}' ;

Any ideas why this would loop? I find debugging this hard, because the mpc core code is difficult to unpick....

Parsing is extremely slow

I changed the test program examples/doge.c
if (argc > 1) {
  mpc_result_t r;
  int i;
  for (i = 0; i < 10000; ++i) {
    if (mpc_parse_contents(argv[1], Doge, &r)) {
      /* mpc_ast_print(r.output); */
      mpc_ast_delete(r.output);
    } else {
      mpc_err_print(r.error);
      mpc_err_delete(r.error);
    }
  }
}
to parse a file 10000 times.

Running examples/doge.c with an empty input file takes 0.9 sec on a 1 GHz x86.
Therefore a single iteration takes ~0.09 msec for doing nothing (except opening and closing the file).

Running examples/doge.c with input
so c so c so c so c so c so c so c so c so c
so c so c so c so c so c so c so c so c so c
takes 12.8 sec. A single iteration takes ~1.3 msec.

Running examples/doge.c with 8 input lines of the following form
so c so c so c so c so c so c so c so c so c
takes 54.4 sec. A single iteration takes ~5.4 msec of parsing
(about 4 times the time needed for 2 lines of input).

At that rate, a file with 2000 lines of input would need more than a second to be parsed.

That is too slow to be used in production.
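
The per-iteration figures and the 2000-line estimate can be checked with a little arithmetic. The sketch below uses only the timings quoted in this report; the function names are made up for illustration, and the extrapolation assumes that cost grows linearly with the number of lines, which two data points suggest but do not prove.

```c
/* Average time per parse in milliseconds, given the total
   wall-clock seconds measured for `iterations` runs. */
double ms_per_iteration(double total_seconds, int iterations) {
  return total_seconds * 1000.0 / iterations;
}

/* Linear extrapolation: if `measured_lines` lines take `measured_ms`
   milliseconds, estimate the time for `target_lines` lines. */
double extrapolate_ms(double measured_ms, int measured_lines,
                      int target_lines) {
  return measured_ms / measured_lines * target_lines;
}
```

With the numbers above, 54.4 sec over 10000 runs is 5.44 msec per parse of 8 lines, or about 0.68 msec per line, so 2000 lines come out at roughly 1360 msec.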

Get children by tag

Hello.

Suppose I have an AST with tag "Root" and three children ASTs with the following tags:

LeftOperand
Operator
RightOperand

Suppose I want to get the child AST "Operator" while traversing a tree. Currently I do something like:

mpc_ast_t *operator = ast->children[OP_IND]; /* OP_IND == 1 */

But if I'm building a big parser and the AST gets very complex, it will be cumbersome to hardcode so many indexes for every node/child combination.

A good way to solve this in the current version is just implement a function that returns the index number given a tag, or returns the pointer to the AST given the tag, like:

int        mpc_ast_get_index(mpc_ast_t *ast, char *tag);
mpc_ast_t *mpc_ast_get_child(mpc_ast_t *ast, char *tag);

As far as I explored, MPC doesn't have such a feature. Would this be something practical to have?

(Notice that the "char *tag" parameter could be replaced by an integer tag identifier like I've discussed in my previous issue.)
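
A minimal sketch of the two proposed helpers follows, written against a stand-in struct so that it compiles on its own. Everything here is an assumption for illustration: node_t only mirrors the fields the lookup needs, and node_get_index and node_get_child are hypothetical names, not functions confirmed to exist in mpc. Matching uses strstr rather than strcmp because the AST dumps on this page show composite tags such as operator|char.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an AST node; only the fields used below. */
typedef struct node_t {
  const char *tag;
  int children_num;
  struct node_t **children;
} node_t;

/* Index of the first child whose tag contains `tag`, or -1 if none. */
int node_get_index(const node_t *ast, const char *tag) {
  int i;
  if (ast == NULL || tag == NULL) { return -1; }
  for (i = 0; i < ast->children_num; i++) {
    if (strstr(ast->children[i]->tag, tag) != NULL) { return i; }
  }
  return -1;
}

/* Pointer to the first child whose tag contains `tag`, or NULL. */
node_t *node_get_child(const node_t *ast, const char *tag) {
  int i = node_get_index(ast, tag);
  return i < 0 ? NULL : ast->children[i];
}
```

Substring matching means a query such as "Operand" would hit LeftOperand first, so callers that need an exact match would compare the whole tag instead.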

Unintuitive behaviour of stripping parsers?

I'm trying to replace an awful pile of buggy (legacy) regex hacks with a proper parser, and I hit an odd situation. Consider the following minimal example:

    mpc_result_t r;
    mpc_parser_t *comment = mpc_many(mpcf_strfold, mpc_noneof("*#"));
    char *input = strdup("not a #comment");
    if(mpc_parse("input", input, comment, &r)) {
            printf("\"%s\"\n", r.output); // Quoting to display whitespace bounds
    } else { mpc_err_print(r.error); mpc_err_delete(r.error); }

That works, but leaves behind any space between the end of the statement and the beginning of the comment. I thought I could simply remove them with one of the provided parsers, but they don't seem to work as I'd expect. I tried this first:

mpc_parser_t *comment = mpc_stripr(mpc_many(mpcf_strfold, mpc_noneof("*#")));

...and then the same with mpc_strip() and mpc_tok(), but the output continues to be

"not a "

I don't know if this is just a doc bug or a library bug or what, but something certainly feels wrong here. What can we do about it?
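
One way to read this behaviour, given here as an assumption and not as a statement about mpc's internals: the stripping parsers appear to consume whitespace in the input around the wrapped parser, not to edit the string it returns. In this example mpc_noneof("*#") has already matched the space before the #, so there is nothing left to strip. If that reading is right, the trailing space has to be removed from the result afterwards. The helper below is a hypothetical standalone sketch of that step and is not part of mpc.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Remove trailing whitespace from `s` in place and return `s`.
   `s` must point to writable, NUL-terminated memory. */
char *trim_right(char *s) {
  size_t len = strlen(s);
  while (len > 0 && isspace((unsigned char) s[len - 1])) {
    len--;
  }
  s[len] = '\0';
  return s;
}
```

Applied to the output shown above, "not a " becomes "not a". Since the parse result is heap-allocated text, trimming it in place before printing should be safe, but that too is an assumption about how the result is owned.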

Unexpected link errors on quickstart example

I get unexpected link errors.

1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_err_delete(struct mpc_err_t *)" (?mpc_err_delete@@YAXPAUmpc_err_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_err_print(struct mpc_err_t *)" (?mpc_err_print@@YAXPAUmpc_err_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "int __cdecl mpc_parse(char const *,char const *,struct mpc_parser_t *,union mpc_result_t *)" (?mpc_parse@@YAHPBD0PAUmpc_parser_t@@PATmpc_result_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "struct mpc_parser_t * __cdecl mpc_new(char const *)" (?mpc_new@@YAPAUmpc_parser_t@@PBD@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_cleanup(int,...)" (?mpc_cleanup@@YAXHZZ) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_ast_delete(struct mpc_ast_t *)" (?mpc_ast_delete@@YAXPAUmpc_ast_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "void __cdecl mpc_ast_print(struct mpc_ast_t *)" (?mpc_ast_print@@YAXPAUmpc_ast_t@@@Z) referenced in function _main
1>Main.obj : error LNK2019: unresolved external symbol "struct mpc_err_t * __cdecl mpca_lang(int,char const *,...)" (?mpca_lang@@YAPAUmpc_err_t@@HPBDZZ) referenced in function _main
1>D:\Code\C++\Aueb\TonyCC\TonyCC\..\bin\Win32\Debug\TonyCC.exe : fatal error LNK1120: 8 unresolved externals

I checked and every function I am trying to use is in its right place in both mpc.h and mpc.c files.
Does anyone have any idea what's going on?

bug in error reporting

Hi,
Instead of a detailed error message, a generic "Unknown Error" is displayed in some cases. The bug appeared in commit 227dd44.

To be specific:
I'm working on the code from chapter 6 of Build Your Own Lisp.

example:

lispy>xxx
<stdinput>: error: Unknown Error

this works:

lispy> + 1 x
<stdinput>:1:5: error: expected '-', one or more of one of '0123456789', '(' or end of input at 'x'

Maybe there is a bug when mpca_lang() deals with '\\'.

I have a grammar in C-style string:

              identifier : /[a-zA-Z0-9_]+/ ;                      
              lambda : '\\' <identifier> '.' <expr> ;             
              application : <lambda> ' ' <expr> ;                 
              expr : <lambda> | <application> | <identifier> ;   
              lizp : /^/ <expr>* /$/ ;

I want to parse codes like:

\x.x y
\x.x \y.y z
...

I use this grammar in mpca_lang() with the flag MPCA_LANG_WHITESPACE_SENSITIVE and build a binary, and the binary reports a "Parser Undefined" error.
I think it's caused by the '\\' in the second rule, so I replaced the '\\' with another notation. Then it works.
Is this a bug? Or how should I deal with it?

endless loop in mpc_parse

In my attempt to extend Lispy for a double type, I created the following grammar (full source code):

        mpc_parser_t* Integer  = mpc_new("integer");
        mpc_parser_t* Double   = mpc_new("double");
        mpc_parser_t* Symbol   = mpc_new("symbol");
        mpc_parser_t* Sexpr    = mpc_new("sexpr");
        mpc_parser_t* Expr     = mpc_new("expr");
        mpc_parser_t* Lispy    = mpc_new("lispy");

  const char *grammar = 
    "                                                                                         \n\
      double   : /-?[0-9]+.[0-9]+/ | ;                                                        \n\
      integer  : /-?[0-9]+/ | ;                                                               \n\
      symbol   : '+' | '-' | '*' | '/' | '%' | '^' | \"min\" | \"max\" | \"inc\" | \"dec\" ;  \n\
      sexpr    : '(' <expr>* ')' ;                                                            \n\
      expr     : <double> | <integer> | <symbol> | <sexpr> ;                                  \n\
      lispy    : /^/ <expr>* /$/ ;                                                            \n\
    \n";

  printf("Grammar: %s\n", grammar);

  mpca_lang(MPC_LANG_DEFAULT, grammar,
      Double, Integer, Symbol, Sexpr, Expr, Lispy);

Unfortunately, this gets my program stuck in mpc_parse(); (despite running on Linux;).

Unable to use a dot in a regex.

Hello,
I am trying to implement a double type in my Build Your Own Lisp language. I am using this regex to parse them: /-?[0-9]+\\.[0-9]+/. When I try to use a double in my REPL, I get a parser error. I am very confused, because I have the dot escaped properly.

The exact error is <stdin>:1:10: error: expected one of '0123456789', '-', one or more of one of '0123456789', one or more of one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV WXYZ0123456789_+-^%*/\=<>!&', '"', '#', '(', '[' or end of input at '.'

Why am I getting this error? Is this a bug?

Clang address sanitizer detects buffer overflow when mpc_err_print is used

Hello,

While playing with the code from your book Build Your Own Lisp (which is great, by the way!), using Clang's address sanitizer, I noticed that a buffer overflow was detected upon use of mpc_err_print. Consider the following MWE:

#include "mpc.h"

int main(int argc, char** argv) {
    mpc_parser_t* Foobar = mpc_new("foobar");
    mpca_lang(MPCA_LANG_DEFAULT, "foobar : \"foo\" | \"bar\";", Foobar);
    mpc_result_t r;
    if (mpc_parse("<stdin>", argv[1], Foobar, &r)) {
        mpc_ast_print(r.output);
        mpc_ast_delete(r.output);
    } else {
        mpc_err_print(r.error);
        mpc_err_delete(r.error);
    }
    mpc_cleanup(1, Foobar);
    return 0;
}

when this is compiled (on Ubuntu 14.04 with clang-3.5.1) with

clang -fsanitize=address -std=c99 -Wall test.c mpc.c -lm -o test

I get the following results:

$ ./test foo
string:1:1 'foo'

$ ./test baz
=================================================================
==21438==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000001371ca3 at pc 0x0000004462b0 bp 0x7fffe8244c10 sp 0x7fffe82443d0
READ of size 4 at 0x000001371ca3 thread T0
    #0 0x4462af (/home/<snip>/test+0x4462af)
    #1 0x4471ee (/home/<snip>/test+0x4471ee)
    #2 0x4bcb0c (/home/<snip>/test+0x4bcb0c)
    #3 0x4bc2dc (/home/<snip>/test+0x4bc2dc)
    #4 0x4bb12e (/home/<snip>/test+0x4bb12e)
    #5 0x4baf84 (/home/<snip>/test+0x4baf84)
    #6 0x4ba7e3 (/home/<snip>/test+0x4ba7e3)
    #7 0x7fb57c3bbec4 (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #8 0x4ba36c (/home/<snip>/test+0x4ba36c)

0x000001371ca3 is located 0 bytes to the right of global variable 'char_unescape_buffer' defined in 'mpc.c:125:13' (0x1371ca0) of size 3
SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 ??
Shadow bytes around the buggy address:
  0x000080266340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080266350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080266360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080266370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x000080266380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x000080266390: 00 00 00 00[03]f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x0000802663a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000802663b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000802663c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000802663d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0000802663e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  ASan internal:           fe
==21438==ABORTING

I think I have narrowed the problem down to the call to mpc_err_char_unescape(x->recieved) on line 181 of mpc.c, but unfortunately I'm just a beginner at C and haven't got any further than that!

Expected tokens optimized out in favor of children

I'm not sure if this is correct behavior or not, but it's causing me some execution issues.

My rule expects <inner_block>, which contains (<statement> | <comment>)*. In the case that there is only 1 child in <inner_block>, the <inner_block> is optimized out in favor of the child, which I do not expect to interpret at that level. Here's the input:

[my_func]
   echo (a b c)

...and the grammar as defined by mpca_lang():

qscript        : /^/ (<comment> | <resource>)* /$/ ;
   comment     : '#' /[^\\n]*/ ;
resource       : '[' (<rtype> <rname>) ']' <inner_block> ;
   rtype       : /[*]*/ ;
   rname       : <qstring> ;

inner_block    : (<comment> | <statement>)* ;
   statement   : <function> '(' (<comment> | <parameter> | <block>)* ')'  <seperator> ;
   function    : <qstring> ;
   parameter   : (<statement> | <literal>) ;
      literal  : (<number> | <qstring>) <seperator> ;
   block       : '{' <inner_block> '}' ;
   seperator   : ',' | \"\" ;

qstring        : (<complexstr> | <simplestr>) <qstring>* ;
   simplestr   : /[a-zA-Z0-9_!@#$%^&\\*_+\\-\\.=\\/<>]+/ ;
   complexstr  : (/\"[^\"]*\"/ | /'[^']*'/) ;

number         : (<float> | <int>) ;
   float       : /[-+]?[0-9]+\\.[0-9]+/ ;
   int         : /[-+]?[0-9]+/ ;

Here are the results from mpc_ast_print():

> 
  regex 
  resource|> 
    char:1:1 '['
    rtype|regex 
    rname|qstring|simplestr|regex:1:2 'my_func'
    char:1:9 ']'
    statement|>     <--------------- shouldn't expect a statement in <resource>!
      function|qstring|simplestr|regex:2:4 'echo'
      char:2:9 '('
      literal|> 
        qstring|> 
          simplestr|regex:2:10 'a'
          qstring|> 
            simplestr|regex:2:12 'b'
            qstring|simplestr|regex:2:14 'c'
        seperator|string 
      char:2:15 ')'
      seperator|string 
  regex

I've noticed that when I comment out a certain optimization at mpc.c +2695:
if (a->children_num == 1) { return a; }

...I get the desired results:

(snip)
    char:1:9 ']'
    inner_block|> 
      statement|> 
        function|qstring|simplestr|regex:2:4 'echo'
(etc)

I've found that changing <inner_block> to:
inner_block : (<comment> | <statement>)* \"\";

...forces a token after <inner_block>. That is a nice work-around, but it doesn't feel like a solution.

Edit: Otherwise awesome library, btw :)
Edit 2: Formatting

Integer-based tags to avoid so many string comparisons

Hello.

I've been exploring MPC because I needed it for a project. MPC is quite cool because it's easy to build a parser and work with it, but one thing has been bothering me.

I need to build a parser for a programming language and after that I need to do several operations over it, such as Type Analysis, its actual evaluation, or generation of some other intermediate code from the resulting AST. The thing is, to do this I need to traverse the resulting AST. This means that at each node I need to do string comparisons with the tag field of the AST.

This makes traversing the tree a slow process.

Is there any use case of MPC that allows me to traverse an AST in a faster way, in which I can avoid so many string comparisons? Alternatively, do you have any plans to implement any identification method for nodes other than string tags?

One way to make things faster would be to have an integer value associated with every tag. The question is how to integrate that into MPC in a nice way. How about something like this:

#include <mpc.h>

typedef enum {
    tag_int,
    tag_expr_mul,
    tag_expr_add,
    tag_input
} MyTags;

int main() {
    mpc_parser_t *mpc_int = mpc_new("int", tag_int);
    mpc_parser_t *mpc_expr_mul = mpc_new("expr_mul", tag_expr_mul);
    mpc_parser_t *mpc_expr_add = mpc_new("expr_add", tag_expr_add);
    mpc_parser_t *mpc_input = mpc_new("input", tag_input);

    /* ... */

    return 0;
}

This doesn't look as nice as the current implementation, but while traversing the tree we can have comparisons like:

int traverse(mpc_ast_t *ast) {
    if(ast->tag_i == tag_expr_add) {
        return traverse(ast->children[L_OPR]) + traverse(ast->children[R_OPR]);
    }

    /* ... */
}

What are your thoughts on this?

Regards!

(Notice that the example I've provided is just a quick and dirty example I made using MPC to build a simple integer calculator)
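Until something like the proposal above exists, one workaround is to map each string tag to an integer once and branch on the integer afterwards. The following is only a sketch under stated assumptions: `tag_id_t`, `tag_table` and `tag_to_id` are hypothetical names that are not part of mpc, and the tag strings are modelled on the `expr|number|regex` style shown in the AST dumps in other issues.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical integer identifiers; not part of mpc. */
typedef enum { TAG_UNKNOWN, TAG_NUMBER, TAG_OPERATOR, TAG_EXPR } tag_id_t;

/* Most specific names come first, because a tag such as
   "expr|number|regex" contains several rule names at once. */
static const struct { const char *name; tag_id_t id; } tag_table[] = {
    { "number",   TAG_NUMBER   },
    { "operator", TAG_OPERATOR },
    { "expr",     TAG_EXPR     },
};

/* Map a string tag to an integer id with a linear scan of the table.
   Intended to be called once per node, with the result cached. */
tag_id_t tag_to_id(const char *tag) {
    size_t i;
    assert(tag != NULL);
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
        if (strstr(tag, tag_table[i].name) != NULL) {
            return tag_table[i].id;
        }
    }
    return TAG_UNKNOWN;
}
```

The string comparisons still happen, but only once per node. Later passes such as type analysis or evaluation can then use a `switch` on the cached id.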

'++' in mpc_lang causes segfault

While working through Build Your Own Lisp, I mistakenly put two '+' in the grammar. The program failed with a segfault.

The problem was easy to spot, but a simple error message would be preferred.

Hope to get code location from mpc_ast_t

Hello. I'm writing code and a blog post that implement a small scripting language using mpc.
I want to implement stack traces in the next blog entry. Is there a way to get the code location?

parsing decimal numbers

Hello,

I'm trying to parse decimal numbers using this regexp:

mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+(.[0-9]*)?/ ;
"
This works, but I always have to specify a decimal point.
lispy> list 1.0 2.0 3.0 4.0

regex
expr|symbol|string:1:1 'list'
expr|number|regex:1:6 '1.0'
expr|number|regex:1:10 '2.0'
expr|number|regex:1:14 '3.0'
expr|number|regex:1:18 '4.0'
regex
{1.000000 2.000000 3.000000 4.000000}

but if I try
lispy> list 1 2 3 4

regex
expr|symbol|string:1:1 'list'
expr|number|regex:1:6 '1 2'
expr|number|regex:1:10 '3 4'
regex
{1.000000 3.000000}

so it seems the number is not parsed correctly.
I'm sure it's my fault, but I cannot understand the error in my regexp.

thanks
Fausto
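A likely cause is the unescaped `.`, which in most regex dialects matches any character, including the space between `1` and `2`. The sketch below illustrates the difference using POSIX extended regular expressions from `<regex.h>`, not mpc's own regex engine, so it shows the general rule rather than mpc's exact behaviour. `matches_whole` is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches pattern (POSIX ERE), else 0. */
int matches_whole(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int n, ok;
    assert(pattern != NULL && text != NULL);
    /* Anchor the pattern so that partial matches do not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored) return 0;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, the pattern `-?[0-9]+(.[0-9]*)?` accepts the text `1 2` as a single match, while `-?[0-9]+(\.[0-9]*)?` rejects it and still accepts `1.0`. In a C string literal the backslash itself has to be doubled, giving `\\.`.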

Code crash in mpc_parse

Hello! I am trying to write a parser for a simple scripting language. I completed my grammar and tried to run it, but mpc_parse runs for about 25 seconds and then crashes for no apparent reason.

The error I am getting is:

The biggest problem is that the error is not always in the same spot. Sometimes it is in mpc_error and sometimes in mpc_delete. It seems random.

I will give you part of my code, and I hope you can tell me whether the problem is in my code or whether I should start debugging mpc!

My code:

int main()
{
    mpc_parser_t* Int = mpc_new("int");
    mpc_parser_t* Char = mpc_new("char");
    mpc_parser_t* String = mpc_new("string");
    mpc_parser_t* Id = mpc_new("id");
    mpc_parser_t* Type = mpc_new("type");
    mpc_parser_t* Formal = mpc_new("formal");
    mpc_parser_t* Header = mpc_new("header");
    mpc_parser_t* FuncDecl = mpc_new("funcdecl");
    mpc_parser_t* VarDef = mpc_new("vardef");
    mpc_parser_t* Expr = mpc_new("expr");
    mpc_parser_t* Call = mpc_new("call");
    mpc_parser_t* Atom = mpc_new("atom");
    mpc_parser_t* Simple = mpc_new("simple");
    mpc_parser_t* SimpleList = mpc_new("simplelist");
    mpc_parser_t* Stmt = mpc_new("stmt");
    mpc_parser_t* FuncDef = mpc_new("funcdef");
    mpc_parser_t* Program = mpc_new("program");

    // Define them with the following Language 
    mpca_lang(MPCA_LANG_DEFAULT,
    "                                                                                        \
    int        : /-?[0-9]+/                                                                ; \
    char       : /'[a-zA-Z0-9!@#$%^&*()\\_+-,.\\/<>?;'|\"`~]'/                             ; \
    string     : /\"(\\\\.|[^\"])*\"/                                                      ; \
    id         : /[a-zA-Z][a-zA-Z0-9_-]*/                                                  ; \
    type       : \"int\" | \"bool\" | \"char\" | <type> '[' ']' | \"list\" '[' <type> ']'  ; \
    formal     : (\"ref\")? <type> <id> (',' <id>)*                                        ; \
    header     : <type>? <id> '(' (<formal> (';' <formal>)*)? ')'                          ; \
    funcdecl   : \"decl\" <header>                                                         ; \
    vardef     : <type> <id> (',' <id>)*                                                   ; \
    expr       : <atom> | <int> | <char> | '(' <expr> ')'                                    \
                 | ('+' | '-') <expr> | <expr> ('+' | '-' | '*' | '/' | \"mod\") <expr>      \
                 | <expr> ('=' | \"<>\" | '<' | '>' | \"<=\" | \">=\") <expr>                \
                 | \"true\" | \"false\" | \"not\" <expr> | <expr> (\"and\" | \"or\") <expr>  \
                 | \"new\" <type> '[' <expr> ']' | \"nil\" | \"nil?\" '(' <expr> ')'         \
                 | <expr> '#' <expr> | \"head\" '(' <expr> ')' | \"tail\" '(' <expr> ')'   ; \
    call       : <id> '(' (<expr> (',' <expr>)*)? ')'                                      ; \
    atom       : <id> | <string> | <atom> '[' <expr> ']' | <call>                          ; \
    simple     : \"skip\" | <atom> \":=\" <expr> | <call>                                  ; \
    simplelist : <simple> (',' <simple>)*                                                  ; \
    stmt       : <simple> | \"exit\" | \"return\" <expr>                                     \
                 | \"if\" <expr> ':' <stmt>+ (\"elif\" <expr> ':' <stmt>+)*                  \
                     (\"else\" ':' <stmt>+)? \"end\"                                         \
                 | \"for\" <simplelist> ';' <expr> ';' <simplelist> ':' <stmt>+ \"end\"    ; \
    funcdef    : \"def\" <header> ':' (<funcdef> | <funcdecl> | <vardef>)* <stmt>+ \"end\" ; \
    program    : /^/ <funcdef> /$/                                                         ; \
    ",
    Int, Char, String, Id, Type, Formal, Header, FuncDecl, VarDef, Expr,
    Call, Atom, Simple, SimpleList, Stmt, FuncDef, Program);

    mpc_result_t r;
    char* input = "def hey () : return 1 end";
    if(mpc_parse("input", input, Program, &r))
    {
        mpc_ast_print((mpc_ast_t*)r.output);
        mpc_ast_delete((mpc_ast_t*)r.output);
    }
    else
    {
        mpc_err_print(r.error);
        mpc_err_delete(r.error);
    }

    PAUSE("Press any key to continue . . .");

    // Undefine and Delete our Parsers 
    mpc_cleanup(17, Int, Char, String, Id, Type, Formal, Header, FuncDecl, VarDef, Expr,
                Call, Atom, Simple, SimpleList, Stmt, FuncDef, Program);
    return 0;
}

The grammar I used, in a less obfuscated version (it's hard to read in code):

I am really sorry for the long post. I just thought it would be better to post my error before spending hours and hours trying to fix it. Maybe you have seen it before and can pinpoint the problem immediately.

Thanks for your hard work. Mpc is a great piece of code and very helpful.

So much thanks for this project

Yeah, I was very happy to see this project a few minutes ago, thanks for you work.

Useful code.

Surprising behaviour from mpc_count parser

I think I have stumbled upon a possible violation of POLA here..

This works as expected:

 mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
 int r = mpc_parse("test", "046", p, &mr);

If there's any non-digit input past count chars (3 in my example), the parser also succeeds, as expected:

 mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
 int r = mpc_parse("test", "046aa", p, &mr);

But, and this is the behaviour that surprised me, if there are any digits beyond count, the parser fails, like so:

 mpc_parser_t *p = mpc_count(3, mpcf_strfold, mpc_digit(), free);
 int r = mpc_parse("test", "04632", p, &mr);

$ test:1:6: error: expected 3 of digit at end of input

Is this on purpose? I can get around this issue by creating an mpc_many1 parser and splitting the digits externally, I just want to make sure that I'm not missing something here.

Oh, and thanks for this awesome library!

Infinite recursion on specific grammar rules

First off, thanks for the amazing little library!
I have been working with a friend on an academic project, in which we had to implement an interpreter for a tiny given language. This language has to support compound array and list data types. This means we want to be able to define variables like this, for example:

list[int][] x (an array of integer lists)

The grammar that we defined at the start was this:

singletype   : "int" | "bool" | "char" ;
array        : <type> "[]"    ;
list         : "list" '[' <type> ']'   ;
type         : <singletype> | <list> | <array> ;

But mpc keeps falling into an infinite loop and we have no idea why...
For now we have removed the complex array support, leaving only arrays of primitive types at any depth, using this:

singletype   : "int" | "bool" | "char" ;
array        : <singletype> ("[]")+    ;
list         : "list" '[' <type> ']'   ;
type         : <list> | <array> | <singletype> ;

Still, we cannot tell whether the problem is in the library or in our grammar definition.
Any ideas are appreciated.
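One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is left recursion: in the first grammar, `type` can expand to `array`, which immediately expands to `type` again without consuming any input. Recursive-descent parsers generally cannot terminate on such a rule. The usual fix is to parse the base type first and then loop over the `[]` suffixes, which in grammar form would look roughly like `type : (<singletype> | <list>) ("[]")* ;`. The hand-written sketch below shows the same idea in plain C. It does not use mpc, ignores whitespace, and `eat`, `parse_type` and `is_type` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Consume the literal tok at *s; returns 1 and advances on success. */
static int eat(const char **s, const char *tok) {
    size_t n = strlen(tok);
    if (strncmp(*s, tok, n) != 0) return 0;
    *s += n;
    return 1;
}

/* type : base ("[]")* ;
   base : "int" | "bool" | "char" | "list" '[' type ']' ;
   Every recursive call happens only after "list[" has been consumed,
   so the recursion always makes progress. */
static int parse_type(const char **s) {
    if (eat(s, "list")) {
        if (!eat(s, "[") || !parse_type(s) || !eat(s, "]")) return 0;
    } else if (!eat(s, "int") && !eat(s, "bool") && !eat(s, "char")) {
        return 0;
    }
    while (eat(s, "[]")) {
        /* each iteration adds one array dimension */
    }
    return 1;
}

/* Returns 1 if text is exactly one type expression, else 0. */
int is_type(const char *text) {
    const char *p = text;
    assert(text != NULL);
    return parse_type(&p) && *p == '\0';
}
```

This accepts the motivating example `list[int][]` as well as nested forms such as `list[list[char][]]`, which the reduced grammar above cannot express for arrays of lists.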

Parser reuse

So this issue is a little out of sorts, I guess. I've tried to modify mpc to parse <base> { <factor> } instead of <base> { <digits> }, as I would like to specify n-separated lists of x by writing x{n} in the grammar.

To begin, I modified mpcaf_grammar_repeat() like so:

static mpc_val_t *mpcaf_grammar_repeat(int n, mpc_val_t **xs) {
  (void) n;
  if (xs[1] == NULL) { return xs[0]; }
  if (strcmp(xs[1], "*") == 0) { free(xs[1]); return mpca_many(xs[0]); }
  if (strcmp(xs[1], "+") == 0) { free(xs[1]); return mpca_many1(xs[0]); }
  if (strcmp(xs[1], "?") == 0) { free(xs[1]); return mpca_maybe(xs[0]); }
  if (strcmp(xs[1], "!") == 0) { free(xs[1]); return mpca_not(xs[0]); }
  return mpca_and(2, xs[0], mpca_many(mpca_and(2, xs[1], xs[0])));
}

And modified mpca_lang_st to define Factor as:

mpc_define(Factor, mpc_and(2, mpcaf_grammar_repeat,
    Base,
      mpc_or(6,
        mpc_sym("*"),
        mpc_sym("+"),
        mpc_sym("?"),
        mpc_sym("!"),
        mpc_tok_brackets(Factor, free),
        mpc_pass()),
    mpc_soft_delete
  ));

Sample grammar and the result of mpc_print()'ing the generated parser:

list : "text"{","} ; ->
(<:> ((<S> ("text" whitespace)) ((<S> ("," whitespace)) (<S> ("text" whitespace)))*))

So the above looks good, but my problem is that parsing this new grammar seems to be causing a segfault after a few iterations of mpc_undefine_unretained() (upon accessing the passed in parser):

Looking at the stack trace, it looks like the calls to mpc_undefine_and() match up with the generated parser. The only thing I can think of is that xs[0] in the first snippet is being freed and then accessed again in the later and() parser that combines xs[1] and xs[0]. I'm still not super comfortable with mpc's source so I'm not really sure what to do at this point.

Any ideas?

Regex difficulty with adding decimal support

Hi,
First of all, thanks for your wonderful book!
I am having difficulty adding support for doubles and wondered if you could offer any tips.
I've added a parser for doubles and all the rest of the plumbing, as for number, but can't seem to get past the first stage, defining the regex for the parser:
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+/ ;
doub : /-?[0-9]+.[0-9]+/ ;
symbol : "list" | "head" | "tail"
| "join" | "eval" | "len"
...
I also get the following error message when typing, say, '1.0' into my interpreter:
:1:2: error: expected one of '0123456789', whitespace, '-', one or more of one of '0123456789', "list", "head", "tail", "join", "eval", "len", "cons", "init", '+', '*', '/', '(', '{' or end of input at '.'
thanks in advance,
Jude

Regex group support

I have defined a rule like this:

/(?<comment>[<][*](?(?=[<][*])\g<comment>|.)*?[*][>])/

but it doesn't seem to work. I use this rule to find comments in a program like this:

def hello ():

   <* a comment nananana 

    <* a comment nananana 
   BATMAN!!!! *>

   BATMAN!!!! *>

   puts("Hello world!\n")

   <* a comment nananana 
   BATMAN!!!! *>

end

I was wondering. Does your library support advanced regular expressions like that?

Note: I found the regex I used here: https://regex101.com/r/pP0kG4/3 and I modified it to work on my code. It seems to work on that website, but not in my code.

Am I doing something wrong, or do you just not support this?

Probably bugs

Please check out my gist:
https://gist.github.com/yihuang/0af450e858daf2d99138

Parser hangs on grammar

So, I have a grammar and parser in this file: https://github.com/ikbenlike/Xenon/blob/master/CXenon/src/parser.c. When I try to parse this file:

float:fib(int n){
    if (n == 0){
        return 0;
    }
    if (n == 1){
        return 1;
    }
    return fib(n - 1) + fib(n - 2);
}

int:main() {
    bool no = false;
    int n = 11;
    int i = 1.1;

    str stuff = "stuff"[1];
    int n = fib(stuff[1])[1];
    stuff[1];
    for(a in b){print(stuff);}
    while (stuff[1] == stuff[1]) {
        n = fib(10);
        stuff();
        print(n);
        i = i + 1;
    }
    loop {
        print(n);
    }
    return 0;
}

the parser hangs. When I run it in valgrind and press ctrl+c while it's running, it says this:

==21138== Process terminating with default action of signal 2 (SIGINT)
==21138==    at 0x4C2DBB0: strlen (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21138==    by 0x40231F: mpc_input_terminated (mpc.c:365)
==21138==    by 0x402847: mpc_input_char (mpc.c:487)
==21138==    by 0x402ABB: mpc_input_string (mpc.c:521)
==21138==    by 0x40475C: mpc_parse_run (mpc.c:1026)
==21138==    by 0x404996: mpc_parse_run (mpc.c:1056)
==21138==    by 0x40526D: mpc_parse_run (mpc.c:1206)
==21138==    by 0x4048B1: mpc_parse_run (mpc.c:1041)
==21138==    by 0x40491B: mpc_parse_run (mpc.c:1048)
==21138==    by 0x40526D: mpc_parse_run (mpc.c:1206)
==21138==    by 0x4050F2: mpc_parse_run (mpc.c:1185)
==21138==    by 0x40526D: mpc_parse_run (mpc.c:1206)

The string passed to the parse function is properly terminated with a \0. This issue did not arise in previous versions.

Basic grammar for Maths expressions

Hello,

The below fails when I have basic Math expressions like: 9 + 8

The below passes. The only difference is the position of "term" and "factor" where I have the union operator in the grammar. Could someone explain why?
mpca_lang(MPCA_LANG_DEFAULT,
"
number : /-?[0-9]+/ ;
operator : '*' | '/' ;
oper : '+' | '-' ;
factor : | '(' ')' ;
term : | ;
expr : | ;
lispy : /^/ /$/ ;
",
Number, Operator, Oper,Factor, Term, Expr, Lispy);

"extern C" for C++ linking

Just adding an extern "C" block inside an #ifdef in the header would be nice. I can't think of any downsides, personally.

I tried using mpc from C++, and while I should have been aware of this, I wasn't, so I was clueless for half an hour wondering why the references were undefined.
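For reference, the conventional guard looks like the sketch below. The header name `DEMO_PARSER_H` and the function `demo_digit_value` are made up for illustration and are not part of mpc. The point is only the `#ifdef __cplusplus` wrapper, which tells a C++ compiler to use C linkage for the enclosed declarations so that the linker finds the unmangled symbols.

```c
#include <assert.h>
#include <stddef.h>

/* ---- demo_parser.h (hypothetical header) ---- */
#ifndef DEMO_PARSER_H
#define DEMO_PARSER_H

#ifdef __cplusplus
extern "C" {   /* only seen by C++ compilers */
#endif

/* Returns the value of the first digit in s, or -1 if s does not start with a digit. */
int demo_digit_value(const char *s);

#ifdef __cplusplus
}
#endif

#endif /* DEMO_PARSER_H */

/* ---- demo_parser.c (hypothetical implementation) ---- */
int demo_digit_value(const char *s) {
    assert(s != NULL);
    return (s[0] >= '0' && s[0] <= '9') ? s[0] - '0' : -1;
}
```

A C compiler never defines `__cplusplus`, so the guard has no effect on C builds. Until the header carries such a guard, a C++ caller can get the same result by wrapping the include itself in `extern "C" { ... }`.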

[osx] segfault on certain decimal regexes

Working through http://www.buildyourownlisp.com/chapter6_parsing I found that replacing the decimal-number rule you suggested in #12 (comment) with the following causes a segfault when the REPL is given any input that causes mpc_parse("<stdin>", input, Lispy, &r) to evaluate as true:

diff --git a/repl.c b/repl.c
index 5aa2cf9..b2f26fb 100644
--- a/repl.c
+++ b/repl.c
@@ -12,7 +12,7 @@ int main (int argc, char** argv) {
   mpc_parser_t* Lispy    = mpc_new("lispy");

   mpca_lang(MPCA_LANG_DEFAULT, "                 \
-number  : /-?[0-9]+(\\.?[0-9]*)?/ ;              \
+number  : /-?[0-9]*\\.?[0-9]+/ ;                 \
 operator: '+' | '-' | '*' | '/';                 \
 expr    : <number> | '(' <operator> <expr>+ ')' ;\
 lispy   : /^/ <operator> <expr>+ /$/;            \

(Full code of repl.c: https://gist.github.com/dunn/1176c4ed7b2ba6c5b68e)

Remove the second \ in front of the decimal and that's a valid regular expression in most contexts, so I don't think this is just a silly typo on my part.

OSX: 10.11.1; same result with Clang and GCC:

🌰  clang --version
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

🌰  gcc-5 --version
gcc-5 (Homebrew gcc 5.2.0) 5.2.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

. is interpreted as anything instead of itself

I've written an expression like so "decimal : /-?[0-9]+.[0-9]+/;", however it will treat 4 4 and 4.0 both as 4.0.

Valgrind Errors

Hi!

Just wanted to say I really like the library, but I had two issues with it. There appear to be
a few memory-related problems. I solved this in my project, which uses mpc, with a wrapper
around realloc, replacing all calls to realloc with it:

static void *reallocate(void *p, size_t n)
{
    /** @bug something is not allocating enough memory, the +1 fixes the Valgrind issues */
    errno = 0;
    void *r = realloc(p, n + 1);
    if (!r)
        fatal("reallocate failed: %s", errno ? strerror(errno) : "unknown reason");
    return r;
}
As a temporary fix.

This problem is reproducible with mpc's "test" program. I've attached the output of valgrind. Presumably one of the reallocs somewhere is missing a +1 or something.

The second, minor issue is that I get the same build error as in #60. Since strtof was introduced in C99, specifying the -ansi flag means strtof is never declared, which is allowed behavior. The fix would be to specify -std=c99 in the Makefile, as C99 functions are used.

Output of uname -a:

Linux dhcppc2 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux

This was on commit 37c12b1.

Thanks!

mpc.txt

Can't run the example give in README.md

I tried creating your example. When I try:
mpc_ast_print(r.output);

I get syntax errors.

I am using a Windows compiler, though. Am I missing something?

Tree traversal iterator

As we started discussing in #48, the idea is to have a simple way to traverse a tree in either pre-order or post-order.

To start we have the following save point:

typedef struct mpc_ast_trav_t {
    mpc_ast_t             *curr_node;
    struct mpc_ast_trav_t *parent;
    int                    curr_child;
    mpc_ast_trav_order_t   order;
} mpc_ast_trav_t;

The order type is the following enum:

typedef enum {
    mpc_ast_trav_order_pre,
    mpc_ast_trav_order_post
} mpc_ast_trav_order_t;

Using the start function:

mpc_ast_trav_t *mpc_ast_traverse_start(mpc_ast_t *ast,
                                       mpc_ast_trav_order_t order);

We get a save point which just keeps information about the current location in the tree. To iterate through it we can use the "next" function:

mpc_ast_t *mpc_ast_traverse_next(mpc_ast_trav_t *trav);

The idea is that we can traverse the whole tree using only the next function. This will be practical in some applications, for example to convert the whole tree to another representation (as discussed in a previous issue, where we wanted to convert from mpc_ast_t to another tree specification).

I've already started implementing these functions, but haven't finished yet.

Note that for now I've only thought about two orders, pre and post. Pre-order will make the "next" function yield nodes "x", "a", "b" in the following tree:

  X
 / \
A   B

While post order would make "next" yield "a", "b", "x". Note also that because the trees are not necessarily binary, we don't get infix, prefix, or postfix.
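To make the proposal concrete, here is a minimal, self-contained sketch of such an iterator. It is not the implementation mentioned above and does not use mpc: `node_t` is a stand-in for `mpc_ast_t` that keeps only the fields needed here, and `trav_t`, `trav_start` and `trav_next` are hypothetical names. One deliberate difference from the proposed signature is that `trav_next` takes a pointer to the save point, so it can replace the top frame as it descends and pops.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for mpc_ast_t with only the fields used here. */
typedef struct node_t {
    const char *tag;
    int children_num;
    struct node_t **children;
} node_t;

typedef enum { TRAV_PRE, TRAV_POST } trav_order_t;

/* One frame per node on the path from the root to the current node. */
typedef struct trav_t {
    node_t *curr_node;
    struct trav_t *parent;
    int curr_child;          /* -1 until the node has been entered */
    trav_order_t order;
} trav_t;

static trav_t *trav_push(node_t *n, trav_t *parent, trav_order_t order) {
    trav_t *t = malloc(sizeof *t);
    if (t == NULL) abort();  /* keep the sketch simple on allocation failure */
    t->curr_node = n;
    t->parent = parent;
    t->curr_child = -1;
    t->order = order;
    return t;
}

trav_t *trav_start(node_t *root, trav_order_t order) {
    return root != NULL ? trav_push(root, NULL, order) : NULL;
}

/* Returns the next node, or NULL when the traversal is finished.
   Frames are freed as they are popped, so nothing is left to clean up
   once NULL has been returned. */
node_t *trav_next(trav_t **trav) {
    assert(trav != NULL);
    while (*trav != NULL) {
        trav_t *top = *trav;
        node_t *n = top->curr_node;
        if (top->curr_child == -1) {            /* first visit */
            top->curr_child = 0;
            if (top->order == TRAV_PRE) return n;
        }
        if (top->curr_child < n->children_num) { /* descend into next child */
            *trav = trav_push(n->children[top->curr_child++], top, top->order);
            continue;
        }
        {                                        /* all children done: pop */
            trav_order_t order = top->order;
            *trav = top->parent;
            free(top);
            if (order == TRAV_POST) return n;
        }
    }
    return NULL;
}
```

A typical loop would be `for (n = trav_next(&t); n != NULL; n = trav_next(&t)) { ... }`. Abandoning a traversal early would need an extra function that frees the remaining frames, which this sketch leaves out.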

`malloc` casts & `realloc` checks?

I am currently using your (awesome) project for a doodle of mine. While browsing through it and trying my hand at rewriting it (for my own understanding and to fit my personal coding style), I saw that you do not cast the results of malloc and the like to the appropriate pointer types, which does not play well with the picky extensions I normally use. There are a few more occurrences of missing casts (namely when using sizeof). Should I create a pull request for this, or do you deem casts too ugly?

Also, you are not checking the return value of realloc. Fair enough, but I rewrote that to at least check it in the mpc_stack_parsers_reserve_* functions, where failing to allocate more memory could happen. Should I create a pull request for that, too?
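The usual pattern for a checked realloc is to assign the result to a temporary first, so that the original block is neither lost nor leaked when the call fails. The sketch below is generic C and not taken from mpc. `grow_ints` is a hypothetical helper, and the doubling growth policy is only an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Ensure *buf can hold at least `need` ints.
   Returns 0 on success and -1 on failure; on failure *buf and *cap
   are left untouched and the old block remains valid. */
int grow_ints(int **buf, size_t *cap, size_t need) {
    size_t new_cap;
    int *tmp;
    assert(buf != NULL && cap != NULL);
    if (need <= *cap) return 0;
    new_cap = (*cap == 0) ? 8 : *cap;
    while (new_cap < need) {
        if (new_cap > SIZE_MAX / 2) return -1;        /* doubling would overflow */
        new_cap *= 2;
    }
    if (new_cap > SIZE_MAX / sizeof **buf) return -1; /* byte count would overflow */
    tmp = realloc(*buf, new_cap * sizeof **buf);      /* no cast needed in C */
    if (tmp == NULL) return -1;
    *buf = tmp;
    *cap = new_cap;
    return 0;
}
```

On the cast question: the implicit conversion from `void *` is valid C, so a cast is only required when the same source must also compile as C++.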

problems with parsing

So, I'm trying to parse a file. However, I'm having some trouble with my grammar. This is my grammar:

" ident     : /[a-zA-Z_][a-zA-Z0-9_]*/ ;                                                                    \n"
" number    : /-?[0-9]+(\\.[0-9]*)?/ ;                                                                      \n"
" character : /'.' | \".\"/ ;                                                                               \n"
" string    : /\"(\\\\.|[^\"])*\"/ ;                                                                        \n"
" boolean   : /true | false/ ;                                                                              \n"
"                                                                                                           \n"
" print     : /\"print\" (<ident> | <string>)/ ;                                                            \n"
" factor    : '(' <lexp> ')'                                                                                \n"
"           | <number>                                                                                      \n"
"           | <character>                                                                                   \n"
"           | <string>                                                                                      \n"
"           | <ident> '(' <lexp>? (',' <lexp>)* ')'                                                         \n"
"           | <ident> ;                                                                                     \n"
"                                                                                                           \n"
" term      : <factor> (('*' | '/' | '%') <factor>)* ;                                                      \n"
" lexp      : <term> <index>* (('+' | '-') <term> <index>* )* ;                                             \n"
"                                                                                                           \n"
" index     : '[' <number> ']' ;                                                                            \n"
" stmt      : '{' <stmt>* '}'                                                                               \n"
"           | \"while\" '(' <exp> <index>* ')' <stmt>                                                       \n"
"           | \"for\" '(' <exp> <index>* ')' <stmt>                                                         \n"
"           | \"if\"    '(' <exp> ')' <stmt>                                                                \n"
"           | \"loop\" <stmt>                                                                               \n"
"           | <ident> '=' <lexp> <index>* ';'                                                               \n"
"           | \"print\" '(' <lexp>? ')' ';'                                                                 \n"
"           | \"return\" <lexp>? ';'                                                                        \n"
"           | <ident> <index>* ';'                                                                          \n"
"           | <ident> '(' <ident>? (',' <ident>)* ')' <index>* ';';                                         \n"
"                                                                                                           \n"
" exp       : <lexp> '>' <lexp>                                                                             \n"
"           | <lexp> '<' <lexp>                                                                             \n"
"           | <lexp> \">=\" <lexp>                                                                          \n"
"           | <lexp> \"<=\" <lexp>                                                                          \n"
"           | <lexp> \"!=\" <lexp>                                                                          \n"
"           | <lexp> \"==\" <lexp>                                                                          \n"
"           | <lexp> \"in\" <lexp> ;                                                                        \n"
"                                                                                                           \n"
" typeident : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ;                              \n"
" procedure : (\"int\" | \"char\" | \"str\" | \"bool\" | \"float\" ) ':' <ident> '(' <args> ')' <body> ;    \n"
" decls     : <typeident> <ident> '=' ( <number> | <character> | <string> | <boolean> | <term> ) <index>* ';' ;  \n"
" args      : <typeident>? (',' <typeident>)* ;                                                             \n"
" body      : '{' <decls> <stmt>* '}' ;                                                                     \n"
" use       : (\"use\" /[a-zA-Z_\\/\\.][a-zA-Z0-9_\\/\\.]*/)* ;                                             \n"
" xenon     : /^/ <use> <decls> <procedure>* /$/ ;                                                          \n"

I'm trying to parse this file:

int:stuff(str string_to_print){
    int a = 1;
    float b = 1.1;
    str c = "stuff";
}

But, it's giving this error:

print.pxe:1:4: error: expected one of 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_' at ':'

I have a version of the parser here, but that one has a problem: it doesn't make a node the child of another node, which it should for everything else to work correctly.

Endless loop in `mpc_parse_input()`

Hi ...
While coding along with the buildyourownlisp.com tutorial, I stumbled upon an endless loop.
The code is the following:

#include <stdlib.h>
#include <stdio.h>

/* The parser combinator lib */
#include "mpc.h"

#ifdef _WIN32
#include <string.h>

/* Declare a buffer for user input size 2048 */
static char buffer[2048];

/* Fake readline function */
char * readline(char* prompt)
{
    fputs(prompt, stdout);
    fgets(buffer, 2048, stdin);
    char* cpy = malloc(strlen(buffer) + 1);
    strcpy(cpy, buffer);
    cpy[strlen(cpy) - 1] = '\0';
    return cpy;
}

/* Fake add_history function */
void add_history(char* unused) {}

/* Otherwise include the editline headers */
#else
#include <readline/readline.h>
#include <readline/history.h>
#endif

int main(int argc, char ** argv)
{
    /* Create some parsers */
    mpc_parser_t* Number   = mpc_new("number");
    mpc_parser_t* Operator = mpc_new("operator");
    mpc_parser_t* Expr     = mpc_new("expr");
    mpc_parser_t* Lispy    = mpc_new("lispy");

    /* Define them with the following Language */
    mpca_lang(MPCA_LANG_DEFAULT,
              "                                                    \
                number   : /-?[0-9]?/ ;                            \
                operator : '+' | '-' | '*' | '/' ;                 \
                expr     : <number> | '(' <operator> <expr>+ ')' ; \
                lispy    : /^/ <operator> <expr>+ /$/ ;            \
              ",
              Number, Operator, Expr, Lispy);

    /* Print version and exit information */
    puts("Lispy Version 0.0.0.0.1");
    puts("Press Ctrl+c to Exit.\n");

    while (1)
    {
        char* input = readline("lispy> ");
        add_history(input);

        /* Attempt to parse the user input */
        mpc_result_t r;
        if (mpc_parse("<stdin>", input, Lispy, &r))
        {
            /* On success print the AST */
            mpc_ast_print(r.output);
            mpc_ast_delete(r.output);
        }
        else
        {
            /* Otherwise print the error */
            mpc_err_print(r.error);
            mpc_err_delete(r.error);
        }

        free(input);
    }

    /* Clean up the parser */
    mpc_cleanup(4, Number, Operator, Expr, Lispy);

    return 0;
}

The endless loop happens in mpc.c in the function mpc_parse_input in the while loop on line 887 in the case MPC_TYPE_MANY1. I put a printf into the while() body, printing j and results_slots, and those numbers increase steadily. (Tested under Cygwin and OSX 10.10).
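A plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that `/-?[0-9]?/` can match the empty string: both the sign and the digit are optional. A one-or-more repetition over a parser that succeeds without consuming input never runs out of matches. The plain C sketch below does not use mpc. `match_number` and `many1_guarded` are hypothetical names, and the guard shown is one common way for a repetition loop to protect itself, not necessarily how mpc handles it.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of the pattern -?[0-9]? applied at s.
   It always "succeeds"; the return value is the number of characters
   consumed, which may be 0. */
static size_t match_number(const char *s) {
    size_t n = 0;
    if (s[n] == '-') n++;
    if (s[n] >= '0' && s[n] <= '9') n++;
    return n;
}

/* Repeat match_number and count the matches, stopping as soon as a
   match consumes nothing. Without the n == 0 check the loop would
   spin forever at the first position where neither '-' nor a digit
   is present, including the end of the input. */
size_t many1_guarded(const char *s) {
    size_t count = 0;
    assert(s != NULL);
    for (;;) {
        size_t n = match_number(s);
        if (n == 0) break;
        s += n;
        count++;
    }
    return count;
}
```

In the grammar itself, requiring at least one digit, as in `/-?[0-9]+/`, removes the empty match.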

Hang if recursion

I'm writing code using mpc.

#define STRUCTURE \
"                                                                       \n" \
"number    : /-?[0-9]+(\\.[0-9]*)?(e[0-9]+)?/ ;                         \n" \
"factor    : '(' <lexp> ')'                                             \n" \
"          | <number>                                                   \n" \
"          | <string>                                                   \n" \
"          | <array>                                                    \n" \
"          | <hash>                                                     \n" \
"          | <lambda>                                                   \n" \
"          | <call>                                                     \n" \
"          | <item>                                                     \n" \
"          | <ident> ;                                                  \n" \
"string    : /\"[^\"]*\"/ ;                                             \n" \
"array     : '[' <lexp>? (',' <lexp>)* ']' ;                            \n" \
"pair      : <string> ':' <lexp> ;                                      \n" \
"hash      : '{' <pair>? (',' <pair>)* '}' ;                            \n" \
"ident     : /[a-zA-Z][a-zA-Z0-9_]*/ ;                                  \n" \
"                                                                       \n" \
"term      : <factor> (('*' | '/' | '%') <factor>)* ;                   \n" \
"lexp      : <term> (('+' | '-') <term>)* ;                             \n" \
"let_v     : <ident> '=' <lexp> ';' ;                                   \n" \
"item      : <factor> '[' <lexp> ']' ;                                  \n" \
"let_a     : <item> '=' <lexp> ';' ;                                    \n" \
"var       : \"var\" <ident> '=' <lexp> ';' ;                           \n" \
"vararg    : \"...\" ;                                                  \n" \
"stmts     : <stmt>* ;                                                  \n" \
"                                                                       \n" \
"lambda    : \"func\"                                                     " \
"        '(' <ident>? (<vararg> | (',' <ident>)*) ')' '{' <stmts> '}' ; \n" \
"func      : \"func\" <ident>                                             " \
"        '(' <ident>? (<vararg> | (',' <ident>)*) ')' '{' <stmts> '}' ; \n" \
"                                                                       \n" \
"call      : <ident> '(' <lexp>? (',' <lexp>)* ')' ;                    \n" \
"return    : \"return\" <lexp> ';' ;                                    \n" \
"comment   : /#[^\n]*/ ;                                                \n" \
"eof       : /$/ ;                                                      \n" \
"stmt      : (<let_v> | <let_a> | <var> | (<lexp> ';')                    " \
"            | <func> | <return> | <comment>) ;                         \n" \
"program   : <stmts> <eof> ;                                            \n"

It seems that the <item> rule makes the parser hang.
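A possible explanation, offered as an assumption rather than a confirmed diagnosis: factor lists <item> as an alternative and item begins with <factor>, so the grammar is left-recursive, and a recursive-descent style parser can re-enter the same rule without consuming any input. A common workaround is to parse a base expression once and then loop over the '[' ... ']' suffixes. The sketch below shows that shape in plain C without using mpc; parse_primary and parse_postfix are hypothetical names, and the mini-grammar only handles digits and index suffixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical mini-grammar, written without left recursion:
 *   primary : digits ;
 *   postfix : primary ('[' postfix ']')* ;
 */

/* Consume one or more digits; return 0 on success, -1 otherwise. */
int parse_primary(const char *s, size_t *pos) {
    size_t start = *pos;
    while (isdigit((unsigned char)s[*pos])) { (*pos)++; }
    return (*pos > start) ? 0 : -1;
}

/* Return the number of top-level index suffixes, or -1 on a syntax
 * error. *pos is advanced past whatever was consumed. */
int parse_postfix(const char *s, size_t *pos) {
    int suffixes = 0;
    assert(s != NULL && pos != NULL);

    /* The base case consumes input first, so the rule can never call
     * itself again at the same position. */
    if (parse_primary(s, pos) != 0) { return -1; }

    /* Each iteration consumes a '[', so the loop terminates. */
    while (s[*pos] == '[') {
        (*pos)++;
        if (parse_postfix(s, pos) < 0) { return -1; }
        if (s[*pos] != ']') { return -1; }
        (*pos)++;
        suffixes++;
    }
    return suffixes;
}
```

In grammar terms this corresponds to making item start with a rule that cannot itself expand to item, followed by one or more bracketed suffixes. Whether that restructuring fits the rest of the grammar above is untested.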

Ignoring contents of lines that aren't recognized

How would you ignore content that isn't recognized by a rule?
Supposing I have something like this:
string : /"(.|[^"])*"/ ;
lang : /^/ /$/

This should recognize quoted strings, but it fails on anything else. How should the language be defined so that it collects or discards anything that isn't a string?
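One general approach, independent of mpc, is to add a catch-all alternative that consumes whatever does not start a string, so the top-level rule becomes a repetition of "string or other". In an mpc grammar this might be an extra rule matching runs of non-quote characters, though that is an untested assumption. The plain C sketch below illustrates the idea by hand; count_quoted is a hypothetical helper and does not use the mpc API.

```c
#include <assert.h>
#include <stddef.h>

/* Count double-quoted strings in `s`, discarding everything else.
 * Mirrors a hypothetical grammar of the form (string | other)*, where
 * `other` is any character that does not start a string. */
size_t count_quoted(const char *s) {
    size_t count = 0;
    assert(s != NULL);

    while (*s != '\0') {
        if (*s == '"') {
            const char *p = s + 1;
            /* Scan forward to the closing quote, if there is one. */
            while (*p != '\0' && *p != '"') { p++; }
            if (*p == '"') {
                count++;        /* closed string: collect it */
                s = p + 1;
                continue;
            }
            break;              /* unterminated quote: stop scanning */
        }
        s++;                    /* "other": skip one character */
    }
    return count;
}
```

The same structure can collect the skipped text instead of discarding it by recording the start and end of each skipped run.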

*_t naming convention is reserved for POSIX types

I do not know whether POSIX compliance matters to you, but POSIX reserves type names ending in _t for its own use (you can find a complete list of all reserved names here).

I could make this a pull request, but I figured that it is a single substitution command, and you might prefer to change the names completely (without the trailing _t they sound a bit awkward, I guess).

Feel free to close this issue if it is not a problem for you.

Warnings (cyclomatic_complexity > 15 or length > 1000 or parameter_count > 100)

I'm using the Lizard tool (https://github.com/terryyin/lizard) to analyze cyclomatic complexity in mpc. The analysis reported the following warnings:

NLOC    CCN  token  PARAM  length  location
 170     62   1790      4     221  mpc_parse_run
  34     19    241      2      46  mpc_undefine_unretained
  60     21    631      1      74  mpc_copy
  21     17    222      1      21  mpc_re_escape_char
  95     30    863      2     113  mpc_print_unretained
  28     17    300      2      37  mpcf_fold_ast
  28     16    379      2      38  mpc_nodecount_unretained
 137     58   2304      2     166  mpc_optimise_unretained

How can I reduce the risk of future defects in these functions? Please advise. Thanks.
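One general way to lower the cyclomatic complexity of a function that branches once per input value is to move the cases into a lookup table, so the function body becomes a single loop with a single comparison. The sketch below is a hypothetical illustration of that refactoring and is not taken from mpc's source; the table contents and the name unescape_char are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical escape table: maps the character after a backslash to
 * the character it denotes. Not taken from mpc's source. */
struct escape_entry { char code; char value; };

static const struct escape_entry escape_table[] = {
    { 'n', '\n' }, { 't', '\t' }, { 'r', '\r' },
    { '0', '\0' }, { '\\', '\\' },
};

/* Look up an escape code. Returns 1 and stores the result in *out when
 * the code is known, 0 otherwise (leaving *out untouched). One loop
 * and one comparison replace a separate branch per escape character. */
int unescape_char(char code, char *out) {
    size_t i;
    assert(out != NULL);

    for (i = 0; i < sizeof escape_table / sizeof escape_table[0]; i++) {
        if (escape_table[i].code == code) {
            *out = escape_table[i].value;
            return 1;
        }
    }
    return 0;
}
```

Adding a new escape then means adding a table row rather than a new branch. Whether this pattern suits the functions listed above depends on how their branches are structured.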
