axilmar / parserlib

A C++ recursive-descent generic parsing library that supports left recursion.

License: Apache License 2.0

C++ 100.00%

cpp17 parser parser-library

parserlib's People

compilerteaching midwinter1993 neoliang toreinar mingewang yuanmin2015 asmwarrior juancaf9312 cflaviu

parserlib's Issues

A match id is always being a std::string?

Hi, when I read the home page:

const auto grammar = (-terminalSet('+', '-') >> terminalRange('0', '9')) == std::string("int");
std::string input = "123";
ParseContext<> pc(input);
const bool ok = grammar(pc);
for (const auto& match : pc.matches()) {
    if (match.id() == "int") {
        const auto parsedString = match.content();
        //process int
    }
}

There is a condition check:

match.id() == "int"

So, here comes my question: is a match id always a std::string?

Thanks.

BTW: I'm also interested in developing a similar PEG-like parser in C++; see some of my ideas here:

joemalle/limn#10 (comment)

typo in homepage markdown file

providing extra information regarding the source, for example line and oclumn numbers.

It should be "column".

your repo named cap get deleted?

I have a fork of your project named cap:

asmwarrior/cap: Context-aware programming language. Research project.

But it looks like you have deleted your original repo?

Add a license

I think this is looking great, but a license would be helpful to those who might want to use your code. :)

Question: How to use only syntax rules

Is it possible to create & use a grammar that will parse a vector of already parsed tokens? It means the lexical parsing of terminals is not necessary. Only syntax rules will be used, something like:

enum class token { aa, bb, cc };
auto rule = token::aa >> token::bb >> -token::cc;
std::vector<token> tokens{token::aa, token::bb, token::cc};
auto ok = parse(tokens.cbegin(), tokens.cend(), rule);
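To make the question concrete, here is a minimal, self-contained sketch of what "syntax rules only" could look like over a vector of tokens. It does not use parserlib at all: `Parser`, `tok`, the two operators and `parse` are hypothetical helpers written for illustration, and whether parserlib offers an equivalent is exactly what this issue asks.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum class token { aa, bb, cc };

using Iter = std::vector<token>::const_iterator;

// Hypothetical parser type (not parserlib's): tries to consume tokens from
// [it, end). On success it advances `it` and returns true; on failure it
// leaves `it` where it was.
using Parser = std::function<bool(Iter& it, Iter end)>;

// Matches exactly one token equal to `expected`.
Parser tok(token expected) {
    return [expected](Iter& it, Iter end) {
        if (it == end || *it != expected) return false;
        ++it;
        return true;
    };
}

// Sequence: `a` followed by `b`; restores the position if either part fails.
Parser operator>>(Parser a, Parser b) {
    return [a, b](Iter& it, Iter end) {
        const Iter start = it;
        if (a(it, end) && b(it, end)) return true;
        it = start;
        return false;
    };
}

// Optional: always succeeds, consuming input only when `a` matches.
Parser operator-(Parser a) {
    return [a](Iter& it, Iter end) {
        a(it, end);
        return true;
    };
}

// Succeeds only if `rule` consumes the whole token range.
bool parse(Iter begin, Iter end, const Parser& rule) {
    Iter it = begin;
    return rule(it, end) && it == end;
}
```

With these helpers the rule from the question would be written as `tok(token::aa) >> tok(token::bb) >> -tok(token::cc)`; the explicit `tok(...)` wrapper is needed in this sketch because C++ does not allow overloading operators on two plain enum operands without defining them for the enum itself.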

homepage document improvement suggestion

First, in the last section: https://github.com/axilmar/parserlib#resuming-from-errors

I suggest adding a real input string, so that the "resume from errors" behavior is clearer.

Second, in the first section: https://github.com/axilmar/parserlib#introduction
Especially in the code:

extern Rule<> add;

const auto val = +terminalRange('0', '9');

const auto num = val
               | '(' >> add >> ')';

Rule<> mul = mul >> '*' >> num
           | mul >> '/' >> num
           | num;

Rule<> add = add >> '+' >> mul
           | add >> '-' >> mul
           | mul;

I think you need to explain why extern Rule<> add; is needed, perhaps with a link to https://github.com/axilmar/parserlib#non-left-recursion, because that section says //forward declaration of recursive rule in a comment.

BTW: looking at the code above, why doesn't mul need a forward declaration? Is it because the definition and the reference are in the same statement?
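The C++ rule behind this question can be shown without parserlib. A variable's name is already in scope inside its own initializer, so a definition may mention itself, but a name used by an earlier statement has to be declared first. The sketch below uses a hypothetical `Node` type that only stores a pointer; it is a stand-in for illustration and says nothing about how `Rule<>` is implemented.

```cpp
#include <cassert>

// Hypothetical stand-in for a rule object (NOT parserlib's Rule<>).
// It only remembers which other node it refers to.
struct Node {
    const Node* target;
    explicit Node(const Node* t) : target(t) {}
};

extern Node add;   // declaration only: lets `num` name `add` before it is defined
Node num(&add);    // refers to `add`, which is defined further down
Node mul(&mul);    // no forward declaration needed: `mul` is in scope in its own initializer
Node add(&mul);    // the actual definition of `add`
```

Only the address of each object is taken here, which is valid before the object has been constructed. Dropping the `extern Node add;` line makes `Node num(&add);` fail to compile, because `add` is not yet declared at that point.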

skip whitespace for different lexers

I think for some kinds of low-level lexers, whitespace should not be skipped. But when parsing high-level tokens, we should skip whitespace. That's why Boost.Spirit takes a skipper object as a parser input.

Can you provide such an option? Thanks.
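As a rough illustration of the request, the skipping policy can be passed in as a parameter, so the same matching code either skips whitespace or treats it as significant. This is a self-contained sketch, not parserlib code and not Boost.Spirit's interface: `Skipper`, `skip_nothing`, `skip_spaces` and `match_words` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Hypothetical skipper: advances `pos` past input that should be ignored.
using Skipper = void (*)(std::string_view in, std::size_t& pos);

// Low-level policy: whitespace is significant, skip nothing.
void skip_nothing(std::string_view, std::size_t&) {}

// High-level policy: whitespace between tokens is ignored.
void skip_spaces(std::string_view in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) {
        ++pos;
    }
}

// Matches the given words in order, running the skipper before each word
// and once more at the end; succeeds only if the whole input is consumed.
bool match_words(std::string_view in,
                 std::initializer_list<std::string_view> words,
                 Skipper skip) {
    std::size_t pos = 0;
    for (std::string_view word : words) {
        skip(in, pos);
        if (in.compare(pos, word.size(), word) != 0) return false;
        pos += word.size();
    }
    skip(in, pos);
    return pos == in.size();
}
```

With `skip_spaces`, the input `"class Foo"` matches the words `class` and `Foo`; with `skip_nothing` it does not, because the space is then part of the input that must be matched.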

How to handle left recursive parsing?

Hi, I see this:

For recursive grammars, parse expressions must be wrapped into a Rule<> instance.

What does this mean?

Can you give example code showing a correct and an incorrect usage?

As I understand it, a PEG-like grammar is just like calling a lot of functions. I know this because I'm one of the developers of this project: joemalle/limn: A tiny parser designed to compile quickly

Thanks.

Wrong processing two continuous terminal

I think parserlib incorrectly processes whitespace between an identifier and a terminal. These are my rules:

#define JAVA_LETTER (range('a', 'z') | range('A', 'Z') | set("$_"))
#define JAVA_DIGIT range('0', '9')
#define JAVA_LETTER_OR_DIGIT (JAVA_LETTER | JAVA_DIGIT)
#define NEWLINE nl(expr("\r\n") | "\n\r" | '\n' | '\r')
#define EVERYTHING_TO(endChar) *(!expr(endChar) >> range(0, 255))

rule whitespace = *(NEWLINE | range(0, 32));
rule identifier = (JAVA_LETTER >> *JAVA_LETTER_OR_DIGIT);
//rule identifier = *JAVA_LETTER;
rule qualifiedName = identifier >> *('.' >> identifier);
rule annotationText = expr('(') >> EVERYTHING_TO(')') >> ')';
rule annotation = expr('@') >> qualifiedName >> annotationText;
rule packageDeclaration = *annotation >> "package" >> qualifiedName >> ';';
rule importDeclaration = expr("import") >> -expr("static") >> qualifiedName >> -expr(".*") >> ';';
rule classOrInterfaceModifier = expr("public") | "protected" | "private" | "static" | "abstract" | "final" | "strictfp";
rule templateBlockText = EVERYTHING_TO(set("<>"));
rule templateBlock = expr('<') >> templateBlockText >> -templateBlock >> '>';
rule classDeclaration = term("class") >> identifier /* -templateBlock >>*/ >> "extends" /* >> identifier >> "implements" >> identifier >> '{' >> '}'*/;
rule typeDeclaration = *classOrInterfaceModifier >> classDeclaration;
rule compilationUnit = /*packageDeclaration >> *importDeclaration >>*/ typeDeclaration;

but I cannot get "identifier" separated from the word "extends" (rule classDeclaration). When I test the string "protected abstract static class ClassName extends", I get an error:

found 1 error:
    line 1, col 51:

This is my test code:

error_list errors;
RootCompilationUnit *root;
string str = "protected abstract static class ClassName extends \n";
input input(str.begin(), str.end());
parse(input, compilationUnit, whitespace, errors, root);

Example calculator does not handle parentheses

The calculator example reports a syntax error on the following inputs:
3*(2+3)
(2+3)
(2)
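For reference, here is what the expected behavior looks like in a small hand-written recursive-descent calculator. It is an independent sketch and not the library's calculator example: `Calc` and `evaluate` are hypothetical names, and the left-recursive rules are written as loops, which keeps the operators left-associative without needing left-recursion support. A parenthesized expression is handled by letting `num` call back into `add`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical hand-written calculator; no whitespace handling.
struct Calc {
    std::string_view in;
    std::size_t pos = 0;

    // Consumes `c` if it is the next character.
    bool eat(char c) {
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    }

    bool at_digit() const {
        return pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]));
    }

    // num <- digit+ | '(' add ')'
    std::optional<double> num() {
        if (eat('(')) {
            auto inner = add();
            if (!inner || !eat(')')) return std::nullopt;
            return inner;
        }
        if (!at_digit()) return std::nullopt;
        double value = 0;
        while (at_digit()) value = value * 10 + (in[pos++] - '0');
        return value;
    }

    // mul <- num (('*' | '/') num)*
    std::optional<double> mul() {
        auto lhs = num();
        while (lhs) {
            const bool times = eat('*');
            if (!times && !eat('/')) break;
            auto rhs = num();
            if (!rhs) return std::nullopt;
            lhs = times ? *lhs * *rhs : *lhs / *rhs;
        }
        return lhs;
    }

    // add <- mul (('+' | '-') mul)*
    std::optional<double> add() {
        auto lhs = mul();
        while (lhs) {
            const bool plus = eat('+');
            if (!plus && !eat('-')) break;
            auto rhs = mul();
            if (!rhs) return std::nullopt;
            lhs = plus ? *lhs + *rhs : *lhs - *rhs;
        }
        return lhs;
    }
};

// Returns the value of `text`, or nullopt on a syntax error or leftover input.
std::optional<double> evaluate(std::string_view text) {
    Calc calc{text};
    auto value = calc.add();
    if (!value || calc.pos != text.size()) return std::nullopt;
    return value;
}
```

In this sketch all three inputs from the report parse: `3*(2+3)` evaluates to 15, `(2+3)` to 5 and `(2)` to 2, while an unbalanced input such as `(2` is rejected.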
