Comments (9)
If you want to consume all the input you should match EOF
at the end.
program: spec EOF;
spec : definition | spec definition ;
from parglare.
For the few simple parsing algorithms I have seen, matching to EOF was something enforced by the algorithm, ie they added your additional rule rather than giving the user this choice.
As such, it's likely useful to explain that EOF
needs to be added in the documentation, as users probably won't expect this. I just checked, and EOF
is only shown in a few examples in the documentation, but not mentioned in any running text.
Anyway, I tried your suggestion, and the parser now gives a parse error on float
:
parglare.exceptions.ParseError: Error at position 4,4 => " real\n *float is a". Expected: EOF or define or eol
In the debug output, it first correctly reducestypelines : typeline ;
as follows:
Current state: 11
Context: real\n *float is a
Tokens expected: EOF or NAMETK or define or eol
Token ahead: <NAMETK(float)>
Reducing by prod '9: typelines = typeline'.
No action defined for 'typelines'.
Unpacking a single subresult.
Action result = type:<class 'list'> value:[[], 'real', '\n']
In the next state it reduces to typedefinition
, which is wrong, imho, since after a typedefinition
, you can only have another type definition (which starts with a "define"
keyword), you can have an empty line (eol
), or you can have EOF
. In particular, you cannot have a NAMETK
.
Current state: 10
Context: real\n *float is a
Tokens expected: EOF or NAMETK or define or eol
Token ahead: <NAMETK(float)>
Reducing by prod '5: typedefinition = typedefheader typelines'.
No action defined for 'typedefinition'.
Result is a list of subresults.
Action result = type:<class 'list'> value:[[[[], '\n'], 'define', 'type', '\n'], [[], 'real', '\n']]
So I fail to understand how it reaches this decision based on NAMETK
look-ahead.
State 10 is as follows:
State 10:typelines
5: typedefinition = typedefheader typelines . {eol, define, EOF, NAMETK}
10: typelines = typelines . typeline {eol, define, EOF, NAMETK}
11: typeline = . empties NAMETK eol {eol, define, EOF, NAMETK}
12: typeline = . empties NAMETK is a NAMETK eol {eol, define, EOF, NAMETK}
13: empties = . {eol, NAMETK}
14: empties = . empties eol {eol, NAMETK}
GOTO:
typeline->15, empties->12
ACTIONS:
define->REDUCE:5, eol->[REDUCE:5,REDUCE:13], EOF->REDUCE:5, NAMETK->[REDUCE:5,REDUCE:13]
The right direction would be production 12, probably first doing production 13 here.
from parglare.
I added a section on special rules EMPTY
and EOF
in the docs.
The problem with your parse is that your grammar is not LR(1). You can see that in state 10 there are two possible reductions for lookahead NAMETK
and eol
. LR parser can't parse that as it must deterministically decide at each step the right operation.
parglare allows for R/R conflict as long as there is only one non-EMPTY reduction and in case of R/R it will prefer non-EMPTY reduction (this should probably be dropped and raised exception on any R/R conflict).
In this case it's not what you want. You must either change your grammar to fit LR(1) constraints or you can just switch to Parser
-> GLRParser
and let GLR investigate all possibilities.
I suggest to externalize the grammar in separate file and use pglr
tool to check the grammar. It will report conflicts and can further visualize automata and GLR parsing trace.
from parglare.
Ok, that explains a lot.
As I use the unix philosophy of 'no news is good news' while working in a terminal, I never realized the grammar was broken, since the tool never complained. I don't have enough parsing background to know required properties of parser states for the various algorithms either.
In my experience, standard LALR(1) tools start throwing S/R and R/R conflicts when you move outside parsing limits. Sometimes they can continue by making some sane-ish choices, but I'd always try to stay within the limits, or at least check the cases for not being harmful.
Since LALR(1) is a 'compressed' form of LR(1), I assumed the LR parser would do the same.
As a suggestion, you could throw warnings or errors about conflicts, and provide an option to say 'I know about the conflicts, go ahead anyway'. to suppress known harmless conflicts.
from parglare.
Sometimes they can continue by making some sane-ish choices
...
As a suggestion, you could throw warnings or errors about conflicts, and provide an option to say 'I know about the conflicts, go ahead anyway'. to suppress known harmless conflicts.
parglare also uses some configurable choices with sane defaults to resolve conflicts, but these are used in table construction so if LR parser ends up with multiple actions for the same lookahead in any of the states that should be reported as error before parsing even begins.
That was the default behaviour in some of the previous versions but since I've been mainly working on GLR lately didn't notice that, after some reworkings, LR parser started accepting grammars with conflicts without any warning. This is definitely a candidate for test coverage.
Since LALR(1) is a 'compressed' form of LR(1), I assumed the LR parser would do the same.
As a matter of fact, parglare do compress tables using (somewhat changed) LALR algorithm. The change introduced should eliminate potential R/R conflicts known in LALR by not compressing states which leads to R/R conflicts.
from parglare.
Please don't feel offended. I know these parser things are a lot more complicated than they look. They would probably be useless if they didn't make a lot of decisions for me.
Unlike you (probably), I see a parser as a tool to parse text, so I can start writing the entire compiler behind it. That is, my goals are different from yours.
My observations are purely about "out of the box" behavior of the tool, after globally reading the documentation (while skipping the parts I think I know or consider irrelevant until later), but at my level of knowledge, I can't see through the text to the underlying system.
In other words, the tool is likely good, but it could be tuned a bit more towards protecting the less experienced user from him/herself.
Your LR compressing system seems like a very clever idea to me.
from parglare.
No problem :) LR/GLR parsers are complex beasts so it requires some understanding about the way they work. This is still work in progress. My goal is to make things as smooth as possible for newcomers but that is not easy to do as many things seems obvious from my point of view.
from parglare.
You might want to try textX which is the tool with a longer history and is based on top-down PEG parsing which is much easier to understand and reason/about, so should give you much better usage experience and less headaches. But, than again it all depends on your particular use-case.
from parglare.
It may help if you think in terms of language concepts only, as that is what the user sees and uses. You get two layers then, one at concepts in the language and how they work together, and one at implementation level, where the concepts must be realized.
I agree the complicated part is hiding the implementation from the language :)
I looked at textX, but it's too powerful for what I need.
from parglare.
Related Issues (20)
- A terminal that doesn't match specific strings HOT 2
- QUESTION: confusing 0.12.0 versions ... HOT 1
- ENHANCEMENT: new parameter for call_actions for not reversing HOT 1
- Parser object singleton or caching. Actions as object members HOT 3
- Error for named matches in an imported grammar with default action HOT 2
- Possible memory leak HOT 4
- Question on AST generation and actions with side effects HOT 1
- 0.13.0 is released
- Support for parenthesized groups HOT 1
- 0.14.0 is released
- 0.15.0 is released
- possible regression in regexp handling HOT 2
- Request to optimize recognizers matching HOT 4
- 0.16.0 is released
- `a_opt: a?;` is not equivalent to `a_opt: a | EMPTY;` and causes an error `First set empty for grammar symbol "a_opt". An infinite recursion on the grammar symbol.` when `a` is a sequence. HOT 4
- advice requested on implementing a language "feature" HOT 1
- Lexical disambiguation not working as expected HOT 1
- Iterating through Forest fails in NodeNonTerm solutions property HOT 3
- Overriding terminal indirectly through multiple module imports fails when using separator repetition HOT 3
- Separate imports of the same file result in multiple "instances" HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from parglare.