Git Product home page Git Product logo

Comments (9)

igordejanovic avatar igordejanovic commented on July 19, 2024

If you want to consume all the input you should match EOF at the end.

program: spec EOF;
spec : definition | spec definition ;

from parglare.

alberth avatar alberth commented on July 19, 2024

For the few simple parsing algorithms I have seen, matching to EOF was something enforced by the algorithm, ie they added your additional rule rather than giving the user this choice.

As such, it's likely useful to explain that EOF needs to be added in the documentation, as users probably won't expect this. I just checked, and EOF is only shown in a few examples in the documentation, but not mentioned in any running text.


Anyway, I tried your suggestion, and the parser now gives a parse error on float:

parglare.exceptions.ParseError: Error at position 4,4 => " real\n    *float is a". Expected: EOF or define or eol

In the debug output, it first correctly reducestypelines : typeline ; as follows:

Current state: 11
	Context:  real\n    *float is a
	Tokens expected: EOF or NAMETK or define or eol
	Token ahead: <NAMETK(float)>
	Reducing by prod '9: typelines = typeline'.
	No action defined  for 'typelines'.
	Unpacking a single subresult.
	Action result = type:<class 'list'> value:[[], 'real', '\n']

In the next state it reduces to typedefinition, which is wrong, imho, since after a typedefinition, you can only have another type definition (which starts with a "define" keyword), you can have an empty line (eol), or you can have EOF. In particular, you cannot have a NAMETK.

Current state: 10
	Context:  real\n    *float is a
	Tokens expected: EOF or NAMETK or define or eol
	Token ahead: <NAMETK(float)>
	Reducing by prod '5: typedefinition = typedefheader typelines'.
	No action defined  for 'typedefinition'.
	Result is a list of subresults.
	Action result = type:<class 'list'> value:[[[[], '\n'], 'define', 'type', '\n'], [[], 'real', '\n']]

So I fail to understand how it reaches this decision based on NAMETK look-ahead.
State 10 is as follows:

State 10:typelines
	5: typedefinition = typedefheader typelines .   {eol, define, EOF, NAMETK}
	10: typelines = typelines . typeline   {eol, define, EOF, NAMETK}
	11: typeline = . empties NAMETK eol   {eol, define, EOF, NAMETK}
	12: typeline = . empties NAMETK is a NAMETK eol   {eol, define, EOF, NAMETK}
	13: empties = .   {eol, NAMETK}
	14: empties = . empties eol   {eol, NAMETK}


	GOTO:
	typeline->15, empties->12

	ACTIONS:
	define->REDUCE:5, eol->[REDUCE:5,REDUCE:13], EOF->REDUCE:5, NAMETK->[REDUCE:5,REDUCE:13]

The right direction would be production 12, probably first doing production 13 here.

from parglare.

igordejanovic avatar igordejanovic commented on July 19, 2024

I added a section on special rules EMPTY and EOF in the docs.

The problem with your parse is that your grammar is not LR(1). You can see that in state 10 there are two possible reductions for lookahead NAMETK and eol. LR parser can't parse that as it must deterministically decide at each step the right operation.

parglare allows for R/R conflict as long as there is only one non-EMPTY reduction and in case of R/R it will prefer non-EMPTY reduction (this should probably be dropped and raised exception on any R/R conflict).

In this case it's not what you want. You must either change your grammar to fit LR(1) constraints or you can just switch to Parser -> GLRParser and let GLR investigate all possibilities.
I suggest to externalize the grammar in separate file and use pglr tool to check the grammar. It will report conflicts and can further visualize automata and GLR parsing trace.

from parglare.

alberth avatar alberth commented on July 19, 2024

Ok, that explains a lot.
As I use the unix philosophy of 'no news is good news' while working in a terminal, I never realized the grammar was broken, since the tool never complained. I don't have enough parsing background to know required properties of parser states for the various algorithms either.

In my experience, standard LALR(1) tools start throwing S/R and R/R conflicts when you move outside parsing limits. Sometimes they can continue by making some sane-ish choices, but I'd always try to stay within the limits, or at least check the cases for not being harmful.
Since LALR(1) is a 'compressed' form of LR(1), I assumed the LR parser would do the same.

As a suggestion, you could throw warnings or errors about conflicts, and provide an option to say 'I know about the conflicts, go ahead anyway'. to suppress known harmless conflicts.

from parglare.

igordejanovic avatar igordejanovic commented on July 19, 2024

Sometimes they can continue by making some sane-ish choices
...
As a suggestion, you could throw warnings or errors about conflicts, and provide an option to say 'I know about the conflicts, go ahead anyway'. to suppress known harmless conflicts.

parglare also uses some configurable choices with sane defaults to resolve conflicts, but these are used in table construction so if LR parser ends up with multiple actions for the same lookahead in any of the states that should be reported as error before parsing even begins.
That was the default behaviour in some of the previous versions but since I've been mainly working on GLR lately didn't notice that, after some reworkings, LR parser started accepting grammars with conflicts without any warning. This is definitely a candidate for test coverage.

Since LALR(1) is a 'compressed' form of LR(1), I assumed the LR parser would do the same.

As a matter of fact, parglare do compress tables using (somewhat changed) LALR algorithm. The change introduced should eliminate potential R/R conflicts known in LALR by not compressing states which leads to R/R conflicts.

from parglare.

alberth avatar alberth commented on July 19, 2024

Please don't feel offended. I know these parser things are a lot more complicated than they look. They would probably be useless if they didn't make a lot of decisions for me.
Unlike you (probably), I see a parser as a tool to parse text, so I can start writing the entire compiler behind it. That is, my goals are different from yours.
My observations are purely about "out of the box" behavior of the tool, after globally reading the documentation (while skipping the parts I think I know or consider irrelevant until later), but at my level of knowledge, I can't see through the text to the underlying system.
In other words, the tool is likely good, but it could be tuned a bit more towards protecting the less experienced user from him/herself.

Your LR compressing system seems like a very clever idea to me.

from parglare.

igordejanovic avatar igordejanovic commented on July 19, 2024

No problem :) LR/GLR parsers are complex beasts so it requires some understanding about the way they work. This is still work in progress. My goal is to make things as smooth as possible for newcomers but that is not easy to do as many things seems obvious from my point of view.

from parglare.

igordejanovic avatar igordejanovic commented on July 19, 2024

You might want to try textX which is the tool with a longer history and is based on top-down PEG parsing which is much easier to understand and reason/about, so should give you much better usage experience and less headaches. But, than again it all depends on your particular use-case.

from parglare.

alberth avatar alberth commented on July 19, 2024

It may help if you think in terms of language concepts only, as that is what the user sees and uses. You get two layers then, one at concepts in the language and how they work together, and one at implementation level, where the concepts must be realized.
I agree the complicated part is hiding the implementation from the language :)

I looked at textX, but it's too powerful for what I need.

from parglare.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.