Git Product home page Git Product logo

Comments (9)

w4454962 avatar w4454962 commented on August 24, 2024 1

The relevant examples have been sent to your email, please

from pegtl.

skyrich62 avatar skyrich62 commented on August 24, 2024 1

PEG is inherently NON left-recursive, especially in a recursive descent parser, (which PEGTL is).

I've lifted the following from: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
This is an example of trying to produce a recursive descent parser in Python based upon left recursion in a PEG grammar.

Consider this hypothetical grammar rule:

expr: expr '+' term | term

If we were to naively translate this grammar fragment into a fragment of a recursive descent parser we’d get something like the following:

def expr():
    if expr() and expect('+') and term():
        return True
    if term():
        return True
    return False

So expr() starts by calling expr(), which starts by calling expr(), which starts by calling… This can only ends with a stack overflow, expressed as a RecursionError exception.

The traditional remedy is to rewrite the grammar.

and that is what must be done in your case as well. You haven't publicly shared your grammar, so I cannot provide specific guidance, but in general, you don't have to change the underlying data. You just have to describe in in a non left recurring way.

from pegtl.

skyrich62 avatar skyrich62 commented on August 24, 2024 1

Limiting depth of recursion is at best a band-aid, rather than the correct approach.

As Colin has said, left recursion is a property of the grammar, not of a language. So, if you think that your input source is "left recursive" by nature, that is not the correct way of looking at things.

Perhaps another example would help, this one from Ada. I'll use a simplified example of an Ada "name"

For this purpose, an Ada "name" is either a direct name, or a selected component, the revlant EBNF grammar from the LRM is as follows:

name ::= direct_name | selected_component
direct_name ::= identifier
selected_component ::= prefix '.' selector_name
selector_name ::= identifier
prefix ::= name

We can see here, several problems if we try to directly and naively translate this into PEGTL as follows:

struct selector_name : identifer { };
struct direct_name : identifier { ];
struct prefix : name {};
struct selected_component : seq< prefix, one< '.' >, selector_name > { };
struct name : sor< direct_name, selected_component > { }

There are two problems with this approach:

  1. The definition of struct name will guarantee that a selected component will never be reognized because as the parser sees "foo.bar", it will collect "foo" as a direct_name and stop. To fix this, we must swap the order of the elements of the sor<>, but this leads to the next problem:

  2. There is left recursion because "prefix" is just a "name", and so the parser will attempt to parse a name, then it will try a prefix, which will try a name, which will try a prefix... and infinite recursion results.

Fortunately, this grammar can, (as all can), be re-written to remove the left-recursion. After all, anything that can be solved with recursion can be solved with iteration:

struct dot : one< '.' > { };
struct direct_name : seq< identifier, not_at< dot > > { };
struct selected_component : list_must< identifier, dot > { };
struct name : sor<  direct_name, selected_component > { };

Now the parser will first try a direct_name by looking for an identifier, which is not followed by a dot. If that fails, it will try a selected_component by looking for any number of identifiers separated by a single dot. There is no more left-recursion.

NOTE: This is likely not the best way of solving this problem, but it is sufficient, I think, to demonstrate one way of eliminating left-recursion in favor of iteration. In this example "foo.bar" will cause the parser to start looking for a direct_name, finding an identifier, and then a dot, which eliminates a direct_name. The parser will back-track to the beginning and attempt to parse a selected_component and finally succeed.

from pegtl.

ColinH avatar ColinH commented on August 24, 2024

The PEGTL does not support left-recursion and is unlikely to do so anytime soon, or ever. Can you share your grammar?

from pegtl.

w4454962 avatar w4454962 commented on August 24, 2024

PEG is inherently NON left-recursive, especially in a recursive descent parser, (which PEGTL is).

I've lifted the following from: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1 This is an example of trying to produce a recursive descent parser in Python based upon left recursion in a PEG grammar.

Consider this hypothetical grammar rule:

expr: expr '+' term | term

If we were to naively translate this grammar fragment into a fragment of a recursive descent parser we’d get something like the following:

def expr():
    if expr() and expect('+') and term():
        return True
    if term():
        return True
    return False

So expr() starts by calling expr(), which starts by calling expr(), which starts by calling… This can only ends with a stack overflow, expressed as a RecursionError exception.

The traditional remedy is to rewrite the grammar.

and that is what must be done in your case as well. You haven't publicly shared your grammar, so I cannot provide specific guidance, but in general, you don't have to change the underlying data. You just have to describe in in a non left recurring way.

It is mainly binary parsing. The existence of Left recursion in the target structure is a given fact. The target structure is not generated by itself, but existing data. So how to parse data with Left recursion is a problem worth thinking about.

What you said is correct. It is true that Left recursion causes an infinite loop. It is greedy. It repeatedly enters the same rule without completing a rule match, because the rule that defines the result is on the right.

Alternatively, limit the recursion depth and let it try a certain depth. If it exceeds the upper limit, I will give up. I found the "limit_depth. cpp" provided by pegtl, which seems to be able to limit the depth. However, if I add the limit, it will still cause infinite loop stack overflow

from pegtl.

w4454962 avatar w4454962 commented on August 24, 2024

After I limit the depth of Left recursion, it alleviates this problem to some extent, but it will take a long time.

from pegtl.

ColinH avatar ColinH commented on August 24, 2024

@w4454962 Left recursion is a property of a grammar, not of a language. https://en.wikipedia.org/wiki/Left_recursion explains how to eliminate it in general. I'm not quite sure what "left recursion in the target structure" means since your privately shared code example doesn't show the target structure that you want, it's generating a full parse_tree.

from pegtl.

w4454962 avatar w4454962 commented on August 24, 2024

I understand, for example, like the following code, convert all structures with Left recursion to linear iteration

e : a | '1'
;

a : e 'a'
;


'1a'    =  (1a) 
'1aa'  =  ((1a)a)
'1aaa' = (((1a)a)a)
'1aaaa' =  ((((1a)a)a)a) 

==

e <- '1' (a)*
a <- 'a'


'1a'    =  1(a)
'1aa'  =  1(aa)
'1aaa' = 1(aaa)
'1aaaa' = 1(aaaa)

Even if the data source is generated based on Left recursion, it should be analyzed in a linear way of thinking

I originally wanted to use reverse thinking. Since Left recursion generates data, I would use the generation steps to parse and restore data based on left recursion. But at present, there seems to be no good way to implement this idea. I can only regard it as the completion of linear structure parsing, and then do post-processing to convert the list back to recursive nested object structure

Thank you for your help and tips. Maybe Left recursion is intuitively consistent with human thought, but it may be too expensive for computers to implement

from pegtl.

w4454962 avatar w4454962 commented on August 24, 2024

Thank you for the answers from @ColinH and @skyrich62

from pegtl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.