Comments (9)
The relevant examples have been sent to your email, please
from pegtl.
PEG is inherently NON left-recursive, especially in a recursive descent parser, (which PEGTL is).
I've lifted the following from: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
This is an example of trying to produce a recursive descent parser in Python based upon left recursion in a PEG grammar.
Consider this hypothetical grammar rule:
expr: expr '+' term | term
If we were to naively translate this grammar fragment into a fragment of a recursive descent parser we’d get something like the following:
def expr():
if expr() and expect('+') and term():
return True
if term():
return True
return False
So expr() starts by calling expr(), which starts by calling expr(), which starts by calling… This can only ends with a stack overflow, expressed as a RecursionError exception.
The traditional remedy is to rewrite the grammar.
and that is what must be done in your case as well. You haven't publicly shared your grammar, so I cannot provide specific guidance, but in general, you don't have to change the underlying data. You just have to describe in in a non left recurring way.
from pegtl.
Limiting depth of recursion is at best a band-aid, rather than the correct approach.
As Colin has said, left recursion is a property of the grammar, not of a language. So, if you think that your input source is "left recursive" by nature, that is not the correct way of looking at things.
Perhaps another example would help, this one from Ada. I'll use a simplified example of an Ada "name"
For this purpose, an Ada "name" is either a direct name, or a selected component, the revlant EBNF grammar from the LRM is as follows:
name ::= direct_name | selected_component
direct_name ::= identifier
selected_component ::= prefix '.' selector_name
selector_name ::= identifier
prefix ::= name
We can see here, several problems if we try to directly and naively translate this into PEGTL as follows:
struct selector_name : identifer { };
struct direct_name : identifier { ];
struct prefix : name {};
struct selected_component : seq< prefix, one< '.' >, selector_name > { };
struct name : sor< direct_name, selected_component > { }
There are two problems with this approach:
-
The definition of struct name will guarantee that a selected component will never be reognized because as the parser sees "foo.bar", it will collect "foo" as a direct_name and stop. To fix this, we must swap the order of the elements of the sor<>, but this leads to the next problem:
-
There is left recursion because "prefix" is just a "name", and so the parser will attempt to parse a name, then it will try a prefix, which will try a name, which will try a prefix... and infinite recursion results.
Fortunately, this grammar can, (as all can), be re-written to remove the left-recursion. After all, anything that can be solved with recursion can be solved with iteration:
struct dot : one< '.' > { };
struct direct_name : seq< identifier, not_at< dot > > { };
struct selected_component : list_must< identifier, dot > { };
struct name : sor< direct_name, selected_component > { };
Now the parser will first try a direct_name by looking for an identifier, which is not followed by a dot. If that fails, it will try a selected_component by looking for any number of identifiers separated by a single dot. There is no more left-recursion.
NOTE: This is likely not the best way of solving this problem, but it is sufficient, I think, to demonstrate one way of eliminating left-recursion in favor of iteration. In this example "foo.bar" will cause the parser to start looking for a direct_name, finding an identifier, and then a dot, which eliminates a direct_name. The parser will back-track to the beginning and attempt to parse a selected_component and finally succeed.
from pegtl.
The PEGTL does not support left-recursion and is unlikely to do so anytime soon, or ever. Can you share your grammar?
from pegtl.
PEG is inherently NON left-recursive, especially in a recursive descent parser, (which PEGTL is).
I've lifted the following from: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1 This is an example of trying to produce a recursive descent parser in Python based upon left recursion in a PEG grammar.
Consider this hypothetical grammar rule:
expr: expr '+' term | term
If we were to naively translate this grammar fragment into a fragment of a recursive descent parser we’d get something like the following:
def expr(): if expr() and expect('+') and term(): return True if term(): return True return False
So expr() starts by calling expr(), which starts by calling expr(), which starts by calling… This can only ends with a stack overflow, expressed as a RecursionError exception.
The traditional remedy is to rewrite the grammar.
and that is what must be done in your case as well. You haven't publicly shared your grammar, so I cannot provide specific guidance, but in general, you don't have to change the underlying data. You just have to describe in in a non left recurring way.
It is mainly binary parsing. The existence of Left recursion in the target structure is a given fact. The target structure is not generated by itself, but existing data. So how to parse data with Left recursion is a problem worth thinking about.
What you said is correct. It is true that Left recursion causes an infinite loop. It is greedy. It repeatedly enters the same rule without completing a rule match, because the rule that defines the result is on the right.
Alternatively, limit the recursion depth and let it try a certain depth. If it exceeds the upper limit, I will give up. I found the "limit_depth. cpp" provided by pegtl, which seems to be able to limit the depth. However, if I add the limit, it will still cause infinite loop stack overflow
from pegtl.
After I limit the depth of Left recursion, it alleviates this problem to some extent, but it will take a long time.
from pegtl.
@w4454962 Left recursion is a property of a grammar, not of a language. https://en.wikipedia.org/wiki/Left_recursion explains how to eliminate it in general. I'm not quite sure what "left recursion in the target structure" means since your privately shared code example doesn't show the target structure that you want, it's generating a full parse_tree
.
from pegtl.
I understand, for example, like the following code, convert all structures with Left recursion to linear iteration
e : a | '1'
;
a : e 'a'
;
'1a' = (1a)
'1aa' = ((1a)a)
'1aaa' = (((1a)a)a)
'1aaaa' = ((((1a)a)a)a)
==
e <- '1' (a)*
a <- 'a'
'1a' = 1(a)
'1aa' = 1(aa)
'1aaa' = 1(aaa)
'1aaaa' = 1(aaaa)
Even if the data source is generated based on Left recursion, it should be analyzed in a linear way of thinking
I originally wanted to use reverse thinking. Since Left recursion generates data, I would use the generation steps to parse and restore data based on left recursion. But at present, there seems to be no good way to implement this idea. I can only regard it as the completion of linear structure parsing, and then do post-processing to convert the list back to recursive nested object structure
Thank you for your help and tips. Maybe Left recursion is intuitively consistent with human thought, but it may be too expensive for computers to implement
from pegtl.
Thank you for the answers from @ColinH and @skyrich62
from pegtl.
Related Issues (20)
- MSVC: error C2338: static_assert failed: 'internal::dependent_true< T > && ( begin != std::string_view::npos ) HOT 3
- Example grammar proto3 does not accept enum fields starting from zero HOT 2
- data type of input? byte? character? HOT 3
- <ciso646> is removed in C++20 and should not be included HOT 1
- Feature Request: Add defines to exclude headers to improve compile time HOT 4
- parse_tree needs to be optimized HOT 8
- Does "pegtl" support the operation of binary data serialization/deserialization? HOT 2
- Why parsing succeeds? HOT 3
- Why do I have an infinite loop? HOT 4
- parser_tree.cpp example not compiling in VS 2022, as of PEGTL 3.2.6 HOT 4
- Order independence of rules HOT 7
- list_tail<> invokes action for trailing separator twice? HOT 10
- Any consideration of ghc::filesystem HOT 8
- Can't get custom error messages to work. HOT 8
- Backreferences and grammar tracing/analyzing. HOT 5
- vs2022 编译错误 x64-windows-static\include\tao\pegtl\parse.hpp(45,38): error C2062: 意外的类型“unknown-type”
- Issues with change_action_and_state HOT 1
- Inconsistent behaviour of `sor` for custom rules. HOT 6
- standard_trace fails to compile with an action that uses change_action_and_states HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pegtl.