Comments (8)
Did some more investigating. In my grammar I have rules like these:
select_columns
= head:column_list_item tail:(__ "," __ column_list_item)* trailing:(__ "," __ empty)? {
...
}
column_list_item
= expr:star_or_qualified_star kw:(__ EXCEPT __) columns:paren$list$column {
...
}
/ expr:star_or_qualified_star kw:(__ REPLACE __) columns:paren$list$alias$expr {
...
}
/ star_or_qualified_star
/ expr:expr alias:(__ alias)? {
...
}
expr = mysql_assign_expr / or_expr
mysql_assign_expr = ... // ends up referencing same stuff that or_expr references
or_expr = ...
It takes from reportInfiniteRecursion
:
- 4.1 seconds to check the
select_columns
rule - 3.9 seconds to check the
column_list_item
rule. - 3.7 seconds to check the
expr
rule.
Essentially it comes down to this main expr
rule being really big and complex and taking a whole lot of time to fully traverse.
from peggy.
Yes, it definitely can be optimized. I think that the problem is because it tried to check each rule independently and as result can traverse the same nodes many times.
The best strategy would be to introduce new pass that would calculate some useful properties of each AST node, which includes:
lookahead
: how many characters this node can lookahead? For example, literal always lookahead byliteral.length
, and character class by 1 characterminMatched
: the minimal characters that would consumed if that node matched. For example, for.
(dot) it is 1 character and for.?
it is 0 charactersmaxMatched
: the maximum characters that would consumed if that node matched. For example, for.
(dot) and.?
it is 1 character and for.+
it is unfinity charactershaveUserCode
: is this node have user actions or predicates (which could have side-effects)?
Then to find left recursion we need to find rules where minMatched
is 0.
from peggy.
What we're trying to do is prove that there is NOT a loop, which means we only need to find the first loop of any size. In this case, speed matters more than completeness.
I bet if we took one O(n) pass over the rules to generate an adjacency matrix indexed by rule number, then did a depth-first search on each non-seen rule, we'd be able to prove there were no loops more quickly.
from peggy.
In the same pass, we could prove that certain rules were not reachable by any of the allowedStartRules, marking them for removal.
from peggy.
I wonder if #467 will help with this?
from peggy.
I don't think this will help at all. Though I might misunderstand - I didn't test.
The main problem concerns the case where no errors are found. Which in my case is like 99.9999% of the time. Frankly, I wouldn't actually mind if it were slow when errors are present.
I assume that moving this to prepare
pass will also not effect the performance in any way.
from peggy.
Nod. There are a couple of speedups for big grammars that I can add after 4.0, but I expect most of those to have minimal impact to this. Someone will have to help with the approach Mingun suggested above, I'm unlikely to get to that myself.
from peggy.
I've been thinking of tackling this, but each time I look at it, the problem sort of overwhelms me as I can't seem to even fully wrap my head around the current implementation. Mainly me not being familiar with the code base :(
from peggy.
Related Issues (20)
- [breaking] Remove support for AMD and UMD formats HOT 3
- Allow use of an empty array as default value in allowedStartRules option HOT 1
- 4.0.1 dropped support for Node 18 HOT 3
- Implement soft-mode with access to partial results
- Start and end index of matched rule in the source code. HOT 5
- allow await HOT 1
- Code completion for Peggy grammar HOT 3
- posAssertion doesn't work HOT 1
- Add StartRules to .d.ts
- Allow whitespace between plucked word and its pattern
- Failed to run "peggy" on windows, "-S.exe" is missing, what's this? HOT 3
- Proposal to rename `grammarSource` option in parse method to `source` HOT 1
- Infinite repetition in RFCs HOT 3
- Failed to compile grammar containing imports HOT 3
- Grammar with token "constructor" fails to generate HOT 2
- Web tests fail
- Allow es6 plugins from CLI HOT 2
- Clean up rollup hacks in CLI
- Allow ES6 config files
- non-default startRule doesn't work with multiple allowedStartRules HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from peggy.