Git Product home page Git Product logo

Comments (8)

nene avatar nene commented on July 19, 2024

Did some more investigating. In my grammar I have rules like these:

select_columns
  = head:column_list_item tail:(__ "," __ column_list_item)* trailing:(__ "," __ empty)? {
      ...
    }

column_list_item
  = expr:star_or_qualified_star kw:(__ EXCEPT __) columns:paren$list$column {
    ...
  }
  / expr:star_or_qualified_star kw:(__ REPLACE __) columns:paren$list$alias$expr {
    ...
  }
  / star_or_qualified_star
  / expr:expr alias:(__ alias)? {
    ...
  }

expr = mysql_assign_expr / or_expr

mysql_assign_expr = ... // ends up referencing same stuff that or_expr references

or_expr = ...

It takes from reportInfiniteRecursion:

  • 4.1 seconds to check the select_columns rule
  • 3.9 seconds to check the column_list_item rule.
  • 3.7 seconds to check the expr rule.

Essentially it comes down to this main expr rule being really big and complex and taking a whole lot of time to fully traverse.

from peggy.

Mingun avatar Mingun commented on July 19, 2024

Yes, it definitely can be optimized. I think that the problem is because it tried to check each rule independently and as result can traverse the same nodes many times.

The best strategy would be to introduce new pass that would calculate some useful properties of each AST node, which includes:

  • lookahead: how many characters this node can lookahead? For example, literal always lookahead by literal.length, and character class by 1 character
  • minMatched: the minimal characters that would consumed if that node matched. For example, for . (dot) it is 1 character and for .? it is 0 characters
  • maxMatched: the maximum characters that would consumed if that node matched. For example, for . (dot) and .? it is 1 character and for .+ it is unfinity characters
  • haveUserCode: is this node have user actions or predicates (which could have side-effects)?

Then to find left recursion we need to find rules where minMatched is 0.

from peggy.

hildjj avatar hildjj commented on July 19, 2024

What we're trying to do is prove that there is NOT a loop, which means we only need to find the first loop of any size. In this case, speed matters more than completeness.

I bet if we took one O(n) pass over the rules to generate an adjacency matrix indexed by rule number, then did a depth-first search on each non-seen rule, we'd be able to prove there were no loops more quickly.

from peggy.

hildjj avatar hildjj commented on July 19, 2024

In the same pass, we could prove that certain rules were not reachable by any of the allowedStartRules, marking them for removal.

from peggy.

hildjj avatar hildjj commented on July 19, 2024

I wonder if #467 will help with this?

from peggy.

nene avatar nene commented on July 19, 2024

I don't think this will help at all. Though I might misunderstand - I didn't test.

The main problem concerns the case where no errors are found. Which in my case is like 99.9999% of the time. Frankly, I wouldn't actually mind if it were slow when errors are present.

I assume that moving this to prepare pass will also not effect the performance in any way.

from peggy.

hildjj avatar hildjj commented on July 19, 2024

Nod. There are a couple of speedups for big grammars that I can add after 4.0, but I expect most of those to have minimal impact to this. Someone will have to help with the approach Mingun suggested above, I'm unlikely to get to that myself.

from peggy.

nene avatar nene commented on July 19, 2024

I've been thinking of tackling this, but each time I look at it, the problem sort of overwhelms me as I can't seem to even fully wrap my head around the current implementation. Mainly me not being familiar with the code base :(

from peggy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.