ksp-kos / tinypg
This project forked from sickheadgames/tinypg
Fork of the C# Tiny Parser Generator by Herre Kuijpers.
This issue in the kOS project, KSP-KOS/KOS#2135, seems to imply that TinyPG itself can be edited to improve its regex performance in the scanner.
Example Text:
set ident to 1234 * sqrt(5432.1).[EOF]
          ^
          |
          |
          Imagine the Scanner's startpos is currently here
          because the scanner has already tokenized this much
          so far:
set[whitespace skipped]ident[whitespace skipped]
That means the substring of the input file the scanner hasn't consumed yet is this:
to 1234 * sqrt(5432.1).[EOF]
^
And the zeroth position of that subset is where the caret is.
The Scanner currently does this in a for loop, inside LookAhead():
to 1234 * sqrt(5432.1)[EOF]
in the above example).
But notice the bold text above: only matches that start at index 0 count. The way it implements this, though, is to find the matches at higher indices and then immediately throw them away. This is very inefficient, as discovered by @tsholmes. For example, if the scanner were looking at the above example, the rule to match INTEGER will find a hit at index 3 on 1234, but since that's not at index 0, it will be thrown out. The rule to match MULTIPLY will find a hit on the substring * at index 8, but since that's not at index 0, it doesn't count and gets thrown out. It will also find a hit for IDENTIFIER on the substring sqrt at index 10, but since that's not at index 0, it doesn't count and gets thrown out, and so on. The only match that doesn't get thrown away is the one for the keyword TO, which is kept because it is at index zero.
If you imagine a large file, this is a lot of matching that just gets thrown away right away.
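To make the wasted work concrete, here is a minimal C# sketch of the behaviour described above. It is not the generated Scanner.cs code: the class name and the three token patterns are made up for illustration and are not the real kRISC.tpg rules. It only shows that an unanchored `Regex.Match()` goes looking for a hit later in the string, which the index check then discards.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnanchoredScanDemo
{
    public static void Main()
    {
        // The part of the input the scanner has not consumed yet (from the example above).
        string remaining = "to 1234 * sqrt(5432.1).";

        // Hypothetical token patterns, for illustration only (not the real kRISC.tpg rules).
        var patterns = new List<KeyValuePair<string, Regex>>
        {
            new KeyValuePair<string, Regex>("TO", new Regex(@"to\b")),
            new KeyValuePair<string, Regex>("INTEGER", new Regex(@"[0-9]+")),
            new KeyValuePair<string, Regex>("MULTIPLY", new Regex(@"\*")),
        };

        foreach (var pair in patterns)
        {
            // Unanchored: the regex engine searches forward until it finds a hit anywhere.
            Match m = pair.Value.Match(remaining);

            // Only a hit that starts at index 0 is usable; everything else is wasted effort.
            bool kept = m.Success && m.Index == 0;
            Console.WriteLine("{0}: index {1}, '{2}', {3}",
                pair.Key, m.Index, m.Value, kept ? "kept" : "discarded");
        }
    }
}
```

With these assumed patterns, only the TO match starts at index 0; the INTEGER and MULTIPLY matches are found at indices 3 and 8 and then dropped.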
By inserting an implied caret ("^") into the regex before running Regex.Match(), the Match routine itself can be told not to bother with any matches that don't start at index zero. Then instead of getting the match and immediately throwing it away, it just won't find the match in the first place.
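A minimal sketch of the anchoring idea, under stated assumptions: the `Anchor` helper and the two patterns below are hypothetical and are not taken from TinyPG. One detail worth noting is that the caret is placed in front of a non-capturing group, so that a pattern containing alternation (`a|b`) has every alternative anchored rather than only the first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredScanDemo
{
    // Hypothetical helper: wrap the pattern in a non-capturing group and put "^"
    // in front, so the whole pattern (including any alternation) must match at index 0.
    public static Regex Anchor(string pattern)
    {
        return new Regex("^(?:" + pattern + ")");
    }

    public static void Main()
    {
        // The part of the input the scanner has not consumed yet.
        string remaining = "to 1234 * sqrt(5432.1).";

        Regex integerRule = Anchor(@"[0-9]+"); // assumed INTEGER pattern
        Regex toRule = Anchor(@"to\b");        // assumed TO pattern

        // The anchored INTEGER rule fails at index 0 instead of finding "1234" at index 3.
        Console.WriteLine(integerRule.Match(remaining).Success); // False

        // The anchored TO rule still matches, because "to" starts at index 0.
        Console.WriteLine(toRule.Match(remaining).Success);      // True
    }
}
```

This sketch relies on `^` matching only at the start of the string, which holds as long as `RegexOptions.Multiline` is not set and the regex is run against the unconsumed substring.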
This is for fixing this issue from kOS in tinypg
TinyPG spits out Parser.cs, ParseTree.cs, and Scanner.cs with this stock comment on top:
// Generated by TinyPG v1.3 available at www.codeproject.com
We should make our version of TinyPG spit out a more verbose, clearer comment that explains to people looking at our project:
1 - DO NOT EDIT THIS FILE- IT IS AUTOGENERATED BY A PROGRAM CALLED TINYPG.
2 - And where to get TinyPG (our version of it in our github home)
3 - And how to run TinyPG to re-generate these files.
4 - And that the real change is to edit the kRISC.tpg file
This is because multiple times we've gotten PRs from people trying to change the parser by editing these files directly. We could make it clearer what's happening.
We could also perhaps change the folder tree to make them under a folder called "Autogenerated" but that's more for the KOS project not the TinyPG project. But I mention it here for reference.
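As a rough illustration of the four points listed above, the generated header might look something like the following. The wording is only a suggestion, and the repository location and regeneration steps are left as placeholders because they are not spelled out here.

```csharp
// ------------------------------------------------------------------------
// DO NOT EDIT THIS FILE. IT IS AUTOGENERATED BY A PROGRAM CALLED TINYPG.
//
// Any change made directly to this file will be lost the next time it is
// regenerated. To change the parser, edit the grammar file kRISC.tpg and
// then re-run TinyPG to regenerate Parser.cs, ParseTree.cs, and Scanner.cs.
//
// Where to get TinyPG (our fork): <repository location goes here>
// How to regenerate these files:  <exact steps go here>
// ------------------------------------------------------------------------
```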