
pegkit's Introduction

PEGKit

PEGKit is a 'Parsing Expression Grammar' toolkit for iOS and OS X written by Todd Ditchendorf in Objective-C and released under the MIT Open Source License.

Always use the Xcode Workspace PEGKit.xcworkspace, NOT the Xcode Project.

This project includes TDTemplateEngine as a Git Submodule. So proper cloning of this project requires the --recursive argument:

git clone --recursive git@github.com:itod/pegkit.git

PEGKit is heavily influenced by ANTLR by Terence Parr and "Building Parsers with Java" by Steven John Metsker.

The PEGKit Framework offers 2 basic services of general interest to Cocoa developers:

  1. String Tokenization via the Objective-C PKTokenizer and PKToken classes.
  2. Objective-C parser generation via grammars - Generate source code for an Objective-C parser class from simple, intuitive, and powerful BNF-style grammars (similar to yacc or ANTLR). While parsing, the generated parser will provide callbacks to your Objective-C delegate.

The PEGKit source code is available on Github.

A tutorial for using PEGKit in your iOS applications is available on GitHub.

History

PEGKit is a re-write of an earlier framework by the same author called ParseKit. ParseKit should generally be considered deprecated, and PEGKit should probably be used for all future development.

  • ParseKit produces dynamic, non-deterministic parsers at runtime. The parsers produced by ParseKit exhibit poor (exponential) performance characteristics -- although they have some interesting properties which are useful in very rare circumstances.

  • PEGKit produces static ObjC source code for deterministic (PEG) memoizing parsers at design time which you can then compile into your project. The parsers produced by PEGKit exhibit good (linear) performance characteristics.


Documentation


Tokenization


Basic Usage of PKTokenizer

PEGKit provides general-purpose string tokenization services through the PKTokenizer and PKToken classes. Cocoa developers will be familiar with the NSScanner class provided by the Foundation Framework which provides a similar service. However, the PKTokenizer class is much easier to use for many common tokenization tasks, and offers powerful configuration options if the default tokenization behavior doesn't match your needs.

PKTokenizer

+ (id)tokenizerWithString:(NSString *)s;

- (PKToken *)nextToken;

To use PKTokenizer, provide it with an NSString object and retrieve a series of PKToken objects as you repeatedly call the -nextToken method. The EOFToken singleton signals the end.

NSString *s = @"2 != -47. /* comment */ Blast-off!! 'Woo-hoo!' //comment";

PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
PKToken *eof = [PKToken EOFToken];
PKToken *tok = nil;

while (eof != (tok = [t nextToken])) {
    NSLog(@"(%@) (%.1f) : %@", tok.stringValue, tok.floatValue, [tok debugDescription]);
}

Outputs:

(2) (2.0) : <Number «2»>

(!=) (0.0) : <Symbol «!=»>

(-47) (-47.0) : <Number «-47»>

(.) (0.0) : <Symbol «.»>

(Blast-off) (0.0) : <Word «Blast-off»>

(!) (0.0) : <Symbol «!»>

(!) (0.0) : <Symbol «!»>

('Woo-hoo!') (0.0) : <Quoted String «'Woo-hoo!'»>

Each PKToken object returned has a stringValue, a floatValue and a tokenType. The tokenType is an enum value of type PKTokenType with possible values:

  • PKTokenTypeWord

  • PKTokenTypeNumber

  • PKTokenTypeQuotedString

  • PKTokenTypeSymbol

  • PKTokenTypeWhitespace

  • PKTokenTypeComment

  • PKTokenTypeDelimitedString

PKToken objects also have corresponding BOOL properties for convenience (isWord, isNumber, etc.).
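For example, a minimal sketch (using the same loop pattern as above) that collects only the numeric values via the isNumber convenience property:

…
NSMutableArray *numbers = [NSMutableArray array];
while (eof != (tok = [t nextToken])) {
    if (tok.isNumber) {
        // keep just the numeric value of each Number token
        [numbers addObject:@(tok.floatValue)];
    }
}
…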

PKToken

+ (PKToken *)EOFToken;

@property (readonly) PKTokenType tokenType;

@property (readonly) CGFloat floatValue;
@property (readonly, copy) NSString *stringValue;

@property (readonly) BOOL isNumber;
@property (readonly) BOOL isSymbol;
@property (readonly) BOOL isWord;
@property (readonly) BOOL isQuotedString;
@property (readonly) BOOL isWhitespace;
@property (readonly) BOOL isComment;
@property (readonly) BOOL isDelimitedString;


Default Behavior of PKTokenizer

The default behavior of PKTokenizer is correct for most common situations and will fit many tokenization needs without additional configuration.

Number

Sequences of digits («2» «42» «1054») are recognized as Number tokens. Floating point numbers containing a dot («3.14») are recognized as single Number tokens as you'd expect (rather than two Number tokens separated by a «.» Symbol token). By default, PKTokenizer will recognize a «-» symbol followed immediately by digits («-47») as a number token with a negative value. However, «+» characters are always seen as the beginning of a Symbol token by default, even when followed immediately by digits, so "explicitly-positive" Number tokens are not recognized by default (this behavior can be configured, see below).

Symbol

Most symbol characters («.» «!») are recognized as single-character Symbol tokens (even when sequential, such as «!» «!»). However, notice that PKTokenizer recognizes common multi-character symbols («!=») as a single Symbol token by default. In fact, PKTokenizer can be configured to recognize any given string as a multi-character symbol. Alternatively, it can be configured to always recognize each symbol character as an individual Symbol token (no multi-character symbols). The default multi-character symbols recognized by PKTokenizer are: «<=», «>=», «!=», «==».

Word

«Blast-off» is recognized as a single Word token despite containing a symbol character («-») that would normally signal the start of a new Symbol token. By default, PKTokenizer allows Word tokens to contain (but not start with) several symbol and number characters: «-», «_», «'», «0»-«9». The consequence of this behavior is that PKTokenizer will recognize the following strings as individual Word tokens by default: «it's», «first_name», «sat-yr-9», «Rodham-Clinton». Again, you can configure PKTokenizer to alter this default behavior.

Quoted String

PKTokenizer produces Quoted String tokens for substrings enclosed in quote delimiter characters. The default delimiters are single- or double-quotes («'» or «"»). The quote delimiter characters may be changed (see below), but must be a single character. Note that the stringValue of a Quoted String token includes the quote delimiter characters («'Woo-hoo!'»).

Whitespace

By default, whitespace characters are silently consumed by PKTokenizer, and Whitespace tokens are never emitted. However, you can configure which characters are considered whitespace characters or even ask PKTokenizer to return Whitespace tokens containing the literal whitespace stringValues by setting: t.whitespaceState.reportsWhitespaceTokens = YES.

Comment

By default, PKTokenizer recognizes C++-style («//») and C-style («/*» «*/») comments and silently removes them from the output rather than producing Comment tokens. See below for steps to change the comment delimiting markers, report Comment tokens, or turn off comment recognition altogether.

Delimited String

The Delimited String token type is a powerful feature of PEGKit which can be used much like a regular expression. Use the Delimited String token type to ask PKTokenizer to recognize tokens with arbitrary start and end symbol strings much like a Quoted String but with more power:

  • The start and end symbols may be multi-char (e.g. «<#» «#>»)
  • The start and end symbols need not match (e.g. «<?=» «?>»)
  • The characters allowed within the delimited string may be specified using an NSCharacterSet
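For example, a sketch of recognizing «<#» … «#>» delimited strings; this assumes a PKDelimitState -addStartMarker:endMarker:allowedCharacterSet: method, and routes '<' to the delimitState via -setTokenizerState:from:to: (described below):

…
[t.delimitState addStartMarker:@"<#" endMarker:@"#>" allowedCharacterSet:nil];
[t setTokenizerState:t.delimitState from:'<' to:'<'];
…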

Customizing PKTokenizer behavior

There are two basic types of decisions PKTokenizer must make when tokenizing strings:

  1. Which token type should be created for a given start character?
  2. Which characters are allowed within the current token being created?

PKTokenizer's behavior with respect to these two types of decisions is totally configurable. Let's tackle them, starting with the second question first.

Changing which characters are allowed within a token of a particular type

Once PKTokenizer has decided which token type to create for a given start character (see below), it temporarily passes control to one of its "state" helper objects to finish consumption of characters for the current token. Therefore, the logic for deciding which characters are allowed within a token of a given type is controlled by the "state" objects which are instances of subclasses of the abstract PKTokenizerState class: PKWordState, PKNumberState, PKQuoteState, PKSymbolState, PKWhitespaceState, PKCommentState, and PKDelimitState. The state objects are accessible via properties of the PKTokenizer object.

PKTokenizer


@property (readonly, retain) PKWordState *wordState;
@property (readonly, retain) PKNumberState *numberState;
@property (readonly, retain) PKQuoteState *quoteState;
@property (readonly, retain) PKSymbolState *symbolState;
@property (readonly, retain) PKWhitespaceState *whitespaceState;
@property (readonly, retain) PKCommentState *commentState;
@property (readonly, retain) PKDelimitState *delimitState;

Some of the PKTokenizerState subclasses have methods that alter which characters are allowed within tokens of their associated token type.

For example, if you want to add a new multiple-character symbol like «===»:

…
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
[t.symbolState add:@"==="];
…

Now «===» strings will be recognized as a single Symbol token with a stringValue of «===». There is a corresponding -[PKSymbolState remove:] method for removing recognition of given multi-char symbols.

If you don't want to allow digits within Word tokens (digits are allowed within Words by default):

…
[t.wordState setWordChars:NO from:'0' to:'9'];
…

Say you want to allow floating-point Number tokens to end with a «.», sans trailing «0». In other words, you want «49.» to be recognized as a single Number token with a floatValue of «49.0» rather than a Number token followed by a Symbol token with a stringValue of «.»:

…
t.numberState.allowsTrailingDot = YES;
…

Recognition of scientific notation (exponential numbers) can be enabled to recognize numbers like «10e+100», «6.626068E-34» and «6.0221415e23». The resulting PKToken objects will have floatValues which represent the full value of the exponential number, yet retain the original exponential representation as their stringValues.

…
t.numberState.allowsScientificNotation = YES;
…

Similarly, recognition of common octal and hexadecimal number notation can be enabled to recognize numbers like «020» (octal 16) and «0x20» (hex 32).

…
t.numberState.allowsOctalNotation = YES;
t.numberState.allowsHexadecimalNotation = YES;
…

The resulting PKToken objects will have a tokenType of PKTokenTypeNumber and a stringValue matching the original source notation («020» or «0x20»). Their floatValues will represent the normal decimal value of the number (in this case 16 and 32).

You can also configure which characters are recognized as whitespace within a whitespace token. To treat digits as whitespace characters within whitespace tokens:

…
[t.whitespaceState setWhitespaceChars:YES from:'0' to:'9'];
…

By default, whitespace chars are silently consumed by a tokenizer's PKWhitespaceState. To force reporting of PKTokens of type PKTokenTypeWhitespace containing the encountered whitespace chars as their stringValues (e.g. this would be necessary for a typical XML parser in which significant whitespace must be reported):

…
t.whitespaceState.reportsWhitespaceTokens = YES;
…

Similarly, comments are also silently consumed by default. To report Comment tokens instead:

…
t.commentState.reportsCommentTokens = YES;
…
Changing which token type is created for a given start character

PKTokenizer controls the logic for deciding which token type should be created for a given start character before passing the responsibility for completing tokens to its "state" helper objects. To change which token type is created for a given start character, you must call a method of the PKTokenizer object itself: -[PKTokenizer setTokenizerState:from:to:].

PKTokenizer

- (void)setTokenizerState:(PKTokenizerState *)state 
                     from:(PKUniChar)start 
                       to:(PKUniChar)end;

For example, suppose you want to turn off support for Number tokens altogether. To recognize digits as signaling the start of Word tokens:

…
PKTokenizer *t = [PKTokenizer tokenizerWithString:s];
[t setTokenizerState:t.wordState from:'0' to:'9'];
…

This will cause PKTokenizer to begin creating a Word token (rather than a Number token) whenever a digit («0», «1», «2», «3», «4», «5», «6», «7», «8», «9») is encountered.

As another example, say you want to add support for new Quoted String token delimiters, such as «#». This would cause a string like #oh hai# to be recognized as a Quoted String token rather than a Symbol, two Words, and a Symbol. Here's how:

…
[t setTokenizerState:t.quoteState from:'#' to:'#'];
…

Note that if the from: and to: arguments are the same char, only behavior for that single char is affected.

Alternatively, say you want to recognize «+» characters followed immediately by digits as explicitly positive Number tokens rather than as a Symbol token followed by a Number token:

…
[t setTokenizerState:t.numberState from:'+' to:'+'];
…

Finally, customization of comment recognition may be necessary. By default, PKTokenizer passes control to its commentState object which silently consumes the comment text found after «//» or between «/*» «*/». This default behavior is achieved with the sequence:

…
[t setTokenizerState:t.commentState from:'/' to:'/'];
[t.commentState addSingleLineStartSymbol:@"//"];
[t.commentState addMultiLineStartSymbol:@"/*" endSymbol:@"*/"];
…

To recognize single-line comments starting with #:

…
[t setTokenizerState:t.commentState from:'#' to:'#'];
[t.commentState addSingleLineStartSymbol:@"#"];
…

To recognize multi-line "XML"- or "HTML"-style comments:

…
[t setTokenizerState:t.commentState from:'<' to:'<'];
[t.commentState addMultiLineStartSymbol:@"<!--" endSymbol:@"-->"];
…

To disable comment recognition altogether, tell PKTokenizer to pass control to its symbolState instead of its commentState.

…
[t setTokenizerState:t.symbolState from:'/' to:'/'];
…

Now PKTokenizer will return individual Symbol tokens for all «/» and «*» characters, as well as any other characters set as part of a comment start or end symbol.


Grammars

Basic Grammar Syntax

PEGKit allows users to build parsers for custom languages from a declarative, BNF-style grammar without writing any code. By inserting your grammar into the ParserGenApp application, Objective-C source code is generated which contains a parser for your language – specifically, a subclass of PKParser.
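For example, once your grammar has been run through ParserGenApp, the generated parser can be used roughly like this (a sketch: MyLangParser is a hypothetical generated class name; -initWithDelegate: appears in the issues below, and -parseString:error: is assumed here as the string-based counterpart of the -parseStream:error: call shown there):

…
NSError *err = nil;
MyLangParser *parser = [[MyLangParser alloc] initWithDelegate:self];
id result = [parser parseString:@"some input in your language" error:&err];
if (!result) {
    NSLog(@"parse failed: %@", err);
}
…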

The grammar below describes a simple toy language called Cold Beer and will serve as a quick introduction to the PEGKit grammar syntax. The rules of the Cold Beer language are as follows. The language consists of a sequence of one or more sentences beginning with the word »cold« followed by a repetition of either »cold« or »freezing« followed by »beer« and terminated by the symbol ».«.

For example, each of the following lines are valid instances of the Cold Beer language (as is the example as a whole):

cold cold cold freezing cold freezing cold beer.
cold cold freezing cold beer.
cold freezing beer.
cold beer.

The following lines are not valid Cold Beer statements:

freezing cold beer.
cold freezing beer
beer.

Here is a complete PEGKit grammar for the Cold Beer language.

start = sentence+;
sentence = adjectives 'beer' '.';
adjectives = cold adjective*;
adjective = cold | freezing;
cold = 'cold';
freezing = 'freezing';

As shown above, the PEGKit grammar syntax consists of individual language production declarations separated by »;«. Whitespace is ignored, so the productions can be formatted liberally with whitespace as the programmer prefers. Comments are also allowed and resemble the comment style of Objective-C. So a commented Cold Beer grammar may appear as:

/*
    A Grammar for the Cold Beer Language
    by Todd Ditchendorf
*/
start = sentence+;     // outermost production
sentence = adjectives 'beer' '.';
adjectives = cold adjective*;
adjective = cold | 'freezing';
cold = 'cold';
freezing = 'freezing';

Rules

Every PEGKit grammar begins with the highest-level or outermost rule in the language. This rule must be declared first, but it may have any name you like. For Cold Beer, the outermost rule is:

start = sentence+;

This states that the outermost rule of this language consists of a sequence of one or more (»+«) instances of the sentence rule.

sentence = adjectives 'beer' '.';

The sentence rule states that sentences are a sequence of the adjectives rule followed by the literal strings »beer« and ».«.

adjectives = cold adjective*;

In turn, adjectives is a sequence of a single instance of the cold rule followed by a repetition (»*« read as 'zero or more') of the adjective rule.

adjective = cold | freezing;
cold = 'cold';
freezing = 'freezing';

The adjective rule is an alternation of either an instance of the cold or the freezing rule. The cold rule matches the literal string cold, and freezing the literal string freezing.

Grouping

A language may be expressed in many different, yet equivalent grammars. Rules may be referenced in any order (even before they are defined) and grouped using parentheses (»(« and »)«).

For example, the Cold Beer language could also be represented by the following grammar:

start = ('cold' ('cold' | 'freezing')* 'beer' '.')+;

Discarding

The postfix ! operator can be used to discard a token which is not needed to compute a result.

Example:

addExpr = atom ('+'! atom)*;
atom = Number;

The + token will not be necessary to calculate the result of matched addition expressions, so we can discard it.

Actions

Actions are small pieces of Objective-C source code embedded directly in a PEGKit grammar rule. Actions are enclosed in curly braces and placed after any rule reference.

In any action, there is a self.assembly object available (of type PKAssembly) which serves as a stack (via the PUSH() and POP() convenience macros). The assembly's stack contains the most recently parsed tokens (instances of PKToken), and also serves as a place to store your work as you compute the result.

Actions are executed immediately after their preceding rule reference matches. So tokens which have recently been matched are available at the top of the assembly's stack.

Example 1:

// matches addition expressions like `1 + 3 + 4`
addExpr  = atom plusAtom*;

plusAtom = '+'! atom
{
    PUSH_DOUBLE(POP_DOUBLE() + POP_DOUBLE());
};

atom     = Number
{
    // pop the double value of token on the top of the stack
    // and push it back as a double value 
    PUSH_DOUBLE(POP_DOUBLE()); 
};

Example 2:

// matches or expressions like `foo or bar` or `foo || bar || baz`
orExpr = item (or item {
    id rhs = POP();
    id lhs = POP();
    MyOrNode *orNode = [MyOrNode nodeWithChildren:lhs, rhs];
    PUSH(orNode);
})*;
or    =  'or'! | '||'!;
item  = Word;

Rule Actions

  • @before - setup code goes here. executed before parsing of this rule begins.
  • @after - tear down code goes here. executed after parsing of this rule ends.

Rule actions are placed inside a rule -- after the rule name, but before the = sign.

Example:

// matches things like `-1` or `---1` or `--------1`

@extension { // this is a "Grammar Action". See below.
    @property (nonatomic) BOOL negative;
}

unaryExpr 
@before { _negative = NO; }
@after  {
    double d = POP_DOUBLE();
    d = (_negative) ? -d : d;
    PUSH_DOUBLE(d);
}
    = ('-'! { _negative = !_negative; })+ num;
num = Number;

Grammar Actions

PEGKit has a feature inspired by ANTLR called "Grammar Actions". Grammar Actions are a way to insert arbitrary code in various places in your parser's .h and .m files. They must be placed at the top of your grammar, before any rules are listed.

Here are all of the Grammar Actions currently available, along with a description of where their bodies are inserted in the source code of your generated parser:

In the .h file:
  • @h - top of .h file
  • @interface - inside the @interface portion of header
In the .m file:
  • @m - top of .m file
  • @extension - inside a private @interface MyParser () class extension in the .m file
  • @ivars - private ivars inside the @implementation MyParser {} in the .m file
  • @implementation - inside your parser's @implementation. A place for defining methods.
  • @init - inside your parser's init method
  • @dealloc - inside your parser's dealloc method if ARC is not enabled
  • @before - setup code goes here. executed before parsing begins.
  • @after - tear down code goes here. executed after parsing ends.

(Note that the @before and @after Grammar Actions listed here are distinct from the @before and @after Rule Actions which may be placed in each individual rule.)
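For example, a sketch of a grammar using several Grammar Actions (the imported header and the property are hypothetical placeholders):

@h {
    #import "MyNode.h"
}

@interface {
    @property (nonatomic, retain) NSMutableArray *nodes;
}

@init {
    self.nodes = [NSMutableArray array];
}

start = Word+;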

Semantic Predicates

Semantic Predicates are another feature lifted directly from ANTLR. Consider:

lowercaseWord = { islower([LS(1) characterAtIndex:0]) }? Word;

The Semantic Predicate part is the { ... }?. Like Grammar Actions, Semantic Predicates are small snippets of Objective-C code embedded directly in your grammar. These can be placed anywhere in your grammar rules. They should contain either a single expression or a series of statements ending in a return statement which evaluates to a boolean value. This one contains a single expression. If the expression evaluates to false, matching of the current rule (lowercaseWord in this case) will fail. A true value will allow matching to proceed.
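A predicate may also contain several statements ending in a return, for example (a hypothetical sketch):

keyword = {
    NSString *s = LS(1);
    return [s isEqualToString:@"if"] || [s isEqualToString:@"while"];
}? Word;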

There are also a number of convenience macros defined for your use in Predicates and Actions.

  • LS(num) will fetch a Lookahead String, and the argument specifies how far to look ahead. So LS(1) means look ahead by 1. In other words, "fetch the string value of the first upcoming token the parser is about to try to match".

  • MATCHES_IGNORE_CASE(str, regexPattern) is a convenience macro to do regex matches. It has a case-sensitive friend: MATCHES(str, regexPattern). The second argument is an NSString* regex pattern. Meaning should be obvious.

pegkit's People

Contributors

bryant1410, itod, sdsykes, yepher


pegkit's Issues

Should be able to remove delimiters from token stringValue

Tokens you get back include the delimiters. Generally, if it's a comment, you know it's delimited by /* */. Similarly, if it's a quoted string, you know it's delimited by ' or ". I think in the most common use cases, users will expect the tokens returned to be without the delimiters.

Right now the only way to do that is to do some additional parsing on the returned tokens, or use substrings or some similar means. I think the best way to do this will be in the state objects (PKQuoteState, PKCommentState, etc.). One way would be to have an includeDelimiter property that, when not set, does not include the delimiters in the stringValue.
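For example, until such a property exists, a caller could strip known quote delimiters with a substring (a sketch, not PEGKit API):

…
NSString *s = tok.stringValue;
if (tok.isQuotedString && [s length] >= 2) {
    // drop the leading and trailing delimiter characters
    s = [s substringWithRange:NSMakeRange(1, [s length] - 2)];
}
…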

Use of exceptions for flow control kills performance

Title says it all; profiling a relatively simple parser generated by PEGKit/ParserGenApp shows that it spends ~42% of its time doing exception processing. Even after removing all the string formatting of the exception messages, it's still spending ~25% of its time in objc_exception_throw.

I can imagine why it was done this way -- exceptions provide an easy way to unwind arbitrarily deep call stacks, which I suspect is a significant benefit/short cut in the context of a recursive descent parser -- but the performance impact is pretty significant.

Infinite recursion in -[PGParserGenVisitor lookaheadSetForNode:]

I made a direct translation of the XPath 1.0 EBNF rules to a PEGKit grammar. The issue that I am seeing is that the ParserGenApp is infinitely recursing in -[PGParserGenVisitor lookaheadSetForNode:] when I try to generate the parser.

Here is a reduction of the grammar which causes infinite recursion:

locationPath
    = relativeLocationPath
    ;

relativeLocationPath
    = step
    | relativeLocationPath '/' step
    | relativeLocationPath '//' step
    ;

step
    = '.'
    | '..'
    ;

I noticed that the Panthro XPath grammar has a comment at the top about changing the relativeLocationPath production. This fixes one infinite recursion issue, but there is another, seemingly being triggered by the filterExpr production:

filterExpr
    = primaryExpr
    | filterExpr predicate
    ;
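(Not from the original report: both relativeLocationPath and filterExpr are left-recursive, which a recursive descent generator cannot expand without looping. A standard rewrite replaces the left recursion with repetition, roughly:)

relativeLocationPath
    = step (('/' | '//') step)*
    ;

filterExpr
    = primaryExpr predicate*
    ;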

And when I print the PGBaseNodes that reach the default case in the switch block within -[PGParserGenVisitor lookaheadSetForNode:], I get:

...
(. '//' #relativeLocationPath)
($filterExpr (| #primaryExpr (. #filterExpr #predicate)))
($primaryExpr (| #variableReference (. '(' #expr ')') QuotedString Number #functionCall))
($variableReference (. '$' Word))
(. '$' Word)
(. '(' #expr ')')
($functionCall (. #functionName '(' (? (. #argument (* (. ',' #argument)))) ')'))
(. #functionName '(' (? (. #argument (* (. ',' #argument)))) ')')
($functionName Word)
(. #filterExpr #predicate)
($filterExpr (| #primaryExpr (. #filterExpr #predicate)))
($primaryExpr (| #variableReference (. '(' #expr ')') QuotedString Number #functionCall))
($variableReference (. '$' Word))
(. '$' Word)
(. '(' #expr ')')
($functionCall (. #functionName '(' (? (. #argument (* (. ',' #argument)))) ')'))
(. #functionName '(' (? (. #argument (* (. ',' #argument)))) ')')
($functionName Word)
(. #filterExpr #predicate)
($filterExpr (| #primaryExpr (. #filterExpr #predicate)))
...

Grammar error reports wrong line number

When there is a syntax error in ParserGenApp the app reports the wrong line number. I think it is off by the number of @ blocks.

This is an example where an x is added after the ; on a line to generate an error. The alert box shows the error as being on line 5, which is not correct.


Generated Parser will crash in execute: without return value

New to Parse/PegKit so I may be missing something obvious. Absolutely loving it though...

I was experiencing random errors while trying to update the actions from the MiniMath example. For example, even adding something like: NSNumber *numA = @5; was causing it to have an EXC_BAD_ACCESS in the [self execute:^{ line.

Here is the code generated by Parser Gen App:

- (void)multExpr_ {

    [self primary_];
    while ([self speculate:^{ [self match:MINIMATH_TOKEN_KIND_STAR discard:YES]; [self primary_]; }]) {
        [self match:MINIMATH_TOKEN_KIND_STAR discard:YES];
        [self primary_];
        [self execute:(id)^{
            NSNumber *numA = @5;
            PUSH_DOUBLE(POP_DOUBLE() * POP_DOUBLE());
        }];
    }
}

Apparently, the execute block is expecting a return value, and without one, it is just getting whatever happens to be in memory. Adding "return nil;" to the block fixed the problem.

If I haven't missed something obvious, you might want to update the example to include the necessary returns (though I am not actually sure what is being done with the value) or remove the need for them.
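For reference, the workaround described above applied to the generated block looks roughly like this (a sketch):

[self execute:(id)^{
    NSNumber *numA = @5;
    PUSH_DOUBLE(POP_DOUBLE() * POP_DOUBLE());
    return nil; // the block apparently expects a return value; returning nil avoids the crash
}];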

Using Quote as Symbol won't compile

When I turn off QuotedStrings and try to use " as a symbol, ParserGenApp generates the following lines which fail to compile:

self.tokenKindTab[@"""] = @(ACTION_TOKEN_KIND_QUOTE);
self.tokenKindNameTab[ACTION_TOKEN_KIND_QUOTE] = @""";

the lines should be (it just needs a special case for quotes):
self.tokenKindTab[@"\""] = @(ACTION_TOKEN_KIND_QUOTE);
self.tokenKindNameTab[ACTION_TOKEN_KIND_QUOTE] = @"\"";

As a stopgap, I added a method to my version of PGTokenKindDescriptor:

-(NSString*)stringValue {
    if ([_stringValue isEqualToString:@"\""]) {
        return @"\\\"";
    }
    return _stringValue;
}

(Note: I am still using 0.3.5, so please forgive me if this has already been fixed.)

Syntax Question

I've been reading here to try and decode how to create a grammar to parse some text.

I see here that you use ~ but I can't quite figure out what it is for. I think it means take the entire string up to <.

Here is an example of what I am trying to parse:

title: This is a tile of this document

participant name as "Some friendly name"
participant name1 as "Some friendly name1"

The keyword title has a string that is terminated by \n.

The keyword participant has two parts, where the second is optional.

Do you have any documentation that defines the built in types like QuotedString and what the ~ is used for?

Using 'S' in a definition will produce 'TOKEN_KIND_BUILTIN_S' instead of 'TOKEN_KIND_BUILTIN_WHITESPACE'.

Another one for you with a test grammar.

@before {
    PKTokenizer *t = self.tokenizer;

    // whitespace
    self.silentlyConsumesWhitespace = NO;
    t.whitespaceState.reportsWhitespaceTokens = YES;

    // NOTE: matched `S` (i.e. whitespace) tokens will never be preserved by this parser's assembly, unless you turn on the `preservesWhitespaceTokens` below
    // So by default, it is as if all `S` references were actually defined as `S!`. Not sure I still like this default, but that's how it is for now.
    //self.assembly.preservesWhitespaceTokens = YES;
}

lines = id+;
id = Word | S;

Save deletes undo history in ParserGenApp

BUG

When clicking File->Save or Generate in ParserGenApp, the undo history is lost.

This makes it hard to quickly test a new grammar entry and then rollback to pre-edit condition.

The app should maintain undo history while it is running in the same document.

Some research notes on issue

  • NSTextView is bound to grammar as KVO in interface builder
  • NSTextView editable is bound to busy as KVO in interface builder
  • When saving the doc, the caret jumps to the top of the doc
  • When saving, NSTextView loses focus
  • Disable busy logic keeps undo history after generate but still broken for File->Save
- (void)done {
    ...
//    self.busy = NO;
//    [self focusTextView];
    ...
}

- (IBAction)generate:(id)sender {
    ...
   //self.busy = YES;
    ...
}

Smart Quotes and Dashes

Since Mavericks, the settings in Interface Builder for NSTextView's Smart Quotes and Smart Dashes do not work. They must be disabled in code.
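For example, a minimal sketch of disabling both substitutions in code (assuming an NSTextView outlet named textView):

textView.automaticQuoteSubstitutionEnabled = NO;
textView.automaticDashSubstitutionEnabled = NO;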

This video demonstrates the NSTextView changing from one quote style to the smart quote style:


Command Line App

This is a feature request.

It would be nice if there were a command line app similar to ParserGenApp. This way Xcode could be configured to automatically build the grammar as part of the compile phase.

This might be a way it would work:

pegkit --grammar /path/to/grammar/input.grammar --output-path /path/where/to/generate/source --output-class MyClassName

Argument       Description
grammar        The input grammar file
output-path    The path where class files will be generated
output-class   The name of the class to generate. The extensions .m and .h will be appended to this name.

The CLI tool would be added to a build phase in an Xcode project before the compile step.


Binary is missing Template dependency

The output app ParserGenApp is missing the template library. Running the app outside of Xcode or double-clicking a saved file will cause the app to crash. This is the crash generated when run outside of Xcode:

Dyld Error Message:
  Library not loaded: @rpath/TDTemplateEngine.framework/Versions/A/TDTemplateEngine
  Referenced from: /Users/USER/Library/Developer/Xcode/DerivedData/PEGKit-evbpxkuihrhytecvnigwxryrwvcu/Build/Products/Debug/ParserGenApp.app/Contents/MacOS/ParserGenApp
  Reason: image not found

Full Output

Process:               ParserGenApp [62015]
Path:                  /Users/USER/Library/Developer/Xcode/DerivedData/PEGKit-evbpxkuihrhytecvnigwxryrwvcu/Build/Products/Debug/ParserGenApp.app/Contents/MacOS/ParserGenApp
Identifier:            com.parsekit.ParserGenApp
Version:               ???
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           ParserGenApp [62015]
User ID:               1489

Date/Time:             2016-11-21 07:51:13.772 -0600
OS Version:            Mac OS X 10.11.6 (15G1108)
Report Version:        11
Anonymous UUID:        0D496D28-74B4-26D0-DF03-1BFED4ABE361

Sleep/Wake UUID:       1CEA6906-4E8A-400D-ABC9-28B1E7BD641F

Time Awake Since Boot: 290000 seconds
Time Since Wake:       4500 seconds

System Integrity Protection: enabled

Crashed Thread:        0

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000002, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
dyld: launch, loading dependent libraries

Dyld Error Message:
  Library not loaded: @rpath/TDTemplateEngine.framework/Versions/A/TDTemplateEngine
  Referenced from: /Users/USER/Library/Developer/Xcode/DerivedData/PEGKit-evbpxkuihrhytecvnigwxryrwvcu/Build/Products/Debug/ParserGenApp.app/Contents/MacOS/ParserGenApp
  Reason: image not found

Binary Images:
    0x7fff6de80000 -     0x7fff6deb7a47  dyld (360.22) <DC81CC9D-651A-3A45-8809-928282052BD3> /usr/lib/dyld
    0x7fff8a022000 -     0x7fff8a22ffff  libicucore.A.dylib (551.51.4) <3899B146-3840-3D4A-8C4A-FE391D5D25C7> /usr/lib/libicucore.A.dylib


ParserGenApp outputs wrong code.

Given a rule:

atom = '('? Number | Word;

We get the generated code:

- (void)__atom {

    if ([self predicts:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN, 0]) {
        if ([self predicts:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN, 0]) {
            [self match:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN discard:NO]; 
        }
        [self matchNumber:NO]; 
    } else if ([self predicts:TOKEN_KIND_BUILTIN_WORD, 0]) {
        [self matchWord:NO]; 
    } else {
        [self raise:@"No viable alternative found in rule 'atom'."];
    }

    [self fireDelegateSelector:@selector(parser:didMatchAtom:)];
}

The first if should be:

    if ([self predicts:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN, TOKEN_KIND_BUILTIN_NUMBER, 0]) {
        if ([self predicts:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN, 0]) {
            [self match:EXPRESSIONPARSER_TOKEN_KIND_OPEN_PAREN discard:NO]; 
        }

'{ MATCHES(@"\\n", LS(1)) }? S' will generate '[self matchS:NO];' instead of '[self matchWhitespace:NO];'

This sample grammar demonstrates the issue where ParserGenApp creates invalid code. My eol rule is an attempt to get PEGKit to match newline characters. Maybe there's an easier way?

  program
  @before {
    PKTokenizer *t = self.tokenizer;

    // whitespace
    self.silentlyConsumesWhitespace = NO;
    t.whitespaceState.reportsWhitespaceTokens = YES;
  //  self.assembly.preservesWhitespaceTokens = YES;

    [t.symbolState add:@"\\n"];

    // setup comments
    t.commentState.reportsCommentTokens = YES;
    [t.commentState addSingleLineStartMarker:@"//"];
    [t.commentState addMultiLineStartMarker:@"/*" endMarker:@"*/"];
  }
    = eol;

  eol
    = { MATCHES(@"\\n", LS(1)) }? S
    ;

Curly Brace in action causes parse error

With some rules like:

name = Word;

signedNumber = ('+' Number){ } | ('-' Number){
PUSH_INT(POP_INT()*-1);
};

typeNumber = ( '(' signedNumber (','! signedNumber)? ')' )
{
NSMutableArray* typeNumb = [[NSMutableArray alloc] init];
do{
[typeNumb addObject:POP()];
}while([POP_STR() isEqualToString:@"("]);
PUSH(typeNumb);
}

        | Empty
        {
            PUSH(@[]);
        };

This causes a parse error on the rule typeNumber. I've narrowed it down to the do{ } while(); part of it;
if you remove this do-while, it all works.

Is this not allowed in an action, or is there a better way to do this? With the do-while removed, it parses and compiles just fine.

Thanks

Full Grammar:

createTableStmt = 'CREATE'! tempOpt 'TABLE'! existsOpt createTableName '('! columns tableConstraint ')'! rowIDOpt (';'!)?
{
CreateTableStatement* newStatement = [[CreateTableStatement alloc] init];
newStatement.rowID = POP_BOOL();
PUSH(newStatement);
};

databaseName = Word;

tableName = Word;

literalValue = Number
| QuotedString
| 'NULL'
| 'CURRENT_TIME'
| 'CURRENT_DATE'
| 'CURRENT_TIMESTAMP';

tableConstraint = ',' constraintName (primaryTableConstraint | uniqueTableConstraint | foreignTableConstraint);

indexColumns = columnNameList;

primaryTableConstraint = 'PRIMARY'! 'KEY'! indexColumns conflictClause;

uniqueTableConstraint = 'UNIQUE' indexColumns conflictClause;

foreignTableConstraint = 'FOREIGN' 'KEY' columnNameList foreignKeyClause;

tempOpt = ('TEMP'! | 'TEMPORARY'!)
{
PUSH(@0);
}
| Empty
{
NSLog(@"Empty! Tmp Option");
PUSH(@(0));
}
;

existsOpt = ('IF'! 'NOT'! 'EXISTS'!)
{
PUSH(@(1));
}
| Empty{
PUSH(@(0));
}
;

rowIDOpt = ( 'WITHOUT'! 'ROWID'! )
{
PUSH(@(1));
}
| Empty
{
PUSH(@(1));
}
;

createTableName = ( databaseName '.'! )? tableName;

columnName = Word;

name = Word;

typeNumber = ( '(' signedNumber (','! signedNumber)? ')' )
{
NSMutableArray* typeNumb = [[NSMutableArray alloc] init];
do{
[typeNumb addObject:POP()];
}while([POP_STR() isEqualToString:@"("]);
PUSH(typeNumb);
}
| Empty
{
PUSH(@[]);
};

typeName = (name typeNumber)
{
NSMutableArray* typeNumber = POP();
NSString* name = POP();
TypeName* typeName = [[TypeName alloc] initWithName:name andTypeNumber:typeNumber];
PUSH(typeName);
}
| Empty{
PUSH(@[]);
};

signedNumber = ('+' Number){ } | ('-' Number){
PUSH_INT(POP_INT()*-1);
};

columns = columnDef (',' columnDef )* ;

columnDef = columnName typeName columnConstraint*;

constraintName = ('CONSTRAINT'! name) | Empty;

primaryKeyConstraint = 'PRIMARY' 'KEY' ('ASC' | 'DESC')? conflictClause 'AUTOINCREMENT'? ;

notNullConstraint = 'NOT'! 'NULL'! conflictClause{
PUSH([[NotNullConstraint alloc] initWithConflictClause:POP()]);
};

uniqueConstraint = 'UNIQUE' conflictClause;

defaultValueConstraint = 'DEFAULT' (signedNumber | literalValue);

collationName = name;

collateConstraint = 'COLLATE' collationName;

columnConstraint = constraintName (
primaryKeyConstraint
| notNullConstraint
| uniqueConstraint
| defaultValueConstraint
| collateConstraint
| foreignKeyClause
);

rollBackResolution = 'ROLLBACK'! {
PUSH_INT(0);
};

abortResolution = 'ABORT!'
{
PUSH_INT(1);
};

failResolution = 'FAIL'!
{
PUSH_INT(2);
};

ignoreResolution = 'IGNORE'!
{
PUSH_INT(3);
};

replaceResolution = 'REPLACE'!
{
PUSH_INT(3);
};

conflictClause = ('ON' 'CONFLICT' ( rollBackResolution | abortResolution | failResolution | ignoreResolution | replaceResolution))
{
PUSH([[ConflictClause alloc] initWithResolutionType:POP_INT()]);
}
| Empty
{
PUSH([[ConflictClause alloc] initWithResolutionType:-1]);
}

                ;   

foreignTable = tableName;

columnNameList = '(' columnName (',' columnName)* ')';

foreignKeyClause = 'REFERENCES' foreignTable (columnNameList)?
(
'ON' ('DELETE' | 'UPDATE') ('SET' 'NULL' | 'SET' 'DEFAULT' | 'CASCADE' | 'RESTRICT' | 'NO' 'ACTION')
| 'MATCH' name
)*
(
'NOT'? 'DEFERRABLE' ('INITIALLY' 'DEFERRED' | 'INITIALLY' 'IMMEDIATE')?
)?;

Cannot compile in 32 bits

I downloaded the current master (as of 7 August 2017) and it compiles nicely in 64 bits in Xcode 8.2.1. However, I could not compile any OS X target in 32 bits, which I need. The compiler complains that implicit Ivars (?) like _string or _literal are not declared.

How can I compile the targets in 32 bits?

-[PKReader setString:] conflict terminates app due to Assertion failure

Hi,
some designated initializer(s) call setString: and also setStream:,
more than once it seems, and in a way that makes NSAssert(!_stream, @"")
fail in PKReader.

To minimize the problem description, I had ParserGenApp work on the following grammar, using defaults, and produce AParser.[hm]:
following grammar, using defaults, and produce AParser.[hm]:

  a = 'A'
    ;

Then, I ran this command line utility:

#import <Foundation/Foundation.h>
#import <PEGKit/PEGKit.h>
#import "AParser.h"

void startParsing(NSInputStream* input) {
    NSError* report;
    PKParser* simplyA = [[AParser alloc] initWithDelegate:nil];
    [simplyA parseStream:input error:&report];
}

int main(int argc, const char * argv[]) {
    NSData* const pseudoFile = [@"A" dataUsingEncoding:NSUTF8StringEncoding];
    
    @autoreleasepool {
        NSInputStream* input = [NSInputStream inputStreamWithData:pseudoFile];
        startParsing(input);
    }
    return 0;
}

which terminated with a backtrace the second time setString: was called:

  ...
  4   PEGKit  0x0000000100013d1c -[PKReader setString:] + 204
  ...

Following call sequences with breakpoints set at -[PKReader setString:] and
at -[PKReader setStream:] reveals that a stream is set (as expected), and then,
later, another setString:nil effectively asserts the stream to be nil, which it isn't.

Is the above program using the PK-types properly?

Random hang on startup

This may very well be something I've done, but I'm not sure how to diagnose it.

Randomly, my app will hang on startup, always at the location in the attached screen shot. I'm using PEGKit on a background thread during startup, so my thought at the moment is that there is some sort of lock kicking in. I'm just not sure what and where.


xctest crashes when running unit tests

When I run the unit tests on both Xcode 5.0.2 or 5.1 (OS X 10.8.5) xctest crashes:

*** NSTask: Task create for path '**absolute-path**/pegkit/build/PEGKit/Build/Products/Debug/PEGKitTests.xctest/Contents/MacOS/PEGKitTests' failed: 22, "Invalid argument".  Terminating temporary process.

This seems to be because the binary at that location is bad in some way. I tried to get diagnostic info using otool -L and lipo. The former tells me

 Reference table (0 entries)

and the latter

Non-fat file: **absolute-path**/pegkit/build/PEGKit/Build/Products/Debug/PEGKitTests.xctest/Contents/MacOS/PEGKitTests is architecture: x86_64

There are no special characters in the path.

Any ideas?

Details:

Code Type:       X86-64 (Native)
Parent Process:  xctest [16443]
User ID:         501

Date/Time:       2014-03-25 16:19:41.195 +0100
OS Version:      Mac OS X 10.8.5 (12F37)
Report Version:  10

Interval Since Last Report:          5193103 sec
Crashes Since Last Report:           1431
Per-App Crashes Since Last Report:   2
Anonymous UUID:                      B2CABE9D-3A4E-072F-3902-5DAE1C82531D

Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Exception Type:  EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000002, 0x0000000000000000

Application Specific Information:
*** NSTask: Task create for path '**absolute-path**/pegkit/build/PEGKit/Build/Products/Debug/PEGKitTests.xctest/Contents/MacOS/PEGKitTests' failed: 22, "Invalid argument".  Terminating temporary process.
*** multi-threaded process forked ***

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   com.apple.Foundation            0x00007fff931e9308 ___NEW_PROCESS_COULD_NOT_BE_EXECD___ + 5
1   com.apple.Foundation            0x00007fff930c2861 -[NSConcreteTask launchWithDictionary:] + 3544
2   com.apple.Foundation            0x00007fff930c133d +[NSTask launchedTaskWithLaunchPath:arguments:] + 205
3   xctest                          0x0000000100001385 0x100000000 + 4997
4   xctest                          0x00000001000014ae 0x100000000 + 5294
5   xctest                          0x0000000100000e53 0x100000000 + 3667
6   libdyld.dylib                   0x00007fff9219d7e1 start + 1

Failed to match input token

  1. Downloaded Zip of PegKit
  2. Downloaded Zip of templateengine
  3. Opened the workspace of PegKit and added the templateengine project to that workspace
  4. Ran the ParserGenApp target and generated code for the minmath grammar successfully
  5. Ran the ParserGenApp target and generated code for the mysql grammar but got an error

Failed to match next input token: Line : 3
Near : @ singleLineComments
Expected : Word

Found : @

Any reason why?
The mysql grammar works if the code is downloaded from http://itod.github.io/PEGKitMiniMathTutorial/

Using throws to indicate end of parse

Hi

During some debugging of my code I activated a generate exception symbolic breakpoint and noticed that PEGKit appears to be using exceptions internally to indicate that it's failed to match part of the grammar. Specifically in

-[PKParser match:(NSInteger)tokenKind discard:(BOOL)discard]

In my code it appears to be failing to match a key path name, which is correct, as the string I'm parsing doesn't contain one. I don't get any exceptions thrown to my code, and tracing back through the stack I can see a try-catch which will kick in.

I've always understood that it's better to populate NSError * pointers because it's expensive to throw exceptions, so I'm wondering if a code improvement might be to replace the usage of throws on match failures with NSError* pointers. Then in higher parts of the code, it can decide if there is an actual error and throw it, or that it's just that the grammar is not matching and it should proceed to the next match attempt.

Alternatively it could be that my grammar is not as good as I think and needs to be adjusted so the internal exceptions are not generated.

ParserGenApp Newline issues

I've been having a lot of issues with how the newline character is handled inside ParserGenApp.

Anytime I directly try to create/edit a grammar from the NSTextView it breaks completely.

This case was after typing the following line

test = “Hello”;

2015-04-11 18:39:00.260 ParserGenApp[11654:1662437] *** Assertion failure in -[PKSymbolState nextTokenizerStateFor:tokenizer:], /Users/anthonyl/source/pegkit/src/PKTokenizerState.m:106
2015-04-11 18:39:00.260 ParserGenApp[11654:1662437] Invalid parameter not satisfying: c < STATE_COUNT
2015-04-11 18:40:26.656 ParserGenApp[11654:1662437] Failed to match next input token: Expected : «EOF»

Exceptions should not be used in Objective C

The use of @throw is problematic. It prevents you from usefully setting breakpoints on Objective-C exceptions because PKParser will throw many exceptions as part of its normal operation. A developer must choose between not breaking on exceptions that are actually errors or hitting continue many times during parsing.

Fails to parse 2nd time with memoization

With the memoization option on, the parser will work correctly the first time, but fail on the second attempt (even on the same string).

It seems that it is generating a _clearMemo method to clear the memoization, but it is calling the clearMemo method (no leading underscore) which is empty, so the actual cleanup code never gets called.

Manually changing the _clearMemo method to clearMemo fixes the problem (as does calling [self _clearMemo] from clearMemo).
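For example, the second workaround amounts to forwarding from the empty generated method (a sketch):

- (void)clearMemo {
    // forward to the generated method so the memo caches are actually cleared
    [self _clearMemo];
}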

Thanks,
Jon

[docs] typo in link

Minor feedback: the README contains a typo in the link to the Grammars section: github.com/itod/pegkit#grammars is misspelled as github.com/itod/pegkit#garmmars.

Adding workspace to Xcode project or other workspace does not allow for target dependency

I saw you updated to use a workspace and that I should include that. I dragged it into an Xcode project and it doesn't give me any targets or anything to link to. Is that the correct description, or should it say to use a workspace and add the projects in that one to the workspace? I am unable to build by dragging in the workspace. Any documentation or examples would be very helpful.

Redefinition of 'ACTION_ACTION_TOKEN_KIND___'

I could be doing something wrong, but adding certain symbols (e.g. '?(' '≤' '≥' '≠' ) to the symbolState in the grammar causes it to generate code which will not compile because of a "Redefinition of enumerator 'ACTION_ACTION_TOKEN_KIND___'" error.

start @before{
PKTokenizer *t = self.tokenizer;
[t.symbolState add:@"<="];
[t.symbolState add:@">="];
[t.symbolState add:@"!="];
//[t.symbolState add:@"≤"];
//[t.symbolState add:@"≥"];
[t.symbolState add:@"||"];
[t.symbolState add:@"&&"];
[t.symbolState add:@"?("];
[t.symbolState add:@"):"];
} = boolExpr | mathExpr;

Assertion failure in - (PKTokenizerState *)nextTokenizerStateFor:(PKUniChar)c tokenizer:(PKTokenizer *)t

The first line in this method is:
NSParameterAssert(c < STATE_COUNT);

Since 'c' is an int32_t value holding a Unicode character and STATE_COUNT is 256, this line is easily triggered when parsing strings that contain characters that have higher Unicode values. I think that this line is an error, and commenting it out solves the issue for me. Can you explain why this line is important and how I can avoid triggering it? If it represents a bug, is there any possibility for a new release that fixes the problem?

Assertion Failure in ParserGenApp with supplied grammar

I have a grammar that I translated from:
http://www.sqlite.org/docsrc/doc/trunk/art/syntax/all-bnf.html

It is an extremely rough draft of the grammar but is complete. I took what was in the above site and converted it into an ANTLR-valid grammar, and I was able to generate parsing code from that grammar, so I know that the syntax and rule definitions are correct and syntactically valid. Other subtle gotchas are sure to abound. But this is not a request to fix the grammar but to provide it as an example of something that will hard crash ParserGenApp.

Again, this is not asking for a grammar fix or anything like that (but you're welcome to as well); it is just a data set that crashes the app hard.

Thanks for the great tool!
If there is a better way to provide these types of bugs/issues let me know

parsegen grammar

//******************************************************************************
sqlstmtlist = sqlstmt  ( ';' ( sqlstmt )? )*;

databasename =  Word;
tablename = Word;
newtablename =  tablename;

tableorindexname = tablename;   

savepointname   =Word;
columnname  =Word;
indexname   =Word;
collationname   =Word;
name        =Word;
foreigntable    =Word;
triggername =Word;  
viewname    =Word;
modulename  =Word;
moduleargument  =Word;
initialselect   =Word;
recursiveselect =Word;
errormessage=Word;  
pragmaname= Word;   
columnalias=Word;
tablealias=Word;
functionname    =Word;

explainOption  =    ( 'EXPLAIN' ( 'QUERY' 'PLAN' )? )?;

sqlstmt =   explainOption actionStatement;

actionStatement 
        =  altertablestmt 
        | analyzestmt
        | attachstmt 
        | beginstmt 
        | commitstmt 
        | createindexstmt 
        | createtablestmt 
        | createtriggerstmt 
        | createviewstmt 
        | createvirtualtablestmt
        | deletestmt 
        | detachstmt 
        | dropindexstmt 
        | droptablestmt 
        | droptriggerstmt 
        | dropviewstmt 
        | insertstmt 
        | pragmastmt 
        | reindexstmt 
        | releasestmt 
        | rollbackstmt 
        | savepointstmt 
        | selectstmt 
        | updatestmt 
        | vacuumstmt 
        ;

altertablestmt  =   'ALTER' 'TABLE' ( databasename '.' )? tablename ( 'RENAME' 'TO' newtablename | 'ADD' ( 'COLUMN' )? columndef );

analyzestmt =   'ANALYZE' ( databasename | tableorindexname | databasename '.' tableorindexname )?;


attachstmt  =   'ATTACH' ( 'DATABASE' )? expr 'AS' databasename;


beginstmt   =   'BEGIN' ( 'DEFERRED' | 'IMMEDIATE' | 'EXCLUSIVE' )? ( 'TRANSACTION' )?;


commitstmt  =   ( 'COMMIT' | 'END' ) ( 'TRANSACTION' )?;


rollbackstmt    =   'ROLLBACK' ( 'TRANSACTION' )? ( 'TO' ( 'SAVEPOINT' )? savepointname )?;


savepointstmt   =   'SAVEPOINT' savepointname;


releasestmt =   'RELEASE' ( 'SAVEPOINT' )? savepointname;


createindexstmt = 'CREATE' ( 'UNIQUE' )? 'INDEX' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? indexname 'ON' tablename '(' indexedcolumn ( ',' indexedcolumn )* ')' ( 'WHERE' expr )?;


indexedcolumn   =   columnname ( 'COLLATE' collationname )? ( 'ASC' | 'DESC' )?;


createtablestmt = 'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'TABLE' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? tablename ( '(' columndef ( ',' columndef )* ( ',' tableconstraint )* ')' ( 'WITHOUT' 'ROWWord' )? | 'AS' selectstmt );


columndef   =   columnname ( typename )? ( columnconstraint )*;

typename    =   name ( '(' signednumber ')' | '(' signednumber ',' signednumber ')' )?;

columnconstraint    =   ( 'CONSTRAINT' name )? ( 'PRIMARY' 'KEY' ( 'ASC' | 'DESC' )? conflictclause ( 'AUTOINCREMENT' )? | 'NOT' 'NULL' conflictclause | 'UNIQUE' conflictclause | 'CHECK' '(' expr ')' | 'DEFAULT' ( signednumber | literalvalue | '(' expr ')' ) | 'COLLATE' collationname | foreignkeyclause );

signednumber    =   ( '+' | '-' )? numericliteral;

tableconstraint =   ( 'CONSTRAINT' name )? ( ( 'PRIMARY' 'KEY' | 'UNIQUE' ) '(' indexedcolumn ( ',' indexedcolumn )* ')' conflictclause | 'CHECK' '(' expr ')' | 'FOREIGN' 'KEY' '(' columnname ( ',' columnname )* ')' foreignkeyclause );

foreignkeyclause    =   'REFERENCES' foreigntable ( '(' columnname ( ',' columnname )* ')' )?
                        ( ( 'ON' ( 'DELETE' | 'UPDATE' ) ( 'SET' 'NULL' | 'SET' 'DEFAULT' | 'CASCADE' | 'RESTRICT' | 'NO' 'ACTION' ) | 'MATCH' name ) )?
                        ( ( 'NOT' )? 'DEFERRABLE' ( 'INITIALLY' 'DEFERRED' | 'INITIALLY' 'IMMEDIATE' )? )?;

conflictclause  =   ( 'ON' 'CONFLICT' ( 'ROLLBACK' | 'ABORT' | 'FAIL' | 'IGNORE' | 'REPLACE' ) )?;

createtriggerstmt   =   'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'TRIGGER' ( 'IF' 'NOT' 'EXISTS' )?
                        ( databasename '.' )? triggername ( 'BEFORE' | 'AFTER' | 'INSTEAD' 'OF' )?
                        ( 'DELETE' | 'INSERT' | 'UPDATE' ( 'OF' columnname ( ',' columnname )* )? ) 'ON' tablename
                        ( 'FOR' 'EACH' 'ROW' )? ( 'WHEN' expr )?
                        'BEGIN' ( updatestmt | insertstmt | deletestmt | selectstmt ) ';' 'END';

createviewstmt  = 'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'VIEW' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? viewname 'AS' selectstmt;

createvirtualtablestmt  = 'CREATE' 'VIRTUAL' 'TABLE' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? tablename 'USING' modulename ( '(' moduleargument ( ',' moduleargument )* ')' )?;

withclause  =   'WITH' ( 'RECURSIVE' )? ctetablename 'AS' '(' selectstmt ')' ( ',' ctetablename 'AS' '(' selectstmt ')' )*;


ctetablename    =   tablename ( '(' columnname ( ',' columnname )* ')' )?;

recursivecte    =   ctetablename 'AS' '(' initialselect ( 'UNION' | 'UNION' 'ALL' ) recursiveselect ')';

commontableexpression   =   tablename ( '(' columnname ( ',' columnname )* ')' )? 'AS' '(' selectstmt ')';


deletestmt  =   ( withclause )? 'DELETE' 'FROM' qualifiedtablename ( 'WHERE' expr )? limitedDeleteStatement?;

limitedDeleteStatement  = ( ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )? 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?;

detachstmt  =   'DETACH' ( 'DATABASE' )? databasename;

dropindexstmt   =   'DROP' 'INDEX' ( 'IF' 'EXISTS' )? ( databasename '.' )? indexname;

droptablestmt   =   'DROP' 'TABLE' ( 'IF' 'EXISTS' )? ( databasename '.' )? tablename;

droptriggerstmt =   'DROP' 'TRIGGER' ( 'IF' 'EXISTS' )? ( databasename '.' )? triggername;

dropviewstmt    =   'DROP' 'VIEW' ( 'IF' 'EXISTS' )? ( databasename '.' )? viewname;
bindparameter   = ('?'Word? | '='Word);
unaryoperator  =    '-'|    '+'  |  '~' |  ' NOT';
binaryoperator =    '||' | '*' | '/' | '%' | '+' | '-' | '<<' | '>>' | '&' | '|' | '<' | '<=' | '>' | '>=' | '=' | '==' | '!=' | '<>' | 'IS' | ( 'IS' 'NOT') | 'IN' |  'LIKE'  | 'GLOB' | 'MATCH' | 'REGEXP' | 'AND' | 'OR';

expr        = requiredExp optionalExp*;
requiredExp =   (literalvalue 
        | bindparameter 
        | ( ( databasename '.' )? tablename '.' )? columnname 
        | unaryoperator expr 
        | functionname '(' ( ( 'DISTINCT' )? expr ( ',' expr )* | '*' )? ')' 
        | '(' expr ')' 
        | 'CAST' '(' expr 'AS' typename ')'
        | ( ( 'NOT' )? 'EXISTS' )? '(' selectstmt ')' 
        | 'CASE' ( expr )? 'WHEN' expr 'THEN' expr ( 'ELSE' expr )? 'END' 
        | raisefunction);
optionalExp =
        binaryoperator expr 
        | 'COLLATE' collationname 
        | ( 'NOT' )? ( 'LIKE' | 'GLOB' | 'REGEXP' | 'MATCH' ) expr ( 'ESCAPE' expr )? 
        | ( 'ISNULL' | 'NOTNULL' | 'NOT' 'NULL' ) 
        | 'IS' ( 'NOT' )? expr 
        | ( 'NOT' )? 'BETWEEN' expr 'AND' expr 
        | ( 'NOT' )? 'IN' ( '(' ( selectstmt | expr ( ',' expr )* )? ')' 
        | ( databasename '.' )? tablename )
        ;

raisefunction   =   'RAISE' '(' ( 'IGNORE' | ( 'ROLLBACK' | 'ABORT' | 'FAIL' ) ',' errormessage ) ')';

stringliteral  =  Word;

blobliteral  =('x'|'X')'\'' Word '\'';   

literalvalue    =   numericliteral | stringliteral| blobliteral | 'NULL' |  'CURRENT_TIME' |    'CURRENT_DATE' | 'CURRENT_TIMESTAMP';

digit   =   '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

decimalpoint = ('.'|',');

numericliteral  =   ( digit ( decimalpoint ( digit )* )? | decimalpoint digit ) ( 'E' ( '+' | '-' )? digit )?;

insertstmt  =   ( withclause )? ( 'INSERT' | 'REPLACE' | 'INSERT' 'OR' 'REPLACE' | 'INSERT' 'OR' 'ROLLBACK' | 'INSERT' 'OR' 'ABORT' | 'INSERT' 'OR' 'FAIL' | 'INSERT' 'OR' 'IGNORE' ) 'INTO' ( databasename '.' )? tablename ( '(' columnname ( ',' columnname )* ')' )? ( 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* | selectstmt | 'DEFAULT' 'VALUES' );

pragmastmt  =   'PRAGMA' ( databasename '.' )? pragmaname ( '=' pragmavalue | '(' pragmavalue ')' )?;

pragmavalue =   signednumber| name | stringliteral;

reindexstmt =   'REINDEX' ( collationname | ( databasename '.' )? ( tablename | indexname ) )?;

selectstmt  =   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                ( 'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )? | 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* ) ( compoundoperator ( 'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )? | 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* ) )*
                ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
            ;

joinclause  =   tableorsubquery ( joinoperator tableorsubquery joinconstraint )?;

selectcore  =   'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )?
            |   'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )*
            ;

factoredselectstmt  =   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( compoundoperator selectcore )*
                        ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

simpleselectstmt    =   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

compoundselectstmt  =   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( 'UNION' | 'UNION' 'ALL' | 'INTERSECT' | 'EXCEPT' ) selectcore
                        ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

tableorsubquery     =   ( databasename '.' )? tablename ( ( 'AS' )? tablealias )? ( 'INDEXED' 'BY' indexname | 'NOT' 'INDEXED' )?
                    |   '(' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) ')'
                    |   '(' selectstmt ')' ( ( 'AS' )? tablealias )?
                ;

resultcolumn        =   '*'
            |   tablename '.' '*'
            |   expr ( ( 'AS' )? columnalias )?
            ;

joinoperator        =   ','
                |   ( 'NATURAL' )? ( 'LEFT' ( 'OUTER' )? | 'INNER' | 'CROSS' )? 'JOIN'
                ;

joinconstraint      =   ( 'ON' expr | 'USING' '(' columnname ( ',' columnname )* ')' )?;

orderingterm        =   expr ( 'COLLATE' collationname )? ( 'ASC' | 'DESC' )?;

compoundoperator    =   'UNION'|'UNION' 'ALL'| 'INTERSECT'| 'EXCEPT';

updatestmt      =   ( withclause )? 'UPDATE' ( 'OR' 'ROLLBACK' | 'OR' 'ABORT' | 'OR' 'REPLACE' | 'OR' 'FAIL' | 'OR' 'IGNORE' )? qualifiedtablename
                'SET' columnname '=' expr ( ',' columnname '=' expr )* ( 'WHERE' expr )?   limitedupdate?   ;

limitedupdate   =( ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )? 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?;

qualifiedtablename  =   ( databasename '.' )? tablename ( 'INDEXED' 'BY' indexname | 'NOT' 'INDEXED' )?;

vacuumstmt  =   'VACUUM';

//commentsyntax =    ( anythingexceptnewline )* ( newline | endofinput )
//commentsyntax =   /* ( anythingexcept*/ )* ( */ | endofinput )

Here is the ANTLR grammar that works successfully. The PEGKit grammar above was derived from it by converting every ':' to '=' and every ID to Word (for example, the ANTLR rule databasename : ID; becomes databasename = Word; in PEGKit).

grammar sqlitecomplete;
options{backtrack=true;}
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;


start : sqlstmtlist;

databasename
    :   ID;
tablename
    :   ID;
newtablename :  tablename;

tableorindexname : tablename;   

savepointname   :ID;
columnname  :ID;
indexname   :ID;
collationname   :ID;
name        :ID;
foreigntable    :ID;
triggername :ID;    
viewname    :ID;
modulename  :ID;
moduleargument  :ID;
initialselect   :ID;
recursiveselect :ID;
errormessage:ID;    
pragmaname: ID; 
columnalias:ID;
tablealias:ID;
functionname    :ID;

sqlstmtlist : sqlstmt  ( ';' ( sqlstmt )? )*;
explainOption 
    :   ( 'EXPLAIN' ( 'QUERY' 'PLAN' )? )?;
sqlstmt :   explainOption actionStatement;
actionStatement 
        :  altertablestmt 
        | analyzestmt
        | attachstmt 
        | beginstmt 
        | commitstmt 
        | createindexstmt 
        | createtablestmt 
        | createtriggerstmt 
        | createviewstmt 
        | createvirtualtablestmt
        | deletestmt 
        | detachstmt 
        | dropindexstmt 
        | droptablestmt 
        | droptriggerstmt 
        | dropviewstmt 
        | insertstmt 
        | pragmastmt 
        | reindexstmt 
        | releasestmt 
        | rollbackstmt 
        | savepointstmt 
        | selectstmt 
        | updatestmt 
        | vacuumstmt 
        ;

altertablestmt  :   'ALTER' 'TABLE' ( databasename '.' )? tablename ( 'RENAME' 'TO' newtablename | 'ADD' ( 'COLUMN' )? columndef );

analyzestmt :   'ANALYZE' ( databasename | tableorindexname | databasename '.' tableorindexname )?;


attachstmt  :   'ATTACH' ( 'DATABASE' )? expr 'AS' databasename;


beginstmt   :   'BEGIN' ( 'DEFERRED' | 'IMMEDIATE' | 'EXCLUSIVE' )? ( 'TRANSACTION' )?;


commitstmt  :   ( 'COMMIT' | 'END' ) ( 'TRANSACTION' )?;


rollbackstmt    :   'ROLLBACK' ( 'TRANSACTION' )? ( 'TO' ( 'SAVEPOINT' )? savepointname )?;


savepointstmt   :   'SAVEPOINT' savepointname;


releasestmt :   'RELEASE' ( 'SAVEPOINT' )? savepointname;


createindexstmt : 'CREATE' ( 'UNIQUE' )? 'INDEX' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? indexname 'ON' tablename '(' indexedcolumn ( ',' indexedcolumn )* ')' ( 'WHERE' expr )?;


indexedcolumn   :   columnname ( 'COLLATE' collationname )? ( 'ASC' | 'DESC' )?;


createtablestmt : 'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'TABLE' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? tablename ( '(' columndef ( ',' columndef )* ( ',' tableconstraint )* ')' ( 'WITHOUT' 'ROWID' )? | 'AS' selectstmt );


columndef   :   columnname ( typename )? ( columnconstraint )*;

typename    :   name ( '(' signednumber ')' | '(' signednumber ',' signednumber ')' )?;

columnconstraint    :   ( 'CONSTRAINT' name )?
( 'PRIMARY' 'KEY' ( 'ASC' | 'DESC' )? conflictclause ( 'AUTOINCREMENT' )? | 'NOT' 'NULL' conflictclause | 'UNIQUE' conflictclause | 'CHECK' '(' expr ')' | 'DEFAULT' ( signednumber | literalvalue | '(' expr ')' ) | 'COLLATE' collationname | foreignkeyclause );

signednumber    :   ( '+' | '-' )? numericliteral;

tableconstraint :   ( 'CONSTRAINT' name )? ( ( 'PRIMARY' 'KEY' | 'UNIQUE' ) '(' indexedcolumn ( ',' indexedcolumn )* ')' conflictclause | 'CHECK' '(' expr ')' | 'FOREIGN' 'KEY' '(' columnname ( ',' columnname )* ')' foreignkeyclause );

foreignkeyclause    :   'REFERENCES' foreigntable ( '(' columnname ( ',' columnname )* ')' )?
( ( 'ON' ( 'DELETE' | 'UPDATE' ) ( 'SET' 'NULL' | 'SET' 'DEFAULT' | 'CASCADE' | 'RESTRICT' | 'NO' 'ACTION' ) | 'MATCH' name ) )?
( ( 'NOT' )? 'DEFERRABLE' ( 'INITIALLY' 'DEFERRED' | 'INITIALLY' 'IMMEDIATE' )? )?;

conflictclause  :   ( 'ON' 'CONFLICT' ( 'ROLLBACK' | 'ABORT' | 'FAIL' | 'IGNORE' | 'REPLACE' ) )?;

createtriggerstmt   : 'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'TRIGGER' ( 'IF' 'NOT' 'EXISTS' )?
( databasename '.' )? triggername ( 'BEFORE' | 'AFTER' | 'INSTEAD' 'OF' )?
( 'DELETE' | 'INSERT' | 'UPDATE' ( 'OF' columnname ( ',' columnname )* )? ) 'ON' tablename
( 'FOR' 'EACH' 'ROW' )? ( 'WHEN' expr )?
'BEGIN' ( updatestmt | insertstmt | deletestmt | selectstmt ) ';' 'END';

createviewstmt  : 'CREATE' ( 'TEMP' | 'TEMPORARY' )? 'VIEW' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? viewname 'AS' selectstmt;

createvirtualtablestmt  : 'CREATE' 'VIRTUAL' 'TABLE' ( 'IF' 'NOT' 'EXISTS' )? ( databasename '.' )? tablename 'USING' modulename ( '(' moduleargument ( ',' moduleargument )* ')' )?;

withclause  :   'WITH' ( 'RECURSIVE' )? ctetablename 'AS' '(' selectstmt ')' ( ',' ctetablename 'AS' '(' selectstmt ')' )*;


ctetablename    :   tablename ( '(' columnname ( ',' columnname )* ')' )?;

recursivecte    :   ctetablename 'AS' '(' initialselect ( 'UNION' | 'UNION' 'ALL' ) recursiveselect ')';

commontableexpression   :   tablename ( '(' columnname ( ',' columnname )* ')' )? 'AS' '(' selectstmt ')';


deletestmt  :   ( withclause )? 'DELETE' 'FROM' qualifiedtablename ( 'WHERE' expr )? limitedDeleteStatement?;

limitedDeleteStatement  : ( ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )? 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?;

detachstmt  :   'DETACH' ( 'DATABASE' )? databasename;

dropindexstmt   :   'DROP' 'INDEX' ( 'IF' 'EXISTS' )? ( databasename '.' )? indexname;

droptablestmt   :   'DROP' 'TABLE' ( 'IF' 'EXISTS' )? ( databasename '.' )? tablename;

droptriggerstmt :   'DROP' 'TRIGGER' ( 'IF' 'EXISTS' )? ( databasename '.' )? triggername;

dropviewstmt    :   'DROP' 'VIEW' ( 'IF' 'EXISTS' )? ( databasename '.' )? viewname;
bindparameter   : ('?'ID? | ':'ID);
unaryoperator  :    '-'|    '+'  |  '~' |  ' NOT';
binaryoperator :    '||' | '*' | '/' | '%' | '+' | '-' | '<<' | '>>' | '&' | '|' | '<' | '<=' | '>' | '>=' | '=' | '==' | '!=' | '<>' | 'IS' | ( 'IS' 'NOT') | 'IN' |  'LIKE'  | 'GLOB' | 'MATCH' | 'REGEXP' | 'AND' | 'OR';

expr        : requiredExp optionalExp*;
requiredExp :   (literalvalue 
        | bindparameter 
        | ( ( databasename '.' )? tablename '.' )? columnname 
        | unaryoperator expr 
        | functionname '(' ( ( 'DISTINCT' )? expr ( ',' expr )* | '*' )? ')' 
        | '(' expr ')' 
        | 'CAST' '(' expr 'AS' typename ')'
        | ( ( 'NOT' )? 'EXISTS' )? '(' selectstmt ')' 
        | 'CASE' ( expr )? 'WHEN' expr 'THEN' expr ( 'ELSE' expr )? 'END' 
        | raisefunction);
optionalExp :
        binaryoperator expr 
        | 'COLLATE' collationname 
        | ( 'NOT' )? ( 'LIKE' | 'GLOB' | 'REGEXP' | 'MATCH' ) expr ( 'ESCAPE' expr )? 
        | ( 'ISNULL' | 'NOTNULL' | 'NOT' 'NULL' ) 
        | 'IS' ( 'NOT' )? expr 
        | ( 'NOT' )? 'BETWEEN' expr 'AND' expr 
        | ( 'NOT' )? 'IN' ( '(' ( selectstmt | expr ( ',' expr )* )? ')' 
        | ( databasename '.' )? tablename )
        ;

raisefunction   :   'RAISE' '(' ( 'IGNORE' | ( 'ROLLBACK' | 'ABORT' | 'FAIL' ) ',' errormessage ) ')';

stringliteral  :  ID;

blobliteral  :('x'|'X')'\'' ID '\'';     

literalvalue    :   numericliteral | stringliteral| blobliteral | 'NULL' |  'CURRENT_TIME' |    'CURRENT_DATE' | 'CURRENT_TIMESTAMP';

digit   :   '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

decimalpoint : ('.'|',');

numericliteral  :   ( digit ( decimalpoint ( digit )* )? | decimalpoint digit ) ( 'E' ( '+' | '-' )? digit )?;

insertstmt  :   ( withclause )? ( 'INSERT' | 'REPLACE' | 'INSERT' 'OR' 'REPLACE' | 'INSERT' 'OR' 'ROLLBACK' | 'INSERT' 'OR' 'ABORT' | 'INSERT' 'OR' 'FAIL' | 'INSERT' 'OR' 'IGNORE' ) 'INTO' ( databasename '.' )? tablename ( '(' columnname ( ',' columnname )* ')' )? ( 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* | selectstmt | 'DEFAULT' 'VALUES' );

pragmastmt  :   'PRAGMA' ( databasename '.' )? pragmaname ( '=' pragmavalue | '(' pragmavalue ')' )?;

pragmavalue :   signednumber| name | stringliteral;

reindexstmt :   'REINDEX' ( collationname | ( databasename '.' )? ( tablename | indexname ) )?;

selectstmt  :   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                ( 'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )? | 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* ) ( compoundoperator ( 'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )? | 'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )* ) )*
                ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
            ;

joinclause  :   tableorsubquery ( joinoperator tableorsubquery joinconstraint )?;

selectcore  :   'SELECT' ( 'DISTINCT' | 'ALL' )? resultcolumn ( ',' resultcolumn )*
                ( 'FROM' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) )?
                ( 'WHERE' expr )?
                ( 'GROUP' 'BY' expr ( ',' expr )* ( 'HAVING' expr )? )?
            |   'VALUES' '(' expr ( ',' expr )* ')' ( ',' '(' expr ( ',' expr )* ')' )*
            ;

factoredselectstmt  :   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( compoundoperator selectcore )*
                        ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

simpleselectstmt    :   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

compoundselectstmt  :   ( 'WITH' ( 'RECURSIVE' )? commontableexpression ( ',' commontableexpression )* )?
                        selectcore ( 'UNION' | 'UNION' 'ALL' | 'INTERSECT' | 'EXCEPT' ) selectcore
                        ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )?
                        ( 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?
                    ;

tableorsubquery     :   ( databasename '.' )? tablename ( ( 'AS' )? tablealias )? ( 'INDEXED' 'BY' indexname | 'NOT' 'INDEXED' )?
                |   '(' ( tableorsubquery ( ',' tableorsubquery )* | joinclause ) ')'
                |   '(' selectstmt ')' ( ( 'AS' )? tablealias )?
                ;

resultcolumn        :   '*'
            |   tablename '.' '*'
            |   expr ( ( 'AS' )? columnalias )?
            ;

joinoperator        :   ','
                |   ( 'NATURAL' )? ( 'LEFT' ( 'OUTER' )? | 'INNER' | 'CROSS' )? 'JOIN'
                ;

joinconstraint      :   ( 'ON' expr | 'USING' '(' columnname ( ',' columnname )* ')' )?;

orderingterm        :   expr ( 'COLLATE' collationname )? ( 'ASC' | 'DESC' )?;

compoundoperator    :   'UNION'|'UNION' 'ALL'| 'INTERSECT'| 'EXCEPT';

updatestmt      :   ( withclause )? 'UPDATE' ( 'OR' 'ROLLBACK' | 'OR' 'ABORT' | 'OR' 'REPLACE' | 'OR' 'FAIL' | 'OR' 'IGNORE' )? qualifiedtablename
                'SET' columnname '=' expr ( ',' columnname '=' expr )* ( 'WHERE' expr )?   limitedupdate?   ;

limitedupdate   :( ( 'ORDER' 'BY' orderingterm ( ',' orderingterm )* )? 'LIMIT' expr ( ( 'OFFSET' | ',' ) expr )? )?;

qualifiedtablename  :   ( databasename '.' )? tablename ( 'INDEXED' 'BY' indexname | 'NOT' 'INDEXED' )?;

vacuumstmt  :   'VACUUM';

//commentsyntax :    ( anythingexceptnewline )* ( newline | endofinput )
//commentsyntax :   /* ( anythingexcept*/ )* ( */ | endofinput )

Line Numbers in Text View

When working with a grammar in ParserGenApp, it would be nice if the text view had line numbers, so it would be easier to identify which line has an error.

iCalendar grammar

I'm using PEGKit to build an iCalendar (RFC 5545) parser. It takes in a .ics file, reads it, and parses out the first-class objects. I am running into an issue, however: how is one supposed to handle EOF? My code gets stuck in an infinite loop when it reaches the end of input:

No viable alternative found in rule 'calprops'.
Line : 9223372036854775807
Near : «EOF» «EOF» 
Found : «EOF»

Is there any way to handle this? I've tried searching for EOF, but with no luck.
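For reference, here is a minimal driver sketch for such a parser. The ICalParser class, the delegate, and the icsString variable are assumptions (not from the issue); -initWithDelegate: and -parseString:error: are the usual entry points for a PEGKit-generated parser, and the idea is simply to surface the EOF problem as an NSError rather than spinning:

#import <PEGKit/PEGKit.h>
#import "ICalParser.h" // hypothetical ParserGenApp-generated parser

- (void)parseICS:(NSString *)icsString {
    NSError *err = nil;
    ICalParser *parser = [[ICalParser alloc] initWithDelegate:self];
    id result = [parser parseString:icsString error:&err];
    if (!result) {
        // If the grammar cannot consume the trailing EOF, the parse fails here;
        // log the error instead of looping.
        NSLog(@"iCalendar parse failed: %@", err);
    }
}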

ParserGenApp should use ARC

Feature Request - Change to ARC

Is there a reason ParserGenApp is not using ARC?

Adding new code and functionality will be easier if we don't need to deal with autorelease, etc. It will also be less error-prone.

I can potentially see a reason not to do it in the PEGKit framework if you are trying to maintain support for old code, but CocoaPods can handle that for you.

Error reporting request

Hi. I'm parsing a simple string that looks like this:

[Abc]

This parses fine, but when the app checks 'Abc' and finds it is not valid, I raise an error using the [parser raise:@"..."] statement.

At the moment I get an error which, when dumped to the logs, looks like this:

Error Domain=PEGKitErrorDomain Code=1 "Failed to match next input token" UserInfo=0x7f9a7b710710 {NSLocalizedDescription=Failed to match next input token, range=NSRange: {9223372036854775807, 5}, lineNumber=Unknown, NSLocalizedFailureReason=Unable to find any runtime object called Abc
Line : Unknown
}

My error message seems to be accessible only through the localizedFailureReason, and it is wrapped with line feeds. In my case I want to do other things with it.

I'd like PEGKit to actually use my error message as the description. I'm also not sure what the range is meant to represent: any time I see an error, it always seems to be x,5, with neither the x nor the 5 having anything to do with the location of the error as far as I can tell.

Can this be improved?
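Until that changes, one workaround sketch (assuming a ParserGenApp-generated parser driven through -parseString:error:; the parser and input variables are placeholders) is to pull the message back out of the failure reason and strip the surrounding line feeds:

NSError *err = nil;
id result = [parser parseString:input error:&err];
if (!result) {
    // The -raise: message currently lands in the failure reason, wrapped in
    // line feeds, so trim it before passing it along.
    NSString *reason = err.localizedFailureReason ?: err.localizedDescription;
    NSString *msg = [reason stringByTrimmingCharactersInSet:
                     [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSLog(@"custom parse error: %@", msg);
}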

Turn off emitting some didMatchXXX callbacks

Feature Request

I hope you don't mind my creating issues as I find myself wishing I could do something in PEGKit. PEGKit is awesome, and if I did not like it I would not create issues ;) One feature I especially love is the generation of callbacks for the grammar.

I really try to keep my code warnings down to zero, but one annoying thing is that ParserGenApp generates callbacks for parts of the grammar that will never need a callback. I can put in a bunch of empty methods to resolve those warnings (as sketched below), but that seems a bit inefficient from both a coding standpoint and an execution standpoint.
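A minimal sketch of that empty-method workaround; the rule name (digit) is only an example, and the signature follows the parser:didMatch<Rule>: form that ParserGenApp emits:

// Deliberately empty implementation for a rule whose match never needs handling.
- (void)parser:(PKParser *)p didMatchDigit:(PKAssembly *)a {
    // no-op
}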

I think it would be nice to have some sort of syntax in the grammar definition language that tells ParserGenApp not to emit that callback.

One great upside of doing this in the grammar is that it becomes very easy to see which callbacks the developer has forgotten to implement.

I see there are quite a few pull requests that have not been merged, so I am not sure if it is worth the time to do some of this and create a pull request for it.

Question about how the parser works?

I'm not sure if I'm missing something or not. If I have an expression like this:

type = class | protocol;
class = '[' Word ']';
protocol = '<' Word '>';

When passing "[MyClass]..." I would like to get back (on the stack) "[MyClass]", but at the moment I'm getting "[", "MyClass", "]".

I'm wondering if there is something in the syntax that tells the parser to add the combined values to the stack as a single token.
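One common approach (a sketch, not necessarily an existing grammar feature) is to combine the pieces yourself in the didMatchClass: callback: pop the three matched tokens off the assembly and push back a single combined value:

- (void)parser:(PKParser *)p didMatchClass:(PKAssembly *)a {
    // The assembly's stack currently holds '[', the Word, and ']' (']' on top).
    PKToken *close = [a pop];
    PKToken *word  = [a pop];
    PKToken *open  = [a pop];
    NSString *combined = [NSString stringWithFormat:@"%@%@%@",
                          open.stringValue, word.stringValue, close.stringValue];
    [a push:combined]; // leave one combined value, e.g. "[MyClass]"
}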

Another question: when looking at tokens on the stack, is there a way to find the name of the rule that matched in the grammar? I.e., in the above, to get back 'class', 'protocol', or 'type'.

Quotation mark generating invalid parse code

I have a grammar with a rule that looks like:

tableName =  ('"'!)? Word  ('"'!)?;

This generates the following parsing code:

- (void)tableName_ {

    if ([self predicts:TOKEN_KIND_QUOTE, 0]) {
        [self match:TOKEN_KIND_QUOTE discard:YES]; 
    }
    [self matchWord:NO]; 
    if ([self predicts:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1, 0]) {
        [self match:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1 discard:YES]; 
    }

}

The problem is that TOKEN_KIND_QUOTE is undefined and causes a build error, while CREATETABLEPARSER_TOKEN_KIND_QUOTE_1 works just fine and is generated correctly. Honestly, they should be the same constant.
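For comparison, here is what the generated method presumably ought to look like, with both quote matches using the same prefixed constant (a sketch of the expected generator output, not the actual fix):

- (void)tableName_ {
    // Both optional quotes reference the same generated token-kind constant.
    if ([self predicts:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1, 0]) {
        [self match:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1 discard:YES];
    }
    [self matchWord:NO];
    if ([self predicts:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1, 0]) {
        [self match:CREATETABLEPARSER_TOKEN_KIND_QUOTE_1 discard:YES];
    }
}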
