tunnelvisionlabs / antlr4



The highly-optimized fork of ANTLR 4 (see README)

License: Other

Java 93.26% ANTLR 3.25% GAP 3.38% Python 0.03% HTML 0.09%

antlr4's People


vitorelli johnwashburn modulexcite burtharris olavurdj asosnoviy nixel2007 tralivali1234 foxdd daniellansun paulk-asert br0nstein sgammon

antlr4's Issues

Question: How can I get a complete jar for compiling grammars?

I want to compile a .g4 file without the Maven plugin. Is there a complete jar for the tunnelvisionlabs fork?

Something like: https://www.antlr.org/download/antlr-4.7.1-complete.jar

baseContext should check label types across rules

The following grammar should produce an error relating to the value label before generating code:

grammar A;

a : 'a';
b : 'b';
y1 : value=a;
y2 options { baseContext = y1; } : value=b;
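A minimal sketch of the check this issue asks for, under the assumption that rules sharing a context class through baseContext also share one label namespace, so a label must refer to the same rule everywhere in that namespace. LabelTypeCheck and its map-based grammar model are hypothetical and not part of the ANTLR tool; the message text only imitates the wording of the tool's label type mismatch error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical label-type check that honors baseContext (one level of indirection only). */
public final class LabelTypeCheck {
    private LabelTypeCheck() {}

    /**
     * @param labels      rule name -> (label name -> rule the label refers to)
     * @param baseContext rule name -> rule whose context class it reuses
     * @return one message per label whose type conflicts with an earlier definition
     */
    public static List<String> check(Map<String, Map<String, String>> labels,
                                     Map<String, String> baseContext) {
        // context class -> (label name -> first rule type seen for that label)
        Map<String, Map<String, String>> seen = new HashMap<>();
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> rule : labels.entrySet()) {
            // Rules with a baseContext are checked in their base rule's namespace.
            String context = baseContext.getOrDefault(rule.getKey(), rule.getKey());
            Map<String, String> types = seen.computeIfAbsent(context, k -> new HashMap<>());
            for (Map.Entry<String, String> label : rule.getValue().entrySet()) {
                String previous = types.putIfAbsent(label.getKey(), label.getValue());
                if (previous != null && !previous.equals(label.getValue())) {
                    errors.add("rule " + rule.getKey() + ": label '" + label.getKey() + "="
                            + label.getValue() + "' type mismatch with previous definition: "
                            + label.getKey() + "=" + previous);
                }
            }
        }
        return errors;
    }
}
```

With y2 mapped to y1's context, value=b conflicts with the earlier value=a, which is the error the grammar above should produce; without the baseContext entry the two rules are checked independently and nothing is reported.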

Create a framework for transformations using immutable trees

By supporting immutable trees as output from the parser, it would be easier to create a generalized framework for concrete syntax transformations in the trees. The factory described in #1 would simplify the creation of new trees during the transformation process.

These trees would also simplify the implementation of an incremental lexer and/or parser, one capable of efficiently creating a complete new parse tree after changes to a subset of the input sequence.

The .NET Compiler Platform (Roslyn) uses a similar technique with great success.
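A rough sketch of the core idea, using a minimal hypothetical ImmutableNode class rather than any existing ANTLR type: every edit returns a new node and shares all unchanged children by reference, so the original tree stays valid and is cheap to keep around.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical immutable parse-tree node; not part of the ANTLR runtime. */
public final class ImmutableNode {
    private final String text;
    private final List<ImmutableNode> children;

    public ImmutableNode(String text, List<ImmutableNode> children) {
        this.text = text;
        // Defensive copy, then freeze: nobody can mutate this node afterwards.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    public static ImmutableNode leaf(String text) {
        return new ImmutableNode(text, Collections.emptyList());
    }

    public List<ImmutableNode> children() {
        return children;
    }

    /**
     * Returns a new node with child i replaced. This node is untouched and every
     * other child is shared by reference, so only one node is allocated.
     */
    public ImmutableNode withChild(int i, ImmutableNode replacement) {
        List<ImmutableNode> copy = new ArrayList<>(children);
        copy.set(i, replacement);
        return new ImmutableNode(text, copy);
    }

    /** LISP-style rendering, e.g. (expr a + b). */
    @Override
    public String toString() {
        if (children.isEmpty()) return text;
        StringJoiner joiner = new StringJoiner(" ", "(" + text + " ", ")");
        for (ImmutableNode child : children) joiner.add(child.toString());
        return joiner.toString();
    }
}
```

For an edit deep in the tree, withChild is applied at each ancestor on the way back up to the root, so only the nodes on that path are rebuilt; an incremental parser could likewise reuse every subtree outside the changed region.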

Remove antlr4-annotations module/dependency

The POM of com.tunnelvisionlabs:antlr4-runtime declares a compile dependency on com.tunnelvisionlabs:antlr4-annotations. However, it seems that these annotations have been moved into antlr4-runtime, making the antlr4-annotations module obsolete.

org.antlr no longer publishes an antlr4-annotations module (the latest version is 4.3), so I suspect this dependency is a leftover that should be removed.

Assert in ParserRuleContext.java makes copyFrom unusable

antlr4/runtime/Java/src/org/antlr/v4/runtime/ParserRuleContext.java

Line 143 in 269c382

assert t.getParent() == null || t.getParent() == this;

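If you call

context1.copyFrom(context2)

it necessarily calls addAnyChild, which contains this assert, so the assertion fails (presumably because the copied children still have context2 as their parent).
Vanilla ANTLR doesn't have this assert; why does this fork?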
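One way to keep the invariant and still support copyFrom is to detach each child before re-adding it. The sketch below models this with a hypothetical ParentedNode class; it is not the fork's ParserRuleContext, and the assumption that copyFrom should move (rather than share) the children is this sketch's, not the fork's.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified context with parent pointers; not the fork's ParserRuleContext. */
public class ParentedNode {
    private ParentedNode parent;
    private final List<ParentedNode> children = new ArrayList<>();

    public ParentedNode getParent() { return parent; }
    public List<ParentedNode> getChildren() { return children; }

    /** Enforces the same invariant as the quoted assert, then adopts the child. */
    public void addAnyChild(ParentedNode t) {
        if (t.parent != null && t.parent != this) {
            throw new AssertionError("child already has a different parent");
        }
        t.parent = this;
        children.add(t);
    }

    /**
     * Moves the children of other into this node. Each child is detached first,
     * so the invariant in addAnyChild holds instead of failing.
     */
    public void copyFrom(ParentedNode other) {
        if (other == this) return; // nothing to move
        for (ParentedNode child : other.children) {
            child.parent = null;
            addAnyChild(child);
        }
        other.children.clear(); // the children now belong to this node only
    }
}
```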

Merge changes from 4.10 of antlr4 RI

Hi @sharwell

The reference implementation of ANTLR 4 just released version 4.10. How about merging the changes into the optimized fork?
https://github.com/antlr/antlr4/releases/tag/4.10

Document optimized ATN transitions

This unique feature of the optimized fork is not currently documented.

Compiler errors related to CAP#1

Hi,

I am experimenting with the tunnelvisionlabs fork but ran into a lot of compilation errors related to "CAP#1". The errors come from my code that references the context objects generated by ANTLR, such as "Update_stmt_setContext".

I am using version 4.5. Is the tunnelvisionlabs fork 100% compatible with the original ANTLR 4? Should I try a different version?

Please advise, thanks
water
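CAP#1 is javac's name for the captured type of a ? wildcard, so these errors usually come from code that tries to store a value into something typed with a wildcard. Without the failing code the exact cause here is unknown; the sketch below only shows the usual fix, a private generic helper method that gives the captured type a name. It uses plain java.util types and nothing from ANTLR.

```java
import java.util.List;

/** Illustrates the standard "capture helper" fix for CAP#1 errors; not ANTLR-specific. */
public final class WildcardCapture {
    private WildcardCapture() {}

    // Rejected by javac with an error mentioning CAP#1: the element type of List<?>
    // is unknown, so the compiler cannot prove a value read from it may be written back.
    // static void swapFirstTwo(List<?> list) { list.set(0, list.get(1)); }

    /** Accepts any list and delegates to a generic helper that names the wildcard. */
    public static void swapFirstTwo(List<?> list) {
        swapHelper(list);
    }

    // T binds to the captured type, so values read from the list can be stored back.
    private static <T> void swapHelper(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }
}
```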

[antlr4.6]Failed to generate parser after upgraded to 4.6 from 4.5.3

Hi Sam(@sharwell),

After I upgraded the optimized version of ANTLR 4 from 4.5.3 to 4.6, the Groovy parser can no longer be generated. Could you please take a look at this breaking change? Thanks in advance!

Here are the steps to reproduce:

git clone -b antlr4_6 https://github.com/danielsun1106/groovy-parser.git
cd groovy-parser
./gradlew antlr4

Error messages:
https://travis-ci.org/danielsun1106/groovy-parser/builds/188067699

error(75): GroovyParser.g4:6:11: label 'left=variableNames' type mismatch with previous definition: left=expression
error(75): GroovyParser.g4:9:85: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:10:141: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:11:118: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:20:18: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:21:209: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:28:18: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:29:131: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:30:94: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:31:91: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:32:93: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:33:91: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:34:90: label 'right=expression' type mismatch with previous definition: right=statementExpression

Alt label completion analysis does not consider baseContext

expression
  : ID # alt1
  ;

expression2 // expected error (too few alt labels)
options { baseContext = expression; }
  : ID
  ;

ID : [a-z]+;
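A minimal sketch of completion analysis that also consults baseContext, assuming the rule the test case above implies: if the base rule labels its alternatives, a rule reusing its context must label all of its own. AltLabelCheck and its data model are hypothetical, and the error text is illustrative rather than the tool's actual message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical alt-label completion analysis that honors baseContext. */
public final class AltLabelCheck {
    private AltLabelCheck() {}

    private static boolean anyLabeled(List<String> labels) {
        return labels != null && labels.stream().anyMatch(Objects::nonNull);
    }

    /**
     * @param altLabels   rule -> label of each alternative (null for an unlabeled one)
     * @param baseContext rule -> rule whose context class it reuses
     * @return one message per rule that labels only some of its alternatives
     */
    public static List<String> check(Map<String, List<String>> altLabels,
                                     Map<String, String> baseContext) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : altLabels.entrySet()) {
            List<String> labels = e.getValue();
            String base = baseContext.get(e.getKey());
            // Label all alternatives if the rule labels any of them, or (the case
            // this issue is about) if its base rule's alternatives are labeled.
            boolean mustLabelAll = anyLabeled(labels)
                    || (base != null && anyLabeled(altLabels.get(base)));
            if (mustLabelAll && labels.stream().anyMatch(Objects::isNull)) {
                errors.add("rule " + e.getKey() + ": too few alt labels");
            }
        }
        return errors;
    }
}
```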

Check the existence of variables

Hi Sam ( @sharwell ),
Could you please provide a way to check whether a variable exists?
For example, to check whether the enclosing loopStatement rule provides the variable footprint, a try-catch is required, because $loopStatement::footprint throws a NullPointerException when loopStatement is not an enclosing rule of continueStatement (i.e. the continue statement is not inside a loop statement). Although this works, it also hurts parsing performance.

    try {
        $isInsideLoop = null != $loopStatement::footprint;
    } catch(NullPointerException e) {
        $isInsideLoop = false;
    }

Complete code:
https://github.com/apache/groovy/blob/master/src/main/antlr/GroovyParser.g4#L615-L634

loopStatement
locals[ String footprint = "" ]
    :   FOR LPAREN forControl rparen nls statement                                                          #forStmtAlt
    |   WHILE expressionInPar nls statement                                                                   #whileStmtAlt
    |   DO nls statement nls WHILE expressionInPar                                                            #doWhileStmtAlt
    ;

continueStatement
locals[ boolean isInsideLoop ]
@init {
    try {
        $isInsideLoop = null != $loopStatement::footprint;
    } catch(NullPointerException e) {
        $isInsideLoop = false;
    }
}
    :   CONTINUE
        { require($isInsideLoop, "the continue statement is only allowed inside loops", -8); }
        identifier?
;

Cheers,
Daniel.Sun
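The try/catch exists only because the lookup behind $loopStatement::footprint yields null when no loopStatement encloses the current rule. In the reference implementation's Java target this lookup appears to be Parser.getInvokingContext(ruleIndex), which walks up the parent chain; assuming the fork exposes the same method (worth verifying), an action could test getInvokingContext(RULE_loopStatement) != null directly. The self-contained sketch below models that walk with a hypothetical Ctx class and made-up rule indices.

```java
/** Hypothetical model of an exception-free "am I inside rule X?" check. */
public final class EnclosingRule {
    /** Minimal stand-in for a parse context: a rule index plus a parent pointer. */
    public static final class Ctx {
        final int ruleIndex;
        final Ctx parent;

        public Ctx(int ruleIndex, Ctx parent) {
            this.ruleIndex = ruleIndex;
            this.parent = parent;
        }
    }

    private EnclosingRule() {}

    /** Returns the nearest context (starting at ctx itself) for ruleIndex, or null if none. */
    public static Ctx nearest(Ctx ctx, int ruleIndex) {
        for (Ctx p = ctx; p != null; p = p.parent) {
            if (p.ruleIndex == ruleIndex) return p;
        }
        return null;
    }

    /** The existence check: a null test replaces the try/catch. */
    public static boolean isInside(Ctx ctx, int ruleIndex) {
        return nearest(ctx, ruleIndex) != null;
    }
}
```

Besides avoiding exceptions, the walk stops at the first match, so its cost is bounded by the nesting depth of the current rule.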

Unexpected parsing result in left recursive rule

Hi @sharwell ,

expression
    :  { SemanticPredicates.isTypeCast(_input) }? 
       castParExpression expression                                                        #castExprAlt

(The relevant workaround: https://github.com/danielsun1106/groovy-parser/blob/bbf0d3abaa2cf8cfdde3a8a3588bf39df2989083/src/main/antlr/GroovyParser.g4#L779-L788)

If the semantic predicate above fails, ANTLR 4 does not try the other alternatives of expression, so we had to extract a separate castExpression rule.
In addition, even after working around the semantic predicate issue, we found that the precedence of expression is not correct either, e.g.
(Integer) m() instanceof Integer should be parsed as ((Integer) m()) instanceof Integer, but it is currently parsed as (Integer) (m() instanceof Integer). As you can see, we put the castExpression alternative first, so we expected it to have higher precedence than the instanceof expression.

Please have a look into the issue. Thanks in advance.

Cheers,
Daniel.Sun

How to build the tool, or where to get it, in order to generate optimized classes

Hi, I am trying to get the tool for the optimized build, but I could not build it. Can you explain how to build it or where to get it? As I understand it, I need this tool rather than the default one to generate code for the optimized runtime, since its output differs.

Alt label alias analysis does not consider baseContext

expression
  : ID # alt1
  ;

expression2
options { baseContext = expression; }
  : ID # alt1 // unexpected error
  ;

ID : [a-z]+;

Provide more readable result for org.antlr.v4.runtime.TokenStream#getText(java.lang.Object, java.lang.Object)

Hi Sam(@sharwell),

While developing Parrot, the new parser for the Groovy programming language, I found that the token text produced by ANTLR 4 is a bit hard to read and ambiguous, e.g.

class X {a b

As you can see, a and b are concatenated without any separator:

Unexpected input: 'ab' at line: 1, column: 12

Currently org.antlr.v4.runtime.TokenStream#getText(java.lang.Object, java.lang.Object) is used.
Could you please provide a more readable getText method (e.g. one that retains the text of hidden-channel tokens), or give me some hints for implementing it myself? Thanks in advance!
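Two ways to make the reported text unambiguous, sketched with a hypothetical Tok class (the channel values 0 and 1 match ANTLR's Token.DEFAULT_CHANNEL and Token.HIDDEN_CHANNEL): join the visible tokens with a separator, or concatenate every token including hidden ones. The second option only helps if the lexer sends whitespace to the hidden channel with -> channel(HIDDEN) instead of discarding it with -> skip.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical token text helpers; not part of the ANTLR TokenStream API. */
public final class ReadableTokenText {
    public static final int DEFAULT_CHANNEL = 0;
    public static final int HIDDEN_CHANNEL = 1;

    /** Minimal stand-in for a token: its text and channel. */
    public static final class Tok {
        final String text;
        final int channel;

        public Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    private ReadableTokenText() {}

    /** Option 1: visible tokens only, joined with a separator so 'a' and 'b' stay apart. */
    public static String joinVisible(List<Tok> tokens, String separator) {
        return tokens.stream()
                .filter(t -> t.channel == DEFAULT_CHANNEL)
                .map(t -> t.text)
                .collect(Collectors.joining(separator));
    }

    /** Option 2: every token, hidden ones included, which reproduces the source spacing. */
    public static String withHidden(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) sb.append(t.text);
        return sb.toString();
    }
}
```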

Remove state variable ParserFactory.ruleFunctions

antlr4/tool/src/org/antlr/v4/codegen/ParserFactory.java

Line 63 in f3510fd

 private final Map<Rule, RuleFunction> ruleFunctions = new HashMap<Rule, RuleFunction>(); 

This field can likely be removed once this is addressed:

antlr4/tool/src/org/antlr/v4/codegen/model/InvokeRule.java

Line 44 in f3510fd

// TODO: move to factory

Label conflict analysis does not consider baseContext

input
  : e=expression e=expression2 // unexpected error
  ;

expression
  : ID
  ;

expression2
options { baseContext = expression; }
  : ID
  ;

ID : [a-z]+;

Left recursive rule cannot specify baseContext

Code generation fails unexpectedly for the following grammar.

expression
  : ID
  | expression '+' expression
  ;

expression2
options { baseContext = expression; }
  : NUMBER
  | expression2 '+' expression2
  ;

ID : [a-z]+;
NUMBER : [0-9]+;

PHP Target

Hi @sharwell!

Are you considering supporting other targets? Can you give an idea of how much effort is needed to add support for a new target?

Question: what is the difference between this project and antlr/antlr4

This project seems to be an old snapshot of https://github.com/antlr/antlr4, published under a different artifact ID but with the same package names as https://github.com/antlr/antlr4.

However, @sharwell merged a pull request (#2) into this project. Why? Is this project still alive?

Support "Schrödinger's token"

The CaretToken used with ANTLR 4 in ANTLRWorks 2 and GoWorks already acts as this type of token. A generalized mechanism for handling it could be implemented by allowing the pushMode operation to push more than one mode in parallel, creating multiple branches of the lexer. For example, a new command could be created with the following form:

pushAnyMode(PossibleModeA, PossibleModeB)

The mode actually taken is decided during parsing by the parser's prediction algorithm, at the time the potential tokens are examined.

Question: should the META-INF/services/javax.annotation.processing.Processor file exist in the runtime artifact?

Are there scenarios where it is still needed at runtime? I would have thought this file is required for code generation but not at runtime. Thanks.

ATN clearDFA() results in NullPointerException

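public final void clearDFA() {
    // Another thread can observe this new array before the loop below fills it,
    // and then reads a null element.
    decisionToDFA = new DFA[decisionToState.size()];
    for (int i = 0; i < decisionToDFA.length; i++) {
        decisionToDFA[i] = new DFA(decisionToState.get(i), i);
    }
}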

The clearDFA method in ATN.java first replaces the DFA array with a new, empty array and only then assigns its elements. Because of this, when parsers run in parallel, calling clearDFA from one parser can make another thread read a null DFA, resulting in a NullPointerException:

java.lang.NullPointerException: null
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:367) ~[antlr4-runtime-4.9.0.jar:4.9.0]
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:357) ~[antlr4-runtime-4.9.0.jar:4.9.0]

In the main ANTLR release branch, I can see that clearDFA is written without reinitializing the DFA array. Is this a bug in the optimized fork?

public final void clearDFA() {
    for (int i = 0; i < decisionToDFA.length; i++) {
        decisionToDFA[i] = new DFA(decisionToState.get(i), i);
    }
}
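A minimal sketch of two race-free alternatives, using a hypothetical DfaCache class with a stand-in Dfa type rather than the fork's ATN and DFA classes: build the replacement array completely before publishing it through a volatile field, or overwrite the existing slots in place as the loop from the main release branch does, so a concurrent reader sees either an old or a new DFA but never null.

```java
/** Hypothetical model of the clearDFA race; Dfa stands in for the runtime's DFA class. */
public final class DfaCache {
    public static final class Dfa {
        public final int decision;

        Dfa(int decision) {
            this.decision = decision;
        }
    }

    // volatile: a reader that sees the new array also sees its fully filled slots.
    private volatile Dfa[] decisionToDFA;

    public DfaCache(int decisions) {
        decisionToDFA = freshArray(decisions);
    }

    private static Dfa[] freshArray(int n) {
        Dfa[] fresh = new Dfa[n];
        for (int i = 0; i < n; i++) fresh[i] = new Dfa(i);
        return fresh;
    }

    /** Option 1: build the replacement completely, then publish it with a single write. */
    public void clearByReplacingArray() {
        decisionToDFA = freshArray(decisionToDFA.length);
    }

    /** Option 2: overwrite slots in place; a reader sees an old or a new DFA, never null. */
    public void clearInPlace() {
        Dfa[] current = decisionToDFA;
        for (int i = 0; i < current.length; i++) current[i] = new Dfa(i);
    }

    public Dfa get(int decision) {
        return decisionToDFA[decision];
    }
}
```

Neither option stops another thread from briefly using a DFA that is about to be discarded; they only guarantee that no thread ever reads an empty slot.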

"TypeError: Class constructor Lexer cannot be invoked without 'new'"

$antlr4 -Dlanguage=JavaScript SplParser.g4

Generated file: SplParserLexer.js.

The generated code in SplParserLexer.js:

function SplParserLexer(input) {
antlr4.Lexer.call(this, input);
this._interp = new antlr4.atn.LexerATNSimulator(this, atn, decisionsToDFA, new antlr4.PredictionContextCache());
return this;
}

Browser error: "TypeError: Class constructor Lexer cannot be invoked without 'new'"

How can I solve this?

Improve the performance of failed semantic predicate

Background

When a semantic predicate fails, a FailedPredicateException is thrown to change the control flow. However, filling in the stack trace is expensive, especially when FailedPredicateException is thrown frequently.

Proposal

I propose overriding fillInStackTrace() in FailedPredicateException with the following simplified implementation:

    public synchronized Throwable fillInStackTrace() {
        return this;
    }

The default implementation, inherited from java.lang.Throwable, is:

    public synchronized Throwable fillInStackTrace() {
        if (stackTrace != null ||
            backtrace != null /* Out of protocol state */ ) {
            fillInStackTrace(0);
            stackTrace = UNASSIGNED_STACK;
        }
        return this;
    }

Advantages & Disadvantages

Parsing performance improves to some extent: the more often semantic predicates fail, the more we gain.
We can no longer get the stack trace of a FailedPredicateException, but we usually do not care about it.
If we ever need to trace a FailedPredicateException back to its source, we can add the ATN state to the exception message, e.g.

setState(1126);
if (!( !SemanticPredicates.isInvalidMethodDeclaration(_input) ))
    throw new FailedPredicateException(
        this, 
        " !SemanticPredicates.isInvalidMethodDeclaration(_input),  state: " 
            + 1126 /* the state is added here */);
setState(1127);
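A sketch of the proposal using hypothetical stand-in classes instead of FailedPredicateException itself, alongside an alternative the JDK has offered since Java 7: Throwable's protected four-argument constructor, whose last argument, writableStackTrace, disables stack trace capture. In both cases getStackTrace() returns an empty array.

```java
/** Hypothetical stand-ins for FailedPredicateException; not the ANTLR classes. */
public final class CheapExceptions {
    private CheapExceptions() {}

    /** The proposal: skip the native stack walk by overriding fillInStackTrace(). */
    public static class OverridingFailure extends RuntimeException {
        public OverridingFailure(String message) {
            super(message);
        }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // no stack capture
        }
    }

    /** Alternative: disable the stack trace through the protected constructor. */
    public static class ConstructorFailure extends RuntimeException {
        public ConstructorFailure(String message) {
            // cause = null, enableSuppression = false, writableStackTrace = false
            super(message, null, false, false);
        }
    }
}
```

The constructor form avoids overriding a synchronized method, but it has to be reachable from the exception's own constructor chain, whereas the override can be added to a single class.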

Investigate generating a syntax factory

The SyntaxFactory class in Roslyn provides simple methods for creating new syntax trees. ANTLR users wishing to perform manual construction of syntax trees could benefit from automatically generating this type of factory.

Error recovery does not work correctly for grammars that contain fully-optional rule bodies.

Root cause: LL1Analyzer returns incorrect follow tokens when exploring a rule whose stop state can be reached from its start state through epsilon transitions alone, because it also explores the transitions leaving the rule stop state.

Background: When the parser encounters an error, it attempts error recovery before exiting the rule. DefaultErrorStrategy's recover implementation resynchronizes the parser by consuming tokens until it finds one in the resynchronization set. If it succeeds, the input stream is in a state where the next token is known to be consumable by some ancestor rule being recursively parsed, and parsing can continue. The key is to correctly compute the resynchronization set: the set of tokens that may follow the immediate rule or any ancestor rule being parsed. That is done using the ATN. Each rule transition object stores a follow state (the state the rule resumes from after the sub-rule invocation returns); in broad strokes, starting from the follow state of the invoking state of each ancestor rule context, LL1 Analyzer walks the epsilon, predicate, and rule transitions of the ATN, adding the tokens (defined by atom and wildcard transitions) it discovers. Importantly, it doesn't move past these token transitions, so that it operates with LL(1) lookahead, and it doesn't continue past the stop state of each invoking rule.

Why does it matter that it not look past the stop state of some rule X? Because the stop state has a transition to the follow state for all possible transitions to rule X. These may not be possible states that can produce the next token as defined by the current rule context (including its ancestor rule contexts). For example, in the following grammar, rule2's stop state has an epsilon transition to the follow state of the rule transition to rule2 of both the start rule and rule1, but only ABC may follow HELLO WORLD when parsed from the start rule. If LL1 Analyzer continued past the stop state of rule2, it would transition to a state in rule1 and ZZZ would be erroneously added to the resynchronization set:

start : rule2 'ABC' EOF ;
rule1 : rule2 'ZZZ' ;
rule2 : rule3 ;
rule3 : 'HELLO' 'WORLD' ;

The bug in this issue comes into play when writing rules that may consume no tokens (i.e. a rule in which the stop state may be reached from the start state purely through epsilon/predicate transitions). For example, rule4 in the following grammar:

start : rule1 'ABC' EOF ;
rule1 : rule2 rule4 ;
rule2 : 'HELLO' 'WORLD' ;
rule3 : rule4 'ZZZ' ;
rule4 : 'DEF'* ;

Here, ZZZ should not be in the resynchronization set of the parser state after HELLO in rule2, but it is. Given the rule contexts (start) -> (rule1) -> (rule2 HELLO), ZZZ can be reached by a rule transition from rule1 to rule4, epsilon transitions from rule4's stop state into rule3, and an atom transition matching ZZZ, before a depth-first arrival at rule1's stop state. This means parsing "HELLO ZZZ ZZZ ABC" would recover from an InputMismatchException in rule2 by consuming no tokens; rule1 then completes normally, and the start rule also hits an InputMismatchException, since the input stream is still at the first ZZZ token while it expects ABC.

The fix is to ensure that LL1 Analyzer correctly computes the result set by never transitioning out of a rule stop state, except in special scenarios where the stop state is not set, such as SLL(1) lookahead. Then, in our example, given the correct resynchronization set {DEF, ABC}, recover will correctly resynchronize by consuming the two ZZZ tokens and leaving the input stream at token ABC for the start rule to parse successfully. Note: the original ANTLR 4 repository does not have this defect, as it already stops correctly.

As a side note, what if rule1 had a 'JKL'? optional token after the call to rule4? LL1 Analyzer already correctly handles this by storing the follow state of each rule transition that it walks, and when reaching a rule stop state it continues from that follow state if there is one. This doesn't conflict with what was described above, as we are not transitioning to the rule stop state target states but rather to the follow state of the invoking rule transition (in this example, that means when we encounter the rule4 stop state we transition to the state in rule1 after the rule4 call, in order to add token 'JKL' to the result set).
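A grammar-level sketch of the correct behavior, using FIRST sets over a hypothetical map-based grammar model instead of ANTLR's ATN and LL1Analyzer: for each enclosing invocation, add FIRST of the symbols that follow the call inside the caller. Because FIRST of a nested rule such as rule4 is computed on its own, reaching the end of a nullable rule falls through to the next symbol of the current caller (the follow-state behavior described in the side note) and never to rule4's other callers. In this model, any symbol that is not a key of the rule map is treated as a token.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical FIRST-set model of error recovery sets; not ANTLR's LL1Analyzer. */
public final class RecoverySet {
    private final Map<String, List<List<String>>> rules; // rule -> alternatives
    private final Map<String, Set<String>> first = new HashMap<>();
    private final Set<String> nullable = new HashSet<>();

    public RecoverySet(Map<String, List<List<String>>> rules) {
        this.rules = rules;
        for (String rule : rules.keySet()) first.put(rule, new HashSet<>());
        boolean changed = true;
        while (changed) { // iterate to a fixpoint so recursive rules converge
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : rules.entrySet()) {
                for (List<String> alt : e.getValue()) {
                    Set<String> tokens = new HashSet<>();
                    boolean altNullable = firstOf(alt, tokens);
                    changed |= first.get(e.getKey()).addAll(tokens);
                    if (altNullable) changed |= nullable.add(e.getKey());
                }
            }
        }
    }

    /** Adds FIRST(symbols) to out; returns true if symbols can match no tokens at all. */
    public boolean firstOf(List<String> symbols, Set<String> out) {
        for (String symbol : symbols) {
            if (!rules.containsKey(symbol)) { // a token: nothing after it is reachable
                out.add(symbol);
                return false;
            }
            out.addAll(first.get(symbol));
            if (!nullable.contains(symbol)) return false;
        }
        return true;
    }

    /**
     * @param followSequences for each enclosing invocation (innermost first), the
     *                        symbols after that invocation in the caller's alternative
     */
    public Set<String> recoverySet(List<List<String>> followSequences) {
        Set<String> result = new TreeSet<>();
        for (List<String> rest : followSequences) firstOf(rest, result);
        return result;
    }
}
```

For the second grammar above, passing [rule4] (what follows rule2 in rule1) and [ABC, EOF] (what follows rule1 in start) yields {ABC, DEF}. Adding everything that can follow rule4 anywhere in the grammar, which is effectively what walking out of its stop state does, would also bring in ZZZ from rule3.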