
antlr4's People

Contributors

abego, bhamiltoncx, calaura, carocad, chaseoxide, davesisson, dtymon, ericvergnaud, eternalphane, ewanmellor, hanjoes, janyou, jm-mikkelsen, kvanttt, lingyv-li, marcohu, michaelpj, mike-lischke, niccroad, nttdatahenriksorensen, parrt, pboyer, redtailedhawk, renatahodovan, sharwell, solussd, thomasb81, willfaught, wjkohnen, xied75

antlr4's Issues

Create a framework for transformations using immutable trees

By supporting immutable trees as output from the parser, it would be easier to create a generalized framework for concrete syntax transformations in the trees. The factory described in #1 would simplify the creation of new trees during the transformation process.

These trees would also simplify the implementation of an incremental lexer and/or parser, which is capable of efficiently creating a complete new parse tree following changes to a subset of the input sequence.

The .NET Compiler Platform is using a similar technique with great success.

Remove antlr4-annotations module/dependency

The POM of com.tunnelvisionlabs:antlr4-runtime declares a compile dependency on com.tunnelvisionlabs:antlr4-annotations. However, it seems that these annotations have been moved into antlr4-runtime, making the antlr4-annotations module obsolete.

org.antlr4 no longer publishes an antlr4-annotations module (latest version is 4.3); hence I suspect this is a left-over that should be removed.

Compiler errors related to CAP#1

Hi,

I am experimenting with the tunnelvisionlabs fork but ran into a lot of compilation errors related to "CAP#1". The errors come from my code, in which I reference the context objects generated by antlr, such as "Update_stmt_setContext".

I am using version 4.5. Is the tunnelvisionlabs fork 100% compatible with the original antlr4? Should I try a different version?

Please advise, thanks
water

[antlr4.6] Failed to generate parser after upgrading from 4.5.3 to 4.6

Hi Sam(@sharwell),

After I upgraded the optimized version of antlr4 to 4.6, the Groovy parser can no longer be generated. Could you please take a look at this breaking change? Thanks in advance!

Here are steps to reproduce:

  1. git clone -b antlr4_6 https://github.com/danielsun1106/groovy-parser.git
  2. cd groovy-parser
  3. ./gradlew antlr4

Error messages:
https://travis-ci.org/danielsun1106/groovy-parser/builds/188067699

error(75): GroovyParser.g4:6:11: label 'left=variableNames' type mismatch with previous definition: left=expression
error(75): GroovyParser.g4:9:85: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:10:141: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:11:118: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:20:18: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:21:209: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:28:18: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:29:131: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:30:94: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:31:91: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:32:93: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:33:91: label 'right=expression' type mismatch with previous definition: right=statementExpression
error(75): GroovyParser.g4:34:90: label 'right=expression' type mismatch with previous definition: right=statementExpression

Check the existence of variables

Hi Sam ( @sharwell ),
     Could you please provide a way to check whether a variable exists?
     For example, to check whether rule loopStatement provides the variable footprint, a try-catch is required, because $loopStatement::footprint throws a NullPointerException when rule loopStatement is not the parent rule of continueStatement (i.e. the continue statement is not inside a loop statement). The check works, but it also hurts parsing performance.

    try {
        $isInsideLoop = null != $loopStatement::footprint;
    } catch(NullPointerException e) {
        $isInsideLoop = false;
    }

Complete code:
https://github.com/apache/groovy/blob/master/src/main/antlr/GroovyParser.g4#L615-L634

loopStatement
locals[ String footprint = "" ]
    :   FOR LPAREN forControl rparen nls statement                                                          #forStmtAlt
    |   WHILE expressionInPar nls statement                                                                   #whileStmtAlt
    |   DO nls statement nls WHILE expressionInPar                                                            #doWhileStmtAlt
    ;

continueStatement
locals[ boolean isInsideLoop ]
@init {
    try {
        $isInsideLoop = null != $loopStatement::footprint;
    } catch(NullPointerException e) {
        $isInsideLoop = false;
    }
}
    :   CONTINUE
        { require($isInsideLoop, "the continue statement is only allowed inside loops", -8); }
        identifier?
;

Cheers,
Daniel.Sun
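
One possible way to avoid the try-catch is to walk the parent chain of the current context and test each ancestor explicitly. The sketch below is self-contained and only illustrates the idea: RuleCtx and AncestorCheck are hypothetical stand-ins, not classes of the runtime. In a real parser the same loop would presumably run over the generated context objects via their parent links.

```java
import java.util.Objects;

// Hypothetical stand-in for a parse-tree context. It keeps only what the
// check needs: the rule name and a link to the invoking (parent) context.
class RuleCtx {
    final String ruleName;
    final RuleCtx parent;

    RuleCtx(String ruleName, RuleCtx parent) {
        this.ruleName = Objects.requireNonNull(ruleName);
        this.parent = parent;
    }
}

final class AncestorCheck {
    private AncestorCheck() {}

    // Returns true if ctx itself or any of its ancestors belongs to ruleName.
    // No exception is involved, so a negative answer costs only the walk.
    static boolean isInside(RuleCtx ctx, String ruleName) {
        for (RuleCtx c = ctx; c != null; c = c.parent) {
            if (c.ruleName.equals(ruleName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RuleCtx root = new RuleCtx("compilationUnit", null);
        RuleCtx loop = new RuleCtx("loopStatement", root);
        RuleCtx continueInLoop = new RuleCtx("continueStatement", loop);
        RuleCtx continueOutside = new RuleCtx("continueStatement", root);

        System.out.println(isInside(continueInLoop, "loopStatement"));
        System.out.println(isInside(continueOutside, "loopStatement"));
    }
}
```

With a helper like this, the @init action of continueStatement would reduce to a single boolean assignment, and the NullPointerException path would no longer be part of normal parsing.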

Unexpected parsing result in left recursive rule

Hi @sharwell ,

expression
    :  { SemanticPredicates.isTypeCast(_input) }? 
       castParExpression expression                                                        #castExprAlt

( the relevant workaround: https://github.com/danielsun1106/groovy-parser/blob/bbf0d3abaa2cf8cfdde3a8a3588bf39df2989083/src/main/antlr/GroovyParser.g4#L779-L788 )

If the semantic predicate above fails, antlr4 does not try the other alternatives of expression, so we have to extract a castExpression rule.
In addition, even if we work around the semantic predicate issue, the precedence of expression is still incorrect, e.g.
(Integer) m() instanceof Integer should be parsed as ((Integer) m()) instanceof Integer, but is currently parsed as (Integer) (m() instanceof Integer). As you can see, we put castExpression first, so we expected it to have higher precedence than the instanceof expression.

Please have a look into the issue. Thanks in advance.

Cheers,
Daniel.Sun

Provide more readable result for org.antlr.v4.runtime.TokenStream#getText(java.lang.Object, java.lang.Object)

Hi Sam(@sharwell),

   While developing Parrot, the new parser for the Groovy programming language, I found that the token text generated by antlr4 is hard to read and ambiguous, e.g.

class X {a b

As you see, a and b are concatenated without any separators:

Unexpected input: 'ab' at line: 1, column: 12

Currently org.antlr.v4.runtime.TokenStream#getText(java.lang.Object, java.lang.Object) is used.
Could you please provide a more readable getText method (e.g. one that retains the text of the hidden channel tokens), or give me some hints for implementing it myself? Thanks in advance!
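
One way to keep separators in the message is to slice the original source by character offsets instead of concatenating token texts. The sketch below is a minimal, self-contained illustration of that idea. Tok and SourceText are hypothetical names, and the offsets are assumed to be inclusive, in the way ANTLR tokens report start and stop indexes.

```java
// Hypothetical token model: only the inclusive character offsets matter here.
final class Tok {
    final int startIndex;
    final int stopIndex;

    Tok(int startIndex, int stopIndex) {
        this.startIndex = startIndex;
        this.stopIndex = stopIndex;
    }
}

final class SourceText {
    private SourceText() {}

    // Returns the source text from the first character of `first` to the
    // last character of `last`, including whitespace and comments between.
    static String between(String source, Tok first, Tok last) {
        return source.substring(first.startIndex, last.stopIndex + 1);
    }

    public static void main(String[] args) {
        String source = "class X {a b";
        Tok a = new Tok(9, 9);   // the identifier a
        Tok b = new Tok(11, 11); // the identifier b
        System.out.println("Unexpected input: '" + between(source, a, b) + "'");
    }
}
```

With the real runtime, the same slice could presumably be taken from the token's character stream using an interval built from the two offsets.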

Left recursive rule cannot specify baseContext

Code generation fails unexpectedly for the following grammar.

expression
  : ID
  | expression '+' expression
  ;

expression2
options { baseContext = expression; }
  : NUMBER
  | expression2 '+' expression2
  ;

ID : [a-z]+;
NUMBER : [0-9]+;

PHP Target

Hi @sharwell!

Do you consider supporting other targets? Can you give an idea of how much effort is needed to add support for a new target?

Support "Schrödinger's token"

The CaretToken used in ANTLRWorks 2 and GoWorks already acts as this type of token used with ANTLR 4. A generalized mechanism for handling this could be implemented by allowing the pushMode operation to push more than one mode in parallel, creating multiple branches of the lexer. For example, a new command could be created with the following form:

pushAnyMode(PossibleModeA, PossibleModeB)

The mode actually taken is determined by the parser's prediction algorithm at the time the potential tokens are examined.

ATN clearDFA() results in nullpointer exception

public final void clearDFA() {
    decisionToDFA = new DFA[decisionToState.size()];
    for (int i = 0; i < decisionToDFA.length; i++) {
        decisionToDFA[i] = new DFA(decisionToState.get(i), i);
    }
}

The clearDFA function in ATN.java first replaces the decisionToDFA array with a new, empty one and only then assigns its elements. When several parsers run in parallel, a call to clearDFA from one parser can therefore cause a NullPointerException in another thread that reads an element before it has been assigned.

java.lang.NullPointerException: null
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:367) ~[antlr4-runtime-4.9.0.jar:4.9.0]
at org.antlr.v4.runtime.atn.ParserATNSimulator.adaptivePredict(ParserATNSimulator.java:357) ~[antlr4-runtime-4.9.0.jar:4.9.0]

In the main ANTLR release branch, clearDFA is written without reinitializing the DFA array. Is this a bug in the optimized fork?

public final void clearDFA() {
    for (int i = 0; i < decisionToDFA.length; i++) {
        decisionToDFA[i] = new DFA(decisionToState.get(i), i);
    }
}
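
A common way to avoid exposing half-initialized slots is to fill a private array completely and publish it in a single step. The following is a minimal sketch of that pattern, not the fork's actual fix: DfaTable and its nested Dfa class are hypothetical stand-ins for the ATN and DFA types.

```java
import java.util.concurrent.atomic.AtomicReference;

final class DfaTable {
    // Hypothetical stand-in for the runtime's DFA class.
    static final class Dfa {
        final int decision;

        Dfa(int decision) {
            this.decision = decision;
        }
    }

    private final int decisionCount;
    private final AtomicReference<Dfa[]> table = new AtomicReference<>();

    DfaTable(int decisionCount) {
        this.decisionCount = decisionCount;
        clear();
    }

    // Build the replacement array privately, then publish it atomically.
    // Readers see either the old table or the new one, never a null slot.
    void clear() {
        Dfa[] fresh = new Dfa[decisionCount];
        for (int i = 0; i < fresh.length; i++) {
            fresh[i] = new Dfa(i);
        }
        table.set(fresh);
    }

    Dfa get(int decision) {
        return table.get()[decision];
    }

    public static void main(String[] args) {
        DfaTable dfas = new DfaTable(3);
        Dfa before = dfas.get(1);
        dfas.clear();
        Dfa after = dfas.get(1);
        System.out.println(before != after); // a fresh DFA replaced the old one
        System.out.println(after.decision);
    }
}
```

This only removes the window in which a reader can observe null. A parser that fetched a DFA just before the swap still holds the old instance, which may or may not be acceptable depending on why the cache is being cleared.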

"TypeError: Class constructor Lexer cannot be invoked without 'new'"

$ antlr4 -Dlanguage=JavaScript SplParser.g4

Generated file: SplParserLexer.js.

Code in SplParserLexer.js:

function SplParserLexer(input) {
    antlr4.Lexer.call(this, input);
    this._interp = new antlr4.atn.LexerATNSimulator(this, atn, decisionsToDFA, new antlr4.PredictionContextCache());
    return this;
}

Browser error: "TypeError: Class constructor Lexer cannot be invoked without 'new'"

How can I solve this, please?

Improve the performance of failed semantic predicate

Background

When a semantic predicate fails, a FailedPredicateException is thrown to change the control flow. However, filling in the stack trace is expensive, especially when FailedPredicateException is thrown frequently.

Proposal

I propose overriding fillInStackTrace() in FailedPredicateException with the following simplified implementation:

    public synchronized Throwable fillInStackTrace() {
        return this;
    }

Its default implementation is:

    public synchronized Throwable fillInStackTrace() {
        if (stackTrace != null ||
            backtrace != null /* Out of protocol state */ ) {
            fillInStackTrace(0);
            stackTrace = UNASSIGNED_STACK;
        }
        return this;
    }

Advantages & Disadvantages

  • Parsing performance improves to some extent. The more semantic predicates a grammar uses, the larger the gain.
  • The stack trace of FailedPredicateException is no longer available, but usually we do not care about it.
    If we need to diagnose a FailedPredicateException some day, we can add the state to the exception message, e.g.
setState(1126);
if (!( !SemanticPredicates.isInvalidMethodDeclaration(_input) ))
    throw new FailedPredicateException(
        this, 
        " !SemanticPredicates.isInvalidMethodDeclaration(_input),  state: " 
            + 1126 /* the state is added here */);
setState(1127);
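
The effect of the proposed override can be observed in isolation. The sketch below uses a hypothetical exception class, StacklessPredicateException, rather than the runtime's FailedPredicateException. It shows that the message survives while the stack trace stays empty.

```java
// Hypothetical exception type used only for illustration; it is not the
// runtime's FailedPredicateException.
class StacklessPredicateException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    StacklessPredicateException(String message) {
        super(message);
    }

    // Skip the stack walk that normally happens in the Throwable constructor.
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}

final class StacklessDemo {
    private StacklessDemo() {}

    public static void main(String[] args) {
        try {
            throw new StacklessPredicateException("predicate failed, state: 1126");
        } catch (StacklessPredicateException e) {
            System.out.println(e.getMessage());
            System.out.println(e.getStackTrace().length);
        }
    }
}
```

Because no frames are recorded, the state number embedded in the message becomes the main clue to where the predicate failed, which is why the proposal suggests adding it.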

Investigate generating a syntax factory

The SyntaxFactory class in Roslyn provides simple methods for creating new syntax trees. ANTLR users wishing to perform manual construction of syntax trees could benefit from automatically generating this type of factory.

Error recovery does not work correctly for grammars that contain fully-optional rule bodies.

Root cause: LL1Analyzer returns incorrect follow tokens when exploring a rule with epsilon transitions from start to rule stop state, by exploring the rule stop target transition.

Background: When the parser encounters an error, it attempts error recovery before exiting the rule. DefaultErrorStrategy's recover implementation resynchronizes the parser by consuming tokens until it finds one in the resynchronization set. If it is successful, the input stream is then in a state where the next token can be consumed by some ancestor rule that is currently being parsed, and parsing can continue.

The key is to correctly compute the resynchronization set, that is, the set of tokens that may follow the immediate rule or any ancestor rule we are parsing. That is done using the ATN: each rule transition object stores a follow state (the state the rule resumes to after the sub-rule invocation returns). In broad strokes, starting from the follow state of the invoking state of each ancestor rule context, LL1 Analyzer walks the epsilon, predicate, and rule transitions of the ATN, adding the tokens (defined by atom and wildcard transitions) it discovers. Importantly, it doesn't transition past these token transitions, to ensure it operates with LL(1), and it doesn't continue past the stop state of each invoking rule.

Why does it matter that it not look past the stop state of some rule X? Because the stop state has a transition to the follow state for all possible transitions to rule X. These may not be possible states that can produce the next token as defined by the current rule context (including its ancestor rule contexts). For example, in the following grammar, rule2's stop state has an epsilon transition to the follow state of the rule transition to rule2 of both the start rule and rule1, but only ABC may follow HELLO WORLD when parsed from the start rule. If LL1 Analyzer continued past the stop state of rule2, it would transition to a state in rule1 and ZZZ would be erroneously added to the resynchronization set:

start : rule2 'ABC' EOF ;
rule1 : rule2 'ZZZ' ;
rule2 : rule3 ;
rule3 : 'HELLO' 'WORLD' ;

The bug in this issue comes into play when writing rules that may consume no tokens (i.e. a rule in which the stop state may be reached from the start state purely through epsilon/predicate transitions). For example, rule4 in the following grammar:

start : rule1 'ABC' EOF ;
rule1 : rule2 rule4 ;
rule2 : 'HELLO' 'WORLD' ;
rule3 : rule4 'ZZZ' ;
rule4 : 'DEF'* ;

Here, ZZZ should not be in the resynchronization set of the parser state after HELLO in rule2, but it is because given the rule contexts (start) -> (rule1) -> (rule2 HELLO) it can be reached by a rule transition from rule1 -> rule4 and epsilon transitions from rule4 to rule3 and an atom transition to ZZZ, before a depth-first arrival at rule1's stop state. This means parsing "HELLO ZZZ ZZZ ABC" would recover from a MismatchedTokenException in rule2 by consuming no tokens, rule1 completes normally, then the start rule also hits a MismatchedTokenException since the input stream is currently at the first ZZZ token and it is expecting ABC.

The fix is to ensure that LL1 Analyzer correctly computes the result set by never transitioning from a rule stop state, outside of special scenarios where the stop state is not set such as SLL(1) lookaheads. Then, in our example, given the correct resynchronization set {DEF, ABC} recover will correctly resynchronize by consuming the two ZZZ tokens and leaving the input stream at token ABC for the start rule to successfully parse. Note: The original ANTLR4 repository does not have this defect, as it already implements correct stopping.
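
The effect of the two resynchronization sets on the example input can be replayed with a small, self-contained model of the consume-until loop. This is only a sketch: Resync and its recover method are hypothetical, tokens are modelled as plain strings, and the faulty set is assumed to be the correct set plus ZZZ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Resync {
    private Resync() {}

    // Skips tokens starting at `from` until one is in recoverSet. Returns the
    // index where parsing would resume, or tokens.size() if none is found.
    static int recover(List<String> tokens, int from, Set<String> recoverSet) {
        int i = from;
        while (i < tokens.size() && !recoverSet.contains(tokens.get(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("HELLO", "ZZZ", "ZZZ", "ABC");
        int errorIndex = 1; // rule2 expected WORLD but saw the first ZZZ

        Set<String> correct = new HashSet<>(Arrays.asList("DEF", "ABC"));
        Set<String> faulty = new HashSet<>(Arrays.asList("DEF", "ABC", "ZZZ"));

        // Correct set: both ZZZ tokens are consumed and parsing resumes at ABC.
        System.out.println(recover(input, errorIndex, correct));
        // Faulty set: nothing is consumed, so the start rule later fails on ZZZ.
        System.out.println(recover(input, errorIndex, faulty));
    }
}
```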

As a side note, what if rule1 had a 'JKL'? optional token after the call to rule4? LL1 Analyzer already correctly handles this by storing the follow state of each rule transition that it walks, and when reaching a rule stop state it continues from that follow state if there is one. This doesn't conflict with what was described above, as we are not transitioning to the rule stop state target states but rather to the follow state of the invoking rule transition (in this example, that means when we encounter the rule4 stop state we transition to the state in rule1 after the rule4 call, in order to add token 'JKL' to the result set).
