Git Product home page Git Product logo

pikaparser-kotlin's Introduction

Layout-sensitive pika parser

This version of the pika parsing algorithm is based on the reference implementation of the pika parsing algorithm, described in the paper:

Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems. Luke A. D. Hutchison, May 2020.

It has, however, been migrated to Kotlin from the original Java. The ultimate goal of this implementation is to allow for parsing whitespace-sensitive layout syntax as is used in languages such as Haskell, F#, Elm, and Python.

From the README of the reference implementation project:

Pika parsing is the inverse of packrat parsing: instead of parsing top-down, left to right, pika parsing parses bottom-up, right to left, using dynamic programming. This reversed parsing order allows the parser to directly handle left-recursive grammars, and allows the parser to optimally recover from syntax errors.

Example usage

Parsing code

The following examples are in Kotlin, but follow the examples given on the reference implementation README at [https://github.com/lukehutch/pikaparser]

fun main() {

  val grammarSpec = Files.readAllLines(Paths.get("arithmetic.grammar")).joinToString()
  
  val grammar = MetaGrammar.parse(grammarSpec)
  
  val input = Files.readAllLines(Paths.get("arithmetic.input")).joinToString()
  
  val memoTable = grammar.parse(input)
  
  val topRuleName = "Program"
  val recoveryRuleNames = listOf(topRuleName, "Statement")
  
  ParserInfo.printParseResult(topRuleName, grammar, memoTable, input, recoveryRuleNames, false)
}

Grammar description file: arithmetic.grammar

Program <- Statement+;
Statement <- var:[a-z]+ '=' E ';';
E[4] <- '(' E ')';
E[3] <- num:[0-9]+ / sym:[a-z]+;
E[2] <- arith:(op:'-' E);
E[1,L] <- arith:(E op:('*' / '/') E);
E[0,L] <- arith:(E op:('+' / '-') E);

The rules are of the form RuleName <- [ASTNodeLabel:]Clause;

Clauses can be of the form:

  • X Y Z for a sequence of matches (X should match, followed by Y, followed by Z), i.e. Seq
  • X / Y / Z for ordered choice (X should match, or if it doesn't, Y should match, or if it doesn't' Z should match) , i.e. First
  • X+ to indicate that X must match one or more times, i.e. OneOrMore
  • X* to indicate that X must match zero or more times, i.e. ZeroOrMore
  • X? to indicate that X may optionally match, i.e. Optional
  • &X to look ahead and require X to match without consuming characters, i.e. FollowedBy
  • !X to look ahead and require that there is no match (the logical negation of &X), i.e. NotFollowedBy

The number in the optional square brackets after the rule name is the precedence, followed by an optional associativity modifier (,L or ,R).

Input string to parse: arithmetic.input

discriminant=b*b-4*a*c;

Generated parse tree:

Parse tree

Alternative view of generated parse tree:

Alternative view of parse tree

Generated Abstract Syntax Tree (AST):

Alternative view of parse tree

Printing syntax errors

To find syntax errors, call:

/* This call to getSyntaxErrors returns a value of type NavigableMap<Integer, Entry<Integer, String>>.
 * 
 * The first two parameters designate the grammar and the input.
 *
 * The remaining arguments are a varargs parameter listing the names of all the grammar rules 
 * that should span all the designated input. 
 */ 
val syntaxErrors = 
      memoTable.getSyntaxErrors(grammar, input, "Program", "Statement", "Expr");

Any character range that is not spanned by a match of one of the named rules is returned in the result as a syntax error. You can print out the characters in those ranges as syntax errors. The entries in the returned NavigableMap have as the key the start position of a syntax error (a zero-indexed character position from the beginning of the string), and as the value an entry consisting of the end position of the syntax error and the span of the input between the start position and the end position.

Error recovery

You can recover from syntax errors by finding the next match of any grammar rule of interest after the syntax error (i.e. after the end of the last character matched by a previous grammar rule). For example:

val programEntries: NavigableMap<Int, Match> = grammar.getNavigableMatches("Program", memoTable)
var matchEndPosition = 0

if (programEntries.isNotEmpty()) {
    val programMatch = programEntries.firstEntry().value
    if (programMatch != null) {
        val startPos = programMatch.memoKey.startPos
        val length: Int = programMatch.length
        matchEndPosition = startPos + length
    }
}
if (matchEndPosition < input.length) {
    val statementEntries: NavigableMap<Int, Match> = grammar.getNavigableMatches("Statement", memoTable)
    val nextStatementEntry = statementEntries.ceilingEntry(matchEndPosition)
    if (nextStatementEntry != null) {
        val nextStatementMatch = nextStatementEntry.value
        // ...
    }
}

pikaparser-kotlin's People

Contributors

pblair-clear avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.