nimly's Introduction

nimly

Lexer Generator and Parser Generator as a Macro Library in Nim.

With nimly, you can build a lexer and a parser by writing definitions in a lex/yacc-like format. nimly generates the lexer and parser with macros at compile time, so you can use nimly as a library rather than as an external tool.

niml

niml is a macro to generate a lexer.

macro niml

The niml macro builds a lexer. Almost all of the lexer construction is done at compile time. An example follows.

## This makes a LexData object named myLexer.
## This lexer returns value with type ``Token`` when a token is found.
niml myLexer[Token]:
  r"if":
    ## this part is converted to a proc body.
    ## the argument is (token: LToken).
    return TokenIf()
  r"else":
    return TokenElse()
  r"true":
    return TokenTrue()
  r"false":
    return TokenFalse()
  ## you can use ``..`` instead of ``-`` in ``[]``.
  r"[a..zA..Z\-_][a..zA..Z0..9\-_]*":
    return TokenIdentifier(token)
  ## you can define ``setUp`` and ``tearDown`` function.
  ## ``setUp`` is called from ``open``, ``newWithString`` and
  ## ``initWithString``.
  ## ``tearDown`` is called from ``close``.
  ## an example is ``test/lexer_global_var.nim``.
  setUp:
    doSomething()
  tearDown:
    doSomething()

The meta characters are as follows:

  • \: escape character
  • .: matches any character
  • [: start of character class
  • |: alternation (or)
  • (: start of subpattern
  • ): end of subpattern
  • ?: 0 or 1 times quantifier
  • *: 0 or more times quantifier
  • +: 1 or more times quantifier
  • {: {n,m} is the quantifier for at least n and at most m times

Inside [], the meta characters are as follows:

  • \: escape character
  • ^: negate character (only in first position)
  • ]: end of this class
  • -: specify character range (.. can be used instead of this)

Each of the following is recognized as a character set:

  • \d: [0..9]
  • \D: [^0..9]
  • \s: [ \t\n\r\f\v]
  • \S: [^ \t\n\r\f\v]
  • \w: [a..zA..Z0..9_]
  • \W: [^a..zA..Z0..9_]
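
As a rough illustration of what the shorthand classes above cover, they can be written as ordinary Nim character sets using only the standard language. This is a sketch for intuition, not nimly's internal representation. The names DigitClass, SpaceClass, WordClass and inClass are hypothetical and do not exist in nimly.

```nim
# Illustration only: the shorthand classes expressed as Nim character sets.
# All names below are hypothetical and are not part of nimly's API.
const
  DigitClass = {'0'..'9'}                             # \d
  SpaceClass = {' ', '\t', '\l', '\r', '\f', '\v'}    # \s ('\l' is line feed)
  WordClass = {'a'..'z', 'A'..'Z', '0'..'9', '_'}     # \w

proc inClass(c: char; charSet: set[char]; negated = false): bool =
  ## True when `c` belongs to `charSet`; with `negated`, true when it does not.
  (c in charSet) != negated

when isMainModule:
  doAssert inClass('7', DigitClass)                   # \d matches a digit
  doAssert inClass('x', DigitClass, negated = true)   # \D matches a non-digit
  doAssert inClass('\t', SpaceClass)                  # \s matches a tab
  doAssert inClass('_', WordClass)                    # \w matches an underscore
  doAssert not inClass('-', WordClass)                # \w does not match '-'
```

The negated flag plays the role of a leading ^ inside [], which is how the upper-case shorthands are defined in terms of the lower-case ones.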

nimy

nimy is a macro to generate a LALR(1) parser.

macro nimy

The nimy macro builds a parser. Almost all of the parser construction is done at compile time. An example follows.

## This makes a LexData object named myParser.
## The first clause is the top level of the BNF.
## This parser receives tokens of type ``Token``, and each token must have a
## field ``kind`` of enum type ``[TokenTypeName]Kind``.
## This is naturally satisfied when you use ``patty`` to define the token.
nimy myParser[Token]:
  ## the starting non-terminal
  ## the return type of the parser is ``Expr``
  top[Expr]:
    ## a pattern.
    expr:
      ## proc body that runs when the pattern with a single ``expr`` is parsed.
      ## $1 means first position of the pattern (expr)
      return $1

  ## non-terminal named ``expr``
  ## with returning type ``Expr``
  expr[Expr]:
    ## first pattern of expr.
    ## ``LPAR`` and ``RPAR`` are TokenKinds.
    LPAR expr RPAR:
      return $2

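    ## second pattern of expr.
    ## ``PLUS`` is TokenKind.
    expr PLUS expr:
      ## placeholder: a real action would combine $1 and $3 into an ``Expr``
      ## ($2 is the ``PLUS`` token, not an ``Expr``).
      return $1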

You can use the following EBNF functions:

  • XXX[]: Option (0 or 1 XXX). The type is seq[xxx], where xxx is the type of XXX.
  • XXX{}: Repeat (0 or more XXX). The type is seq[xxx], where xxx is the type of XXX.

An example of these is in the next section.

Example

tests/test_readme_example.nim is an easy example.

import unittest
import patty
import strutils

import nimly

## variant is defined in patty
variant MyToken:
  PLUS
  MULTI
  NUM(val: int)
  DOT
  LPAREN
  RPAREN
  IGNORE

niml testLex[MyToken]:
  r"\(":
    return LPAREN()
  r"\)":
    return RPAREN()
  r"\+":
    return PLUS()
  r"\*":
    return MULTI()
  r"\d":
    return NUM(parseInt(token.token))
  r"\.":
    return DOT()
  r"\s":
    return IGNORE()

nimy testPar[MyToken]:
  top[string]:
    plus:
      return $1

  plus[string]:
    mult PLUS plus:
      return $1 & " + " & $3

    mult:
      return $1

  mult[string]:
    num MULTI mult:
      return "[" & $1 & " * " & $3 & "]"

    num:
      return $1

  num[string]:
    LPAREN plus RPAREN:
      return "(" & $2 & ")"

    ## float (integer part is 0-9) or integer
    NUM DOT[] NUM{}:
      result = ""
      # type of `($1).val` is `int`
      result &= $(($1).val)
      if ($2).len > 0:
        result &= "."
      # type of `$3` is `seq[MyToken]` and each element is a NUM
      for tkn in $3:
        # type of `tkn.val` is `int`
        result &= $(tkn.val)

test "test Lexer":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var
    ret: seq[MyTokenKind] = @[]

  for token in testLexer.lexIter:
    ret.add(token.kind)

  check ret == @[MyTokenKind.NUM, MyTokenKind.PLUS, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.MULTI,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM]

test "test Parser 1":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 101010]"

  testLexer.initWithString("1 + 42 * 1010")

  parser.init()
  check parser.parse(testLexer) == "1 + [42 * 1010]"

test "test Parser 2":
  var testLexer = testLex.newWithString("1 + 42 * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 1.01010]"

  testLexer.initWithString("1. + 4.2 * 101010")

  parser.init()
  check parser.parse(testLexer) == "1. + [4.2 * 101010]"

test "test Parser 3":
  var testLexer = testLex.newWithString("(1 + 42) * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "[(1 + 42) * 1.01010]"

Install

  1. nimble install nimly

Now, you can use nimly with import nimly.

vmdef.MaxLoopIterations Problem

While compiling a lexer/parser, you may encounter the error interpretation requires too many iterations. You can avoid this error with the compiler option maxLoopIterationsVM:N, which is available since nim v1.0.6.

See #11 for details.
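
One way to apply this option to every build, sketched here as an assumption rather than the project's recommended setup, is a config.nims file next to the module that invokes niml/nimy. The limit below is an arbitrary illustrative value, not a figure taken from nimly.

```nim
# config.nims (NimScript), read by the compiler for modules in this directory.
# The value is an arbitrary example; raise it if the error persists.
switch("maxLoopIterationsVM", "100000000")
```

The same setting can be passed on the command line as --maxLoopIterationsVM:N.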

Contribute

  1. Fork this
  2. Create new branch
  3. Commit your change
  4. Push it to the branch
  5. Create new pull request

Changelog

See changelog.rst.

Developing

You can use nimldebug and nimydebug as a conditional symbol to print debug info.

example: nim c -d:nimldebug -d:nimydebug -r tests/test_readme_example.nim

nimly's People

Contributors

loloicci, shinkarom, zabemath

nimly's Issues

Confusing error message on ambiguous parse

I am not an expert in lexing/parsing, but I think this can be improved. In the example below, there are two parsing matchers, and they both essentially do the same thing. word1 matches a sequence of word2s, and word2 matches a sequence of characters. In this example, it is impossible to tell where the words should be broken up; any sequence of characters could be divided into quite a few sequences of words. This fails to compile (message below) with an unhelpful message that could probably be improved.

import patty
import nimly
import unittest

variantp FluentToken:
  Character(character: char)

niml fluentLexer[FluentToken]:
  r"[A..Za..z1..9\-]":
    Character(token.token[0])

nimy fluentParser[FluentToken]:
  top[seq[string]]:
    word1:
      return $1
  
  word1[seq[string]]:
    word2{}:
      return $1
  
  word2[string]:
    Character{}:
      var str = ""
      for character in $1:
        str &= character.character
      return str

test "test":
  var testLexer = fluentLexer.newWithString("testing")
  var parser = fluentParser.newParser()
  discard parser.parse(testLexer)

Outputs:

➜  nim-fluent nim c -r src/nim_fluent/test.nim
Hint: used config file '/Users/wys/.choosenim/toolchains/nim-1.4.4/config/nim.cfg' [Conf]
Hint: used config file '/Users/wys/.choosenim/toolchains/nim-1.4.4/config/config.nims' [Conf]
.........................................
stack trace: (most recent call last)
/Users/wys/my-workspace/nim-fluent/src/nim_fluent/test.nim(22, 5)
/Users/wys/my-workspace/nim-fluent/src/nim_fluent/test.nim(22, 5) :tmp
/Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/lalr.nim(159, 18) makeTableLALR
/Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/lalr.nim(119, 19) toLALRKernel
/Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/lalr.nim(74, 13) closure
/Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/lalr.nim(65, 25) closure
/Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/parsetypes.nim(291, 11) calFirsts
/Users/wys/.choosenim/toolchains/nim-1.4.4/lib/system/assertions.nim(30, 26) failedAssertImpl
/Users/wys/.choosenim/toolchains/nim-1.4.4/lib/system/assertions.nim(23, 11) raiseAssert
/Users/wys/.choosenim/toolchains/nim-1.4.4/lib/system/fatal.nim(49, 5) sysFatal
/Users/wys/my-workspace/nim-fluent/src/nim_fluent/test.nim(12, 6) template/generic instantiation of `nimy` from here
/Users/wys/.choosenim/toolchains/nim-1.4.4/lib/system/fatal.nim(49, 5) Error: unhandled exception: /Users/wys/.nimble/pkgs/nimly-0.7.0/nimly/parsetypes.nim(291, 18) `false`  [AssertionDefect]

Add Function to Define setUp and tearDown in niml

Add a function to define setUp and tearDown in niml and nimy.
setUp runs when the lexer/parser is constructed, and tearDown runs when it is destructed.

  • implement setUp and tearDown in niml
  • implement setUp and tearDown in nimy
  • add documents

Remake Errors to Solve Some Warnings

Warning: inherit from a more precise exception type like ValueError, IOError or OSError. If these don't suit, inherit from CatchableError or Defect.

[Suggestion] Add EOF to lexer

It is useful to be able to match the end of the string when lexing. Consider the following grammar in ebnf:

text_line ::= [a-zA-Z]* line_end
line_end ::= "\u000D\u000A" | "\u000A" | EOF

The grammar above accepts any alphabetic characters until either a newline or the end of the file. As a result, a newline at the end of the last line is optional.
This grammar cannot be represented accurately in nimly because of the EOF. nimly could be extended by adding the $ symbol to mean the end of input, similar to regex:

niml fluentLexer[MyToken]:
  "[a..zA..Z]":
    MyAlphaToken(token.token)
  "[\u000D\u000A|\u000A|$]":
    MyLineEndToken()

Change nimy design

Change the usage of nimy
from:

nimy par:
  ...

par.init()
par.parse(lexer)

to:

nimy par:
  ...

var parser = par.newParser()
parser.parse(lexer)

It is enabled by #57.
It lets nimy be used like niml, and may also enable #56.

This is a breaking change.

Ability to define two parsers from the same lexer

If you define a lexer niml lexer[MyToken] and then two parsers nimy parserOne[MyToken] and nimy parserTwo[MyToken] you get an error due to a redefinition of the function parse (parsegen.nim:1001)

Fix sets usage

.../nimly/src/nimly/parsetypes.nim(165, 12) Warning: Deprecated since v0.20; sets are initialized by default; isValid is deprecated [Deprecated]
.../nimly/src/nimly/parsetypes.nim(166, 16) Warning: Deprecated since v0.20; sets are initialized by default; isValid is deprecated [Deprecated]

Debug setup timing in niml

setUp is not executed in:

proc open*[T](lexer: var NimlLexer[T], path: string) =
  lexer.open(openFileStream(path))
proc initWithString*[T](lexer: var NimlLexer[T], str: string) =
  lexer.open(newStringStream(str))

Related to #55

Double counting newlines

The following code:

import strutils

import nimly/lextypes
import nimly/lexgen
import nimly/lexer


type
  TokenType = enum
    OP
    INT
    IGNORE

  Token = object
    typ: TokenType
    val: string


niml testLex[Token]:
  r"\+|-|\*|/":
    echo "$1 @ (line, col): ($2, $3)" % [$token.token, $token.lineNum, $token.colNum]
    return Token(typ: OP, val: token.token)
  r"\d+":
    echo "$1 @ (line, col): ($2, $3)" % [$token.token, $token.lineNum, $token.colNum]
    return Token(typ: INT, val: token.token)
  r"\s":
    return Token(typ: IGNORE, val: "")


when isMainModule:

    var
      str = "1 / \n 22 + \n42"
      calcLexer = testLex.newWithString(str)
    calcLexer.ignoreIf = proc(r: Token): bool = r.val == ""

    for token in calcLexer.lexIter:
      echo "$1: $2" % [$token.typ, $token.val]

outputs the following:

1 @ (line, col): (1, 0)
INT: 1
/ @ (line, col): (1, 2)
OP: /
22 @ (line, col): (3, 1)
INT: 22
+ @ (line, col): (3, 4)
OP: +
42 @ (line, col): (5, 0)
INT: 42

So it is correctly tracking the column position but double counting newlines. I realise that nimly is using lexbase for this but I've used lexbase for a handwritten lexer and didn't have this issue. I've taken a look at the code but I can't see where the double count happens. There is a bit too much macro magic happening here for me to follow.

Remove Unused Functions

.../nimly/src/nimly/parser.nim(84, 6) Hint: 'inst' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parser.nim(65, 3) Hint: 'IntToSym' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parser.nim(66, 3) Hint: 'IntToRule' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parser.nim(61, 3) Hint: 'RuleToInt' is declared but not used [XDeclaredButNotUsed]
.../src/nimly/parser.nim(60, 3) Hint: 'SymbolToInt' is declared but not used [XDeclaredButNotUsed]
...
.../nimly/src/nimly/parsegen.nim(935, 5) Hint: 'its' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parsegen.nim(936, 5) Hint: 'itr' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parsegen.nim(354, 6) Hint: 'addIntToRule' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parsegen.nim(325, 6) Hint: 'addRuleToInt' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parsegen.nim(296, 6) Hint: 'addVarSymToInt' is declared but not used [XDeclaredButNotUsed]
.../nimly/src/nimly/parsegen.nim(383, 6) Hint: 'addVarIntToSym' is declared but not used [XDeclaredButNotUsed]

Add Function Lexer Produce a Token for EOF

Add an option to NimlLexer so that lexNext / lexIter produce a specified token, exactly once, when the lexer reaches EOF.

Example:

# lexer setup
lexer.ignoreIf = someProc
lexer.produceEOFToken(tokenForEOF) # if it is not given, reaching EOF produces no token.

# use lexer
...

related: #70

Compile time parser?

Hi,

I'm trying to use nimly in macro to parse a string and generate Nim AST, but I get an error when I try to use a generated parser:

Error: cannot evaluate at compile time: parser

The code I use is pretty simple and looks like this:

variant SomeToken:
  ...

niml lexer[SomeToken]:
  ...

nimy parser[SomeToken]:
  ...

macro fromParsedString(str: untyped) =
  var lex = lexer.newWithString($str)
  parser.init() # <- Error: cannot evaluate at compile time: parser
  let someAst = parser.parse(lex)

Is it possible to use a nimly-generated parser at compile time?
I can prepare a full example, but first I want to know whether this is unsupported by nimly's design.

hangs with `^` regex

variant MyToken:
  tSYM(str: string)

niml lexer[MyToken]:
  r"\S+": tSYM(token.token)

nimble --cc:tcc build
  Verifying dependencies for kiosk@0.1.0
      Info: Dependency on nimly@any version already satisfied
  Verifying dependencies for nimly@0.6.0
      Info: Dependency on patty@>= 0.3.3 already satisfied
  Verifying dependencies for patty@0.3.3
   Building kiosk/kiosk using c backend
...

After a long delay of about 10-15 seconds, the following error appears:

        ... Command: "/home/ponyatov/.nimble/bin/nim" c --noNimblePath -d:NimblePkgVersion=0.1.0 --cc:tcc --path:"/home/ponyatov/.nimble/pkgs/nimly-0.6.0"  --path:"/home/ponyatov/.nimble/pkgs/patty-0.3.3"  -o:"/home/ponyatov/metaL/kiosk/kiosk" "/home/ponyatov/metaL/kiosk/src/kiosk.nim"
        ... stack trace: (most recent call last)
        ... /home/ponyatov/metaL/kiosk/src/kiosk.nim(33, 11)
        ... /home/ponyatov/metaL/kiosk/src/kiosk.nim(33, 11) :tmp
        ... /home/ponyatov/.nimble/pkgs/nimly-0.6.0/nimly/lexgen.nim(595, 43) convertToLexData
        ... /home/ponyatov/.choosenim/toolchains/nim-1.2.4/lib/pure/collections/tables.nim(846, 15) []
        ... /home/ponyatov/.choosenim/toolchains/nim-1.2.4/lib/pure/collections/tables.nim(262, 7) []
        ... /home/ponyatov/metaL/kiosk/src/kiosk.nim(30, 6) template/generic instantiation of `niml` from here
        ... /home/ponyatov/.choosenim/toolchains/nim-1.2.4/lib/pure/collections/tables.nim(262, 7) Error: unhandled exception: key not found: -1 [KeyError]

Parser does not work when compiling to JavaScript

Minimal example:

# A parser that only parses the string "0"

import nimly
import patty

variant ZeroToken:
  zero

niml ZeroLexer[ZeroToken]:
  r"0":
    zero()

nimy ZeroParser[ZeroToken]:
  top[int]:
    zero:
      0

var lexer = ZeroLexer.newWithString("0")
var parser = ZeroParser.newParser
echo parser.parse lexer

When the code is compiled to C, it successfully outputs 0. However, if compiled to JavaScript (with -d:release), it outputs:

/home/xigoi/sandbox/nimlytest.js:878
    throw new Error(cbuf_1420201);
    ^

Error: Error: unhandled exception: Unexpected token(kind: TermS, term: zero)is passed.
token: (kind: zero) [NimyActionError]

    at unhandledException (/home/xigoi/sandbox/nimlytest.js:878:11)
    at raiseException (/home/xigoi/sandbox/nimlytest.js:478:5)
    at parseImpl_14412133 (/home/xigoi/sandbox/nimlytest.js:2596:11)
    at parse_14412000 (/home/xigoi/sandbox/nimlytest.js:2962:25)
    at Object.<anonymous> (/home/xigoi/sandbox/nimlytest.js:2972:23)
    at Module._compile (node:internal/modules/cjs/loader:1108:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1137:10)
    at Module.load (node:internal/modules/cjs/loader:988:32)
    at Function.Module._load (node:internal/modules/cjs/loader:828:14)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:76:12)

Note that this happens both in Node and in the browser. The error happens on the first token passed to the parser, no matter what the parser looks like.
