
parsergen's Introduction

  • The parsergen/scannergen combo generates C++ source files for an LR(1)/GLR parser & scanner from a set of annotated production rules, aka a grammar.
  • Both parsergen & scannergen use the same combo (i.e. themselves) to regenerate their own parser & scanner, respectively, as they evolve.
  • The generated code must be built with -std=c++2a (C++20) or later.
  • 🧘 Most often you need the combo, but not always:
    • Sometimes reusing an existing scanner with another parser is feasible and cheaper. (%IDDEF_SOURCE)
    • Sometimes a standalone scanner suffices. (see CBrackets)


Installation

in ArchLinux

  1. Make sure you have installed yay or any other pacman wrapper.

  2. Run yay -S parsergen to install.

  3. Run yay -Ql parsergen to see the installed files:

    parsergen /usr/
    parsergen /usr/bin/
    parsergen /usr/bin/grammarstrip
    parsergen /usr/bin/parsergen
    parsergen /usr/bin/scannergen
    parsergen /usr/share/
    parsergen /usr/share/licenses/
    parsergen /usr/share/licenses/parsergen/
    parsergen /usr/share/licenses/parsergen/LICENSE
    parsergen /usr/share/parsergen/
    parsergen /usr/share/parsergen/RE_Suite.txt
  4. Three commands are now at your disposal: grammarstrip, parsergen, and scannergen.

from GitHub on any Linux distro

  1. Make sure you have installed cmake, make, gcc, git, and fmt, or their equivalents.

  2. git clone https://github.com/buck-yeh/parsergen.git
    cd parsergen
    cmake -D FETCH_DEPENDEES=1 -D DEPENDEE_ROOT=_deps .
    make -j
    PSGEN_DIR="/full/path/to/current/dir"

    p.s. To build a tagged version instead of the main branch, check out that tag before running cmake.

  3. Three commands at your disposal:

    • $PSGEN_DIR/ParserGen/grammarstrip
    • $PSGEN_DIR/ParserGen/parsergen
    • $PSGEN_DIR/ScannerGen/scannergen
  4. 🤔 But is it possible to just type grammarstrip, parsergen, or scannergen to run them?
    💡 Append the following lines to ~/.bashrc:

    PSGEN_DIR="/full/path/to/parsergen/dir"
    alias grammarstrip="$PSGEN_DIR/ParserGen/grammarstrip"
    alias parsergen="$PSGEN_DIR/ParserGen/parsergen"
    alias scannergen="$PSGEN_DIR/ScannerGen/scannergen"

    And run the following line:

    . ~/.bashrc

    There you go! The aliases will also take effect in subsequently opened console windows and persist across reboots.

A quick guide to parsergen/scannergen combo

When you need to quickly implement a parser for a DSL, whether improvised or deliberately designed, prepare a grammar file of simple BNF rules with semantic annotations and let the combo generate the C++ code for the parser & scanner.

Write grammar

example/CalcInt/grammar.txt defines a calculator for basic arithmetic (+ - * / %) on integer constants in decimal, octal, or hexadecimal.

lexid   Spaces // (1)

//
//      Output Options (2)
//
%CONTEXT [[std::ostream &]]

%ON_ERROR [[
    $c <<"COL#" <<$pos.m_Col <<": " <<$message <<'\n';
]]

%EXTRA_TOKENS   [[dec_num|oct_num|hex_num|spaces]]
//%SHOW_UNDEFINED

//
//      Operator Precedence (3)
//
left   + -
left   * / %
right  ( )

//
//      Grammar with Reduction Code (4)
//
<@> ::= <Expr>  [[
    $r = $1;
]]

<Expr> ::= <Expr> + <Expr>  [[
    bux::unlex<int>($1) += bux::unlex<int>($3);
    $r = $1;
]]
<Expr> ::= <Expr> - <Expr>  [[
    bux::unlex<int>($1) -= bux::unlex<int>($3);
    $r = $1;
]]
<Expr> ::= <Expr> * <Expr>  [[
    bux::unlex<int>($1) *= bux::unlex<int>($3);
    $r = $1;
]]
<Expr> ::= <Expr> / <Expr>  [[
    bux::unlex<int>($1) /= bux::unlex<int>($3);
    $r = $1;
]]
<Expr> ::= <Expr> % <Expr>  [[
    bux::unlex<int>($1) %= bux::unlex<int>($3);
    $r = $1;
]]
<Expr> ::= ( <Expr> )       [[
    $r = $2;
]]
<Expr> ::= $Num             [[
    $r = bux::createLex(dynamic_cast<bux::C_IntegerLex&>(*$1).value<int>());
]]

(1) New lexid

(2) % Option

(3) Operator precedence

(4) Production rule
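
To make the effect of the precedence declarations concrete, here is a hand-written sketch that computes the same kind of result with precedence climbing. It is not the code parsergen generates and does not use bux. It assumes that a later precedence line binds tighter than an earlier one (so * / % bind tighter than + -), and that octal and hexadecimal constants use the C-style 0 and 0x prefixes. The names CalcSketch and evalSketch are hypothetical.

```cpp
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <string_view>

// Hand-written stand-in for the calculator: + - bind looser than * / %,
// and every binary operator is left-associative.
class CalcSketch
{
public:
    explicit CalcSketch(std::string_view text): m_text(text) {}

    int run()
    {
        const int value = parseExpr(1);
        skipSpaces();
        if (m_pos != m_text.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string_view m_text;
    std::size_t      m_pos = 0;

    void skipSpaces()
    {
        while (m_pos < m_text.size() &&
               std::isspace(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
    }

    static int precedenceOf(char op)
    {
        switch (op)
        {
        case '+': case '-':           return 1;
        case '*': case '/': case '%': return 2;
        default:                      return 0; // not a binary operator
        }
    }

    int parsePrimary()
    {
        skipSpaces();
        if (m_pos >= m_text.size())
            throw std::runtime_error("operand expected");
        if (m_text[m_pos] == '(')
        {
            ++m_pos;
            const int inner = parseExpr(1);
            skipSpaces();
            if (m_pos >= m_text.size() || m_text[m_pos] != ')')
                throw std::runtime_error("missing ')'");
            ++m_pos;
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(m_text[m_pos])))
            throw std::runtime_error("number expected");
        // Base 0 lets strtol accept decimal, 0-prefixed octal, and 0x-prefixed hex.
        const std::string rest(m_text.substr(m_pos));
        char *end = nullptr;
        const long n = std::strtol(rest.c_str(), &end, 0);
        m_pos += static_cast<std::size_t>(end - rest.c_str());
        return static_cast<int>(n);
    }

    int parseExpr(int minPrec)
    {
        int lhs = parsePrimary();
        for (;;)
        {
            skipSpaces();
            if (m_pos >= m_text.size())
                return lhs;
            const char op = m_text[m_pos];
            const int prec = precedenceOf(op);
            if (prec == 0 || prec < minPrec)
                return lhs;
            ++m_pos;
            const int rhs = parseExpr(prec + 1); // prec + 1 yields left associativity
            switch (op)
            {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            default: // '/' or '%'
                if (rhs == 0)
                    throw std::runtime_error("division by zero");
                lhs = (op == '/') ? lhs / rhs : lhs % rhs;
                break;
            }
        }
    }
};

int evalSketch(std::string_view text) { return CalcSketch(text).run(); }
```

Calling evalSketch("1 + 2 * 3") multiplies first and evalSketch("10 - 4 - 3") groups from the left. The sketch ignores integer overflow for brevity.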

Generate C++ code of parser & scanner

When package parsergen is installed in ArchLinux

parsergen grammar.txt Parser tokens.txt && \
scannergen Scanner /usr/share/parsergen/RE_Suite.txt tokens.txt

When parsergen is built from github

parsergen grammar.txt Parser tokens.txt && \
scannergen Scanner "$PSGEN_DIR/ScannerGen/RE_Suite.txt" tokens.txt

where

Parameter     Description
grammar.txt   Annotated BNF rules and other types of options.
Parser        Output file base: parsergen generates Parser.cpp, Parser.h, and ParserIdDef.h.
Scanner       Output file base: scannergen generates Scanner.cpp and Scanner.h.
tokens.txt    Output of parsergen and input of scannergen.
RE_Suite.txt  Recurring token definitions provided with scannergen and used by tokens.txt.

If target source files already exist

💡 Put the commands in a script called reparse for recurring use.

ℹ️ parsergen will ask three (y/n) questions and scannergen will ask two.

> ./reparse 
About to parse 'grammar.txt' ...
Total 1 lex-symbols 1 nonterms 9 literals
states = 30	shifts = 106
Spent 0.005232879"
38 out of 106 goto keys erased for redundancy.
ParserIdDef.h already exists. Overwrite it ?(y/n)y
Parser.h already exists. Overwrite it ?(y/n)y
Parser.cpp already exists. Overwrite it ?(y/n)y
Parser created
#pos_args = 4
About to parse '/usr/share/parsergen/RE_Suite.txt' ...
About to parse 'tokens.txt' ...
Scanner.h already exists. Overwrite it ?(y/n)y
Scanner.cpp already exists. Overwrite it ?(y/n)y
> _ 

Use the generated code

โ„น๏ธ from example/CalcInt/main.cpp

Includes

#include "Parser.h"         // C_Parser
#include "ParserIdDef.h"    // TID_LEX_Spaces
#include "Scanner.h"        // C_Scanner

💡 ParserIdDef.h is included here only for TID_LEX_Spaces, so it may be unnecessary when spaces can't be ignored.

Scanner|screener|parser piped to parse

C_Parser                            parser{/*args of context ctor*/};
bux::C_ScreenerNo<TID_LEX_Spaces>   screener{parser}; // (1)
C_Scanner                           scanner{screener};
bux::C_IMemStream                   in{line}; // or other std::istream derived
bux::scanFile(">", in, scanner);

// Check if parsing is ok
// ... (2)

// Acceptance
if (!parser.accepted())
{
   std::cerr <<"Incomplete expression!\n";
   continue; // or break or return
}

// Apply the result 
// parser.getFinalLex() ... (3)

(1) A screener is a filter placed after the scanner; it can filter out, change, or aggregate selected tokens. Leave it out if you don't need it:

C_Parser                            parser{/*args of context ctor*/};
C_Scanner                           scanner{parser};
bux::C_IMemStream                   in{line}; // or other std::istream derived
bux::scanFile(">", in, scanner);
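
The scanner, screener, parser chain can be pictured with a toy pipeline. The sketch below is not the bux API: Token, TokenSink, CollectingSink, DropIdScreener, scanSketch, and tokensReachingParser are all hypothetical names, and the "parser" merely records the tokens that reach it. It only illustrates how a screener that drops one token ID changes what the next stage sees.

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// All names below are hypothetical stand-ins, not the bux API.
enum TokenId { TOK_SPACES, TOK_NUMBER, TOK_OPERATOR };

struct Token
{
    TokenId     id;
    std::string text;
};

// Anything that can receive tokens one at a time.
struct TokenSink
{
    virtual ~TokenSink() = default;
    virtual void put(const Token &token) = 0;
};

// Plays the role of the parser: it just records what reaches it.
struct CollectingSink: TokenSink
{
    std::vector<Token> received;
    void put(const Token &token) override { received.push_back(token); }
};

// Plays the role of the screener: forwards every token except one ID.
class DropIdScreener: public TokenSink
{
public:
    DropIdScreener(TokenSink &next, TokenId dropped): m_next(next), m_dropped(dropped) {}
    void put(const Token &token) override
    {
        if (token.id != m_dropped)
            m_next.put(token);
    }
private:
    TokenSink &m_next;
    TokenId    m_dropped;
};

// Plays the role of the scanner: groups characters into tokens.
void scanSketch(std::string_view text, TokenSink &out)
{
    const auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    const auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0;
    while (i < text.size())
    {
        std::size_t j = i + 1;
        TokenId id = TOK_OPERATOR; // any other single character
        if (isSpace(text[i]))
        {
            id = TOK_SPACES;
            while (j < text.size() && isSpace(text[j])) ++j;
        }
        else if (isDigit(text[i]))
        {
            id = TOK_NUMBER;
            while (j < text.size() && isDigit(text[j])) ++j;
        }
        out.put(Token{id, std::string(text.substr(i, j - i))});
        i = j;
    }
}

// Runs the chain with or without the screener in the middle.
std::vector<Token> tokensReachingParser(std::string_view text, bool useScreener)
{
    CollectingSink parser;
    if (useScreener)
    {
        DropIdScreener screener{parser, TOK_SPACES};
        scanSketch(text, screener);
    }
    else
        scanSketch(text, parser);
    return parser.received;
}
```

For the input "12 + 3", the screened chain delivers three tokens and the unscreened chain delivers five, because the two runs of spaces also reach the parser stand-in.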

(2) This is the time to check the integrity of your context's state.

(3) parser.getFinalLex() returns a reference to the merged result of type bux::LR1::C_LexInfo. In this example, the expected result is an integral value of type int, which can be conveniently obtained by calling bux::unlex<T>():

bux::unlex<int>(parser.getFinalLex())

An alternative is to store the result in the user context instance through "production code" instead of calling parser.getFinalLex().

parsergen's People

Contributors

buck-yeh


parsergen's Issues

fatal error: 'bux/LR1.h' file not found

Build fails:

In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/Parser.cpp:5:
In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/ParseFile.h:3:
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/Parser.h:6:10: fatal error: 'bux/LR1.h' file not found
#include <bux/LR1.h>
         ^~~~~~~~~~~

cannot find -lbracketPairing

[ 42% 21/38] : && /usr/local/bin/g++10 -O2 -pipe -fno-omit-frame-pointer  -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -fno-omit-frame-pointer  -Wl,-rpath=/usr/local/lib/gcc10 -isystem /usr/local/include -Wall -Wextra -Wshadow -Wconversion -Ofast -std=c++20 -Wno-shadow -O2 -pipe -fno-omit-frame-pointer  -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -fno-omit-frame-pointer  -Wl,-rpath=/usr/local/lib/gcc10 -isystem /usr/local/include -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 ParserGen/CMakeFiles/grammarstrip.dir/GrammarStrip.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/BNFContext.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/Parser.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/ParseFile.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/ParserGenBase.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/Scanner.cpp.o -o ParserGen/grammarstrip -L/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/src   -L/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../CBrackets -Wl,-rpath,/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/src:/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../CBrackets  -lbracketPairing  -lbux  -lfmt  -lstdc++ && :
FAILED: ParserGen/grammarstrip 
: && /usr/local/bin/g++10 -O2 -pipe -fno-omit-frame-pointer  -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -fno-omit-frame-pointer  -Wl,-rpath=/usr/local/lib/gcc10 -isystem /usr/local/include -Wall -Wextra -Wshadow -Wconversion -Ofast -std=c++20 -Wno-shadow -O2 -pipe -fno-omit-frame-pointer  -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -fno-omit-frame-pointer  -Wl,-rpath=/usr/local/lib/gcc10 -isystem /usr/local/include -fstack-protector-strong -Wl,-rpath=/usr/local/lib/gcc10 -L/usr/local/lib/gcc10 ParserGen/CMakeFiles/grammarstrip.dir/GrammarStrip.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/BNFContext.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/Parser.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/ParseFile.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/ParserGenBase.cpp.o ParserGen/CMakeFiles/grammarstrip.dir/Scanner.cpp.o -o ParserGen/grammarstrip -L/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/src   -L/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../CBrackets -Wl,-rpath,/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/src:/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../CBrackets  -lbracketPairing  -lbux  -lfmt  -lstdc++ && :
/usr/local/bin/ld: cannot find -lbracketPairing
/usr/local/bin/ld: cannot find -lbux
collect2: error: ld returned 1 exit status

Survey if graphviz can create miscellaneous object diagrams for README

If not, which other tool can?

For instance, I want to represent library dependencies like:
libfmt.a => libBux.a => libjson.a
libfmt.a => libBux.a => libbracketPairing.a => parsergen scannergen grammarstrip
libfmt.a => libBux.a => parsergen / scannergen generated code
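
As one possible answer, Graphviz's DOT language can express chains like these as a directed graph. The sketch below generates such DOT text from an edge list; the helper names toDot and dependencyDot, the graph name deps, and the node label "generated code" are assumptions made for illustration, and the arrows simply follow the direction of the => notation above.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// Renders an edge list as a Graphviz digraph laid out left to right.
// Node names are quoted but not escaped, so they must not contain '"'.
std::string toDot(const std::vector<Edge> &edges)
{
    std::ostringstream out;
    out << "digraph deps {\n"
           "    rankdir=LR;\n";
    for (const auto &[from, to]: edges)
        out << "    \"" << from << "\" -> \"" << to << "\";\n";
    out << "}\n";
    return out.str();
}

// The dependency chains from the issue, one edge per "=>" step.
std::string dependencyDot()
{
    return toDot({
        {"libfmt.a", "libBux.a"},
        {"libBux.a", "libjson.a"},
        {"libBux.a", "libbracketPairing.a"},
        {"libbracketPairing.a", "parsergen"},
        {"libbracketPairing.a", "scannergen"},
        {"libbracketPairing.a", "grammarstrip"},
        {"libBux.a", "generated code"},
    });
}
```

Writing the returned text to a file such as deps.dot and running dot -Tsvg deps.dot -o deps.svg should produce an image suitable for embedding in a README.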

clang fails to compile: error: expected concept name with optional arguments

In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/BNFContext.cpp:1:
In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/BNFContext.h:5:
In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/Logger.h:3:
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:44:31: error: expected concept name with optional arguments
        { holder.stream() }-> std::convertible_to<std::ostream*>;
                              ^
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:152:15: error: no template named 'derived_from' in namespace 'std'
template<std::derived_from<std::ostream> T_Sink>
         ~~~~~^
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:153:26: error: template argument for template type parameter must be a type
struct C_AutoSinkHolderT<T_Sink>
                         ^~~~~~
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:98:16: note: template parameter is declared here
template<class C_LogImpl>
               ^
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:189:15: error: no template named 'derived_from' in namespace 'std'
template<std::derived_from<I_SnapT<std::ostream*>> T_Sink>
         ~~~~~^
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:190:26: error: template argument for template type parameter must be a type
struct C_AutoSinkHolderT<T_Sink>
                         ^~~~~~
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/SyncLog.h:98:16: note: template parameter is declared here
template<class C_LogImpl>
               ^
In file included from /disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/BNFContext.cpp:3:
/disk-samsung/freebsd-ports/devel/parsergen/work/parsergen-1.7.0-9-gbda59af/ParserGen/../../bux/include/bux/FsUtil.h:4:10: fatal error: 'ranges' file not found
#include <ranges>       // std::ranges::forward_range<>
         ^~~~~~~~
6 errors generated.

clang-12
FreeBSD 13
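
The errors above involve std::convertible_to, std::derived_from, and <ranges>, all of which are C++20 library features, so one plausible reading is that the standard library in use predates them. A small probe like the following can show what a toolchain advertises. It is only a sketch: the function names hasConcepts, hasRanges, and toolchainReport are hypothetical, and it relies solely on the standard feature-test macros __cpp_lib_concepts and __cpp_lib_ranges.

```cpp
#include <string>

#if __has_include(<version>)
#  include <version>   // C++20 header that defines the __cpp_lib_* macros
#endif

// True when the standard library advertises <concepts>
// (std::derived_from, std::convertible_to, ...).
constexpr bool hasConcepts()
{
#if defined(__cpp_lib_concepts)
    return true;
#else
    return false;
#endif
}

// True when the standard library advertises <ranges>.
constexpr bool hasRanges()
{
#if defined(__cpp_lib_ranges)
    return true;
#else
    return false;
#endif
}

// One-line summary suitable for printing in a build log.
std::string toolchainReport()
{
    std::string report = "__cplusplus=" + std::to_string(__cplusplus);
    report += hasConcepts() ? " concepts=yes" : " concepts=no";
    report += hasRanges()   ? " ranges=yes"   : " ranges=no";
    return report;
}
```

Compile the probe with the same compiler and -std flag used for the failing build and print toolchainReport(); a "no" for either feature would point at the standard library rather than at parsergen itself.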

v1.7.0 todo list

  • Replace include guards with #pragma once
  • New command-line flag -y, for both parsergen & scannergen, to answer yes to all overwrite prompts when writing output
  • Grammar files & token files are treated by default as if #pragma once were prepended. There is no way to turn this off until some use cases are observed.

test/archlinux/Dockerfile is broken

Dockerfile

> ./docker_build 
Sending build context to Docker daemon  33.28MB
Step 1/10 : FROM archlinux/archlinux
 ---> 3f4d16de804d
Step 2/10 : USER root
 ---> Using cache
 ---> 3d18db5381bf
Step 3/10 : RUN echo $'Server = http://archlinux.cs.nctu.edu.tw/$repo/os/$arch\nServer = http://ftp.tku.edu.tw/Linux/ArchLinux/$repo/os/$arch\n' | tee /etc/pacman.d/mirrorlist &&     if [[ -n "$PACDAY" ]]; then         sed -Ei "s|^Include\s*=.+$|SigLevel = PackageRequired\nServer=https://archive.archlinux.org/repos/$PACDAY/\$repo/os/\$arch|g" /etc/pacman.conf ;     fi &&     sed -Ei 's|^#TotalDownload\s*$|TotalDownload|g' /etc/pacman.conf &&     sed -Ei 's|^#VerbosePkgLists\s*$|VerbosePkgLists|g' /etc/pacman.conf &&     pacman -Sy --needed --noconfirm cmake make gcc git binutils fmt fakeroot sed gawk &&     rm -rf /root/.cache
 ---> Using cache
 ---> d6293db64c75
Step 4/10 : RUN useradd -m guest
 ---> Using cache
 ---> affcccdf0354
Step 5/10 : USER guest
 ---> Using cache
 ---> 067bea616d71
Step 6/10 : RUN  echo $'PS1=\"\\[\\033[01;37m\\]\\D{%m/%d %H:%M:%S} \\[\\033[01;32m\\]\\u@\\h\[\\033[00m\\]:\\[\\033[01;43m\\]\\w\\[\\033[00m\\] \"\nalias ll=\'ls -lF --color=auto --time-style=\"+%y/%m/%d %H:%M:%S\"\'\nexport LANG=en_US.UTF-8\nlocale-gen\n' >> ~/.bashrc
 ---> Using cache
 ---> 5a60055674c8
Step 7/10 : COPY --chown=guest:guest PKGBUILD /home/guest/Duty/
 ---> 0d62b18b0df2
Step 8/10 : RUN  cd ~/Duty &&      makepkg -s
 ---> Running in ea24e73afc5c
==> WARNING: Cannot find the sudo binary. Will use su to acquire root privileges.
==> Making package: parsergen 1.7.2-1 (Fri 03 Jun 2022 06:20:27 AM UTC)
==> Checking runtime dependencies...
warning: config file /etc/pacman.conf, line 34: directive 'TotalDownload' in section 'options' not recognized.
==> Checking buildtime dependencies...
warning: config file /etc/pacman.conf, line 34: directive 'TotalDownload' in section 'options' not recognized.
==> Retrieving sources...
==> Extracting sources...
==> Starting prepare()...
git: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by git)
==> ERROR: A failure occurred in prepare().
    Aborting...
Removing intermediate container ea24e73afc5c
The command '/bin/sh -c cd ~/Duty &&      makepkg -s' returned a non-zero code: 4
