Git Product home page Git Product logo

ragel-bison-parser-sandbox's Introduction

ragel-bison-parser-sandbox

A small sandbox to play around with some C++ parsing libraries and learn. It parses lambda calculus expressions to an AST and prints it.

Very much inspired by the build toolchain that Sorbet uses.

./bazel run //main:sandbox -- $'let f = \\x -> x in f f'
# input:  let x = \x -> x in x x
# output: Let { bind = x, what = Lam { param = x, body = Var { var = x } }, inWhere = App { f = Var { var = x }, arg = Var { var = x } } }

# if you want to generate a compile_commands.json file:
tools/scripts/build_compilation_db.sh

Some things I cared about

  • I wrote a lot of comments
    • This was a learning exercise. I don't really intend to pursue this project further (I find it way more fun to hack on existing projects than to duplicate existing efforts into my own greenfield projects). But still, I had never worked with the lexer+parser much in Sorbet, so if I had needed to build a project from scratch, I knew this part would have taken me some getting used to. So the comments are there to remind me how the whole thing works.
  • Bazel for the build tool
    • It's what Sorbet uses
    • I'm mostly convinced all build tools are terrible; Bazel just happens to be the one that I know the most, and the one for which I know how to set up to work with libraries and my editor.
  • Ragel for lexing, Bison for parsing
    • They're what Sorbet uses, but I wanted something simpler.
    • Both the generated Ragel lexer and Bison parser are fully self-contained C++ classes (no global variables tracking state).
  • std::unique_ptr for the Bison parse result
    • I couldn't find any examples online of a simple Bison parser that used unique_ptr, so I wasn't sure if it would work or not (e.g., it might have been the case that it copy constructed something under the hood, but that's not the case; looks like you can use Bison with unique_ptr just fine).
  • The Ragel lexer iterates over a string_view::const_iterator instead of a char *.
    • This is actually pretty cool; Ragel doesn't even know that it's not operating on a char *. The generated Ragel lexer just manipulates whatever variables called p/pe/ts/te are in scope, and as long as they support the operations that it needs them to, it just compiles. Turns out that the string_view::const_iterator interface supports everything Ragel needs it to.
  • Move semantics for strings allocated by the lexer.
    • I'm told that one of the big inefficiencies in Sorbet's parser is that it does a lot of string copying. I made the sandbox::parser::Node constructors always take std::string&&'s to require that they're move'd when constructing an AST node, instead of copied.
  • There are a lot of TODO's
    • I tried to write TODOs for most of the things I like about Sorbet that I'd want to cargo cult if I ever used this as a starting point for something bigger.

Anyways, if you read this far, you might also want to look at some other things I work on:

  • Sorbet
    • The internals are documented and the codebase is using pretty modern C++. It's also relatively small compared to it's peer projects.
  • rust-lc-interp
    • I used a similar project (lambda calculus interpretter) to learn about Rust. That one not only parses but also interprets the output.
  • stlc-infer
    • At one point, I worked through an exercise from TAPL to write (naive) type inference for the simply-typed lambda calculus. It lexes, parses, type checks/infers, and interprets. It's written in Haskell.

The Rust project and the Haskell project use basically the same surface syntax, so you can get a more or less equal comparison of Ragel/Bison, Alex/Happy, and lalrpop.

TODO(jez)

  • core::GlobalState / core::Context threaded through whole program.
    • error reporting

ragel-bison-parser-sandbox's People

Contributors

jez avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Forkers

cdolan

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.