brendanzab / moniker Goto Github PK

Automagical variable binding library for Rust

License: Apache License 2.0

Rust 99.35% Shell 0.65%

rust programming-languages derive locally-nameless debruijn-indices alpha-equivalence variable-binding abstract-binding-trees unbound

moniker's Introduction

Moniker

Moniker makes it simple to track variables across nested scopes in programming language implementations.

Instead of implementing name-handling code by hand (which is often tedious and error-prone), Moniker provides a number of types and traits for declaratively describing name binding directly in your language's abstract syntax tree. From this we can derive the corresponding name-handling code automatically!

Motivation

It's interesting to note that the idea of a 'scope' comes up quite often in programming:

Description	Rust Example
case match arms	match expr { x => /* `x` bound here */ }
let bindings	let x = ...; /* `x` bound here */
recursive functions	fn foo() { /* `foo` bound here */ }
functions parameters	fn foo(x: T) { /* `x` bound here */ }
recursive types	enum List { Nil, /* `List` bound here */
type parameters	struct Point<T> { /* `T` bound here */ }

For example, let's take a silly example of a Rust function:

type Count = u32;

fn silly<T>((count, data): (Count, T)) -> T {
    match count {
        0 => data,
        count => silly((count - 1, data)),
    }
}

There's actually lots of name-binding at play here! Let's connect the binders to their corresponding binders:

//            ?
//            |
//            v
type Count = u32;
//     |
//     '----------------------.
//  .-------------------------|------------------.
//  |    .--------------------|----*------.      |
//  |    |                    |    |      |      |
//  |    |                    v    v      v      |
fn silly<T>((count, data): (Count, T)) -> T { // |
//             |      |                          |
//          .--'      |                          |
//          |         *--------------.           |
//          v         |              |           |
    match count { //  |              |           |
//             .------'              |           |
//             |                     |           |
//             v                     |           |
        0 => data, //                |           |
//         .--------------.          |           |
//         |              |          |           |
//         |              v          v           |
        count => silly((count - 1, data)), //    |
//                 ^                             |
//                 |                             |
//                 '-----------------------------'
    }
}

Keeping track of the relationships between these variables can be a pain, and can become especially error-prone when implementing evaluators and type checkers. Moniker aims to support all of these binding structures, with minimal pain!

Example

Here is how we would use Moniker to describe a very small functional language, with variables, anonymous functions, applications, and let bindings:

#[macro_use]
extern crate moniker;

use moniker::{Embed, Binder, Rec, Scope, Var};
use std::rc::Rc;

/// Expressions
///
/// ```text
/// e ::= x               variables
///     | \x => e         anonymous functions
///     | e₁ e₂           function application
/// ````
#[derive(Debug, Clone, BoundTerm)]
//                        ^
//                        |
//              The derived `BoundTerm` implementation
//              does all of the heavy-lifting!
pub enum Expr {
    /// Variables
    Var(Var<String>),
    /// Anonymous functions (ie. lambda expressions)
    Lam(Scope<Binder<String>, RcExpr>),
    /// Function applications
    App(RcExpr, RcExpr),
}

pub type RcExpr = Rc<Expr>;

We can now construct lambda expressions by doing the following:

let f = FreeVar::fresh(Some("f".to_string()));
let x = FreeVar::fresh(Some("x".to_string()));

// \f => \x => f x
Rc::new(Expr::Lam(Scope::new(
    Binder(f.clone()),
    Rc::new(Expr::Lam(Scope::new(
        Binder(x.clone()),
        Rc::new(Expr::App(
            Rc::new(Expr::Var(Var::Free(f.clone()))),
            Rc::new(Expr::Var(Var::Free(x.clone()))),
        )),
    )))
)))

More Complete Examples

More complete examples, including evaluators and type checkers, can be found under the moniker/examples directory.

Example Name	Description
`lc`	untyped lambda calculus
`lc_let`	untyped lambda calculus with nested let bindings
`lc_letrec`	untyped lambda calculus with mutually recursive bindings
`lc_multi`	untyped lambda calculus with multi-binders
`stlc`	simply typed lambda calculus with literals
`stlc_data`	simply typed lambda calculus with records, variants, literals, and pattern matching
`stlc_data_isorec`	simply typed lambda calculus with records, variants, literals, pattern matching, and iso-recursive types

Projects using Moniker

Moniker is currently used in the following Rust projects:

Pikelet: A dependently typed systems programming language

Overview of traits and data types

We separate data types into terms and patterns:

Terms

Terms are data types that implement the BoundTerm trait.

Var<N>: A variable that is either free or bound
Scope<P: BoundPattern<N>, T: BoundTerm<N>>: bind the term T using the pattern P
Ignore<T>: Ignores T when comparing for alpha equality

Implementations for tuples, strings, numbers, slices, vectors, and smart pointers are also provided for convenience.

Patterns

Patterns are data types that implement the BoundPattern trait.

Binder<N>: Captures a free variables within a term, but is ignored for alpha equality
Ignore<T>: Ignores T when comparing for alpha equality
Embed<T: BoundTerm<N>>: Embed a term T in a pattern
Nest<P: BoundPattern<N>>: Multiple nested binding patterns
Rec<P: BoundPattern<N>>: Recursively bind a pattern in itself

Implementations for tuples, strings, numbers, slices, vectors, and smart pointers are also provided for convenience.

Roadmap

Moniker is currently good enough to use for initial language prototypes, but there is more work that we'd like to do to make it a more fully-featured name binding and scope handling toolkit.

References

Here is a list of interesting references and prior art that was helpful when building Moniker. Note that it isn't necessary to read and understand these to use the library, but they might be useful if you would like to contribute!

Papers

Blog Posts

Other Libraries

The API was mainly inspired by the Unbound and Unbound-Generics libraries for Haskell, with some differences. The main change that we make is to have two separate traits (BoundTerm and BoundPattern) in place of Unbound's single Alpha type class. We've found that this better captures the semantics of the library, and greatly cuts down on the potential for accidental misuse.

Other auto-binding libraries exist for a number of different languages:

Unbound: Specify the binding structure of your data type with an expressive set of type combinators, and Unbound handles the rest! Automatically derives alpha-equivalence, free variable calculation, capture-avoiding substitution, and more.
Unbound-Generics: an independent re-implementation of Unbound but using GHC.Generics instead of RepLib.
Cαml: a tool that turns a so-called "binding specification" into an OCaml compilation unit.
alphaLib: An OCaml library that helps deal with binding constructs in abstract syntax trees.
abbot: Generation of abstract binding trees for SML
rabbot: A port of SML's Abbot to Rust
DBLib: Facilities for working with de Bruijn indices in Coq
Bound: DeBruijn indices for Haskell
Metalib: The Penn Locally Nameless Metatheory Library
LibLN: Locally nameless representation with cofinite quantification

Contributing

We really want to encourage new contributors to help out! Please come chat with us on our Gitter channel - if you have any questions about the project, or just want to say hi!

Acknowledgments

This work was done in part with the generous support of YesLogic.

moniker's People

Contributors

Stargazers

Watchers

Forkers

gitter-badger happy-ferret heyrutvik matthewhammer marklemay curvelogic

moniker's Issues

Implement examples based on 'Let Arguments Go First'

I want to eventually use the modified bidirectional approach used in the paper 'Let Arguments Go First' in Pikelet (see pikelet-lang/pikelet#76). As a stepping stone, it might help to make some simpler examples that closely follow what was shown in the paper in this repository.

Resources

Let Arguments Go First (paper) (slides)

Bug with nested letrecs?

I think there may be a bug in the handling of nested recursive bindings with embeds... i.e. nested letrecs as in the test I've added here:

curvelogic@39d7c56

(I may be misunderstanding or there may be a way to combine Nest<> or something else to fix this but I'm not seeing it at present.)

When opening the pattern, state depth is not incremented when entering the second scope's pattern so bound variables which actually identify the nested letrec are being looked up in the outer letrec. (Which fails in the test above because there are fewer bindings in the outer letrec than the index of the binding being sought.)

Culprit seems to be a missing .incr() on line 142 of scope.rs. Depth is incremented for unsafe_body but not for unsafe_pattern. I'm not clear in my head yet whether the obvious change of adding the .incr() for the pattern too (like this) is safe or correct but it doesn't seem to break any tests at least.

Allow for the customization of the underlying String type for free names

Users might want to use a string interner for identifiers. We should be able to support this use case.

Support other styles of name binding

Initially we have support for locally nameless variable binding, but perhaps it would be handy to also support others, so users could compare them for performance. The original Unbound implementation also supports a nominal representation, for example.

locally nameless
nominal
ordered terms (see freebroccolo's ocaml-ordered-terms implementation)
scope graphs (see A Theory of Name Resolution)

More variations can be found in steshaw/lennart-lambda.

Rec pattern type

Unbound has a Rec type for defining recursive patterns. For example:

letrec 
    x = 21 + y
    y = 1
    z = x + y
in
    z + x

Could be described using the following AST:

#[derive(BoundTerm)]
enum Expr {
    Var(Var),
    Literal(i32),
    Add(Box<Expr>, Box<Expr>),
    LetRec(Bind<Rec<Vec<(Name, Embed<Box<Expr>>)>>, Box<Expr>),
}

Store max-bound variable in terms

Lean (apparently) stores the maximum bound variable to improve the performance of its locally nameless representation.

Here is a quote from section 3.3 in Elaboration in Dependent Type Theory (note that 'instantiate/abstract' maps to 'open/close' respectively in our nomenclature):

Although the locally nameless approach greatly simplifies the implementation effort, there is a performance penalty. Given a term t of size n with m binders, it takes O(nm) time to visit t while making sure there are no dangling bound variables. In [21], the authors suggest that this cost can be minimized by generalizing abstract and instantiate to process sequences of free and bound variables. This optimization is particularly effective when visiting terms containing several consecutive binders, such as λx₁ : A₁, λx₂ : A₂, … , λx_n : A_n, t. Nonetheless, we observed that these two operations were still a performance bottleneck for several files in the Lean standard library. We have addressed this problem using a very simple complementary optimization. For each term t, we store a bound B such that all de Bruijn indices occurring in t are in the range [0,B). This bound can easily be computed when we create new terms: the bound for the de Bruijn variable with index n is n + 1, and given terms t and s with bounds Bt and Bs respectively, the bound for the application (t s) is max(Bt,Bs), and the bound for (λx:t,s) is max(Bt,_Bs_−1). We use the bound B to optimize the instantiate operation. The idea is simple: B enables us to decide quickly whether any subterm contains a bound variable being instantiated or not. If it does not, then our instantiate procedure does not even visit the subterm. Similarly, for each term t, we store a bit that is set to “true” if and only if t contains a free variable. We use this bit to optimize the abstract operation, since it enables us to decide quickly whether a subterm contains a free variable.

These optimizations are crucial to our implementation. The Lean standard library currently contains 172 files, and 41,700 lines of Lean code. With the optimizations, the whole library can be compiled in 71.06 seconds using an Intel Core i7 3.6Ghz processor with 32Gb of memory. Without the optimizations, it takes 2,189.97 seconds to compile the same set of files.

Perhaps our Scope<P, T> type could include this binding depth as a field?

Use cargo-readme to keep docs and README in sync

cargo-readme seems like a handy tool!

Not sure how best to do this, considering this is a multi-crate repo 🤔

Publish v0.1 on Crates.io

This will be necessary for a 0.1 release of Pikelet.

Add an example of type checking recursive types and recursive functions

Implement pattern matching in STLC examples

The paper, The Locally Nameless Representation, has a good section (7.3) on pattern matching. It would be nice to implement this as an example! Unfortunately I can't find any examples of doing this with Unbound.

I'm guessing this would work nicely on the moniker/examples/stlc_data* examples, seeing as we already have records and variants. I'm thinking a pattern syntax like:

+ enum Pattern {
+     Var(FreeVar<String>),
+     Literal(Literal),
+     Ann(Rc<Pattern>, Embed<Rc<Type>>),
+     Record(Vec<(String, Rc<Pattern>)>),
+     Tag(String, Rc<Pattern>),
+ }
+
  #[derive(Debug, Clone, BoundTerm)]
  pub enum Expr {
      /// Annotated expressions
      Ann(Rc<Expr>, Rc<Type>),
      /// Literals
      Literal(Literal),
      /// Variables
      Var(Var<String>),
-     /// Lambda expressions, with an optional type annotation for the parameter
-     Lam(Scope<(FreeVar<String>, Embed<Option<Rc<Type>>>), Rc<Expr>>),
+     /// Lambda expressions
+     Lam(Scope<Rc<Pattern>, Rc<Expr>>),
      /// Function application
      App(Rc<Expr>, Rc<Expr>),
+     /// Case expressions
+     Case(Rc<Expr>, Vec<Scope<Rc<Pattern>, Rc<Expr>>>),
      /// Record values
      Record(Vec<(String, Rc<Expr>)>),
      /// Field projection on records
      Proj(Rc<Expr>, String),
      /// Variant introduction
      Tag(String, Rc<Expr>),
  }

Resources

Add example of Haskell style definitions

It would be nice to show an example of using Moniker to define Haskell/Idris/Purescript/Agda-style definitions.

Cache free variables in scopes

This should make free-variable calculation and substitutions much quicker.

This is what SimpleFP-v2 uses in its ABT.

Rebind pattern type

Unbound has a Rebind type for defining a series of dependent patterns. For example:

let x = 21
    y = 1 + x
in
    x + y

Could be described using the following AST:

#[derive(BoundTerm)]
enum Expr {
    Var(Var),
    Literal(i32),
    Add(Box<Expr>, Box<Expr>),
    Let(Scope<Rebind<(Name, Embed<Box<Expr>>)>, Box<Expr>),
}

Note that it's probably more idiomatic (in Rust) and cache-friendly to use an underlying Vec inside the Rebind type, rather than using Unbound's linked-list method.

Suppport multiple identifier namespaces

This would be useful for languages where the are multiple namespaces. Examples would be Rust, Haskell, or Elm, where values and types can share names, but point to different things. Eg:

data Foo a = Foo a Int

Foo points to either:

the type constructor Foo :: Type -> Type
the value constructor Foo :: forall a . a -> Int -> Foo a

Unbound manages this by adding a phantom type parameter to its Name type. So you could have either Name Type or Name Expr.

Add a Subst trait

We'd like a Subst trait, like in Unbound. It might look something like this:

trait Subst<T> {
    fn subst(&mut self, name: &Name, replacement: &T);
    fn substs(&mut self, replacements: &[(Name, T)]);
}

Implment an example of general recursion

We should use Rec<P> for this!

brendanzab / moniker Goto Github PK

moniker's Introduction

Moniker

Motivation

Example

More Complete Examples

Projects using Moniker

Overview of traits and data types

Terms

Patterns

Roadmap

References

Papers

Blog Posts

Other Libraries

Contributing

Acknowledgments

moniker's People

Contributors

Stargazers

Watchers

Forkers

moniker's Issues

Resources

Resources

Recommend Projects

Recommend Topics

Recommend Org