Git Product home page Git Product logo

moniker's Introduction

Moniker

Build Status Crates.io Docs.rs Gitter

Moniker makes it simple to track variables across nested scopes in programming language implementations.

Instead of implementing name-handling code by hand (which is often tedious and error-prone), Moniker provides a number of types and traits for declaratively describing name binding directly in your language's abstract syntax tree. From this we can derive the corresponding name-handling code automatically!

Motivation

It's interesting to note that the idea of a 'scope' comes up quite often in programming:

Description Rust Example
case match arms match expr { x => /* `x` bound here */ }
let bindings let x = ...; /* `x` bound here */
recursive functions fn foo() { /* `foo` bound here */ }
functions parameters fn foo(x: T) { /* `x` bound here */ }
recursive types enum List { Nil, /* `List` bound here */
type parameters struct Point<T> { /* `T` bound here */ }

For example, let's take a silly example of a Rust function:

type Count = u32;

fn silly<T>((count, data): (Count, T)) -> T {
    match count {
        0 => data,
        count => silly((count - 1, data)),
    }
}

There's actually lots of name-binding at play here! Let's connect the binders to their corresponding binders:

//            ?
//            |
//            v
type Count = u32;
//     |
//     '----------------------.
//  .-------------------------|------------------.
//  |    .--------------------|----*------.      |
//  |    |                    |    |      |      |
//  |    |                    v    v      v      |
fn silly<T>((count, data): (Count, T)) -> T { // |
//             |      |                          |
//          .--'      |                          |
//          |         *--------------.           |
//          v         |              |           |
    match count { //  |              |           |
//             .------'              |           |
//             |                     |           |
//             v                     |           |
        0 => data, //                |           |
//         .--------------.          |           |
//         |              |          |           |
//         |              v          v           |
        count => silly((count - 1, data)), //    |
//                 ^                             |
//                 |                             |
//                 '-----------------------------'
    }
}

Keeping track of the relationships between these variables can be a pain, and can become especially error-prone when implementing evaluators and type checkers. Moniker aims to support all of these binding structures, with minimal pain!

Example

Here is how we would use Moniker to describe a very small functional language, with variables, anonymous functions, applications, and let bindings:

#[macro_use]
extern crate moniker;

use moniker::{Embed, Binder, Rec, Scope, Var};
use std::rc::Rc;

/// Expressions
///
/// ```text
/// e ::= x               variables
///     | \x => e         anonymous functions
///     | e₁ e₂           function application
/// ````
#[derive(Debug, Clone, BoundTerm)]
//                        ^
//                        |
//              The derived `BoundTerm` implementation
//              does all of the heavy-lifting!
pub enum Expr {
    /// Variables
    Var(Var<String>),
    /// Anonymous functions (ie. lambda expressions)
    Lam(Scope<Binder<String>, RcExpr>),
    /// Function applications
    App(RcExpr, RcExpr),
}

pub type RcExpr = Rc<Expr>;

We can now construct lambda expressions by doing the following:

let f = FreeVar::fresh(Some("f".to_string()));
let x = FreeVar::fresh(Some("x".to_string()));

// \f => \x => f x
Rc::new(Expr::Lam(Scope::new(
    Binder(f.clone()),
    Rc::new(Expr::Lam(Scope::new(
        Binder(x.clone()),
        Rc::new(Expr::App(
            Rc::new(Expr::Var(Var::Free(f.clone()))),
            Rc::new(Expr::Var(Var::Free(x.clone()))),
        )),
    )))
)))

More Complete Examples

More complete examples, including evaluators and type checkers, can be found under the moniker/examples directory.

Example Name Description
lc untyped lambda calculus
lc_let untyped lambda calculus with nested let bindings
lc_letrec untyped lambda calculus with mutually recursive bindings
lc_multi untyped lambda calculus with multi-binders
stlc simply typed lambda calculus with literals
stlc_data simply typed lambda calculus with records, variants, literals, and pattern matching
stlc_data_isorec simply typed lambda calculus with records, variants, literals, pattern matching, and iso-recursive types

Projects using Moniker

Moniker is currently used in the following Rust projects:

  • Pikelet: A dependently typed systems programming language

Overview of traits and data types

We separate data types into terms and patterns:

Terms

Terms are data types that implement the BoundTerm trait.

Implementations for tuples, strings, numbers, slices, vectors, and smart pointers are also provided for convenience.

Patterns

Patterns are data types that implement the BoundPattern trait.

Implementations for tuples, strings, numbers, slices, vectors, and smart pointers are also provided for convenience.

Roadmap

Moniker is currently good enough to use for initial language prototypes, but there is more work that we'd like to do to make it a more fully-featured name binding and scope handling toolkit.

  • Initial implementation using a locally nameless representation
    • Implement basic type combinators
      • Embed
      • Ignore
      • Nest
      • Rec
      • Scope
    • Automatically derive traits
      • BoundTerm
      • BoundPattern
      • Subst
    • Allow derives to use identifier types other than String
    • Implement namespaced variables and binders
    • Performance optimizations
      • Cache max-depth of terms
      • Cache free variables of terms
      • Perform multiple-opening/closing
      • Use visitors
  • Explore implementing other name-binding schemes
    • Named with unique indices
    • Scope Graphs
    • ...?

References

Here is a list of interesting references and prior art that was helpful when building Moniker. Note that it isn't necessary to read and understand these to use the library, but they might be useful if you would like to contribute!

Papers

Blog Posts

Other Libraries

The API was mainly inspired by the Unbound and Unbound-Generics libraries for Haskell, with some differences. The main change that we make is to have two separate traits (BoundTerm and BoundPattern) in place of Unbound's single Alpha type class. We've found that this better captures the semantics of the library, and greatly cuts down on the potential for accidental misuse.

Other auto-binding libraries exist for a number of different languages:

  • Unbound: Specify the binding structure of your data type with an expressive set of type combinators, and Unbound handles the rest! Automatically derives alpha-equivalence, free variable calculation, capture-avoiding substitution, and more.
  • Unbound-Generics: an independent re-implementation of Unbound but using GHC.Generics instead of RepLib.
  • Cαml: a tool that turns a so-called "binding specification" into an OCaml compilation unit.
  • alphaLib: An OCaml library that helps deal with binding constructs in abstract syntax trees.
  • abbot: Generation of abstract binding trees for SML
  • rabbot: A port of SML's Abbot to Rust
  • DBLib: Facilities for working with de Bruijn indices in Coq
  • Bound: DeBruijn indices for Haskell
  • Metalib: The Penn Locally Nameless Metatheory Library
  • LibLN: Locally nameless representation with cofinite quantification

Contributing

We really want to encourage new contributors to help out! Please come chat with us on our Gitter channel - if you have any questions about the project, or just want to say hi!

Acknowledgments

YesLogic Logo

This work was done in part with the generous support of YesLogic.

moniker's People

Contributors

brendanzab avatar dependabot-support avatar marklemay avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

moniker's Issues

Bug with nested letrecs?

I think there may be a bug in the handling of nested recursive bindings with embeds... i.e. nested letrecs as in the test I've added here:

curvelogic@39d7c56

(I may be misunderstanding or there may be a way to combine Nest<> or something else to fix this but I'm not seeing it at present.)

When opening the pattern, state depth is not incremented when entering the second scope's pattern so bound variables which actually identify the nested letrec are being looked up in the outer letrec. (Which fails in the test above because there are fewer bindings in the outer letrec than the index of the binding being sought.)

Culprit seems to be a missing .incr() on line 142 of scope.rs. Depth is incremented for unsafe_body but not for unsafe_pattern. I'm not clear in my head yet whether the obvious change of adding the .incr() for the pattern too (like this) is safe or correct but it doesn't seem to break any tests at least.

Rec pattern type

Unbound has a Rec type for defining recursive patterns. For example:

letrec 
    x = 21 + y
    y = 1
    z = x + y
in
    z + x

Could be described using the following AST:

#[derive(BoundTerm)]
enum Expr {
    Var(Var),
    Literal(i32),
    Add(Box<Expr>, Box<Expr>),
    LetRec(Bind<Rec<Vec<(Name, Embed<Box<Expr>>)>>, Box<Expr>),
}

Store max-bound variable in terms

Lean (apparently) stores the maximum bound variable to improve the performance of its locally nameless representation.

Here is a quote from section 3.3 in Elaboration in Dependent Type Theory (note that 'instantiate/abstract' maps to 'open/close' respectively in our nomenclature):

Although the locally nameless approach greatly simplifies the implementation effort, there is a performance penalty. Given a term t of size n with m binders, it takes O(nm) time to visit t while making sure there are no dangling bound variables. In [21], the authors suggest that this cost can be minimized by generalizing abstract and instantiate to process sequences of free and bound variables. This optimization is particularly effective when visiting terms containing several consecutive binders, such as λx1 : A1, λx2 : A2, … , λxn : An, t. Nonetheless, we observed that these two operations were still a performance bottleneck for several files in the Lean standard library. We have addressed this problem using a very simple complementary optimization. For each term t, we store a bound B such that all de Bruijn indices occurring in t are in the range [0,B). This bound can easily be computed when we create new terms: the bound for the de Bruijn variable with index n is n + 1, and given terms t and s with bounds Bt and Bs respectively, the bound for the application (t s) is max(Bt,Bs), and the bound for (λx:t,s) is max(Bt,_Bs_−1). We use the bound B to optimize the instantiate operation. The idea is simple: B enables us to decide quickly whether any subterm contains a bound variable being instantiated or not. If it does not, then our instantiate procedure does not even visit the subterm. Similarly, for each term t, we store a bit that is set to “true” if and only if t contains a free variable. We use this bit to optimize the abstract operation, since it enables us to decide quickly whether a subterm contains a free variable.

These optimizations are crucial to our implementation. The Lean standard library currently contains 172 files, and 41,700 lines of Lean code. With the optimizations, the whole library can be compiled in 71.06 seconds using an Intel Core i7 3.6Ghz processor with 32Gb of memory. Without the optimizations, it takes 2,189.97 seconds to compile the same set of files.

Perhaps our Scope<P, T> type could include this binding depth as a field?

Implement pattern matching in STLC examples

The paper, The Locally Nameless Representation, has a good section (7.3) on pattern matching. It would be nice to implement this as an example! Unfortunately I can't find any examples of doing this with Unbound.

I'm guessing this would work nicely on the moniker/examples/stlc_data* examples, seeing as we already have records and variants. I'm thinking a pattern syntax like:

+ enum Pattern {
+     Var(FreeVar<String>),
+     Literal(Literal),
+     Ann(Rc<Pattern>, Embed<Rc<Type>>),
+     Record(Vec<(String, Rc<Pattern>)>),
+     Tag(String, Rc<Pattern>),
+ }
+
  #[derive(Debug, Clone, BoundTerm)]
  pub enum Expr {
      /// Annotated expressions
      Ann(Rc<Expr>, Rc<Type>),
      /// Literals
      Literal(Literal),
      /// Variables
      Var(Var<String>),
-     /// Lambda expressions, with an optional type annotation for the parameter
-     Lam(Scope<(FreeVar<String>, Embed<Option<Rc<Type>>>), Rc<Expr>>),
+     /// Lambda expressions
+     Lam(Scope<Rc<Pattern>, Rc<Expr>>),
      /// Function application
      App(Rc<Expr>, Rc<Expr>),
+     /// Case expressions
+     Case(Rc<Expr>, Vec<Scope<Rc<Pattern>, Rc<Expr>>>),
      /// Record values
      Record(Vec<(String, Rc<Expr>)>),
      /// Field projection on records
      Proj(Rc<Expr>, String),
      /// Variant introduction
      Tag(String, Rc<Expr>),
  }

Resources

Rebind pattern type

Unbound has a Rebind type for defining a series of dependent patterns. For example:

let x = 21
    y = 1 + x
in
    x + y

Could be described using the following AST:

#[derive(BoundTerm)]
enum Expr {
    Var(Var),
    Literal(i32),
    Add(Box<Expr>, Box<Expr>),
    Let(Scope<Rebind<(Name, Embed<Box<Expr>>)>, Box<Expr>),
}

Note that it's probably more idiomatic (in Rust) and cache-friendly to use an underlying Vec inside the Rebind type, rather than using Unbound's linked-list method.

Suppport multiple identifier namespaces

This would be useful for languages where the are multiple namespaces. Examples would be Rust, Haskell, or Elm, where values and types can share names, but point to different things. Eg:

data Foo a = Foo a Int

Foo points to either:

  • the type constructor Foo :: Type -> Type
  • the value constructor Foo :: forall a . a -> Int -> Foo a

Unbound manages this by adding a phantom type parameter to its Name type. So you could have either Name Type or Name Expr.

Add a Subst trait

We'd like a Subst trait, like in Unbound. It might look something like this:

trait Subst<T> {
    fn subst(&mut self, name: &Name, replacement: &T);
    fn substs(&mut self, replacements: &[(Name, T)]);
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.