precog / matryoshka Goto Github PK

View Code? Open in Web Editor NEW

810.0 57.0 87.0 6.65 MB

Generalized recursion schemes and traversals for Scala.

License: Apache License 2.0

Scala 100.00%

recursion-schemes algebra traversal recursive-structures fold

matryoshka's Introduction

Matryoshka

Generalized folds, unfolds, and traversals for fixed point data structures in Scala.

External Resources

Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire – the iconic paper that collected a lot of this info for the first time
Recursion Schemes: A Field Guide (Redux) – Ed Kmett’s summary of various folds and unfolds, with links to Haskell code
Unifying Structured Recursion Schemes – a newer paper on how to generalize recursion schemes
Efficient Nanopass Compilers using Cats and Matryoshka – Greg Pfeil’s talk on this library (and some other things)
Fix Haskell (by eliminating recursion) – Greg Pfeil’s talk on recursion schemes in Haskell
Recursion schemes by example - Tim Williams slides talk
Practical Recursion Schemes - Jared Tobin
Promorphisms, Pre and Post - Jared Tobin
Time Traveling Recursion Schemes - Jared Tobin
Automatic Differentiation via recursion schemes - Jared Tobin

Usage

Add a dependency

libraryDependencies += "com.slamdata" %% "matryoshka-core" % "0.21.5"

Optionally, you can also depend on matryoshka-scalacheck to get Arbitrary/Cogen/Shrink instances for a bunch of pattern functors and fixed points.

Apply some fix for SI-2712. Prior to 2.12, use @milessabin’s compiler plugin. As of 2.12, you can simply add scalacOptions += "-Ypartial-unification" to your build.sbt.
Add imports as needed. Usually the following should suffice

import matryoshka._
import matryoshka.implicits._

but if you need some of our pattern functors, then matryoshka.patterns._ should be added. Also, there will be cases where you need to specify explicit types (although we generally recommend abstracting over {Bir|Cor|R}ecursive type classes), so you may need matryoshka.data._ (for Fix, Mu, and Nu) and/or matryoshka.instances.fixedpoint._ for things like Nat, List, Cofree, etc. defined in terms of Mu/Nu.

Introduction

This library is predicated on the idea of rewriting your recursive data structures, replacing the recursive type reference with a fresh type parameter.

sealed abstract class Expr
final case class Num(value: Long)      extends Expr
final case class Mul(l: Expr, r: Expr) extends Expr

could be rewritten as

sealed abstract class Expr[A]
final case class Num[A](value: Long) extends Expr[A]
final case class Mul[A](l: A, r: A)  extends Expr[A]

This abstract class generally allows a Traverse instance (or at least a Functor instance). Then you use one of the fixed point type constructors below to regain your recursive type.

You may also want instances for Delay[Equal, ?], Delay[Order, ?], and Delay[Show, ?] (which are very similar to their non-Delay equivalents) to get instances for fixed points of your functor.

Fixpoint Types

These types take a one-arg type constructor and provide a recursive form of it.

All of these types have instances for Recursive, Corecursive, FunctorT, TraverseT, Equal, Show, and Arbitrary type classes unless otherwise noted.

Fix – This is the simplest fixpoint type, implemented with general recursion.
Mu – This is for inductive (finite) recursive structures, models the concept of “data”, aka, the “least fixed point”.
Nu – This is for coinductive (potentially infinite) recursive structures, models the concept of “codata”, aka, the “greatest fixed point”.
Cofree[?[_], A] – Only has a Corecursive instance if there’s a Monoid for A. This represents a structure with some metadata attached to each node. In addition to the usual operations, it can also be folded using an Elgot algebra.
Free[?[_], A] – Does not have a Recursive instance. In addition to the usual operations, it can also be created by unfolding with an Elgot coalgebra.

So a type like Mu[Expr] is now isomorphic to the original recursive type. However, the point is to avoid operating on recursive types directly …

Algebras

A structure like this makes it possible to separate recursion from your operations. You can now write transformations that operate on only a single node of your structure at a time.

This diagram covers the major classes of transformations. The most basic ones are in the center and the arrows show how they can be generalized in various ways.

Here is a very simple example of an algebra (eval) and how to apply it to a recursive structure.

// we will need a Functor[Expr] in order to call embed bellow
implicit val exprFunctor = new scalaz.Functor[Expr] {
  override def map[A, B](fa: Expr[A])(f: (A) => B) = fa match{
    case Num(value) => Num[B](value)
    case Mul(l, r) => Mul(f(l), f(r))
  }
}

val eval: Algebra[Expr, Long] = { // i.e. Expr[Long] => Long
  case Num(x)    => x
  case Mul(x, y) => x * y
}
 
def someExpr[T](implicit T: Corecursive.Aux[T, Expr]): T =
  Mul(Num[T](2).embed, Mul(Num[T](3).embed,
      Num[T](4).embed).embed).embed

import matryoshka.data.Mu 

someExpr[Mu[Expr]].cata(eval) // ⇒ 24

The .embed calls in someExpr wrap the nodes in the fixed point type. embed is generic, and we abstract someExpr over the fixed point type (only requiring that it has an instance of Corecursive), so we can postpone the choice of the fixed point as long as possible.

The example directory will show you how to use other algebras and recursion schemes.

Recursion Schemes

Here is a cheat-sheet (also available in PDF) for some of them.

Folds

Those algebras can be applied recursively to your structures using many different folds. cata in the example above is the simplest fold. It traverses the structure bottom-up, applying the algebra to each node. That is the general behavior of a fold, but more complex ones allow for various comonads and monads to affect the result.

Unfolds

These are the dual of folds – using coalgebras to deconstruct values into parts, top-down. They are defined in the Corecursive type class.

Refolds

Refolds compose an unfold with a fold, never actually constructing the intermediate fixed-point structure. Therefore, they are available on any value, and are not part of a type class.

Transformations

The structure of these type classes is similar to Recursive and Corecursive, but rather than separating them between bottom-up and top-down traversals, FunctorT has both bottom-up and top-down traversals (and refold), while TraverseT has all the Kleisli variants (paralleling how Traverse extends Functor). A fixed-point type that has both Recursive and Corecursive instances has an implied TraverseT instance.

The benefits of these classes is that it is possible to define the required map and traverse operations on fixed-point types that lack either a project or an embed (e.g., Cofree[?[_], A] lacks embed unless A has a Monoid instance, but can easily be mapped over).

The tradeoff is that these operations can only transform between one fixed-point functor and another (or, in some cases, need to maintain the same functor).

The names of these operations are the same as those in Recursive and Corecursive, but prefixed with trans.

There is an additional (restricted) set of operations that also have a T suffix (e.g., transCataT). These only generalize in “the Elgot position” and require you to maintain the same functor. However, it can be the most natural way to write certain transformations, like matryoshka.algebras.substitute.

Generalization

There are generalized forms of most recursion schemes. From the basic cata (and its dual, ana), we can generalize in a few ways. We name them using either a prefix or suffix, depending on how they’re generalized.

G…

Most well known (in fact, even referred to as “generalized recursion schemes”) is generalizing over a Comonad (or Monad), converting an algebra like F[A] => A to F[W[A]] => A. Many of the other named folds are instances of this –

when W[A] = (T[F], A), it’s para,
when W[A] = (B, A), it’s zygo, and
when W[A] = Cofree[F, A], it’s histo.

These specializations can give rise to other generalizations. zygoT uses EnvT[B, ?[_], A] and ghisto uses Cofree[?[_], A].

…M

Less unique to recursion schemes, there are Kleisli variants that return the result in any monad.

Elgot…

This generalization, stolen from the “Elgot algebra”, is similar to standard generalization, except it uses W[F[A]] => A rather than F[W[A]] => A, with the Comonad outside the functor. Not all of the forms seem to be as useful as the G variants, but in some cases, like elgotZygo, it offers benefits of its own.

GElgot…M

Any of these generalizations can be combined, so you can have an algebra that is generalized along two or three dimensions. A fold like cofPara takes an algebra that’s generalized like zygo ((B, ?)) in the “Elgot” dimension and like para ((T[F], ?)) in the “G” dimension, which looks like (B, F[(T[F], A)]) => A. It’s honestly useful. I swear.

Implementation

Since we can actually derive almost everything from a fairly small number of operations, why don’t we? Well, there are a few reasons, enumerated here in descending order of how valid I think they are:

Reducing constraints. In the case of para, using gcata(distPara, …) would introduce a Corecursive constraint, and all of the Kleisli variants require Traverse for the functor, not just Functor.
Improving performance. cata implemented directly (presumably) performs better than gcata[Id, …]. We should have some benchmarks added eventually to actually determine when this is worth doing.
Helping inference. While we are (planning to) use kinda-curried type parameters to help with this, it’s still the case that gcata generally requires all the type parameters to be specified, while, say, zygo doesn’t. You can notice these instances because their definition actually is just to call the generalized version, rather than being implemented directly.

Contributing

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Users

Quasar

matryoshka's People

Contributors

Stargazers

Watchers

matryoshka's Issues

Add recursion-scheme-friendly ABTs

Without all the associated machinery, we want something like this (but less hand-wavy):

sealed abstract class ABT[F[_], A]
final case class Var[F[_], A](v: String) extends ABT[F, A]
final case class Abs[F[_], A](v: String, term: F[A]) extends ABT[F, A]

type BoundExpr[F[_], A] = HCoproduct(ABT, Expr, F, A)

which relies on us having mutual recursion, but gives us multi-sorted ABTs pretty simply, I think.

Cyclic data structures

At Scala by the Bay I asked @sellout if recursion schemes could be used on data structures which are not only recursive, but actually contain cycles. If you ever figure it out, please let me know. Thanks!

Use Unapply trick on folds.

Because things like Cofree are shaped like [?[_], A] rather than just [?[_]] we end up needing to annotate folds like

p.cata[Cofree[ProfF, Unit]](Cofree((), _))

however, we might be able to create something like cataU (analogous to traverseU) that allows us to avoid the type annotation.

implicitly[Recursive[Fix[Option]]] ambiguity between recursiveTRecursive and birecursiveTBirecursive

Symptoms

cata and other recursion patterns unavailable for Fix and related types.

scalaVersion := "2.11.8"
"com.slamdata"  %% "matryoshka-core" % "0.16.4",

Causes

Ambiguity between recursiveTRecursive and birecursiveTBirecursive, apparently in top-level package object.

How to reproduce

import matryoshka._
import matryoshka.implicits._
import matryoshka.data.Fix
implicitly[Recursive[Fix[Option]]]

<console>:19: error: ambiguous implicit values:
 both method recursiveTRecursive in package matryoshka of type [T[_[_]], F[_]](implicit evidence$81: matryoshka.RecursiveT[T])matryoshka.Recursive.Aux[T[F],F]
 and method birecursiveTBirecursive in package matryoshka of type [T[_[_]], F[_]](implicit evidence$83: matryoshka.BirecursiveT[T])matryoshka.Birecursive.Aux[T[F],F]
 match expected type matryoshka.Recursive[matryoshka.data.Fix[Option]]
       implicitly[Recursive[Fix[Option]]]
                 ^

Same problem for Mu and Nu.

Workaround

def r(implicit r: BirecursiveT[Fix]) = matryoshka.birecursiveTBirecursive(r)

Scala.js support

Are there any plans for this? Most of this project looks to be pure Scala as far as I can tell.

Add laws.

E.g.

Birecursive#iso should pass the monocle.Iso laws.
x.ana(f).cata(g) ≟ x.hylo(g, f) (for some generic f and g)
x.cata(g) ≟ x.hylo(g, _.project) and its dual
either make other operations (gcata, futu, etc.) final or add laws comparing them to the default impls

Generalize folds from Comonads to Adjunctions.

As described in Unifying Structured Recursion Schemes, we can define adjoint folds that cover all of the current Comonadic folds as well as a number of others that don’t fall under the Comonad model (mutu being the one non-comonadic case we currently have implemented).

Figure out how to create various instances on arbitrary fixed-point types.

E.g., is there something we can do, like (Recursive t f, PatternMonoid f) => Monoid t (where PatternMonoid is the missing piece), and similarly for other type classes.

We have (or maybe used to have?) things like this for Functor and Foldable, but should try to expand it as much as possible. I think I remember seeing some instances like this in the Haskell world (perhaps in @pa-ba /compdata).

Are there any plans of providing a cats version of matryoshka?

This would make it easier to depend on it from a project I'm working on where bringing in scalaz as a dependency is a point of contention.

Support for mutual recursion? + motivating example

Hi, I saw the talk from Greg Pfeil and found the park on mutual recursion really appealing.

I created a simple example of numeric and boolean operations that I would be happy to add to a documentation once I figure how to do traversals. It is a nice domain to document due to some obvious rewrites on boolean expressions.

I appreciate that we need a natural transformation to transform A (is higher-kinded Functor a thing?). But not sure if there is further infrastructure around this.

sealed trait BoolAndCalcF[A[_],I]
// Using AtLeastTwoList as a more complex use case
case class And[A[_]](expressions: AtLeastTwoList[A[Boolean]]) extends BoolAndCalcF[A, Boolean]
case class Or[A[_]](expressions: AtLeastTwoList[A[Boolean]]) extends BoolAndCalcF[A, Boolean]
case class Not[A[_]](wrapped: A[Boolean]) extends BoolAndCalcF[A, Boolean]
case class BoolVal[A[_]](value: Boolean) extends BoolAndCalcF[A, Boolean]

// A bridge from calculation to booleans
case class Compare[A[_],I : Numeric, J: Numeric](left: A[I], op: CompOp, right: A[I]) extends BoolAndCalcF[A, Boolean]

case class Add[A[_], I : Numeric](left: A[I], right: A[I]) extends BoolAndCalcF[A, I]
//To show it would not compose with the rest due to Option
case class Divide[A[_], I : Numeric](left: A[I], right: A[I]) extends BoolAndCalcF[A, Option[I]]
case class Num[A[_], I : Numeric](value: I) extends BoolAndCalcF[A, I]

sealed trait CompOp
object CompOp {
  case object GT extends CompOp
  case object LT extends CompOp
  case object EQ extends CompOp
  case object NE extends CompOp
  case object GE extends CompOp
  case object LE extends CompOp
}
//import cats.data.OneAnd
//type AtLeastTwoList[A] = OneAnd[OneAnd[List,A],A]
case class AtLeastTwoList[A](head: A, second:A, tail: List[A]) {
  def map[B](f: A => B) = AtLeastTwoList(f(head), f(second), tail.map(f))
  def fold[B](h: A => B)(f: (B, A) => B): B = tail.foldLeft[B](f(h(head), second))(f)
}

matryoshka-core jvm artifact has Scala.js dependencies

Having a dependency on "com.slamdata" %% "matryoshka-core" % "0.18.0" brings the Scala.js version of monocle-core, scalaz-core and simulacrum.

This creates problems, for example, if you depend on a binary-compatible but different version of scalaz, you will have both on the classpath because sbt can't evict one of them as the artifact names are different. Also sbt-assembly will fail on merge because duplicated .class files are not equals.

Add scala-clippy advice

Matryoshka could definitely benefit from library-specific advice when it comes to compiler errors. See here for how: https://github.com/softwaremill/scala-clippy#library-specific-advice

Add scala-exercises

Hopefully this can be done as a subproject, otherwise we'all need a separate repo.

It'd be great to create a tutorial using the scala-exercises/scala-exercises tool, so users can interactively solve Matryoshka problems.

“Adjust” `FunctorT` for directly-recursive types.

FunctorT (and TraverseT) was originally created to support transformations on types that had some way to “map” over the pattern functor, but lacked either a project or embed (i.e., {Co}Free). However, there’s no way to change those type classes to the directly recursive approach and maintain that property. I’m not sure what the right approach is going forward.

{Co}Free no longer works with FunctorT, so the original use case is already gone (the old implementations relied on incorrect semantics for those types, so it’s just as well)
But maybe there are other types that would benefit from that approach?
With some constraint changes, we could fold them into the Recursive and Corecursive type classes.
But at that point, there are just generic algebra transformations that would avoid needing additional folds at all.

Ok, so I think I’ve convinced myself that those type classes should go away and we should just implement algebra transformations (e.g., AlgebraicTransform => Algebra) to allow us to continue to write transformations the way we already do.

Add examples module

It would be good to add an examples module to the project to show usages of different recursion schemes. I'd be happy to have a go at this.

What are your thoughts?

README should mention the need to import fixpoint types

To use Fix etc., it is insufficient to import only

import matryoshka._
import matryoshka.implicits._

An explicit import of matryoshka.data.Fix etc. is required. It would be helpful if the README said so more prominently; right now, this is buried in the eval example.

Shaky foundations for generalized recursion schemes

Hi! I wanted to let you know about a subtle but pretty fundamental bug we've found in Haskell's recursion-schemes library. Since Matryoshka uses the same "recursion-schemes from comonads" approach as we do, and exposes the same gcata, distZygoT, and histHisto building blocks as we do, I'm pretty sure the bug affects Matryoshka as well.

Now, the bug is pretty subtle. Only zygomorphic histomorphisms and their generalizations (such as the infamous zygo-histomorphic prepromorphism) seem to be affected, and even then, they only compute an incorrect answer when the input tree's depth is at least 3. Another symptom is that histomorphisms do more work when implemented via gcata and distHisto than when implemented by hand. That's the symptom I noticed first, but be sure to read further down, it really is a correctness issue, not just a performance issue.

In our Haskell implementation, we have decided to move away from comonads, but that's a pretty big change, so you might consider the following slightly less drastic but still painful change: remove gzygo and distZygoT from the library. zygo and distZygo are still fine. Manually reimplementing histo might be a good idea too. Keep in mind, however, that these are merely the symptoms I have noticed; there might be more.

Anyway, I wrote a lot more about the problem in the upstream issue, but since the problem is pretty subtle, let me know if you have any question.

Organize for minimal imports.

To make it simple for clients to use the library (and to try to encourage a particular usage pattern), we should organize it so a user almost always needs only a single import per file. Here are the three use cases I see for different parts of a program:

when defining algebras

algebra types
Recursive/Corecursive/FunctorT/TraverseT types
project/embed/map/traverse

when using algebras

AlgebraOps
folds/unfolds/refolds
Possibly rec/Corecursive/etc instances
Distributive laws

when creating recursive data

fixpoint types & their rec/Corecursive/... Instances

For each of these cases, there should be two imports – one that provides everything via “ops” and one that provides package methods.

Add matryoshka-refined project.

For instances on refined types.

E.g. Birecursive.Aux[Int With NonNegative, Option]

Trampolines

All of matryoshka's recursion operators at the moment are not stack-safe. However, always trampolining them can be a performance issue. Therefore, is there any interest in representing recursion schemes in a recursion-less fashion, so that they can be executed in a trampolined or partially-trampolined (a la http://blog.higher-order.com/blog/2015/06/18/easy-performance-wins-with-scalaz/) context?

One way to do this without sacrificing the ability to make plain (not trampolined) recursive calls for data which is known to be small is to write recursion schemes in CPS, such that an ordinary recursive function

def f(a: A): B

is represented as:

def f(a: A, loop: A => Trampoline[B]): Trampoline[B])

Then the scheme can make calls to loop instead of explicitly recursing, and it can be executed with each step trampolined, or with none of them trampolined, or with only steps exceeding some maximum stack size trampolined.

Add `Birecursive` type class

The direct approach of

sealed trait Birecursive[T] extends Recursive[T] with Corecursive[T]

doesn’t seem to cut it when an abstract type member is in play, so we need something more clever.

I, @paulp, and @edmundnoble have taken swipes at this so far.

Add "How to install instructions"

Scala 2.13 artifact?

Hi,

Is there any possibility to publish matryoshka-core artifact for scala 2.13?

Can I help with that?

CC: @djspiewak

Best regards,
Karol