Git Product home page Git Product logo

gkossakowski / kentuckymule Goto Github PK

View Code? Open in Web Editor NEW
156.0 15.0 10.0 8.41 MB

Limits of Scala typechecking speed

Home Page: https://medium.com/@gkossakowski/can-scala-have-a-highly-parallel-typechecker-95cd7c146d20

License: Other

Python 0.01% Scala 96.95% Ruby 0.01% Shell 0.32% Awk 0.01% CSS 0.35% JavaScript 0.34% Java 1.98% HTML 0.04% Batchfile 0.01% C 0.01%
scala-compiler typechecker scala

kentuckymule's Introduction

Kentucky Mule

Kentucky Mule is an exploration of an alternative architecture for Scala compiler design (specifically typechecker) with a focus on speed. Lessons learned from Kentucky Mule apply to a wide range of possible compilers and are not limited to just the Scala compiler.

Kentucky Mule's origins are described in my blog post Can Scala have a highly parallel typechecker?

Since the time I wrote the blog post, I rephrased the original question into a twofold one with only winning outcomes:

How to build a highly parallel and high-performance typechecker, or does Scala have a fundamental language design flaw that prevents such from being made?

The prototype in this repo computes the outline types I described in the blog post. The outline types enable the computation of dependencies between symbols in the symbol table.

Status update (August 2020): Kentucky Mule served its purpose as a research vehicle and is in a retired state. More on this in the blog post I wrote: https://gkk.dev/posts/status-kentucky-mule-faster-scala-compiler

Name

Kentucky Mule is the name of a bourbon-based cocktail served at Beretta, a San Francisco restaurant. Other ingredients of Kentucky Mule are ginger beer, lime juice, a pinch of cane sugar, and mint as garnish.

Kentucky Mule is mixed and served in Collins glass at 1199 Valencia St in San Francisco's Mission district.

Demo

Let's see Kentucky Mule in action. Below Kentucky Mule is analyzing scalap sources and printing dependencies as seen at the API level, without typechecking method bodies:

Kentucky Mule processing scalap sources

Kentucky Mule processes over two thousand lines of code in 600ms on a cold JVM.

Once the JVM is warmed up, the parser becomes the bottleneck. If I skip parsing in benchmarking, Kentucky Mule calculates outline types at the speed of processing over 4.4 million lines of Scala code per second.

If I add dependency extraction and analysis (as presented above) of scalap sources, the performance is at 1.8 million lines of Scala code per second.

Is Kentucky Mule really fast?

The form of typechecking (or "pre-typechecking") implemented by Kentucky Mule is a very stripped-down version of the real typechecking scalac performs when computing types of declarations. However, there's one benchmark in the repo where the comparison is fair. The name of the benchmark is 10k.scala.

In this specific benchmark, the scalac and Kentucky Mule carry out very comparable work. And the performance of both is different. The Scala compiler type checks the 10k.scala file at the speed of 7k lines of code per second. Kentucky Mule type checks the 10k.scala file at the speed of 15m lines of code per second. Yes, Kentucky Mule is that blazingly fast.

See my notes in notes.md for more details about this benchmark.

Running

Kentucky Mule runs with these two commands:

$ sbt assembly
$ scala -cp kentuckyMule.jar kentuckymule.Main

Development principles

The goal of Kentucky Mule wasn't a working implementation but trustworthy numbers. I wanted to get a sense of whether ideas for parallelizing Scala typechecking have a leg to stand on. The idea was to estimate the limits of Scala typechecking performance with as little development cost as possible.

The original goal drove principles of Kentucky Mule development:

  1. Transparent pricing - I added many benchmarks, and at each step of implementing Kentucky Mule, I measured the performance cost of supporting a Scala feature.
  2. Only the paranoid survive - many smart people have tried to optimize the Scala compiler performance before by following the conventional wisdom to "avoid premature optimization" and had minimal results. Kentucky Mule takes a paranoid approach and assumes to sit at the fringes of prevalent software construction practices. Almost everything in Kentucky Mule is an exercise in premature optimization.
  3. It's a voyage - my goal was to explore fresh ideas in compiler architecture. I wanted to see effects on potential performance gains of an implementation driven by different architecture compared to what one finds in Scala and Dotty, javac, and many other compilers in the wild.
  4. Loss aversion driven development - performance losses are easier to defend against over performance gains to win. This principle is about limiting the search space for a performant implementation. Instead of having a feature-rich but slow implementation with unknown directions for optimizations, I started with a tiny, feature-less implementation, and iterated over implementing new features incrementally and carefully watching the performance. The advantage of the latter approach is that, at each step, the direction is easier to spot: I know the feature I want to implement and the change to code is relatively small. Small changes to logic are easier to understand from the performance-cost point of view. This principle is inspired by the loss aversion effect described in economics and decision theory.

In particular, the code of Kentucky Mule doesn't use any of the functional programming idioms. Even higher-order functions occur rarely. If there were an investigation of whether I'm a fan of functional programming, this codebase would kill any defense line envisioned even by the best lawyer money can find.

Check notes.md for details of the architecture explored in Kentucky Mule that leads to high-speed performance.

License

Kentucky Mule is licensed under the 3-Clause BSD License

Kentucky Mule borrows the Scala parser from Dotty and for that reason is licensed under the same license.

kentuckymule's People

Contributors

gkk-stripe avatar gkossakowski avatar smarter avatar trane avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

kentuckymule's Issues

Inch toward the finish line: As Seen From & Scala std lib processing

Background read: Can Scala have a highly parallel typechecker?

This is a meta issue with a quick overview of all tasks I came up when I thought of adding support for the As Seen From (ASF) algorithm and for processing the Scala std lib to Kentucky Mule.

I'm following the format from my work on zinc that turned out to be successful in managing the complexity of a difficult project I worked on in the past: sbt/sbt#1104
It shipped as part of the sbt 1.0 release recently.

This ticket describes the second stage in Kentucky Mule's development. The first stage was about validating whether the ideas for rethinking compiler's design with focus on performance are in touch with how fast computers actually compute. The second stage is about confronting Kentucky Mule's ideas with what I consider the most tricky aspects of Scala's design to implement in a performant way.

I'm planning to update this issue with interesting findings and roadblocks I hit working towards finishing the tasks listed below. My intent for this issue is twofold:

  • keep me organized in marching towards adding support for processing Scala's standard library in Kentucky Mule
  • give busy but curious passerby a place to track the progress of Kentucky Mule's evolution

Context

In my most recent blog post on Kentucky Mule, I wrote:

Adding support for processing Scala’s standard library in Kentucky Mule would be a good trial for the As Seen From treatment. The standard library makes a heavy use of mix-ins, abstract types that are refined multiple times in the inheritance chain and type constructors. These are the features that remain the riskiest for the architecture to support.

I think adding support for Scala standard library to Kentucky Mule should take another 15 days of deeply focused effort. And I think it would be a really captivating project to work on. Once that is done, I think Kentucky Mule would become a complete proof of concept for a highly parallel Scala compiler architecture.

I'm breaking up the work required for adding the As Seen From and some other missing Scala language features to Kentucky Mule into a list of smaller tasks. The list is not meant to be set in stone and will get refined as more issues and tasks come to light.

Once I surfaced the list, I realized the scope is a bit larger than I originally speculated. I'm revising previous prediction of 15 days of deeply focused effort to complete this project and bump it to 20 days.

Tasks

Missing language features

  • monomorphic type aliases: type T = String
  • polymorphic type aliases: type Foo[T] = List[T]
  • package objects
  • constructor multiple param list
  • def multiple param list
  • method dependent types
  • resolution of this in paths
  • type bounds for type members
  • type bounds for type parameters
  • implicit import of Predef to every compilation unit
  • implicit import of scala and java packages to every compilation unit
  • support for super in paths
  • singleton types
  • type projections
  • tuple types (contributed by @trane in #7)
  • function types
  • import renames
  • self-types
  • method-dependent types

Implementation features

Features that are not strictly language features but need to be implemented for other reasons.

  • empty package handling (surprisingly hard to get right)
  • cycle detection between completers

As Seen From

Surprisingly, Kentucky Mule implements some aspects of the ASF already. E.g. Rule 3 for the type parameters is partially supported. I don't have a good intuition what's missing even when I have ASF's precise definition in front of my eyes. For performance reasons, Kentucky Mule implements ASF's features differently from how they're specified so simple diffing between the implementation and the spec doesn't cut it.
I'll come back to this section as the implementation of the other language features mentioned above progresses.

Status

I haven't touched Kentucky Mule for almost a year and I'm picking it up now to work on tasks in this ticket continously until I check all the boxes. Kentucky Mule remains my side project so I aim for a steady but slow progress done over the course of some of my evenings.

I implemented the original, first stage of Kentucky Mule in a short span of time when I was working on it full time. I'm curious if my 20 days prediction of completing this project will hold when the days are broken into a larger number of evenings.

Feature: Jar/classpath parsing

(paraphrasing from @gkossakowski)

Currently, kentuckymule does not parse jars or look at the classpath for the code it is type checking. The sample code it's being tested against is the scala standard library which has a minimal set of dependencies (mostly or all in the java standard library?) and so when kentuckymule needs to search for information on one of these symbols, it uses a shim to avoid having to do proper jar and classpath parsing (the shim is called ScalaLibHelper).

This ticket tracks implementation of a proper jar parser to avoid having to use the shim. This would also empower kentuckymule to be run over other codebases, specifically the scala community build, which would help develop benchmarks to support collaboration!

Explicitly disallow importing an identifier whose type needs to be inferred

The following code:

class Hi {
  class A
}

class Foo {
  val x = new Hi
  import x._
  def bar: A = new A
}

fails with:

java.lang.RuntimeException: Can't resolve selector Ident(A)

It seems that either importing from something with type InferredTypeMarker should be disallowed or that everything in scope after such an import should get type InferredTypeMarker.

Support for compiling custom files?

As far as I can see, everything is hardcoded to compile scalap, what would be needed to be able to do:

> run foo.scala

from sbt, and have it just work?

Feature: Benchmarks on the scala community build

How tractable would it be to get kentucky mule tested against the scala community build? What are the barriers to running this? Is the scala community build a closed system, or can kentuckymule get access to that infrastructure to experiment? Will we need to roll-out our own and if so how much work would that be?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.