pikelet-lang / pikelet
A friendly little systems language with first-class types. Very WIP! 🚧 🚧 🚧
Home Page: https://pikelet-lang.github.io/pikelet/
License: Apache License 2.0
I've implemented concrete syntax for `let` and `where` bindings, but not hooked them up to the core syntax yet. We need to do this at some stage!
I've been thinking - could we perhaps desugar them into dependent records like 1ML does? Eg:
let x = y; .. in z
to (record { x = y, .., body = z }).body
z where x = y; ..
to (record { x = y, .., body = z }).body
Not sure what the implications for alpha equality would be though. Also whether this is compatible with how we want bindings to be evaluated. 🤔
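The proposed desugaring could be sketched as a small AST transform. The `Term` type below is hypothetical (it is not Pikelet's actual concrete or core syntax), and only a single pass over `let` is shown:

```rust
// Hypothetical sketch: desugar `let x = y; .. in z` into the record
// projection `(record { x = y, .., body = z }).body` proposed above.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Var(String),
    Let(Vec<(String, Term)>, Box<Term>), // let x = y; .. in z
    Record(Vec<(String, Term)>),         // record { .. }
    Proj(Box<Term>, String),             // t.l
}

fn desugar(term: Term) -> Term {
    match term {
        Term::Let(bindings, body) => {
            let mut fields: Vec<(String, Term)> = bindings
                .into_iter()
                .map(|(name, value)| (name, desugar(value)))
                .collect();
            // The body becomes a distinguished `body` field, then we project it.
            fields.push(("body".to_string(), desugar(*body)));
            Term::Proj(Box::new(Term::Record(fields)), "body".to_string())
        }
        // Recursion into the other term formers is omitted in this sketch.
        other => other,
    }
}

fn main() {
    // let x = y in z  ~>  (record { x = y, body = z }).body
    let term = Term::Let(
        vec![("x".to_string(), Term::Var("y".to_string()))],
        Box::new(Term::Var("z".to_string())),
    );
    println!("{:?}", desugar(term));
}
```

Note that the synthesized `body` label must not collide with a user-written binding of the same name, which ties into the alpha-equality worry above.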
Had a nice chat with @pythonesque on Twitter:
I think at this point most people involved in Coq development, at least, agree that a majority of the stuff in it should be in a proof assistant (maybe should have been from the beginning). I strongly recommend you try to prove as much as you possibly can; it'll save you later.
https://twitter.com/awesomeintheory/status/983592526232932352
Apparently lots of the trouble comes in the form of optimizations to the type checking algorithm.
I think it might make sense in the short term to try to keep the implementation as close as possible to the syntax-directed algorithm described in the appendix of the book. For the short term our programs should be small, and speed should be less of a concern for us.
As a precaution, we should also try to ensure certain aspects of our implementation avoid known trouble areas, for example implicit substitutions, and building up a tangle of mutually recursive definitions/modules/types (although that can be hard to avoid at times in dependent type systems).
In parallel we could:
Then we could start carefully porting over the optimizations to the Rust implementation.
Currently we do eager substitutions under binders. This is inefficient and probably won't be tenable once we have recursive datatypes and definitions. Type systems like F* and PiSigma use weak head normal forms and explicit substitutions to get around this.
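The explicit-substitution idea can be illustrated with a toy lambda-calculus evaluator (this is not Pikelet's actual code): instead of substituting under binders eagerly, we evaluate to weak head normal form against an environment, so substitutions stay implicit inside closures.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone)]
enum Term {
    Var(String),
    Lam(String, Rc<Term>),
    App(Rc<Term>, Rc<Term>),
}

#[derive(Clone)]
enum Value {
    // A lambda paired with its captured environment: a delayed substitution.
    Closure(String, Rc<Term>, Env),
    // A stuck variable applied to zero or more arguments.
    Neutral(String, Vec<Rc<Value>>),
}

type Env = HashMap<String, Rc<Value>>;

fn whnf(term: &Term, env: &Env) -> Rc<Value> {
    match term {
        Term::Var(x) => env
            .get(x)
            .cloned()
            .unwrap_or_else(|| Rc::new(Value::Neutral(x.clone(), Vec::new()))),
        Term::Lam(x, body) => Rc::new(Value::Closure(x.clone(), body.clone(), env.clone())),
        Term::App(func, arg) => match &*whnf(func, env) {
            Value::Closure(x, body, captured) => {
                // Extend the closure's environment instead of substituting
                // into `body` - the substitution stays implicit.
                let mut extended = captured.clone();
                extended.insert(x.clone(), whnf(arg, env));
                whnf(body, &extended)
            }
            Value::Neutral(x, args) => {
                let mut args = args.clone();
                args.push(whnf(arg, env));
                Rc::new(Value::Neutral(x.clone(), args))
            }
        },
    }
}

fn main() {
    // (\x => x) y  ~>  the stuck variable y
    let id = Term::Lam("x".to_string(), Rc::new(Term::Var("x".to_string())));
    let term = Term::App(Rc::new(id), Rc::new(Term::Var("y".to_string())));
    match &*whnf(&term, &Env::new()) {
        Value::Neutral(name, args) => assert!(name == "y" && args.is_empty()),
        _ => panic!("expected a neutral value"),
    }
}
```

Nothing under a binder is touched until the closure is actually applied, which is what makes this approach tenable in the presence of recursive definitions.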
See also: https://github.com/brendanzab/pikelet/issues/22#issuecomment-381979497
Currently type parameters must always be supplied to polymorphic functions. Eg.
id : (a : Type) -> a -> a;
id a x = x;
const : (a b : Type) -> a -> b -> a;
const a b x y = x;
This can get quite annoying! Eg.
id String "hi"
id I32 42
const String Char "hi" '\s'
const I32 Char 42 '\s'
Implicit arguments will allow users to omit some arguments. Eg.
id : {a : Type} -> a -> a;
id a x = x;
const : {a b : Type} -> a -> b -> a;
const a b x y = x;
This would make using the above functions much less tedious:
id "hi"
id 42
const "hi" '\s'
const 42 '\s'
We will probably have to add some simple unification to the type checker to enable this!
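The kind of unification needed can be kept quite simple: metavariables stand in for the omitted arguments and get solved by structurally matching the expected and actual types. A minimal sketch (illustrative only; it skips the occurs check and other care a real implementation would need):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Meta(u32),               // a metavariable standing for an omitted argument
    Con(&'static str),       // a type constructor like String or I32
    Arrow(Box<Ty>, Box<Ty>), // t1 -> t2
}

type Subst = HashMap<u32, Ty>;

// Solve metavariables by matching the two types structurally, recording
// solutions in the substitution as we go.
fn unify(lhs: &Ty, rhs: &Ty, subst: &mut Subst) -> bool {
    if lhs == rhs {
        return true;
    }
    match (lhs, rhs) {
        (Ty::Meta(m), ty) | (ty, Ty::Meta(m)) => match subst.get(m).cloned() {
            Some(solved) => unify(&solved, ty, subst),
            None => {
                subst.insert(*m, ty.clone());
                true
            }
        },
        (Ty::Arrow(a1, b1), Ty::Arrow(a2, b2)) => {
            unify(a1, a2, subst) && unify(b1, b2, subst)
        }
        _ => false,
    }
}

fn main() {
    // Checking `id "hi"`: the implicit argument becomes metavariable ?0, so
    // the type of `id` is `?0 -> ?0`, and the argument's type is String.
    let mut subst = Subst::new();
    let id_ty = Ty::Arrow(Box::new(Ty::Meta(0)), Box::new(Ty::Meta(0)));
    let expected = Ty::Arrow(Box::new(Ty::Con("String")), Box::new(Ty::Con("String")));
    assert!(unify(&id_ty, &expected, &mut subst));
    assert_eq!(subst.get(&0), Some(&Ty::Con("String")));
}
```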
Once we get dependent records in #2 it would be nice to then be able to use them to implement typeclasses/modules. Instance arguments would give us the ability to automatically pass instances to functions. I think this might actually be easier to implement than implicit arguments (see #8).
Resources:
We should be able to run programs that:
We should also be able to see more than one type error message.
At the moment we recover from syntactic errors, inserting `concrete::Term::Error` variants in the places that we fail to parse. But then we blow up if we try to desugar these into `RawTerm`s. Instead we should probably insert a `static-error : {a : Type} -> String -> a` function in those places instead...
I poke fun at expression parser/evaluator written in OO style, because really the visitor pattern is just asking for it. But I have a weird, almost morbid fondness for such things written in plain old ANSI C. It’s complicated between me and C don’t ask me about it I’m not ready
At the moment we create lots of intermediate data structures between the stages in the compiler, and it would be nice to cut that down!
Using the visitor pattern might make things more efficient by allowing us to fuse various traversals into single passes. Ultimately it might even be a good idea (or even necessary) to push the visitor abstraction back to the nameless library as well. The challenge would be to continue to maintain a nice, readable codebase in the process.
libsyntax/visit.rs (the visitor module in rustc) - uses 'walkers' to allow for default methods to simplify visitor implementations.
ratel-visitor - has an interesting build! macro. serde's Serializer and Deserializer traits also follow this pattern.
Core languages like PiSigma and MiniTT have simple constructs for pattern matching, allowing for easier verification of type checking.
I'm not sure where this should be done, however. MLton seems to do it during its defunctorise pass.
We can do it in a naive way for now, but there is an interesting paper showing how to do it well: Compiling Pattern Matching to Good Decision Trees. Do note that we have the added complication of perhaps wanting to add dependent patterns etc.
Compiling non-dependent pattern matching to case trees is described in chapter 5 of The Implementation of Functional Programming Languages. This is what Sixten currently uses, "with some dependent pattern matching hacked in".
Compiling dependent pattern matching to case trees has been described by Jesper Cockx in:
One big motivation I've had for going with dependent types in the first place is the idea of being able to use the same language feature (dependent records) for both modules and data types. This is similar to what is proposed in 1ML, but I would prefer not to jump through the hoops of forcing the language into System Fω, and rather just go straight to dependent types, seeing as they are such a handy tool to have in the first place.
These would be super handy for representing modules. Eg.
Graph (a : Type) = Record {
Node : Type,
Edge : Type,
has-edge : a -> Node -> Node -> Bool,
edges : a -> Node -> List Edge,
}
MyGraph = DArray (Int, Int)
MyGraph-graph : Graph MyGraph
MyGraph-graph = record {
Node = Int,
Edge = (Int, Int),
has-edge = _,
edges = _,
}
This would also allow us to hide the length of arrays, to get something like Rust's `Vec` type:
DArray (a : Type) = Record {
len : U64,
data : Array a len,
};
A very simple way of representing dependent records would be to use dependent pairs (sigma types):
Unit : Type
{} : Unit
{e1, e2} : (x : t1) * t2
left : (x : t1) * t2 -> t1
right : (x : t1) * t2 -> t2
Where `t2` may depend on `x : t1`.
The trouble with this formulation is that labels are not significant when comparing for equality at the type and value level.
I would kind of prefer to go structural rather than nominal with this, but it seems that most dependently typed languages use nominal records. I'm guessing this is because it makes checking if two record types are compatible much easier. If we went structural we might need to do some complex subtyping and coercions to handle the fact that fields might be mentioned out of order.
In his thesis on Agda, Ulf Norell says:
In practise, however, it is a good idea to let each record declaration introduce a new type. This means that two record types declared to have the same fields will be different, but they will have the same elements. One advantage of this is that it significantly improves the efficiency of checking equality between record types—instead of comparing the types of all the fields, it is enough to compare the names. It is also good programming practise to keep intentionally different types separate in the type system.
It would be nice if records were efficiently packed in memory by default, like Rust does it. The unboxed types issue #18 is related.
A potential issue with a non-uniform memory representation is that, as @ollef points out, if you have a `record { x : A, y : B x, z : C }`, then the memory offset of `z` depends on `x`. We might be able to do some size analysis though, and require non-statically known sizes to have a pointer indirection.
If we allow structural types, data layout might become trickier. We might need to have a consistent 'normalized field order' for records. Things would become even trickier if row polymorphism were brought into the mix.
Other stuff to consider wrt. data layout is struct of array and array of structs. Perhaps our compile time reflection would be powerful enough to auto-derive this, but langs like Jai have this built in.
This paper talks about some interesting ways to parameterise over data layout: https://www.doc.ic.ac.uk/~scd/ShapesOnwards.pdf
Here is a book going through Data Oriented Design.
It would be nice to support row polymorphism, but I have no idea how this would look.
Haven't thought much about it. That paper uses subtyping. For row typing you might be able to do like
{ Γ; Θ={ y₀:B₀; ⋯ yₙ:Bₙ }; Δ[Γ, Θ] }
where Γ and Δ are treated as two separate row variables, Δ depends on Γ and Θ, and Θ is closed with respect to Γ. Not sure though.
Labels are also interesting - I've never really liked the 'type level string' nature of record labels, especially once row polymorphism comes into play. Would be nice to have these be namespaced, sort of like how clojure.spec does it. Not sure what the ergonomics would look like though.
Would we want to auto-generate these? If we had first-class labels and row polymorphism, I could see us generating polymorphic lenses in a library, like Purescript does.
This should reduce the memory footprint of the type checker, and also make it faster to clone identifiers. LALRPOP uses string_cache for this.
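The core of the interning technique is small: each distinct string is stored once and identified by a cheap-to-copy index, so identifier comparison becomes an integer compare. A minimal sketch (string_cache and LALRPOP do this with considerably more machinery):

```rust
use std::collections::HashMap;

// A minimal string interner: `intern` deduplicates, `resolve` recovers text.
#[derive(Default)]
struct Interner {
    lookup: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&sym) = self.lookup.get(name) {
            return sym; // already interned: symbols compare as plain integers
        }
        let sym = self.strings.len() as u32;
        self.strings.push(name.to_string());
        self.lookup.insert(name.to_string(), sym);
        sym
    }

    fn resolve(&self, sym: u32) -> &str {
        &self.strings[sym as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("id");
    let b = interner.intern("id");
    let c = interner.intern("const");
    assert_eq!(a, b); // same name, same symbol
    assert_ne!(a, c);
    assert_eq!(interner.resolve(a), "id");
}
```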
I've updated our name binding crate to support this. See:
Here are some common errors that are often ignored, but would be nice to be able to avoid either statically or ensure they are accounted for at runtime:
I'm proposing that we rename the signed integer types from `I8`, `I16`, `I32`, `I64` to `S8`, `S16`, `S32`, `S64`. I originally was following Rust's lead with the naming of signed integers, but I feel like this is more consistent.
If you do help out on this one, please be sure to update the book as well! 😄
At the moment our issues are a mess, and not very well categorised. We should probably adopt a labelling scheme like Rust's!
Plasma has a nice labelling scheme with a 'skill' label. This might help folks find something that they can help out with!
Before we go too far down the path of building a traditional compiler (#9), it probably makes sense to start thinking about how we might incrementalise things. This will be super important for supporting a good editor experience (#97). If we put this off for too long we might end up having to rebuild a bunch - this is what Rust is facing, for example.
Without knowing much about it, perhaps something like the Incremental Lambda Calculus would be handy for this. We could try to find the derivative of each pass of our compiler, based on the result of a previous run. This could also be a helpful framework for formalising our incremental compiler as well (#39)!
CRDT-style data structures could also be of potential use, and perhaps projects like timely-dataflow and differential-dataflow.
What is the difference between 'incremental' vs. 'query-driven'?
This might be faster than using reference counting, as we currently do in the core AST. We'll have to take care not to let it adversely impact readability in the type checker, however.
I'd like to use bare braces for implicit arguments (see #8), so we'll need a keyword or sigil for dependent record syntax (see #2). Now, if I just used a single identifier, like `record`, for record types and record values, I would run into the following ambiguity:
Pikelet> :t record {}
Is this an empty record value, or the type of an empty record?
If we were going to have default fields then we would run into a similar issue:
Pikelet> :t record { x : I32 = 23 }
Is this a record value with a field annotated with type `I32` and a value `23`, or is it a record type with a field of type `I32` and a default value `23`?
One way to resolve this would be to defer this choice to type checking, and require annotations in ambiguous circumstances. Bidirectional checking should make this trivial, but I'm not sure how nice it would be ergonomically.
Idris has a similar ambiguity around `()`, which can stand for the 'unit value' or the 'unit type'. It however defaults to the unit value in ambiguous circumstances, and coerces to a type or value when it has the necessary type information:
Idris> ()
() : ()
Idris> the () ()
() : ()
Idris> the Type ()
() : Type
Another option would be to have two different keywords, but then I have to decide on that:
Record/record - is one way, using a capital to signify the change in universe level
Struct/struct - would convey more of a feeling of the fact that we want a non-uniform memory representation
sig/struct - like in ML?
sig/record?
signature/record - maybe a bit verbose?
Currently we use the following syntax for functions:
(x : t1) -> t2 -- dependent functions
t1 -> t2 -- non-dependent functions
Sadly once we add in the type annotation syntax, ie. `e : t`, these notations become ambiguous in a context-free grammar. This means we have to do an additional pass over the AST, fixing this after-the-fact.
We should either:
Fn (x : t1) -> t2
the : (a : Type) -> a -> a (this is what Idris does)
It would be nice to have operators for:
(<<), (>>)
(<|), (|>)
(+), (-), (*), (/)
(++)
@sordina was trying to get Pikelet to build on wasm, but was running into issues with some system dependencies:
error[E0432]: unresolved import `sys`
--> /Users/../.cargo/registry/src/github.com-1ecc6299db9ec823/mortal-0.1.5/src/screen.rs:9:5
|
9 | use sys;
| ^^^ no `sys` in the root
error[E0432]: unresolved import `sys`
--> /Users/../.cargo/registry/src/github.com-1ecc6299db9ec823/mortal-0.1.5/src/terminal.rs:10:5
|
10 | use sys;
| ^^^ no `sys` in the root
error: aborting due to 2 previous errors
Getting Pikelet to build on wasm would be great, because it would let us build a 'try' website in the browser without needing to run it on a server, and also support use cases like the one above.
It would be nice to be able to shift field lookups too!
At the moment this just gives:
Pikelet> prelude.id^1 Type
error: expected one of "(", ")", "->", ".", ":", ";", "=", "=>", "?", "Record", "Type", "[", "]", "binary literal", "character literal", "decimal literal", "else", "float literal", "hex literal", "identifier", "import", "octal literal", "of", "record", "string literal", "then", "where", or "}", found `^`
- <repl>:1:11
1 | prelude.id^1
| ^ unexpected token
We should be able to load files in the REPL, either when passing a list of filenames on startup, or by a `:load`/`:l` command while the REPL is running:
cargo run repl ./src/library/prelude.pi
Pikelet> :load ./src/library/prelude.pi
Pikelet> id String "hi"
"hi" : #String
We should also be able to reload files using `:reload`/`:r`:
Pikelet> :load ./src/library/prelude.pi
Pikelet> :reload
Currently it's hard to get much insight into what the type checker is doing when it is checking a term, even with a debugger. This is a barrier to entry for new developers trying to understand how the Pikelet type checking algorithm works, which matters if we want more developers to be able to confidently contribute fixes and features to the type checker. It could also help us fix bugs and spot opportunities for optimizations.
A debugger would output some directed graph showing how a Pikelet term is checked. We could potentially use Graphviz, or perhaps HTML for this. Console logging is another option.
This is a cut down set of requirements for dependent records, in contrast to the more lofty goals described in issue #2.
The following additions will be made to the core syntax:
R ::= l : T, R -- field type annotation
| ε -- empty record type
r ::= l = t, r -- field value definition
| ε -- empty record value
t,T ::= ...
| Record { R } -- record types
| record { r } -- record values
| t.l -- projection
Record { x : I8, y : I8 } will not be compatible with Record { y : I8, x : I8 }, and record { x = 1, y = 2 } : Record { x : I8, y : I8 } is a type error.
We have syntax for declaring module names and imports, but this has not been implemented yet!
Modules will not be mutually recursive, so we'll want to ensure these form a DAG.
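The DAG check could be a simple depth-first search over module imports that reports any back-edge as a cycle. A sketch (module names are plain strings here; a real implementation would hang off whatever the module syntax resolves to):

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Mark {
    Unvisited,
    InProgress, // on the current DFS path
    Done,
}

// Returns true if the module import graph contains a cycle.
fn has_import_cycle<'a>(imports: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    fn visit<'a>(
        module: &'a str,
        imports: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
    ) -> bool {
        match marks.get(module).copied().unwrap_or(Mark::Unvisited) {
            Mark::Done => return false,
            Mark::InProgress => return true, // back-edge: a cycle!
            Mark::Unvisited => {}
        }
        marks.insert(module, Mark::InProgress);
        for &dep in imports.get(module).into_iter().flatten() {
            if visit(dep, imports, marks) {
                return true;
            }
        }
        marks.insert(module, Mark::Done);
        false
    }

    let mut marks = HashMap::new();
    imports.keys().any(|&module| visit(module, imports, &mut marks))
}

fn main() {
    let mut imports: HashMap<&str, Vec<&str>> = HashMap::new();
    imports.insert("prelude", vec![]);
    imports.insert("graph", vec!["prelude"]);
    assert!(!has_import_cycle(&imports));

    imports.insert("prelude", vec!["graph"]); // graph -> prelude -> graph
    assert!(has_import_cycle(&imports));
}
```

The same traversal, run to completion, also yields a topological order in which to elaborate the modules.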
It would be nice to actually compile Pikelet programs!
We'll probably want some intermediate representations like MLton to get the higher level Pikelet code to be in better shape for passing to a back-end.
We would like to do more work on the type system to make efficient code-gen easier. For example tracking coeffects such as usage multiplicities (#7), and data flow annotations (allowing us to partially evaluate code at compile time). Regions would also be handy at some stage.
Another issue we'll have to face at some time is garbage collection. Ultimately it would be nice to not have to force users to use a specific allocation strategy, but figuring out how this interacts with multiplicities might be tricky. Until we have a better idea it might be easier just to use something like the Boehm-Demers-Weiser GC in the interim.
Some possible targets for codegen are as follows:
A JIT might be handy as it would allow us to embed Pikelet in existing Rust codebases. This could help us gain adoption without needing to convince folks to switch to an entirely new language in the mid-term.
Sixten is another functional language with dependent types and unboxed data types. Might be worth looking there for inspiration too!
Currently Pikelet is implemented in a very traditional way - ie. it consumes a source string, does some processing, and spits something out. This isn't a very good approach for supporting interactive editing, however, and the more modern, query-driven approaches to language implementation are still sparsely documented.
One way to make some of the requirements for this more concrete is to try our hands at implementing a language server communicating via the LSP to a VS Code extension. This could help direct these efforts, before we accumulate too much tech debt!
TODO: Improve this plan!
So, it seems that the first stage of compilation that we might want to do is to compile to A-Normal Form (ANF). I'm guessing we could start a new module called `pikelet::compiler::anf` for this. We'll probably want a new set of IR types for this.
A-Normal Form syntactically sequentializes computations, and it partitions expressions into two classes: atomic expressions and complex expressions.
Because it (implicitly) simplifies the internal structure of continuations, it is easier to construct an interpreter for A-Normal Form.
For the same reason, it is easier to generate machine code or construct a static analyzer.
I ultimately want each of our passes to be type preserving, so we'll want a new type system and type checker, but it's fine if the initial version leaves this out. The paper Compiling Dependent Types Without Continuations outlines how this can be done for dependent types.
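To make the idea concrete, here is an ANF conversion over a made-up toy expression language (a real `pikelet::compiler::anf` pass would work over Pikelet's core syntax): every nested computation is named with a fresh let, so operations only ever apply to atomic expressions.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Atom {
    Int(i64),
    Var(String),
}

#[derive(Debug, PartialEq)]
enum Anf {
    // let name = atom + atom in rest
    Let(String, Atom, Atom, Box<Anf>),
    Ret(Atom),
}

fn to_anf(expr: &Expr) -> Anf {
    let mut fresh = 0;
    let mut bindings = Vec::new();
    let atom = atomize(expr, &mut fresh, &mut bindings);
    // Rebuild the binding sequence with the innermost computation last.
    bindings
        .into_iter()
        .rev()
        .fold(Anf::Ret(atom), |rest, (name, lhs, rhs)| {
            Anf::Let(name, lhs, rhs, Box::new(rest))
        })
}

// Flatten an expression to an atom, emitting let-bindings for computations.
fn atomize(expr: &Expr, fresh: &mut u32, bindings: &mut Vec<(String, Atom, Atom)>) -> Atom {
    match expr {
        Expr::Int(n) => Atom::Int(*n),
        Expr::Var(x) => Atom::Var(x.clone()),
        Expr::Add(lhs, rhs) => {
            let lhs = atomize(lhs, fresh, bindings);
            let rhs = atomize(rhs, fresh, bindings);
            let name = format!("tmp{}", *fresh);
            *fresh += 1;
            bindings.push((name.clone(), lhs, rhs));
            Atom::Var(name)
        }
    }
}

fn main() {
    // (1 + 2) + 3  ~>  let tmp0 = 1 + 2 in let tmp1 = tmp0 + 3 in tmp1
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Int(2)))),
        Box::new(Expr::Int(3)),
    );
    println!("{:?}", to_anf(&expr));
}
```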
Don't feel that you need to read and understand all of these in their entirety to tackle this issue, but I've included them with the hope that they might be helpful or interesting! Let me know if there are any more I should add!
This will allow us to remove all the excess pretty printers that we have to write for all the core representations by just pretty printing the concrete representation. Perhaps we could keep the core pretty printers around for debugging purposes, but use a simpler s-expression syntax that doesn't require stuff like precedences calculated.
For now we could just pattern match against literals:
case x of
1 => ...
2 => ...
x => ...
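Naively, literal cases like the ones above compile to a chain of equality tests, with the trailing variable pattern acting as the default branch. A sketch (the types here are illustrative, not Pikelet's):

```rust
enum Pattern {
    Lit(i64),
    Bind, // a variable pattern like `x => ...`, which always matches
}

// Try each arm in order, returning the first match.
fn eval_case<T: Clone>(scrutinee: i64, arms: &[(Pattern, T)]) -> Option<T> {
    for (pattern, result) in arms {
        match pattern {
            Pattern::Lit(n) if *n == scrutinee => return Some(result.clone()),
            Pattern::Lit(_) => {}
            Pattern::Bind => return Some(result.clone()),
        }
    }
    None // no arm matched: a non-exhaustive case expression
}

fn main() {
    let arms = [
        (Pattern::Lit(1), "one"),
        (Pattern::Lit(2), "two"),
        (Pattern::Bind, "other"),
    ];
    assert_eq!(eval_case(2, &arms), Some("two"));
    assert_eq!(eval_case(7, &arms), Some("other"));
}
```

The decision-tree approaches cited above improve on exactly this linear chain of tests.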
For Pikelet to be of much use at all, we should allow definitions to be recursive. I would like to add totality checking at some stage, like Idris has. Mutually recursive definitions would also be handy!
This will be handy when working with records-as-modules, and also potentially make checking dependent records easier.
Core syntax changes:
t ::= ... | (= t)
The `t` should be an inferrable term.
Resources:
Mentioned in our discussion on record types: #2 (comment)
Quantitative type theory promises to get us both erasure and linear types in the same feature.
The idea is that we could add usage count annotations (multiplicities) to each binder in order to track how many times the parameters can be used at runtime:
alloc : {a : Type} (1 op : (1 data : T) -> IO a) -> IO a
free : {a : Type} (1 data : T) -> IO ()
newMArray : {a b : Type} -> Int -> (1 f : (1 data : MArray a) -> b) -> b
write : {a : Type} (1 data : MArray a) -> (Int, a) -> MArray a
read : {a : Type} (1 data : MArray a) -> Int -> (MArray a, a)
freeze : {a : Type} (1 data : MArray a) -> Array a
There would be the following multiplicities initially:
0 : erased
1 : linear - must be used exactly once
ω : unbounded - can be used any number of times
Here are some resources:
PR #32 introduced block comments to the lexer. It would be nice if these nested too! This would then admit the following string:
{-
{- ... -}
{-
{- ... -}
-}
-}
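Lexing comments like this only needs a depth counter: `{-` increments, `-}` decrements, and the comment ends when the depth returns to zero. A sketch:

```rust
// Skip a nested block comment starting at the beginning of `src`, returning
// the index just past the final `-}`, or None if it is unterminated.
fn skip_block_comment(src: &str) -> Option<usize> {
    if !src.starts_with("{-") {
        return None;
    }
    let bytes = src.as_bytes();
    let mut depth = 0usize;
    let mut i = 0;
    while i + 1 < bytes.len() {
        match (bytes[i], bytes[i + 1]) {
            (b'{', b'-') => {
                depth += 1;
                i += 2;
            }
            (b'-', b'}') => {
                depth -= 1;
                i += 2;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => i += 1,
        }
    }
    None // ran out of input with the comment still open
}

fn main() {
    assert_eq!(skip_block_comment("{- {- ... -} -} rest"), Some(15));
    assert_eq!(skip_block_comment("{- unterminated {-"), None);
}
```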
We have decided instead to remove the block comments entirely!
Here's a list of some primitive functions that would be useful to have:
#String-eq : #String -> #String -> #Bool
#String-lt : #String -> #String -> #Bool
#String-le : #String -> #String -> #Bool
#String-ge : #String -> #String -> #Bool
#String-gt : #String -> #String -> #Bool
#String-append : #String -> #String -> #String
#Char-eq : #Char -> #Char -> #Bool
#Char-lt : #Char -> #Char -> #Bool
#Char-le : #Char -> #Char -> #Bool
#Char-ge : #Char -> #Char -> #Bool
#Char-gt : #Char -> #Char -> #Bool
#Char-to-string : #Char -> #String
#U8-eq : #U8 -> #U8 -> #Bool
#U8-lt : #U8 -> #U8 -> #Bool
#U8-le : #U8 -> #U8 -> #Bool
#U8-ge : #U8 -> #U8 -> #Bool
#U8-gt : #U8 -> #U8 -> #Bool
#U8-add : #U8 -> #U8 -> #U8
#U8-sub : #U8 -> #U8 -> #U8
#U8-mul : #U8 -> #U8 -> #U8
#U8-div : #U8 -> #U8 -> #U8
#U8-to-string : #U8 -> #String
#U16-eq : #U16 -> #U16 -> #Bool
#U16-lt : #U16 -> #U16 -> #Bool
#U16-le : #U16 -> #U16 -> #Bool
#U16-ge : #U16 -> #U16 -> #Bool
#U16-gt : #U16 -> #U16 -> #Bool
#U16-add : #U16 -> #U16 -> #U16
#U16-sub : #U16 -> #U16 -> #U16
#U16-mul : #U16 -> #U16 -> #U16
#U16-div : #U16 -> #U16 -> #U16
#U16-to-string : #U16 -> #String
#U32-eq : #U32 -> #U32 -> #Bool
#U32-lt : #U32 -> #U32 -> #Bool
#U32-le : #U32 -> #U32 -> #Bool
#U32-ge : #U32 -> #U32 -> #Bool
#U32-gt : #U32 -> #U32 -> #Bool
#U32-add : #U32 -> #U32 -> #U32
#U32-sub : #U32 -> #U32 -> #U32
#U32-mul : #U32 -> #U32 -> #U32
#U32-div : #U32 -> #U32 -> #U32
#U32-to-string : #U32 -> #String
#U64-eq : #U64 -> #U64 -> #Bool
#U64-lt : #U64 -> #U64 -> #Bool
#U64-le : #U64 -> #U64 -> #Bool
#U64-ge : #U64 -> #U64 -> #Bool
#U64-gt : #U64 -> #U64 -> #Bool
#U64-add : #U64 -> #U64 -> #U64
#U64-sub : #U64 -> #U64 -> #U64
#U64-mul : #U64 -> #U64 -> #U64
#U64-div : #U64 -> #U64 -> #U64
#I8-eq : #I8 -> #I8 -> #Bool
#I8-lt : #I8 -> #I8 -> #Bool
#I8-le : #I8 -> #I8 -> #Bool
#I8-ge : #I8 -> #I8 -> #Bool
#I8-gt : #I8 -> #I8 -> #Bool
#I8-add : #I8 -> #I8 -> #I8
#I8-sub : #I8 -> #I8 -> #I8
#I8-mul : #I8 -> #I8 -> #I8
#I8-div : #I8 -> #I8 -> #I8
#I8-to-string : #I8 -> #String
#I16-eq : #I16 -> #I16 -> #Bool
#I16-lt : #I16 -> #I16 -> #Bool
#I16-le : #I16 -> #I16 -> #Bool
#I16-ge : #I16 -> #I16 -> #Bool
#I16-gt : #I16 -> #I16 -> #Bool
#I16-add : #I16 -> #I16 -> #I16
#I16-sub : #I16 -> #I16 -> #I16
#I16-mul : #I16 -> #I16 -> #I16
#I16-div : #I16 -> #I16 -> #I16
#I16-to-string : #I16 -> #String
#I32-eq : #I32 -> #I32 -> #Bool
#I32-lt : #I32 -> #I32 -> #Bool
#I32-le : #I32 -> #I32 -> #Bool
#I32-ge : #I32 -> #I32 -> #Bool
#I32-gt : #I32 -> #I32 -> #Bool
#I32-add : #I32 -> #I32 -> #I32
#I32-sub : #I32 -> #I32 -> #I32
#I32-mul : #I32 -> #I32 -> #I32
#I32-div : #I32 -> #I32 -> #I32
#I32-to-string : #I32 -> #String
#I64-eq : #I64 -> #I64 -> #Bool
#I64-lt : #I64 -> #I64 -> #Bool
#I64-le : #I64 -> #I64 -> #Bool
#I64-ge : #I64 -> #I64 -> #Bool
#I64-gt : #I64 -> #I64 -> #Bool
#I64-add : #I64 -> #I64 -> #I64
#I64-sub : #I64 -> #I64 -> #I64
#I64-mul : #I64 -> #I64 -> #I64
#I64-div : #I64 -> #I64 -> #I64
#F32-eq : #F32 -> #F32 -> #Bool
#F32-lt : #F32 -> #F32 -> #Bool
#F32-le : #F32 -> #F32 -> #Bool
#F32-ge : #F32 -> #F32 -> #Bool
#F32-gt : #F32 -> #F32 -> #Bool
#F32-add : #F32 -> #F32 -> #F32
#F32-sub : #F32 -> #F32 -> #F32
#F32-mul : #F32 -> #F32 -> #F32
#F32-div : #F32 -> #F32 -> #F32
#F32-to-string : #F32 -> #String
#F64-eq : #F64 -> #F64 -> #Bool
#F64-lt : #F64 -> #F64 -> #Bool
#F64-le : #F64 -> #F64 -> #Bool
#F64-ge : #F64 -> #F64 -> #Bool
#F64-gt : #F64 -> #F64 -> #Bool
#F64-add : #F64 -> #F64 -> #F64
#F64-sub : #F64 -> #F64 -> #F64
#F64-mul : #F64 -> #F64 -> #F64
#F64-div : #F64 -> #F64 -> #F64
Error messages are pretty rudimentary. brendanzab/codespan#1 would fix them!
I've been looking at the Web Assembly specification, and have been very impressed. I'm thinking it might be nice to follow its lead with regard to notation and approach. At the moment I just use the formal notation on the theory page, but it would nice to be able to expand it to a larger specification.
I'm not sure what we should use for documentation. Mdbook has been nice, but the support for maths notation in pulldown-cmark is lacking right now. It would be lovely if pulldown-cmark supported setting the class of inline and block elements, kind of like: {katex}`\Gamma \vdash x : \tau` - this way it would also render ok in Github's previews, but I don't know what it would take to get that merged.
The features I need are:
So yeah some other options available to us:
Tool | Language | Markup Language | Math support | Custom syntax highlighting | Docs theme | Boots to SPA | Versions | PDF Export
---|---|---|---|---|---|---|---|---
Gutenberg | Rust | Markdown | No (could be added) | Awkward | Yes | No | No | No
Hakyll | Haskell | Markdown | MathJax or KaTeX | Unknown | No | No | No | No
Slick | Haskell | Markdown | MathJax or KaTeX | Unknown | No | No | No | No
Sphinx | Python | reStructuredText | MathJax or KaTeX | Unknown | Yes | No | Yes | Yes
VuePress | Javascript | Markdown | KaTeX (server rendered) | Yes | Yes | Yes | No | No
Docusaurus | Javascript | Markdown | KaTeX (server rendered) | Yes | Yes | No | Yes | No
Should we put the spec and/or site in a separate repository? Or continue with the mono-repo approach?
At the moment I'm explicitly writing out ASTs manually in some of my tests. This is tedious to update when I change the AST, and it would be handy to just get a diff of the changes automatically generated instead. This would allow us to be more liberal with our testing too. 'Goldenfiles' (also known as 'snapshot testing') seem like a good answer to this. There is a crate called `goldenfile` that does this for Rust.
Here are some tests that I think could be improved with this method:
We could probably add some more tests after that once we set up a nice, repeatable process for this.
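The core of the goldenfile idea is small enough to sketch: compare output against a checked-in file, and regenerate that file when an environment variable is set. (The env var name below is made up for illustration; the `goldenfile` crate packages this up more robustly.)

```rust
use std::env;
use std::fs;
use std::path::Path;

// Compare `actual` against the stored goldenfile, or rewrite the goldenfile
// when UPDATE_GOLDENFILES is set (a hypothetical env var for this sketch).
fn assert_golden(actual: &str, golden_path: &Path) {
    if env::var("UPDATE_GOLDENFILES").is_ok() {
        fs::write(golden_path, actual).expect("failed to update goldenfile");
        return;
    }
    let expected = fs::read_to_string(golden_path)
        .unwrap_or_else(|err| panic!("could not read {}: {}", golden_path.display(), err));
    // On mismatch the assert prints both strings, giving us the diff for free.
    assert_eq!(actual, expected);
}

fn main() {
    // Demo: write a goldenfile to a temp path, then check output against it.
    let golden = env::temp_dir().join("pikelet-demo.golden");
    fs::write(&golden, "id : (a : Type) -> a -> a").unwrap();
    assert_golden("id : (a : Type) -> a -> a", &golden);
}
```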
It would be nice for our data types to be unboxed by default, similar to how Rust and Sixten do it. Records for instance should be represented in a contiguous run of memory, unless otherwise specified.
This would mean that we wouldn't need to wait around for the CI to pass before merging. Also makes us look more 'legit'!
At the moment Pikelet does not do anything useful. In order to be more useful, we probably need some way to perform effects.
Initially we could use an `Io : Type -> Type` wrapper type for this.
Eventually I would like us to move to something like an algebraic effect system. The limiting factor for this would be that it must be zero-cost. ie. We need to make sure that any reification of effects can be erased or partially evaluated away at compile time in a reliable way. This should result in codegen identical to if one had written code in an imperative language.
yallop/effects-bibliography has lots of links.
Here are some papers that are more specifically about integrating dependent types and effects:
Many projects have one of these. Perhaps it would help make it easier to get started?
Found during a demo at the Melbourne Haskell User Group:
Pikelet> (\(t : Type) (f : (a : Type) -> a -> a) => f t) String (\(a : Type) (x : a) => x)
error: internal compiler error: cannot find `t$1` in scope
- <repl>:1:46: not found in this scope
If we manually apply the first argument it seems to be fine:
Pikelet> (\(f : (a : Type) -> a -> a) => f String) (\(a : Type) (x : a) => x)
\x$12 : #String => x$12 : ($7 : #String) -> #String
Thanks @andykitchen for finding this!
So I now have some sort of McBride-style universe shifting implemented in Pikelet (excuse my poor technical writing). I was wondering about terminology here. Should we call the `foo^n` a:
Wondering if you have any strong thoughts @jonsterling, @pigworker?
Prior discussion and implementation here: #10, #125
cc. @LightAndLight
Would be nice to have a layout-based syntax like Haskell, Idris, and F#. I'm getting a little sick of all those closing curly braces in Rust!
At the moment we use a stratified universe hierarchy to avoid Girard's Paradox (endemic to systems where `Type : Type`). Unfortunately this means that we lose some polymorphism. For example:
Pikelet> (\(x : Type) (x: x) => x) Type String
error: found a term of type `Type 1`, but expected a term of type `Type`
- <repl>:1:27: the term
Pikelet> (\(x : Type) (x: x) => x) (Type 1) Type
error: found a term of type `Type 2`, but expected a term of type `Type`
- <repl>:1:28: the term
It would be nice to be able to write the identity function once and have it work for all levels in the universe hierarchy.
Resources:
Each time a term is printed in the REPL the variable numbering is different. It looks like the variable numbers don't reset at any time. This is confusing if you enter the same or similar term twice and expect to get the same or similar output.
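One fix would be to generate display names from a counter that is reset for each printed term, so the same term always prints the same way. A sketch, using the `x$12` naming style seen in the REPL output above:

```rust
// A fresh-name generator: create one per printed term so numbering restarts.
struct NameGen {
    next: u32,
}

impl NameGen {
    fn new() -> NameGen {
        NameGen { next: 0 }
    }

    fn fresh(&mut self, hint: &str) -> String {
        let name = format!("{}${}", hint, self.next);
        self.next += 1;
        name
    }
}

fn main() {
    // Two printing runs with fresh generators produce identical names:
    let mut first = NameGen::new();
    let mut second = NameGen::new();
    assert_eq!(first.fresh("x"), "x$0");
    assert_eq!(first.fresh("x"), "x$1");
    assert_eq!(second.fresh("x"), "x$0");
}
```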
It would be nice to improve the bidirectional type checking algorithm to also pull information from the arguments. This could also help us eventually implement implicit arguments (see #8). A nice approach for doing this was given in the paper Let Arguments Go First. In the introduction the authors state (emphasis added):
We believe that, similarly to standard bi-directional type checking, bi-directional type checking with an application mode can be applied to a wide range of type systems. Our work shows two particular and non-trivial applications. Other potential areas of applications are other type systems with subtyping, static overloading, implicit parameters or dependent types.
In section 3 (the Hindley-Milner example) of the paper they show the application context pulling type information from the arguments in order to fill in the 'hidden type arguments' of the foralls via the subtyping rules. Perhaps we could do something similar for implicit arguments! This would effectively involve combining sections 3, 4, and 5.3 into one system.
The following expressions illustrate the problem:
record { x = "hello" } : Record { x : String; y : String }
record { x = "hello"; y = "hello" } : Record { x : String }
These both cause the following panic:
panicked at 'assertion failed: `(left == right)`
left: `2`,
right: `1`', /../moniker-0.3.2/src/scope.rs:87:13
This is due to the following call to `Scope::unbind2` in the type checker:
// C-RECORD
(&raw::Term::Record(span, ref raw_scope), &Value::RecordType(ref raw_ty_scope)) => {
let (raw_fields, (), raw_ty_fields, ()) =
Scope::unbind2(raw_scope.clone(), raw_ty_scope.clone());
`Scope::unbind2` should probably return a result in this case, but for now we could work around this by adding the following check:
// C-RECORD
(&raw::Term::Record(span, ref raw_scope), &Value::RecordType(ref raw_ty_scope)) => {
if raw_scope.unsafe_pattern.unsafe_patterns.len()
!= raw_ty_scope.unsafe_pattern.unsafe_patterns.len()
{
return Err(unimplemented!("mismatched fields"));
}
let (raw_fields, (), raw_ty_fields, ()) =
Scope::unbind2(raw_scope.clone(), raw_ty_scope.clone());
`unimplemented!("mismatched fields")` should be replaced with an appropriate error diagnostic, and we'll need some tests to cover this as well! Here are some tests that could go in the `check_term.rs` file:
#[test]
fn record_field_mismatch_lt() {
    let mut codemap = CodeMap::new();
    let tc_env = TcEnv::default();

    let expected_ty = r"Record { x : String; y : String }";
    let given_expr = r#"record { x = "hello" }"#;

    let expected_ty = parse_nf_term(&mut codemap, &tc_env, expected_ty);
    match check_term(&tc_env, &parse_term(&mut codemap, given_expr), &expected_ty) {
        Err(TypeError::/* your error variant here! */ { .. }) => {},
        Err(err) => panic!("unexpected error: {:?}", err),
        Ok(term) => panic!("expected error but found: {}", term),
    }
}

#[test]
fn record_field_mismatch_gt() {
    let mut codemap = CodeMap::new();
    let tc_env = TcEnv::default();

    let expected_ty = r"Record { x : String }";
    let given_expr = r#"record { x = "hello"; y = "hello" }"#;

    let expected_ty = parse_nf_term(&mut codemap, &tc_env, expected_ty);
    match check_term(&tc_env, &parse_term(&mut codemap, given_expr), &expected_ty) {
        Err(TypeError::/* your error variant here! */ { .. }) => {},
        Err(err) => panic!("unexpected error: {:?}", err),
        Ok(term) => panic!("expected error but found: {}", term),
    }
}
It was reported to me that this is currently broken:
@brendanzab: @markbrown if you pull the pikelet repo, you can do
cargo run repl
to run the REPL
@markbrown: I had to create a `repl-history` file myself for that to work
Obviously I don't test the first-run experience very often! 😮