lushui's Issues

The parser should recover from basic syntax errors

Right now, every syntax error is fatal, so we cannot report multiple syntax errors. This is still rather vague, but below is a list of some errors which don't need to be fatal to the parser.

  • parse (and lower) all other modules that are part of the module tree. This means we need to register external modules during parsing so we can handle them (in parallel or after parsing the original file), irrespective of whether it parses (and lowers) successfully
  • don't bail out early when encountering undefined (nullary) attributes (fixed in c774b0d)
  • report all instances of number literals that are "out of range" (fixed in e2d2897) (note: the concept of number suffixes does not exist anymore, thus this error cannot happen anymore)
  • syntactically allow missing definitions (= + expression) in let/ins (and disallow them in the lowerer); see the sketch after this list
  • syntactically allow weird explicit definition (= + expression) for constructors (and gate them in the lowering stage)
  • skip/report all instances of commas (#65, #63)
  • skip/report all instances of unexpected ->s
  • report all instances of invalid identifiers (consecutive dashes, trailing dashes, …) / treat them as valid identifiers; needs lexer support: they should just be lexed as InvalidIdentifiers
  • report all instances of invalid number literals (consecutive primes, trailing primes, …) / treat them as valid ones; needs lexer support: they should be lexed as InvalidNumberLiterals
  • all other kinds of syntax errors: t.b.d. (major!)
    • (maybe, not sure if worth the effort) in some cases where we expect a line break (e.g. after = in module and data declarations), we could either insert one
      or skip everything until the next line break and continue parsing (but still report an error later in both cases)
    • (maybe, not sure if worth the effort) if a parsing error occurs inside an attribute, skip until the closing ) (if it exists) and continue
      parsing the item or the other attributes (still reporting the error). I'm pretty sure this leads to many weird parsing errors though when trying to parse the item
    • ???
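
For illustration, a sketch of a snippet the parser should accept (emitting a diagnostic in the lowerer instead), using the let/in syntax from the examples further below:

;; missing definition (`= expression`) in a let/in
alpha: Alpha = let beta in beta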

Related: #13.

More ergonomic paths containing punctuation

Today's lexical syntax of paths and punctuation leads to rather unpleasant-looking paths. Consider the expression core.nat. + 1 2. Notice the space between the path separator . and the punctuation + which is necessary to distinguish it from the punctuation .+. Further, it is likely confusing for newcomers, as it looks like applying the function core.nat to three arguments, +, 1, and 2 – if it were not for the trailing dot.

This mandates the following change to our lexical grammar:

Lex dots immediately (that is, without a space in between) after alphanumeric identifiers and before other punctuation (which may include dots) as a separate token. For example, alpha.+!.* should be tokenized as Identifier(alpha), Dot, Punctuation(+!.*). However, alpha.+.beta should result in the tokens Identifier(alpha), Dot, Punctuation(+.), Identifier(beta), not Identifier(alpha), Dot, Punctuation(+), Dot, Identifier(beta)! This means a cluster of punctuation is never split by dots unless the dot is the first token of the cluster and the previous token is an alphanumeric identifier.
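
With this rule, the motivating example from above no longer needs the extra space; a sketch of the intended result:

core.nat.+ 1 2
;; `core.nat.+` now lexes as Identifier(core), Dot, Identifier(nat), Dot, Punctuation(+)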

After this change, there are still situations where one needs to insert additional whitespace between path segments, albeit in far less common scenarios. Namely, namespaces (modules, types, record constructors) aliased to punctuation. Let's say we have a module used as +|+ (why would you, satan?) and we want to access its members alpha and ?\. For the first, it's +|+ .alpha and for the second, it's +|+ . ?\. Very reasonable (my initial thought was to disallow aliasing those, but that would just introduce unnecessary edge cases; punctuation constructors are relatively common, too).

Alias fails instance check

Example:

Alias: Type = Original

data Original: Type of
    original: Alias

I strongly believe this occurs because of a missing evaluation before checking.

Update: It depends on the order of declarations right now. If the alias is declared after/below the data declaration,
the instance check succeeds! Only if the alias is declared before/above the data declaration does it fail.

Counter-example:

data Original: Type of
    original: Alias

Alias: Type = Original

Change identifier syntax once again

For the record, the current syntax of "alphanumeric" identifiers is ([a-zA-Z][a-zA-Z0-9]*'*)(-[a-zA-Z][a-zA-Z0-9]*'*)*.
They consist of segments separated by a single dash - where each segment may contain ASCII alphabetic characters and — except for the first character — digits and may end with several primes '.

I now think allowing primes leads to poor naming strategies, so it's better to remove them from the syntax of identifiers.
Next, we should probably allow underscores in identifiers even if our naming convention is kebab case, not snake or pascal case.

With that, the new lexical syntax should be: ([a-zA-Z_]\w*)(-[a-zA-Z]\w*)* where a single _ becomes a keyword.
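
Some examples under the proposed syntax (a sketch; the names are made up):

alpha-beta2     ;; valid
_prefixed       ;; valid
snake_case-mix  ;; valid
alpha'          ;; no longer valid: primes are gone
alpha--beta     ;; still invalid: each segment must start with an ASCII letter
alpha-          ;; still invalid: trailing dash
_               ;; a keyword, not an identifier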

Parsing issues around line breaks

Relates to #15 in the sense that we should rewrite our trivia (line breaks, whitespace, indentation) system.

Right now, in some cases, there is a difference between three consecutive line breaks and the sequence line break,
comment, line break: the parser does not treat them equally. This is because the lexer merges several true line breaks in a row into one but does not do so when there is a comment in between. In the parser, I could just start replacing code that consumes a "single" line break (according to the lexer) with loops consuming as many as possible. I don't like this.
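
For illustration, the parser should treat these two snippets identically but currently does not (;; is the comment syntax used elsewhere in these issues). Several true line breaks:

alpha: Alpha = alpha


beta: Beta = beta

And a comment in between:

alpha: Alpha = alpha
;; an innocent comment
beta: Beta = beta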

Parsing issues around expressions which contain indentation

Expressions which contain indentation are currently let/in, case/of and do blocks (this list will probably expand).
It's a recurring issue with our horrendous indentation system #15, which we should probably replace.

I am actually not sure whether these forms should parse (we don't have a specification yet), but the three examples below
should parse and currently don't:

Let/in:

alpha: Alpha = let alpha = alpha in
    alpha

Diagnostic:

error[E010]: found indentation, but expected expression
  > test.lushui:6:1
  |
6 |     x
  | ^^^^

Case/of (error does not happen if there is no declaration below the one containing the case analysis):

alpha: Alpha = case alpha of
    alpha => alpha

beta

Diagnostic:

error[E010]: found identifier, but expected line break
   > test.lushui:11:1
   |
11 | beta
   | ^^^^ 

Do blocks (error does not happen if there is no declaration below the one containing the do block):

alpha: Alpha = do
    alpha <- alpha

beta

Diagnostic:

error[E010]: found identifier, but expected line break
  > test.lushui:8:1
  |
8 | beta
  | ^^^^ 

Each of them can be slightly rewritten, that is indented differently, to make it parse:

alpha: Alpha =
    let alpha = alpha in
    alpha

alpha: Alpha =
    case alpha of
        alpha => alpha

beta

alpha: Alpha =
    do
        alpha <- alpha

beta

The lexer should not report any errors

Instead, special tokens should be emitted which the parser then uses to report errors.

Don't return early on any of the following (a few are illustrated after the list):

  • invalid tokens (via 61bfeba)
  • unterminated text literals (also via above commit)
  • number literals "out of range" (only relevant until we removed number literal suffixes) (via c774b0d)
  • trailing dashes in identifiers
  • consecutive primes in number literals
  • trailing primes in number literals
  • unbalanced brackets
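
For illustration, a few inputs that currently abort the lexer but should instead yield special error tokens (a sketch):

alpha-  ;; trailing dash in an identifier
1''2    ;; consecutive primes in a number literal
3'      ;; trailing prime in a number literal
"text   ;; unterminated text literal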

Grammar for attributes

Right now, the syntax is very ad hoc. There are plans to make attributes more robust, enabling us to attach them not only to
declarations but also to expressions, patterns and statements.

Pretty-printing HIR

Used for good-looking error messages and maybe also for documentation generation.

Path hanger `extern` instead of crate declarations

Remove crate declarations (crate my-lovely-library) from the language and replace them with paths beginning with the path hanger for external crates, extern, which becomes a new keyword (extern.my-lovely-library).

Conceptually, we now have two roots in the module system: crate for the current crate being compiled (…) and extern for all the --linked library dependencies. For documentation purposes, let me note that before, we only had one root – crate – and dependencies would be mounted into this single tree, with duplicate crate declarations linking to the identical crate.

Motivations

  • (original motivation) Allows macros, lowering passes and synthesizers of the type inference engine to generate self-contained expressions even if they need to refer to external crates, especially the standard library core! Before, those would need to somehow inject a crate declaration somewhere close by with a hygienic name. Horrendous! One example: lowering sequence literals. Another one: derive macros.
  • Crate declarations are inconsistent with external module declarations in regard to how duplication is handled: external module declarations copy content, crate declarations link
  • Allows using bindings from an external crate directly with a single declaration (a use declaration) instead of two (a crate and a use declaration)
  • Is more in line with the notion that external modules are added to your project from the outside (whether that's a good thing or not) with --link or the crate manifest package.json5, rather than declared by the current crate (which might lead to the assumption that crate metadata is or should be in code, like @(version ">=2.3.2") crate extension)

Example

Currently

crate awesome
use awesome.Alpha

Beta: Type = Alpha Type

module sub =
    use crate.awesome

    gamma: … = awesome.stuff 0

module mu =
    module nu =
        crate awesome

        X: Type = awesome.f …

Then

use extern.awesome.Alpha

Beta: Type = Alpha Type

module sub =
    gamma: … = extern.awesome.stuff 0 ;; or `crate.awesome.stuff`

module mu =
    module nu =
        X: Type = extern.awesome.f … ;; or `crate.`…

Generic literals

Define literals (number, text and sequence ones) to be instances of pseudo-types which implicitly coerce to specific types. This way, literals are overloaded and can seemingly be of several types, which improves their ergonomics. Type-theoretically, without subtyping, we cannot have the literal 0, for example, be of type Int and of type Nat. Therefore, we have different versions/names for constructors which would be the same in type systems with subtyping, e.g. @Int 0 and @Nat 0.

Implicit coercions in this language are different from subtyping in that they do not happen transitively. That is, even if a coerces to b, t a does not coerce to t b if t is a (type) constructor. All (type) constructors are therefore invariant.
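
For illustration, a sketch of the intended ergonomics (assuming Nat and Int from core):

;; status quo: the desired type must be selected explicitly
m: Nat = @Nat 0
i: Int = @Int 0

;; with generic literals, the bare literal coerces to either type
n: Nat = 0
j: Int = 0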

Meta: Task: Improve the description below.

Pseudo-Type of Number Literals

Name: .Number-Literal
Properties:

  • integer interval it fits into
  • decimal part (0 means it can be coerced to Nats, Ints, …)

Pseudo-Type of Text Literals

Name: .Text-Literal
Properties:

  • length (to differentiate between Text and Rune)
  • escape sequences (to differentiate between UTF-8-encoded and ASCII-encoded text-y types)

Pseudo-Type of Sequence Literals

Name: .Sequence-Literal
Properties: none

Crate system

The package system of lushui.

  • #49
  • Type-checking and interpreting
    • CrateIndex should be renamed since with a crate system, it might get confused with numeric identifiers of crates
      Suggestions: DeclarationIndex, DefinitionIndex, ModuleLevelIndex
    • The issue arises that the indices mentioned above are no longer globally unique if we define them to be crate-local
      (which is what I intended them to be). So we should have a distinction between local and global indices or just use
      global indices which consist of the index of the crate (the new CrateIndex) and a crate-local index (maybe DeclarationIndex) named GlobalDeclarationIndex (maybe)
    • a big question is how we restructure our code to handle multiple crates: Contrary to Rust, we cannot (yet) compile
      crates separately since we cannot compile at all right now (no working byte code and no working compiler yet),
      which means crates are not compilation units/boundaries (yet). Should we have multiple CrateScopes and put them into
      a Session or ProjectScope? The way we register foreign and inherent bindings needs to change, too.
    • new CLI-option --link <path> which either links to a lushui file (in which case the dependency is a "single file/simple crate"
      and we assign it some default metadata, i.e. its name (the name of the file) and its version (0.0.0?)) or to a folder with a config file in
      it. We somehow need a way/design to refer to known locations (probably as environment variables), for example the equivalent of target/ or node_modules/ or the shared dependency folder $HOME/.lushui/shared-deps/ (…)
  • Package manifest file support
  • Compilation: making the crate a unit of compilation (memory reduction) (only once we actually have a compilation mechanism, #4)
  • Make the standard library core a package
    • link it by default for simple/single-file crates with the option to unlink it via --unlink-core
    • place it into libraries/core/ or libs/core/ in this repository
    • rename src/ to compiler/

Named parameters

It's not clear whether we should support named arguments apart from record fields.
But since we'd already have the infrastructure, they're very easy to add. The only thing left is the design question of how to declare named parameters. For the longest time, I have envisioned:

alpha (let gamma: Int) (let delta: Text): Int = gamma

beta: Int = alpha (gamma = @Int 15) (delta = "delta")

As you can see, the current proposal (re-)uses let for the declaration. Obviously, field and let would be mutually exclusive on parameters.

Note that the MVP does not actually include reordering!

Misleading span when reporting type mismatch

Example:

use extern.core.unit.(Unit unit)
use extern.core.bool.(Bool false true)

Lazy (A: Type): Type = Unit -> A

Lazy-Bool: Type = Lazy Bool

;;;and (a: Bool) (b: Unit -> Bool): Bool =
and (a: Bool) (b: Lazy Bool): Bool =
    case a of
        false => false
        true => b unit

;;; @Bug span: "expected due to this" points to normalization of _lazy Bool_ instead of the pre-normalized form
main: Lazy Bool = and true (\(unit: Unit) => true)
;;;main: Bool = and true (\(unit: Unit) => true)

Output:

error[E032]: expected type `extern.core.unit.Unit -> extern.core.bool.Bool` but got type `extern.core.bool.Bool`
  --> /home/fmease/programming/main_projects/lushui/bugs/x.lushui:4:24
   |
 4 | Lazy (A: Type): Type = Unit -> A
   |                        --------- expected due to this
   |
  --> /home/fmease/programming/main_projects/lushui/bugs/x.lushui:15:19
   |
15 | main: Lazy Bool = and true (\(unit: Unit) => true)
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ has the wrong type
   |

Task: Minimize reproducer.

Related (instance of a similar issue): #31. We should use the span of the unevaluated expression (somehow).

We don't evaluate expressions during term equivalence check

Meta: Add description.

Example:

F: Type = F

;;; @Bug does not overflow because equals does not evaluate
x: F = Type

use extern.core.nat.(+ Nat)

f (n: Nat): Nat = + 1 (f n)

data Holder (n: Nat): Type of

;;; @Bug should overflow but does not
y: Holder (f 100000) = Type

Output:

error[E032]: expected type `topmost.F` but got type `Type`
 --> /home/fmease/programming/main_projects/lushui/bugs/no-computation-during-type-checking.lushui:4:4
  |
4 | x: F = Type
  |    - expected due to this
  |
 --> /home/fmease/programming/main_projects/lushui/bugs/no-computation-during-type-checking.lushui:4:8
  |
4 | x: F = Type
  |        ^^^^ has the wrong type
  |

error[E032]: expected type `topmost.Holder (topmost.f 100000)` but got type `Type`
  --> /home/fmease/programming/main_projects/lushui/bugs/no-computation-during-type-checking.lushui:13:4
   |
13 | y: Holder (f 100000) = Type
   |    ----------------- expected due to this
   |
  --> /home/fmease/programming/main_projects/lushui/bugs/no-computation-during-type-checking.lushui:13:24
   |
13 | y: Holder (f 100000) = Type
   |                        ^^^^ has the wrong type
   |

Do blocks

  • lexing & parsing (since 15ac701)
  • lowering (in the lowerer and/or the name resolver and/or the type checker)

Depends on #24.

Records

Example of a record (note: the syntax is not final):

data Person: Type =
    person (field name: Text) (field birth: Date): Person

jane: Person = Person.person (name = "Jane") (birth = date.ymd 1980 1 1)
jane-name: Text = Person.person.name jane
jack: Person = Person.person "Jane" (date.ymd 1991 1 2)

Tasks:

  • remove the old record syntax using the keyword record (lexer, parser, lowerer)
  • add keyword field
  • (optional, otherwise future work) make the keyword contextual: allowed as binder for top-level declarations, binders of let/ins, use/ins, patterns
  • add fields to the grammar and parse them (, comes before field)
  • in the lowerer, gate duplicate fields (the resolver does this out of the box in register_binding)
  • deny unnamed fields in the lowerer (e.g. (field) -> Alpha) (impossible anyways, the parser catches that)
  • disallow multiple constructors when lowering if any of them has fields
  • in the name resolver, make constructors namespaces and add fields as functions to them
    • if case analysis supports deapplication by the time this gets implemented, use case analyses to implement field projections
    • otherwise add a temporary hack by adding a new kind of expression to the HIR like Projection (which needs extra typing rules etc.!) and elaborate to them
  • resolve named arguments in
    • expressions
    • patterns
  • (optional, otherwise future work) support @shallow on constructors (not part of the language any more, see #38)
  • (optional, otherwise future work) have a way to make field names private: for modules with insufficient privileges, those
    constructors should appear as if they weren't records: neither field projections nor named arguments are accessible

This issue describes the MVP of records and does not include named arguments outside of record declarations, re-ordering of
applications with named arguments or a record update syntax.

Sigma types

They are just a library type except that we also provide syntactic sugar for them (I think we should). Definition (depends on #2):

data Sigma (A: Type) (P: A -> Type): Type =
    sigma (,A: Type) (,P: A -> Type)
        (a: A) (proof: P a): Sigma A P

The syntactic sugar is not entirely clear but this is the current proposal:

  • (a: A ** P a) to crate.core.sigma.Sigma A (\a => P a)
  • (a ** p a) to crate.core.sigma.Sigma.sigma a (p a)

Note that we should not naïvely translate to the above path to the sigma type defined in core but intelligently to the actual resolved binding (and raise an error if core is not available (as a module or, in the future, as a crate #14)).

The most important blocker, or at least the first one to encounter, is the question of precedence and associativity, needed to connect
this sugar to the rest of the grammar.

Wrong span when reporting invalid data type instances

Example:

use extern.core.type.Type

data Thing: Type of
    thing: Kind

Kind: Type = Type

Confusing error message:

error[E033]: ‘extern.core.type.Type’ is not an instance of ‘topmost.Thing’
  ┌─ /home/fmease/programming/lushui/wh.lushui:6:14
  │
6 │ Kind: Type = Type
  │              ════

We use the span of the evaluated expression, not the span of the unevaluated one, which is the one we should use.

More ergonomic record fields

Depends on/relates to #18.

Instead of the very verbose field projections, which are namespaced under record constructors, which in turn are namespaced under
data types, utilize type-directed namespaces in the language.

This proposal adds a new syntactic and semantic form called a (field) projection or field access.
Its grammar rule is Naked-Lower-Expression "::" Identifier (or # instead of ::) and it is a lower expression.
Examples: some.path.to::field_, 0::field_ (ill-typed), @A(@B hello)::field_.

Marking parameters of constructors as fields with the reserved symbol :: makes this projection available for values created with this
constructor. Example:

data Person: Type of
    person
        (::name: Text)
        (::age: Nat): Person

jane: Person = Person.person (name = "Jane") (age = 52)
jane-name: Text = jane::name

Counter-example:

;; `jane` from above
x: Nat = jane::afe ;; error: ~ `Person`/`person` does not have such a field
y: Nat = 0::identity ;; error: ~ `Nat` is not even a record, duh!

Initially, I did not want to add this feature (because of "m i n i m a l i s m") but I recognized that the current way of doing this
is horrendous! I really like that we use a different symbol for this instead of "re-using" . for "static" name resolution. This allows
us to provide better error messages for both invalid uses of . and ::.

The question remains whether we should remove the old way of accessing fields or keep it as an alternative. I think we should throw it
away, as we cannot lower either one to the other, which would just mean more unnecessary complexity.

For reference, the current design dictates generating functions namespaced under the constructor. Taking the example from
above, there would exist Person.person.name of type Person -> Text and Person.person.age of type Person -> Nat.
So you'd have to write (or use) Person.person.name jane for jane::name.

The advantage of the previous field projection design was that it enabled point-free style. Now, we need to write an explicit lambda to turn the projection into a first-class function: (\p => p::name) of type Person -> Text. That's not a big problem,
however: just provide more syntactic sugar desugaring (::field_) to (\x => x::field_).

Example:

people: List Person = [(Person "Jane" 43) (Person "Jack" 33) (Person "Jim" 80)]
names: List Text = core.list.map (::name) people ;; ["Jane" "Jack" "Jim"]

Module exposure and privacy

  • plain @public
  • restricted exposure via @(public scope) with a scope argument (both forms are sketched after this list)
  • rules around punctuation binders: public (even in restricted forms) punctuation must have a public alphanumeric alias
  • disallow any re-exports of private bindings
  • public constructors
  • @opaque on data declarations (private constructors)
  • good diagnostics
  • thorough test suite
  • (optional, otherwise future work) private-in-public linting (prototype, later becoming a standard error instead of a denied lint) moved to #70
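
For illustration, a hedged sketch of the attribute forms from the list above (the exact shape of the scope argument is an assumption):

@public
alpha: Alpha = …

;; exposed only within the given scope (here: the sibling module `sub`)
@(public sub)
beta: Beta = …

;; the data type is public, its constructors are not
@opaque
data Gamma: Type of
    gamma: Gamma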

Sequence literals

  • lexing & parsing (with suffix syntax)
  • parse expression and pattern attributes (we don't have a suffix syntax anymore, we need attributes for this now, see #35)
  • lowering not in crate::lowerer but after name resolution
    • create attribute sequence-literal for data type declarations (only a marker, not a macro, only supported for selected types; a sketch follows this list)
    • support List
    • support Vector
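
A hedged sketch of the marker attribute from the list above (the constructor names and the exact elaboration are assumptions):

@sequence-literal
data List (A: Type): Type of
    nil: List A
    cons (,A: Type) (head: A) (tail: List A): List A

;; `[0 1 2]` would then elaborate to nested applications of `cons`
x: List Nat = [0 1 2]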

`@shallow` for data and constructor declarations

The attribute @shallow switches from the default mode of creating a namespace containing subordinate declarations to placing those into the same namespace where the original declaration is defined in.
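
For illustration (a sketch; the data declaration syntax follows the other examples in these issues):

@shallow
data Status: Type of
    fine: Status
    broken: Status

;; the constructors are placed directly into the surrounding namespace
x: Status = fine ;; instead of `Status.fine`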

It is semver-compatible (here, i.e. not a major change) for an API to go from a non-shallow data declaration to a shallow one if all constructors are (manually) brought into the same module via use and given the same exposure scope as the data type. Starting out with non-shallow is thus forward-compatible. The reverse is not given. Neither direction is given for (non-)shallowness on constructors since one cannot place use declarations inside of the constructor listing.

Motivation

Less to type, especially when prototyping, since one does not need to qualify paths as much or bring them in with use declarations. No other reason. Therefore, it is quite low priority and might not even become part of the language at all.

Scope

  • on data declarations
  • on constructor declarations

Trait system

There are several ideas already (in the design document) but there is no "final" decision yet.
My tendency since forever has been a manual trait system, meaning records + implicit parameters of type 2 (no normal type inference but a search through module-local bindings with unifiable types).

1st Design Proposal: Records and Automatic Parameters

Trait (type class, interface, …) definitions are just records (#18, #42) and trait bounds (type class constraints) are automatic parameters.

Automatic parameters show similarities to implicit ones but the algorithm for "inferring" them is vastly different.
For implicits, the type checker applies type inference (applying inference rules and synthesizing values to fill inference variables).
For "automatics", the type checker searches the module scope (!) (implies trait impls need to be used) for bindings matching the required type probably substituting implicits or whatever. It will throw an error on ambiguity. Has a depth to not search endlessly (seems to be necessary in Idris 1 for some good reason). Temporary syntax is two single quotes ("double ticks") mirroring implicits which have merely a single single quote (assuming #63).

Example:

;; a trait declaration
data Monoid (A: Type): Type =
    monoid '(A: Type)
        (field empty: A)
        (field append: A -> A -> A): Monoid A
        ;; ^ generates field accessor of type
        ;; `'(A: Type) -> ''(Monoid A) -> (A -> A -> A)`

;; trait implementations
monoid-nat-sum: Monoid Nat =
    Monoid.monoid
        (empty = 0)
        (append = +)

monoid-nat-product: Monoid Nat =
    Monoid.monoid
        (empty = 1)
        (append = *)

;; trait bounds
append-thrice '(A: Type) ''(monoid: Monoid A) (a: A): A =
    use Monoid.monoid.append in
    append a (append a a)
    ;; ^ or `append ''monoid a (append ''monoid a a)`
    ;; or (with ergonomic record fields): `monoid#append a (monoid#append a a)`

;; usage (in this case, the trait implementation needs to be manually supplied as there are two
;; matching ones)
thing: Nat = append-thrice ''monoid-nat-product 2 ;; => 8
;; ^ or (with erg. r. fields): `monoid-nat-product#append-thrice 2`

Pros:

  • less distinct concepts (primitives) in the surface language
    • interactions with other features don't need to be analyzed as it's already based on existing features
    • less to learn for newcomers
  • less to implement (?)
  • allows multiple implementations for a type (non-canonicity)
  • requires users to name trait implementations (the names won't be beautiful in most cases but simply descriptive)

Cons:

  • worse error messages (this phenomenon is always imposed by designs involving non-primitives)
  • verbose, clunky, maybe less readable
  • hacky because of "arbitrary" restrictions around resolving automatic parameters
  • unclear how to realize default and especially minimal method implementations
  • unclear how to alias trait methods (= record fields) to punctuation (e.g. Monoid.monoid.<> aliasing Monoid.monoid.append)
  • allows multiple implementations for a type (non-canonicity)

2nd Design Proposal: Records with Syntactic Sugar and Automatic Parameters

Introduce trait declarations (beginning with the keyword trait) which are lowered to records (but probably with the information of them being desugared traits preserved).

Meta: Expand upon this description.

3rd Design Proposal

Meta: Design.

Restructure the HIR to use indices for nested declarations

Instead of having hir::Declarations inside hir::{Module, Data}, use HIRIndex/HIRIdentifier and store all declarations linearly in CrateScope. This would make iterating over declarations easier. We can and should then also store the parent for each declaration.
This would get rid of our Index/Environment "hack" in the typer (and resolver(?)), which we currently require in some cases where we handle constructors and need some data from the parent data declaration — in the new system, we can simply look up the parent by HIRIndex (plus some "downcasting"). Maybe this would also allow us to throw out the typer::Registration system by just having a list of those indices. I don't know.

This might also help with finding unused declarations, implementing privacy and printing relative paths, but maybe only if we also apply this change to lowered_ast::Declaration, because the resolver needs it. I dunno.

Language support for the universe pattern

Instead of this kind of boilerplate:

data U0: Type =
    int: U0
    text: U0
    option: U0 -> U0

interpret (u: U0): Type =
    case u of
        U0.int => Int
        U0.text => Text
        U0.option \u => Option (interpret u)

We might want to have some syntactic sugar like this:

@(universe interpret)
data U0: Type =
    int = Int
    text = Text
    option = Option

Report better error message on type mismatch involving a type parameter

Given the program take (A: Type) (f: A -> A): A = f 0 (with a dependency on core), the current message is:

error[E032]: expected type `A`, got type `crate.core.nat.Nat`
 >>> file.lushui:4:20
  |
4 | take (A: Type) (f: A -> A): A = f 0
  |                    - expected due to this
  |
 >>> file.lushui:4:35
  |
4 | take (A: Type) (f: A -> A): A = f 0
  |                                   ^ has wrong type
  |

error: aborting due to previous error

This is not bad, but I'd prefer it to say expected type parameter `A`, … and also to add a secondary span at the definition site of the parameter, like parameter defined here. Also, the phrasing expected due to this does not really work in this context for my liking. It should be more precise, like `f` takes a parameter of type `A` or something like that, special-casing pi types here.

Optional laziness

The first version of this feature is going to look like it is specified below. This will probably not be its final form.

New keyword lazy which goes on function parameters (after , and before field). It makes the type of the affected parameter equal to external.core.unit.Unit -> A, with A being the original type in the source code. Most importantly, when applying a function with a lazy parameter, the argument is not immediately evaluated but rewritten into a thunk, that is, a function from unit to the value. Note that outside of the function, i.e. when you apply it, the argument must be of type A and cannot be of type Unit -> A (this should help type inference, unless I'm mistaken), but inside of it, i.e. in the body of the function as well as the explicit type annotation of the body, it is of type Unit -> A. To evaluate the value, one can of course simply apply it to core.unit.unit; conventionally, however, one uses the function core.unit.force.

Example

;; core.bool:

use external.core.(unit.force bool.(Bool false true))

if (,A: Type) (condition: Bool) (lazy consequent alternate: A): A =
    case condition of
        true => force consequent
        false => force alternate

;; user code:

use external.core.(bool.if nat.(Nat =< -))

result (switch: Nat): Nat =
    if (=< switch 100)
        0
        (- switch 100)

Alternative Designs

  • similar to the one above but the parameter type needs to be Unit -> A, not A (which makes it less implicit). Additionally, add the type alias Thunk (A: Type): Type = Unit -> A to the standard library. Signature of if: (,A: Type) -> (condition: Bool) -> (lazy consequent: Thunk A) -> (lazy alternate: Thunk A) -> A
  • an attribute @lazy on expressions (meta: task: pros & cons)
  • a special type function à la @blessed lazy: Type -> Type = identity
  • a special parameterized type @blessed data Lazy (A: Type): Type with a constructor lazy
  • ??

Coercions that should be legal:

  • (lazy a: A) -> P a to (a: A) -> P a forfeiting laziness becoming strict/eager
  • (a: A) -> P a to (lazy a: A) -> P a gaining laziness becoming lazy
  • most importantly, f a to f \_ => a if f is of type (lazy A) -> B with a of type A (a sketch follows this list)
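
For illustration, reusing if and result from the example above: the application inside result is rewritten into thunks roughly like this (a sketch):

if (=< switch 100)
    (\_ => 0)
    (\_ => - switch 100)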

Improve current design of intrinsic bindings

Even though the current design of foreign bindings will hopefully be replaced by proper C FFI with the ability to specify calling conventions etcetera, we should improve the current situation for now.

Right now, foreign bindings are not namespaced. They should respect the module system. For example, today a foreign binding alpha in module crate.some.where.here is referenced by the string "alpha" on the Rust side. But ideally, it should only be referenceable by the string "crate.some.where.here.alpha" or, until we have a crate system, without the leading crate. segment. Once we have a crate system in place, it should maybe be core:some.where.here or core.some.where.here or something else, depending on how the final design of the crate system looks.

Then, update the bindings for core.

For the purpose of developing this language implementation in stages, it should also be possible to explicitly pass this "foreign key" to @foreign, e.g. @(foreign core.some.where.here.alpha) or @(foreign "core:some.where.here.alpha"). This should only be temporary. The motivation is the gained ability to develop the byte code generator without the need to reference, and therefore also support, the whole standard library core. E.g. we might like to generate code for a very simple code snippet using very few features but requiring, let's say, the type Int32. It can then easily be included like this:

;; module core ;; <- not necessary!

@(foreign core.int32.Int32)
data Int32: Type

;; ... rest of the code ...

Without namespaces, we cannot add identically named (up to the local module) bindings from different modules. That's what we'd like to do in core: a function add in modules Nat, Nat32, ..., Int, ....

Deriving

Ties in with the design of the trait system #24. Should be designed similarly to GHC Generics but simpler of course.
So it's a macro system.

Improve golden tester

  • Print file diffs instead of both the golden and the actual output. Use the crate difference for this. Right now, the differences are really hard to spot.
  • Compile the lushui compiler before running the tests (via cargo run). Not doing this has already led to several bugs (when setting up CI) and frustrating debugging sessions. Update CI once this task is completed
  • Don't hackily invoke the compiler by referencing the target directory but get the information from cargo (somehow). This ensures compatibility with Windows
  • Allow passing environment variables to overwrite/update certain (or all) golden files
  • Improve error reporting
  • Add support for (nested) folders in tests/ so we can start organizing tests by topic, domain etc. and feel more motivated to write more tests

Rune type for Unicode scalars

Form: @Rune "X" where X stands for a single Unicode scalar value.

Steps necessary to implement basic support for them and close this issue:

  • Make the lowerer understand the attributes Rune and Text
  • Add Rune as a primitive value for the lowered AST and the HIR
  • Correctly lower literals of the form "text", @Text "text", @Rune "R" checking whether the rune literal is indeed a single valid scalar value
  • Add the foreign type Rune and add it to the standard library core in the module rune
  • (optional) foreign functions to turn runes into text and into nat32s

Bytecode backend

Biggest blocker right now: Compiling and interpreting closures.

Pattern bindings

A pattern binding is a way to bind a subpattern to a name. Not to be confused with pattern binders.
We probably need better names (having both bindings and binders is really confusing and the names seem arbitrarily assigned).

This adds the syntax "(" Lower-Pattern "as" Identifier ")" (or something similar depending on the precedence and associativity of as).

Example:

thing (value: Duple Nat Foo): Unit =
    case value of
        duple 0 ((foo \_) as a) => unit
        \_ => unit

Clean up argument parsing

Over time, we have amassed a fair amount of boilerplate even though we are already using a proc-macro-driven argument parser, structopt.
For clarification, this originates from the requirement of supporting shared and distinct options per subcommand. Maybe I am not using structopt's full potential yet, or it simply cannot handle this and we are forced to go back to the procedural API of clap.

We also need to design the basic options our CLI should currently understand: Selecting between the engines TWI and BCI, perma-unstable flags, etc.

Basic support for case analysis

This issue consists of the following sub-issues:

  • Deapplications (type checking + interpreting)
  • Analyzing polymorphic values (type checking + interpreting)

And optionally includes:

  • Analyzing uninhabited types
  • Warning on duplicate cases
  • Disallowing non-exhaustive matching
  • Implicit deapplication
  • Deapplication (compiling to BC)
  • Polymorphic value (compiling to BC)

It does not contain:

  • GADT support
  • Dependent pattern matching
