sharkdp / numbat Goto Github PK

A statically typed programming language for scientific computations with first class support for physical dimensions and units

Home Page: https://numbat.dev

License: Apache License 2.0

Rust 92.19% HTML 2.31% JavaScript 1.77% Vim Script 0.23% Handlebars 2.22% Shell 0.30% CSS 0.83% PHP 0.13%

calculator physics programming-language statically-typed terminal-based units web-app

numbat's People

Stargazers

Watchers

numbat's Issues

Proper command-line interface

Usage:

insect [--debug] [--expr <EXPR>] [<file>]

Add assert_eq statements

let x = 3 + 4;

assert_eq(x, 7);

where assert_eq is of type

fn assert_eq<D>(x: D, y: D) -> !

This needs to be checked rather sooner than later. Does our "user-defined units" concepts fare well with SI prefixes that can come directly before the identifier? Or do we only support those prefixes on specified units (we probably should)? Or only on a small set of builtin units (that would be a bit sad)?

kilometer => prefix(kilo) · identifier(meter)
km => prefix(kilo) · identifier(meter)

Run all examples in unit tests

Run everything in examples and make sure it passes
Run everything in examples/errors and make sure it fails (with the right error)

Implement exponentiation operator for expressions

also, rewrite the prelude

Write integration test for prelude.ins

Should "foo (x)" be parsed as a function call?

Dimension annotations

Further expand the idea of dimension-annotations via : (or related syntax), allowing users to write:

let d: Length = 2 meter
let t: Time = 2 meter # error

# simple functions
kineticEnergy(m: Mass, v: Speed) -> Energy = 0.5 m v^2
kineticEnergy(5 gram, 100 km/h)
kineticEnergy(5 kg, 3 meter/second^2) # error

# generics?
let f<T>(x: T) -> T^2 = 2*x^2

Compile everything, then run

"overload" interpret_statements to compile everything first, the execute. Right now, we compile each statement separately and then immediately run the VM.

Write excellent error messages

We already have the full machinery to make error reporting great.

Interesting challenges

List of interesting challenges when implementing a parser, type checker and bytecode interpreter for scientific calculations with physical dimensions and units:

Parser
- Precedence of unary minus compared with exponentiation depends on which side of the ^ we are on: -2^3 == -(2^3), but 2^-3 == 2^(-3).
- Parsing SI prefixes. Problems with non-normal units (US customary, Imperial), only allow metric prefixes for SI units, long-and-short prefixes, …
- Ambiguous edge cases:
  - ft: foot or femto tonne?
  - cd: candela or centi days?
- 2e3 is a number but 2e is 2 × e.
- Implicit multiplication
Type checker
- The problem with exponentiation (see README)
- Minor: asymmetry between add/sub and mul/div: the first two can produce type check errors, the latter can't.
- Type inference as a system of equations
- We need generics, otherwise we can not type something as simple as sqrt(x) = x^(1/2)
- Polymorphic zero? meter + 0 should be fine. But general case can probably not be solved, e.g. meter + (second - second)
- Rational exponents (eg definition of planck length)
Unit representation
- Angles (are angles dimensionless, does deg decay to a scalar?, sin(scalar)?, sin(angle)?)
- Units like cm/m or inch/meter which have the dimension of a scalar.
- Units with additive offsets like °C or °D
- Units with rational exponents
- Formatting of units, e.g. J/(m²·K)
Evaluation
- Prefixes need to be handled separately, otherwise you run into numerical problems when e.g. 1 / 2ns is being converted to GHz
- Distinguish between decimal, rational, integer?
- Simplification of units kg m²/s² back to J. How are unit system boundaries treated?
- Exact conversions between units?
Good error messages
- Incompatible dimensions/units
Features
- User-defined units

Allow only one fundamental unit per dimension?

Functions: FFI

Allow calling of foreign functions (Rust code).

extern fn sqrt<T>(x: T) -> T^(1/2)
extern fn sqr<T>(x: T) -> T^2
extern fn sin(x: Angle) -> Scalar

Support parsing 2^-3?

See https://github.com/sharkdp/insect/blob/5002e3fa880f980d2a51d3b3819679a52d33376f/src/Insect/Parser.purs#L389-L404

Large computations

Make sure we can run computations like these: sharkdp/insect#346

Inspiration

https://pikelet-lang.github.io/pikelet/
https://www.figr.app/

Add source-spans to all type-check and interpreter errors

Jupyter Notebook Integration?

Split into library and main application

Automatically load prelude.ins

Modules / Imports

Allow for imports such that we can structure the Prelude into multiple documents (e.g. separate maths from physics to allow users to easily construct custom preludes).

Do we want to go for a simple "include this file" mechanism or implement something cleaner like an actual module system with namespaces, etc.?

Advanced conversion operations

users might also like to explicitly convert to:

a base SI representation, like kg m³ / s², e.g. expr -> base
a shortest-possible representation, e.g. J·m e.g. expr -> shortest
to the canonical representation (see also #3), e.g. expr -> canonical

Do we also need/want to allow for conversions to dimensions? is something like expr -> Energy useful? Users could always add type annotations, i.e. use sth like expr: Energy to make sure that something is an energy.

see #3 for implicit conversions

Demonstrate performance improvement over PureScript version with a benchmark

Just to get some hard data on the performance improvement (and also because I'm personally curious, hehe).

VM: pop function arguments off the stack

Constants and variables

It would be cool to define constants like pi or e in Numbat code itself (like we currently do in the prelude). That would require us to introduce the concept of constants / immutable variables.

WASM

As early as possible, make sure that we can build everything with WASM. We shouldn't rely on dependcies/technologies that don't work with WASM.

Syntax for defining unit properties

We need a way to declare whether or not units allow prefixes. And we need a way to add aliases.

Ideas:

unit meter[si_prefixes,short="m"]: Length

#[si_prefixes,short="m"]
unit meter: Length

or we always use new declarations for aliases:

unit minute = …
unit minutes = minute
unit min = minute

Functions: first iteration

Function declarations
Local variables?
Arity errors

Things to think about:

Mathematica-style function calls, super convenient for repls: 4.0 // sqrt.
should foo(x) always be parsed as a function call? or could it be an implicit multiplication?

Short unit names

Interesting resources

https://www.cs.ox.ac.uk/people/samuel.staton/papers/tlca2015.pdf

Shorthand notation for new unit including new dimension?

maybe we should add "syntactic sugar" and allow 'unit px' to mean: 'dimension px; unit px: px'

Unit conversions

Allow for non-standard units like inch
Implement conversion operator

Experiment: generate LLVM IR

Instead of bytecode, simply(?) generate LLVM IR?
https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl03.html

Add quantities as runtime concept

Use high-precision numeric type

https://crates.io/crates/rug (LGPL! Possible to compile to WASM?)
Simply use f64 or something like f128? => we could put numeric values on the stack

Functions: generics / parametric polymorphism

Add support for generic functions, to allow things like

fn sqr(x) = x^2

Syntax

We can go the Rust route and do something like

fn sqr<T>(x: T) -> T^2 = x^2

Or we could add a special syntax for type variables, e.g. $T:

fn sqr(x: $T) -> $T^2 = x^2

Or go the Haskell way and use lowercase expressions as variables:

fn sqr(x: d) -> d^2 = x^2

that could work well with inferred types (#29):

f(x: d0, y: d1, z: d2) -> dR = …

Functions: recursion

In order for recursion to make any sense, we will probably need some boolean types, comparison operations, and maybe a ternary operator or an if <cond> then <expr1> else <expr2> expression.

Support for decimal numbers

Zero should be polymorphic

3 meter + 0 should work. Similarly: something like hypot(2m, 0) should work.

Functions: type inference

It would be very cool if we could infer function parameter types and return types. For example, given hypothetical code like

fn g(t: Time) -> Speed = …

fn f(d, t) = (d + 1 meter) / g(t)

we could easily infer from d + 1 meter that d: Length, and from g(t) that t: Time. Furthermore, we could compute the return type to be Length / Speed =Time, leading to an inferred type of

fn f(d: Length, t: Time) -> Time

This also relates to generics (#25), as some (most?) functions would probably have inferred generic types. For example, in the absence of any annotations,

kineticEnergy(m, v) = 1/2 * m * v^2

would have an inferred type of

kineticEnergy<T0, T1>(m: T0, v: T1) -> T0 * T1^2 = 1/2 * m * v^2

Possible constraints

Addition, Subtraction, Conversion: these binary operators act as constraints on their operands. For example, an addition like
```
expr1 + expr2
```
imposes a simple [expr1] == [expr2] constraint. Similarly, expr1 -> expr2 also forces expr1 to have the same dimension as expr2.
Function application: A function call like
```
f(expr1, expr1, …, exprN)
```
imposes N constraints, one for each argument: [exprK] == [parameter k of f]. A simple example would be something like
```
fn normalize_len(l: Length) -> Length = …
fn f(d, alpha) = normalize_length(d) * sin(alpha)
```
where we could use the function calls to easily get the types of d and alpha.
Exponentiation: Since the exponent in an expression like
```
expr1 ^ expr2
```
always needs to be a scalar, we can infer [expr2] == 1. In general, we can not assume that expr1 is also a scalar (meter^2 is perfectly fine, for example). So there is also no constraint for expr1.

Formalization

Given a (partially annotated) function definition like one of these (Cx is a concrete type):

f(x0, x1, x2, …) = …
f(x0, x1: C1, x2, …) = …
f(x0, x1, x2, …) -> C_ret = …
f(x0, x1: C1, x2, …) -> C_ret = …
…

We start by filling in blanks with free type variables T0, T1, … and T_ret:

f(x0: T0, x1: T1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: T1, x2: T2, …) -> C_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> C_ret = …
…

Next, we walk the AST of the right hand side expression and record all constraints that we discover (see above). For example, given a declaration like f(x0, x1, x2) = sqrt((x0 · x2² + second) · second / x1²) -> meter, we would identify:

T0 · T2^2 == Time (from x0 · x2² + second)
((T0 · T2^2) · Time / T1²)^(1/2) == T0^(1/2) · T1 · T2 · Time^(1/2) == Length (from the -> conversion operator)

These constraints would then have to be converted to a system of linear equations on the exponents.

  1 * exponents0 + 0 * exponents1 + 2 * exponents2 == (1, 0)
1/2 * exponents0 + 1 * exponents1 + 1 * exponents2 == (1, -1/2)

where exponents_i and the right hand sides would be vectors with one rational number entry for each of the D physical dimensions. In total, this is a set of N_constraints · D equations, where the left-hand side will be the same regardless of the physical dimension.

Examples

Unique solution

Example 1

g(t: Length) -> Length = …
f(x, y, z) = g(sqrt(x)/y) * (y + 2 * second) * 2^z

From 2^z, we infer [z]: 1.
From y + 2 * second, we infer [y]: Time
From g(sqrt(x)/y), we infer sqrt(x)/y: Length, i.e. sqrt(x): Length · Time or x: Length² · Time².
Finally, there return type can be computed as Length · Time

f(x: Length² · Time², y: Time, z: 1) -> Length · Time = g(sqrt(x)/y) * (y + 2 * second) * 2^z

Example 2

f(x) = x + x^2

should be inferred as

f(x: Scalar) -> Scalar

Example 3

(from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)

f(x,y,z) = x*x + y*y*y*y*y + z*z*z*z*z*z

should ideally be inferred as

f<T>(x: T^15, y: T^6, z: T^5) -> T^30

Underdetermined

For a function definition like

f(x, y) = x + y^3

we could infer one of those (equivalent) types:

f(x: T, y: T^(1/3)) -> T = x + y^3
f(x: T^3, y: T) -> T^3 = x + y^3

Overdetermined

A function like

f(x) = (x + meter)/(y + second)

should lead to a type check error.

Colorized output

[question] Why is there a new `fn` keyword? Is the existing function declaration syntax not sufficient?

Title.

Examples for nontrivial exponents: adiabatic exponent

https://en.wikipedia.org/wiki/Heat_capacity_ratio

Canonical unit for dimension

With our new dimension concept, we can define a canonical unit for a certain dimension. For example: meter: Length, joule: Energy. Or if someone uses Imperial, ft: Length. Then, if the user doesn't ask for an explicit conversion, we can figure out the target dimension and use the canonical unit.

This will allow us to solve sharkdp/purescript-quantities#27

Use a different parsing technique

A Pratt parser would probably be quite useful for us because of the large number of (binary) operators.

https://tratt.net/laurie/blog/2020/which_parsing_approach.html
https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
https://blog.reverberate.org/2013/09/ll-and-lr-in-context-why-parsing-tools.html

Rational exponents in dimensions

We need this to allow for things like x^(1/2).

Currencies

Maybe we could have currencies as a separate module that needs to be imported. Something like

import currencies

30 € -> THB

Internally, that module would fetch currency data and define all those currencies.

Introduce optimization stage

After type checking, we could introduce an optimization stage in the compiler that could - in theory - perform a lot of optimizations, as most (so far: all) of the code is pure and could be evaluated at compile time.

There are also some simple optimizations that would generate cleaner byte code, like:

Negate(Constant(n)) => Constant(-n)