It would be very cool if we could infer function parameter types and return types. For example, given hypothetical code like
fn g(t: Time) -> Speed = …
fn f(d, t) = (d + 1 meter) / g(t)
we could easily infer from d + 1 meter
that d: Length
, and from g(t)
that t: Time
. Furthermore, we could compute the return type to be Length / Speed =Time
, leading to an inferred type of
fn f(d: Length, t: Time) -> Time
This also relates to generics (#25), as some (most?) functions would probably have inferred generic types. For example, in the absence of any annotations,
kineticEnergy(m, v) = 1/2 * m * v^2
would have an inferred type of
kineticEnergy<T0, T1>(m: T0, v: T1) -> T0 * T1^2 = 1/2 * m * v^2
Possible constraints
- Addition, Subtraction, Conversion: these binary operators act as constraints on their operands. For example, an addition like
imposes a simple
[expr1] == [expr2]
constraint. Similarly, expr1 -> expr2
also forces expr1
to have the same dimension as expr2
.
- Function application: A function call like
f(expr1, expr1, …, exprN)
imposes N constraints, one for each argument: [exprK] == [parameter k of f]
. A simple example would be something like
fn normalize_len(l: Length) -> Length = …
fn f(d, alpha) = normalize_length(d) * sin(alpha)
where we could use the function calls to easily get the types of d
and alpha
.
- Exponentiation: Since the exponent in an expression like
always needs to be a scalar, we can infer
[expr2] == 1
. In general, we can not assume that expr1
is also a scalar (meter^2
is perfectly fine, for example). So there is also no constraint for expr1
.
Formalization
Given a (partially annotated) function definition like one of these (Cx
is a concrete type):
f(x0, x1, x2, …) = …
f(x0, x1: C1, x2, …) = …
f(x0, x1, x2, …) -> C_ret = …
f(x0, x1: C1, x2, …) -> C_ret = …
…
We start by filling in blanks with free type variables T0, T1, …
and T_ret
:
f(x0: T0, x1: T1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> T_ret = …
f(x0: T0, x1: T1, x2: T2, …) -> C_ret = …
f(x0: T0, x1: C1, x2: T2, …) -> C_ret = …
…
Next, we walk the AST of the right hand side expression and record all constraints that we discover (see above). For example, given a declaration like f(x0, x1, x2) = sqrt((x0 · x2² + second) · second / x1²) -> meter
, we would identify:
T0 · T2^2 == Time
(from x0 · x2² + second
)
((T0 · T2^2) · Time / T1²)^(1/2) == T0^(1/2) · T1 · T2 · Time^(1/2) == Length
(from the ->
conversion operator)
These constraints would then have to be converted to a system of linear equations on the exponents.
1 * exponents0 + 0 * exponents1 + 2 * exponents2 == (1, 0)
1/2 * exponents0 + 1 * exponents1 + 1 * exponents2 == (1, -1/2)
where exponents_i
and the right hand sides would be vectors with one rational number entry for each of the D physical dimensions. In total, this is a set of N_constraints · D equations, where the left-hand side will be the same regardless of the physical dimension.
Examples
Unique solution
Example 1
g(t: Length) -> Length = …
f(x, y, z) = g(sqrt(x)/y) * (y + 2 * second) * 2^z
- From
2^z
, we infer [z]: 1
.
- From
y + 2 * second
, we infer [y]: Time
- From
g(sqrt(x)/y)
, we infer sqrt(x)/y: Length
, i.e. sqrt(x): Length · Time
or x: Length² · Time²
.
- Finally, there return type can be computed as
Length · Time
f(x: Length² · Time², y: Time, z: 1) -> Length · Time = g(sqrt(x)/y) * (y + 2 * second) * 2^z
Example 2
should be inferred as
Example 3
(from https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-391.pdf)
f(x,y,z) = x*x + y*y*y*y*y + z*z*z*z*z*z
should ideally be inferred as
f<T>(x: T^15, y: T^6, z: T^5) -> T^30
Underdetermined
For a function definition like
we could infer one of those (equivalent) types:
f(x: T, y: T^(1/3)) -> T = x + y^3
f(x: T^3, y: T) -> T^3 = x + y^3
Overdetermined
A function like
f(x) = (x + meter)/(y + second)
should lead to a type check error.