hakaru-dev / hakaru Goto Github PK

A probabilistic programming language

License: BSD 3-Clause "New" or "Revised" License

Haskell 99.98% Shell 0.01% Makefile 0.01% Perl 0.01% TeX 0.01% Python 0.01% Emacs Lisp 0.01% Vim Script 0.01%

haskell maple machine-learning bayesian-statistics probabilistic-programming

hakaru's Introduction

Hakaru

Hakaru is a simply-typed probabilistic programming language, designed for easy specification of probabilistic models and inference algorithms. Hakaru enables the design of modular probabilistic inference programs by providing:

A language for representing probabilistic distributions, queries, and inferences
Methods for transforming probabilistic information, such as conditional probability and probabilistic inference, using computer algebra

It can be used to aid in the creation of machine-learning applications and stochastic modeling to help answer variable queries and distributions.

Warning: This code is alpha and experimental.

For Hakaru documentation, including an installation guide and some sample programs, visit hakaru-dev.github.io.

Citing us

When referring to Hakaru, please cite the following academic paper:

P. Narayanan, J. Carette, W. Romano, C. Shan and R. Zinkov, "Probabilistic Inference by Program Transformation in Hakaru (System Description)", Functional and Logic Programming, pp. 62-79, 2016.

@inproceedings{narayanan2016probabilistic,
	title = {Probabilistic inference by program transformation in Hakaru (system description)},
	author = {Narayanan, Praveen and Carette, Jacques and Romano, Wren and Shan, Chung{-}chieh and Zinkov, Robert},
	booktitle = {International Symposium on Functional and Logic Programming - 13th International Symposium, {FLOPS} 2016, Kochi, Japan, March 4-6, 2016, Proceedings},
	pages = {62--79},
	year = {2016},
	organization = {Springer},
	url = {http://dx.doi.org/10.1007/978-3-319-29604-3_5},
	doi = {10.1007/978-3-319-29604-3_5},
}

hakaru's People

Contributors

Stargazers

Watchers

hakaru's Issues

VarEq bug in normalize

Email from @zaxtax:

When I call normalize on the following program:

bet <~ beta(5,3)
x1 <~ factor(bet)
if bet > 0.7: return true else: return false

I get the following error:

[zv@t420 hakaru (master)]% normalize t.hk
normalize: VarEqTypeError (Variable "x1" (Nat 1) (SData (STyCon (SingSymbol Proxy :: Sing "Unit")) (SPlus SDone SVoid))) (Variable "" (Nat 1) SProb)

Note this happens after expect is called, as if I print out the AST for the program, every Variable has a unique ID:

syn (MBind :$ ((syn ((MeasureOp_ Beta) :$ ((syn ((CoerceTo_ (CCons (Continuous HContinuous_Prob) CNil)) :$ ((syn (Literal_ (LNat 5))) :* End))) :* ((syn ((CoerceTo_ (CCons (Continuous HContinuous_Prob) CNil)) :$ ((syn (Literal_ (LNat 3))) :* End))) :* End)))) :* ((bind (Variable "bet" (Nat 0) SProb) (syn (MBind :$ ((syn (Superpose_ [(var (Variable "bet" (Nat 0) SProb),syn (Dirac :$ ((syn (Datum_ (Datum "unit" (Inl Done)))) :* End)))])) :* ((bind (Variable "x1" (Nat 1) (SData (STyCon (SingSymbol Proxy :: Sing "Unit")) (SPlus SDone SVoid))) (syn (Case_ (syn ((PrimOp_ (Less HOrd_Prob)) :$ ((syn (Literal_ (LProb (7 % 10)))) :* ((var (Variable "bet" (Nat 0) SProb)) :* End)))) [Branch (PDatum "true" (PInl PDone)) (syn (Dirac :$ ((syn (Datum_ (Datum "true" (Inl Done)))) :* End))),Branch (PDatum "false" (PInr (PInl PDone))) (syn (Dirac :$ ((syn (Datum_ (Datum "false" (Inr (Inl Done))))) :* End)))]))) :* End))))) :* End)))

Any ideas?

"cabal test" blocks on ssh access to karst.uits.iu.edu

Alexs-MacBook-Pro:hakaru alx$ cabal test
Preprocessing test suite 'system-testsuite' for hakaru-0.3.0...
Running 1 test suites...
Test suite system-testsuite: RUNNING...
The authenticity of host 'karst.uits.iu.edu (149.165.234.34)' can't be established.
RSA key fingerprint is SHA256:LFxx+DwrCz79spUAikQ7A3gqTQCn+1MG91+kMtEU400.
Are you sure you want to continue connecting (yes/no)?

Disintegrate is emitting bindings out of order

The following modified form of tugofwar:

def pulls(strength real):
    normal(strength, 1)

def winner(a real, b real):
    a_pull <~ pulls(a)
    b_pull <~ pulls(b)
    return (a_pull - b_pull)

alice <~ normal(0,1)
bob   <~ normal(0,1)
carol <~ normal(0,1)

match1 <~ winner(alice, bob)
match2 <~ winner(bob, carol)
match3 <~ winner(alice, carol)

return ((match1, match2), match3)

Disintegrates to the following invalid program

fn x13 pair(real, real): 
 (match x13: 
   (x30, x31): 
    x42 <~ normal(strength, 1)
    x55 <~ normal(strength, 1)
    pulls = fn strength real: normal(strength, 1)
    winner = fn a real: 
              fn b real: 
               a_pull <~ pulls(a)
               b_pull <~ pulls(b)
               return (a_pull - b_pull)
    alice <~ normal(0, 1)
    bob <~ normal(0, 1)
    carol <~ normal(0, 1)
    x59 <~ weight((exp((negate(((x30 + x42 - alice) ^ 2)) / 2))
                    / 
                   1
                    / 
                   sqrt((2 * pi))),
                  return ())
    x58 <~ weight((exp((negate(((x31 + x55 - bob) ^ 2)) / 2))
                    / 
                   1
                    / 
                   sqrt((2 * pi))),
                  return ())
    match3 <~ winner(carol, alice)
    return match3
   _: reject. measure(real))

Any ideas why?

L.H.Pretty.Concrete loses parens around array

$ cat array.hk
(array i of 10: i)[3]
$ pretty array.hk
array i of 10: i[3]

The pretty-printing loses parens and produces something that no longer parses as the original program.

Improve pretty printing (Concrete.hs)

The following things should be done to help improve legibility of pretty printing to the concrete syntax:

When encountering RealPow e (-1/n) replace with Recip (NatRoot e n). N.B., this is the negative case which isn't currently handled.
Since NatRoot e 2 is so common, pretty print as sqrt e instead. N.B., make sure that sqrt is defined for symbol resolution too.
When encountering e1 + -e2 pretty print as e1 - e2 instead. N.B., should handle both the case where e2 is negate e3 as well as the case where e2 is a negative literal.
When encountering e1 * 1/e2 pretty print as e1 / e2; again both for recip e3 and for literal 1/v3.

Version number in cabal is too restricted.

It's very easy to encounter dependecy conflicts with other packages.

Maple "123*x" parses as Hakaru "x/1"

$ cat p1.hk 
fn x prob: x*123
$ ~/.cabal/bin/simplify --debug p1.hk 
Sent to Maple:
use Hakaru, NewSLO in timelimit(90, RoundTrip(lam(x, HReal(Bound(`>=`,0)), (x * 123)), HFunction(HReal(Bound(`>=`,0)), HReal(Bound(`>=`,0))))) end use;
Returning from Maple:
lam(x,HReal(Bound(`>=`,0)),123*x)

fn x prob: (x / 1)

Change prob to real makes simplify produce the right output again.

Newsgroup data TODO items

Hierarchical names (currently News is toplevel)
Fix hardcoded path for newsgroup directory
Integrate @zaxtax 's code for pulling data as part of build
Match McCallum's word-filtering rules
Add an easy way to subset the data - maybe set topic indices to include and max docs per topic

Regression in KB

Given the following

with(KB): with(Hakaru):
kb := KB(Constrain(i = Hakaru:-idx(Hakaru:-idx(t, k), i专)), Introduce(i专, HInt(Bound(`>=`, 0), Bound(`<=`, Hakaru:-size(Hakaru:-idx(t, k))-1))), Introduce(i, HInt(Bound(`>=`, 0), Bound(`<=`, Hakaru:-size(Hakaru:-idx(t, k))-2))), Introduce(k, HInt(Bound(`>=`, 0), Bound(`<=`, Hakaru:-size(t)-1))), Introduce(t, HArray(HArray(HInt(Bound(`>=`, 0))))));
cond := Hakaru:-idx(Hakaru:-idx(t, k), i专)+1 = Hakaru:-size(Hakaru:-idx(t, k));
res := assert( Not(cond), kb )

On stable, res is equal to the original KB. On master, the condition is added to the KB as a Constrain.

This breaks the simplification of dice_index, as the extra condition appears in a match expression in the result.

This call to assert occurs in Loop:-enter_piecewise, on line 285.

Alpha equivalence of terms

Having the ability to test α-equivalence of terms (in the Haskell AST) would simplify writing tests for AST transformations.

hygiene bug in expect

Rob says:

So I have been trying to get normalize to work, and I think I've
stumbled onto a bug. I am trying to fix it, so any help or pointers
is appreciated.

Suppose you have this simple model:

$> cat docs/examples/quux.hk
x <~ normal(0,1)
y <~ normal(x,2)
return (y,x)

Which you normalize as so:

$> normalize docs/examples/quux.hk
weight(recip(integrate x1 from -∞ to fromProb(∞):
          (exp((negate(((x1 + 0.0) ^ 2)) * 0.5)) *
           1.0 *
           recip(natroot((2.0 * pi), 2)) *
           integrate x0 from -∞ to fromProb(∞):
            (exp((negate(((x0 + negate(x)) ^ 2)) * 0.125)) *
             0.5 *
             recip(natroot((2.0 * pi), 2)) *
             1.0))),
   x <~ normal(0.0, 1.0)
   y <~ normal(x, 2.0)
   return (y, x))

But, negate(x) is the wrong, it is suppose to be negate(x1).

Haddock documentation for data constructors of GADTs

At present, Haddock will fail to parse any documentation on data constructors for GADTs (including not-actually-GADTs using the GADT-syntax). This is a long outstanding issue in Haddock, as discussed in the old trac ticket and the ported-to-github ticket. Apparently this will be fixed when GHC 7.12 ships, as discussed in GHC Weekly News for 2015/07/21 and on phabricator.

I'm filing this bug report here (1) to make a public note of this problem so others are aware of it, (2) to have a ticket number to put in comments making note of this issue, and (3) to remind us to fix it once we transition to GHC 7.12

Telling Maple size :: Array -> Nat.

Somehow we need to add this information to our knowledge base, and make use of it in deductions.

Failing Plate tests

These tests don't pass (as of 39cb306), but they should.

Don't be confused by extra iterations at beginning of plate

 Expected result : Plate(n,i,Gaussian(0,2^(1/2)))
 Evaluated result: Weight(2^(-n)*Pi^(-n),Plate(n,i,Bind(Lebesgue(-infinity,infinity),`xx俏`,piecewise(-1 < i,Weight(exp(-1/4*`xx俏`^2)*Pi^(1/2),Ret(`xx俏`)),Msum()))))

The condition -1 < i is not being simplified to true in the context 0 <= i. It isn't exactly clear how binders in Maple produced by Plate interact with Domain (if at all) but the context isn't being correctly passed in this case.

plate size failed

 Expected result : Weight(f(k),Ret(Unit))
 Evaluated result: Bind(Plate(k,c,Uniform(37,42)),`xs俹`,Weight(f(Hakaru:-size(`xs俹`)),Ret(Unit)))

Probably related to the above; but the information about the size of xs would be inserted in a slightly different place.

gmm (currently failing -- due to alpha-equivalence) failed

The results are equal, although not quite alpha-equivalent (more precisely, they are equal up to commutativity of + and *, and sqrt(1/x)=1/sqrt(x), and alpha equivalence).

gmm total (currently failing -- due to alpha-equivalence) failed

Same as above; and also differs by a constant factor inside a product. I'm not sure if these simplifications are very important.

Conjugacy between unrolled symmetric Dirichlet and multinomial

Error, improper op or subscript selector

Investigation needed.

Conjugacy between rolled Dirichlet and multinomial

Error, (in Loop:-unproduct) assertion failed, unproduct: BUG! bijectivity assumed!

At least it's obvious where this one happens, but it isn't clear if this is a regression in KB or a bug in unproduct (as the assertion claims) that is only now being hit.

Error caused by `simplify` attempting to interpret Maple initialization-file output

To reproduce the error:

Go to your Maple initialization file. Find a line/command that begins
libname:=
If that line/command ends with a colon, change it to a semicolon. At your system command prompt enter
echo "0" | simplify -

Expected output:

Actual output:

An error caused by the attempted evaluation of
ToInert(libname:= ...)..
That is, the program attempts to evaluate, with ToInert, any output generated by the cmaple session. The output of an assignment command is an equivalent assignment command with its right side evaluated. This can't be re-evaluated as an expression (in Maple, assignment statements are not expressions), which would be required to pass it to any function, such as ToInert.

Priority:

I'd guess that the priority for fixing this is relatively low as most initialization files have their output suppressed by ending statements with colons.

Possible ways to fix:

This can be fixed with the command-line flags to the cmaple command. The current flags are -q -t. This suppresses most extraneous output. I suggest also using the -B -b and -s flags. Here is the relevant documentation:

The -B option tells Maple that the default system and toolbox libraries should be added to the libname variable used to specify where Maple looks for code repositories. By default -B is implicitly specified. It is most commonly used in addition to * -b* to append to or rearrange the library search order.

The -b option tells Maple that the following argument specifies the pathname of the directory that contains Maple libraries or the full path of the .lib or .mla file of a single Maple library. This initializes the Maple variable libname. By default, libname is initialized with the pathname of the main Maple library (for example, /usr/local/maple/lib). More than one -b option can be specified. When several -b options are used, the first -b option overrides the default libname setting, and subsequent -b options are appended to libname, forming a Maple expression sequence of directory names. The -B option can be used to avoid overriding Maple system and toolbox libraries.

The -s (suppress initialization) option causes Maple to forgo reading initialization files when initiating a session.

For full documentation of Maple's command-line options, issue
echo "help(maple);" | cmaple -

`maple` directory needs to be cleaned up

In particular, we should have at least src, test, doc directories.

We also have at least three different "build systems" in maple: update-archive.mpl, MapleUpdate.hs, and Makefile; maybe these should also should have their own build directory (although if the root dir was much less crowded, these few files there wouldn't be a big deal).

A 'quarantine' directory is also probably needed, where we place things that are totally bit-rotten and unusable in their present state, but we would want to keep (like a bunch of random test files; we'd probably want to extract at least a few tests from those, especially if we were to find tests that failed).

Find a way to reproduce error Ken gets

On 2017-03-03 16:35 , Chung-chieh Shan wrote:

One reason that I classified maple/NewSLOTest.mpl as worthy of attention
is I got the following just now:
 Error, (in assuming) when calling 'simplify/siderels:-simplify/siderels'.
 Received: 'side relations must be polynomials in (name or function) variables'

I tried on karst (which has Maple 2016) and could not reproduce.

Create a gitter room for hakaru

Gitter allows you to create chatrooms linked to specific github repositories allowing you to reference issues etc and have more live discussions.
Once a room is created, it allows you the option to create a pull request to add a badge/link. See the README for https://github.com/suhailshergill/extensible-effects for an example (pull request in question: suhailshergill/extensible-effects#36).

i would be happy to help get this set up, but i believe you need to be part of the organization (hakaru-dev in this case) to do so.

Expect doesn't go under case

When we try to compute the expectation of a case-expression where the scrutinee cannot be evaluated far enough to match some branch (e.g., because the scrutinee is a neutral term), right now we will residualize the call to expect around the whole case expression. What we would much rather do is to push the transformation down into all the branches.

Optimizing powers/roots

In the abstract syntax we distinguish between various mathematically distinct notions of "powers". In particular we currently have:

RealPow :: HProb -> HReal -> HProb
NatPow :: (HSemiring a) => a -> HNat -> a
NatRoot :: (HRadical a) => a -> HNat -> a

There are various laws relating these three operators. For example, given natural numbers m and n we have that RealPow x (m / n) is denotationally and extensionally equivalent to both NatPow (NatRoot x n) m and NatRoot (NatPow x m) n. However, these three expressions are operationally and intensionally distinct.

In general, we want a "strengthening" pass (in Haskell) which converts things from less constrained operators (e.g., RealPow) to extensionally equivalent operations which are more constrained/specific (e.g., NatPow, NatRoot) and hence provide opportunities to be more operationally efficient. In the example above there's some ambiguity about which term to strengthen RealPow into[1], but whenever m == 1 the ambiguity goes away— and at a bare minimum we should handle this case in our strengthening pass. In particular, we should never generate code like 2 ** 0.5 and should always instead generate sqrt 2.

[1] The issue about which to pick is similar to the issue about where to insert coercions between the log-domain and the exp-domain. It can be resolved in favor of one form or the other if we know more about x, m, and n; but in general it could go either way.

simplification bug (negative Weight)

The following Hakaru program
x0 <~ uniform(1, 3)
(match (0 < x0):
true: reject. measure(real)
false: return x0)

will simplify to (in Maple notation)

Weight(-1/2,Uniform(1,0))

and will thus be rejected as an invalid program.

Bug in `categorical` for the C backend

Below is a snippet of C code produced by HKC for the categorical statement for the naive_bayes benchmark, and is generated by this portion of the C backend.

r_h4 = ((((double)rand()) / ((double)RAND_MAX)) * exp(ws_h3));
ws_h3 = log(0);
it_h2 = 0;
while (1)
{
  if ((r_h4 < exp(ws_h3)))
  {
    out_g.weight = 0;
    out_g.sample = (it_h2 - 1);
    break;
  }
  ws_h3 = logSumExp2(ws_h3,_h1.data[it_h2]);
  it_h2++;
}

The problem is that, as implemented, this code causes a segmentation fault by accessing memory outside the valid range of _h1.data. In this example _h1.length == 10 while it_h2 == 9258. Clearly out of bounds, but the error is not observable until we run afoul of memory access protection.

There two potential things wrong with this code which could cause the issue observed here.

Looking at the values in the _h1 array which has type struct array_prob, all the values are ~4400. This is a somewhat non-sensical value for a probability (particularly in log domain). That seems odd to me, but perhaps this is valid for Hakaru. For the sake of categorical, all the elements have more or less the same probability, so it should be picking the values more or less uniformly for this example.
The termination condition r_h4 < exp(ws_h3) will never test true. exp(4000) == inf, and inf < inf is always false. If the values in the _h1 array are valid, then categorical needs to handle these cases. It looks like the implementation of categorical in LogFloatPrelude handles this by dividing every value by the maximum, then sampling from the resulting array.

plate of berns doesn't parse

$ echo 'plate i of 10: bern(0.5)' | pretty
pretty: insertVarSet: variable is already assigned!
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Syntax/Variable.hs:396:23 in hakaru-0.4.0-6kViTuEdGKS6SwZnYskeOG:Language.Hakaru.Syntax.Variable

@pravnar, does this have to do with your new implementation of primBern by any chance?

Domain.mpl doesn't work without opaquemodules=true

Something about the module structure of Domain is broken, but it isn't clear how to fix it. Domain:-Extract:-Bound uses Domain:-Has:-Bound but without opaquemodules=true, Maple throws the error "Domain does not evaluate to a module". (This isn't the only occurrence of such an error, but it is the first which occurs at runtime). Trying to add the Domain submodules to the uses of other Domain submodules doesn't seem to work.

Maple output for uneither

Should there be a "Right" in here somewhere?
https://github.com/hakaru-dev/hakaru/blob/master/Language/Hakaru/Maple.hs#L98

Failing assertion in Loop due to change in KB

The assertion in question is the one removed by 5227a4f. The discussion there would indicate that we don't really understand why this ASSERT fails. Currently the only affect test case is "Conjugacy between rolled Dirichlet and multinomial".

The failing call is

%assert(i =-1,KB:-KB(KB:-Introduce(i,HInt(Bound(`>=`,0),Bound(`<=`,Hakaru:-size(as)-2))),KB:-Constrain(0< Hakaru:-size(as))))

Recent versions of both master and pw_simp return NotAKB for this call, which causes the ASSERT to fail; what used to be returned is

KB(Constrain(0 <= Hakaru:-size(as)-1), Constrain(0 <= -1), Bound(i, `=`, -1), Introduce(i, HInt(Bound(`>=`, 0), Bound(`<=`, Hakaru:-size(as)-2))), Constrain(0 < Hakaru:-size(as)))

That result doesn't look correct to me; clearly i = -1 and i >= 0 is false. If Loop was relying on getting that result, both Loop and KB were bugged. If this result is actually correct, then one or multiple changes made to KB were incorrect (if this is the case, I need to understand why this result is correct).

Hygiene bug in disintegration

In Tests.Disintegrate, look at the first solution returned by testPerform1b and testPerform1c (also note that testPerform1a doesn't suffer the bug). The first statement emitted for that solution just throws its variable away, and what should be the use sites of that variable now point to a free variable instead.

In particular, this bug has been tracked down to the performWhnf helper function (Language.Hakaru.Disintegrate:372–375). Namely, that function's call to emitMBind is invalid because the expression being emitted contains heap-bound variables. For these tests in particular, that call to emitMBind should actually be acceptable— the heap-bound variables in question are SLet-bound to a variable which has already been emitted, thus we should just perform the variable renaming before emitting (or, have already performed the renaming whenever we emitted those variables).

Parser problem

fn k nat:
  x <~ plate _ of k:
    return 1
  return size(x)

gives
simplify: getTermSing: the impossible happened

which I'm guessing is a parser bug?

Something gets sent to Maple. And it comes back. There is a problem with this "measure", but that should not trigger an 'impossible happened' ?

Failing NewSLO test (rmProg1)

Expected result : Weight(1/2/Pi,Bind(Uniform(3,8),a4,Bind(Uniform(1,4),a5,
                          Weight(<W0>,Ret(Pair(a4,a5))))))
Evaluated result: Weight(1/2/Pi,Bind(Uniform(1,4),a5,Bind(Uniform(3,8),a4,
                          Weight(<W1>,Bind(Gaussian((<W2>,<W3>)^(1/2)),a6,Ret(Pair(a4,a5)))))))

Here we get a multidimensional domain (and successfully solve it) but don't manage to evaluate an integral which becomes a value after solution. Before the Domain code, elim_intsum would have been called on every integral of that solution to potentially perform such an evaluation, but now elim_intsum is called only once, on the top-level integral, after the application of the solution. Domain:-Apply should (optionally) call do_elim_intsum (or something like it) on every integral.

The Binds which should be left in the result end up reversed; Domain:-Apply will only try to respect variable order if no interesting solutions are found. Perhaps it should also try to enforce the existing variable order in solutions, when possible.

Add travis support

Travis is a continuous integration server which offers a free service to allow you to run build tests and have the ability to reflect its state in the repository. For reference see https://github.com/suhailshergill/extensible-effects (note the badge).

below is an outline of a possible plan:

add a build file which captures all the tests/checks we want to run on the CI system. For reference see https://github.com/suhailshergill/extensible-effects/blob/develop/Makefile#L66. Note: this step is optional and is not strictly necessary
add a .travis.yml file which configures environment and invokes the tests which need to be run (either by directly specifying or referencing build file if one exists). For reference see https://github.com/suhailshergill/extensible-effects/blob/develop/.travis.yml
enable build in travis-ci
Profit! ;)

let me know if the above makes sense, and i would be happy to contribute and lend a hand.

Non-implemented simplification

Test t57, namely
fn t real:
(if (t < 1):
return ()
else:
reject. measure(unit)) <|>
(if (t > 0):
return ()
else:
reject. measure(unit))

does not 'simplify' to what is wanted. The crux of the matter is that, in operator notation in Maple,
Indicator(t<1) * m + Indicator(0<t) * m
for any measure m, does not simplify.

Should it? Note that
simplify(piecewise(t<1,1,0) + piecewise(0<t,1,0));
piecewise(t <= 0, 1, t < 1, 2, 1 <= t, 1)
so Maple certainly can do that. However, there is also that
simplify(piecewise(t<1,1,0)_m + piecewise(0<t,1,0)_m);
piecewise(t <= 0, m, t < 1, 2 m, 1 <= t, m)

which turns a data piecewise into a control piecewise, which is likely unwanted.

So, what is the right design?

Function calls cannot take literal pair argument with negative numbers

This is correct:

$ cat p3.hk 
def f(p pair(real,real), q pair(real,real)):
  match p: (a,b): match q: (c,d): normal(a+c,exp(b+d))

bar = (-4.0,-5.0)
f((-2.0,-3.0), bar)
$ ~/.cabal/bin/pretty p3.hk 
f = fn p pair(real, real): 
     fn q pair(real, real): 
      (match p: (a, b): (match q: (c, d): normal((a + c), exp((b + d)))))
bar = (negate(prob2real(4)), negate(prob2real(5)))
f(bar, (negate(prob2real(2)), negate(prob2real(3))))

But inlining bar causes an error:

$ cat p3.hk 
def f(p pair(real,real), q pair(real,real)):
  match p: (a,b): match q: (c,d): normal(a+c,exp(b+d))

f((-2.0,-3.0), (-4.0,-5.0))
$ ~/.cabal/bin/pretty p3.hk 
pretty: makeAST: Passed primitive with wrong number of arguments
CallStack (from HasCallStack):
  error, called at haskell/Language/Hakaru/Parser/SymbolResolve.hs:483:9 in hakaru-0.3.0-GmeKctKnIC9Dllwvz7dSSS:Language.Hakaru.Parser.SymbolResolve

But but! If both -4.0 and -5.0 are made positive, then the error vanishes again:

$ cat p3.hk 
def f(p pair(real,real), q pair(real,real)):
  match p: (a,b): match q: (c,d): normal(a+c,exp(b+d))

f((-2.0,-3.0), (4.0,5.0))
$ ~/.cabal/bin/pretty p3.hk 
f = fn p pair(real, real): 
     fn q pair(real, real): 
      (match p: (a, b): (match q: (c, d): normal((a + c), exp((b + d)))))
f((prob2real(4), prob2real(5)),
  (negate(prob2real(2)), negate(prob2real(3))))

avoid calling buggy `convert/piecewise`

Ken says:

I'm thinking we'd keep the call in KB:-assert_deny and remove the call in Loop:-intssums.

I assigned it to 3 people, whoever gets to it first should close this issue.

summary timing out

For this code summary is timing out, but compile returns a result almost immediately.

Fix `substs`

The previous (now commented out) implementation of substs is buggy. The current implementation calls subst repeatedly to perform the multiple substitution; this is correct, but extremely inefficient.

rescaled Beta

Right now, beta(n,m) has bounds 0..1. This means that if we encounter
a 'rescaled' beta on a..b, we won't recognize it as such. So a
uniform with a weight will be spit out instead.

It probably makes sense to redefine beta to be rescaled, as it will
still be holonomic, and we'll be able to see more betas.

[Related to test t78 failing, as there was some expectation that this would happen]

Opinions?

[Ken replied: Yes!!! ]

Function-call arguments reversed

$ cat p2.hk 
def f(a real, b prob): normal(a,b)

f(2,3)
$ ~/.cabal/bin/pretty p2.hk 
f = fn a real: fn b prob: normal(a, b)
f(nat2prob(3), nat2real(2))

Understand when int_assuming needs called

See details at 65d0dc5#commitcomment-21609611

code in hakaru tutorial (0.1.4) gives suspicious coefficients

I tried the code for fitting curves in the tutorial (http://indiana.edu/~ppaml/HakaruTutorial.html), particularly:

expectations l = map (mean . V.fromList) (transpose l)

poly1d weights x = poly weights x 1
   where poly [] x acc = 0
         poly (w:ws) x acc = w*acc + (poly ws x acc*x)

cubic :: [Double] -> Measure [Double]
cubic x = do
    w <- replicateM 4 $ unconditioned (normal 0 2)
    y <- mapM (conditioned . (\x -> normal (poly1d w x) 1)) x
    return w

burn = 5000

do
   l <- mcmc (cubic x) (map (Just . toDyn . Lebesgue) y)
   let means = expectations $ take 1000 $ drop burn l
   return $ chart (poly1d means) (zip x y)

This is the coefficients I got back:

[3.013056283406498e-2,3.013056283406498e-2,3.013056283406498e-2,3.013056283406498e-2]

They are all the same, regardless of what data and what curve I used.

Did I do something wrong? This is highly....suspicious.

Plate with array passed in as free variable does not simplify

If we take the following program, and try to simplify it.

[zv@t420 hakaru (master)]% cat hello2.hk
fn x array(real):
 θ <~ normal(0,1)
 _ <~ plate i of size(x):
        observe normal(θ, 1) x[i]
 return θ

We get an odd result.

[zv@t420 hakaru (master)]% simplify --debug hello2.hk
Sent to Maple: lam(x, HArray(HReal()), Bind(Gaussian(0, 1), θ, Bind(Plate(size(x), i, Msum(Weight((exp(((-(((idx(x, i) + (-(θ))) ^ 2))) * 1/(((2/1) * ((1/1) ^ 2))))) * 1/((1/1)) * 1/(root(((2/1) * Pi), 2))), Ret(idx(x, i))))), _, Ret(θ))))

Returning from Maple: lam(x,HArray(HReal()),Bind(Gaussian(0,1),`θ丅`,LO(`h七专`,integrate(Msum(
Weight(1/2*exp(-1/2*(idx(x,i)-`θ丅`)^2)*2^(1/2)/Pi^(1/2),Ret(idx(x,i)))),
Integrand(`_不`,applyintegrand(`h七专`,`θ丅`)),[i = 0 .. size(x)-1]))))

Understand where singularities in disint test cases come from, and how to deal with them

Yuriy said:

d4, norm1a/b still have singularities. d4 contains the hardest one, I think - x/y. And also a new test case I added (DisintT/pair_x_x).

The biggest issue is all the singularities which are removed are done so in a fairly hack-y way. Instead of taking the (left and right) limit at that point, it just evaluates. I can't seem to get limit to work (even on very simple expressions).

Improve pretty printing of if statements

This is being split off from @wrengr comment in #27

When encountering case b of { True -> e1; False -> e2 } instead print if_ b e1 e2. N.B., this should be sure to handle all cases where the scrutinee is of HBool type, including when the patterns may be given in the other order, when one pattern may be a wildcard or a variable, etc.
As a special case, when encountering if_ b m reject instead pretty print as guard b; m so as to avoid excessive indentation

Symbolic disintegration produces unevaluated Int (but could evaluate)

In DisintT.mpl, we find

normalFB1 :=
  Bind(Gaussian(0,1), x, 
  Bind(Gaussian(x,1), y, 
  Ret(Pair((y+y)+x, _Unit)))):

with expected answer

normalFB1r := {Weight(1/26*exp(-1/26*t^2)/Pi^(1/2)*13^(1/2)*2^(1/2),Ret(_Unit))}:

but instead we get

Weight(1/4*2^(1/2)/Pi^(1/2)*Int(1/2*2^(1/2)*exp(-13/8*`x乩`^2)/exp(t^2)^(1/8)*exp(3/4*`x乩`*t)/Pi^(1/2),`x乩` = -infinity .. infinity),Ret(_Unit))

These are "equal" (mathematically). But clearly the expected one would be preferred. But this is a design issue: when do we try to compute integrals?

Track down real source of Maple library bug and report

Yuriy writes:

The call which caused the error in "slice sampling Gaussian" was:

coulditbe(0 < u) assuming (u < (1/2)*sqrt(2)*exp(-(1/2)*x^2)/sqrt(Pi));

This ends up calling

simplify( (-ln(2*Pi*K^2))^(1/2), {K = (1/2)*2^(1/2)*exp(-(1/2)*x^2)/Pi^(1/2)} );

(with some name internal to simplify instead of K)

On older versions of Maple, the original call returns FAIL; I don't think anything like the 2nd call to simplify happens at all (but I'm not certain - debugging built-ins is hard). It seems that Maple got a little bit smarter in the latest version; smart enough to try to solve this problem, but not smart enough to actually solve it.

bern -> categorical

Translate primBern (in Haskell) to Categorical rather than Superpose.

Rationale: The invariants for Categorical are easier to establish than MSum. Basically because Categorical involves a data-flow piecewise while MSum involves a control-flow piecewise. So NewSLO will do much more simplifcation on the latter.

Scoping issue in the generated C code

The generated C code produces references to variables not declared in the current scope for the naive bayes example after ANF conversion. I am reasonably certain this is not due to a bug in the ANF pass.

The offending code can be seen at the bottom of this report. Note that the variable _fo is used outside its declared scope. My suspicion is that this is caused by the variable mapping in the CodeGenMonad being stored in a State monad. For the Hakaru AST, where identifiers are not globally unique, the proper definition of CodeGenMonad would thread the variable mapping through as a Reader monad. This prevents shadowing of variable definitions while sequentially traversing the AST.

Here is some sample code which illustrates the issue. Since ANF introduces many let-bindings, this problem is much more likely to occur (or may have to do with how the ANF pass generates new variables).

let x = 10
in (let x = 5 in x) + x -- This x points to ---+ in the variable mapping during code generation
        ^                                      |
        |                                      |
        +--------------------------------------+

I think there are two possible fixes:

Rework the CodeGenMonad to include a Reader layer in the stack and manage the variable mapping through said layer.
Have the C backend require globally unique varIDs. There is a pass in the optimizations branch which does this already, though it is not currently used. Running this pass before code generation fixes this problem.

   for (i18_eu = _es; i18_eu < _et; i18_eu++)
   {
      double _ew;
      unsigned int _ex;
      struct arrayNat arr_ey;
      double _ez;
      unsigned int _f0;
      unsigned int _f1;
      unsigned int _f2;
      unsigned int i13_f3;
      unsigned int acc_f4;
      double p_fy;
      double _fz;
      unsigned int _g0;
      double p_g1;
      double _g2;
      unsigned int _g3;
      unsigned int _g4;
      unsigned int i13_g5;
      double acc_g6;
      double _ga;
      double _gb;
      double _gc;
      arr_ey = w_d;
      _ex = arr_ey.size;
      _f1 = 0;
      _f2 = _ex;
      acc_f4 = 0;
      /* ------------- Summate ------------- */
      for (i13_f3 = _f1; i13_f3 < _f2; i13_f3++)
      {
        unsigned int _f5;
        unsigned int _f6;
        struct arrayNat arr_f7;
        unsigned int _f8;
        struct SUSUV _f9;
        unsigned int _fa;
        unsigned int _fb;
        struct SUSUV _fc;
        unsigned int _fd;
        struct arrayNat arr_fe;
        unsigned int _ff;
        struct SUSUV _fg;
        unsigned int _fh;
        unsigned int _fi;
        struct SUSUV _fj;
        struct SUSUV not_fk;
        unsigned int _fl;
        struct arrayNat arr_fm;
        unsigned int _fn;
        unsigned int _fo;
        struct arrayNat arr_fp;
        unsigned int _fq;
        struct SUSUV _fr;
        unsigned int _fs;
        unsigned int _ft;
        struct SUSUV _fu;
        struct SUSUV _fv;
        struct SUSUV _fw;
        struct SUSUV _fx;
        arr_f7 = doc_e;
        _f8 = i13_f3;
        _f6 = (*(arr_f7.data + _f8));
        _fa = _f6;
        _fb = docUpdate_f;
        _f9.index = (_fa == _fb) ? 0 : 1;
        _fc = _f9;
        if ((_fc.index == 0))
        {
          _f5 = 0;
        }
        if ((_fc.index == 1))
        {
          arr_fe = w_d;
          _ff = i13_f3;
          _fd = (*(arr_fe.data + _ff));
          _fh = _fd;
          _fi = 0;
          _fg.index = (_fh < _fi) ? 0 : 1;
          not_fk = _fg;
          not_fk.index = (not_fk.index == 1) ? 0 : 1;
          arr_fm = doc_e;
          _fn = i13_f3;
          _fl = (*(arr_fm.data + _fn));
          arr_fp = z_c;
          _fq = _fl;
          _fo = (*(arr_fp.data + _fq));
          _fs = i_dv;
          _ft = _fo;
          _fr.index = (_fs == _ft) ? 0 : 1;
          _fv = _fj;
          _fw = _fr;
          _fu.index = (!(_fw.index == _fv.index));
          _fx = _fu;
          if ((_fx.index == 0))
          {
            _f5 = 1;
          }
          if ((_fx.index == 1))
          {
            _f5 = 0;
          }
        }
        acc_f4 += _f5;
      }
      _f0 = acc_f4;
      p_fy = log1p((_f0 - 1));
      _ez = p_fy;
      _g0 = i18_eu;
      p_g1 = log1p((_g0 - 1));
      _fz = p_g1;
      _g3 = 0;
      _g4 = word_prior_size_j;
      acc_g6 = log(0);
      /* ------------- Summate ------------- */
      for (i13_g5 = _g3; i13_g5 < _g4; i13_g5++)
      {
        double _g7;
        struct arrayProb arr_g8;
        unsigned int _g9;
        arr_g8 = word_prior_b;
        _g9 = i13_g5;
        _g7 = (*(arr_g8.data + _g9));
        acc_g6 = logSumExp2(acc_g6,_g7);
      }
      _g2 = acc_g6;
      _ga = _fo;
      _gb = _fz;
      _gc = i13_g5;
      _ew = logSumExp3(_ga,_gb,_gc);
      acc_ev += _ew;
   }

Program slow to simplify

The following program is slow to simplify

x5 = x6 = fn x11 pair(prob, prob):
           (1 *
            1 *
            (match x11:
              (x27, x28):
               x29 = prob2real(x27)
               (match (not((x29 < 3)) && not((8 < x29))):
                 true:
                  (recip(real2prob((8 + negate(3)))) *
                   x30 = prob2real(x28)
                   (match (not((x30 < 1)) && not((4 < x30))):
                     true:
                      (recip(real2prob((4 + negate(1)))) *
                       x19_ = prob2real(x27)
                       integrate x32 from negate(prob2real(∞)) to prob2real(∞):
                        (exp((negate(((x32 + negate(nat2real(0))) ^ 2)) *
                              recip(prob2real((2 * (real2prob(x19_) ^ 2)))))) *
                         recip(real2prob(x19_)) *
                         recip(natroot((2 * pi), 2)) *
                         x21_ = prob2real(x28)
                         x22_ = real2prob(x21_)
                         ((exp((negate(((nat2real(17) + negate(x32)) ^ 2)) *
                                recip(prob2real((nat2prob(2) * (x22_ ^ 2)))))) *
                           recip(x22_)) *
                          recip(natroot((nat2prob(2) * pi), 2)) *
                          integrate x35 from negate(prob2real(∞)) to prob2real(∞):
                           (exp((negate(((x35 + negate(x32)) ^ 2)) *
                                 recip(prob2real((2 * (real2prob(x19_) ^ 2)))))) *
                            recip(real2prob(x19_)) *
                            recip(natroot((2 * pi), 2)) *
                            x24_ = real2prob(x21_)
                            ((exp((negate(((nat2real(13) + negate(x35)) ^ 2)) *
                                   recip(prob2real((nat2prob(2) * (x24_ ^ 2)))))) *
                              recip(x24_)) *
                             recip(natroot((nat2prob(2) * pi), 2)) *
                             1)))))
                     false: 0))
                 false: 0)
              _: 0))
     fn x5 pair(prob, prob):
      x0 <~ (match x5:
              (noiseT, noiseE):
               weight(1/2,
                      noiseT_ <~ uniform(nat2real(3), nat2real(8))
                      return (real2prob(noiseT_), noiseE)) <|>
               weight(1/2,
                      noiseE_ <~ uniform(nat2real(1), nat2real(4))
                      return (noiseT, real2prob(noiseE_))))
      return (x0, (x6(x0) * recip(x6(x5))))
fn x4 pair(prob, prob):
 x3 <~ x5(x4)
 (match x3:
   (x1, x2):
    x0 <~ weight(min(1, x2), return true) <|>
          weight(real2prob((prob2real(1) + negate(prob2real(min(1, x2))))),
                 return false)
    return (match x0:
             true: x1
             false: x4))

In particular it takes about 20 seconds

[zv@t420 hakaru (master)]% time simplify slow.hk > /dev/null                                                                                                                                                                                                            
simplify slow.hk > /dev/null  19.53s user 5.78s system 66% cpu 37.789 total

Hashable instances for AST types

Hashable instances for the Hakaru AST types would allow the use of fast associative containers for AST transformations (say, CSE). This would probably have to work on the full AST in order to support equivalence of terms like summate. Perhaps GHC can derive much of the implementation though?

Program with match needs type annotation

The following program needs a type annotation:

x <~ return (1,2)
return (match x:
         pair(_,_): 3).nat

That seems odd, and unnecessary.

improve Sing :: Symbol -> *

The singletons package has an implementation of Sing :: Symbol -> * which avoids the need for storying a Proxy (cf). We'd like to do that too.

The main sticking point is that the JmEq1 instance currently uses TL.sameSymbol which hard-codes needing a Proxy rather than being polymorphic over any type constructor (à la TL.symbolVal). TL.sameSymbol is implemented via TL.symbolVal and unsafeCoerce, so we can simply inline the definition if we don't mind maintaining the use of unsafeCoerce against future changes. The alternative is to send a patch upstream for liberalizing the type of TL.sameSymbol, and then waiting for new releases of GHC.

Thoughts?