metadsl / metadsl Goto Github PK

View Code? Open in Web Editor NEW

97.0 10.0 9.0 5.97 MB

Domain Specific Languages in Python

Home Page: https://metadsl.readthedocs.io/en/latest/

Python 2.29% Jupyter Notebook 97.51% TypeScript 0.20% Makefile 0.01%

python dsl code-generation intermediate-representation

metadsl's Introduction

`metadsl`

This repo is deprecated and we recommend using egglog-python instead.

metadsl is an exploration of how we can create deeply embedded domain specific languages in Python, in a way that is type safe (plays nicely with Mypy) and feels ergonomic to Python developers. It is meant to be a building block for other libraries to create their own domain specific languages and compile them to different forms.

Current Status

It currently is not being actively supported and is in need of a refactor to:

Refactor the type analysis to stop relying on Python's built in type objects and use its own instead.
Make "abstractions" (aka functions) a first class citizen, instead of the current approach of embedding them as data.

It also needs a compelling downstream user to help drive further development.

If you are interested in either of these topics, feel free to open an issue or reach out directly.

Support

The only reason this projects exists was due to the funding and support from Quansight Labs, from 2018 till 2020.

metadsl's People

Contributors

Stargazers

Watchers

Forkers

prasunanand hello-anmol melissawm vishalbelsare andrewfulton9 saulshanabrook andreacortis szhorvat

metadsl's Issues

Improve printing experience

Add nicer str methods

Track bounds on natural numbers

At the core of the uarray AST is natural numbers. In particular, bounded natural numbers. i.e. when we index an array we know that the indices are within a certain range. If we track this range through the program, we should be able optimize certain instructions.

For example, this would let us simplify the form generated by something like Take(2, Concat(A, B)). Here, the Concat will turn into a conditional indexing function, then if we Take this limits the shape and so the bounds of this function change. therefore, if the outer dimension of A is greater than 2, we can eliminate this conditional, since we know it won't index into B ever. If we had bounds on all the naturals, then we add a replacement for something like a > b where the lower bound of a is greater than the upper bound b we can replace with True.

It would also let us possible eliminate excess floor division and modulo calls that show up a ton in reshape operations

I don't have much experience with this sort of bounded arithmetic. I wonder if SymPy's indexed module could be used to do this for us? (#66)

SymPy Integration

SymPy is an existing CAS system that could be helpful to uarray to re-use some of it's features.

Also, if uarray was integrated as a module in SymPy it would help it have larger distribution.

Create notebook showing why we need another type system

Add combinators to metadsl tutorial

NumPy transforms

translate mul to array_ufunc
Infer shape and dtype over time

Figure out intersection types

Right now we don't have a good way of typing intersections. This comes up with Unbound, but also with Initializable.

For example, we want a type like this:

class CInitializable(Category, typing.Generic[INNER_TYPE]):
    pass

however where the INNER_TYPE is also part of the type.

For example, things can be two things in different contexts. In one context they are initializable and an array.

The designated mypy solution for this is a subclass that inherits both the classes.

Need to investigate whether this works or not

NumPy Notes

Here are some quotes I am collecting from the NumPy mailing list:

The challenge for array_function is that because it's a simple black-box dispatch system wrapped around the whole library, there's no simple way for third-parties to reuse numpy's code, which creates challenges for compatibility, cost, maintenance, etc. Maybe that will turn out to be a showstopper, maybe not – we can each make our guesses, but since we're trying the experiment then we'll know soon enough :-).
If it does turn out to be a showstopper, then the question will become: how do we provide finer-grained protocols that are deeply integrated into numpy's semantics? This can help address those questions above, because (a) it's a narrower set of APIs for implementors to worry about, so implementing will require less resources, (b) they're more precisely defined, so getting the details right is easier, (c) you get more re-use of numpy code in between calls to the protocols, so you automatically get numpy's bug fixes.
But... doing this is hard because it really requires us to dig into the details of numpy's semantics. You mention a protocol for ufuncs – we already have that? And it took years, and an immense amount of discussion, because the integration details were genuinely super complicated (e.g., the famous thread about how to handle dispatch when there were both add and array_ufunc methods on the same object). Just saying "we'll use multimethods" doesn't tell you how + and np.add should interoperate.

Some use cases:

np.concatenate([1, 2], [3, 4]) dispatch to other backends

Benchmarks

Pythran has a bunch of benchmarks that seem like a useful subset of numpy functionality: https://github.com/serge-sans-paille/numpy-benchmarks/tree/master/benchmarks

Pyodide is using them and xtensor has also used them, so we could compare to a pretty broard range of backends if we also did.

Remove MoA dependency for NumPy API

Currently, the lazy ndarray implementation maps to MoA operations, like outer product and binary operations.

Instead, we should have the lazy ndarray implementation just create a graph that maps more directly to the numpy calls.

Then we can separately define mappings between those operations and MoA

Add some initial notebooks to show dispatching

Awkward array (arbitrary nested data)

awkward-array (API: https://github.com/scikit-hep/awkward-array/blob/master/binder/array-types.ipynb)

Remove all existing code

Try passing into scipy.optimize.differential_evolution

The idea here is to take a scipy function like scipy.optimize.differential_evolution and be able to execute it on different hardware.

If the hardware is still immediate, then we could do this possibly by passing in a CuPy array or something like that.

The more general case, however, is to first build up an expression graph for the computation, as written, then choose how to execute it. This would be useful if we wanna explore any sort of MoA optimizations or translate to a non eager form like Tensorflow. Basically by not doing things eagerly, we can optimize in ways that are impossible.

For example, this is what Numba does. It takes a whole function and optimizes it as a whole. And you can get much better performance this way.

https://github.com/scipy/scipy/blob/v1.2.1/scipy/optimize/_differentialevolution.py#L649

Probabilistic Programming

Similar API to pymc4 (https://sharanry.github.io/post/eight-schools-model/) but not tied to tensorflow (can run on TF, numpy, pytorch)

NumPy API without NumPy Dependency

A good trial to see if the type system of the metadsl system makes sense is to create a module that mocks out the numpy API, but all calls create expressions instead of executing.

Add initial project setup

pyproject.toml
README.md

Add initial dispatching code

Use linnea to generate LAPACK calls

https://github.com/HPAC/linnea

Would need to define new backend for it that isn't julia, or call out to julia directly.

If we could target LLVM with that, then we could use LLVMlite to build and execute

Roadmap notes at PyCon

I brainstormed some initial use cases with the help of @mattip:

inferring shape and dtype of NP computation without computing
Compiling NP API to pytorch codAPI
Use for numba array optimization subpart
produce stuff like numexpr test suite from expressions instead of string
- this would support being frontend for a simplified graph for dask
Generate python ast of numpy
output C as the backend
cython guy multipass compiling
- He would provide c backend
- also similar to autograph work with google, parsing Python like syntax and creating graph

I think an initial MVP would be the "produce stuff like numexpr test suite from expressions instead of string" one. A roadmap to get there could look like:

NP Frontend:

Finish implementing and testing wrapping logic
Add another layer of "compat' numpy API that has deferrs to wrapped one but adds special signatures
Actually implement all of numpy API

Optimizations:

Build out abstraction types and typed lambda calculus execution
Show how to build MoA on this
Move existing MoA logic here

Backend:

Add replacements to python AST
support all numpy operations in this replacement
Verify this works with Numba as compiler

Add docs

We could use nbsphinx to turn some of our test notebooks into docs as well: https://nbsphinx.readthedocs.io/en/0.3.5/

Figure out how to get RTD to use flit install to get dependencies
Add more docstrings to clarify parts of the project

Einsum notation

As we are thinking about translating NumPy-ish API to MoA forms to reduce, one of the most general that comes up is the einsum notation and function (https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html#numpy.einsum) (Note that tensorflow's einsum supports a subset of this form https://www.tensorflow.org/api_docs/python/tf/einsum) AFAIK It is the superset of the np.inner and np.tensordot, so if we can reduce any form of einsum, then we can reduce those as well. Also, we would then be able to used opt_einsum to optimize the einsum computation first, and use uarray/moa as the backend (https://optimized-einsum.readthedocs.io/en/latest/backends.html) All we need to do is be able to get the shapes of arrays and define einsum to plug it into that library.

Internal xarray usage

Use for lazy operations to optimize pydata/xarray#2298 pydata/xarray#2302

Implement new type system and translate to instances

Rework python AST creation with conversions/promotions

A _repr_html_ for a matchpy expression.

In the web view, a pure html representation might be better for the matchpy...Expressions

 import pprint, black, vdom, matchpy, IPython; IPython.get_ipython().display_formatter.formatters['text/html'].for_type(
        matchpy.expressions.expressions.Expression, lambda str: vdom.pre(vdom.code((black).format_str(pprint.pformat(str), 100)))._repr_html_()
    )

This snippet is meant to be registered in a notebook to demonstrate the idea.

Generate code for TPU

https://www.analyticsindiamag.com/julia-users-can-now-rejoice-google-cloud-has-powerful-capabilities-to-support-the-language/

Infer args and return value

Instead of having to manually specify the args and return value of expressions
and replacements, we can inspect them from the type signature.

Note: After I do this, link to it from python/typing#606 so that others can see this approach.

Internal numba usage

support array optimizations on array like objects (source: conversation w/ @sklam)
parallelization of array expressions numba/numba#2432

Optimizing np.where and np.select

This is motivated by sympy/sympy#15014.

Take something like np.select, or if you prefer, nested np.where (select is not a ufunc).

np.where(cond, x, y) takes a boolean array cond and chooses x where it is True and y where it is False.

The issue is if x and y are not arrays but expressions. For instance, np.where(x <= 0, 0, np.log(x)). This will evaluate np.log on every element of x, when it only should be evaluated on the positive elements. This is especially problematic if x is a non-NumPy scalar, but even if it is an array, it is wasteful.

A more efficient way is to use a mask

>>> x = np.array([-1, 0, 1])
>>> cond = x <= 0
>>> out = np.empty(x.shape)
>>> out[cond] = 0
>>> out[~cond] = np.log(x[~cond])
>>> out
array([0., 0., 0.])

If there is more than one variable, you need to use np.broadcast. For nested np.where (or select), you should chain the conditions (see the SymPy issue for an example).

FEniCS Overlap

Looks like there is another project we should look into for inspiration/collaboration.

https://fenicsproject.org/

FEniCS is a popular open-source (LGPLv3) computing platform for solving partial differential equations (PDEs). FEniCS enables users to quickly translate scientific models into efficient finite element code. With the high-level Python and C++ interfaces to FEniCS, it is easy to get started, but FEniCS offers also powerful capabilities for more experienced programmers. FEniCS runs on a multitude of platforms ranging from laptops to high-performance clusters.

In their recent paper "TSFC: a structure-preserving form compiler", they outline a process similar to ours. In particular, I am interested in how their internal forms differ from ours. Here is an overview of that part from their paper:

And then later down skipping some details I don't understand about PDEs:

This is actually pretty cool, because the idea of "free indices" is one I have to wrestle with in the compilation currently. The basic concept in our system is that we start with a lambda calculus like form of length and indexing. We call this a Sequence, because it corresponds to the Python idea of a sequence. So we start with a form that is a Sequence that has a function, that should return another Sequence or a Scalar. How do we actually compile this?

Well let's look at the code that turns the a Sequence into some AST statements that creates a NumPy array: https://github.com/Quansight-Labs/uarray/blob/ec8be30c7ec0097989661ec8d8d958d56e9fc45f/uarray/uarray/ast.py#L208-L284

We are basically generating code that looks like this:

# only create array if `alloc` is true
array = np.array(shape)
for i in range(length):
     result = getitem(i)
     array[i] = result

We are calling the getitem function that defines the sequence with an AST form that represents the index i. So we now have some "Free indices".

Let me give a more explicit example, that isn't tied to code generation. This notebook shows how we take an expression equivalent to np.outer(np.arange(10), np.arange(5)) and index it with two unbound variables, i and j. Now we effectively have two free indices on this form.

The free indices idea seems very related to implemented einsum generally, because einsum is really about associating names to the dimensions of multiple arrays, and the result array, and the algorithm just falls out of that (along with choosing your addition and multiplication operations).

Understand application of this type of system for other fields in Python

See if this would be useful for filesystem spec: https://github.com/martindurant/filesystem_spec

Instead of objects we care about here being arrays, they would be filesystems. But same ideas would apply. Instead of using python subclassing, we use replacements. Would let you have "generic" and backend dependent api's, so that you aren't force to cater to lower common denominator of API.

Also look into how they are already handling these issues

Look into fixed point forms

After reading http://conal.net/papers/essence-of-ad/ I wonder if we can convert our functional forms into fixed point. aka instead of having lambda functions turn them all into composition and other higher level forms.

Why? Not sure, does this get us AD? Need to review that paper more to understand crux of requirements.

Add internal type support

Including ability to infer dtype from numpy

Figure out what type system is relevent

I think it is system F or system F omega.

Internal Dask usage

It would be great to understand the Dask use cases for high level optimization well enough to create some preliminary integration that addresses them. See dask/dask#4038

Add equality function

Status Updates

I am opening up this issue to serve as an internally focused blog of development updates for my work. I have been writing up daily notes when I work, but I have been just saving them in text files locally.

I was thinking of publishing them here as comments instead, to give more visibility into my work.

If you find the updates here unhelpful, please let me know and I can move back to keeping notes just internally.

Change omega semantics

Right now it's backward of MoA, i.e. you specify dimensionality from left instead of from right.

Datatable (dataframes)

datatable (API: h2oai/datatable#696)

Allow creating expressions with mutable objects

We need to be able to create expressions with mutable objects. For example, this should work:

import metadsl.nonumpy.compat as nonp
import numpy as np

# First we have some regular numpy array
x = np.arange(10)

# Then we have one of our numpy arrays
y = nonp.arange(10)

# Now we add them. We should have an expression tree with the
# real numpy array as one of the leaves
z = y + x

# this works, but we cannot take the hash of z now
# This raises an error
hash(z)

Why do we need to be able to hash expressions? Because we wanna be able to cache results based on their hash, which is more expensive if we only have equality not hash (see "Example 3: When a == b does not imply hash(a) == hash(b)" in "What happens when you mess with hashing in Python").

So on face value, it would seem useful if the hash of these non-hashable leaves was based on their id. This would seem to work fine, as long as these objects are not mutated in the course of us using the expressions. Which seems like a fine assumption.

One idea would just be to wrap each unhashable leaf in hashable container that uses the id of the inside node as the hash.

Differentiation

We should be able to compute derivatives of array expression wrt another array expression.

Prior work:

Javascript / Web Assembly / WebGL

We should be able to directly compile abstract array expressions into Javascript, Web Assembly, and/or WebGL.

@mdboom gave a wonderful talk about pyiodide and showed some performance comparisons:

This is using (I assume) emscripten to translate the NumPy C code to webassembly and calling it. If instead, we create an abstract array expression in these benchmarks and then have a webassembly backend for uarray, we could translate the semantics of the algorithms more directly and hopefully get a speedup over translating the C code.

Everything in pythran benchmarks

Pythran has a nice benchmark suite (https://github.com/Quansight-Labs/uarray/issues/50). We should support all algorithms used in those benchmarks so that we can use them.

Document basic example for metadsl

Create specification for meta level

I should answer questions like:

Which expression forms can I compose with each other?

What is Call?

What are the forms that should result from each "step" of compilation?

Can we use Python's typing for this?

Yes pretty much works, we could add parent class for Array and Contents and Callable, but how do we type Call operation? It's "type" is the type of what it returns. Can we do this with typing right now? Can we use __call__ protocol for this?

Side issue with all this stuff is can we still use it at runtime? In JupyterLab or other environment's where Mypy is not as available, we still want to verify things.