mikera / core.matrix

core.matrix : Multi-dimensional array programming API for Clojure

License: Other

Clojure 99.36% Java 0.53% HTML 0.03% Dockerfile 0.07%

core.matrix's Introduction

mikera

Mike's general purpose Java library

Lots of good stuff inside including:

Maths functions
A set of persistent, immutable data structures
Bitwise operations and tools
Resource handling utility functions
GUI utilities
Sound utilities
Random number generation and algorithms (also available via the Randomz library - https://github.com/mikera/randomz)

core.matrix's Issues

Broken emap for NDArray?

emap appears to incorrectly iterate too deep when an NDArray contains Clojure vectors, e.g.

(def m (new-array :ndarray [3 3])) ;; create a 3x3 NDArray
(def my-vectors (for [i (range 9)] [i (inc i)])) ;; a sequence of 9 vectors
(assign-array! m (object-array my-vectors)) ;; assign all the elements using an array

(emap count m) ;; should apply count to each of the 9 vectors
UnsupportedOperationException count not supported on this type: Long  clojure.lang.RT.countFrom (RT.java:545)

Could this be caused by NDArray depending on the persistent vector implementation, for which nested vectors are a known limitation?
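
One way this kind of error can arise, as a minimal sketch in plain Clojure with no core.matrix involved: `naive-emap` below is a hypothetical helper, not the library's `emap`. If element mapping recurses into anything that looks like a vector, the stored vectors are treated as extra dimensions and `count` ends up applied to the numbers inside them.

```clojure
;; Hypothetical helper for illustration only; not core.matrix's emap.
(defn naive-emap [f m]
  (if (vector? m)
    (mapv #(naive-emap f %) m)   ; recurse: every vector is treated as a dimension
    (f m)))                      ; leaf: apply f to the element

(def elems [[0 1] [1 2] [2 3]])  ; three elements, each a 2-vector

;; Treating each stored vector as an opaque element gives the intended result.
(def intended (mapv count elems))

;; Recursing into the stored vectors reaches the Longs, where count fails.
(def outcome
  (try
    (naive-emap count elems)
    (catch UnsupportedOperationException e
      :count-applied-to-a-number)))
```

Whether NDArray actually fails for this reason is not confirmed by the report; the sketch only shows that the symptom is consistent with iterating one level too deep.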

Can't register an implementation without PConversion

Here is a file/revision in question: https://github.com/si14/matrix-api/blob/70b376f58ec3846df6622b971001c3ade32d0725/src/main/clojure/clojure/core/matrix/impl/ndarray.clj#L283

When trying to run

(imp/register-implementation (empty-ndarray [1]))

java.lang.NullPointerException: null
 at clojure.lang.RT.alength (RT.java:2120)
    clojure.core.matrix.impl.wrappers.NDWrapper.dimensionality (wrappers.clj:248)
    clojure.core.matrix.protocols$persistent_vector_coerce.invoke (protocols.clj:458)
    clojure.core.matrix.impl.wrappers.NDWrapper.toString (wrappers.clj:285)
    clojure.core$str.invoke (core.clj:497)
    clojure.core/fn (core_print.clj:94)
    clojure.lang.MultiFn.invoke (MultiFn.java:167)
    clojure.core$pr_on.invoke (core.clj:3266)
    clojure.core$print_map$fn__5292.invoke (core_print.clj:197)
    clojure.core$print_sequential.invoke (core_print.clj:58)
    clojure.core$print_map.invoke (core_print.clj:200)
    clojure.core/fn (core_print.clj:204)
    clojure.lang.MultiFn.invoke (MultiFn.java:167)
    clojure.core$pr_on.invoke (core.clj:3266)
    clojure.core$pr.invoke (core.clj:3278)
    clojure.tools.nrepl.middleware.pr_values$pr_values$fn$reify__531$fn__533.invoke (pr_values.clj:23)
    clojure.tools.nrepl.middleware.pr_values$pr_values$fn$reify__531.send (pr_values.clj:23)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__547$fn__558.invoke (interruptible_eval.clj:67)
    clojure.main$repl$read_eval_print__6405.invoke (main.clj:246)
    clojure.main$repl$fn__6410.invoke (main.clj:266)
    clojure.main$repl.doInvoke (main.clj:266)
    clojure.lang.RestFn.invoke (RestFn.java:1096)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn__547.invoke (interruptible_eval.clj:56)
    clojure.lang.AFn.applyToHelper (AFn.java:159)
    clojure.lang.AFn.applyTo (AFn.java:151)
    clojure.core$apply.invoke (core.clj:601)
    clojure.core$with_bindings_STAR_.doInvoke (core.clj:1771)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:41)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn__588$fn__590.invoke (interruptible_eval.clj:171)
    clojure.core$comp$fn__4034.invoke (core.clj:2278)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn__581.invoke (interruptible_eval.clj:138)
    clojure.lang.AFn.run (AFn.java:24)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1110)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:603)
    java.lang.Thread.run (Thread.java:722)

The error doesn't arise when PConversion is present. PConversion isn't marked as mandatory in protocols.clj, so I'm wondering if this is a bug.

Add a generic implementation of matrix determinant

We don't currently have a default implementation for "determinant".

We should ideally have something here that works, even if it is a bit naive / slow.
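
A naive fallback could look something like the following sketch, which uses cofactor expansion along the first row on plain nested vectors. The names `drop-nth`, `minor` and `naive-determinant` are made up for illustration and are not part of core.matrix; the approach is O(n!) and only suitable as a slow default for small matrices.

```clojure
(defn drop-nth
  "Returns vector v without the element at index k."
  [v k]
  (vec (concat (subvec v 0 k) (subvec v (inc k)))))

(defn minor
  "Returns m (a vector of row vectors) without row i and column j."
  [m i j]
  (mapv #(drop-nth % j) (drop-nth m i)))

(defn naive-determinant
  "Determinant of a square vector-of-vectors via cofactor expansion."
  [m]
  (let [n (count m)]
    (case n
      0 1
      1 (get-in m [0 0])
      (reduce +
              (for [j (range n)]
                (* (if (even? j) 1 -1)            ; alternating cofactor sign
                   (get-in m [0 j])
                   (naive-determinant (minor m 0 j))))))))

(naive-determinant [[1 2] [3 4]]) ; => -2
```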

push 0.0.2-snapshot to clojars?

Clatrix now has the basic protocols implemented, but we need a matrix-api snapshot on Clojars for the Travis tests to pass.

Here's the Clatrix branch:
https://github.com/tel/clatrix/tree/matrix-api

length not working

It seems to me that the length implementation is not found correctly.

(use 'core.matrix)
(length [1 1])

Returns an IllegalArgumentException:

java.lang.IllegalArgumentException: No implementation of method: :length of protocol: #'clojure.core.matrix.protocols/PVectorOps found for class: clojure.lang.PersistentVector

I don't fully understand how the protocols of the matrix-api work yet, maybe you can point me in the right direction?

Thank you!

Create "distance" function for distance between two vectors

Rationale:

Although easy to implement, probably useful enough to be in the core API
There is a possibility for optimised implementations (better than subtracting and taking magnitude)
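
As a rough illustration of the second point, a Euclidean distance can be computed in a single pass without building the intermediate difference vector. This is a sketch on plain Clojure sequences; the name `distance` matches the proposal, but the body is an assumption about what an implementation might do.

```clojure
(defn distance
  "Euclidean distance between two equal-length numeric sequences."
  [a b]
  (Math/sqrt
    (double
      (reduce + (map (fn [x y]
                       (let [d (- x y)]
                         (* d d)))      ; squared difference per element
                     a b)))))

(distance [0 0] [3 4]) ; => 5.0
```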

Eliminate PersistentVector's implementations usage from default.clj

This is a long-term goal, but I think generally it's better to do this.

For now we reuse PersistentVector's implementations in many places in default.clj, coercing a value to persistent vector, performing an operation and coercing it back. This causes slowness (like in 5714695) and bugs (like in #51). We can avoid this in 2 ways:

use NDArray instead of vectors, as it is presumably faster;
implement defaults using mandatory protocols on arguments.

These options do not contradict each other, so we can use both.

Add set-0d to PZeroDimensionAccess

I think we should add set-0d (the non-mutating version) to PZeroDimensionAccess for consistency.

Submatrix view construction

It should be possible with the NDArrayView to create a "submatrix" of an existing matrix, i.e. a rectangular view over a subset of any other matrix.

It should work as a "view", i.e. modifying the view matrix will modify the underlying data (assuming the source matrix is mutable).

Needs some functions / protocols in the main API to create these. Probably something like

(sub-matrix m [[0 2] [2 4]])
=> 2x2 matrix starting at [0,2] in the original matrix m

Question: Should we use [start, end] or [start,length] to specify index ranges?
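
To make the two options concrete, here is a sketch of both readings on plain nested vectors. Both functions are hypothetical and return copies rather than views, so they only illustrate the index convention, not the view semantics. Note that the 2x2 result in the example above matches the [start, end) reading.

```clojure
(defn sub-matrix-start-end
  "Copies rows [r0, r1) and columns [c0, c1) of m."
  [m [[r0 r1] [c0 c1]]]
  (mapv #(subvec % c0 c1) (subvec m r0 r1)))

(defn sub-matrix-start-length
  "Copies rn rows starting at r0 and cn columns starting at c0."
  [m [[r0 rn] [c0 cn]]]
  (mapv #(subvec % c0 (+ c0 cn)) (subvec m r0 (+ r0 rn))))

(def m [[ 0  1  2  3  4  5]
        [10 11 12 13 14 15]
        [20 21 22 23 24 25]])

(sub-matrix-start-end    m [[0 2] [2 4]]) ; 2x2: [[2 3] [12 13]]
(sub-matrix-start-length m [[0 2] [2 4]]) ; 2x4: [[2 3 4 5] [12 13 14 15]]
```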

Possible: Rename dimensionality to size

This is a more common name, and shorter! Rename it?

https://github.com/mikera/matrix-api/blob/master/src/main/clojure/core/matrix.clj#L78

Make Maven builds do AOT compilation

We should AOT compile the core.matrix library for users

unable to connect to project via lein

I'm unable to connect to the matrix-api project via lein. I've updated my local repository, as well as lein.

When I do lein check I get the following error:

Exception in thread "main" java.io.FileNotFoundException: Could not locate clojure/core/matrix/compliance_tester__init.class or clojure/core/matrix/compliance_tester.clj on classpath:

When I connect to the project in the repl with lein repl (in the project root) and try to use clojure.core.matrix this is what happens:

user=> (use 'clojure.core.matrix)
FileNotFoundException Could not locate clojure/core/matrix__init.class or clojure/core/matrix.clj on classpath:   clojure.lang.RT.load (RT.java:432)

Maybe the problem is that the files are located in src/main/clojure/clojure/core/matrix rather than src/main/clojure/core/matrix?

Mechanism to load matrix implementations

We could give users an easy way to load a specific matrix implementation.

I'm thinking something like:

(use 'core.matrix)

(use-matrix-implementation :jblas)

Mechanism would need to do a few things:

look up the implementation in a list of known implementations
require the implementation to ensure it is loaded. In case of failure, do something sane (e.g. give some instructions on how to get the dependency on the classpath)
possibly "hook" into some core.matrix functions, e.g. functions to construct new matrices should produce JBLAS matrices by default
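
The first two steps could be sketched roughly as below. Everything here is hypothetical: the registry contents, the function name `try-load-implementation`, and the namespace `example.hypothetical.jblas-matrix` (which is deliberately one that does not exist, to show the failure path). `clojure.set` stands in for an implementation that is on the classpath.

```clojure
;; Hypothetical registry: implementation keyword -> namespace symbol.
(def known-implementations
  {:jblas 'example.hypothetical.jblas-matrix   ; made-up name, expected to be missing
   :demo  'clojure.set})                       ; stand-in for a loadable namespace

(defn try-load-implementation
  "Looks up k in the registry and requires its namespace.
   Returns a map describing the outcome instead of throwing."
  [k]
  (if-let [ns-sym (get known-implementations k)]
    (try
      (require ns-sym)
      {:status :loaded :namespace ns-sym}
      (catch Exception e
        {:status :failed
         :namespace ns-sym
         :hint (str "Could not load " ns-sym
                    "; check that the implementation is on the classpath.")}))
    {:status :unknown :implementation k}))
```

Returning a map rather than throwing is just one design choice; it makes it easy to print instructions for the user when the dependency is missing.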

N-dimensional pretty printing

Follow on from #15

Pretty-printing of different (non-2D) array sizes currently fails with errors like:

java.lang.UnsupportedOperationException: nth not supported on this type: ScalarWrapper
    at clojure.lang.RT.nthFrom(RT.java:846) 
    at clojure.lang.RT.nth(RT.java:796)
    at clojure.core.matrix$pm.invoke(matrix.clj:999)

pm needs extending to handle these cases

Implement core.matrix support for 2D arrays of primitive doubles

We have 1D arrays of Java doubles supported:

https://github.com/mikera/matrix-api/blob/master/src/main/clojure/core/matrix/impl/double_array.clj

It would potentially be valuable to extend this support to 2D arrays of doubles along the same lines. This is useful for 2 main reasons:

It would provide a fast, lightweight 2D mutable matrix implementation
It may be helpful for interop with Java libraries that use this format

NDArray loading time

Currently, loading NDArray in core.matrix is extremely slow, sometimes taking as much as 20 seconds.

This isn't acceptable for general purpose use of core.matrix, so for the moment I've made NDArray optional so that it loads lazily, see this commit:

9f1cfa4

With this commit, the NDArray initialisation time is only incurred the first time an NDArray is used, e.g.

(time (array :ndarray 1))
"Elapsed time: 17640.514591 msecs"

(time (array :ndarray 1))
"Elapsed time: 0.118685 msecs"

Complete PAssignment implementation for NDArray

This protocol is important for performance, as it performs the general purpose mutable assignment required for many vector/matrix algorithms.

Make NDArray into default implementation

Currently the core.matrix default implementation is :persistent-vector

We should switch this to :ndarray once we are comfortable with the robustness of the NDArray implementation.

Dmitry - important milestone for you during GSoC, I think!

Add API functions for standard matrix decompositions

We should have API functions for most standard matrix decompositions.

In particular we should have:

LU (lower-upper triangular decomposition)
SVD (singular value decomposition)
QR
Cholesky decomposition
probably a few others...

Naming is TBD; we have various proposals:

lu
decompose-lu
lu-decompose
decomposition-lu
lu-decomposition

Need to decide on a consistent and logical naming scheme before we commit to a public API.

Also need to define return values. Current thinking is a map (or defrecord?) that contains clearly labelled values, e.g.

(decompose-lu matrix)
=> {:lower [matrix 1]
    :upper [matrix 2]
    :determinant [double value]}

See:

https://groups.google.com/forum/?fromgroups=#!topic/numerical-clojure/DN8aD__q5BU
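
To show what such a labelled return value might look like in practice, here is a sketch of a Doolittle LU decomposition without pivoting on plain nested vectors. The name `decompose-lu-sketch` is hypothetical, the keys follow the map proposed above, and a real implementation would need pivoting for numerical stability.

```clojure
(defn decompose-lu-sketch
  "Doolittle LU decomposition without pivoting, on a square vector-of-vectors.
   Throws on a zero pivot. Returns a map with :lower, :upper and :determinant."
  [a]
  (let [n (count a)
        identity-rows (mapv (fn [i] (mapv (fn [j] (if (= i j) 1.0 0.0)) (range n)))
                            (range n))]
    (loop [k 0
           l identity-rows
           u (mapv #(mapv double %) a)]
      (if (= k n)
        {:lower l
         :upper u
         ;; det(L) is 1, so det(A) is the product of U's diagonal.
         :determinant (reduce * (map #(get-in u [% %]) (range n)))}
        (let [pivot (get-in u [k k])]
          (when (zero? pivot)
            (throw (ex-info "Zero pivot; this sketch does not pivot" {:column k})))
          (let [rows   (range (inc k) n)
                factor (fn [i] (/ (get-in u [i k]) pivot))
                ;; Record each elimination factor below the diagonal of L.
                next-l (reduce (fn [acc i] (assoc-in acc [i k] (factor i))) l rows)
                ;; Subtract factor * pivot row from each row below the pivot.
                next-u (reduce (fn [acc i]
                                 (let [f (factor i)]
                                   (assoc acc i (mapv (fn [x y] (- x (* f y)))
                                                      (nth acc i) (nth u k)))))
                               u rows)]
            (recur (inc k) next-l next-u)))))))

(decompose-lu-sketch [[4 3] [6 3]])
;; => {:lower [[1.0 0.0] [1.5 1.0]], :upper [[4.0 3.0] [0.0 -1.5]], :determinant -6.0}
```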

Add elementary row operations

Add the elementary row operations. (switching, multiplication, and row addition; see http://en.wikipedia.org/wiki/Elementary_matrix).
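
For reference, the three operations can be sketched on plain nested vectors as follows. The function names and argument orders are assumptions for illustration, not a proposed API.

```clojure
(defn swap-rows
  "Row switching: exchanges rows i and j."
  [m i j]
  (assoc m i (nth m j) j (nth m i)))

(defn multiply-row
  "Row multiplication: scales row i by the factor k."
  [m i k]
  (assoc m i (mapv #(* k %) (nth m i))))

(defn add-row
  "Row addition: adds k times row j to row i."
  [m i j k]
  (assoc m i (mapv + (nth m i) (map #(* k %) (nth m j)))))

(def m [[1 2] [3 4]])

(swap-rows m 0 1)     ; => [[3 4] [1 2]]
(multiply-row m 0 2)  ; => [[2 4] [3 4]]
(add-row m 1 0 -3)    ; => [[1 2] [0 -2]]
```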

Terminology: use 'array' for general case, 'matrix' only for 2D arrays?

Hi all

So I'm a bit confused at present looking at PImplementation and PDimensionInfo. They're addressing the general NDArray case (great!), but are using the term "matrix" not just for a (2D) matrix, but also for vectors and for general ND arrays.

I suggest we follow tradition in referring to the general case as an 'array' (equally happy with 'ndarray' as in numpy, 'tensor' as in matlab, or similar), and use 'matrix' only for the 2D case. I've just never heard anyone refer to a non-2-dimensional array as a matrix.

Also the "matrix" terminology adds extra unnecessary confusion around the distinction between a 1D array, and a 2D matrix whose shape is 1xN (row matrix?) or Nx1 (column matrix?).

I'd like to suggest that referring to a 'row vector' or 'column vector' is a bad idea because of the potential for confusion about what it means (a 1D vector, or a 2D matrix whose shape is 1xN or Nx1 ?). For reasons discussed on the mailing list, in a general NDArray framework it's preferable not to have special cases for 1D arrays identifying them with 2D matrices, just use a consistent broadcasting rule for all array shapes. A 1D array doesn't really have any natural orientation as a row or a column in a matrix context, at least not unless the chosen broadcasting rule makes it so.

Suggest using just 'vector' for the first case, and 'column matrix' or 'row matrix' for the latter, to make this more explicit.

Happy to put a patch together if people agree.

NDArray add-product! not working for simple case

(add-product! (array [1]) (array [2]) (array [3]))
=> ArrayIndexOutOfBoundsException 1  clojure.core.matrix.impl.ndarray.NDArray (ndarray.clj:1169)

Lazy instantiation of specialised NDArray types

Currently NDArray is pre-defined with a specific set of specialised types (currently Object and double).

Ideally, it should be possible to create (at runtime if necessary) a new specialised NDArray for any arbitrary type.

This would give several advantages:

It makes the NDArray useful for custom specialised types, e.g. Complex
Users can get better performance for their specific types
It makes it possible to initialise NDArrays for primitive types like byte if needed
It would also be possible to lazy-load Object and double, to reduce start up time issues / AOT compilation requirement

Matrix inverse should be supported

I also couldn't find the function for the inverse of a matrix.

Reflection warnings in NDArray

The current NDArray implementation has some reflection warnings whenever mvn test is run:

Reflection warning, clojure/core/matrix/impl/ndarray.clj:123 - reference to field data can't be resolved.
Reflection warning, clojure/core/matrix/impl/ndarray.clj:124 - reference to field ndims can't be resolved.
Reflection warning, clojure/core/matrix/impl/ndarray.clj:125 - reference to field shape can't be resolved.
Reflection warning, clojure/core/matrix/impl/ndarray.clj:126 - reference to field strides can't be resolved.
Reflection warning, clojure/core/matrix/impl/ndarray.clj:127 - reference to field offset can't be resolved.

Sparse matrix support?

Do any of the supported core.matrix implementations cover sparse matrices? If not, what would you recommend for inclusion?

Finalise matrix operation function naming

Need to decide on a consistent function name set for matrix operations.

There seem to be a lot of options, e.g.

mul (vecmath, jblas, vectorz)
mult (clatrix, colt)
multiply (English!)

Tempted to go with mul for consistency with majority of underlying implementations - any better ideas?

Others are probably easier - use the obvious three letter abbreviations if we go with mul

Maths functions can mimic java.lang.Math like clatrix?

exp
log10
signum
etc.

Other operations should generally use full English words, especially where they match mathematical terms:

negate
determinant
trace
etc.

Matrix division fails on 0.10

I encountered this exception on matrix division while trying out 0.10; I do not see this problem on 0.9. The REPL session in the screenshot below defines a 3x3 matrix and then divides it by itself to demonstrate the issue:

The exception I get is the following:

java.lang.ClassCastException: clojure.core.matrix.impl.ndarray.NDArray cannot be cast to java.lang.Number

John

Matrix construction functions

We need a way of constructing matrices that:

Is easy to use
can be hooked into by different matrix implementations

The idea is that something like:

(def M (matrix [[1 0] [0 1]]))

Will construct a 2x2 identity matrix using the current matrix implementation we are using.

what does coerce-param in protocols do?

What does 'pre-scale' mean as opposed to 'scale'?

Bit confused about this.

(defprotocol PMatrixScaling
  "Protocol to support matrix scaling by scalar values"
  (scale [m a])
  (pre-scale [m a]))

Surely pre- and post-multiplication by a scalar are the same thing (just element-wise scaling)? Or did you have something else in mind?

Broadcasting for element-wise binary operations

We should have broadcasting support for element-wise binary operations, e.g. matrix addition.

Design principles:

Should work like NumPy (http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Should preserve fast path when broadcasting is not needed
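
As a sketch of the NumPy-style rule on shapes only (no data is touched): shapes are aligned from the trailing dimension, missing leading dimensions count as 1, and two dimensions are compatible when they are equal or one of them is 1. `broadcast-shape` is a hypothetical helper name.

```clojure
(defn broadcast-shape
  "Returns the broadcast shape of shapes a and b (vectors of dimension sizes),
   or throws if they are incompatible."
  [a b]
  (let [n   (max (count a) (count b))
        ;; Left-pad the shorter shape with 1s so both have n dimensions.
        pad (fn [s] (concat (repeat (- n (count s)) 1) s))]
    (mapv (fn [x y]
            (cond
              (= x y) x
              (= x 1) y
              (= y 1) x
              :else (throw (ex-info "Incompatible shapes" {:a a :b b}))))
          (pad a) (pad b))))

(broadcast-shape [3 1] [4])   ; => [3 4]
(broadcast-shape [2 3] [2 3]) ; => [2 3], where a fast path could skip broadcasting
```

Checking `(= a b)` on the shapes first is one simple way to keep the fast path when no broadcasting is needed.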

.toString broken for NDArray?

Currently .toString appears to be broken for NDArray (latest develop branch)

(def m (new-array :ndarray [3 3]))
(.toString m)
RuntimeException Can't coerce to vector: class clojure.core.matrix.impl.ndarray.NDArray  clojure.core.matrix.protocols/persistent-vector-coerce (protocols.clj:489)

PImplementation

Hi,

I have some questions about the new PImplementation: what should new-vector, new-matrix and new-matrix-nd return?

Should they return another vector and other matrices?

Is there really a point in doing so?

I mean, it leads to something like this:

(def matrix (matrix [[1 2] [2 1]]))
(new-vector matrix [1 2 3 4])

to get a vector. Why do we need to pass the first matrix to new-vector?
Is it to get the same implementation for the vector?

Am I missing something?

I would instead define something like new-vector-parallel-colt and new-matrix-parallel-colt, and then a big case statement:

;; new-vector-parallel-colt and new-vector-other are placeholders, one per implementation
(defn new-vector [data & {:keys [type] :or {type :parallel-colt}}]
  (case type
    :parallel-colt (new-vector-parallel-colt data)
    :other (new-vector-other data)))

Sorry if I am missing the obvious...

Normalise benchmarks to per-operation timing

New benchmarks look good Dmitry!

I think it would be an improvement if the benchmarks normalised the results by dividing by the number of operations performed, so that the result is a per-operation timing. This would let us know precisely how many nanoseconds each operation is taking.
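
The normalisation itself is a one-line division; a minimal sketch, assuming a hypothetical helper `nanos-per-op` and ignoring JIT warm-up (which a real benchmark harness would have to handle):

```clojure
(defn nanos-per-op
  "Runs the zero-argument function f n times and returns the average
   wall-clock time per call in nanoseconds."
  [f n]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (double (- (System/nanoTime) start)) n)))

;; Example: average cost of summing 100 numbers, over 1000 runs.
(nanos-per-op #(reduce + (range 100)) 1000)
```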

How is dot different from mul ?

Isn't the dot product of two vectors the same as mul? Why is this a unique fn in the API?

https://github.com/mikera/matrix-api/blob/master/src/main/clojure/core/matrix.clj#L249
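
For what it is worth, the two differ if `mul` is read as element-wise multiplication, which is an assumption here since the thread does not settle what `mul` means for two vectors. A sketch on plain vectors, with hypothetical function names:

```clojure
(defn elementwise-mul
  "Multiplies corresponding elements; the result is a vector."
  [a b]
  (mapv * a b))

(defn dot-product
  "Sums the element-wise products; the result is a scalar."
  [a b]
  (reduce + (map * a b)))

(elementwise-mul [1 2 3] [4 5 6]) ; => [4 10 18]
(dot-product     [1 2 3] [4 5 6]) ; => 32
```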

Add .DS_Store to .gitignore

Ignore Mac OS X specific files in the file structure.

support the rank of a matrix

Maybe the API should support the rank of a matrix?

Move protocols into separate namespace

Maybe protocols should be moved into a separate namespace:

Pros:

Hide protocols from API users
Allow protocols to be used independently of core.matrix
Might solve circular dependency issues with default implementations (core.matrix requires default implementation requires protocol)

PDoubleArrayOutput generalization and extraction

We could extract PDoubleArrayOutput to a separate namespace and generalize it a bit, allowing implementations to communicate via long arrays too (at least).

equals not working for different-sized NDArrays

(equals (array :ndarray 1) (array :ndarray [1 1]))
=> true

I believe these should be non-equal: differently shaped arrays should not be equal to each other (even if the broadcast version would be).
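
The distinction can be sketched on plain numbers and nested vectors. All three functions below are hypothetical: `broadcasting-equals` mimics a comparison that broadcasts scalars, and `shape-strict-equals` adds the shape check argued for above.

```clojure
(defn shape
  "Shape of a scalar ([]) or a regular nested vector."
  [x]
  (if (vector? x)
    (into [(count x)] (shape (first x)))
    []))

(defn broadcasting-equals
  "Numeric equality that broadcasts a scalar against a vector."
  [a b]
  (cond
    (and (vector? a) (vector? b)) (and (= (count a) (count b))
                                       (every? true? (map broadcasting-equals a b)))
    (vector? a) (every? #(broadcasting-equals % b) a)
    (vector? b) (every? #(broadcasting-equals a %) b)
    :else (== a b)))

(defn shape-strict-equals
  "Equal only if the shapes match and all elements are equal."
  [a b]
  (and (= (shape a) (shape b))
       (broadcasting-equals a b)))

(broadcasting-equals 1 [1 1]) ; => true
(shape-strict-equals 1 [1 1]) ; => false
```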

Add mutative counterparts in API

There are a few protocols without mutative counterparts, namely:

PNegation
PExponent
PSquare

Is it by design?

NDArray support

Add support for general purpose N-dimensional arrays (like NumPy ndarray)

Features:

Allow arbitrary objects (not just numbers)
Allow in-place modifications
Allow "views" - i.e. slices / subsets of other arrays that can be modified via the view
Can be used as 1D / 2D / 3D vectors / matrices / tensors if filled with java.lang.Numbers

Initial implementation stubs:

https://github.com/mikera/matrix-api/blob/master/src/main/clojure/core/matrix/impl/ndarray.clj

Pretty-printing for matrices

Should have a pretty-printer for matrix output, allowing at least the following:

Alignment of columns/ decimal places
Truncation of double values to a fixed number of decimal places
Max size truncation for very large matrices (maybe display 20x20 max by default?)

Printed output should still be readable clojure data.

Output might be something like:

[[-1.000  0.000  0.000]
 [ 0.000  3.141  0.000]
 [ 0.000  0.000 -2.718]]
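
A minimal sketch that produces exactly this layout for small 2D inputs, using fixed-width formatting. The function names are hypothetical, the width and precision are hard-coded, values are rounded rather than truncated, and size truncation for large matrices is not handled.

```clojure
(require '[clojure.string :as string])

(defn format-element
  "Formats x with 3 decimal places, right-aligned in 6 characters."
  [x]
  ;; Locale/ROOT keeps "." as the decimal separator regardless of system locale.
  (String/format java.util.Locale/ROOT "%6.3f" (object-array [(double x)])))

(defn format-row [row]
  (str "[" (string/join " " (map format-element row)) "]"))

(defn format-matrix
  "Formats a vector of row vectors as aligned, readable Clojure data."
  [m]
  (str "[" (string/join "\n " (map format-row m)) "]"))

(println (format-matrix [[-1 0 0] [0 3.141 0] [0 0 -2.718]]))
```

Because the output is still a nested vector literal, it can be read back with `read-string`.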

Convention on return value of mutating operations

Is there any particular reason why mutating operations don't return the mutated object in the default implementation? (assign! in impl/default.clj at line 187, for example)

Add API functions for setting rows / columns / sub-matrices

Ideas:

(set-row m i v)
(set-column m j v)
(set-submatrix-at m i j small-matrix)

Parallel Colt

I tried to make a very first implementation of the matrix API for Parallel Colt (PC); however, it is a huge mess... I guess I am missing something in the multimethod stuff...

It is not complete: https://gist.github.com/4525672

What I noticed is that PC does not have a real and clear abstraction of a matrix...

It may be just me, but I guess that implementing PC is going to be a big mess...

Cheers

Simone

AOT compilation fails in 0.9.0

I noticed that AOT compilation fails with core.matrix 0.9.0.

I created a fresh Leiningen project, added core.matrix 0.9.0 as a dependency and added the existing core namespace to the project.clj :aot key. When I then require core.matrix in the core namespace and do a lein compile, I get the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: No implementation of method: :implementation-key of protocol:
#'clojure.core.matrix.protocols/PImplementation found for class: clojure.core.matrix.impl.ndarray.NDArray, compiling:(ndarray.clj:351:1)

When I revert back to core.matrix 0.8.0, it compiles just fine.

Protocols revision

We need to revise distribution of methods between protocols at some point. Here is an example why this is needed:

(defprotocol PZeroDimensionAccess 
  (get-0d [m])
  (set-0d! [m value]))

(defprotocol PZeroDimensionSet
  (set-0d [m value]))

There is no semantic reason why these two protocols should be different.

Missing cross product for vectors?

Is there a function for the cross product between vectors? I couldn't find it anywhere in the API. Maybe you could point me in the right direction.
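
In the meantime, the 3D cross product is short enough to write directly on plain vectors. This is a sketch with a hypothetical function name, not a core.matrix API:

```clojure
(defn cross-product
  "Cross product of two 3-element vectors."
  [[a1 a2 a3] [b1 b2 b3]]
  [(- (* a2 b3) (* a3 b2))
   (- (* a3 b1) (* a1 b3))
   (- (* a1 b2) (* a2 b1))])

(cross-product [1 0 0] [0 1 0]) ; => [0 0 1]
(cross-product [1 2 3] [4 5 6]) ; => [-3 6 -3]
```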