
vectorz's Introduction


Fast double-precision vector and matrix maths library for Java, based around the concept of N-dimensional arrays.

This library is designed for use in games, simulations, raytracers, machine learning etc. where fast vector maths is important.

Some highlights:

  • Vectorz can do over 1 billion 3D vector operations per second on a single thread.
  • Specialised matrix types for efficient optimised operations (identity, diagonal, sparse etc.).
  • Support for arbitrary n-dimensional numerical arrays.

Status

Vectorz is reasonably mature, battle-tested and used in production applications. However, the API is still evolving as new features are added, so you can expect a few minor changes, at least until version 1.0.0.


Documentation

See the Vectorz Wiki

Example usage

    Vector3 v=Vector3.of(1.0,2.0,3.0);
    v.normalise();                       // normalise v to a unit vector

    Vector3 d=Vector3.of(10.0,0.0,0.0);
    d.addMultiple(v, 5.0);               // d = d + (v * 5)

    Matrix33 m=Matrixx.createXAxisRotationMatrix(Math.PI);
    Vector3 rotated=m.transform(d);      // rotate 180 degrees around x axis

Key features

  • Supports double typed vectors of arbitrary size
  • Both mutable and immutable vectors are supported, enabling high performance algorithms
  • Support for any size matrices, including higher dimensional (NDArray) matrices
  • Ability to create lightweight view vectors (e.g. to access subranges of other vectors)
  • Library of useful mathematical functions on vectors
  • Vectors have lots of utility functionality implemented - Cloneable, Serializable, Comparable etc.
  • Various specialised vector/matrix types (e.g. identity matrices, diagonal matrices)
  • Support for affine and other matrix transformations
  • Sparse arrays for space-efficient large vectors and matrices where most elements are zero
  • Operator system provides composable operators that can be applied to array elements
  • Input / output of vectors and matrices - in various formats including readable edn format
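The composable-operator idea above can be illustrated with a plain-Java sketch using `java.util.function.DoubleUnaryOperator`. This is a hypothetical helper, not the library's actual `Op` classes, which are considerably richer:

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class Ops {
    // Apply an element-wise operator to every entry of an array, in place
    static void applyTo(double[] data, DoubleUnaryOperator op) {
        for (int i = 0; i < data.length; i++) data[i] = op.applyAsDouble(data[i]);
    }

    public static void main(String[] args) {
        // Compose two operators: scale by 2, then offset by 1
        DoubleUnaryOperator scaleThenOffset =
                ((DoubleUnaryOperator) (x -> 2 * x)).andThen(x -> x + 1);
        double[] v = {1, 2, 3};
        applyTo(v, scaleThenOffset);
        System.out.println(Arrays.toString(v)); // [3.0, 5.0, 7.0]
    }
}
```

Because operators compose before they are applied, a chain of transformations can be pushed over the array in a single pass.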

Vectorz is designed to allow the maximum performance possible for vector maths on the JVM.

This focus has driven a number of important design decisions:

  • Support for sparse vectors and other specialised array types
  • Specialised primitive-backed small vectors (1, 2, 3 and 4 dimensions) and matrices (1x1, 2x2, 3x3 and Mx3)
  • Abstract base classes preferred over interfaces to allow more efficient method dispatch
  • Multiple types of vector are provided for optimised performance in special cases
  • Hard-coded fast paths for most common 2D and 3D operations
  • Vector operations are generally not thread safe, by design
  • Concrete classes are generally final

If you have a use case that isn't yet well optimised then please post an issue - the aim is to make all common operations as efficient as possible.

vectorz's People

Contributors

drone29a, farahhariri, haosdent, hoijui, infojunkie, mdekstrand, mikera, moygit, pengyunie, prasant94, rosejn, sosotughushi


vectorz's Issues

Failing case of QR decomposition

The following QR decomposition case appears to fail:

[[  0 , 0 ],
 [ -1 , 0 ]]

with results

Q= [[  0.0,  1.0 ],
    [  1.0,  0.0 ]]
R= [[ -1.0,  0.0 ],
    [  0.0, -1.0 ]]
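One quick way to confirm the failure is to multiply the reported Q and R back together and compare with the original matrix. The following standalone sketch (plain arrays, not the library's types) shows that QR differs from A:

```java
import java.util.Arrays;

public class QrCheck {
    // Multiply two 2x2 matrices represented as double[2][2]
    static double[][] multiply(double[][] p, double[][] q) {
        double[][] r = new double[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    r[i][j] += p[i][k] * q[k][j];
        return r;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 0}, {-1, 0}};   // original matrix
        double[][] q = {{0, 1}, {1, 0}};    // reported Q (orthogonal)
        double[][] r = {{-1, 0}, {0, -1}};  // reported R
        // QR = [[0, -1], [-1, 0]], which differs from A in the (0,1) entry
        System.out.println(Arrays.deepToString(multiply(q, r)));
    }
}
```

Q itself is orthogonal and R is triangular, so the individual factors look plausible in isolation; only the round-trip product reveals the bug.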

Performance issue (4x) with indexed sparse vectors and distanceSquared

Replacing:

(.distanceSquared sparse-indexed-left sparse-indexed-right)

with:

(defn distance-squared
  ^Double [^SparseIndexedVector lhs ^SparseIndexedVector rhs]
  (let [result (.clone lhs)]
    (.sub result rhs)
    (.magnitudeSquared result)))

Resulted in around a 4x performance gain.

My guess is there is a bunch of work to be done around special-casing operations between types (even higher-order operations) so that when using specialised types the full benefits of the type are realised.
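The kind of special-casing suggested here could look like the following standalone sketch, which computes the squared distance by merging two sorted index arrays in a single pass instead of cloning and subtracting. The (indices, values) representation is hypothetical, not the library's internal one:

```java
public class SparseDistance {
    // Squared distance between two sparse vectors, each given as a sorted
    // index array plus a parallel value array (all other elements are zero)
    static double distanceSquared(int[] ia, double[] va, int[] ib, double[] vb) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < ia.length && j < ib.length) {
            if (ia[i] == ib[j]) {            // both vectors have this index
                double d = va[i++] - vb[j++];
                sum += d * d;
            } else if (ia[i] < ib[j]) {      // only the left vector is non-zero here
                sum += va[i] * va[i]; i++;
            } else {                         // only the right vector is non-zero here
                sum += vb[j] * vb[j]; j++;
            }
        }
        while (i < ia.length) { sum += va[i] * va[i]; i++; }  // left-only tail
        while (j < ib.length) { sum += vb[j] * vb[j]; j++; }  // right-only tail
        return sum;
    }
}
```

The merge pass touches only the non-zero entries of both vectors, so no temporary vector is ever materialised.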

Slow SparseColumnMatrix innerProduct

SparseColumnMatrix's innerProduct function is extremely slow with the current implementation. I was using it on a 40000x41262 SparseColumnMatrix and it took almost 9 minutes (531215.413798 msecs). The following Clojure implementation is able to produce the same output in around 1 second on the same machine.

(defn innerProduct
  [^SparseColumnMatrix A ^Vector x]
  (let [_M ^SparseColumnMatrix (m/clone A)
        b  ^Vector  (Vector/createLength (m/row-count _M))
        N  ^Integer (int (m/column-count _M))]
    (dotimes [c N]
      (let [col (.getColumn _M c)]
        (.multiply col (.get x c))
        (.add b col)))
    b))

I think the addMultipleToArray method is not very memory efficient.

Make testing deterministic

In some cases we have non-deterministic test cases.

We should convert to deterministic tests for all of these (with fixed or deterministically generated test data)

includeIndices causes memory leak when adding many times

I've got a very large matrix where I need to add a vector to a column, perform an operation, and then subtract the original vector (leaving me with what should be the original matrix).

However, I'm experiencing a memory leak because of (I believe) the includeIndices functions, which seem to be really eating up space every time I add and subtract.

Here's my code:

        // oldMatrix is a SparseRowMatrix made of SparseIndexedVectors
        SparseColumnMatrix proximities = oldMatrix.toSparseColumnMatrix();
        for (int index = 0; index < matrixSize; index++) {
            matrix.getColumnView(index).add(vector);

            // do other things

            matrix.getColumnView(index).sub(vector);
        }

Any ideas on this behavior from includeIndices?

How to serialize a mikera.arrayz.INDArray?

I'm using mikera.arrayz.INDArray in my program. Now I get a problem with INDArray's serialization/ deserialization.

    mikera.arrayz.INDArray data = tensor1.data();

    // fastjson
    //com.alibaba.fastjson.JSONException: write javaBean error, class mikera.matrixx.impl.SparseRowMatrix, Can't slice a scalar!
    com.alibaba.fastjson.JSON.toJSON(data);


    // GSON
    com.google.gson.Gson gson = new Gson();
    // it's ok
    String json = gson.toJson(data);
    // Caused by: java.lang.UnsupportedOperationException: Interface can't be instantiated! Interface name: mikera.arrayz.INDArray
    gson.fromJson(json, INDArray.class);

    // jackson
    ObjectMapper mapper = new ObjectMapper();
    // org.nd4j.shade.jackson.databind.JsonMappingException: Direct self-reference leading to cycle (through reference chain: mikera.matrixx.impl.SparseRowMatrix["rows"]->java.util.ArrayList[0]->mikera.vectorz.impl.SparseIndexedVector["transpose"])
    String json1 = mapper.writeValueAsString(data);

Mistake in JoinedArray.get(int x, int y)

First, thank you for providing such a highly efficient library. But I found a small mistake in the function get(int x, int y) when I used JoinedArray.

if ((y<0)||(y>=sliceCount())) {
	throw new IndexOutOfBoundsException(ErrorMessages.invalidIndex(this, x,y));
}

The code above has a problem: y should index the second dimension of the array, but sliceCount() returns the size of the first dimension. As a result, this check incorrectly throws an exception whenever the second dimension is larger than the first.

Small Bug in SparseRowMatrix

Hi there,

I'm using your package and caught a small bug, thought it'd be useful to report it.
Line 381 of SparseRowMatrix:

    for (int j = 0; j < cols; ++j) {

should be changed to:

    for (int j = 0; j < a.cols; ++j) {

Cheers,

Publish to Maven Central

Right now, vectorz is not published to Maven Central. This makes it a bit harder for Maven-based projects to find and use. Also, since LensKit is published to Central, vectorz would need to be published to Central if we are going to use it.

Add rotate functionality

It is useful to be able to rotate vectors, matrices and arrays around one or more axes.

Design thoughts:

  • Should return rotated mutable views
  • Should support rotation around one or all dimensions

clarification needed: is VectorMatrixMN expected to be sparse?

Would it be inefficient to use VectorMatrixMN for a dense matrix?

Or maybe the better question is, if I have a Matrix I need to build, and I know the number of rows in advance, and the row data could either be generated in a Vector or an array, what would be the most efficient Matrix class to use?

It's actually going to be the second multiplicand, so probably a matrix stored as columns would be best, but I don't see a non-sparse option for that.

I have an n x n SparseRowMatrix that will be multiplied by an n x k dense matrix, where n is large (tens or hundreds of thousands) and k is small (under 10).

Thanks in advance for your help and advice!

AltLU needs handling of Pivot matrix?

Hi @haosdent I think the AltLU code you recently contributed needs some handling of the Pivot matrix - otherwise we don't seem to get the required result LU = PA.

See the new interface ILUP I added, and also the SimpleLUP code that is an alternative implementation.

Solving linear systems with vectorz

I am trying to solve a sparse linear system with Vectorz, but it seems to lack some basic algorithms to do it efficiently.

I want to solve Ku=F, where K is a sparse matrix and F a dense vector.
The classic way to solve this system is to get a decomposition of K, let's say K=LU, and then solve the easy triangular problems Ly=F and Uu=y.

I am not sure the implemented factorizations in Vectorz are optimized for sparse matrices (the implementations appear to come from a dense matrix manipulation library), but most importantly Vectorz seems to lack the forward and backward substitution algorithms needed to solve triangular problems.
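For reference, forward and backward substitution are short algorithms. A standalone dense sketch follows (a sparse-aware version would iterate only the non-zero entries of each row):

```java
public class TriangularSolve {
    // Solve L y = f for lower-triangular L (forward substitution)
    static double[] forward(double[][] L, double[] f) {
        int n = f.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = f[i];
            for (int j = 0; j < i; j++) s -= L[i][j] * y[j]; // subtract known terms
            y[i] = s / L[i][i];
        }
        return y;
    }

    // Solve U u = y for upper-triangular U (backward substitution)
    static double[] backward(double[][] U, double[] y) {
        int n = y.length;
        double[] u = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int j = i + 1; j < n; j++) s -= U[i][j] * u[j]; // subtract known terms
            u[i] = s / U[i][i];
        }
        return u;
    }
}
```

Given K = LU, solving Ku = F is then forward(L, F) followed by backward(U, y), as described above.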

Strange exception with join-along

I've been playing with vectorz from core.matrix (thanks for the great work, BTW), and I met this exception with calling join-along that I can't understand:

(clojure.core.matrix/join-along
 1
 (clojure.core.matrix/array :vectorz [[1 2][3 4][5 6]])
 (clojure.core.matrix/array :vectorz [7 8 9]))
=> IllegalArgumentException Incompatible shapes: [3,2] vs. [3]  mikera.arrayz.impl.JoinedArray.join (JoinedArray.java:33)

I don't suppose it was meant to happen? When I did the same thing with persistent-vector and also ndarray-double, it worked out fine:

(clojure.core.matrix/join-along 1 [[1 2][3 4][5 6]] [7 8 9])
=> [[1 2 7] [3 4 8] [5 6 9]]

I'm using [net.mikera/core.matrix "0.52.1"] and [net.mikera/vectorz-clj "0.44.1"].

Different implementations of innerProduct for row vs column matrices?

If we compare

public AVector innerProduct(SparseRowMatrix m) {
with
public AVector innerProduct(SparseColumnMatrix m) {
, I don't understand why the logic isn't simply to find getRow(i) for SparseRowMatrix like it is for SparseColumnMatrix. Is one of the above methods more efficient than the other?

Aside: if this is not the proper forum for these kinds of questions, let me know! Thanks

sparse matrix equality

The equals function behaves somewhat strangely for SparseRowMatrix and SparseColumnMatrix. For my problem I need a SparseRowMatrix, but the data are coming in by column. So I store the data in a SparseColumnMatrix first, then use toSparseRowMatrix to convert it to a SparseRowMatrix, thinking it may be faster since I'm using Clojure. However, when I compare the converted SparseRowMatrix with the original SparseColumnMatrix using the equals function, it keeps giving me false. Here is a simple Clojure snippet:

==> (.equals ^SparseColumnMatrix path-matrix-column (.toSparseRowMatrix ^SparseColumnMatrix path-matrix-column))
==> false

The sparse matrix I have is 40000x41262, and could potentially be much larger. When I tried to create a small matrix manually, say 10x10, and convert it to the other form, it seems to be ok.

Also, if I convert the sparse column matrix to sparse row matrix then use equality on the sparse row matrix, it gives true:

==> (def path-matrix-row (.toSparseRowMatrix ^SparseColumnMatrix path-matrix-column))
==> (.equals ^SparseRowMatrix path-matrix-row (.toSparseColumnMatrix ^SparseRowMatrix path-matrix-row))
==> true

Also

==> (.equals ^SparseRowMatrix path-matrix-row ^SparseColumnMatrix path-matrix-column)
==> false ;; returns instantly
==> (equals ^SparseColumnMatrix path-matrix-column ^SparseRowMatrix path-matrix-row)
==> false ;; takes quite a long time

Add norm function

We should have a function to compute an arbitrary p-norm on matrices and vectors
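A minimal sketch of such a p-norm over a raw double array (the real implementation would live on the vector/matrix classes and handle the p = infinity case specially):

```java
public class Norms {
    // General p-norm of a vector: (sum |x_i|^p)^(1/p), for p >= 1
    static double pNorm(double[] x, double p) {
        double sum = 0.0;
        for (double v : x) sum += Math.pow(Math.abs(v), p);
        return Math.pow(sum, 1.0 / p);
    }
}
```

With p = 2 this reduces to the usual Euclidean magnitude, and p = 1 gives the sum of absolute values.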

Add maxElementIndex to vectors

The vector classes have a maxAbsElement method. It'd be useful to have a maxElement, as well as maxElementIndex and maxAbsElementIndex methods returning the indexes of the largest elements. Useful for things like computing probability distributions, then grabbing the most likely index and using it for something else (indexing into whatever the vector is a probability distribution over).
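A sketch of the proposed maxElementIndex over a raw double array (first occurrence wins; the actual signature in the library may differ):

```java
public class VectorOps {
    // Index of the largest element; returns the first occurrence on ties
    static int maxElementIndex(double[] x) {
        int best = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] > x[best]) best = i;
        }
        return best;
    }
}
```

For the probability-distribution use case described above, this is exactly an argmax over the vector's elements.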

Make matrices serializable

ImmutableMatrix and Matrix are not currently (as of 0.27.0) serializable.

I'm prototyping a vectorz-based version of LensKit's FunkSVD recommender, and for now will work around this by manually serializing, but long-term it would be more convenient to have the matrices serializable.

Getting a more functional Style for Cholesky

Hi @prasant94 - The initial Cholesky stuff looks good, but I think we should try to convert this to a more "functional" interface.

By which I mean:

  • There should be no public mutator functions (e.g. setExpectedMaxSize) - any such values should be set as final fields when objects are constructed. It's OK to mutate as an implementation detail of course
  • Immutable result objects should be returned from decompositions
  • There is a pure functional interface to make the decomposition happen.
  • API users should never need to use anything from the "*.impl" packages - this should be purely for implementation details

As an example, I've converted the old Cholesky code to return a CholeskyResult in this style

Consistent Linear Algebra API

I believe it is worth making a consistent API for linear algebra operations as follows:

  • API classes in the mikera.matrixx.algo.* package
  • Static functions
  • Return an IXXXXResult for decompositions (other algorithms can return the result type directly)
  • Use null for failed decompositions (e.g. singular matrices etc.)

This consistent API gives us a few advantages:

  • Users know where to go (it is all in one place)
  • We maintain common conventions
  • We hide implementation details
  • We have the option to upgrade to better implementations without changing the API (and breaking downstream users....)

@prasant94 - does that make sense to you? Think it applies to most of the stuff you are importing....

Unable to create NDArray for multidimensional arrays

Good day guys, I have to convert this python code

import numpy as np
x = np.array([[0,6], [1, 2], [3,5], [4,7], [2,9], [10,1]])
print(x[-4])         # Prints fourth to last [3,5]

I want to convert it to Java, so I stumbled upon this library. I made a comparison with nd4j and found it fast and simple. Since I was unable to get a slice with a negative index the way the Python code works, I decided to try vectorz, but I am having a problem creating the multidimensional array. This is what I have done:

public static void main(String[] args) {
    double[][] x = new double[][]{{0, 6}, {1, 2}, {3, 5}, {4, 7}, {2, 9}, {10, 1}};
    double[] xx = new double[]{0, 6, 1, 2, 3, 5, 4, 7, 2, 9, 10, 1};
    NDArray v=NDArray.wrap(xx, new int[]{2,2});
    System.out.println(v.slice(-4));
}

So how do I create a NDArray or INDArray from a multidimensional array using vectorz?
and how to slice it to return the same result as the one python gave me?
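Two things seem to be going on here. First, the shape passed to NDArray.wrap in the snippet above should presumably be {6, 2} (six rows of two), not {2, 2}, which covers only four of the twelve elements. Second, a Python-style negative index simply wraps by the dimension length, which is easy to do by hand. A standalone sketch of the index arithmetic (the commented vectorz calls are shown for context only):

```java
public class NegIndex {
    // Python-style index: -k means length - k; non-negative indices pass through
    static int wrap(int i, int length) {
        return i < 0 ? i + length : i;
    }

    public static void main(String[] args) {
        double[] xx = {0, 6, 1, 2, 3, 5, 4, 7, 2, 9, 10, 1};
        // NDArray v = NDArray.wrap(xx, new int[]{6, 2});  // six rows of two
        // v.slice(wrap(-4, 6))                            // would give the row [3.0, 5.0]
        System.out.println(wrap(-4, 6)); // 2
    }
}
```

With the corrected shape, slice(wrap(-4, 6)) selects row index 2, matching numpy's x[-4].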

Improve efficiency of SparseRowMatrix.setRow()?

In the following code, I am finding that the t.setRow(i, uniformRow) is very slow. In this context, t is a SparseRowMatrix whose rows are all SparseIndexedVector (until I try to replace about a third of them with a RepeatedElementVector using setRow).

    double uniformWeight = (double) 1 / n; // used when the rowSum is zero
    RepeatedElementVector uniformRow = new RepeatedElementVector(n, uniformWeight);
    boolean[] zeroRows = new boolean[n]; // all values default to false
    for (int i = 0; i < n; i++) {
        if (DEBUG > 0 && i % 100 == 0) {
            System.out.print(i + " ");
        }
        AVector row = t.getRow(i);
        double rowSum = row.elementSum();
        if (rowSum > 0) {
            row.divide(rowSum); // modifies in place
        } else {
            t.setRow(i, uniformRow);
        }

When t is a 32K x 32K matrix, with about 10K zero rows that branch to the setRow, executing this code takes about 5.5 hours on my development machine. If I replace the setRow with setting a boolean in a boolean array, the whole normalization takes under a minute. (The row.divide that happens for 2/3 of the rows is super fast.) I am using version 0.27.0

Is there any way to make that setRow a more efficient operation on a SparseRowMatrix?

License Question

Hi Mike!
I really like your vectorz library and I would like to use it in my applications, sadly the LGPL License makes this not possible for me. Would you consider switching to another license model such as MIT or Apache License to open up the possible uses of this nice library of yours?
As of now I am looking at la4j as a library, but I would like to use vectorz because of its design and performance. Thank you for your consideration.

Can't create SparseColumnMatrix

When doing SparseColumnMatrix M = SparseColumnMatrix.create(3, 3);

I get:

method create in class mikera.matrixx.impl.SparseColumnMatrix cannot be applied to given types;
  required: mikera.vectorz.AVector[]
  found: int,int
  reason: varargs mismatch; int cannot be converted to mikera.vectorz.AVector

SparseRowMatrix.create(3,3); works, though.

Add sparse vector implementation

Various applications (machine learning, collaborative filtering, recommender systems) depend on large sparse vectors, which contain relatively few non-zero values relative to their size.

Proposal is to add sparse vectors with the following attributes:

  • Space requirement grows only with the number of non-zero elements
  • Fast element access O(log32 n) or better
  • Fast updates O(log32 n) or better
  • All other vector operations implemented with reasonable efficiency.

Efficiency is probably best achieved with a shallow tree data structure, though a hashmap based sparse vector could also be an option.
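With sorted index/value arrays, logarithmic element access falls out of a binary search. A minimal standalone sketch of that representation (not the library's eventual implementation, which the proposal suggests may be tree- or hashmap-based):

```java
import java.util.Arrays;

public class SparseSketch {
    // Sparse element access: indices is sorted ascending, values is parallel to it;
    // any index not present is implicitly zero
    static double get(int[] indices, double[] values, int i) {
        int pos = Arrays.binarySearch(indices, i);
        return pos >= 0 ? values[pos] : 0.0;
    }
}
```

Space grows only with the number of non-zero elements, satisfying the first attribute above, though in-place updates on flat arrays cost O(n) shifting, which is where a shallow tree would help.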

Will there be future support for single-precision floating point numbers?

This library is fantastic for the current project I am working on, however to save on memory usage (as all of our matrix and vector data comes to us in single-precision), might there be a configuration of the project in the future to support single-precision matrices? My apologies if this is not the right place to bring this up; I just wanted to check in before I try a find/replace of 'double' to 'float'!

Document Vector.length()

With vectors, it is always confusing whether length refers to Euclidean length or the size/dimensionality of the vector. In vectorz, it seems to be the size. But it would be nice to have this explicitly documented (length() has no JavaDoc).

Support truncated singular value decomposition

For many applications (recommender systems, information retrieval, etc.), only the top k singular values (and their corresponding vectors) are required from the singular value decomposition.

Would be nice to have the SVD methods support a decompose(AMatrix, int) method that specifies the (maximum) number of singular values desired, so we don't waste time and space computing a bunch of unneeded ones.

Real angles in Vector2 rotateInPlace(), rotate() to copy original

Sorry, don't know how to do Git stuff like pull requests... can someone please merge these into the source code? These are amendments to Vector2.java.

  1. rotateInPlace previously took an int angle parameter, which I think is obviously useless
  2. Add a new function rotate which returns a rotated copy of the original vector, for convenience.

Frank

```
/**
 * Rotates a 2D vector around the origin by a given angle
 * @param angle
 */
public void rotateInPlace(double angle) {
    double ca=Math.cos(angle);
    double sa=Math.sin(angle);
    double nx=(x*ca)-(y*sa);
    double ny=(x*sa)+(y*ca);
    x=nx;
    y=ny;
}

/**
 * Rotates a 2D vector around the origin by a given angle
 * @param angle
 * @return
 */
public Vector2 rotate(double angle) {
    double ca=Math.cos(angle);
    double sa=Math.sin(angle);
    return new Vector2((x*ca)-(y*sa), (x*sa)+(y*ca));
}
```

Immutable array support

We should support immutable versions of all Vectorz array shapes.

This is an important and useful guarantee for people working in multicore environments.

Add possibility for sparse default value different than zero

In SparseRowMatrix and SparseColumnMatrix, zero is assumed as the sparse value.
However, there are many applications where the default value is 1 instead, as in some symmetric distance matrices for example.
Although it is possible to work around this limitation by inverting the logic of the application, the possibility I am suggesting is simple to implement and very useful.

TestSvdImplicitQr is a slow test

Currently this test takes over 0.5s - around 25% of the total test runtime!

We should reduce this in the name of keeping a fast test runtime

Creating sparse matrix in clojure

I'm trying to get rid of a reflection warning when creating SparseColumnMatrix from Clojure, looking for some suggestions.

After generating a number of SparseIndexedVectors, I'm trying to put them together into a SparseColumnMatrix, I have tried:

(SparseColumnMatrix/create (into-array  [(SparseIndexedVector/createLength 100) (SparseIndexedVector/createLength 100)]))

and

(SparseColumnMatrix/create  [(SparseIndexedVector/createLength 100) (SparseIndexedVector/createLength 100)])

They both give reflection warnings, and I'm not sure how to type hint in this situation. Any suggestions?

Optimised operations for triangular matrices

We now have optimised classes for triangular matrices - UpperTriangularMatrix and LowerTriangularMatrix.

These classes allow for more efficient implementations of many operations. We should probably aim to achieve improved performance (better than regular dense matrices) for at least all of the following:

  • inner product
  • determinant (done)
  • inverse
  • transpose
  • addition to array

Improve isOrthogonal method

We should improve the current isOrthogonal method for better performance, possibly using the second implementation available in the Hessenberg decomposition algorithm.

Support for long indices

With the new sparse matrix support, we should add support for long-based indexing.

This will also help with Clojure interop for vectorz-clj, since Clojure prefers long values rather than ints for most purposes.

Internal data formats may use either int or long values. ints will offer greater space efficiency and work better as array indices, so should be preferred unless long sizes are required.

Equivalent to Numpy amin, argsort?

I'm looking to convert numpy code and was wondering where the equivalent functions/methods to amin and argsort are. It would be great if you could point me to a class or to where the implementation is.
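As a standalone sketch of what these do: numpy's amin is a minimum-element reduction, and argsort produces the indices that would sort the array. Vectorz may expose equivalents under different names (e.g. a minimum-element method on the vector classes); the versions below are plain-Java illustrations, not the library's API:

```java
import java.util.Arrays;
import java.util.Comparator;

public class ArgSort {
    // numpy-style argsort: indices that would sort the array ascending
    static int[] argsort(double[] x) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < x.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> x[i])); // sort indices by value
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) out[i] = idx[i];
        return out;
    }

    // numpy-style amin: the minimum element
    static double amin(double[] x) {
        double m = x[0];
        for (double v : x) m = Math.min(m, v);
        return m;
    }
}
```

For example, argsort({3, 1, 2}) yields {1, 2, 0}: position 1 holds the smallest value, then position 2, then position 0.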
