
vectorz's Introduction


Fast double-precision vector and matrix maths library for Java, based around the concept of N-dimensional arrays.

This library is designed for use in games, simulations, raytracers, machine learning etc. where fast vector maths is important.

Some highlights:

  • Vectorz can do over 1 billion 3D vector operations per second on a single thread.
  • Specialised matrix types for efficient optimised operations (identity, diagonal, sparse etc.).
  • Support for arbitrary n-dimensional numerical arrays.

Status

Vectorz is reasonably mature, battle-tested and used in production applications. However, the API is still evolving as new features are added, so you can expect a few minor changes, at least until version 1.0.0.


Documentation

See the Vectorz Wiki

Example usage

    Vector3 v=Vector3.of(1.0,2.0,3.0);
    v.normalise();                       // normalise v to a unit vector

    Vector3 d=Vector3.of(10.0,0.0,0.0);
    d.addMultiple(v, 5.0);               // d = d + (v * 5)

    Matrix33 m=Matrixx.createXAxisRotationMatrix(Math.PI);
    Vector3 rotated=m.transform(d);      // rotate 180 degrees around x axis

Key features

  • Supports double typed vectors of arbitrary size
  • Both mutable and immutable vectors are supported, enabling high performance algorithms
  • Support for any size matrices, including higher dimensional (NDArray) matrices
  • Ability to create lightweight view vectors (e.g. to access subranges of other vectors)
  • Library of useful mathematical functions on vectors
  • Vectors have lots of utility functionality implemented - Cloneable, Serializable, Comparable etc.
  • Various specialised vector/matrix types (e.g. identity matrices, diagonal matrices)
  • Support for affine and other matrix transformations
  • Sparse arrays for space-efficient large vectors and matrices where most elements are zero
  • Operator system provides composable operators that can be applied to array elements
  • Input / output of vectors and matrices - in various formats including readable edn format
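The composable-operator idea above can be illustrated with a plain-Java sketch using `java.util.function.DoubleUnaryOperator`. This is a hypothetical helper, not the library's actual `Op` classes, which are considerably richer:

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class Ops {
    // Apply an element-wise operator to every entry of an array, in place
    static void applyTo(double[] data, DoubleUnaryOperator op) {
        for (int i = 0; i < data.length; i++) data[i] = op.applyAsDouble(data[i]);
    }

    public static void main(String[] args) {
        // Compose two operators: scale by 2, then offset by 1
        DoubleUnaryOperator scaleThenOffset =
                ((DoubleUnaryOperator) (x -> 2 * x)).andThen(x -> x + 1);
        double[] v = {1, 2, 3};
        applyTo(v, scaleThenOffset);
        System.out.println(Arrays.toString(v)); // [3.0, 5.0, 7.0]
    }
}
```

Because operators compose before they are applied, a chain of transformations can be pushed over the array in a single pass.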

Vectorz is designed to allow the maximum performance possible for vector maths on the JVM.

This focus has driven a number of important design decisions:

  • Support for sparse vectors and other specialised array types
  • Specialised primitive-backed small vectors (1, 2, 3 and 4 dimensions) and matrices (1x1, 2x2, 3x3 and Mx3)
  • Abstract base classes preferred over interfaces to allow more efficient method dispatch
  • Multiple types of vector are provided for optimised performance in special cases
  • Hard-coded fast paths for most common 2D and 3D operations
  • Vector operations are generally not thread safe, by design
  • Concrete classes are generally final

If you have a use case that isn't yet well optimised then please post an issue - the aim is to make all common operations as efficient as possible.

vectorz's People

Contributors

drone29a, farahhariri, haosdent, hoijui, infojunkie, mdekstrand, mikera, moygit, pengyunie, prasant94, rosejn, sosotughushi


vectorz's Issues

Failing case of QR decomposition

The following QR decomposition case appears to fail:

[[  0 , 0 ],
 [ -1 , 0 ]]

with results

Q= [[  0.0,  1.0 ],
    [  1.0,  0.0 ]]
R= [[ -1.0,  0.0 ],
    [  0.0, -1.0 ]]
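One quick way to confirm the failure is to multiply the reported Q and R back together and compare with the original matrix. The following standalone sketch (plain arrays, not the library's types) shows that QR differs from A:

```java
import java.util.Arrays;

public class QrCheck {
    // Multiply two 2x2 matrices represented as double[2][2]
    static double[][] multiply(double[][] p, double[][] q) {
        double[][] r = new double[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    r[i][j] += p[i][k] * q[k][j];
        return r;
    }

    public static void main(String[] args) {
        double[][] a = {{0, 0}, {-1, 0}};   // original matrix
        double[][] q = {{0, 1}, {1, 0}};    // reported Q (orthogonal)
        double[][] r = {{-1, 0}, {0, -1}};  // reported R
        // QR = [[0, -1], [-1, 0]], which differs from A in the (0,1) entry
        System.out.println(Arrays.deepToString(multiply(q, r)));
    }
}
```

Q itself is orthogonal and R is triangular, so the individual factors look plausible in isolation; only the round-trip product reveals the bug.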

Performance issue (4x) with indexed sparse vectors and distanceSquared

Replacing:

(.distanceSquared sparse-indexed-left sparse-indexed-right)

with:

(defn distance-squared
  ^Double [^SparseIndexedVector lhs ^SparseIndexedVector rhs]
  (let [result (.clone lhs)]
    (.sub result rhs)
    (.magnitudeSquared result)))

Resulted in around a 4x performance gain.

My guess is there is a bunch of work to be done around special-casing operations between types (even higher-order operations) so that when using specialised types the full benefits of the type are realised.
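The kind of special-casing suggested here could look like the following standalone sketch, which computes the squared distance by merging two sorted index arrays in a single pass instead of cloning and subtracting. The (indices, values) representation is hypothetical, not the library's internal one:

```java
public class SparseDistance {
    // Squared distance between two sparse vectors, each given as a sorted
    // index array plus a parallel value array (all other elements are zero)
    static double distanceSquared(int[] ia, double[] va, int[] ib, double[] vb) {
        double sum = 0.0;
        int i = 0, j = 0;
        while (i < ia.length && j < ib.length) {
            if (ia[i] == ib[j]) {            // both vectors have this index
                double d = va[i++] - vb[j++];
                sum += d * d;
            } else if (ia[i] < ib[j]) {      // only the left vector is non-zero here
                sum += va[i] * va[i]; i++;
            } else {                         // only the right vector is non-zero here
                sum += vb[j] * vb[j]; j++;
            }
        }
        while (i < ia.length) { sum += va[i] * va[i]; i++; }  // left-only tail
        while (j < ib.length) { sum += vb[j] * vb[j]; j++; }  // right-only tail
        return sum;
    }
}
```

The merge pass touches only the non-zero entries of both vectors, so no temporary vector is ever materialised.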

Slow SparseColumnMatrix innerProduct

SparseColumnMatrix's innerProduct function is extremely slow with the current implementation. I was using it on a 40000x41262 SparseColumnMatrix and it took almost 9 minutes (531215.413798 msecs). The following Clojure implementation is able to produce the same output in around 1 second on the same machine.

(defn innerProduct
  [^SparseColumnMatrix A ^Vector x]
  (let [_M ^SparseColumnMatrix (m/clone A)
        b  ^Vector  (Vector/createLength (m/row-count _M))
        N  ^Integer (int (m/column-count _M))]
    (dotimes [c N]
      (let [col (.getColumn _M c)]
        (.multiply col (.get x c))
        (.add b col)))
    b))

I think the addMultipleToArray method is not very memory efficient.

Make testing deterministic

In some cases we have non-deterministic test cases.

We should convert to deterministic tests for all of these (with fixed or deterministically generated test data)

includeIndices causes memory leak when adding many times

I've got a very large matrix where I need to add a vector to a column, perform an operation, and then subtract the original vector (leaving me with what should be the original matrix).

However, I'm experiencing a memory leak because of (I believe) the includeIndices functions, which seem to be really eating up space every time I add and subtract.

Here's my code:

        // oldMatrix is a SparseRowMatrix made of SparseIndexedVectors
        SparseColumnMatrix proximities = oldMatrix.toSparseColumnMatrix();
        for (int index = 0; index < matrixSize; index++) {
            matrix.getColumnView(index).add(vector);

            // do other things

            matrix.getColumnView(index).sub(vector);
        }

Any ideas on this behavior from includeIndices?

How to serialize a mikera.arrayz.INDArray?

I'm using mikera.arrayz.INDArray in my program. Now I get a problem with INDArray's serialization/ deserialization.

    mikera.arrayz.INDArray data = tensor1.data();

    // fastjson
    //com.alibaba.fastjson.JSONException: write javaBean error, class mikera.matrixx.impl.SparseRowMatrix, Can't slice a scalar!
    com.alibaba.fastjson.JSON.toJSON(data);


    // GSON
    com.google.gson.Gson gson = new Gson();
    // it's ok
    String json = gson.toJson(data);
    // Caused by: java.lang.UnsupportedOperationException: Interface can't be instantiated! Interface name: mikera.arrayz.INDArray
    gson.fromJson(json, INDArray.class);

    // jackson
    ObjectMapper mapper = new ObjectMapper();
    // org.nd4j.shade.jackson.databind.JsonMappingException: Direct self-reference leading to cycle (through reference chain: mikera.matrixx.impl.SparseRowMatrix["rows"]->java.util.ArrayList[0]->mikera.vectorz.impl.SparseIndexedVector["transpose"])
    String json1 = mapper.writeValueAsString(data);

Mistake in JoinedArray.get(int x, int y)

First, thank you for providing such a highly efficient library. But I found a small mistake in the function get(int x, int y) when I used JoinedArray.

if ((y<0)||(y>=sliceCount())) {
	throw new IndexOutOfBoundsException(ErrorMessages.invalidIndex(this, x,y));
}

The code above has a problem: y should index the second dimension of the array, but sliceCount() returns the size of the first dimension. As a result, this check incorrectly throws an exception whenever the second dimension is larger than the first.

Small Bug in SparseRowMatrix

Hi there,

I'm using your package and caught a small bug, thought it'd be useful to report it.
Line 381 of SparseRowMatrix:

    for (int j = 0; j < cols; ++j) {

should be changed to:

    for (int j = 0; j < a.cols; ++j) {

Cheers,

Publish to Maven Central

Right now, vectorz is not published to Maven Central. This makes it a bit harder for Maven-based projects to find and use. Also, since LensKit is published to Central, vectorz would need to be published to Central if we are going to use it.

Add rotate functionality

It is useful to be able to rotate vectors, matrices and arrays around one or more axes.

Design thoughts:

  • Should return rotated mutable views
  • Should support rotation around one or all dimensions

clarification needed: is VectorMatrixMN expected to be sparse?

Would it be inefficient to use VectorMatrixMN for a dense matrix?

Or maybe the better question is, if I have a Matrix I need to build, and I know the number of rows in advance, and the row data could either be generated in a Vector or an array, what would be the most efficient Matrix class to use?

It's actually going to be the second multiplicand, so probably a matrix stored as columns would be best, but I don't see a non-sparse option for that.

I have an n x n SparseRowMatrix that will be multiplied by an n x k dense matrix, where n is large (tens or hundreds of thousands) and k is small (under 10).

Thanks in advance for your help and advice!

AltLU needs handling of Pivot matrix?

Hi @haosdent I think the AltLU code you recently contributed needs some handling of the Pivot matrix - otherwise we don't seem to get the required result LU = PA.

See the new interface ILUP I added, and also the SimpleLUP code that is an alternative implementation.

Solving linear systems with vectorz

I am trying to solve a sparse linear system with Vectorz, but it seems to lack some basic algorithms to do it efficiently.

I want to solve Ku=F, where K is a sparse matrix and F a dense vector.
The classic way to solve this system is to get a decomposition of K, let's say K=LU, and then solve the easy triangular problems Ly=F and Uu=y.

I am not sure the implemented factorizations in Vectorz are optimized for sparse matrices (the implementations appear to come from a dense matrix manipulation library), but most importantly Vectorz seems to lack the forward and backward substitution algorithms needed to solve triangular problems.
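For reference, forward and backward substitution are short algorithms. A standalone dense sketch follows (a sparse-aware version would iterate only the non-zero entries of each row):

```java
public class TriangularSolve {
    // Solve L y = f for lower-triangular L (forward substitution)
    static double[] forward(double[][] L, double[] f) {
        int n = f.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = f[i];
            for (int j = 0; j < i; j++) s -= L[i][j] * y[j]; // subtract known terms
            y[i] = s / L[i][i];
        }
        return y;
    }

    // Solve U u = y for upper-triangular U (backward substitution)
    static double[] backward(double[][] U, double[] y) {
        int n = y.length;
        double[] u = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int j = i + 1; j < n; j++) s -= U[i][j] * u[j]; // subtract known terms
            u[i] = s / U[i][i];
        }
        return u;
    }
}
```

Given K = LU, solving Ku = F is then forward(L, F) followed by backward(U, y), as described above.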

Strange exception with join-along

I've been playing with vectorz from core.matrix (thanks for the great work, BTW), and I met this exception with calling join-along that I can't understand:

(clojure.core.matrix/join-along
 1
 (clojure.core.matrix/array :vectorz [[1 2][3 4][5 6]])
 (clojure.core.matrix/array :vectorz [7 8 9]))
=> IllegalArgumentException Incompatible shapes: [3,2] vs. [3]  mikera.arrayz.impl.JoinedArray.join (JoinedArray.java:33)

I don't suppose it was meant to happen? When I did the same thing with persistent-vector and also ndarray-double, it worked out fine:

(clojure.core.matrix/join-along 1 [[1 2][3 4][5 6]] [7 8 9])
=> [[1 2 7] [3 4 8] [5 6 9]]

I'm using [net.mikera/core.matrix "0.52.1"] and [net.mikera/vectorz-clj "0.44.1"].

Different implementations of innerProduct for row vs column matrices?

If we compare

public AVector innerProduct(SparseRowMatrix m) {
with
public AVector innerProduct(SparseColumnMatrix m) {
, I don't understand why the logic isn't simply to find getRow(i) for SparseRowMatrix like it is for SparseColumnMatrix. Is one of the above methods more efficient than the other?

Aside: if this is not the proper forum for these kinds of questions, let me know! Thanks

sparse matrix equality

The equals function behaves somewhat strangely for SparseRowMatrix and SparseColumnMatrix. For my problem I need a SparseRowMatrix, but the data are coming in by column. So I store the data in a SparseColumnMatrix first, then use toSparseRowMatrix to convert it to a SparseRowMatrix, thinking it may be faster since I'm using Clojure. However, when I compare the converted SparseRowMatrix with the original SparseColumnMatrix using the equals function, it keeps giving me false. Here is a simple Clojure snippet:

==> (.equals ^SparseColumnMatrix path-matrix-column (.toSparseRowMatrix ^SparseColumnMatrix path-matrix-column))
==> false

The sparse matrix I have is 40000x41262, and could potentially be much larger. When I tried to create a small matrix manually, say 10x10, and convert it to the other form, it seems to be ok.

Also, if I convert the sparse column matrix to sparse row matrix then use equality on the sparse row matrix, it gives true:

==> (def path-matrix-row (.toSparseRowMatrix ^SparseColumnMatrix path-matrix-column))
==> (.equals ^SparseRowMatrix path-matrix-row (.toSparseColumnMatrix ^SparseRowMatrix path-matrix-row))
==> true

Also

==> (.equals ^SparseRowMatrix path-matrix-row ^SparseColumnMatrix path-matrix-column)
==> false ;; returns instantly
==> (equals ^SparseColumnMatrix path-matrix-column ^SparseRowMatrix path-matrix-row)
==> false ;; takes quite a long time

Add norm function

We should have a function to compute an arbitrary p-norm on matrices and vectors
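A minimal sketch of such a p-norm over a raw double array (the real implementation would live on the vector/matrix classes and handle the p = infinity case specially):

```java
public class Norms {
    // General p-norm of a vector: (sum |x_i|^p)^(1/p), for p >= 1
    static double pNorm(double[] x, double p) {
        double sum = 0.0;
        for (double v : x) sum += Math.pow(Math.abs(v), p);
        return Math.pow(sum, 1.0 / p);
    }
}
```

With p = 2 this reduces to the usual Euclidean magnitude, and p = 1 gives the sum of absolute values.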

Add maxElementIndex to vectors

The vector classes have a maxAbsElement method. It'd be useful to have a maxElement, as well as maxElementIndex and maxAbsElementIndex methods returning the indexes of the largest elements. Useful for things like computing probability distributions, then grabbing the most likely index and using it for something else (indexing into whatever the vector is a probability distribution over).
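A sketch of the proposed maxElementIndex over a raw double array (first occurrence wins; the actual signature in the library may differ):

```java
public class VectorOps {
    // Index of the largest element; returns the first occurrence on ties
    static int maxElementIndex(double[] x) {
        int best = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] > x[best]) best = i;
        }
        return best;
    }
}
```

For the probability-distribution use case described above, this is exactly an argmax over the vector's elements.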

Make matrices serializable

ImmutableMatrix and Matrix are not currently (as of 0.27.0) serializable.

I'm prototyping a vectorz-based version of LensKit's FunkSVD recommender, and for now will work around this by manually serializing, but long-term it would be more convenient to have the matrices serializable.

Getting a more functional Style for Cholesky

Hi @prasant94 - The initial Cholesky stuff looks good, but I think we should try to convert this to a more "functional" interface.

By which I mean:

  • There should be no public mutator functions (e.g. setExpectedMaxSize) - any such values should be set as final fields when objects are constructed. It's OK to mutate as an implementation detail of course
  • Immutable result objects should be returned from decompositions
  • There is a pure functional interface to make the decomposition happen.
  • API users should never need to use anything from the "*.impl" packages - this should be purely for implementation details

As an example, I've converted the old Cholesky code to return a CholeskyResult in this style

Consistent Linear Algebra API

I believe it is worth making a consistent API for linear algebra operations as follows:

  • API classes in the mikera.matrixx.algo.* package
  • Static functions
  • Return an IXXXXResult for decompositions (other algorithms can return the result type directly)
  • Use null for failed decompositions (e.g. singular matrices etc.)

This consistent API gives us a few advantages:

  • Users know where to go (it is all in one place)
  • We maintain common conventions
  • We hide implementation details
  • We have the option to upgrade to better implementations without changing the API (and breaking downstream users....)

@prasant94 - does that make sense to you? Think it applies to most of the stuff you are importing....

Unable to create NDArray for multidimensional arrays

Good day guys, I have to convert this python code

import numpy as np
x = np.array([[0,6], [1, 2], [3,5], [4,7], [2,9], [10,1]])
print(x[-4])         # Prints fourth to last [3,5]

I want to convert it to Java, so I stumbled upon this library. I made a comparison with nd4j and found it fast and simple. Since I was unable to get a slice with a negative index the way the Python code works, I decided to try vectorz, but I am having a problem creating the multidimensional array. This is what I have done:

public static void main(String[] args) {
    double[][] x = new double[][]{{0, 6}, {1, 2}, {3, 5}, {4, 7}, {2, 9}, {10, 1}};
    double[] xx = new double[]{0, 6, 1, 2, 3, 5, 4, 7, 2, 9, 10, 1};
    NDArray v=NDArray.wrap(xx, new int[]{2,2});
    System.out.println(v.slice(-4));
}

So how do I create a NDArray or INDArray from a multidimensional array using vectorz?
and how to slice it to return the same result as the one python gave me?
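Two things seem to be going on here. First, the shape passed to NDArray.wrap in the snippet above should presumably be {6, 2} (six rows of two), not {2, 2}, which covers only four of the twelve elements. Second, a Python-style negative index simply wraps by the dimension length, which is easy to do by hand. A standalone sketch of the index arithmetic (the commented vectorz calls are shown for context only):

```java
public class NegIndex {
    // Python-style index: -k means length - k; non-negative indices pass through
    static int wrap(int i, int length) {
        return i < 0 ? i + length : i;
    }

    public static void main(String[] args) {
        double[] xx = {0, 6, 1, 2, 3, 5, 4, 7, 2, 9, 10, 1};
        // NDArray v = NDArray.wrap(xx, new int[]{6, 2});  // six rows of two
        // v.slice(wrap(-4, 6))                            // would give the row [3.0, 5.0]
        System.out.println(wrap(-4, 6)); // 2
    }
}
```

With the corrected shape, slice(wrap(-4, 6)) selects row index 2, matching numpy's x[-4].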

Improve efficiency of SparseRowMatrix.setRow()?

In the following code, I am finding that the t.setRow(i, uniformRow) is very slow. In this context, t is a SparseRowMatrix whose rows are all SparseIndexedVector (until I try to replace about a third of them with a RepeatedElementVector using setRow).

    double uniformWeight = (double) 1 / n; // used when the rowSum is zero
    RepeatedElementVector uniformRow = new RepeatedElementVector(n, uniformWeight);
    boolean[] zeroRows = new boolean[n]; // all values default to false
    for (int i = 0; i < n; i++) {
        if (DEBUG > 0 && i % 100 == 0) {
            System.out.print(i + " ");
        }
        AVector row = t.getRow(i);
        double rowSum = row.elementSum();
        if (rowSum > 0) {
            row.divide(rowSum); // modifies in place
        } else {
            t.setRow(i, uniformRow);
        }

When t is a 32K x 32K matrix, with about 10K zero rows that branch to the setRow, executing this code takes about 5.5 hours on my development machine. If I replace the setRow with setting a boolean in a boolean array, the whole normalization takes under a minute. (The row.divide that happens for 2/3 of the rows is super fast.) I am using version 0.27.0

Is there any way to make that setRow a more efficient operation on a SparseRowMatrix?

License Question

Hi Mike!
I really like your vectorz library and I would like to use it in my applications, sadly the LGPL License makes this not possible for me. Would you consider switching to another license model such as MIT or Apache License to open up the possible uses of this nice library of yours?
As of now I am looking at la4j as a library, but I would like to use vectorz because of its design and performance. Thank you for your consideration.

Can't create SparseColumnMatrix

When doing SparseColumnMatrix M = SparseColumnMatrix.create(3, 3);

I get:

method create in class mikera.matrixx.impl.SparseColumnMatrix cannot be applied to given types;
  required: mikera.vectorz.AVector[]
  found: int,int
  reason: varargs mismatch; int cannot be converted to mikera.vectorz.AVector

SparseRowMatrix.create(3,3); works, though.

Add sparse vector implementation

Various applications (machine learning, collaborative filtering, recommender systems) depend on large sparse vectors, which contain relatively few non-zero values relative to their size.

Proposal is to add sparse vectors with the following attributes:

  • Space requirement grows only with the number of non-zero elements
  • Fast element access O(log32 n) or better
  • Fast updates O(log32 n) or better
  • All other vector operations implemented with reasonable efficiency.

Efficiency is probably best achieved with a shallow tree data structure, though a hashmap based sparse vector could also be an option.
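With sorted index/value arrays, logarithmic element access falls out of a binary search. A minimal standalone sketch of that representation (not the library's eventual implementation, which the proposal suggests may be tree- or hashmap-based):

```java
import java.util.Arrays;

public class SparseSketch {
    // Sparse element access: indices is sorted ascending, values is parallel to it;
    // any index not present is implicitly zero
    static double get(int[] indices, double[] values, int i) {
        int pos = Arrays.binarySearch(indices, i);
        return pos >= 0 ? values[pos] : 0.0;
    }
}
```

Space grows only with the number of non-zero elements, satisfying the first attribute above, though in-place updates on flat arrays cost O(n) shifting, which is where a shallow tree would help.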

Will there be future support for single-precision floating point numbers?

This library is fantastic for the current project I am working on, however to save on memory usage (as all of our matrix and vector data comes to us in single-precision), might there be a configuration of the project in the future to support single-precision matrices? My apologies if this is not the right place to bring this up; I just wanted to check in before I try a find/replace of 'double' to 'float'!

Document Vector.length()

With vectors, it is always confusing whether length refers to Euclidean length or the size/dimensionality of the vector. In vectorz, it seems to be the size. But it would be nice to have this explicitly documented (length() has no JavaDoc).

Support truncated singular value decomposition

For many applications (recommender systems, information retrieval, etc.), only the top k singular values (and their corresponding vectors) are required from the singular value decomposition.

Would be nice to have the SVD methods support a decompose(AMatrix, int) method that specifies the (maximum) number of singular values desired, so we don't waste time and space computing a bunch of unneeded ones.

Real angles in Vector2 rotateInPlace(), rotate() to copy original

Sorry, don't know how to do Git stuff like pull requests... can someone please merge these into the source code? These are amendments to Vector2.java.

  1. rotateInPlace previously took an int angle parameter, which I think is obviously useless
  2. Add a new function rotate which returns a rotated copy of the original vector, for convenience.

Frank

```
/**
 * Rotates a 2D vector around the origin by a given angle
 * @param angle
 */
public void rotateInPlace(double angle) {
    double ca=Math.cos(angle);
    double sa=Math.sin(angle);
    double nx=(x*ca)-(y*sa);
    double ny=(x*sa)+(y*ca);
    x=nx;
    y=ny;
}

/**
 * Rotates a 2D vector around the origin by a given angle
 * @param angle
 * @return
 */
public Vector2 rotate(double angle) {
    double ca=Math.cos(angle);
    double sa=Math.sin(angle);
    return new Vector2((x*ca)-(y*sa), (x*sa)+(y*ca));
}
```

Immutable array support

We should support immutable versions of all Vectorz array shapes.

This is an important and useful guarantee for people working in multicore environments.

Add possibility for sparse default value different than zero

In SparseRowMatrix and SparseColumnMatrix, zero is assumed as the sparse value.
However, there are many applications where the default value is 1 instead, as in some symmetric distance matrices for example.
Although it is possible to work around this limitation by inverting the logic of the application, the possibility I am suggesting is simple to implement and very useful.

TestSvdImplicitQr is a slow test

Currently this test takes over 0.5s - around 25% of the total test runtime!

We should reduce this in the name of keeping a fast test runtime

Creating sparse matrix in clojure

I'm trying to get rid of a reflection warning when creating SparseColumnMatrix from Clojure, looking for some suggestions.

After generating a number of SparseIndexedVectors, I'm trying to put them together into a SparseColumnMatrix, I have tried:

(SparseColumnMatrix/create (into-array  [(SparseIndexedVector/createLength 100) (SparseIndexedVector/createLength 100)]))

and

(SparseColumnMatrix/create  [(SparseIndexedVector/createLength 100) (SparseIndexedVector/createLength 100)])

They both give reflection warnings, and I'm not sure how to type hint in this situation. Any suggestions?

Optimised operations for triangular matrices

We now have optimised classes for triangular matrices - UpperTriangularMatrix and LowerTriangularMatrix.

These classes allow for more efficient implementations of many operations. We should probably aim to achieve improved performance (better than regular dense matrices) for at least all of the following:

  • inner product
  • determinant (done)
  • inverse
  • transpose
  • addition to array

Improve isOrthogonal method

We should improve the current isOrthogonal method for better performance, possibly using the second implementation available in the Hessenberg decomposition algorithm.

Support for long indices

With the new sparse matrix support, we should add support for long-based indexing.

This will also help with Clojure interop for vectorz-clj, since Clojure prefers long values rather than ints for most purposes.

Internal data formats may use either int or long values. ints will offer greater space efficiency and work better as array indices, so should be preferred unless long sizes are required.

Equivalent to Numpy amin, argsort?

I'm looking to convert numpy code and was wondering where the equivalent functions/methods to amin and argsort are. It would be great if you could point me to a class or to where the implementation is.
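As a standalone sketch of what these do: numpy's amin is a minimum-element reduction, and argsort produces the indices that would sort the array. Vectorz may expose equivalents under different names (e.g. a minimum-element method on the vector classes); the versions below are plain-Java illustrations, not the library's API:

```java
import java.util.Arrays;
import java.util.Comparator;

public class ArgSort {
    // numpy-style argsort: indices that would sort the array ascending
    static int[] argsort(double[] x) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < x.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> x[i])); // sort indices by value
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) out[i] = idx[i];
        return out;
    }

    // numpy-style amin: the minimum element
    static double amin(double[] x) {
        double m = x[0];
        for (double v : x) m = Math.min(m, v);
        return m;
    }
}
```

For example, argsort({3, 1, 2}) yields {1, 2, 0}: position 1 holds the smallest value, then position 2, then position 0.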
