Git Product home page Git Product logo

Comments (7)

DrTimothyAldenDavis avatar DrTimothyAldenDavis commented on August 14, 2024

Some of the things you're looking for are already in the pipeline and will be available soon. In particular the COUNT will be done far faster in v5.x, but not using an aggregator.

The COUNT in Group 1 can already be done in the current GraphBLAS using the PLUS_PAIR semirings and GrB_mxv. It takes O(n + nvals(d)) time and memory if d is the resulting vector, in GraphBLAS v4.0.3, and will take O(nvals(d)) time and memory in v5.x once I add uniform-valued matrices and vectors. The other methods in Group 1 can be handled with different operators, if I added them or if they were done as user-defined operators. COUNT_NONZEROS in each row of A, for example, would be a PLUS_FIRSTISNONZ semiring with d=A*y and y is a dense vector, where

FIRSTISNONZ_type(x,y) = (type) ((x != 0) ? 1:0)

If A m-by-n is held by row and you want to reduce it to a vector d of length m, by 'summing' up each row, then it becomes a dot-product based GrB_mxv, which is O(nvals(d)) time for COUNT if done by GrB_reduce, and will be the same time in GrB_mxv in v5.x once I add uniform-valued matrices and vectors (so the dense vector y takes O(1) time and memory to build). If you want to reduce the columns, it becomes GrB_vxm, which is a saxpy-based method, and I use atomics to do the monoid, so it's still fast, efficient, and parallel, at O(nvals(d)) time. (when I say "time" I mean "work", not "time", to be precise; it would be O(nvals(d)/p) time on p threads, assuming no significant hash collisions).

SUM_OF_SQUARES is the PLUS_POW semiring with a dense vector y whose entries are all equal to 2. I already have the POW binary operator so PLUS_POW could easily be added as a built-in method in my Source/Generated folder so it would be very fast.

For COUNT_NONZERO, the time would be O(nvals(A)), as opposed to O(nvals(d)) for COUNT; both are asymptotically as fast as theoretically possible, and this will appear in v5.0.0 or v5.1.0. Uniform-valued matrices/vectors is 2nd on my TODO list, after I fix the MATLAB interface for R2021a (the MATLAB interface to v4.x and v5.x dies when using MATLAB R2021a, since R2021a includes its own v3.3.3 to do the built-in C=A*B, which conflicts with v4.x+).

None of the aggregators you suggest, except the basic ones that are already monoids, can be done with atomic updates; they are by design sequential, with each aggregator owned by a single thread. They would be quite slow if you have a CSR matrix A and you want to aggregate down the columns of A (I'm not sure how my hash+atomics method for the hypersparse case would extend to the aggregators; it would be quite difficult to do). If you had a vector of n aggregators, it would take O(n+nvals(d)) time and memory and would make for a very slow aggregator but a very fast method based on GrB_vxm (taking O(nvals(d)) time). If n is huge, this is a big difference. The implementation would need hashed table of dynamically-created aggregators ... Then I'd need to give each of these its own spin-lock/critical section, or enforce sequential access somehow. It's a very daunting prospect to code.

I see that more expressive reductions could be useful, but I hesitate to consider methods that have impose a fundamentally sequential computation into GraphBLAS, like updating a single aggregator. I think it would be better to start simple, and see how far we can take the existing monoid/semiring idea (such as COUNT and all other methods in Group 1, which I can almost already do asymptotically fast, and will do so in v5.x), and go from there.

from graphblas.

eriknw avatar eriknw commented on August 14, 2024

I think it would be better to start simple, and see how far we can take the existing monoid/semiring idea

I agree with this sentiment. I think there is value in having Aggregator objects, so I can begin by adding aggregators to grblas that will translate to the appropriate calls to SuiteSparse:GraphBLAS. I can do the work in anticipation of O(1) dense vectors. This should be informative.

I updated SUM_OF_SQUARES and HYPOT to use POW as you suggested (my bad!).

from graphblas.

jim22k avatar jim22k commented on August 14, 2024

@eriknw I really like this proposal. Abstracting the complexity away from the user so they can treat aggregations as easy as monoids for reduction is a win for readability. It also allows GraphBLAS to be a more complete solution for general purpose sparse linear algebra where these kinds of aggregators would be expected to exist (especially simple ones like mean, std, count, etc).

from graphblas.

DrTimothyAldenDavis avatar DrTimothyAldenDavis commented on August 14, 2024

from graphblas.

DrTimothyAldenDavis avatar DrTimothyAldenDavis commented on August 14, 2024

Here is the code with better format:


function [s,i] = argmax (A)
% argmax, using GraphBLAS
% does not handle matrices with nan's very well
% if column j has no entries, MATLAB returns s(j)=0 and i(j)=1, but
% this method returns s(j) and i(j) as empty entries.

% find the max in each column
s = max (A) ;

% locate where each max occurs in each column (there might be duplicates)
S = diag (s) ;
L = GrB.mxm (A, 'any.eq.double', S) ;

% drop the zeros in L
L = GrB.prune (L) ;

% find the position of the max entry in each column
[m,n] = size (A) ;
x = ones (1,n) ; % x = GrB.ones (1,n) in the next release
i = GrB.mxm (x, 'any.secondi1.int64', L)  ;

with the test code:

rng ('default') ;
A = GrB.random (10, 10, 0.2)
[s,i] = argmax (A)
[s2,i2] = max (double (A))

I'm using the secondi1 operator since MATLAB expects that indices start with 1, not zero. So for GraphBLAS proper, use the secondi not secondi1.

from graphblas.

DrTimothyAldenDavis avatar DrTimothyAldenDavis commented on August 14, 2024

The vector x is an "iso-valued" dense vector of all ones, and will take O(1) time and memory to create in v5.1, so this will work nicely for hypersparse matrices. It already works in my draft code. I can do:

n = 2^60
H (1:10,1:10) = A
[s,i] = argmax (H)

and get the same result, except that s and i are vectors of dimension 2^60 instead of 10, in this case with just 8 entries each.

from graphblas.

DrTimothyAldenDavis avatar DrTimothyAldenDavis commented on August 14, 2024

assuming I change the statment x=ones(1,n) to x=GrB.ones(1,n) that is.

from graphblas.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.