Comments (7)
Some of the things you're looking for are already in the pipeline and will be available soon. In particular the COUNT will be done far faster in v5.x, but not using an aggregator.
The COUNT in Group 1 can already be done in the current GraphBLAS using the PLUS_PAIR semirings and GrB_mxv. It takes O(n + nvals(d)) time and memory if d is the resulting vector, in GraphBLAS v4.0.3, and will take O(nvals(d)) time and memory in v5.x once I add uniform-valued matrices and vectors. The other methods in Group 1 can be handled with different operators, if I added them or if they were done as user-defined operators. COUNT_NONZEROS in each row of A, for example, would be a PLUS_FIRSTISNONZ semiring with d=A*y and y is a dense vector, where
FIRSTISNONZ_type(x,y) = (type) ((x != 0) ? 1:0)
If A m-by-n is held by row and you want to reduce it to a vector d of length m, by 'summing' up each row, then it becomes a dot-product based GrB_mxv, which is O(nvals(d)) time for COUNT if done by GrB_reduce, and will be the same time in GrB_mxv in v5.x once I add uniform-valued matrices and vectors (so the dense vector y takes O(1) time and memory to build). If you want to reduce the columns, it becomes GrB_vxm, which is a saxpy-based method, and I use atomics to do the monoid, so it's still fast, efficient, and parallel, at O(nvals(d)) time. (when I say "time" I mean "work", not "time", to be precise; it would be O(nvals(d)/p) time on p threads, assuming no significant hash collisions).
SUM_OF_SQUARES is the PLUS_POW semiring with a dense vector y whose entries are all equal to 2. I already have the POW binary operator so PLUS_POW could easily be added as a built-in method in my Source/Generated folder so it would be very fast.
For COUNT_NONZERO, the time would be O(nvals(A)), as opposed to O(nvals(d)) for COUNT; both are asymptotically as fast as theoretically possible, and this will appear in v5.0.0 or v5.1.0. Uniform-valued matrices/vectors is 2nd on my TODO list, after I fix the MATLAB interface for R2021a (the MATLAB interface to v4.x and v5.x dies when using MATLAB R2021a, since R2021a includes its own v3.3.3 to do the built-in C=A*B, which conflicts with v4.x+).
None of the aggregators you suggest, except the basic ones that are already monoids, can be done with atomic updates; they are by design sequential, with each aggregator owned by a single thread. They would be quite slow if you have a CSR matrix A and you want to aggregate down the columns of A (I'm not sure how my hash+atomics method for the hypersparse case would extend to the aggregators; it would be quite difficult to do). If you had a vector of n aggregators, it would take O(n+nvals(d)) time and memory and would make for a very slow aggregator but a very fast method based on GrB_vxm (taking O(nvals(d)) time). If n is huge, this is a big difference. The implementation would need hashed table of dynamically-created aggregators ... Then I'd need to give each of these its own spin-lock/critical section, or enforce sequential access somehow. It's a very daunting prospect to code.
I see that more expressive reductions could be useful, but I hesitate to consider methods that have impose a fundamentally sequential computation into GraphBLAS, like updating a single aggregator. I think it would be better to start simple, and see how far we can take the existing monoid/semiring idea (such as COUNT and all other methods in Group 1, which I can almost already do asymptotically fast, and will do so in v5.x), and go from there.
from graphblas.
I think it would be better to start simple, and see how far we can take the existing monoid/semiring idea
I agree with this sentiment. I think there is value in having Aggregator objects, so I can begin by adding aggregators to grblas
that will translate to the appropriate calls to SuiteSparse:GraphBLAS. I can do the work in anticipation of O(1) dense vectors. This should be informative.
I updated SUM_OF_SQUARES and HYPOT to use POW as you suggested (my bad!).
from graphblas.
@eriknw I really like this proposal. Abstracting the complexity away from the user so they can treat aggregations as easy as monoids for reduction is a win for readability. It also allows GraphBLAS to be a more complete solution for general purpose sparse linear algebra where these kinds of aggregators would be expected to exist (especially simple ones like mean, std, count, etc).
from graphblas.
from graphblas.
Here is the code with better format:
function [s,i] = argmax (A)
% argmax, using GraphBLAS
% does not handle matrices with nan's very well
% if column j has no entries, MATLAB returns s(j)=0 and i(j)=1, but
% this method returns s(j) and i(j) as empty entries.
% find the max in each column
s = max (A) ;
% locate where each max occurs in each column (there might be duplicates)
S = diag (s) ;
L = GrB.mxm (A, 'any.eq.double', S) ;
% drop the zeros in L
L = GrB.prune (L) ;
% find the position of the max entry in each column
[m,n] = size (A) ;
x = ones (1,n) ; % x = GrB.ones (1,n) in the next release
i = GrB.mxm (x, 'any.secondi1.int64', L) ;
with the test code:
rng ('default') ;
A = GrB.random (10, 10, 0.2)
[s,i] = argmax (A)
[s2,i2] = max (double (A))
I'm using the secondi1 operator since MATLAB expects that indices start with 1, not zero. So for GraphBLAS proper, use the secondi not secondi1.
from graphblas.
The vector x is an "iso-valued" dense vector of all ones, and will take O(1) time and memory to create in v5.1, so this will work nicely for hypersparse matrices. It already works in my draft code. I can do:
n = 2^60
H (1:10,1:10) = A
[s,i] = argmax (H)
and get the same result, except that s and i are vectors of dimension 2^60 instead of 10, in this case with just 8 entries each.
from graphblas.
assuming I change the statment x=ones(1,n) to x=GrB.ones(1,n) that is.
from graphblas.
Related Issues (20)
- Consider adding COLEQ and ROWEQ IndexUnaryOp operators HOT 9
- New unary operators to calculate principal cube root of real (floating point) values HOT 5
- Matrix_extractElement_Structural HOT 6
- Pass a print function for UDTs HOT 4
- GrB_Descriptor_set is unable to set expected descriptor fields (does not match GxB_Desc_set) HOT 4
- Set name of UDT when serializing HOT 12
- Build broken with spaces in folder names HOT 8
- build fails on windows 11 using CMake with MinGW HOT 5
- ISEQ monoids HOT 7
- Fix bitwise operator monoid names HOT 4
- Remove va_arg HOT 5
- GrB_Vector_(de)serialize HOT 6
- Sparse Index Space HOT 2
- Removed symbols without soname bump HOT 12
- atomic*: Undefined symbol on armel and mipsel architectures HOT 33
- "ZEROB" Binary Operator HOT 3
- GxB sort with smaller (or larger) output objects
- cpu_features: Build error for MinGW HOT 13
- Size of Static Library HOT 16
- Link error with Intel igx and Ninja generator on Windows HOT 9
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from graphblas.