drtimothyaldendavis / graphblas Goto Github PK

SuiteSparse:GraphBLAS: graph algorithms in the language of linear algebra. For production: (default) STABLE branch. Code development: ask me for the right branch before submitting a PR. video intro: https://youtu.be/Tj5y6d7FegI .

Home Page: http://faculty.cse.tamu.edu/davis/GraphBLAS.html

License: Other

CMake 0.31% Makefile 0.05% Shell 0.04% C 91.78% MATLAB 4.78% C++ 2.27% Cuda 0.43% Awk 0.01% HTML 0.30% Dockerfile 0.01% Assembly 0.03%

graphblas's People

Contributors

Stargazers

Watchers

graphblas's Issues

GrB_MINUS_INT8 interacts interestingly with implicit zero elements

Hi,

I wish to subtract the negative transpose of a matrix from itself as part of a max-flow algorithm.

I created a descriptor to transpose the second intput, and use element-wise addition with the GrB_MINUS_INT8 binary operator. As you can see below, instead of subtracting, the operation performs an addition, and indeed the result is the same if I replace GrB_MINUS_INT8 with GrB_PLUS_INT8. Is this the intended behaviour when GrB_MINUS_INT8 interacts with imlpicit zeros a matrix?

I can achieve the intended result if I perform three successive subtractions, but that feels a bit excessive.

Growing size of compiled shared library

Not sure this is something that is easy to do anything about, but the size trend of the shared library seems to be blowing up:

v3.3.3 => 71MB uncompressed, 31MB compressed
v4.0.0draft4 => 109MB uncompressed, 40MB compressed
v4.0.0draft5 => 448MB uncompressed, 135MB compressed

This matters when including prebuilt graphblas libraries in docker images (where the image size is important)

For reference, I've published the docker images here:
https://hub.docker.com/r/graphblas/graphblas

# Dockerfile/how the images are built:
FROM ubuntu:20.04 as builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -yq cmake make wget

ARG VERSION

WORKDIR /home/build
RUN wget https://github.com/DrTimothyAldenDavis/GraphBLAS/archive/v$VERSION.tar.gz
RUN tar -xf v$VERSION.tar.gz

WORKDIR /home/build/GraphBLAS-$VERSION
RUN make JOBS=8 && make install

FROM scratch
COPY --from=builder /usr/local/lib/ /usr/local/lib/
COPY --from=builder /usr/local/include/ /usr/local/include/

GxB for vector dot product?

How much trouble would it be for you to finally add this? I would use it and love it.

To work around the lack of this functionality, I make a Matrix with one column or row, do mxv with a semiring and a length 1 vector for the output, then extract the first element from the vector. This is pretty annoying. It's come up a few times for me.

I don't have a preference for the name. vxv, dot, inner, or really anything would work for me. Output types of GxB_Scalar or C scalar would work for me; GxB_Scalar is preferred/expected, because it makes everything so much easier. Is there really any question about what the signature or behavior would be?

Best Way to Construct GrB_Vector of all Trues but a Few Falses

I'm constructing a mask, and I have a list of GrB_Index that specifies which indexes are to be made false, but all the rest I want true (so I can probe them in the matrix).

I wanted to just use GrB_assign to set all the values to true, and then use GrB_Vector_build to feed in tuples of the indexes I want set to False... but obviously we aren't allowed to build on the GrB_Vector if it is nonempty.

So what's the most efficient way to accomplish this?

GrB_Info batch(GrB_Matrix A, GrB_Index src, GrB_Index *dests, GrB_Index destsCount)
{
    GrB_Index n;
    GrB_Matrix_nrows(&n, A);             

    GrB_Index *destsValues;
    destsValues = malloc(destsCount * sizeof(GrB_Index));
    memset(destsValues, 0, destsCount * sizeof(GrB_Index));

    GrB_Vector mask;                                
    GrB_Vector_new(&mask, GrB_BOOL, n);             
    GrB_assign(mask, NULL, GrB_NULL, true, GrB_ALL, n, GrB_NULL);
    GrB_Vector_build(mask, dests, destsValues, destsCount, GrB_LOR);

    // etc...
}

Feature idea: add `LXNOR` (the opposite of `LXOR`)

This is more of an idea/suggestion than a feature request. What do you think of adding LXNOR binary operator and monoid as a GxB?

Here's a use case: triangle count given the incidence matrix E and either the adjacency matrix A or the lower triangle of the adjacency matrix L.

A value and dtype-agnostic way to write triangle count is (in grblas syntax):

val1 = A.mxm(E, semiring.lxnor_pair[dtypes.BOOL]).new()
num_triangles = val1.reduce_scalar(monoid.plus[dtypes.UINT64]).value // 3

val1 = L.mxm(E, semiring.lxnor_pair[dtypes.BOOL]).new()
num_triangles = val1.reduce_scalar(monoid.plus[dtypes.UINT64]).value

In my limited benchmarking, it appears to be better to reduce val1 to a scalar than apply a mask based on the values of val1 then count the number of non-zero (and I'm eager to try using apply with a binary operator as per the 1.3 spec). Using LXOR_PAIR followed by LNOT isn't a terrible option either.

Of course, if one has L or A in these examples, then there are other ways to compute the total number of triangles.

GrB_MINUS_REMOVE

I've got a situation where I'm extracting vectors and then adding or subtracting them to another column or row using GxB_Col_subassign and GxB_Row_subassign with GrB_PLUS_UINT16 and GrB_MINUS_UINT16.

the problem is when an entry becomes 0 after subtracting and should be removed... currently the only way I see to do this would be iteratively checking every entry after each subtraction operation, which i'm very weary of given extreme outliers might arise in my application on the order of many thousands (hopefully millions) of entries. but maybe such concerns are moot given each value would have to iteratively touched inside the library anyway, and they aren't operated on in some SIMD-esque way?

in either case, ideally there would be a GrB_MINUS_REMOVE_UINT16 operator that removed any value that hit zero... regardless of iteration or not, only requiring the data to be visited and operated on once. but i'm not sure how beyond the pale that is currently?

i considered the idea of a GrB_Col_extract that let me filter for 0 values and then iterate over them, but I assume inside the library all the entries would still have to be visited iteratively to test their values equal 0?

maybe you have another idea of how to best accomplish this?

worst case i might decide to just leave them as zombie zeros since they would not interfere with the operation of my database, just simply consume excess memory.

unpack/pack functions sometimes switch ncols and nrows of the underlying Matrix

For example, if you unpack a Matrix via FullR and then pack it with FullC, then the shape of the Matrix is transposed. I confirmed this by printing nrows and ncols after the unpack function, and they are reversed afterwards.

I have observed this reversing behavior with the following combinations of unpack-then-pack functions:

unpack	pack
`GxB_Matrix_unpack_FullR`	`GxB_Matrix_pack_FullC`
`GxB_Matrix_unpack_FullC`	`GxB_Matrix_pack_FullR`
`GxB_Matrix_unpack_BitmapR`	`GxB_Matrix_pack_BitmapC`
`GxB_Matrix_unpack_BitmapC`	`GxB_Matrix_pack_BitmapR`

Perhaps this is behaving as intended (I didn't read the fine print), but, nevertheless, I find it surprising that a pack operation can change the shape of a Matrix.

Please don't deprecate GxB_EQ_BOOL_MONOID

It appears that you list this as being deprecated in your header file. I prefer the name GxB_EQ_BOOL_MONOID over GrB_LXNOR_MONOID_BOOL.

I see that you chose to have EQ_{EQ,NE,GT,LT,GE,LE,...} semirings instead of LXNOR_{...}. If you have semirings that use EQ for a monoid, then it's reasonable to expect to have EQ monoids as well.

Similarly, it appears that the GxB_EQ_LOR_BOOL semiring is deprecated in favor of GrB_LXNOR_LOR_SEMIRING_BOOL. It would be nice to keep the original name around.

If you choose not to deprecate these, please highlight this fact in the header file (perhaps you already have a blanket statement like "EQ is always preferred over LXNOR and won't be deprecated", but I may have missed this).

Reshape matrix

I would find it useful to be able to reshape a matrix while not modifying the values. The new shape must have the same number of elements as the old shape--i.e., nrows1 * ncols1 == nrows2 * ncols2. It would be sufficient for me to assume row-oriented layout.

For example (with dense data):

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

>>> a.reshape(5, 3)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

The purpose of this is to rearrange what is effectively multidimensional data. For example, if we have axes coordinates [C, D] x [X, Y] x [A, B], reshaping allows us to go from:

    XA XB YA YB
C    1  2  3  4
D    5  6  7  8

    A  B
CX  1  2
CY  3  4
DX  5  6
DY  7  8

There are at least two ways we could generalize this to do more than reshape, which we can discuss if you're open to this functionality.

A nice thing about reshape is that for row-oriented storage that has sorted indices, the values don't need to be touched at all, and the indices can be processed and remain sorted. For bitmap and full formats, only the shape metadata needs to change.

Time Complexities of GrB_vxm and GrB_Matrix_extract

What are the time complexities of GrB_vxm and GrB_Matrix_extract?

When conducting a BFS, I'm curious if it's faster to extract a submatrix and then LOR across the rows, than it is to multiply.

Wrong argument to GrB_Matrix_error()?

Shouldn't it be GrB_Matrix not GrB_Vector?

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/master/Include/GraphBLAS.h#L4428

How to introspect a Vector or Matrix to determine if it's iso-valued

Is there some way to ask a Vector or Matrix whether or not it's iso-valued? Thanks.

Matrix reduce to Vector with FIRST or SECOND

Per the spec, one can pass a BinaryOp to GrB_Matrix_reduce_BinaryOp to reduce the rows or columns of a Matrix to a Vector. SuiteSparse 4.0 changed so that only BinaryOps that correspond to known Monoids may be used. FIRST and SECOND do not have Monoids, therefore they cannot be used.

Nevertheless, I expected FIRST and SECOND to work in SuiteSparse. The intent is clear, and I suspect efficient implementations exist.

(Note: per the spec, the outcome of a non-commutative or non-associative binary operator is undefined. FIRST and SECOND are non-commutative--they commute to each other--so their behavior would also be undefined with the current spec. SuiteSparse often implies an ordering when applying BinaryOps and chooses to be well-defined).

Failed GxB_Vector_serialize/deserialize Boolean v5.2.2

Working with GraphBLAS with all default settings and attempting to build a JNI using a mixture of SWIG and Hard Coding. One of the functions I am using fails completely within the JNI code written in C. The GxB function fails when passed a Boolean Vector. The following snippet leads to a failed usage.

Important notes. This vector is dense, and sparse vectors seem to not suffer from the same issues. (Tested with a vector of length 1000 with 3 filled positions in multiple configurations). However, vectors with approximately >1% fill seem to fail.

  GrB_Vector vector = NULL;
  GrB_Vector_new(&vector, GrB_BOOL, 10);

  for (GrB_Index i = 0; i < 10; i++) {
    GrB_Vector_setElement_BOOL(vector, true, i); // Individually set all values to true.
  }

  GrB_Type type;
  GxB_Vector_type(&type, vector);

  GrB_Index size_handle = 0;
  void *blob = NULL;
  GxB_Vector_serialize(&blob, &size_handle, vector, NULL);

  GrB_Vector v = NULL;
  GxB_Vector_deserialize(&v, type, blob, size_handle, NULL);
  GrB_Index v_size;
  GrB_Vector_size(&v_size, v);

  for (GrB_Index i = 0; i < v_size; i++) {
    bool val;
    GrB_Vector_extractElement_BOOL(&val, v, i);
    printf("%s, \t", val?"true":"false");
  }
  printf("\n");

The Response I receive from the final print statement is

true, 	true, 	true, 	true, 	true, 	true, 	false, 	false, 	true, 	false,

This behavior is not replicated for any other types. int8_t through int64_t are all serialized and de-serialized correctly. In addition, I have tested GrB_Matrix_serialize and GrB_Matrix_deserialize and neither of them suffer the same issues regardless of sparsity.

New functions in v5.0.2 not externed with GB_PUBLIC

GxB_Matrix_concat, GxB_Matrix_diag, GxB_Matrix_split, GxB_Vector_diag were not externed with GB_PUBLIC. I suspect this was a simple oversight. Sorry for not checking this earlier!

Apple-Darwin: Undefined symbols for architecture arm64

This may be just a missing flag on my part. Cross compiling from Alpine Linux to aarch64-apple-darwin I get this command/error pair:

/opt/bin/aarch64-apple-darwin20-libgfortran5-cxx11/aarch64-apple-darwin20-clang --sysroot=/opt/aarch64-apple-darwin20/aarch64-apple-darwin20/sys-root  -Wno-pointer-sign  -O3 -DNDEBUG   -O3 -DNDEBUG -dynamiclib -Wl,-headerpad_max_install_names -compatibility_version 5.0.0 -current_version 5.0.2 -o libgraphblasdemo.5.0.2.dylib -install_name @rpath/libgraphblasdemo.5.dylib CMakeFiles/graphblasdemo.dir/Demo/Source/bfs5m.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/bfs5m_check.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/bfs6.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/bfs6_check.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/dpagerank.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/dpagerank2.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/drowscale.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/get_matrix.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/import_test.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/ipagerank.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/irowscale.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/isequal.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/mis.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/mis_check.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/mis_score.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/prand.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/random_matrix.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/read_matrix.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/simple_rand.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/simple_timer.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/tricount.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/usercomplex.c.o CMakeFiles/graphblasdemo.dir/Demo/Source/wathen.c.o  -Wl,-rpath,/workspace/srcdir/GraphBLAS/build -lm libgraphblas.5.0.2.dylib -lm

Undefined symbols for architecture arm64:
 "___divdc3", referenced from:
    _complex_div in usercomplex.c.o
    _complex_rdiv in usercomplex.c.o
    _complex_minv in usercomplex.c.o
 ld: symbol(s) not found for architecture arm64
 clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
 make[3]: *** [CMakeFiles/graphblasdemo.dir/build.make:435: libgraphblasdemo.5.0.2.dylib] Error 1

Is there a way to avoid building the demo libraries altogether, perhaps in the next patch? I didn't see a flag for it in CMakeLists.txt. The other question is whether that symbol being undefined silently affects the complex types in the library?

Add GxB_diag - assign GrB_vector to k-th diagonal

As discussed on the forum adding this here for tracking.

GrB_Vector_size (&n, v) reports n = 5, and if k = 0, then GxB_diag (&A, v, 0) would create a 5-by-5 diagonal matrix.

If k = 1 or -1, then A would be 6-by-6. Here's what I think it should do, just like MATLAB, except it would always
produce a sparse matrix, not a full one like MATLAB.

If nvals(v) << n then I would create A as hypersparse. It would be crazy simple to write this GxB_diag function,
and it would be very simple, and O(nvals(v)) time.

>> v = rand (1,5)
v =
    0.8147    0.9058    0.1270    0.9134    0.6324
>> diag (v, 1)
ans =
         0    0.8147         0         0         0         0
         0         0    0.9058         0         0         0
         0         0         0    0.1270         0         0
         0         0         0         0    0.9134         0
         0         0         0         0         0    0.6324
         0         0         0         0         0         0
>> size(ans)
ans =
     6     6
>> diag (v, -2)
ans =
         0         0         0         0         0         0         0
         0         0         0         0         0         0         0
    0.8147         0         0         0         0         0         0
         0    0.9058         0         0         0         0         0
         0         0    0.1270         0         0         0         0
         0         0         0    0.9134         0         0         0
         0         0         0         0    0.6324         0         0

[Bug] GrB_apply one

I've not tested yet in C but in both Rust and Java when running GrB_Matrix_apply with unaryOp one I get 0 instead of 1
Steps to reproduce:

fn test_matrix_apply_transpose_on_self_fail_java() {
        let mut mat1 = SparseMatrix::<i32>::empty((1, 3));
        mat1.insert((0, 0), -3); // GrB_Matrix_setElement
        mat1.insert((0, 2), -4);
        mat1.insert((0, 1), 10);

        mat1.wait();

        // GrB_Marix_apply
        let mut mat2 = mat1.apply_mut::<i32, bool>(
            UnaryOp::<i32, i32>::one(), //GxB_ONE_INT32
            None,
            None,
            &Descriptor::default(),
        ); 

        mat2.wait();

        assert_eq!(mat2.shape(), (1, 3));
        assert_eq!(mat2.get((0, 0)), Some(1)); //this fails
        assert_eq!(mat2.get((0, 2)), Some(1)); //this fails
        assert_eq!(mat2.get((0, 1)), Some(1)); //this fails
    }

findmax and findmin built-ins

I'd like to request the addition of two(+) binary operators/monoids, findmax and findmin which act like min and max but return the position of the max and min values. While argmin/argmax exist in the MATLAB library, this is slightly different.

My use case is something like FINDMAXJ_PLUS. The typical MAX_PLUS semiring results in C_ij = max_k(A_ik + B_kj). FINDMAXJ_PLUS would return set C_ij to be the k which is provided the result of max_k (hopefully that's clear).

This should be done for MIN and I (FINDMAXI) as well.

As far as I can tell this cannot be constructed as a user defined operator as of now. Do internal positional binary ops have access to the values as well? Or is there a different set of algorithms for positional and value based which makes this difficult?

How to use GraphBLAS on CUDA?

Hi,

I have install GraphBLAS on a cuda enabled machine and I can run the demos in Demo successfully on cpu. But How to use the library on cuda? Whether it has independent API w.r.t. cpu?

Best,
Kangfei

Descriptor for GxB_Matrix_sort

In section 6.13.2 of the User Guide for GxB_Matrix_sort, you mention to use the descriptor GrB_DESC_IN0. A descriptor of this name does not exist in GraphBLAS.h. Do you mean GrB_DESC_T0?

GrB_Vector_assign_BOOL results in Memory Error for Large Vectors

I am having an issue with boolean assign and perhaps my usage of ISO valued vectors. I create an ISO valued vector ( GrB_Vector_assign_BOOL(v, NULL, NULL, true, GrB_ALL, size, NULL);) Then the presented output is iso status = true, size_t = 264. This is expected behavior for the vector. However, once I assign the opposite of the default value (false) to one position (GrB_Vector_assign_BOOL(v, NULL, NULL, false, indices, 1, NULL);) the code returns an error (GrB_OUT_OF_MEMORY) and seems to dump the vector with output iso status = false, size_t = 0.

    GrB_Vector v = NULL;
    GrB_Index n = GxB_INDEX_MAX;
    GrB_Vector_new(&v, GrB_BOOL, n);
    GrB_Index size = NULL;
    GrB_Vector_size(&size, v);
    GrB_Vector_assign_BOOL(v, NULL, NULL, true, GrB_ALL, size, NULL);

    {
        bool iso = NULL;
        size_t memsize = NULL;
        GxB_Vector_iso(&iso, v);
        GxB_Vector_memoryUsage(&memsize, v);
        printf("iso status = %s,\t size_t = %zu\n", iso?"true":"false", memsize);
    }

    GrB_Index indices[1] = {2};
    GrB_Vector_assign_BOOL(v, NULL, NULL, false, indices, 1, NULL);

    {
        bool iso = NULL;
        size_t memsize = NULL;
        GxB_Vector_iso(&iso, v);
        GxB_Vector_memoryUsage(&memsize, v);
        printf("iso status = %s,\t size_t = %zu\n", iso?"true":"false", memsize);
    }

This issue may be caused by a misunderstanding of the nature of the ISO valued vector. Does the ISO value swap the underlying "default" value or does it only work to claim that the entire vector is now equal to one value? Given the former interpretation, is the result garnered an error or expected behavior? Given the latter interpretation, is there some way to manipulate the vector so that it stores the significantly smaller cardinality false positions rather than the extremely dense true positions. Thank you for your help!

Masking groups of rows/cols in a matrix

Hello,
I am quite new with GraphBlas so I may have missed something but I could not find an easy way to remove/mask groups of rows and/or columns from a matrix. Formally, I got a sparse vector, says U (containing integers but actually all equal to 1), and I want to remove rows or columns of a matrix A where U has no entry. In Matlab, something like : A.*U or U.*A woul work well. In GraphBlas, I am currently going with
-Creating a matrix Du = diag(u)
-Compute A*Du or Du*A
Is there another, more efficient way for performing this ?
Thanks

SIMD + GraphBLAS

just out of pure curiosity, do you have any plans to vectorize GraphBLAS? (I saw at least GB_DOT_SIMD but couldn't locate any usages).

on that topic, avx512 will be closer to penalty-less come Ice Lake.

and of course there are inbound matrix instructions as well. (who knows the chip wide downscaling penalties yet though).

^^ from this broadcast: https://sc20.gallery.video/detail/videos/sc20/video/6208439527001/a-closer-look-at-ice-lake-architecture-overview

Compile with cosmopolitan libc

https://github.com/jart/cosmopolitan

"Cosmopolitan Libc makes C a build-once run-anywhere language, like Java, except it doesn't need an interpreter or virtual machine. Instead, it reconfigures stock GCC and Clang to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS with the best possible performance and the tiniest footprint imaginable."

Could be really interesting to try building GraphBLAS/LAGraph using cosmopolitan libc instead of glibc.

OpenMP issue in Windows (doesn't like `int64_t` in for loop initialization)

We are trying to build 6.1.3 and latest master branch on Windows and encountered the following OpenMP error:

[ 28%] Building C object CMakeFiles/graphblas.dir/Source/Generated2/GB_AxB__bxnor_band_uint16.c.obj
GB_AxB__bxnor_band_uint16.c
d:\bld\graphblas_1641574797935\work\source\template\GB_AxB_dot4_template.c(592): error C3015: initialization in OpenMP 'for' statement has improper form
d:\bld\graphblas_1641574797935\work\source\template\GB_AxB_dot4_template.c(650): error C3015: initialization in OpenMP 'for' statement has improper form
d:\bld\graphblas_1641574797935\work\source\template\GB_AxB_dot4_template.c(712): error C3015: initialization in OpenMP 'for' statement has improper form
NMAKE : fatal error U1077: 'D:\bld\graphblas_1641574797935\_build_env\Library\bin\cmake.exe' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.

as can be seen here:
https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=437194&view=logs&j=a70f640f-cc53-5cd3-6cdc-236a1aa90802&t=f5d15007-a01c-5ad8-c9ce-4d519d3b275f&l=1985

I was able to fix it by removing int64_t from the for loop initialization (for example, changing for (int64_t k = 0 ; k < vlen ; k++) to for (k = 0 ; k < vlen ; k++). I used the patch below:

diff --git a/Source/Template/GB_AxB_dot4_template.c b/Source/Template/GB_AxB_dot4_template.c
index 5ab19ce822..61f5bc32ee 100644
--- a/Source/Template/GB_AxB_dot4_template.c
+++ b/Source/Template/GB_AxB_dot4_template.c
@@ -588,14 +588,17 @@
                     //----------------------------------------------------------

                     GB_BTYPE *restrict G = W ;
-                    #pragma omp parallel for num_threads(nthreads) \
-                        schedule(static)
-                    for (int64_t k = 0 ; k < vlen ; k++)
                     {
-                        // G (k,0:1) = B (k,j1:j1+1)
-                        const int64_t k2 = k << 1 ;
-                        G [k2    ] = Bx [k + (j1    ) * vlen] ;
-                        G [k2 + 1] = Bx [k + (j1 + 1) * vlen] ;
+                        int64_t k;
+                        #pragma omp parallel for num_threads(nthreads) \
+                            schedule(static)
+                        for (k = 0 ; k < vlen ; k++)
+                        {
+                            // G (k,0:1) = B (k,j1:j1+1)
+                            const int64_t k2 = k << 1 ;
+                            G [k2    ] = Bx [k + (j1    ) * vlen] ;
+                            G [k2 + 1] = Bx [k + (j1 + 1) * vlen] ;
+                        }
                     }

                     //----------------------------------------------------------
@@ -646,15 +649,18 @@
                     //----------------------------------------------------------

                     GB_BTYPE *restrict G = W ;
-                    #pragma omp parallel for num_threads(nthreads) \
-                        schedule(static)
-                    for (int64_t k = 0 ; k < vlen ; k++)
                     {
-                        // G (k,0:2) = B (k,j1:j1+2)
-                        const int64_t k3 = k * 3 ;
-                        G [k3    ] = Bx [k + (j1    ) * vlen] ;
-                        G [k3 + 1] = Bx [k + (j1 + 1) * vlen] ;
-                        G [k3 + 2] = Bx [k + (j1 + 2) * vlen] ;
+                        int64_t k;
+                        #pragma omp parallel for num_threads(nthreads) \
+                            schedule(static)
+                        for (k = 0 ; k < vlen ; k++)
+                        {
+                            // G (k,0:2) = B (k,j1:j1+2)
+                            const int64_t k3 = k * 3 ;
+                            G [k3    ] = Bx [k + (j1    ) * vlen] ;
+                            G [k3 + 1] = Bx [k + (j1 + 1) * vlen] ;
+                            G [k3 + 2] = Bx [k + (j1 + 2) * vlen] ;
+                        }
                     }

                     //----------------------------------------------------------
@@ -708,16 +714,19 @@
                     //----------------------------------------------------------

                     GB_BTYPE *restrict G = W ;
-                    #pragma omp parallel for num_threads(nthreads) \
-                        schedule(static)
-                    for (int64_t k = 0 ; k < vlen ; k++)
                     {
-                        // G (k,0:3) = B (k,j1:j1+3)
-                        const int64_t k4 = k << 2 ;
-                        G [k4    ] = Bx [k + (j1    ) * vlen] ;
-                        G [k4 + 1] = Bx [k + (j1 + 1) * vlen] ;
-                        G [k4 + 2] = Bx [k + (j1 + 2) * vlen] ;
-                        G [k4 + 3] = Bx [k + (j1 + 3) * vlen] ;
+                        int64_t k;
+                        #pragma omp parallel for num_threads(nthreads) \
+                            schedule(static)
+                        for (k = 0 ; k < vlen ; k++)
+                        {
+                            // G (k,0:3) = B (k,j1:j1+3)
+                            const int64_t k4 = k << 2 ;
+                            G [k4    ] = Bx [k + (j1    ) * vlen] ;
+                            G [k4 + 1] = Bx [k + (j1 + 1) * vlen] ;
+                            G [k4 + 2] = Bx [k + (j1 + 2) * vlen] ;
+                            G [k4 + 3] = Bx [k + (j1 + 3) * vlen] ;
+                        }
                     }

                     //----------------------------------------------------------

After applying the above fix, we still get an error when trying to build with commit: https://github.com/DrTimothyAldenDavis/GraphBLAS/commit/17f431150924edacfc70d08640f239893300dbe1 with cpu_features disabled.

The new error is:

[ 54%] Building C object CMakeFiles/graphblas.dir/Source/Generated2/GB_AxB__plus_times_fp32.c.obj
GB_AxB__plus_times_fp32.c
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(447): error C2485: 'target': unrecognized extended attribute
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(447): error C2059: syntax error: ')'
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(447): error C2059: syntax error: 'type'
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(485): error C2485: 'target': unrecognized extended attribute
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(485): error C2059: syntax error: ')'
%SRC_DIR%\Source\Generated2\GB_AxB__plus_times_fp32.c(485): error C2059: syntax error: 'type'
%SRC_DIR%\Source\Template\GB_AxB_saxpy5_meta.c(118): warning C4013: 'GB_AxB_saxpy5_unrolled_avx512f' undefined; assuming extern returning int
%SRC_DIR%\Source\Template\GB_AxB_saxpy5_meta.c(129): warning C4013: 'GB_AxB_saxpy5_unrolled_avx2' undefined; assuming extern returning int
NMAKE : fatal error U1077: 'D:\bld\graphblas_1641583385327\_build_env\Library\bin\cmake.exe' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
Stop.

as seen here:
https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=437290&view=logs&j=a70f640f-cc53-5cd3-6cdc-236a1aa90802&t=f5d15007-a01c-5ad8-c9ce-4d519d3b275f&l=3393

We added -D GBNCPUFEAT=1 to the cmake command:
https://github.com/conda-forge/graphblas-feedstock/blob/7e60db768b0116b7c9cfc5c5baded573c63b24b8/recipe/bld.bat

Apply with identity and mask doesn't preserve iso-valued-ness

For example:

  GrB_Matrix_new(&M_0, GrB_INT64, 4, 4);
  GrB_Matrix_new(&M_1, GrB_INT64, 4, 4);
  GrB_Matrix_assign_INT64(M_0, NULL, NULL, 1, GrB_ALL, 4, (GrB_Index[]){0, 1}, 2, NULL);
  GrB_Matrix_assign_INT64(M_1, NULL, NULL, 1, GrB_ALL, 4, (GrB_Index[]){1, 2}, 2, NULL);
  GrB_Matrix_new(&M_2, GrB_INT64, 4, 4);
  GrB_Matrix_apply(M_2, M_1, NULL, GrB_IDENTITY_INT64, M_0, GrB_DESC_S);

M_0 and M_1 are both iso-valued with the same datatype and value of 1. I expect M_2 to be iso-valued, but it is not.

For this example, I can use assign instead of apply, which would preserve iso-valued-ness.

How to interpret larger than expected arrays in import/export

In version 4, the documentation says the size of arrays may be larger than expected. For example, in export CSR, it says Aj_size >= nvals(A). Similarly, for FullR, Ax_size >= nrows*ncols. If we're given an array larger than is expected, do we simply use the first n elements where n is the size we expect? The documentation for this is unclear to me. I understand the rationale of transferring control in O(1) time and memory if possible, which naturally exposes some underlying implementation details.

Similarly, are there any advantages to passing a larger-than-necessary array to import?

PLUS_POW semirings

I am creating aggregators in grblas (because they are oh-so-nice to use!) using straightforward recipes in anticipation of SuiteSparse 5 with O(1) dense vectors. I currently have 10 aggregators that use PLUS_POW semirings.

Would you consider adding PLUS_POW semirings? Am I correct in thinking that they can be faster if you include them? Fwiw, it's not difficult for me to create the semirings from plus monoid and pow binaryop.

Bug with `GrB_{Matrix,Vector}_assign`? 3.3.0 and 3.2.2

Best just to show. It's possible I'm not doing things quite right:

int main (void)
{
    GrB_Vector w, u ;
    GrB_Info info ;
    OK (GrB_init (GrB_NONBLOCKING)) ;
    OK (GrB_Vector_new (&w, GrB_INT64, 1)) ;
    OK (GrB_Vector_setElement_INT64 (w, 1, 0)) ;
    OK (GxB_print (w, 3)) ;

    OK (GrB_Vector_new (&u, GrB_FP64, 1)) ;
    OK (GrB_Vector_assign (u, NULL, NULL, w, GrB_ALL, 1, NULL)) ;
    OK (GxB_print (u, 3)) ;

    OK (GrB_Vector_setElement_FP64(u, 98.5, 0)) ;
    OK (GxB_print (u, 3)) ;

    OK (GrB_finalize ( )) ;
}

w Looks good so far:

  1x1 GraphBLAS int64_t vector, sparse by col:
  w, 1 entry

    (0,0)   1

u after assign. Wait, why is u an int64_t now?!

  1x1 GraphBLAS int64_t vector, sparse by col:
  u, 1 entry

    (0,0)   1

u after set element. Indeed, I can't assign a float to u:

  1x1 GraphBLAS int64_t vector, sparse by col:
  u, 1 entry

    (0,0)   98

Now for another example:

int main (void)
{
    GrB_Vector w, u ;
    GrB_Info info ;
    OK (GrB_init (GrB_NONBLOCKING)) ;
    OK (GrB_Vector_new (&w, GrB_INT64, 1)) ;
    OK (GrB_Vector_setElement_INT64 (w, 1, 0)) ;
    OK (GxB_print (w, 3)) ;

    OK (GrB_Vector_new (&u, GrB_FP64, 1)) ;
    OK (GrB_Vector_setElement_FP64(u, 42.5, 0)) ;   // NEW
    OK (GxB_print (u, 3)) ;                         // NEW

    OK (GrB_Vector_assign (u, NULL, NULL, w, GrB_ALL, 1, NULL)) ;
    OK (GxB_print (u, 3)) ;

    OK (GrB_Vector_setElement_FP64(u, 98.5, 0)) ;
    OK (GxB_print (u, 3)) ;

    OK (GrB_finalize ( )) ;
}

This outputs:

  1x1 GraphBLAS int64_t vector, sparse by col:
  w, 1 entry

    (0,0)   1


  1x1 GraphBLAS double vector, sparse by col:
  u, 1 entry

    (0,0)    42.5


  1x1 GraphBLAS double vector, sparse by col:
  u, 1 entry

    (0,0)    4.94066e-324


  1x1 GraphBLAS double vector, sparse by col:
  u, 1 entry

    (0,0)    98.5

So, u stays a double, but now the assignment into it from w gives bizarre results.

I tested this on my personal build on OS X with versions 3.2.2 and 3.3.0.

xref: python-graphblas/python-graphblas#25

vxm does not return

Unexpected behavior

When GrB_vxm is called with empty w, u, A, and non-empty mask, GrB_vxm never returns.

Example code

#include "GraphBLAS.h"

int main(int argc, char *argv[]) {
    GrB_init(GrB_BLOCKING);

    size_t size = 100;

    GrB_Vector w;
    GrB_Vector_new(&w, GrB_BOOL, size);

    GrB_Vector u;
    GrB_Vector_new(&u, GrB_BOOL, size);

    GrB_Vector mask;
    GrB_Vector_new(&mask, GrB_BOOL, size);
    for (int i = 0; i < 5; i++) {
        GrB_Vector_setElement_BOOL(mask, true, i);
    }

    GrB_Matrix A;
    GrB_Matrix_new(&A, GrB_BOOL, size, size);

    GrB_vxm(w, mask, GrB_NULL, GxB_LOR_LAND_BOOL, u, A, GrB_NULL);

    GrB_finalize();
}

Detailed Description

If the inputs u and A are empty, the hash table used in loop GB_AxB_saxpy3_symbolic.c#L173 will have space for 4 elements.
If the mask has more than 4 elements, loop GB_AxB_saxpy3_symbolic.c#L167 will not be able to find space for all elements, and it will loop forever.

GRB_VERSION / GrB_SUBVERSION

On page 45 of "GraphBLAS_API_C_v13.pdf", I see:

#define GRB_VERSION     1
#define GrB_SUBVERSION  3

I am surprised by the use of GRB (all caps), and also the inconsistency (GRB and GrB). Do you know whether these should both be GRB (as you define them in your header file), or GrB? I know this is more of a GraphBLAS question, but I wasn't sure where else to ask.

Failing test (gbtest76) on Apple Silicon

I made some progress on porting the library to Apple Silicon (#84, #86). This issue documents how to build Octave v7 on macOS and run the test. Finally, it contains the output of a single failing test.

Building Octave

Grab the brew formula:

wget https://raw.githubusercontent.com/Homebrew/homebrew-core/master/Formula/octave.rb

Edit octave.rb
1. Add "disable-docs" to args (or ensure that you have a working texinfo installation).
2. Edit Mercurial (hg) repository: switch from the default branch (containing code for Octave v8.0) to stable (v7.0).
Run
```
brew install --head ./octave.rb
```
This takes about 10 minutes.

Building the tests (gbmake)

Grab the OpenMP binaries as described at https://mac.r-project.org/openmp/:

curl -O https://mac.r-project.org/openmp/openmp-13.0.0-darwin21-Release.tar.gz
sudo tar fvxz openmp-13.0.0-darwin21-Release.tar.gz -C /

Run

sed -i.bkp 's/-fopenmp/-Xclang -fopenmp/g' @GrB/private/gbmake.m

Run octave and follow the instructions in GraphBLAS/README.md, including gbmake:
```
cd @GrB/private
gbmake
```

Running the tests

Run the tests:

cd test
gbtest

Currently, all tests pass except one.

Failing test: gbtest76

gbtest76 fails on macOS / Apple Silicon. (I tried running it in both Octave 7 and 8 on macOS. It passes on Linux with Octave 8. I did not try it on Linux with Octave 7.)

The output of gbtest76 is the following:

> gbtest76

gbtest76: testing trig and special functions
single
double
single complex
double complex
error: assert (err == 0) failed
error: called from
    assert at line 107 column 11
    gbtest76>gbtest76b at line 432 column 5
    gbtest76 at line 47 column 13

Upon closer inspection, the only test case that fails is the following:

case { 'double complex' }
    A = rand (4) + 1i* rand (4) ;
    B = rand (4) + 1i* rand (4) ;
    tol = 1e-10 ;
    G = GrB (A) ;
    H = GrB (B) ;
    gbtest76b (A, B, G, H, tol) ;

I printed the output before the assertion in line 432:

C1 = A.^1 ;
C2 = G.^1 ;
err = norm (C1-C2, 1) ;
display (err)
assert (err == 0) ;

It indeed has a non-zero result:

double complex
err = 2.7497e-16

Extract matrix values (edges) used in a VxM

Is there any easy way to produce a matrix consisting only of the values interacted with in a VxM operation?

Specifically, I am using one step of a breadth-first search-like procedure to produce an s-t cut in the graph. I know that the edges traversed in this VxM corresponds to the cut I need, but I am unsure how to extract them.

The VxM produces the full set of vertices on the "t-side" of the cut, and by searching one step "backwards" again (VxM with using the t-set and the tranposed graph) I could produce all the parent-vertices. By taking all the edges then implicitly stored in the two sets of vertices, I would likely get most but not all of the cut.

I could also utilize these two sets of vertices to find the edges manuallly using matrix_extract, but that would lead to the kind of run-time I am trying to avoid.

I would appreciate any help, or an indication that this kind of procedure isn't really supported in GraphBLAS.

Advocating for Aggregator objects to perform reductions

I think there is a strong case to be made to add Aggregator objects to GraphBLAS to perform reductions. In software, it's powerful to give names to common operations, or, as is the case for aggregations, to reuse names users are already familiar with. This hides implementation details behind a cleaner API that better captures what the user wants to do instead of how to do it. I expect this would result in cleaner, more portable, and more performant code.

Such a change to a GraphBLAS library (or the spec) should, of course, be done with care. Generic user-defined aggregations are complex, and systems that implement them are often too simple and lack functionality, performance, or both. Expressive, performant user-defined aggregations are necessarily complicated to support (imho), and we shouldn't lose sight of this. Having Aggregator objects allows a library to incrementally support more and more advanced aggregations (if desired), the details of which are hidden behind a nice API and opaque types. But one shouldn't start advanced. Let's begin simple.

I think SuiteSparse:GraphBLAS can actually get really far by reusing its existing machinery. Most aggregations people care about do use monoids during a reduction step, which can be done with your O(1) dense vectors and suitable semirings. Many aggregations should be "simple" recipes for SuiteSparse (I quote "simple" so as to not make light of it--SuiteSparse is anything but simple!).

To help myself think through this, I have assembled a spreadsheet that describes how some common aggregations (and a few uncommon ones) may be performed in GraphBLAS. It's incomplete, but I think it's fairly illustrative:

https://docs.google.com/spreadsheets/d/1dQNrxLBVq8Y3JOdvNnIXRPZsPTDaf7lydztiG7Gbx8E/edit?usp=sharing

I split the aggregations into six groups of increasing complexity or likely difficultness to implement. The 0th group (not included) are the aggregations that are already well-handled by reductions with monoids, such as SUM, PROD, ANY, ALL, MIN, MAX, etc. These could, of course, be rolled into Aggregators.

Group 1

COUNT
COUNT_NONZERO
COUNT_ZERO
SUMOFSQUARES

// Group 1 (blue)
struct Aggregator {
    // May define init_value, init_unaryop, or neither
    // If none are defined, then an O(1) dense vector with irrelevant data can be used in a matrix-vector multiply.
    TYPE *init value,  // Optional; e.g., the values of an O(1) dense vector.
    UnaryOp *init_unaryop,  // Optional; e.g., is applied to the input matrix at the beginning.
    BinaryOp *update_binaryop,  // Optional; e.g., the BinaryOp in a matrix-vector multiply.
                                // If not provided, then `first` may be assumed such as `plus_first(A @ dense)`
    Monoid merge_monoid  // Required; e.g., the Monoid in a matrix-vector multiply.
                         // If only this is provided, then it is equivalent to existing reduce with Monoid.
};

Group 2

HYPOT
LOGADDEXP
LOGADDEXP2

// Group 2 (green)
// Add finalize.
struct Aggregator {
    TYPE *init value,
    UnaryOp *init_unaryop,
    BinaryOp *update_binaryop,
    Monoid merge_monoid,
    UnaryOp *finalize  // Optional; is applied to the output at the very end.
};

Group 3

MEAN
VARP
VARS
STDEVP
STDEVS
GEOMETRIC_MEAN
HARMONIC_MEAN
PTP
SKEW (not included)
KURTOSIS (not included)

// Group 3 (yello)
// Same fields as group 2, but requires tuple types or composite aggregations.

Group 4

ARGMIN
ARGMAX

// Group 4 (orange)
// Add init_indexunaryop.
struct Aggregator {
    // May define init_value, init_unaryop, init_indexunaryop and init_value, or none of these.
    TYPE *init value,
    UnaryOp *init_unaryop,
    IndexUnaryOp *init_indexunaryop,  // Optional; for indices; uses `init_value` as thunk value.
    BinaryOp *update_binaryop,
    Monoid merge_monoid,
    UnaryOp *finalize
};

Group 5

FIRST
LAST
FIRSTI
LASTI
FIRST_NONZERO
IS_MONOTONIC_DECREASING

// Group 5 (red)
// Add merge_binaryop and terminate_unaryop, and don't use update_binaryop.
// Exactly one of merge_binaryop or merge_monoid must be defined.
struct Aggregator {
    TYPE *init value,
    UnaryOp *init_unaryop,
    IndexUnaryOp *init_indexunaryop,
    BinaryOp *update_binaryop,
    BinaryOp *merge_binaryop,  // elements must be processed in order,
                               // but can still be decomposed into groups and done in parallel
    Monoid *merge_monoid,  // this is now optional
    UnaryOp *terminate_unaryop,  // call on each aggregation value to check whether we can terminate
    UnaryOp *finalize
};

Group 6

LASTI (alternative)
IS_MONOTONIC_DECREASING (alternative)

// Group 6 (brown)
// Add update_indexunaryop, and allow use of update_binaryop.
struct Aggregator {
    // init options
    TYPE *init value,
    UnaryOp *init_unaryop,
    IndexUnaryOp *init_indexunaryop,
    // update options
    BinaryOp *update_binaryop,
    IndexUnaryOp* update_indexunaryop,  // use previous agg value as thunk value
    // merge options
    BinaryOp *merge_binaryop,
    Monoid *merge_monoid,
    // final options
    UnaryOp *terminate_unaryop,
    UnaryOp *finalize
};

In group 5, we used init_unaryop or init_indexunaryop with merge_binaryop, but didn't define any update options. In this case, the init operation is applied to every element, and the elements are merged together (while maintaining order) with merge_binaryop.

In group 6, we use an init operation, an update operation, and merge_binaryop. This means if we split the elements being aggregated into sublists (for parallel processing), then the init operation is only called on the first element in each sublist, and the update operation is used for all the rest.

More behaviors and fields could be added. For example, if an update operation is specified without specifying a merge operation, then this could mean the elements must be processed in order, serially (i.e., no within-group parallelism!). For another example, terminate_value could be added to serve the same purpose as terminate_unaryop.

Known shortcomings:

Parametrized aggregations require user-defined functions with the parameters "baked in".
- Examples:
  - FREQUENCY(val)
  - MOMENT(n)
  - NTH(n)
  - POWERMEAN(p)
  - STDEV(ddof)
  - VAR(ddof)
  - ZSCORE(val)
  - PERCENT_GT(val), etc
  - MEAN_ABSOLUTE_DIFFERENCE(x) (abs(x1 - x) + abs(x2 - x) + ...)
It's unclear how an aggregation on a non-empty set can return NO_VALUE.
- For example, ONLY could return the first element if and only if there is one element.
- Also, should sample variance (VARS) return inf, nan, or NO_VALUE if a single value in group? (I say nan)
Aggregations such as MIN_BY are challenging.
- BY could be a UnaryOp or a Vector or Matrix the same shape as the input.
- Examples:
  - Return the complex number with the smallest imaginary component.
  - ARGMAX (probably in some way)
Aggregations such as weighted average aren't directly supported.
Aggregations or scans with non-scalar results (such as histograms or cumsum) aren't directly supported.
The following reductions aren't supported (yet):
- MEDIAN
- MODE
- PERCENTILES
- MAD or MEAN_ABSOLUTE_DEVIATION (around mean, median, mode, etc)
- NUNIQUE or COUNT_UNIQUE
ARGMIN and ARGMAX don't naturally generalize to Matrix to Scalar reduction.
- I think the natural generalization would be to return both I and J indices of the min or max value.
User-defined aggregators could become complicated and error-prone.
It's not obvious to me where to draw the line for how to create and support user-defined aggregators.
- Since groups 5 and 6 don't use monoids, these are most likely in the "not yet" category for SuiteSparse.

I'll be honest, the thing I like most about this proposal is COUNT. I really want a canonical, portable way to calculate row degrees and column degrees 😃 .

Please let me know if anything needs more explaining (or corrected). I intend this issue to be as much for the GraphBLAS committee and community as for SuiteSparse.

Cheers! 🍻

Export/unpack without values

Sometimes when I export/unpack an object, I only want the indices. It would be nice to be able to pass NULL in for the values array. I believe you already support this behavior in extract tuples.

Sometimes exporting in a given format requires SuiteSparse to perform work (such as transposing or sorting). If we know the values aren't needed, then maybe less work could be done.

I don't expect this to be optimized any time soon, but I suspect it may be optimized at some point in the future. If I could write code that passes NULL in for the values today, then I could benefit from any future optimization.

I suspect passing in NULL for any array in export/unpack would be super-tedious work. I'm not asking for this. Only values.

Alternatively, is there some trick that can be done that effectively does this efficiently? Perhaps one could create a new iso-Matrix with the same structure as the non-iso-Matrix, then export the new iso-Matrix.

Typecast mystery: value-dependent coercion weirdness with accumulate?

Using SuiteSparse:GraphBLAS 3.2.2.

I apologize for not having these examples in C (GraphBLAS is still building...). Here's an example using grblas Python API. I will explain some hidden details:

w = gb.Vector.from_values([0, 1], [0, -20], dtype='INT64')
u = gb.Vector.from_values([0, 1], [1, 2], dtype='FP64')
w(accum=gb.binary.plus) << u

which results in

                    0                    1
  4886396799603965952  4886405595696988140

As you can probably guess, u and w are length-two vectors. u is FP64 dtype with values 1 and 2. w is INT64 dtype with values 0 and -20.

Under the hood, this is calling something like:

GrB_apply(
    w,
    GrB_NULL,           // no mask
    GrB_PLUS_INT64,     // accum; the dtype is very important here!
    GrB_IDENTITY_FP64,  // unary op
    u,
    GrB_NULL            // no descriptors
)

Surprisingly, the results in w are 4886396799603965952 and 4886405595696988140.

If we use GrB_PLUS_FP64 for the accumulation instead, then we get the expected results 1 and -18.

Here's the kicker. Let's extend w and u while keeping the first two elements the same:

w = gb.Vector.from_values([0, 1, 2, 4, 5], [0, -20, 30, 40, 50], dtype='INT64')
u = gb.Vector.from_values([0, 1, 3, 4, 5], [1, 2, 3, -4, 0], dtype='FP64')
w(accum=gb.binary.plus) << u

which results in

  0    1   2  3   4   5
  1  -18  30  3  36  50

These values are correct! But, why don't the first two values match the first example?!?!

I'm quite confused and surprised, although it is very possible that I am doing something silly here.

GxB_PRINT_1BASED doesn't affect printing of pending tuples

5x5 GraphBLAS double matrix, sparse by col
  1 entry, memory: 6.5 KB
  pending tuples: 9 max pending: 256 zombies: 0

    (1,1)    0.570652
  pending tuples:
    GraphBLAS type:  double size: 8
    row: 1 col: 3    0.0304623
    row: 1 col: 4    0.987438
    row: 2 col: 3    0.10812
    row: 3 col: 0    0.124603
    row: 3 col: 3    0.411447
    row: 4 col: 0    0.953935
    row: 4 col: 1    0.581752
    row: 4 col: 2    0.22532
    row: 4 col: 3    0.634672
  pending operator: implicit 2nd

I can try and find this later this week and make a PR to fix, unless this is intentional. It looks correct one pending tuples are "materialized".

GCC 11.1 fails with -Werror=misleading-indentation

Buillding with GCC 11.1 -Werror=misleading-indentation stops at:

/pub/devel/suitesparse/suitesparse-5.9.0-1.i686/build/GraphBLAS/Source/Generated/GB_AxB__any_first_int32.c:278:5:
error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
278 | if (exists && !cb) cx = (ax) ; cb |= exists

Is it a misunderstanding of the compiler or a real code issue ?

Transpose mask

Is it possible to transpose the mask in a GraphBLAS operation?

GxB_Vector_diag should respect transpose

On the main diagonal setting desc.in0=transpose should obviously do nothing. But if k != 0 it seems to me that desc.in0=transpose should perform the transpose right?

`GrB_Matrix_apply_BinaryOp{1st,2nd}` use the wrong transpose descriptors.

According to the spec (and what one would expect from the argument list to the functions), GrB_Matrix_apply_BinaryOp1st uses GrB_INP1 to indicate the input matrix is transposed, and GrB_Matrix_apply_BinaryOp2nd uses GrB_INP0. SuiteSparse has these reversed both in code and in documentation.

This looks easy enough to fix. Care if I take a stab at it?

Serialize and deserialize vectors and matrices

Hi Tim! Is there a way to (de)serialize vectors and matrices to bytes or a stream of bytes? I want to send these bytes over the wire, not to disk.

I am creating a Python library for doing distributed and/or out-of-core GraphBLAS computation using the grblas API. A single distributed matrix is comprised of many SuiteSparse matrices (chunked any which way). To have this work well in a distributed setting, the ability to serialize objects will of course be very valuable. And just a heads up: this will also exercise running GraphBLAS with lots of user threads (fingers crossed!).

GrB_eraseElement

Is there any specific reason why there isn't an erase function? And can we get one in a future release?

GraphBLAS initialization and separate calls in Python wrapper

I build the c++ library with 2 functions: f1 and f2. After that I call them from python wrapper using ctypes.

In f1 I call GrB_init(GrB_NONBLOCKING); and do some computations on GraphBLAS matrices with user-defined type. f1 return the pointer to the c++ class with these matrices. to the python wrapper

After that I try to call function f2 from python wrapper and use this class and GraphBLAS matrices, but I get segmentation fault (core dumped) when I try to use GraphBLAS matrices.

Is there a way to init GrapBLAS in one call from wrapper and continue to use matrices in next calls. Or here we have some limitations?

How to avoid realizing huge results from matrix multiply?

Any advice for how to avoid creating huge results from matrix multiply? Preferably, this won't entail trying (and failing) to allocate memory.

I have three use-cases in mind:

I want to be "nice" to users and preferably give them a useful error message
I want to be "nice" to other processes running on the machine by not using too much memory (so I want to avoid creating too large of objects even if they'll fit in memory)
When executing a task in a distributed setting, I want to discover when a task is too large for one machine so I can distribute it over multiple machines.

Perhaps the easiest approach may be to add a setting that says "don't allow this object (or any object if a global setting) to hold more than MAX_NNZ elements". Perhaps max size in bytes would be better than NNZ. Then, if you detect this will be violated during a calculation, you can return GrB_INSUFFICIENT_SPACE, GrB_OUT_OF_MEMORY, or an exception of your creation. This approach would be "good enough" for me.

Is this a reasonable request? There are, of course, other, fancier ways of computing or estimating nnz of a sparse matrix product (symbolic multiplication, sketches, sampling, etc.), but I'll leave it up to you if you want them in SuiteSparse any time soon.

GB_PUBLIC for new `GxB_*_eWiseUnion` functions

GB_PUBLIC is missing for GxB_Vector_eWiseUnion and GxB_Matrix_eWiseUnion, so these are not externed.

Static link GraphBLAS 6.1.3

When trying to integrate GraphBLAS 6.1.3 to RedisGraph
We use the graphblas_static target and it look like cpu_features is not static linked
Trying to resolved it by adding the below code to CMakeLists.txt didn't help

if ( BUILD_GRB_STATIC_LIBRARY )
    target_link_libraries ( graphblas_static PUBLIC cpu_features )
endif ( )

the linker error was resolved but at runtime we get Symbol not found: _GetX86Info

Intel compiler error (icc)

Hello,

I am trying to use Intel Compiler to compile GraphBLAS but I get the following error:

GraphBLAS/Source/GB_ops.c(531): error: expression must have a constant value
GB_MONOID_DEFT ( GrB_, MIN_FP32     , float     , INFINITY    , -INFINITY )

My intel compiler version is
icc (ICC) 19.0.3.199 20190206

Here it says most of C11 features are supported by icc
https://software.intel.com/content/www/us/en/develop/articles/c11-support-in-intel-c-compiler.html

I am using GraphBLAS version 4.0.3 and the following commands to compile it:

mkdir build & cd build 
CC=icc cmake ..
cd ..
make library JOBS=32

Any idea what is wrong?

Prebuilt binary rmm_wrap_test in repository

The binary executable rmm_wrap/rmm_wrap_test is present in the repository.
Is it intended?