I think there is a strong case to be made to add Aggregator
objects to GraphBLAS to perform reductions. In software, it's powerful to give names to common operations, or, as is the case for aggregations, to reuse names users are already familiar with. This hides implementation details behind a cleaner API that better captures what the user wants to do instead of how to do it. I expect this would result in cleaner, more portable, and more performant code.
Such a change to a GraphBLAS library (or the spec) should, of course, be done with care. Generic user-defined aggregations are complex, and systems that implement them are often too simple and lack functionality, performance, or both. Expressive, performant user-defined aggregations are necessarily complicated to support (imho), and we shouldn't lose sight of this. Having Aggregator
objects allows a library to incrementally support more and more advanced aggregations (if desired), the details of which are hidden behind a nice API and opaque types. But one shouldn't start advanced. Let's begin simple.
I think SuiteSparse:GraphBLAS can actually get really far by reusing its existing machinery. Most aggregations people care about do use monoids during a reduction step, which can be done with your O(1) dense vectors and suitable semirings. Many aggregations should be "simple" recipes for SuiteSparse (I quote "simple" so as to not make light of it--SuiteSparse is anything but simple!).
To help myself think through this, I have assembled a spreadsheet that describes how some common aggregations (and a few uncommon ones) may be performed in GraphBLAS. It's incomplete, but I think it's fairly illustrative:
https://docs.google.com/spreadsheets/d/1dQNrxLBVq8Y3JOdvNnIXRPZsPTDaf7lydztiG7Gbx8E/edit?usp=sharing
I split the aggregations into six groups of increasing complexity or likely difficultness to implement. The 0th group (not included) are the aggregations that are already well-handled by reductions with monoids, such as SUM
, PROD
, ANY
, ALL
, MIN
, MAX
, etc. These could, of course, be rolled into Aggregators.
Group 1
COUNT
COUNT_NONZERO
COUNT_ZERO
SUMOFSQUARES
// Group 1 (blue)
struct Aggregator {
// May define init_value, init_unaryop, or neither
// If none are defined, then an O(1) dense vector with irrelevant data can be used in a matrix-vector multiply.
TYPE *init value, // Optional; e.g., the values of an O(1) dense vector.
UnaryOp *init_unaryop, // Optional; e.g., is applied to the input matrix at the beginning.
BinaryOp *update_binaryop, // Optional; e.g., the BinaryOp in a matrix-vector multiply.
// If not provided, then `first` may be assumed such as `plus_first(A @ dense)`
Monoid merge_monoid // Required; e.g., the Monoid in a matrix-vector multiply.
// If only this is provided, then it is equivalent to existing reduce with Monoid.
};
Group 2
HYPOT
LOGADDEXP
LOGADDEXP2
// Group 2 (green)
// Add finalize.
struct Aggregator {
TYPE *init value,
UnaryOp *init_unaryop,
BinaryOp *update_binaryop,
Monoid merge_monoid,
UnaryOp *finalize // Optional; is applied to the output at the very end.
};
Group 3
MEAN
VARP
VARS
STDEVP
STDEVS
GEOMETRIC_MEAN
HARMONIC_MEAN
PTP
SKEW
(not included)
KURTOSIS
(not included)
// Group 3 (yello)
// Same fields as group 2, but requires tuple types or composite aggregations.
Group 4
// Group 4 (orange)
// Add init_indexunaryop.
struct Aggregator {
// May define init_value, init_unaryop, init_indexunaryop and init_value, or none of these.
TYPE *init value,
UnaryOp *init_unaryop,
IndexUnaryOp *init_indexunaryop, // Optional; for indices; uses `init_value` as thunk value.
BinaryOp *update_binaryop,
Monoid merge_monoid,
UnaryOp *finalize
};
Group 5
FIRST
LAST
FIRSTI
LASTI
FIRST_NONZERO
IS_MONOTONIC_DECREASING
// Group 5 (red)
// Add merge_binaryop and terminate_unaryop, and don't use update_binaryop.
// Exactly one of merge_binaryop or merge_monoid must be defined.
struct Aggregator {
TYPE *init value,
UnaryOp *init_unaryop,
IndexUnaryOp *init_indexunaryop,
BinaryOp *update_binaryop,
BinaryOp *merge_binaryop, // elements must be processed in order,
// but can still be decomposed into groups and done in parallel
Monoid *merge_monoid, // this is now optional
UnaryOp *terminate_unaryop, // call on each aggregation value to check whether we can terminate
UnaryOp *finalize
};
Group 6
LASTI
(alternative)
IS_MONOTONIC_DECREASING
(alternative)
// Group 6 (brown)
// Add update_indexunaryop, and allow use of update_binaryop.
struct Aggregator {
// init options
TYPE *init value,
UnaryOp *init_unaryop,
IndexUnaryOp *init_indexunaryop,
// update options
BinaryOp *update_binaryop,
IndexUnaryOp* update_indexunaryop, // use previous agg value as thunk value
// merge options
BinaryOp *merge_binaryop,
Monoid *merge_monoid,
// final options
UnaryOp *terminate_unaryop,
UnaryOp *finalize
};
In group 5, we used init_unaryop
or init_indexunaryop
with merge_binaryop
, but didn't define any update options. In this case, the init operation is applied to every element, and the elements are merged together (while maintaining order) with merge_binaryop
.
In group 6, we use an init operation, an update operation, and merge_binaryop
. This means if we split the elements being aggregated into sublists (for parallel processing), then the init operation is only called on the first element in each sublist, and the update operation is used for all the rest.
More behaviors and fields could be added. For example, if an update operation is specified without specifying a merge operation, then this could mean the elements must be processed in order, serially (i.e., no within-group parallelism!). For another example, terminate_value
could be added to serve the same purpose as terminate_unaryop
.
Known shortcomings:
- Parametrized aggregations require user-defined functions with the parameters "baked in".
- Examples:
FREQUENCY(val)
MOMENT(n)
NTH(n)
POWERMEAN(p)
STDEV(ddof)
VAR(ddof)
ZSCORE(val)
PERCENT_GT(val)
, etc
MEAN_ABSOLUTE_DIFFERENCE(x)
(abs(x1 - x) + abs(x2 - x) + ...)
- It's unclear how an aggregation on a non-empty set can return
NO_VALUE
.
- For example,
ONLY
could return the first element if and only if there is one element.
- Also, should sample variance (
VARS
) return inf, nan, or NO_VALUE if a single value in group? (I say nan
)
- Aggregations such as
MIN_BY
are challenging.
BY
could be a UnaryOp or a Vector or Matrix the same shape as the input.
- Examples:
- Return the complex number with the smallest imaginary component.
ARGMAX
(probably in some way)
- Aggregations such as weighted average aren't directly supported.
- Aggregations or scans with non-scalar results (such as histograms or cumsum) aren't directly supported.
- The following reductions aren't supported (yet):
MEDIAN
MODE
PERCENTILES
MAD
or MEAN_ABSOLUTE_DEVIATION
(around mean, median, mode, etc)
NUNIQUE
or COUNT_UNIQUE
ARGMIN
and ARGMAX
don't naturally generalize to Matrix to Scalar reduction.
- I think the natural generalization would be to return both I and J indices of the min or max value.
- User-defined aggregators could become complicated and error-prone.
- It's not obvious to me where to draw the line for how to create and support user-defined aggregators.
- Since groups 5 and 6 don't use monoids, these are most likely in the "not yet" category for SuiteSparse.
I'll be honest, the thing I like most about this proposal is COUNT
. I really want a canonical, portable way to calculate row degrees and column degrees ๐ .
Please let me know if anything needs more explaining (or corrected). I intend this issue to be as much for the GraphBLAS committee and community as for SuiteSparse.
Cheers! ๐ป