Contact: Weiling Yang
LibShalom is a library for small and irregular-shaped matrix multiplications on ARMv8-based processors. It speeds up small and irregular-shaped GEMMs by addressing three shortcomings of existing BLAS libraries: packing accounts for a large portion of the runtime, edge cases are processed inefficiently, and the parallelization strategies are poorly suited to these matrix shapes.
This work is still being optimized. Performing packing inside the micro-kernel is key to improving performance, and the same technique can also be applied to large-scale GEMM. So far, the project is only partially open source. If you encounter any problems with this program, please contact me.
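The README does not document how LibShalom fuses packing into its micro-kernels. The following is only a minimal portable C sketch of the general idea, not LibShalom's implementation: while computing the first MR x NR block of each column panel of C, the kernel copies the B values it reads into a contiguous buffer, and the remaining row blocks of that panel read from the buffer instead. MR, NR, micro_kernel and gemm_pack_in_kernel are hypothetical names; the sketch assumes row-major, non-transposed operands with M and N divisible by MR and NR.

```c
#include <stdlib.h>

/* Illustrative sketch of packing fused into a micro-kernel.
 * NOT LibShalom's implementation; block sizes are hypothetical. */
#define MR 4 /* rows of C computed per micro-kernel call */
#define NR 4 /* columns of C computed per micro-kernel call */

/* Computes one MR x NR block of C = A * B (row-major, no transposes).
 * If pack_out is non-NULL, the K x NR slice of B is read from its original
 * strided layout and copied into pack_out as it is consumed.
 * Otherwise the slice is read contiguously from the packed panel bp. */
static void micro_kernel(long K, const float *a, long lda, const float *b,
                         long ldb, float *pack_out, const float *bp,
                         float *c, long ldc)
{
    float acc[MR][NR] = {{0.0f}};
    for (long k = 0; k < K; k++) {
        float bk[NR];
        for (int j = 0; j < NR; j++) {
            if (pack_out) {
                bk[j] = b[k * ldb + j];       /* strided read from B */
                pack_out[k * NR + j] = bk[j]; /* pack while computing */
            } else {
                bk[j] = bp[k * NR + j];       /* contiguous read */
            }
        }
        for (int i = 0; i < MR; i++)
            for (int j = 0; j < NR; j++)
                acc[i][j] += a[i * lda + k] * bk[j];
    }
    for (int i = 0; i < MR; i++)
        for (int j = 0; j < NR; j++)
            c[i * ldc + j] = acc[i][j];
}

/* C (M x N) = A (M x K) * B (K x N), all row-major.
 * Assumes M % MR == 0 and N % NR == 0 to keep the sketch short.
 * Returns 0 on success, -1 if the panel buffer cannot be allocated. */
int gemm_pack_in_kernel(long M, long N, long K,
                        const float *A, const float *B, float *C)
{
    float *panel = (float *)malloc((size_t)K * NR * sizeof(float));
    if (!panel)
        return -1;
    for (long j = 0; j < N; j += NR) {
        for (long i = 0; i < M; i += MR) {
            /* The first row block packs this B panel; later ones reuse it. */
            micro_kernel(K, &A[i * K], K, &B[j], N,
                         i == 0 ? panel : NULL, panel, &C[i * N + j], N);
        }
    }
    free(panel);
    return 0;
}
```

Because the copy happens during computation the kernel performs anyway, there is no separate packing pass over B, which is the overhead identified above as a large share of the runtime for small matrices. A production kernel would also vectorize the inner loops and handle edge blocks.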
Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang: LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores. SC 2021
Software requirements:
- GNU Compiler Collection (GCC) 8.2 or later
- OpenMP
Target platforms: Phytium 2000+, Kunpeng 920, ThunderX2, or other ARMv8-based processors
To build and install LibShalom:
$ cd NN_LIB && make
$ make install PREFIX=/path/to/install
These commands build the library and copy it, together with its headers, into the installation path given by PREFIX.
All LibShalom definitions and prototypes are available to your C source file through a single header file, LibShalom.h:
#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"
The library provides the following interfaces:
// Interface of small SGEMM
LibShalom_sgemm(int transa, int transb, float *C, float *A, float *B, long M, long N, long K)
// Interface of irregular-shaped SGEMM
LibShalom_sgemm_mp(int transa, int transb, float *C, float *A, float *B, long M, long N, long K)
// Interface of small DGEMM
LibShalom_dgemm(int transa, int transb, double *C, double *A, double *B, long M, long N, long K)
// Set the total number of threads
LibShalom_set_thread_nums(int num)
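The README does not spell out the exact semantics of these arguments. As a hedged aid for checking results, here is a naive reference routine with the same argument order as LibShalom_sgemm. The name ref_sgemm is hypothetical, and the routine assumes row-major storage (as this README states), that a nonzero transa/transb means the operand is stored transposed, and that C is overwritten with op(A) * op(B), where op(A) is M x K and op(B) is K x N. Call it with 0/1 rather than NoTrans/Trans, whose actual values are defined in LibShalom.h.

```c
/* Hypothetical reference routine mirroring the argument order of
 * LibShalom_sgemm. Assumptions (not confirmed by LibShalom's docs):
 *   - all matrices are row-major;
 *   - transa/transb are nonzero when the operand is stored transposed;
 *   - C is overwritten with op(A) * op(B), op(A) is M x K, op(B) is K x N. */
void ref_sgemm(int transa, int transb, float *C, const float *A,
               const float *B, long M, long N, long K)
{
    for (long i = 0; i < M; i++) {
        for (long j = 0; j < N; j++) {
            float sum = 0.0f;
            for (long k = 0; k < K; k++) {
                /* A is M x K (row stride K), or K x M (row stride M) if transposed */
                float a = transa ? A[k * M + i] : A[i * K + k];
                /* B is K x N (row stride N), or N x K (row stride K) if transposed */
                float b = transb ? B[j * K + k] : B[k * N + j];
                sum += a * b;
            }
            C[i * N + j] = sum;
        }
    }
}
```

Optimized kernels may accumulate in a different order, so compare LibShalom's output with such a reference using a small relative tolerance rather than exact equality.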
The command
$ cd benchmark/small_SGEMM && make
compiles the FP32 small-GEMM benchmark and generates the executable main. Running main reports evaluation results for matrices ranging in size from 8x8x8 to 128x128x128.
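The README does not state which metric main reports. GEMM throughput is commonly given in GFLOPS, counting 2*M*N*K floating-point operations (one multiply and one add per term of each dot product). The helpers below are a hypothetical sketch of that calculation plus a warm-up-then-repeat timing loop like the one in the Hello example; gemm_fn, time_per_call, now_seconds and gemm_gflops are illustrative names, not part of LibShalom.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <time.h>

/* Hypothetical wrapper type: a function that runs one GEMM of the given
 * size on preallocated matrices (for example by calling LibShalom_sgemm). */
typedef void (*gemm_fn)(long M, long N, long K);

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per call after `warmup` untimed calls; loops must be > 0. */
double time_per_call(gemm_fn fn, long M, long N, long K, int warmup, int loops)
{
    for (int i = 0; i < warmup; i++)
        fn(M, N, K);
    double t0 = now_seconds();
    for (int i = 0; i < loops; i++)
        fn(M, N, K);
    return (now_seconds() - t0) / loops;
}

/* GEMM performs 2*M*N*K floating-point operations. */
double gemm_gflops(long M, long N, long K, double seconds)
{
    return 2.0 * (double)M * (double)N * (double)K / seconds / 1e9;
}
```

For example, a 128x128x128 multiplication taking 1 ms per call corresponds to 2 * 128^3 / 1e-3 s, or about 4.19 GFLOPS.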
The following C program focuses on a single routine, LibShalom_sgemm, and can be regarded as a "Hello LibShalom" example.
#include <stdio.h>
#include <stdlib.h>   /* malloc, free, drand48 */
#include "LibShalom.h"

int main()
{
    int i, j;
    int loop = 100;
    long M, N, K;
    M = N = K = 80;

    /* row-major storage: A is M x K; B is passed transposed, so it is stored as N x K */
    float *A = (float *) malloc(M * K * sizeof(float));
    float *B = (float *) malloc(N * K * sizeof(float));
    float *C = (float *) malloc(M * N * sizeof(float));
    if (A == NULL || B == NULL || C == NULL) {
        fprintf(stderr, "out of memory\n");
        free(A);
        free(B);
        free(C);
        return 1;
    }

    /* initialize input matrices A and B with random values in [-1, 1) */
    for (i = 0; i < M; i++)
        for (j = 0; j < K; j++)
            A[i * K + j] = 2.0f * (float) drand48() - 1.0f;
    for (i = 0; i < N; i++)
        for (j = 0; j < K; j++)
            B[i * K + j] = 2.0f * (float) drand48() - 1.0f;

    /* warm up: perform C = A * B, with B passed transposed */
    for (i = 0; i < 5; i++)
        LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

    /* repeat the same multiplication loop times */
    for (i = 0; i < loop; i++)
        LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

    free(A);
    free(B);
    free(C);
    return 0;
}
A Makefile for this program, assuming the source file is named Hello.c:
# Set this to the PREFIX used with "make install"
LibShalom_PREFIX = /path/to/install
LibShalom_INC = $(LibShalom_PREFIX)/SMM/include
LibShalom_LIB = $(LibShalom_PREFIX)/SMM/lib/libsmm.a
OTHER_LIBS = -fopenmp
CC = g++
CFLAGS = -O3 -I$(LibShalom_INC)
LINKER = $(CC)
OBJS = Hello.o

# Recipe lines must start with a tab character
%.o: %.c
	$(CC) $(CFLAGS) -c -fopenmp $< -o $@

all: $(OBJS)
	$(LINKER) $(OBJS) $(LibShalom_LIB) $(OTHER_LIBS) -o a.out
All matrices in this library are stored in row-major order. We will continue to update and maintain this library.
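Data produced by column-major code (the Fortran and reference-BLAS convention) must be adapted before use. Reading a column-major M x N buffer as row-major yields its N x M transpose, so one option for input operands is to pass the corresponding trans flag, assuming LibShalom's trans semantics match that interpretation; the other is an explicit copy, sketched below with the hypothetical helper colmajor_to_rowmajor.

```c
/* Hypothetical helper: copies an M x N matrix from column-major storage
 * (element (i, j) at src[j * M + i]) into row-major storage
 * (element (i, j) at dst[i * N + j]), the layout LibShalom uses. */
void colmajor_to_rowmajor(long M, long N, const float *src, float *dst)
{
    for (long i = 0; i < M; i++)
        for (long j = 0; j < N; j++)
            dst[i * N + j] = src[j * M + i];
}
```

The copy costs O(M*N) extra memory traffic, which is small relative to the O(M*N*K) multiplication only when K is not tiny; for very small K the trans-flag route avoids it.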