LibShalom

Contact: Weiling Yang ([email protected])

LibShalom is a library for small and irregular-shaped matrix multiplications (GEMMs) on ARMv8-based processors. It improves GEMM performance for these shapes by addressing three shortcomings of existing BLAS libraries: packing that accounts for a large portion of the runtime, inefficient edge-case processing, and parallelization methods that are poorly suited to such shapes.

This work is still being optimized, which will take some time. Packing inside the micro-kernel is key to the performance improvement, and the same technique can also be applied to large-scale GEMM. So far this project is only partially open source. If you encounter any problem with this program, please contact me.
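To make the packing cost concrete, here is a minimal sketch of the conventional packing step that BLAS libraries perform before calling their micro-kernel. It is not LibShalom's code: `pack_block` and its parameters are hypothetical names chosen for illustration. The point is that the copy touches every element of the block without doing any arithmetic, so for a small GEMM it is a noticeable share of the total work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of LibShalom: copy an mc x kc block of a
 * row-major matrix A (leading dimension lda) into a contiguous buffer. */
void pack_block(const float *A, size_t lda, size_t mc, size_t kc, float *packed)
{
    assert(lda >= kc);  /* the block must fit inside one row of A */
    for (size_t i = 0; i < mc; i++)
        for (size_t p = 0; p < kc; p++)
            packed[i * kc + p] = A[i * lda + p];
}
```

Packing an M x K matrix this way costs M*K loads and stores, while the multiplication itself performs 2*M*N*K floating-point operations, so the relative overhead grows as N shrinks.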

Paper information

Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang: LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores. SC 2021

Software dependencies

Hardware platform

Phytium 2000+, Kunpeng 920, ThunderX2, or other ARMv8-based processors

Compile and install

$ cd NN_LIB && make
$ make install PREFIX=<installation path>

These commands copy the LibShalom library and headers to the installation path PREFIX.

Compiling with LibShalom

All LibShalom definitions and prototypes may be included in your C source file by including a single header file, LibShalom.h:

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

API

LibShalom_sgemm(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of small SGEMM
LibShalom_sgemm_mp(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of irregular-shaped SGEMM
LibShalom_dgemm(int transa, int transb, double *C, double *A, double *B, long M, long N, long K) // Interface of small DGEMM
LibShalom_set_thread_nums(int num) // Set the total number of threads

Running Benchmark

The command

$ cd benchmark/small_SGEMM && make  

compiles the fp32 small-GEMM benchmark and generates the executable main. Running main reports the evaluation results for matrix sizes from 8x8x8 to 128x128x128.
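The README does not state which metric main reports. GEMM benchmarks commonly report GFLOPS, computed as 2*M*N*K floating-point operations per call divided by the elapsed time. The sketch below shows that calculation under this assumption; `now_seconds` and `gemm_gflops` are hypothetical helper names, not part of LibShalom.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic POSIX clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* GFLOPS achieved by `reps` GEMM calls of size M x N x K that took
 * `seconds` in total. Each call counts as 2*M*N*K operations. */
double gemm_gflops(long M, long N, long K, int reps, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)M * (double)N * (double)K * (double)reps / seconds * 1e-9;
}
```

To time your own runs, read `now_seconds()` before and after the loop of GEMM calls and pass the difference, together with the loop count, to `gemm_gflops`.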

Getting Started

The following C code focuses on a single functionality and can be considered a Hello LibShalom program.

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

int main()
{
    int i, j;
    int loop = 100;
    long M, N, K;
    M = N = K = 80;

    /* row-major */
    float *A = (float *) malloc(K * M * sizeof(float));
    float *B = (float *) malloc(K * N * sizeof(float));
    float *C = (float *) malloc(M * N * sizeof(float));
    if (A == NULL || B == NULL || C == NULL)
    {
        fprintf(stderr, "allocation failed\n");
        free(A);
        free(B);
        free(C);
        return 1;
    }

    /* initialize input matrices A and B with values in [-1, 1) */
    for (i = 0; i < M; i++)
    {
        for (j = 0; j < K; j++)
            A[i * K + j] = 2.0 * (float)drand48() - 1.0;
    }

    for (i = 0; i < K; i++)
    {
        /* the row stride matches the inner loop bound N,
           so exactly K * N elements are written */
        for (j = 0; j < N; j++)
            B[i * N + j] = 2.0 * (float)drand48() - 1.0;
    }

    /* warm up: perform C = A * B (B is transposed) */
    for (i = 0; i < 5; i++)
        LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

    /* repeated runs, e.g. for timing */
    for (i = 0; i < loop; i++)
        LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

    free(A);
    free(B);
    free(C);
    return 0;
}

The makefile corresponding to this program:

LibShalom_PREFIX = <path where LibShalom is installed>
LibShalom_INC    = $(LibShalom_PREFIX)/SMM/include
LibShalom_LIB    = $(LibShalom_PREFIX)/SMM/lib/libsmm.a 

OTHER_LIBS  =-fopenmp

CC          = g++
CFLAGS      = -O3 -I$(LibShalom_INC)
LINKER      = $(CC)

OBJS        = Hello.o

%.o: %.c
	 $(CC) $(CFLAGS) -c -fopenmp $< -o $@

all: $(OBJS)
	$(LINKER) $(OBJS) $(LibShalom_LIB) $(OTHER_LIBS) -o a.out

Note

The matrices are stored in the row-major format in this library. We will keep this library updated and maintained.
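As an illustration of the row-major layout, the sketch below is a naive reference multiplication that can serve as a correctness check for results you obtain. It is a hypothetical helper, not a LibShalom function. It assumes contiguous storage, computes plain C = A * B with A of size M x K and B of size K x N, and does not model the transa/transb arguments, whose storage conventions this README does not spell out.

```c
#include <assert.h>

/* Hypothetical reference, not part of LibShalom: C = A * B with
 * A (M x K), B (K x N), C (M x N), all row-major and contiguous.
 * Element (i, j) of a matrix with `cols` columns is at index i * cols + j. */
void ref_sgemm_rowmajor(float *C, const float *A, const float *B,
                        long M, long N, long K)
{
    assert(M >= 0 && N >= 0 && K >= 0);
    for (long i = 0; i < M; i++)
        for (long j = 0; j < N; j++) {
            float acc = 0.0f;
            for (long p = 0; p < K; p++)
                acc += A[i * K + p] * B[p * N + j];
            C[i * N + j] = acc;
        }
}

/* Largest absolute element-wise difference between two arrays of length n. */
float max_abs_diff(const float *X, const float *Y, long n)
{
    float worst = 0.0f;
    for (long i = 0; i < n; i++) {
        float d = X[i] - Y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

When comparing against an optimized library, allow a small tolerance in `max_abs_diff` rather than requiring an exact match, since a different summation order changes the floating-point rounding.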
