owensgroup / merge-spmm



Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018

License: Apache License 2.0

Languages: C++ 68.55%, Cuda 23.09%, Shell 4.56%, Makefile 2.05%, CMake 1.75%

merge-spmm's People

ozturkosu blackjack2015 zhichengzhang1995 yangwang92 yukewang96 isratnisa shi27feng vmiheer xzr-create dataintensive-hpc microzhy dwwcqu siaaa-3

merge-spmm's Issues

Bugs in reading and processing symmetric .mtx data

Hi all, I found a bug in reading symmetric mtx data.
For example, when I tried to run

./gspmm --debug=true --max_ncols=4 ./4_4coo_dense.mtx

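to read a 4x4 dense matrix from 4_4coo_dense.mtx, the program loaded a broken matrix from the symmetric file: it reports 13 nonzeros instead of 16, and the diagonal entries (2,2), (3,3) and (4,4) are missing, as the pretty print below shows.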

Wrong results

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 13
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:2 [6]:3 [7]:0 [8]:1 [9]:3 [10]:0 [11]:1 [12]:2 [13]:0 [14]:4113 [15]:0 [16]:0 [17]:0 [18]:0 [19]:0 [20]:0 [21]:0 [22]:0 [23]:0 [24]:0 [25]:0 [26]:0 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:0 [33]:0 [34]:0 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:7 [3]:10 [4]:13 [5]:14 [6]:81 [7]:0 [8]:0 [9]:0 [10]:0 [11]:0 [12]:1 [13]:1 [14]:1 [15]:2 [16]:2 [17]:2 [18]:3 [19]:3 [20]:3 [21]:35143 [22]:3 [23]:3 [24]:35143 [25]:-1456 [26]:81 [27]:0 [28]:0 [29]:1 [30]:2 [31]:3 [32]:0 [33]:2 [34]:3 [35]:0 [36]:1 [37]:3 [38]:0 [39]:1
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:9.10844e-44 [15]:0 [16]:1.51901e-38 [17]:0 [18]:1.49695e-38 [19]:0 [20]:2.69808e-38 [21]:0 [22]:0 [23]:1.44118e+17 [24]:1.05553e+14 [25]:4.58715e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:1.23145e+14 [33]:4.58715e-41 [34]:2.93874e-38 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
pretty print:
x x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.036416 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
x x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 13.

Correct results (same matrix stored in general format)

%%MatrixMarket matrix coordinate real general
%
4 4 16
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
3 1 1
3 2 1
3 3 1
3 4 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:35143 [17]:96 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:81 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:0 [9]:0 [10]:4482352 [11]:0 [12]:-1672595478 [13]:7953 [14]:33 [15]:0 [16]:42126000 [17]:0 [18]:42126096 [19]:0 [20]:32 [21]:0 [22]:49 [23]:0 [24]:0 [25]:0 [26]:34833056 [27]:0 [28]:42189248 [29]:0 [30]:1638970554 [31]:1868983913 [32]:203358240 [33]:7967 [34]:49 [35]:0 [36]:0 [37]:0 [38]:41772304 [39]:0
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:1.13505e-43 [19]:0 [20]:0 [21]:0 [22]:1 [23]:1 [24]:1 [25]:1 [26]:1 [27]:1 [28]:1 [29]:1 [30]:1 [31]:1 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2.95797e+17 [37]:4.58631e-41 [38]:2.70451e-43 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.033792 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.
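For comparison, a symmetric MatrixMarket file stores only one triangle, so a reader has to mirror every off-diagonal entry (i, j) to (j, i) while keeping each diagonal entry exactly once. The following is a minimal C++ sketch of that expansion followed by a COO-to-CSR conversion, assuming the entries have already been parsed and converted to 0-based (row, col, value) triplets. Entry, Csr, ExpandSymmetric and CooToCsr are hypothetical names for illustration, not functions from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// One coordinate entry with 0-based indices (.mtx files are 1-based).
struct Entry {
  int row;
  int col;
  float val;
};

struct Csr {
  int nrows = 0;
  std::vector<int> row_ptr;  // size nrows + 1
  std::vector<int> col_ind;  // size nnz
  std::vector<float> val;    // size nnz
};

// Expands the stored triangle of a symmetric matrix: every off-diagonal
// entry (i, j) is also emitted as (j, i); diagonal entries are emitted once.
std::vector<Entry> ExpandSymmetric(const std::vector<Entry>& stored) {
  std::vector<Entry> full;
  full.reserve(2 * stored.size());
  for (const Entry& e : stored) {
    full.push_back(e);
    if (e.row != e.col) full.push_back({e.col, e.row, e.val});
  }
  return full;
}

// Builds CSR from COO: sort by (row, col), count entries per row,
// then prefix-sum the counts into row offsets.
Csr CooToCsr(int nrows, std::vector<Entry> coo) {
  std::sort(coo.begin(), coo.end(), [](const Entry& a, const Entry& b) {
    return std::tie(a.row, a.col) < std::tie(b.row, b.col);
  });
  Csr csr;
  csr.nrows = nrows;
  csr.row_ptr.assign(nrows + 1, 0);
  for (const Entry& e : coo) {
    assert(e.row >= 0 && e.row < nrows);  // indices must already be 0-based
    csr.row_ptr[e.row + 1]++;
    csr.col_ind.push_back(e.col);
    csr.val.push_back(e.val);
  }
  for (int r = 0; r < nrows; ++r) csr.row_ptr[r + 1] += csr.row_ptr[r];
  return csr;
}
```

Applied to the 10 lower-triangle entries of 4_4coo_dense.mtx, ExpandSymmetric yields 16 entries and CooToCsr produces row_ptr = {0, 4, 8, 12, 16}, the same structure as the general-format run.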

Algorithm implementation

Hi @ctcyang,
Hope you are doing well.
I'm wondering which part of the code implements the PartitionSpmm function mentioned in your paper.

Thanks in advance!

Hello, I couldn't compile moderngpu

Hello,

I tried to install merge-spmm following the instructions in the README.

However, installing moderngpu failed, as shown below.

I am not sure, but a password seems to be required to install moderngpu.

Could you tell me how I can resolve this issue?

Thank you!

Regards,
Changwan Hong

[hongc@cerberus mgpu]$ git clone --recursive https://github.com/owensgroup/GraphBLAS.git
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/.git/
remote: Enumerating objects: 230, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (163/163), done.
remote: Total 4005 (delta 135), reused 141 (delta 66), pack-reused 3775
Receiving objects: 100% (4005/4005), 5.13 MiB, done.
Resolving deltas: 100% (2690/2690), done.
Submodule 'ext/cub' (https://[email protected]/NVlabs/cub.git) registered for path 'ext/cub'
Submodule 'ext/moderngpu' (https://[email protected]/ctcyang/moderngpu.git) registered for path 'ext/moderngpu'
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/ext/cub/.git/
Password:
remote: Enumerating objects: 32642, done.
remote: Total 32642 (delta 0), reused 0 (delta 0), pack-reused 32642
Receiving objects: 100% (32642/32642), 16.49 MiB | 24.46 MiB/s, done.
Resolving deltas: 100% (28621/28621), done.
Password:
Submodule path 'ext/cub': checked out 'd622848f9fb62f13e5e064e1deb43b6bcbb12bad'
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/ext/moderngpu/.git/
Password:
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 1664 (delta 13), reused 15 (delta 6), pack-reused 1639
Receiving objects: 100% (1664/1664), 6.81 MiB, done.
Resolving deltas: 100% (952/952), done.
Password:
Submodule path 'ext/moderngpu': checked out '9e491c383e935c2cbc0279350640dad3febb8b9d'
[hongc@cerberus mgpu]$ git checkout europar
fatal: Not a git repository (or any of the parent directories): .git

Sparse matrix reader with incorrect nnz

Dear authors,

After successfully compiling the code on a Titan X GPU, I ran some simple benchmarks. However, the output log suggests that the sparse matrix was read incorrectly. For example, I ran the following command on the c-42 sparse matrix:
$./bin/gbspmm --iter=100 --tb=32 --nt=128 --max_ncols=10471 c-42.mtx
The output gives:
c-42.mtx, 10471, 10471, **99815**, 0, 4013, 0, 940, 10, 1000, 332, 1208, 128, 128, 0, 36, 712, 448, 40, 256, 9, 128, 0, 20, 64, 0, 264, 192, 8, 0, 128, 0, 8, 80, 40, 0, 279, 0, 9.53252, 41.339, 4.33663, 23.2004, merge path, 9.50961, 219.812, 92.3251,
The program read the matrix as 10471x10471 with 99815 nonzero elements, while the original file lists 110285. Is there any particular processing applied when reading the file? Thanks!
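One hedged way to narrow down where entries disappear is to tabulate, from the raw coordinate list, the quantities a reader might legitimately change: the stored count, diagonal entries, repeated coordinates, explicit zeros, and the size after symmetric expansion. The C++ sketch below assumes entries have already been parsed into 0-based triplets; Triplet, NnzReport and Diagnose are hypothetical names, not part of the repository, and nothing here establishes which of these effects (if any) applies to c-42.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// One coordinate entry as read from a .mtx file (0-based indices).
struct Triplet {
  int row;
  int col;
  double val;
};

// Counts that commonly explain why a reader's nnz differs from the header.
struct NnzReport {
  std::size_t stored = 0;          // entries listed in the file
  std::size_t diagonal = 0;        // entries with row == col
  std::size_t duplicates = 0;      // repeated (row, col) coordinates
  std::size_t explicit_zeros = 0;  // entries whose value is exactly 0
  std::size_t expanded = 0;        // nnz after symmetric expansion, if any
};

NnzReport Diagnose(const std::vector<Triplet>& raw, bool symmetric) {
  NnzReport r;
  std::set<std::pair<int, int>> seen;
  for (const Triplet& t : raw) {
    assert(t.row >= 0 && t.col >= 0);  // expects 0-based indices
    ++r.stored;
    if (t.row == t.col) ++r.diagonal;
    if (t.val == 0.0) ++r.explicit_zeros;
    // insert() reports false when this coordinate was already seen.
    if (!seen.insert({t.row, t.col}).second) ++r.duplicates;
  }
  // A symmetric file stores one triangle; mirroring every off-diagonal
  // entry gives 2 * stored - diagonal entries in the full matrix.
  r.expanded = symmetric ? 2 * r.stored - r.diagonal : r.stored;
  return r;
}
```

Comparing these counts with the 99815 printed by gbspmm would indicate whether the reader drops zeros, merges duplicates, or expands symmetry differently than expected.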

Where is the code for the paper "Engineering a High-Performance GPU B-Tree"?

Question about the merge-path SpMM implementation

Hello, I am reading your paper and code, and I have some questions about merge-path SpMM.
(1) My understanding is that limit[i] means partition i starts at row limit[i] of A and ends at row limit[i+1]. But how do I find the element it should start from? I can't understand what end and row_ind mean in your algorithm.
(2) What is the granularity of merge-path SpMM? In row-split, a warp is responsible for one row of A. In your paper you say a thread is responsible for T units of work. Does that mean T nonzeros of A?
(3) How are the partial sums finally reduced, given that each thread's T units of work might come from different rows of A?
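For reference, here is a sequential C++ sketch of merge-path SpMM in the style of Merrill and Garland's merge-based SpMV, extended to a row-major dense right-hand side with ncols columns. It illustrates the general technique, not the repository's kernel (which builds on moderngpu's segmented reduction and may partition work differently); each iteration of the loop over t stands in for one GPU thread, and CsrMatrix, MergePathSearch and MergePathSpmm are hypothetical names. In this formulation a unit of work is one step along the merge path, i.e. either one nonzero of A or one row boundary, so T items can mix both. A thread's starting (row, nonzero) pair is found by binary search along its diagonal, and partial sums for rows that straddle thread boundaries are carried out and added in a final fix-up pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CSR matrix A (nrows x k); B is row-major dense (k x ncols); C = A * B.
struct CsrMatrix {
  int nrows;
  std::vector<int> row_ptr;  // size nrows + 1
  std::vector<int> col_ind;  // size nnz
  std::vector<float> val;    // size nnz
};

// Finds where diagonal `diag` crosses the merge path of list A = row ends
// (row_ptr[1..nrows]) and list B = nonzero indices 0..nnz-1.
// On return, `row` row ends and `nz` nonzeros precede the split point.
static void MergePathSearch(int diag, const int* row_end, int nrows, int nnz,
                            int& row, int& nz) {
  int lo = std::max(diag - nnz, 0);
  int hi = std::min(diag, nrows);
  while (lo < hi) {
    int pivot = (lo + hi) / 2;
    // B[diag - pivot - 1] is simply the nonzero index diag - pivot - 1.
    if (row_end[pivot] <= diag - pivot - 1)
      lo = pivot + 1;
    else
      hi = pivot;
  }
  row = lo;
  nz = diag - lo;
}

std::vector<float> MergePathSpmm(const CsrMatrix& A, const std::vector<float>& B,
                                 int ncols, int num_threads) {
  assert(num_threads > 0);
  const int nnz = static_cast<int>(A.col_ind.size());
  const int* row_end = A.row_ptr.data() + 1;
  const int path_len = A.nrows + nnz;
  const int items = (path_len + num_threads - 1) / num_threads;  // T per thread
  std::vector<float> C(static_cast<std::size_t>(A.nrows) * ncols, 0.0f);
  std::vector<int> carry_row(num_threads);
  std::vector<std::vector<float>> carry_val(num_threads);

  for (int t = 0; t < num_threads; ++t) {  // each iteration models one thread
    int d0 = std::min(t * items, path_len);
    int d1 = std::min(d0 + items, path_len);
    int row, nz, row_stop, nz_stop;
    MergePathSearch(d0, row_end, A.nrows, nnz, row, nz);
    MergePathSearch(d1, row_end, A.nrows, nnz, row_stop, nz_stop);
    std::vector<float> sum(ncols, 0.0f);
    // Rows whose end falls inside this thread's range: finish and write them.
    for (; row < row_stop; ++row) {
      for (; nz < row_end[row]; ++nz)
        for (int j = 0; j < ncols; ++j)
          sum[j] += A.val[nz] * B[A.col_ind[nz] * ncols + j];
      for (int j = 0; j < ncols; ++j) C[row * ncols + j] = sum[j];
      std::fill(sum.begin(), sum.end(), 0.0f);
    }
    // Nonzeros of a row that continues past this range: carry them out.
    for (; nz < nz_stop; ++nz)
      for (int j = 0; j < ncols; ++j)
        sum[j] += A.val[nz] * B[A.col_ind[nz] * ncols + j];
    carry_row[t] = row_stop;
    carry_val[t] = sum;
  }
  // Fix-up: add each carried partial sum into the row it belongs to.
  for (int t = 0; t < num_threads; ++t)
    if (carry_row[t] < A.nrows)
      for (int j = 0; j < ncols; ++j) C[carry_row[t] * ncols + j] += carry_val[t][j];
  return C;
}
```

Because every thread processes the same number of merge-path steps, a long row is split across threads instead of being serialized on one, which is the load-balancing property that motivates merge-path over row-split. On a GPU, the final fix-up loop would typically run as a separate pass or a segmented reduction over the carry-outs.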

May I know where I can download the datasets that were used in your paper?

I can download the "large" dataset using the script merge-spmm/dataset/europar/large/DownloadFigure6.sh, but the highd and lowd folders have no such download script. Could you provide some help? Thanks!

No ext/moderngpu found

Dear authors, it seems that the ext/moderngpu package is not included in the repository. When I compiled the code, I got:
"Cannot find source file: /ext/moderngpu/src/mgpucontext.cu".

Could you tell me where to pull the complete code from? Thanks!

Questions about merge-path algo!

After porting merge-spmm from CUDA 8.0 to CUDA 11+ and compiling it successfully, I got fatal memory access errors when running gspmm --mode=mergepath XXX.mtx.
The following is partial output from the compute-sanitizer tool:

========= Invalid __shared__ read of size 4 bytes
=========     at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
=========     by thread (0,0,0) in block (2,0,0)
=========     Address 0xf58 is out of bounds
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]

The source changes I made include updating the cuSPARSE API calls, replacing the warp shuffle functions __shfl* with __shfl_sync*, and replacing __any with __any_sync.

Any suggestions would be appreciated. Thanks!

Error while compiling gpu_spmv

gpu_spmv.cu calls some deprecated functions (cusparseScsrmv, cusparseDcsrmv), and I am using CUDA 11.2. If I replace these calls with the latest equivalents available in cuSPARSE, will this file and the rest of the code compile cleanly?
