owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
License: Apache License 2.0
Hi all, I found a bug in how symmetric .mtx files are read.
For example, when I run
./gspmm --debug=true --max_ncols=4 ./4_4coo_dense.mtx
to read a 4x4 dense matrix from 4_4coo_dense.mtx,
the program loads a broken matrix from the symmetric .mtx file.
Wrong results (symmetric file):
%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1
ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 13
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:2 [6]:3 [7]:0 [8]:1 [9]:3 [10]:0 [11]:1 [12]:2 [13]:0 [14]:4113 [15]:0 [16]:0 [17]:0 [18]:0 [19]:0 [20]:0 [21]:0 [22]:0 [23]:0 [24]:0 [25]:0 [26]:0 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:0 [33]:0 [34]:0 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:7 [3]:10 [4]:13 [5]:14 [6]:81 [7]:0 [8]:0 [9]:0 [10]:0 [11]:0 [12]:1 [13]:1 [14]:1 [15]:2 [16]:2 [17]:2 [18]:3 [19]:3 [20]:3 [21]:35143 [22]:3 [23]:3 [24]:35143 [25]:-1456 [26]:81 [27]:0 [28]:0 [29]:1 [30]:2 [31]:3 [32]:0 [33]:2 [34]:3 [35]:0 [36]:1 [37]:3 [38]:0 [39]:1
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:9.10844e-44 [15]:0 [16]:1.51901e-38 [17]:0 [18]:1.49695e-38 [19]:0 [20]:2.69808e-38 [21]:0 [22]:0 [23]:1.44118e+17 [24]:1.05553e+14 [25]:4.58715e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:1.23145e+14 [33]:4.58715e-41 [34]:2.93874e-38 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
pretty print:
x x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.036416 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
x x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 13.
Correct results (general file):
%%MatrixMarket matrix coordinate real general
%
4 4 16
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
3 1 1
3 2 1
3 3 1
3 4 1
4 1 1
4 2 1
4 3 1
4 4 1
ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:35143 [17]:96 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:81 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:0 [9]:0 [10]:4482352 [11]:0 [12]:-1672595478 [13]:7953 [14]:33 [15]:0 [16]:42126000 [17]:0 [18]:42126096 [19]:0 [20]:32 [21]:0 [22]:49 [23]:0 [24]:0 [25]:0 [26]:34833056 [27]:0 [28]:42189248 [29]:0 [30]:1638970554 [31]:1868983913 [32]:203358240 [33]:7967 [34]:49 [35]:0 [36]:0 [37]:0 [38]:41772304 [39]:0
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:1.13505e-43 [19]:0 [20]:0 [21]:0 [22]:1 [23]:1 [24]:1 [25]:1 [26]:1 [27]:1 [28]:1 [29]:1 [30]:1 [31]:1 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2.95797e+17 [37]:4.58631e-41 [38]:2.70451e-43 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.033792 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.
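For comparison, here is a minimal sketch of how a reader could expand a symmetric coordinate file into a full CSR matrix. It is not taken from the merge-spmm source; the function name read_mtx_to_csr and the constant SYMMETRIC_4X4 are made up for illustration, and only the "general" and "symmetric" qualifiers are handled. In the debug output above, the symmetric file yields 13 nonzeros and the pretty print is missing three diagonal entries, while the general file yields 16. A reader that mirrors every off-diagonal entry and keeps each diagonal entry exactly once should give 16 in both cases.

```python
# Hypothetical helper, not part of merge-spmm: parse Matrix Market
# coordinate text and return a full (expanded) CSR representation.
SYMMETRIC_4X4 = """%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1
"""


def read_mtx_to_csr(text):
    lines = [ln.strip() for ln in text.strip().splitlines()]
    banner = lines[0].lower().split()
    if len(banner) < 5 or banner[0] != "%%matrixmarket" or banner[2] != "coordinate":
        raise ValueError("expected a MatrixMarket coordinate banner")
    symmetry = banner[4]
    if symmetry not in ("general", "symmetric"):
        raise ValueError("this sketch only handles general and symmetric files")

    # Drop comment lines and blank lines; the first remaining line is the size line.
    body = [ln for ln in lines[1:] if ln and not ln.startswith("%")]
    nrows, ncols, nstored = (int(tok) for tok in body[0].split())

    triples = []
    for ln in body[1:1 + nstored]:
        parts = ln.split()
        r, c = int(parts[0]) - 1, int(parts[1]) - 1   # file indices are 1-based
        v = float(parts[2]) if len(parts) > 2 else 1.0  # "pattern" files carry no value
        triples.append((r, c, v))
        # Mirror off-diagonal entries only; diagonal entries are kept once.
        if symmetry == "symmetric" and r != c:
            triples.append((c, r, v))

    # Sort by (row, column) and build the row pointer with a prefix sum.
    triples.sort(key=lambda t: (t[0], t[1]))
    row_ptr = [0] * (nrows + 1)
    for r, _, _ in triples:
        row_ptr[r + 1] += 1
    for i in range(nrows):
        row_ptr[i + 1] += row_ptr[i]
    col_ind = [c for _, c, _ in triples]
    vals = [v for _, _, v in triples]
    return nrows, ncols, row_ptr, col_ind, vals


if __name__ == "__main__":
    nrows, ncols, row_ptr, col_ind, vals = read_mtx_to_csr(SYMMETRIC_4X4)
    print("nnz:", row_ptr[-1])
    print("csrRowPtr:", row_ptr)
    print("csrColInd:", col_ind)
```

Run on the symmetric 4x4 file from this report, the sketch produces the same csrRowPtr and csrColInd prefix as the "general" run above (row pointer 0, 4, 8, 12, 16), which is a quick way to check a fix to the actual reader.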
Hi @ctcyang,
Hope you are doing well.
I'm wondering which part of the code implements the PartitionSpmm function mentioned in your paper.
Thanks in advance!
Hello,
I tried to install merge-spmm following the instructions in the README.
However, I failed to install Modern GPU, as follows.
I am not sure, but a password seems to be required to install Modern GPU.
Could you tell me how I can resolve this issue?
Thank you!
Regards,
Changwan Hong
Dear authors,
I ran some simple benchmarks after successfully compiling the code on a Titan X GPU. From the output log, however, it seems that the sparse matrix was read incorrectly. For example, I ran the following command with the sparse matrix c-42:
$./bin/gbspmm --iter=100 --tb=32 --nt=128 --max_ncols=10471 c-42.mtx
The output gives:
c-42.mtx, 10471, 10471, **99815**, 0, 4013, 0, 940, 10, 1000, 332, 1208, 128, 128, 0, 36, 712, 448, 40, 256, 9, 128, 0, 20, 64, 0, 264, 192, 8, 0, 128, 0, 8, 80, 40, 0, 279, 0, 9.53252, 41.339, 4.33663, 23.2004, merge path, 9.50961, 219.812, 92.3251,
The program read the matrix as 10471x10471 with 99815 nonzero elements, while the original file says 110285. Does the file reader apply any special processing? Thanks!
Where is the code for the paper "Engineering a High-Performance GPU B-Tree"?
Hello, I am reading your paper and code, and I have some questions about the merge-path SpMM.
(1) In my understanding, limit[i] means that partition i should start at row limit[i] of A and end at row limit[i+1]. But how do I find which element it should start from? I can't understand what end and row_ind mean in your algorithm.
(2) What is the granularity of merge-path SpMM? In row-split, a warp is responsible for one row of A. In your paper you say a thread is responsible for T work items. Does T work items mean T nonzeros of A?
(3) How are the partial sums finally reduced, given that each thread is responsible for T work items that might come from different rows of A?
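To make the three questions concrete, here is a sequential sketch of one common way to split SpMM work evenly over nonzeros. It is an assumption about the general technique, not the repository's kernel or the paper's exact algorithm: the name nonzero_split_spmm and the parameter work_per_part (playing the role of T) are invented for illustration. Each partition takes a fixed number of consecutive nonzeros of A, finds its starting row with a binary search on the CSR row pointer, and keeps the partial sum for its last row as a carry-out that is added in a second pass.

```python
from bisect import bisect_right


def nonzero_split_spmm(row_ptr, col_ind, vals, B, work_per_part):
    """Sequential illustration of a nonzero-split SpMM, C = A * B.

    A is given in CSR form; B is a dense matrix stored as a list of rows.
    Hypothetical sketch: names and structure are not from merge-spmm.
    """
    nrows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    ncols_b = len(B[0]) if B else 0
    C = [[0.0] * ncols_b for _ in range(nrows)]
    carry_out = []  # one (row, partial sum) pair per partition

    # Phase 1: every partition works on its own slice of nonzeros.
    for start in range(0, nnz, work_per_part):
        end = min(start + work_per_part, nnz)
        # Starting row: the last row r with row_ptr[r] <= start.
        row = bisect_right(row_ptr, start) - 1
        partial = [0.0] * ncols_b
        for k in range(start, end):
            # Nonzero k lies past the current row: flush that row and advance.
            # Empty rows are flushed with an all-zero partial sum.
            while k >= row_ptr[row + 1]:
                for j in range(ncols_b):
                    C[row][j] += partial[j]
                partial = [0.0] * ncols_b
                row += 1
            b_row = B[col_ind[k]]
            for j in range(ncols_b):
                partial[j] += vals[k] * b_row[j]
        # The last row may continue in the next partition, so defer it.
        carry_out.append((row, partial))

    # Phase 2: fix-up pass that adds the deferred partial sums.
    for row, partial in carry_out:
        for j in range(ncols_b):
            C[row][j] += partial[j]
    return C


if __name__ == "__main__":
    # 4x5 matrix with an empty row (row 1) and a long row (row 2).
    row_ptr = [0, 2, 2, 7, 8]
    col_ind = [0, 3, 0, 1, 2, 3, 4, 2]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
    print(nonzero_split_spmm(row_ptr, col_ind, vals, B, work_per_part=3))
```

In this sketch the answer to (2) would be that a work item is one nonzero of A, and the answer to (3) is the carry-out pass. On a GPU the partitions in phase 1 would run in parallel and phase 2 would typically be a separate small reduction, but how merge-spmm actually does this should be confirmed against the code.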
I can download the "large" dataset using the script merge-spmm/dataset/europar/large/DownloadFigure6.sh, but the highd and lowd folders do not include such download addresses. Could you provide some help? Thanks!
Dear authors, it seems that the ext/moderngpu package was not included in the repository. I compiled the code, but it reported:
"Cannot find source file: /ext/moderngpu/src/mgpucontext.cu".
Could you point me to the right location to pull the complete code? Thanks!
When I ported merge-spmm from CUDA 8.0 to CUDA 11+ and compiled it successfully, I got fatal memory access errors when running gspmm --mode=mergepath XXX.mtx.
The following is part of the compute-sanitizer output:
========= Invalid __shared__ read of size 4 bytes
========= at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
========= by thread (0,0,0) in block (2,0,0)
========= Address 0xf58 is out of bounds
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
========= Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
========= Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]
The source code I modified includes the cuSPARSE API calls and the warp-level functions: __shfl* was changed to __shfl_sync*, and __any to __any_sync.
If you have any suggestions, thanks for your help!
Some deprecated functions (cusparseScsrmv, cusparseDcsrmv) are called in gpu_spmv.cu. I am using CUDA 11.2. If these calls are replaced with the latest equivalents available in cuSPARSE, will this part and the rest of the code compile cleanly?