merge-spmm's People

Contributors

ctcyang, vmiheer

merge-spmm's Issues

Bugs in reading and processing symmetric .mtx data

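Hi all, I found a bug in reading symmetric .mtx data.
For example, I ran

./gspmm --debug=true --max_ncols=4 ./4_4coo_dense.mtx

to read a 4x4 dense matrix from 4_4coo_dense.mtx.

It loads a broken matrix from the symmetric .mtx file: the full matrix has 16 nonzeros, but only 13 are loaded, and the diagonal entries of rows 2 to 4 are missing (they appear as 0 in the pretty print). The check at the end still reports 0 errors because it only covers the 13 loaded entries.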

Wrong results (symmetric file)

%%MatrixMarket matrix coordinate real symmetric
%
4 4 10
1 1 1
2 1 1
2 2 1
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real symmetric
4 4 13
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:2 [6]:3 [7]:0 [8]:1 [9]:3 [10]:0 [11]:1 [12]:2 [13]:0 [14]:4113 [15]:0 [16]:0 [17]:0 [18]:0 [19]:0 [20]:0 [21]:0 [22]:0 [23]:0 [24]:0 [25]:0 [26]:0 [27]:0 [28]:0 [29]:0 [30]:0 [31]:0 [32]:0 [33]:0 [34]:0 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:7 [3]:10 [4]:13 [5]:14 [6]:81 [7]:0 [8]:0 [9]:0 [10]:0 [11]:0 [12]:1 [13]:1 [14]:1 [15]:2 [16]:2 [17]:2 [18]:3 [19]:3 [20]:3 [21]:35143 [22]:3 [23]:3 [24]:35143 [25]:-1456 [26]:81 [27]:0 [28]:0 [29]:1 [30]:2 [31]:3 [32]:0 [33]:2 [34]:3 [35]:0 [36]:1 [37]:3 [38]:0 [39]:1
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:9.10844e-44 [15]:0 [16]:1.51901e-38 [17]:0 [18]:1.49695e-38 [19]:0 [20]:2.69808e-38 [21]:0 [22]:0 [23]:1.44118e+17 [24]:1.05553e+14 [25]:4.58715e-41 [26]:2.93874e-39 [27]:0 [28]:0 [29]:0 [30]:2.03188e-43 [31]:0 [32]:1.23145e+14 [33]:4.58715e-41 [34]:2.93874e-38 [35]:0 [36]:0 [37]:0 [38]:0 [39]:0
pretty print:
x x x x
x 0 x x
x x 0 x
x x x 0
mxm: 0.036416 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:0 [6]:1 [7]:1 [8]:1 [9]:1 [10]:0 [11]:1 [12]:1 [13]:1 [14]:1 [15]:0
x x x x
x 0 x x
x x 0 x
x x x 0
There were 0 errors out of 13.

Correct results (same matrix stored as general)

%%MatrixMarket matrix coordinate real general
%
4 4 16
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 1
2 3 1
2 4 1
3 1 1
3 2 1
3 3 1
3 4 1
4 1 1
4 2 1
4 3 1
4 4 1

ta: 32
tb: 32
nt: 128
row: 1
debug: 1
%%MatrixMarket matrix coordinate real general
4 4 16
csrColInd:
[0]:0 [1]:1 [2]:2 [3]:3 [4]:0 [5]:1 [6]:2 [7]:3 [8]:0 [9]:1 [10]:2 [11]:3 [12]:0 [13]:1 [14]:2 [15]:3 [16]:35143 [17]:96 [18]:81 [19]:0 [20]:1065353216 [21]:1065353216 [22]:1065353216 [23]:1065353216 [24]:1065353216 [25]:1065353216 [26]:1065353216 [27]:1065353216 [28]:1065353216 [29]:1065353216 [30]:1065353216 [31]:1065353216 [32]:1065353216 [33]:1065353216 [34]:1065353216 [35]:1065353216 [36]:4 [37]:1065353216 [38]:81 [39]:0
csrRowPtr:
[0]:0 [1]:4 [2]:8 [3]:12 [4]:16 [5]:0 [6]:33 [7]:0 [8]:0 [9]:0 [10]:4482352 [11]:0 [12]:-1672595478 [13]:7953 [14]:33 [15]:0 [16]:42126000 [17]:0 [18]:42126096 [19]:0 [20]:32 [21]:0 [22]:49 [23]:0 [24]:0 [25]:0 [26]:34833056 [27]:0 [28]:42189248 [29]:0 [30]:1638970554 [31]:1868983913 [32]:203358240 [33]:7967 [34]:49 [35]:0 [36]:0 [37]:0 [38]:41772304 [39]:0
csrVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1 [16]:5.60519e-45 [17]:1 [18]:1.13505e-43 [19]:0 [20]:0 [21]:0 [22]:1 [23]:1 [24]:1 [25]:1 [26]:1 [27]:1 [28]:1 [29]:1 [30]:1 [31]:1 [32]:1 [33]:1 [34]:1 [35]:1 [36]:2.95797e+17 [37]:4.58631e-41 [38]:2.70451e-43 [39]:0
pretty print:
x x x x
x x x x
x x x x
x x x x
mxm: 0.033792 ms
denseVal:
[0]:1 [1]:1 [2]:1 [3]:1 [4]:1 [5]:1 [6]:1 [7]:1 [8]:1 [9]:1 [10]:1 [11]:1 [12]:1 [13]:1 [14]:1 [15]:1
x x x x
x x x x
x x x x
x x x x
There were 0 errors out of 16.
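For comparison, here is a sketch of the expansion a symmetric reader is expected to perform. A Matrix Market file with the "symmetric" qualifier stores only the entries on or below the diagonal, so each off-diagonal entry (i, j) must also be emitted as (j, i), while each diagonal entry is emitted exactly once. The Entry struct and the helpers expand_symmetric and to_csr are hypothetical names for illustration; this is not merge-spmm's reader.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One stored entry of a Matrix Market coordinate file.
struct Entry {
  int row, col;
  float val;
};

// Hypothetical helper: expand the stored lower triangle of a "symmetric"
// file (1-based indices) into the full matrix (0-based indices).
// Off-diagonal (i, j) also stands for (j, i); diagonal entries appear once.
std::vector<Entry> expand_symmetric(const std::vector<Entry>& stored) {
  std::vector<Entry> full;
  full.reserve(2 * stored.size());
  for (const Entry& e : stored) {
    full.push_back({e.row - 1, e.col - 1, e.val});
    if (e.row != e.col) full.push_back({e.col - 1, e.row - 1, e.val});
  }
  return full;
}

// Hypothetical helper: build CSR arrays from 0-based entries of an m-row matrix.
void to_csr(int m, std::vector<Entry> entries, std::vector<int>& row_ptr,
            std::vector<int>& col_ind, std::vector<float>& val) {
  // Sort by row, then column, so each row's entries are contiguous.
  std::sort(entries.begin(), entries.end(), [](const Entry& a, const Entry& b) {
    return a.row != b.row ? a.row < b.row : a.col < b.col;
  });
  row_ptr.assign(m + 1, 0);
  for (const Entry& e : entries) {
    assert(e.row >= 0 && e.row < m);
    ++row_ptr[e.row + 1];  // count entries per row
  }
  for (int r = 0; r < m; ++r) row_ptr[r + 1] += row_ptr[r];  // prefix sum
  col_ind.clear();
  val.clear();
  for (const Entry& e : entries) {  // already in row-major order
    col_ind.push_back(e.col);
    val.push_back(e.val);
  }
}
```

Fed the 10 stored entries of 4_4coo_dense.mtx, expand_symmetric returns 16 entries and to_csr produces row_ptr = {0, 4, 8, 12, 16}, which matches the general-format run above.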

Algorithm implementation

Hi @ctcyang,
Hope you are doing well.
I'm wondering which part of the code implements the PartitionSpmm function described in your paper.

Thanks in advance!
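For context on what a partition step of this kind computes, here is a sketch of the merge-path diagonal search from Merrill and Garland's merge-based CSR SpMV. It merges the row end offsets (row_ptr[1..m]) with the nonzero indices 0..nnz-1 and gives each thread an equal share of the m + nnz merge items. This is a generic illustration, not code from merge-spmm, and it is not confirmed to be what the paper's PartitionSpmm does; merge_path_search and merge_path_partitions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Merge-path search. List a = row end offsets row_ptr[1..m];
// list b = nonzero indices 0..nnz-1. For merge "diagonal" d, return the split
// (i, j) with i + j = d: i rows finished, j nonzeros consumed.
std::pair<int, int> merge_path_search(int diagonal, const std::vector<int>& row_ptr) {
  assert(!row_ptr.empty());
  const int m = static_cast<int>(row_ptr.size()) - 1;  // number of rows
  const int nnz = row_ptr[m];                          // number of nonzeros
  int lo = std::max(diagonal - nnz, 0);
  int hi = std::min(diagonal, m);
  while (lo < hi) {
    const int pivot = (lo + hi) / 2;
    // Compare a[pivot] = row_ptr[pivot + 1] with b[diagonal - pivot - 1].
    if (row_ptr[pivot + 1] <= diagonal - pivot - 1)
      lo = pivot + 1;  // split lies further along the row-end list
    else
      hi = pivot;
  }
  return {lo, diagonal - lo};
}

// Start coordinates for each of `threads` threads, plus the final end point;
// each thread covers an equal share of the m + nnz merge items.
std::vector<std::pair<int, int>> merge_path_partitions(const std::vector<int>& row_ptr,
                                                       int threads) {
  assert(threads > 0 && !row_ptr.empty());
  const int total = static_cast<int>(row_ptr.size()) - 1 + row_ptr.back();
  const int per_thread = (total + threads - 1) / threads;
  std::vector<std::pair<int, int>> starts;
  for (int t = 0; t <= threads; ++t)
    starts.push_back(merge_path_search(std::min(t * per_thread, total), row_ptr));
  return starts;
}
```

For row_ptr = {0, 2, 2, 5} (3 rows, one of them empty, 5 nonzeros) and 2 threads, the 8 merge items split at (0, 0), (2, 2) and (3, 5): the second thread starts with 2 rows finished and 2 nonzeros consumed.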

Hello, I couldn't compile moderngpu

Hello,

I tried to install merge-spmm following the instructions in the README.

However, installing moderngpu failed, as shown below.

I am not sure, but a password seems to be required to install moderngpu.

Could you tell me how I can resolve this issue?

Thank you!

Regards,
Changwan Hong


[hongc@cerberus mgpu]$ git clone --recursive https://github.com/owensgroup/GraphBLAS.git
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/.git/
remote: Enumerating objects: 230, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (163/163), done.
remote: Total 4005 (delta 135), reused 141 (delta 66), pack-reused 3775
Receiving objects: 100% (4005/4005), 5.13 MiB, done.
Resolving deltas: 100% (2690/2690), done.
Submodule 'ext/cub' (https://[email protected]/NVlabs/cub.git) registered for path 'ext/cub'
Submodule 'ext/moderngpu' (https://[email protected]/ctcyang/moderngpu.git) registered for path 'ext/moderngpu'
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/ext/cub/.git/
Password:
remote: Enumerating objects: 32642, done.
remote: Total 32642 (delta 0), reused 0 (delta 0), pack-reused 32642
Receiving objects: 100% (32642/32642), 16.49 MiB | 24.46 MiB/s, done.
Resolving deltas: 100% (28621/28621), done.
Password:
Submodule path 'ext/cub': checked out 'd622848f9fb62f13e5e064e1deb43b6bcbb12bad'
Initialized empty Git repository in /home/hongc/carl_spmm/mgpu/GraphBLAS/ext/moderngpu/.git/
Password:
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 1664 (delta 13), reused 15 (delta 6), pack-reused 1639
Receiving objects: 100% (1664/1664), 6.81 MiB, done.
Resolving deltas: 100% (952/952), done.
Password:
Submodule path 'ext/moderngpu': checked out '9e491c383e935c2cbc0279350640dad3febb8b9d'
[hongc@cerberus mgpu]$ git checkout europar
fatal: Not a git repository (or any of the parent directories): .git

Sparse matrix reader with incorrect nnz

Dear authors,

After successfully compiling the code on a Titan X GPU, I ran some simple benchmarks. However, the output log suggests that the sparse matrix was read incorrectly. For example, I ran the following command on the sparse matrix c-42:
$./bin/gbspmm --iter=100 --tb=32 --nt=128 --max_ncols=10471 c-42.mtx
The output gives:
c-42.mtx, 10471, 10471, **99815**, 0, 4013, 0, 940, 10, 1000, 332, 1208, 128, 128, 0, 36, 712, 448, 40, 256, 9, 128, 0, 20, 64, 0, 264, 192, 8, 0, 128, 0, 8, 80, 40, 0, 279, 0, 9.53252, 41.339, 4.33663, 23.2004, merge path, 9.50961, 219.812, 92.3251,
The program read the matrix as 10471x10471 with 99815 nonzero elements, while the original file says 110285. Is there some particular processing step in the file reader? Thanks!
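A quick arithmetic check on the reported counts, offered as an observation rather than a diagnosis: the shortfall is exactly one less than the matrix dimension, the same pattern as the 4x4 symmetric example in the first issue (16 expected, 13 loaded).

$$110285 - 99815 = 10470 = 10471 - 1, \qquad 16 - 13 = 3 = 4 - 1$$

A reader that keeps only the first diagonal entry of an n x n symmetric matrix with a fully populated diagonal would lose exactly n - 1 entries, so both numbers are consistent with the same bug. This reading assumes that c-42.mtx is stored as symmetric and that its diagonal has no zeros; neither is stated in the report.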

Question about the merge-path SpMM implementation

Hello, I am reading your paper and code, and I have some questions about merge-path SpMM.
(1) My understanding is that limit[i] means partition i should start at row limit[i] of A and end at row limit[i+1]. But how do I find the element it should start from? I can't understand what end and row_ind mean in your algorithm.
(2) What is the granularity of merge-path SpMM? In row-split, a warp is responsible for one row of A. In your paper you say a thread is responsible for T units of work. Does T units of work mean T nonzeros of A?
(3) How are the partial sums finally reduced, given that each thread is responsible for T units of work that might come from different rows of A?
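One way to see how questions (1) to (3) can fit together is a sequential simulation of nonzero-balanced partitioning with a carry-out fix-up. In this sketch, the hypothetical spmm_nnz_split gives each "thread" T consecutive nonzeros (so here T counts nonzeros), finds its first row with a binary search over row_ptr, writes rows that end inside its range directly, and hands the partial sum of a row that crosses its right boundary to a separate fix-up pass. The compute-sanitizer trace in a later issue shows the mergepath mode going through moderngpu's BinarySearch and segmented-reduce helpers, which is consistent with this kind of split, but whether limit, end and row_ind map onto these steps exactly is an assumption. On a GPU the fix-up could instead use atomicAdd or a segmented reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CSR matrix with m rows: row_ptr has m + 1 entries, col_ind/val have nnz.
struct Csr {
  std::vector<int> row_ptr;
  std::vector<int> col_ind;
  std::vector<float> val;
};

// Sketch: C = A * B with B, C dense row-major (n columns). Each iteration of
// the outer loop plays the role of one GPU thread owning T nonzeros of A.
std::vector<float> spmm_nnz_split(const Csr& A, const std::vector<float>& B,
                                  int n, int T) {
  assert(T > 0 && n > 0);
  const int m = static_cast<int>(A.row_ptr.size()) - 1;
  const int nnz = A.row_ptr[m];
  const int parts = (nnz + T - 1) / T;
  std::vector<float> C(static_cast<std::size_t>(m) * n, 0.0f);
  // Partial sums of the row a thread was still in when its nonzeros ran out.
  std::vector<int> carry_row(parts, -1);
  std::vector<float> carry_val(static_cast<std::size_t>(parts) * n, 0.0f);

  for (int p = 0; p < parts; ++p) {
    const int begin = p * T;
    const int end = std::min(begin + T, nnz);
    // Binary search: the starting row is the last r with row_ptr[r] <= begin.
    int row = static_cast<int>(std::upper_bound(A.row_ptr.begin(), A.row_ptr.end(), begin) -
                               A.row_ptr.begin()) - 1;
    std::vector<float> acc(n, 0.0f);
    // Write this thread's sum for row r directly into C, then reset it.
    auto store = [&](int r) {
      for (int c = 0; c < n; ++c) {
        C[static_cast<std::size_t>(r) * n + c] = acc[c];
        acc[c] = 0.0f;
      }
    };
    for (int k = begin; k < end; ++k) {
      while (k >= A.row_ptr[row + 1]) store(row++);  // a row (or empty row) ended
      for (int c = 0; c < n; ++c)
        acc[c] += A.val[k] * B[static_cast<std::size_t>(A.col_ind[k]) * n + c];
    }
    if (A.row_ptr[row + 1] == end) {
      store(row);  // the last row also ends inside this thread
    } else {
      carry_row[p] = row;  // the row continues in the next thread
      for (int c = 0; c < n; ++c) carry_val[static_cast<std::size_t>(p) * n + c] = acc[c];
    }
  }
  // Fix-up pass: add each carried partial sum into its row of C.
  for (int p = 0; p < parts; ++p)
    if (carry_row[p] >= 0)
      for (int c = 0; c < n; ++c)
        C[static_cast<std::size_t>(carry_row[p]) * n + c] +=
            carry_val[static_cast<std::size_t>(p) * n + c];
  return C;
}
```

With T = 2 on a 4-row matrix whose rows hold 3, 0, 1 and 3 nonzeros, rows 0 and 3 each straddle a thread boundary, so their results come from one direct write plus one carried partial sum.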

No ext/moderngpu found

Dear authors, it seems that the ext/moderngpu package is not included in the repository. When I compiled the code, I got:
"Cannot find source file: /ext/moderngpu/src/mgpucontext.cu".

Could you tell me where to pull the complete code from? Thanks!

Questions about merge-path algo!

After porting merge-spmm from CUDA 8.0 to CUDA 11+ and compiling it successfully, I got fatal memory access errors when running gspmm --mode=mergepath XXX.mtx.
Below is part of the output from compute-sanitizer:

========= Invalid __shared__ read of size 4 bytes
=========     at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
=========     by thread (0,0,0) in block (2,0,0)
=========     Address 0xf58 is out of bounds
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]

The changes I made to the source code include updating the cuSPARSE API calls, replacing the warp shuffle functions __shfl* with __shfl_sync*, and replacing __any with __any_sync.

Any suggestions would be appreciated. Thanks for your help!

Error while compiling gpu_spmv

gpu_spmv.cu calls some deprecated functions (cusparseScsrmv, cusparseDcsrmv). I am using CUDA 11.2. If these calls are replaced with the latest equivalents available in cuSPARSE, will this file and the rest of the code compile cleanly?
