weifengliu-ssslab / benchmark_spmv_using_csr5
CSR5-based SpMV on CPUs, GPUs and Xeon Phi
License: MIT License
Hi,
I am trying to run CSR5 on a Tesla K80 machine, using CUDA 8.0 with the compiler flag compute_60. It appears to run, but it does not pass the test. Could you please have a look at this?
--------------../../../dataset/bcsstk33.mtx--------------
( 8738, 8738 ) nnz = 591904
cpu sequential time = 1.32787 ms. Bandwidth = 5.40169 GB/s. GFlops = 0.891508 GFlops.
Device [0] Tesla K80, @ 823.5MHz.
omega = 32, sigma = 32.
CSR->CSR5 malloc time = 0.566336 ms.
CSR->CSR5 tile_ptr time = 0.024704 ms.
CSR->CSR5 tile_desc time = 0.043552 ms.
CSR->CSR5 transpose time = 0.019552 ms.
CSR->CSR5 time = 0.851136 ms.
CSR5-based SpMV time = 0.00701181 ms. Bandwidth = 1022.95 GB/s. GFlops = 168.831 GFlops.
Check... NO PASS! #Error = 8738 out of 8738 entries.
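One thing that may be worth checking here: the Tesla K80 is a compute capability 3.7 device, while compute_60 targets Pascal-class GPUs. Code generated only for compute_60 cannot run on a 3.7 device, which would be consistent with every entry failing the check and with a reported bandwidth far above what a K80 can deliver. The sketch below only shows how the matching flag could be assembled from a device's compute capability; `gencode_flag` is a hypothetical helper written for illustration and is not part of the repository or of nvcc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the CSR5 repository): builds an nvcc
// -gencode flag from a compute capability, e.g. major = 3, minor = 7.
std::string gencode_flag(int major, int minor) {
    // Compute capabilities use a positive major and a single-digit minor.
    assert(major > 0 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc;
}
```

For the K80, `gencode_flag(3, 7)` gives a flag in the same form as the nvcc command lines quoted in the other reports, with 37 in place of 52.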
Here is one more issue when building CSR5 with CUDA:
nvcc -O3 -w -m64 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_52,code=compute_52 main.cu -o spmv -I/usr/local/cuda-6.5/include -I/home/oberhuber/NVIDIA_CUDA-10.1_Samples/common/inc -L/usr/local/cuda-6.5/lib64 -lcudart -D VALUE_TYPE=float -D NUM_RUN=1000
detail/cuda/utils_cuda.h(71): error: function "__shfl_down(double, unsigned int, int)" has already been defined
detail/cuda/utils_cuda.h(80): error: function "__shfl_up(double, unsigned int, int)" has already been defined
detail/cuda/utils_cuda.h(89): error: function "__shfl_xor(double, int, int)" has already been defined
detail/cuda/utils_cuda.h(340): error: function "atomicAdd(double *, double)" has already been defined
4 errors detected in the compilation of "/tmp/tmpxft_0036955f_00000000-4_main.cpp4.ii".
Tomas O.
Dear developers,
I am having problems building the CUDA CSR5 benchmark in the CSR5_cuda subfolder. There seems to be a problem with atomicAdd for double on architecture 5.2.
nvcc -O3 -w -m64 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_52,code=compute_52 main.cu -o spmv -I/usr/local/cuda-6.5/include -I/home/oberhuber/NVIDIA_CUDA-10.1_Samples/common/inc -L/usr/local/cuda-6.5/lib64 -lcudart -D VALUE_TYPE=double -D NUM_RUN=1000
detail/cuda/csr5_spmv_cuda.h(352): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (double *, double)
detected during:
instantiation of "void spmv_csr5_calibrate_kernel(const uiT *, const vT *, vT *, iT) [with iT=int, uiT=unsigned int, vT=double]"
Best regards, Tomas.
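For context, a native atomicAdd on double is only available on devices of compute capability 6.0 and higher; on older devices such as 5.2 the usual workaround is a compare-and-swap loop on the 64-bit pattern of the value. Below is a host-side C++ analog of that loop, where `std::atomic` stands in for the device's atomicCAS. It is only an illustration of the technique: `atomic_add_double`, `to_bits` and `from_bits` are hypothetical names, and this is not the code in utils_cuda.h.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern and back (hypothetical helpers).
std::uint64_t to_bits(double v) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "double must be 64-bit");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    return bits;
}

double from_bits(std::uint64_t bits) {
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// Adds `value` to the double stored in `cell` and returns the previous value.
// The loop retries until no other thread changed the cell between our read
// and our write, which is the same idea as an atomicCAS loop on the GPU.
double atomic_add_double(std::atomic<std::uint64_t>& cell, double value) {
    std::uint64_t old_bits = cell.load();
    for (;;) {
        const double old_val = from_bits(old_bits);
        const std::uint64_t new_bits = to_bits(old_val + value);
        // On failure, compare_exchange_weak refreshes old_bits with the
        // current contents of the cell, so the next iteration starts fresh.
        if (cell.compare_exchange_weak(old_bits, new_bits)) {
            return old_val;
        }
    }
}
```

In a CUDA build, a user-defined version of this kind is typically compiled only for architectures below 6.0, so that it does not clash with the built-in overload on newer ones.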
Hi, thanks for your awesome work. I am very interested in the CSR5 format.
However, when I tested the code (AVX2 version) in this repository, the program reported that the result is incorrect. I tested it with webbase-1M.mtx and some other matrices downloaded from the SuiteSparse Matrix Collection. This is the output:
------------------------------------------------------
PRECISION = 64-bit Double Precision
------------------------------------------------------
--------------../webbase-1M/webbase-1M.mtx--------------
( 1000005, 1000005 ) nnz = 3105536
cpu sequential time = 10.758 ms. Bandwidth = 6.8889 GB/s. GFlops = 0.577344 GFlops.
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 4.077000 ms
CSR->CSR5 tile_ptr time = 29.580000 ms
CSR->CSR5 tile_desc time = 9.116000 ms
CSR->CSR5 transpose time = 5.595000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.144000 ms
CSR->CSR5 tile_ptr time = 1.335000 ms
CSR->CSR5 tile_desc time = 4.970000 ms
CSR->CSR5 transpose time = 3.740000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.126000 ms
CSR->CSR5 tile_ptr time = 1.243000 ms
CSR->CSR5 tile_desc time = 4.799000 ms
CSR->CSR5 transpose time = 3.132000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.122000 ms
CSR->CSR5 tile_ptr time = 1.229000 ms
CSR->CSR5 tile_desc time = 4.324000 ms
CSR->CSR5 transpose time = 3.812000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.109000 ms
CSR->CSR5 tile_ptr time = 1.020000 ms
CSR->CSR5 tile_desc time = 4.126000 ms
CSR->CSR5 transpose time = 2.930000 ms
omega = 4, sigma = 16. #partition = 48524
CSR->CSR5 malloc time = 0.107000 ms
CSR->CSR5 tile_ptr time = 0.926000 ms
CSR->CSR5 tile_desc time = 3.673000 ms
CSR->CSR5 transpose time = 3.280000 ms
CSR->CSR5 time = 8.053 ms.
CSR5-based SpMV time = 1.08296 ms. Bandwidth = 68.4337 GB/s. GFlops = 5.73529 GFlops.
Check... NO PASS! #Error = 3701 out of 1000005 entries.
------------------------------------------------------
The platform where I ran the code is a CentOS 7 server with an Intel Xeon E5-2680 v4, and the program was built with icc (ICC) 19.1.2.254 20200623 (shipped with Intel System Studio 2020 Update 2). I am wondering whether this is caused by the compiler.
Hope to get some hints.
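I do not know exactly how the benchmark's check compares the two result vectors. Since a vectorized, multi-threaded SpMV adds up each row in a different order than the sequential reference, the two results can legitimately differ in the last few bits. A tolerance-based comparison, as in the sketch below, might help separate rounding differences from entries that are genuinely wrong. The function name and the choice of a relative tolerance are assumptions made for illustration, not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts entries of `out` that differ from `ref` by more than a relative
// tolerance. Hypothetical helper, not the check used by the benchmark.
std::size_t count_mismatches(const std::vector<double>& ref,
                             const std::vector<double>& out,
                             double rel_tol) {
    assert(ref.size() == out.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double scale = std::max(std::abs(ref[i]), std::abs(out[i]));
        const double diff = std::abs(ref[i] - out[i]);
        // Written as !(a <= b) so that NaN results are counted as errors too.
        if (!(diff <= rel_tol * scale)) {
            ++errors;
        }
    }
    return errors;
}
```

If the 3701 reported errors drop to zero with a tolerance around 1e-10, the difference is probably rounding; if they remain, the compiler or the kernel itself becomes the more likely suspect.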
Hi,
I'm quite interested in testing your CSR5 format, but unfortunately I do not have an AVX2 CPU.
Do you possibly have an AVX version of csr5_spmv_avx2.h that you could send me?
Best regards,
Andreas
Should I generate an .mtx file myself if I want to try your work?
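The logs in the other reports use .mtx files downloaded from the SuiteSparse Matrix Collection, so downloading one is probably the simplest route. If you do want to write your own, the Matrix Market coordinate format is plain text: a banner line, a size line with rows, columns and number of nonzeros, then one line per entry with 1-based row and column indices. A minimal sketch for a real, general (non-symmetric) matrix follows; `Entry` and `to_matrix_market` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One nonzero with 0-based indices (hypothetical type for this example).
struct Entry {
    int row;
    int col;
    double value;
};

// Renders the entries as Matrix Market "coordinate real general" text.
std::string to_matrix_market(int rows, int cols,
                             const std::vector<Entry>& entries) {
    std::ostringstream out;
    out.precision(17);  // enough digits to round-trip a double
    out << "%%MatrixMarket matrix coordinate real general\n";
    out << rows << ' ' << cols << ' ' << entries.size() << '\n';
    for (const Entry& e : entries) {
        assert(e.row >= 0 && e.row < rows && e.col >= 0 && e.col < cols);
        // Matrix Market indices are 1-based, so shift by one when writing.
        out << (e.row + 1) << ' ' << (e.col + 1) << ' ' << e.value << '\n';
    }
    return out.str();
}
```

Writing the returned string to a file such as `test.mtx` should give something the benchmark can read, assuming it accepts general real matrices in coordinate form.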