maaars / cudpp
Automatically exported from code.google.com/p/cudpp
License: Other
Purpose of code changes on this branch:
Add tridiagonal solvers to cudpp.
When reviewing my code changes, please focus on:
After the review, I'll merge this branch into:
/trunk
Original issue reported on code.google.com by [email protected]
on 13 Feb 2010 at 12:34
Reproduction of the problem:
1. Download files from http://www.ilab.sztaki.hu/~erikbodzsar/cudpp/
2. Compile test.cu
3. Run ./a.out <error.txt
The test program runs a segmented min-scan on the input data contained in
error.txt. CUDPP computes some elements of the result incorrectly (the test
program prints the first wrong element, along with a few preceding and
following elements).
I'm using CUDPP 1.1 on a 64-bit Debian 5.0.2 system, with
CUDA 2.2 and g++/gcc 4.1.3.
Original issue reported on code.google.com by [email protected]
on 5 Aug 2009 at 10:53
Hi,
I just built CUDPP and ran cudpp_testrig, which failed
(all previous tests passed) with:
Running a sort of 1048581 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 2.586515 ms
Running a sort of 2097152 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms
Running a sort of 4194304 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms
Running a sort of 8388608 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms
My gpu card is a Tesla C1060.
Original issue reported on code.google.com by [email protected]
on 19 Jul 2009 at 10:07
When CUDPP is in a path that has the name "cudpp" in it twice, for example,
the way I keep branches:
~/src/idav/branches/proj/cudpp/release1.1/cudpp/
cudpp_testrig -rand fails to find its files. This is because cutupPath
starts from the root of the path above and matches the first /cudpp it
encounters. It should instead work backwards up the tree, so that it finds
the closest instance of "startDir" rather than the farthest -- I think
that is what users will expect.
I think the correct way to do this is not with strtok, but with
chdir() to traverse up the tree until either startDir is found or the
root is hit. I find it hard to believe that each OS doesn't have a built-in
function for this, but a quick Google search turns up nothing easy...
This needs to be fixed. However I think we can leave it until after the
release.
Original issue reported on code.google.com by [email protected]
on 29 Jun 2009 at 7:42
What steps will reproduce the problem?
1. Build cudpp release or debug
What is the expected output? What do you see instead?
Expect no errors, warnings, or advisories. Instead I get lots of these:
src/cta/segmented_scan_cta.cu(868): Advisory: Removed dead synchronization intrinsic from function _Z14segmentedScan4If19SegmentedScanTraitsIfL13CUDPPOperator2ELb0ELb0ELb0ELb0ELb1ELb0EEEvPT_PKS3_PKjjS4_PjS9_
Suggested fix:
I realize that removing this __syncthreads() causes failure. I believe
though that the compiler is only removing it from some calls to the
function that includes it, not all. So instead of putting the
syncthreads() inside this function, put it right before the call to the
function, only where it is needed.
Please use labels and text to provide additional information.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 1:38
The release libcudpp.a on OS X is over 110 MB now. On linux it is over 34 MB.
What is causing this?
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 12:15
As an example, if you pass a handle to a scan plan to cudppSegmentedScan, you
get a segmentation fault. This should instead raise an error at runtime.
This is also related to issue 27.
Original issue reported on code.google.com by [email protected]
on 12 Jul 2009 at 9:58
There is no bug here; this is a feature request.
I wish I had a MEX wrapper that would let me use CUDPP from M-code. This
(CUDA) MEX file could be compiled on first use, and would afterwards allow
people like me, who don't know much C, to use GPGPU very easily in MATLAB.
On my side, sort seems very interesting, provided it returns indices (as
needed for sortrows).
At the moment there are two different toolboxes available for using CUDA in
MATLAB: AccelerEyes' Jacket and GPUmat (from gp-you.org). Neither of them
allows a sortrows, while Jacket's sort, on the other hand, is very slow.
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 2:04
Right now the only way to invoke the random number generator is through the API.
Provide a version which can use device level routines.
Original issue reported on code.google.com by [email protected]
on 19 Aug 2009 at 6:28
Purpose of code changes on this branch:
Review the changes I've made to the 1.1.1 branch before we release it.
When reviewing my code changes, please focus on:
Changes to scan_cta.cu, segmented_scan_cta.cu, radixsort_cta.cu,
radixsort_app.cu, cudpp_util.h, cudpp_globals.h
Original issue reported on code.google.com by [email protected]
on 11 Mar 2010 at 5:16
We get a lot of questions about supported size limitations. We need to
document all limitations in the CUDPP docs.
Original issue reported on code.google.com by [email protected]
on 11 Jun 2009 at 8:22
What steps will reproduce the problem?
1.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 12 Aug 2009 at 7:10
What steps will reproduce the problem?
1. Compile test_rand.cu on blaze (with makefile)
What is the expected output? What do you see instead?
I expect a clean build. Instead I get:
[jowens@blaze cudpp_testrig]$ make
test_rand.cu(63): error: pointer to incomplete class type is not allowed
test_rand.cu(64): error: pointer to incomplete class type is not allowed
2 errors detected in the compilation of "/tmp/tmpxft_00005b19_00000000-4_cudpp_testrig.cpp1.ii".
Original issue reported on code.google.com by [email protected]
on 23 Jun 2009 at 1:24
The summary says it all. This is internal code and the fix is easy, but not
crucial.
Original issue reported on code.google.com by [email protected]
on 12 Oct 2009 at 9:30
CUDA 2.2 doesn't support 7.1 or earlier.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2009 at 7:02
Not high priority, but this needs to be fixed.
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 12:26
Hi,
1) cudpp ships with a precompiled libcutil.a, but compiling the example apps
fails because it is in the wrong format. I guess it was compiled for a 32-bit
host, and I am running 64-bit. So the cudpp make, or some other make,
should rebuild it. Running make in common/ fails with:
./../common/inc/cmd_arg_reader.h: In member function ‘const T* CmdArgReader::getArgHelper(const std::string&)’:
./../common/inc/cmd_arg_reader.h:416: error: must #include <typeinfo> before using typeid
./../common/inc/cmd_arg_reader.h:432: error: must #include <typeinfo> before using typeid
src/cmd_arg_reader.cpp: In destructor ‘CmdArgReader::~CmdArgReader()’:
src/cmd_arg_reader.cpp:101: error: must #include <typeinfo> before using typeid
src/cmd_arg_reader.cpp:106: error: must #include <typeinfo> before using typeid
src/cmd_arg_reader.cpp:111: error: must #include <typeinfo> before using typeid
src/cmd_arg_reader.cpp:116: error: must #include <typeinfo> before using typeid
src/cmd_arg_reader.cpp:121: error: must #include <typeinfo> before using typeid
make: *** [obj/release/cmd_arg_reader.cpp_o] Error 1
2) If I change anything in the kernels I have to manually delete the compiled
.o files; make doesn't recreate them automatically.
3) Building testrig fails with
In file included from spmvmult_gold.cpp:13:
sparse.h: In constructor ‘MMMatrix::MMMatrix(unsigned int, unsigned int, unsigned int)’:
sparse.h:46: error: ‘malloc’ was not declared in this scope
spmvmult_gold.cpp: In function ‘void readMatrixMarket(MMMatrix*, const char*)’:
spmvmult_gold.cpp:94: error: ‘exit’ was not declared in this scope
spmvmult_gold.cpp:122: error: ‘qsort’ was not declared in this scope
make: *** [obj/release/spmvmult_gold.cpp_o] Error 1
Original issue reported on code.google.com by [email protected]
on 22 Jul 2009 at 1:38
Added tools.h and tools.cpp into the trunk. Once the code is accepted, I
will update the code in testrig
I checked the file-finding code in the cutil library and that only searches
./data/ and ../../../projects/<executable_name>/data/, which is not general
enough for our purposes.
Based on John's SPMVMult testrig and my rand testrig, I've written two
types of file searching: finding a directory and finding a file. I use the
directory finding to find the data/ directory while it seems that John's
needs one to find a specific filename. The idea for both is that the
function will ascend to a parent directory and do a recursive search down
its children from there.
So, for example, if I were looking for the data directory and I am in, say,
/cudpp/bin/, then I'd call findDir("cudpp", "data", output), where output is
a character array. At the end of the function, output will contain
"../apps/data". Note that the recursive search skips .svn directories
(I don't think any data file would be put in there...).
Ditto for the findFile function.
The code for both file and directory finding uses OS-dependent calls and
libraries. Linux / Mac uses the dirent.h and unistd.h to find the files
while the Windows version uses io.h and direct.h to find the files. Right
now I have only checked the two files tools.cpp and tools.h into the trunk
and once they are accepted I will check in the revised testrig files. I
have already tried the code on Blaze and this does fix Issue 3 (works as
well in Windows on my laptop). I've tried running the code from various
directories and it finds the regression files no sweat.
Stanley
Original issue reported on code.google.com by [email protected]
on 24 Jun 2009 at 12:27
What steps will reproduce the problem?
1. tar -xzvf sort_test.tar.gz
2. cd sort_test
3. make
4. ./testsort 1000000
What is the expected output? What do you see instead?
expected :
before sort
radix sort : 0.00833379 s 1000000 elements
what I see :
before sort
radix sort : 0.00833379 s 1000000 elements
sort error 4 720476 541723
What version of the product are you using? On what operating system?
Using device 0: Quadroplex 2200 S4
Quadroplex 2200 S4; global mem: 4294705152B; compute v1.3; clock: 1296000
kHz
cudpp 1.1.1
CUDA SDK 2.3
on linux 2.6 kernel
$uname -a
Linux tesla 2.6.18-128.1.1.el5 #1 SMP Tue Feb 10 11:36:29 EST 2009 x86_64
x86_64 x86_64 GNU/Linux
$cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9
15:29:46 PST 2009
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
Please provide any additional information below.
This is a simple test to use cudppSort.
For a small array, it passes the test, but for a large array, it fails.
It also fails in cudpp_testrig as follows.
$./cudpp_testrig -sort -n=1000000
Using device 0: Quadroplex 2200 S4
Quadroplex 2200 S4; global mem: 4294705152B; compute v1.3; clock: 1296000
kHz
Running a sort of 1000000 unsigned int key-value pairs
Unordered key[3]:746051 > key[4]:16173
Incorrectly sorted value[0] (583160) 1153083146 != 460036
GPU test FAILED
Average execution time: 8.024296 ms
1 tests failed
If this is a driver version mismatch, please let me know which driver
version is needed. Thank you.
Original issue reported on code.google.com by [email protected]
on 30 Mar 2010 at 12:06
Attachments:
We have already run into name conflicts in the CUDA SDK samples between
functions used in cudppRadixSort and functions in the samples. Add a cudpp
namespace to avoid this.
Original issue reported on code.google.com by [email protected]
on 12 Oct 2009 at 9:29
In vector_kernel, the function vectorSegmentedAddUniformToRight4 has an
argument "d_minIndices" but the documentation for that argument refers to
an "array of maximum indices". Please make the documentation and argument
name match the functionality.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 10:48
What steps will reproduce the problem?
1. run cudpp_testrig -all
What is the expected output? What do you see instead?
It should test backward segmented scans (all ops, options), but it doesn't.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 1:35
Request from a user:
As far as I understand cudpp currently only supports the use of single
precision floating numbers by specifying the data type CUDPP_FLOAT.
Is the support for double precision planned in a future version of
cudpp?
When will such a version be available?
What kind of loss in performance do you expect?
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 10:00
We need to get the faster version of radix sort from the CUDA SDK into
CUDPP. Mark is working on this.
Original issue reported on code.google.com by [email protected]
on 11 Jun 2009 at 8:41
What steps will reproduce the problem?
1. Build and run the satGL sample app
What is the expected output? What do you see instead?
Correct output can be seen by running the device emulation version. In
release or debug builds, instead the results are a green and blue smear.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 1:57
Just duplicate the Visual Studio 8 projects. Perhaps we should look into
CMake.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2009 at 7:02
We have key-value sorts, but sometimes you want to sort keys and multiple
value arrays along with them. Can we make this general and efficient?
This is not a definite CUDPP feature to add, it's something that should be
considered first.
Original issue reported on code.google.com by [email protected]
on 24 Jun 2009 at 9:58
What steps will reproduce the problem?
1. Run random tests
2. Look at output
What is the expected output? What do you see instead?
I see:
128
number of elements: 128, devOutputSize: 32
number of blocks: 1 blocksize: 32 devOutputsize = 32
number of threads: 32
What I want to see is something more like:
Generating 128 random numbers (1 block, 32 threads) ...
GPU test FAILED (x/y correct)
or something like that. (Look at the other ones.)
Also make sure the -q (quiet) option works, as Mark has previously described.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 1:53
What version of the product are you using? On what operating system?
CUDPP 1.1, gcc 4.3.3-5ubuntu4 on Ubuntu 9.04 x64
Please provide any additional information below.
Building CUDPP 1.1 did not work out-of-the-box for me, and I believe
that some includes that should have been there are missing:
cudpp_1.1$ cd common
common$ make
[...]
./../common/inc/cmd_arg_reader.h:417: error: must #include <typeinfo>
before using typeid
[...]
src/cutil.cpp:620: error: ‘strlen’ was not declared in this scope
[...]
To fix these, add
#include <typeinfo>
in cmd_arg_reader.h, and
#include <cstring>
in cutil.cpp.
Rebuilding then gives:
common$ make
[...]
./../common/inc/exception.h:89: error: ‘EXIT_FAILURE’ was not declared in this scope
[...]
To fix, add
#include <cstdlib>
in exception.h.
Original issue reported on code.google.com by [email protected]
on 13 Aug 2009 at 9:52
What steps will reproduce the problem?
- unzip gtc_to_sort_test.tar.gz in NVIDIA_CUDA_SDK project folder
1. make (~/NVIDIA_CUDA_SDK/projects/gtc_to_sort_test/)
2. execution (~/NVIDIA_CUDA_SDK/bin/linux/release)
(e.g.: ./gtc_sort_test
~/NVIDIA_CUDA_SDK/projects/gtc_to_sort_test/input/1.txt 5 30 32)
3.
What is the expected output? What do you see instead?
[expected output]
Finished reading input file.
mi: 161795, mgrid: 32449
Sorting : Success
0.0425751 s Checksum: 0.000000
Sorting : Success
0.0424822 s Checksum: 0.000000
Sorting : Success
0.0425396 s Checksum: 0.000000
Sorting : Success
0.0425428 s Checksum: 0.000000
Sorting : Success
0.0427907 s Checksum: 0.000000
Sorting : Success
0.042537 s Checksum: 0.000000
Sorting : Success
0.0425729 s Checksum: 0.000000
Sorting : Success
0.0425132 s Checksum: 0.000000
Sorting : Success
0.0426874 s Checksum: 0.000000
Sorting : Success
0.0428964 s Checksum: 0.000000
=== Performance summary: BENCH_GPU A0 5057 blocks 32 threads/block ===
0.0286377 Gflops
Min: 0.0424822 s -- 0.674 Gflop/s
Mean: 0.0426137 s -- 0.672 Gflop/s
Max: 0.0428964 s -- 0.668 Gflop/s
Stddev: 0.000134837 s (+/- 0.3164%)
[output]
Finished reading input file.
mi: 161795, mgrid: 32449
Sorting : Success
0.0426965 s Checksum: 0.000000
Sorting : Success
0.0425468 s Checksum: 0.000000
Sorting : Success
0.0426379 s Checksum: 0.000000
Sorting : Success
0.0425811 s Checksum: 0.000000
Sorting : Success
0.0426666 s Checksum: 0.000000
Unordered key[983]: 138 > key[984]: 27
Sorting : FAIL
0.0436186 s Checksum: 0.000000
Unordered key[45]: 6392 > key[46]: 6384
Sorting : FAIL
0.0434239 s Checksum: 0.000000
Unordered key[147]: 3 > key[148]: 0
Sorting : FAIL
0.0435097 s Checksum: 0.000000
Unordered key[210]: 218 > key[211]: 0
Sorting : FAIL
0.0436116 s Checksum: 0.000000
Unordered key[132]: 14 > key[133]: 0
Sorting : FAIL
0.0435575 s Checksum: 0.000000
=== Performance summary: BENCH_GPU A0 5057 blocks 32 threads/block ===
0.0286377 Gflops
Min: 0.0425468 s -- 0.673 Gflop/s
Mean: 0.043085 s -- 0.665 Gflop/s
Max: 0.0436186 s -- 0.657 Gflop/s
Stddev: 0.000488756 s (+/- 1.134%)
What version of the product are you using? On what operating system?
GTX280
Ubuntu 8.04
cuda 2.2
Please provide any additional information below.
Sorting errors occur intermittently, as in the output example above.
The same error also occurs in the cudpp 1.1 and cudpp 1.1.1 test programs.
Original issue reported on code.google.com by [email protected]
on 19 Feb 2010 at 7:17
Attachments:
Right now we have:
enum CUDPPAlgorithm
{
    CUDPP_SCAN,
    CUDPP_SEGMENTED_SCAN,
    CUDPP_COMPACT,
    CUDPP_REDUCE,
    CUDPP_SORT_RADIX,
    CUDPP_SPMVMULT,          /**< Sparse matrix-dense vector multiplication */
    CUDPP_RAND_MD5,          /**< Pseudo-random number generator using the MD5 hash algorithm */
    CUDPP_ALGORITHM_INVALID, /**< Placeholder at end of enum */
};
I didn't catch this in time for release1.1.
Original issue reported on code.google.com by [email protected]
on 1 Jul 2009 at 8:02
Purpose of code changes on this branch:
To add Mark's optimizations to radix sort from the CUDA SDK.
When reviewing my code changes, please focus on:
radixsort_*.cu
cudpp_plan.*
cudpp_plan_manager.*
cudpp_maximal_launch.*
test_radixsort.cu (cudpp_testrig)
Use SVN diff to guide your review.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2009 at 6:52
We need a way to regress SpMV. I don't think it's been tested for the 1.1
release.
Original issue reported on code.google.com by [email protected]
on 29 Jun 2009 at 7:49
This will make building (and adding files) much easier on windows. Get the
latest cuda.rules file from the CUDA SDK.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2009 at 7:06
Request from a user:
is there an effort to support the "unsigned long long int" data type in
CUDPPDatatype? I want to use this data type for computing the integral
image of the square of the image matrix, like a SAT. With high-resolution
images, *unsigned int* cannot contain the values...
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 9:36
Current rand seed generation only uses a basic seed XOR'd with the
threadIdx and blockIdx. A more clever way would be to use an LCG.
Original e-mail suggesting the change:
From Thomas Bradley:
The threadIdx and blockIdx are 16-bit quantities and even fewer bits will
actually be non-zero, therefore you are really only changing the low bits
of your seed. It may be more robust to use an LCG to generate the “input”
fields, for example a=69069 m=32 is easy and not a bad LCG:
state = (state * 69069) & 0xffffffffUL; return state;
Where state is initialized to the seed (combined somehow with the threadIdx
and blockIdx).
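A host-side sketch of the suggested LCG step. How the seed is combined with the thread and block indices is an assumption; mix_seed and its shift are illustrative only:

```c
#include <stdint.h>

/* One multiplicative LCG step with a = 69069, m = 2^32, as suggested in
   the e-mail; the & 0xffffffffUL is implicit in 32-bit arithmetic. */
static uint32_t lcg_next(uint32_t state) {
    return state * 69069u;
}

/* Hypothetical seed mixing: fold the (small) block and thread indices into
   the seed, then run the LCG a couple of steps so the change propagates
   into the high bits instead of only the low ones. */
uint32_t mix_seed(uint32_t seed, uint32_t tid, uint32_t bid) {
    uint32_t state = seed ^ (bid << 16) ^ tid;
    state = lcg_next(state);
    state = lcg_next(state);
    return state;
}
```

With the plain XOR scheme, two threads whose indices differ only in the low bits produce seeds differing only in the low bits; the LCG steps spread that difference across the whole word.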
Original issue reported on code.google.com by [email protected]
on 13 Jul 2009 at 2:57
What steps will reproduce the problem?
1. Run cudpp_testrig -rand on 8800 GT GPU
What is the expected output? What do you see instead?
Expect pass, get fail.
I know Stanley already knows about this issue, but I wanted to file it to
make sure it's covered.
Original issue reported on code.google.com by [email protected]
on 17 Jun 2009 at 11:33
What steps will reproduce the problem?
1. compile cutil (succeeds)
2. compile cudpp
What is the expected output? What do you see instead?
I expect cudpp to compile successfully. Instead I get:
gauguin:/raid/filipe/cudpp_1.1/cudpp> make verbose=1
nvcc -o obj/release/segmented_scan_app.cu_o -c src/app/segmented_scan_app.cu --host-compilation=C --compiler-options -fno-strict-aliasing -I./ -I./include/ -Isrc/ -Isrc/app/ -Isrc/kernel/ -Isrc/cta/ -I. -I/opt/cuda/include -I./../common/inc -DUNIX -O
In file included from /tmp/tmpxft_00004836_00000000-1_segmented_scan_app.cudafe1.stub.c:6,
                 from src/app/segmented_scan_app.cu:247:
/opt/cuda/bin/../include/crt/host_runtime.h:178: warning: 'struct surfaceReference' declared inside parameter list
/opt/cuda/bin/../include/crt/host_runtime.h:178: warning: its scope is only this definition or declaration, which is probably not what you want
In file included from src/app/segmented_scan_app.cu:247:
/tmp/tmpxft_00004836_00000000-1_segmented_scan_app.cudafe1.stub.c: In function '__sti____cudaRegisterAll_53_tmpxft_00004836_00000000_4_segmented_scan_app_cpp1_ii_999fefc3':
/tmp/tmpxft_00004836_00000000-1_segmented_scan_app.cudafe1.stub.c:11623: error: '__fatDeviceText' undeclared (first use in this function)
/tmp/tmpxft_00004836_00000000-1_segmented_scan_app.cudafe1.stub.c:11623: error: (Each undeclared identifier is reported only once for each function it appears in.)
make: *** [obj/release/segmented_scan_app.cu_o] Error 255
What version of the product are you using? On what operating system?
cudpp 1.1 with CUDA 2.3 on Debian sid with both gcc 4.3 and 4.1.
Please provide any additional information below.
I tried compiling with emu=1 and this did work. I also tried dbg=1, but this
didn't seem to make any difference. This might well be a CUDA bug and not
cudpp's fault.
Original issue reported on code.google.com by [email protected]
on 27 Sep 2009 at 7:13
What steps will reproduce the problem?
1. Compile cudpp_testrig in Linux. You will see this new warning:
test_rand.cu(246): warning: variable "memSize" was declared but never
referenced
What is the expected output? What do you see instead?
No warnings.
Please use labels and text to provide additional information.
Original issue reported on code.google.com by [email protected]
on 22 Jun 2009 at 6:27
What steps will reproduce the problem?
I've included a function that runs the CUDPP multiScan and checks it
against what I think should appear using the CPU. For me it fails with a
datasize of 50000 and 100 rows.
What is the expected output? What do you see instead?
I would expect the test to pass. If you use the cudppScan inside the For
loop instead the test passes.
What version of the product are you using? On what operating system?
Using CUDPP 1.1 on Vista 64-bit with Visual Studio 2008. I'm using CUDA
2.2. I won't have time to test it using 2.3 as I'm about to leave the
country so maybe someone can confirm that it's still a fault with 2.3.
Please provide any additional information below.
I believe the problem lies in the scan_cta.cu file. In the scanCTA
function, a __syncthreads() is called only in emulation mode for backward
scans. I think this needs to be called in device mode as well; at least,
that fixed the problem for me. It looks like there would be race conditions
for large numbers of threads.
Original issue reported on code.google.com by [email protected]
on 29 Sep 2009 at 12:00
Attachments:
Stanley, when you get a chance, please document all the functions in
rand_cat.cu. The other files have good docs, but not this one.
Not marking for Release 1.1 because it's not crucial.
Original issue reported on code.google.com by [email protected]
on 26 Jun 2009 at 5:01
What steps will reproduce the problem?
==============================================
1) Extract attachment
2) Open cudpp/cudpp.sln in Visual Studio 2008
3) Rebuild Debug solution
4) Open apps/simpleCUDPP_openMP/simpleCUDPP.sln
5) Rebuild Debug solution
6) Start debugging simpleCUDPP
What is the expected output?
==============================================
All tests should pass
What do you see instead?
==============================================
---------
- Run 1:
---------
Windows has triggered a breakpoint in simpleCUDPP.exe.
This may be due to a corruption of the heap, which indicates a bug in
simpleCUDPP.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while simpleCUDPP.exe has focus.
The output window may have more diagnostic information.
---------
- Run 2:
---------
Error destroying CUDPPPlan
---------
- Run 3:
---------
Unhandled exception at 0x006ca87e (cudpp32d.dll) in simpleCUDPP.exe:
0xC0000005: Access violation writing location 0xddddddf1.
---------
- Run 4:
---------
Unhandled exception at 0x007c32e4 (cudpp32d.dll) in simpleCUDPP.exe:
0xC0000005: Access violation reading location 0xfeeefee8.
---------
- Run 5:
---------
Error creating CUDPPPlan
What version of the product are you using? On what operating system?
==============================================
CUDPP 1.1 with CUDA 2.3 beta, Windows XP 32 bit, Visual Studio 2008, GTX 295
Please provide any additional information below.
==============================================
- Running with OpenMP on 2 GPUs. It will occasionally work but generally fails.
- See the original query at http://groups.google.co.uk/group/cudpp/browse_thread/thread/507fba92fac36b1e?hl=en
Original issue reported on code.google.com by [email protected]
on 29 Jul 2009 at 11:01
Attachments:
Segmented scans and scans do not work on very large array sizes.
Original issue reported on code.google.com by [email protected]
on 13 Nov 2009 at 5:45
Compile time continues to get longer as we add more functionality. CUDA is
really slow at compiling template functions with multiple parameters, and we
use a lot of them. There are something like 384 different scan kernels, for
example, and a similar number for segmented scan.
How can we reduce this code explosion? Can we give feedback to the CUDA
compiler team? (Emulation mode compiles much faster, for example.)
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 12:17
On line 302 of cudpp_util.h, the CUDPP_MIN identity for unsigned ints is
defined as INT_MAX. This is incorrect: the maximum unsigned integer is
2*INT_MAX + 1, i.e. UINT_MAX.
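A minimal host-side illustration of why the identity matters (umin and is_min_identity are stand-ins, not CUDPP code): an identity e for min must satisfy min(e, x) == x for every unsigned x, which fails for e = INT_MAX as soon as x exceeds it.

```c
#include <limits.h>

/* Stand-in for the unsigned CUDPP_MIN operator. */
static unsigned umin(unsigned a, unsigned b) {
    return a < b ? a : b;
}

/* An identity e for min must satisfy umin(e, x) == x for all x. With
   e = INT_MAX this breaks for any x above INT_MAX; e = UINT_MAX works. */
int is_min_identity(unsigned e, unsigned x) {
    return umin(e, x) == x;
}
```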
Original issue reported on code.google.com by [email protected]
on 4 Aug 2009 at 3:23
See issue 17 for more information. It is not efficient to sort multiple
value arrays inside CUDPP -- one can sort key-index pairs and then use the
sorted indices to shuffle/gather the multiple arrays. This is more efficient
and more general, but it may not be obvious to users how to do it. So we
should provide an example in the "apps" directory.
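A host-side sketch of the pattern such an example could demonstrate; qsort stands in for cudppSort here, and the helper names are illustrative:

```c
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned key; unsigned idx; } Pair;

static int cmpKey(const void *a, const void *b) {
    unsigned ka = ((const Pair *)a)->key;
    unsigned kb = ((const Pair *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Sort (key, index) pairs once, then gather each value array through the
   sorted index array. Repeat the gather loop once per value array. */
void sortByKeyAndGather(unsigned *keys, unsigned *vals, size_t n) {
    Pair *p = malloc(n * sizeof *p);
    unsigned *tmp = malloc(n * sizeof *tmp);
    for (size_t i = 0; i < n; i++) {
        p[i].key = keys[i];
        p[i].idx = (unsigned)i;
    }
    qsort(p, n, sizeof *p, cmpKey);        /* stands in for cudppSort */
    for (size_t i = 0; i < n; i++) {
        keys[i] = p[i].key;
        tmp[i]  = vals[p[i].idx];          /* gather through sorted indices */
    }
    memcpy(vals, tmp, n * sizeof *tmp);
    free(tmp);
    free(p);
}
```

The one sort amortizes over any number of value arrays, which is the efficiency argument made above.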
Original issue reported on code.google.com by [email protected]
on 2 Sep 2009 at 9:43
When trying to build a shared library using cudpp on Linux (x86_64), the
version of cudpp delivered with the NVIDIA SDK (any version), as well as
every version of cudpp up to and including 1.1, results in:
relocation R_X86_64_32 against `a local symbol' can not be used when making
a shared object; recompile with -fPIC
The problem is known and has been discussed/solved at
http://forums.nvidia.com/lofiversion/index.php?t63748.html
It would be a good idea to include the changes (see attached patch for
version 1.1) in future releases of cudpp as well as the version shipped
with the NVidia SDK.
Original issue reported on code.google.com by [email protected]
on 4 Sep 2009 at 9:12
Attachments:
What steps will reproduce the problem?
Compile and run the following in emulation mode:

#include <stdio.h>
#include <cudpp/cudpp.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>

typedef unsigned int uint;

#define N 12

uint keys[N]   = {111, 37, 430, 433, 431, 357, 6190, 6193, 6191, 6117, 6837, 6911};
uint values[N] = {37, 111, 433, 430, 357, 431, 6193, 6190, 6117, 6191, 6911, 6837};

int main() {
    cudaSetDevice(0);
    int* keys_dev = 0;
    int* vals_dev = 0;
    cutilSafeCall(cudaMalloc((void**)&keys_dev, sizeof(uint) * N));
    cutilSafeCall(cudaMalloc((void**)&vals_dev, sizeof(uint) * N));

    CUDPPConfiguration sortConfig;
    sortConfig.algorithm = CUDPP_SORT_RADIX;
    sortConfig.datatype  = CUDPP_UINT;
    sortConfig.op        = CUDPP_ADD;
    sortConfig.options   = CUDPP_OPTION_KEY_VALUE_PAIRS;

    CUDPPHandle sortPlan;
    cudppPlan(&sortPlan, sortConfig, 100 /* num elements */, 1 /* num rows */, 100 /* pitch */);

    printf("Before\n");
    for (uint i = 0; i < N; i++) {
        printf("(%d,\t%d)\n", keys[i], values[i]);
    }

    cutilSafeCall(cudaMemcpy(keys_dev, keys, sizeof(uint) * N, cudaMemcpyHostToDevice));
    cutilSafeCall(cudaMemcpy(vals_dev, values, sizeof(uint) * N, cudaMemcpyHostToDevice));

    cudppSort(sortPlan, keys_dev, vals_dev, 32, N);

    cutilSafeCall(cudaMemcpy(keys, keys_dev, sizeof(uint) * N, cudaMemcpyDeviceToHost));
    cutilSafeCall(cudaMemcpy(values, vals_dev, sizeof(uint) * N, cudaMemcpyDeviceToHost));

    printf("After\n");
    for (uint i = 0; i < N; i++) {
        printf("(%d,\t%d)\n", keys[i], values[i]);
    }
}
What is the expected output? What do you see instead?
The output should be a list of sorted keys + values. Instead:
Before
(111, 37)
(37, 111)
(430, 433)
(433, 430)
(431, 357)
(357, 431)
(6190, 6193)
(6193, 6190)
(6191, 6117)
(6117, 6191)
(6837, 6911)
(6911, 6837)
After
(37, 111)
(111, 37)
(357, 431)
(357, 431)
(357, 431)
(430, 433)
(6117, 6191)
(6117, 6191)
(6117, 6191)
(6190, 6193)
(6837, 6911)
(6911, 6837)
(6911, 6837)
Key/value pairs are indeed sorted; however, some pairs have been duplicated
while others have been deleted.
What version of the product are you using? On what operating system?
Using the version bundled with the CUDA Toolkit v3.0 beta 1, on both Mac OS X 10.6 and Ubuntu 9.04.
Please provide any additional information below.
Works correctly when run on the device.
Original issue reported on code.google.com by [email protected]
on 20 Nov 2009 at 9:46
Attachments:
What steps will reproduce the problem?
1. Run "cudpp_testrig -all" from the command line
What is the expected output? What do you see instead?
Expect "test passed" for all of them, but instead get a warning about a
regression file not being found. Also, when this happens, the tests still
pass! These should be test failures.
Original issue reported on code.google.com by [email protected]
on 16 Jun 2009 at 6:44
We need to finally add parallel reductions.
Note that there are two types: one for associative and commutative
operators, which can be optimized more aggressively than one for operators
that are only associative and not commutative.
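The distinction can be sketched on the host: a pairwise tree reduction (mirroring a GPU block's access pattern) needs only associativity, because neighbouring elements keep their relative order, while commutative operators additionally permit reordered schedules. A hypothetical sketch, assuming n is a power of two; treeReduce is illustrative, not CUDPP code:

```c
#include <stddef.h>

typedef unsigned (*BinOp)(unsigned, unsigned);

static unsigned opAdd(unsigned a, unsigned b) { return a + b; }

/* In-place pairwise tree reduction over n = 2^k elements. Each pass
   combines element i with element i + stride, preserving left-to-right
   order, so only associativity is required of op. */
unsigned treeReduce(unsigned *d, size_t n, BinOp op) {
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            d[i] = op(d[i], d[i + stride]);
    return d[0];
}
```

A commutative operator would also allow the interleaved-stride schedule GPUs prefer for coalescing; a merely associative one is restricted to order-preserving schedules like the one above.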
Original issue reported on code.google.com by [email protected]
on 25 Jun 2009 at 10:02