I'm trying to run the rnn_bench from DeepBench on multiple HW platforms. In my case,

DeepBench rnn_bench fails when MIOpen built with debug, works when MIOpen built with release

rocm commented on July 30, 2024

Comments (5)

mattsinc commented on July 30, 2024

The uninitialized reads in the valgrind trace come from the w index in the array not being initialized in rnn_bench_rocm.cpp in DeepBench. I'm not sure what the expected value is here -- should w (dim[3]) just be set to 1? For what it's worth, if I set w to 1 for all of the above lines in rnn_bench_rocm.cpp, all of the uninitialized read errors go away but the assert failure still happens. So I suspect the assert is a second, separate problem with the code -- I can push this change to DeepBench pending the answer about what value should be used.
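A minimal sketch of the kind of change described above, not the actual DeepBench code: the helper name make_dims is hypothetical, and defaulting w to 1 is an assumption, since whether 1 is the value MIOpen expects is exactly the open question here. The point is only that every element of the dimension array gets a defined value, so nothing reads uninitialized memory.

```cpp
#include <array>

// Hypothetical helper (not part of DeepBench or MIOpen).
// Builds a 4-element dimension array {n, c, h, w}. Trailing dimensions
// default to 1 (an assumption) so that no element is left uninitialized.
std::array<int, 4> make_dims(int n, int c, int h = 1, int w = 1) {
    return {{n, c, h, w}};
}
```

Calling make_dims(batch_size, hidden_size) would then yield a fully initialized array whose last two entries are 1.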

dagamayank commented on July 30, 2024

@mattsinc are you using this repo - https://github.com/ROCmSoftwarePlatform/DeepBench?

mattsinc commented on July 30, 2024

@dagamayank, no, I had been using the internal DeepBench-v1 repo (I was not aware of this one). It looks like there were some changes between the repos that may address at least the valgrind issue I highlighted above. The fixes @daniellowell mentioned in Slack for miopen_helper.h appear to be in there too.

I just cloned this repo and ran it. With this version the failure no longer occurs at the place where it happened before, although I'm going to run a few more tests overnight to be sure. A perhaps dumb question: why did you all change only https://github.com/ROCmSoftwarePlatform/DeepBench/blob/master/code/amd/rnn_bench_rocm.cpp#L48-L51, and not change https://github.com/ROCmSoftwarePlatform/DeepBench/blob/master/code/amd/rnn_bench_rocm.cpp#L52-L59? It seems like the same change would be needed for all of them, but I don't see the assert trigger for lines 52-59, so perhaps there's something subtle I'm not understanding.

Thanks,
Matt

daniellowell commented on July 30, 2024

From our documentation:

hxDesc: A hidden tensor descriptor that has as its first dimension of the number of layers if the direction mode is unidirectional and twice the number of layers if the direction mode is bidirectional. The second dimension of the descriptor must equal the largest first dimension of the xDesc tensor descriptor array. The third dimension equals the hiddenSize. (input)

Because DeepBench uses a single layer, the first element of the tuple in the first argument on lines 52-59 is 1.

xDesc: An array of tensor descriptors. These are the input descriptors to each time step. The first dimension of each descriptor is the batch size and may decrease from element n to element n+1 and not increase in size. The second dimension is the same for all descriptors in the array and is the input vector length. (input)

Lines 48-51 create arrays of data tensor descriptors. The first tuple argument is (batch_size, hidden_size). In DeepBench hidden_size = input_vector_size, so we use this pair. The array size is the number of timesteps.
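The shape rules quoted above can be sketched in plain C++. This is only an illustration of the dimension arithmetic, not DeepBench or MIOpen code: the function names hidden_dims and input_dims are hypothetical, and no MIOpen call is made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers; they only compute dimension lists and do not call MIOpen.

// Dimensions of the hidden-state descriptor (hxDesc):
// {layers * directions, largest batch size in xDesc, hidden size}.
std::vector<int> hidden_dims(int num_layers, bool bidirectional,
                             int max_batch_size, int hidden_size) {
    assert(num_layers > 0 && max_batch_size > 0 && hidden_size > 0);
    const int directions = bidirectional ? 2 : 1;
    return {num_layers * directions, max_batch_size, hidden_size};
}

// Dimensions for the xDesc array: one {batch size, input vector length}
// pair per time step, so the outer size equals the number of time steps.
std::vector<std::vector<int>> input_dims(const std::vector<int>& batch_per_step,
                                         int input_size) {
    std::vector<std::vector<int>> dims;
    dims.reserve(batch_per_step.size());
    for (std::size_t t = 0; t < batch_per_step.size(); ++t) {
        // The batch size may shrink from one step to the next but never grow.
        assert(t == 0 || batch_per_step[t] <= batch_per_step[t - 1]);
        dims.push_back({batch_per_step[t], input_size});
    }
    return dims;
}
```

For a single unidirectional layer, hidden_dims(1, false, batch_size, hidden_size) gives a leading 1, which matches the explanation for lines 52-59, while input_dims returns one (batch_size, hidden_size) pair per time step when hidden_size equals the input vector length.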

Let's just use https://github.com/ROCmSoftwarePlatform/DeepBench for further testing. We'll deprecate the other repo. I'm closing this issue, but if you are still having problems on this topic I can reopen it.

Thanks for testing our DeepBench, let me know if you have any other issues.

mattsinc commented on July 30, 2024

@daniellowell, @dagamayank, thanks for the help and info.
