
Comments (10)

papajohn commented on April 29, 2024

I read some other posts [1], [2] about this issue for other projects. It appears that such an error doesn't matter, and the GPU still gets used (just with some memory growth factor that gets adjusted over time).

You can check whether your GPU is being used with the nvidia-smi command-line tool.

[1] http://stackoverflow.com/questions/39465503/cuda-error-out-of-memory-in-tensorflow
[2] tensorflow/tensorflow#6048
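
As a rough illustration of the nvidia-smi check mentioned above, the sketch below asks nvidia-smi for per-GPU memory figures and parses them. The helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example. It assumes nvidia-smi is on the PATH and that every GPU reports numeric memory values (some devices print `[N/A]`, which this sketch does not handle).

```python
import shutil
import subprocess

# Query flags supported by nvidia-smi: one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into a list of dicts (hypothetical helper)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = [field.strip() for field in line.split(",")]
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi if it is installed; return [] when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` while training runs should show `used_mib` well above zero on the GPU in use if the process really is running there.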

from seq2seq.

DaoD commented on April 29, 2024

Same problem, but I solved it by leaving out the file "train_seq2seq.yml" and using only "nmt_xxxx.yml".
I guess there is a problem with the hooks configuration, but I'm not sure.


zzks commented on April 29, 2024

@DaoD Thanks for your reply!
However, I tried your method and still got CUDA_ERROR_OUT_OF_MEMORY.
Has anyone succeeded in running the sample?
Could you tell us your configuration and environment?


DaoD commented on April 29, 2024

Yes, the error still happens, but the training process continues.
I don't know the reason.


dennybritz commented on April 29, 2024

I think this is a different error and not related to the GPU:

"predicted_tokens": self._pred_dict["predicted_tokens"],
KeyError: 'predicted_tokens'

This is probably the same as #43. I haven't been able to reproduce this, and I'm not sure what is wrong here.


dennybritz commented on April 29, 2024

I'm closing this one because it seems like a duplicate of #43 - please discuss there or re-open the issue if it's not a duplicate.


zzks commented on April 29, 2024

@dennybritz @papajohn Thanks for your replies!
I updated to the latest seq2seq and got some new errors.
However, I tried @DaoD's method (no buckets) again. It still reports CUDA_ERROR_OUT_OF_MEMORY, but the training process continues.

The new errors when using the buckets config are as follows (Python 3.4):
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 6810 get requests, put_count=8013 evicted_count=2000 eviction_rate=0.249594 and unsatisfied allocation rate=0.119824
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 212 to 233
INFO:tensorflow:Performing full trace on next step.
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: :/usr/local/cuda/lib64:/usr/local/cudnn/lib64
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()->GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
Aborted (core dumped)

Anyway, I can run the sample now.
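
The log above shows the loader failing to open libcupti.so.8.0 with only /usr/local/cuda/lib64 and /usr/local/cudnn/lib64 on LD_LIBRARY_PATH. On many CUDA installs the CUPTI library sits in a separate directory such as /usr/local/cuda/extras/CUPTI/lib64, though that location is an assumption about this machine. A minimal sketch for checking which search-path directories actually contain the library (`dirs_containing` is a hypothetical helper name):

```python
import os


def dirs_containing(prefix, search_path):
    """Return the directories in a path-separator-joined search path
    that hold at least one file whose name starts with `prefix`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries (e.g. a leading ':') and missing directories.
        if not directory or not os.path.isdir(directory):
            continue
        if any(name.startswith(prefix) for name in os.listdir(directory)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = dirs_containing("libcupti.so", os.environ.get("LD_LIBRARY_PATH", ""))
    print(found if found else "libcupti.so not found on LD_LIBRARY_PATH")
```

If the list comes back empty, adding the directory that holds libcupti to LD_LIBRARY_PATH before starting training may be worth trying.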


ayushidalmia commented on April 29, 2024

I also ran into this. Is there any solution? A lighter unit test would help.


eugenioclrc commented on April 29, 2024

same here...


myagmur01 commented on April 29, 2024

Have you checked which processes are running on the GPU? The error may be caused by several open environments overloading the GPU. I recommend closing all terminals and opening them again. I hope this works for you.
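
One way to see which processes are holding GPU memory is nvidia-smi's compute-apps query. The sketch below parses that output. `parse_compute_apps` and `list_compute_apps` are hypothetical helper names, and the code assumes nvidia-smi is installed on the machine.

```python
import shutil
import subprocess

# One CSV line per process using a GPU: pid, process name, memory in MiB.
APPS_QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_memory' lines into (pid, name, used_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # not a data line
        pid, used = fields[0], fields[-1]
        name = ", ".join(fields[1:-1])  # tolerate commas inside the name
        # Memory can be reported as '[N/A]'; keep None in that case.
        used_mib = int(used) if used.isdigit() else None
        apps.append((int(pid), name, used_mib))
    return apps


def list_compute_apps():
    """Run the query if nvidia-smi is available; return [] otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            APPS_QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_compute_apps(result.stdout)
```

Any leftover Python process in the result that still holds most of the GPU memory is a likely candidate to stop before restarting training.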

