Comments (13)
I am not sure. I have seen errors like this happen when using the same GPU in multiple processes. Make sure that you don't have any other python/tensorflow processes running (including notebooks).
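One way to check for other processes holding the GPU is to ask `nvidia-smi` for its list of compute applications. The sketch below is only an illustration: the helper names `parse_compute_apps` and `list_gpu_processes` are made up for this example, and it assumes `nvidia-smi` is on the `PATH` and supports the `--query-compute-apps` option.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows (CSV without header) into dicts."""
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split off the pid on the first comma and the memory on the last one,
        # so a process name that itself contains a comma stays intact.
        pid, rest = line.split(",", 1)
        name, used_memory = rest.rsplit(",", 1)
        apps.append({
            "pid": int(pid.strip()),
            "name": name.strip(),
            "used_memory": used_memory.strip(),
        })
    return apps


def list_gpu_processes():
    """Return the compute processes nvidia-smi currently reports (hypothetical helper)."""
    output = subprocess.check_output(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_compute_apps(output)
```

If `list_gpu_processes()` returns anything besides your training job before you start it, another process (a forgotten notebook kernel, for example) may still be holding GPU memory.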
from seq2seq.
Thanks for replying to my issue #98. Actually, I don't have any other processes running. I run on only one GPU, and I encounter this problem during evaluation. Is there any conflict between training and validation?
I checked the log. It seems that when validation starts, a new process is launched that tries to use the same GPU and then fails with this error. That's just my guess; I'm still not very familiar with the code.
It's normal that validation loads the model graph again, and it shouldn't be an issue. Otherwise there is no difference between validation and training, except for the data. It's difficult for me to debug this without a way to reproduce the error.
Does it work if you run on the CPU only?
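A simple way to try a CPU-only run without touching the model code is to hide the GPUs from the process through the `CUDA_VISIBLE_DEVICES` environment variable. The snippet below is a minimal sketch: `cpu_only_env` is a hypothetical helper name, and the training command in the usage note is a placeholder, not a command from this project.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment in which no CUDA devices are visible."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides all GPUs from CUDA, so TensorFlow falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env
```

It could be used as `subprocess.call([sys.executable, "your_training_script.py"], env=cpu_only_env())`, where `your_training_script.py` stands in for whatever command you normally run. The variable has to be set before TensorFlow initializes CUDA, which is why it is passed to a fresh process here.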
OK, I see. Thank you! I think this error must be related to the GPU, since I didn't encounter the problem on the CPU. I can't figure out what's wrong for now, so I've decided to train the model without validation and save more checkpoints to validate later.
What GPU do you have?
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
This GPU worked well before, and if I drop the validation and only train, it also works well. So maybe there are some config options that I overlooked or set incorrectly...
I met the same error...
I have also encountered the same error.
It seems that I might have managed to solve the issue by switching to the latest development version of TensorFlow (compiled from source from the master branch). At least now I'm getting a different error. See #102. I am not yet sure whether #102 just makes it crash earlier. I will report back if that's the case.
EDIT:
I am no longer sure that switching to the development version of TensorFlow is what helped. I switched back to an environment with the latest stable version (from pip install tensorflow-gpu) and I could no longer reproduce the error. Instead I now get #101, which also happens while evaluating. It seems that something weird is going on with that process.
It would still be interesting to see if others have any luck by installing the development version of TensorFlow.
Thanks for reporting. I believe all of these issues are caused by the same problem, so I will close them and create a new issue to handle them: #103
Hi, I am also getting a similar error while training an autoencoder.
Caused by op u'decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/TensorArrayReadV3', defined at:
  File "train_autoencoder.py", line 137, in <module>
    tf.app.run()
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train_autoencoder.py", line 133, in main
    train_autoencoder()
  File "train_autoencoder.py", line 36, in train_autoencoder
    model = Autoencoder(FLAGS.lstm_units, embedding_matrix, 0, 1, num_layers=FLAGS.num_layers, train_embeddings=False)
  File "/home/abhay/Search_And_Match/encoder.py", line 110, in __init__
    impute_finished=False)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 304, in dynamic_decode
    swap_memory=swap_memory)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3224, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2956, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2893, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 249, in body
    decoder_finished) = decoder.step(time, inputs, state)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/basic_decoder.py", line 146, in step
    sample_ids=sample_ids)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py", line 250, in next_inputs
    lambda: nest.map_structure(read_from_ta, self._input_tas))
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2072, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1913, in BuildCondBranch
    original_result = fn()
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py", line 250, in <lambda>
    lambda: nest.map_structure(read_from_ta, self._input_tas))
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 375, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py", line 247, in read_from_ta
    return inp.read(next_time)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 861, in read
    return self._implementation.read(index, name=name)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 260, in read
    name=name)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 6428, in tensor_array_read_v3
    dtype=dtype, name=name)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/abhay/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Tried to read from index 50 but array size is: 50
  [[Node: decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/TensorArrayReadV3 = TensorArrayReadV3[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch, decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_1/_65, decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_2/_67)]]
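A note on the message "Tried to read from index 50 but array size is: 50": this is not confirmed as the cause here, but one situation that can produce it with `TrainingHelper` is a `sequence_length` entry that is larger than the number of time steps in the decoder inputs, so the helper tries to read one step past the end of its input array. A quick sanity check on the batch before feeding it might look like the sketch below; `find_overlong_sequences` is a hypothetical helper and the numbers in the usage note are made up.

```python
def find_overlong_sequences(sequence_lengths, max_time):
    """Return (batch_index, length) pairs whose length exceeds the padded time dimension."""
    return [
        (index, length)
        for index, length in enumerate(sequence_lengths)
        if length > max_time
    ]
```

For inputs padded to 50 time steps, `find_overlong_sequences([50, 51, 12], 50)` flags the second example, whose declared length of 51 does not fit. If this check finds anything, clipping the lengths or re-padding the inputs is worth trying before looking elsewhere.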
Hey @akanyaani! I'm getting a very similar stack trace. Could you please tell me what you did to solve the error?