f-lm's Issues

Error when loading checkpoint model

After training a G-LSTM, I got an error when evaluating it:

W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/lstm_0/lstm_cell/biases not found in checkpoint

The error occurs while the model is being restored from the checkpoint.

How can I solve this issue?
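
A "Key ... not found in checkpoint" warning means the graph asked for a variable name that the checkpoint does not store, so a first diagnostic step is to compare the two sets of names. The following is a minimal sketch of that comparison, not a fix confirmed by the repository. The helper diff_variable_names and the second name in each list are hypothetical and used only for illustration. In practice the stored names could be read with tf.train.list_variables (tf.contrib.framework.list_variables in older releases), assuming a TensorFlow version that provides it.

```python
def diff_variable_names(expected, available):
    """Compare variable names requested by the graph with those in a checkpoint.

    Returns (missing, unused) as sorted lists:
      missing - names the graph tries to restore but the checkpoint lacks
      unused  - names stored in the checkpoint that the graph never requests
    """
    expected, available = set(expected), set(available)
    return sorted(expected - available), sorted(available - expected)


# Hypothetical inputs for illustration only. With TensorFlow installed,
# `available` could be built as:
#   [name for name, _shape in tf.train.list_variables(logdir)]
# and `expected` as:
#   [v.op.name for v in tf.global_variables()]
expected = ["model/lstm_0/lstm_cell/biases", "model/emb_0"]
available = ["model/lstm_0/other_cell_scope/biases", "model/emb_0"]

missing, unused = diff_variable_names(expected, available)
print("missing from checkpoint:", missing)
print("unused in checkpoint:", unused)
```

If a missing name and an unused name differ only in a scope or suffix, the graph and the checkpoint were probably built by different code or library versions.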

Can't restore model from pre-trained model link

Hi, I am trying to use the pre-trained model for evaluation, but I am seeing an error while restoring the model parameters. Is the code up to date with the pre-trained checkpoint?

This is the error that I see. I tried searching for some of the missing parameters in the graph.pbtxt file, but they weren't there. I tested with both the head commit and d98fb11.

$ python3 single_lm_train.py --logdir=/path/to/my/logdir --num_gpus=2 --datadir=/path/to/my/datadir --mode=eval_full --hpconfig run_profiler=False,float16_rnn=False,max_time=$SECONDS,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=4,num_of_groups=0
*****HYPER PARAMETERS*****
{'batch_size': 4, 'num_steps': 20, 'num_shards': 8, 'num_layers': 2, 'learning_rate': 0.2, 'max_grad_norm': 1.0, 'num_delayed_steps': 150, 'keep_prob': 0.9, 'optimizer': 0, 'vocab_size': 793470, 'emb_size': 1024, 'state_size': 8192, 'projected_size': 1024, 'num_sampled': 8192, 'num_gpus': 2, 'float16_rnn': False, 'float16_non_rnn': False, 'average_params': True, 'run_profiler': False, 'do_summaries': False, 'max_time': 1303, 'fact_size': None, 'fnon_linearity': 'none', 'num_of_groups': 0}
**************************
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Averaging parameters for evaluation.
2017-12-23 11:35:51.468529: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2017-12-23 11:35:51.747194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:17:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2017-12-23 11:35:51.970520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:65:00.0
totalMemory: 10.91GiB freeMemory: 10.31GiB
2017-12-23 11:35:51.971259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-12-23 11:35:51.971284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2017-12-23 11:35:51.971289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y 
2017-12-23 11:35:51.971292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y 
2017-12-23 11:35:51.971299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2017-12-23 11:35:51.971303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2017-12-23 11:35:52.541605: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.542993: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.544005: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_1/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.544978: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_1/LSTMCell/W_0/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.669979: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:52.772370: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:52.863129: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:53.007704: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:53.021356: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.951154: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.955047: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.959807: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.959976: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.967513: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.552041: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.576411: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.582257: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
	 [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.858505: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint

...

Thanks
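
The log above repeats the same warning many times, which makes it hard to see how many distinct keys are actually missing. A minimal sketch for reducing such a log to its unique keys follows. The function name missing_keys is hypothetical, and the sample lines are copied from the log above.

```python
import re

# Matches the warning format that appears in the log above.
_MISSING_KEY = re.compile(r"Not found: Key (\S+) not found in checkpoint")


def missing_keys(log_text):
    """Return the distinct keys reported missing, in first-seen order."""
    seen = []
    for match in _MISSING_KEY.finditer(log_text):
        key = match.group(1)
        if key not in seen:
            seen.append(key)
    return seen


sample = """\
2017-12-23 11:35:52.541605: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.542993: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.669979: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
"""

for key in missing_keys(sample):
    print(key)
```

Applied to the visible part of the log, this leaves four distinct keys: the B and W_0 ExponentialMovingAverage entries for lstm_0 and lstm_1.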

Using G-LSTM with dynamic_rnn

Hey. Thanks for the amazing article!

I'm trying to use G-LSTM for my cell in dynamic_rnn and I got this error:
File "/language_model.py", line 30, in __init__
loss = self._forward(i, xs[i], ys[i], lengths[i])
File "/language_model.py", line 121, in _forward
inputs=x)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
dtype=dtype)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 722, in _time_step
(output, new_state) = call_cell()
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/factorized_lstm_cells.py", line 172, in __call__
self._get_input_for_group(m_prev, group_id, self._group_shape[0])], axis=1)
File "/factorized_lstm_cells.py", line 129, in _get_input_for_group
name="GLSTMinputGroupCreation")
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 547, in slice
return gen_array_ops.slice(input, begin, size, name=name)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2896, in _slice
name=name)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 499, in apply_op
repr(values), type(values).__name__))
TypeError: Expected int32 passed to parameter 'size' of op 'Slice', got [None, 128] of type 'list' instead.

It looks like this fails because of the size=[inpt.get_shape()[0].value, group_size] line: the input size (apparently both batch size and time) is dynamic, so the static batch dimension is None.
I think this could be handled by passing batch_size directly to the cell, but if there is a better solution, I'd be grateful if you could tell me.
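
One possible alternative to passing batch_size into the cell is to give tf.slice a size of -1 in the batch dimension, which it treats as "all remaining elements in this dimension". The sketch below only illustrates that idea and is not the repository's implementation. The helper names group_slice_args and slice_2d are hypothetical, the begin offset [0, group_id * group_size] is an assumption about the cell code (it is not visible in the traceback), and slice_2d is a pure-Python stand-in for tf.slice so the example runs without TensorFlow.

```python
def group_slice_args(group_id, group_size):
    """Build (begin, size) for slicing one group's columns from a 2-D input.

    size[0] is -1 instead of a static batch size, so the batch dimension
    may stay unknown (None) at graph-construction time.
    """
    begin = [0, group_id * group_size]  # assumed layout: groups are column blocks
    size = [-1, group_size]
    return begin, size


def slice_2d(rows, begin, size):
    """Pure-Python stand-in for tf.slice on a list of rows (illustration only)."""
    def stop(start, length, total):
        # -1 means "everything from start to the end of this dimension".
        return total if length == -1 else start + length

    row_stop = stop(begin[0], size[0], len(rows))
    out = []
    for row in rows[begin[0]:row_stop]:
        col_stop = stop(begin[1], size[1], len(row))
        out.append(row[begin[1]:col_stop])
    return out


batch = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
begin, size = group_slice_args(group_id=1, group_size=2)
print(slice_2d(batch, begin, size))  # [[3, 4], [7, 8], [11, 12]]
```

Inside the cell, the same arguments would be passed as tf.slice(inpt, begin, size, name="GLSTMinputGroupCreation"), so no Python-level batch size is needed.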

Checkpoint required?

Is a checkpoint required to run the model? It keeps printing out "No checkpoint file found. Waiting...".
