okuchaiev / f-lm
Language Modeling
License: MIT License
After training a G-LSTM, I got an error when evaluating it:
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key model/lstm_0/lstm_cell/biases not found in checkpoint
This error occurs when restoring the ckpt model.
How can I solve this issue?
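One way to narrow down a "Key ... not found in checkpoint" error is to compare the variable names the graph asks for with the names actually stored in the checkpoint. The sketch below does only the comparison and is plain Python. The name lists in it are hypothetical and made up for illustration; in practice the checkpoint side could come from tf.train.list_variables and the graph side from the variables handed to the Saver, assuming the installed TensorFlow version provides those.

```python
def diff_checkpoint_keys(expected, available):
    """Return (missing, unused): names the graph wants but the checkpoint
    lacks, and names the checkpoint holds that the graph never asks for."""
    expected, available = set(expected), set(available)
    missing = sorted(expected - available)
    unused = sorted(available - expected)
    return missing, unused


# Hypothetical name lists, for illustration only. Real lists would be read
# from the graph and from the checkpoint file.
graph_names = ["model/lstm_0/lstm_cell/biases", "model/emb_0"]
ckpt_names = ["model/lstm_0/lstm_cell/bias", "model/emb_0"]

missing, unused = diff_checkpoint_keys(graph_names, ckpt_names)
print("missing from checkpoint:", missing)
print("present but unused:", unused)
```

If a missing name and an unused name differ only in a scope prefix or a suffix, that suggests the graph was built with different variable naming than the one used when the checkpoint was saved.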
Is it still usable with the latest repo code? Thanks.
Hi, I am trying to use the pre-trained model for evaluation, but I am seeing an error while restoring the model parameters. Is the code up to date with it?
This is the error that I see. I tried searching for some of the missing parameters in the graph.pbtxt file, but they weren't there. I tested with both the head commit and d98fb11.
$ python3 single_lm_train.py --logdir=/path/to/my/logdir --num_gpus=2 --datadir=/path/to/my/datadir --mode=eval_full --hpconfig run_profiler=False,float16_rnn=False,max_time=$SECONDS,num_steps=20,num_shards=8,num_layers=2,learning_rate=0.2,max_grad_norm=1,keep_prob=0.9,emb_size=1024,projected_size=1024,state_size=8192,num_sampled=8192,batch_size=4,num_of_groups=0
*****HYPER PARAMETERS*****
{'batch_size': 4, 'num_steps': 20, 'num_shards': 8, 'num_layers': 2, 'learning_rate': 0.2, 'max_grad_norm': 1.0, 'num_delayed_steps': 150, 'keep_prob': 0.9, 'optimizer': 0, 'vocab_size': 793470, 'emb_size': 1024, 'state_size': 8192, 'projected_size': 1024, 'num_sampled': 8192, 'num_gpus': 2, 'float16_rnn': False, 'float16_non_rnn': False, 'average_params': True, 'run_profiler': False, 'do_summaries': False, 'max_time': 1303, 'fact_size': None, 'fnon_linearity': 'none', 'num_of_groups': 0}
**************************
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Not using groups
Not using fnonlinearities
Averaging parameters for evaluation.
2017-12-23 11:35:51.468529: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2017-12-23 11:35:51.747194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:17:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2017-12-23 11:35:51.970520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:65:00.0
totalMemory: 10.91GiB freeMemory: 10.31GiB
2017-12-23 11:35:51.971259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2017-12-23 11:35:51.971284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2017-12-23 11:35:51.971289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y
2017-12-23 11:35:51.971292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y
2017-12-23 11:35:51.971299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1)
2017-12-23 11:35:51.971303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2017-12-23 11:35:52.541605: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.542993: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.544005: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_1/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.544978: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_1/LSTMCell/W_0/ExponentialMovingAverage not found in checkpoint
2017-12-23 11:35:52.669979: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:52.772370: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:52.863129: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:53.007704: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:53.021356: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.951154: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.955047: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.959807: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.959976: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:54.967513: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.552041: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.576411: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.582257: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
[[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]
2017-12-23 11:35:55.858505: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage not found in checkpoint
...
Thanks
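Since every missing key in the log above ends in /ExponentialMovingAverage and the run prints "Averaging parameters for evaluation.", it may help to check whether the checkpoint lacks only the averaged copies or the underlying variables as well. The following is a minimal sketch of that check in plain Python. The list of names available in the checkpoint is hypothetical; the missing keys are copied from the log.

```python
EMA_SUFFIX = "/ExponentialMovingAverage"


def classify_missing_keys(missing, available):
    """Split missing keys into those whose un-averaged base variable exists
    in the checkpoint and those with no counterpart at all."""
    available = set(available)
    only_ema_absent, fully_absent = [], []
    for key in missing:
        is_ema = key.endswith(EMA_SUFFIX)
        base = key[:-len(EMA_SUFFIX)] if is_ema else key
        if is_ema and base in available:
            only_ema_absent.append(key)
        else:
            fully_absent.append(key)
    return only_ema_absent, fully_absent


# Missing keys taken from the log above.
missing_keys = [
    "model/model/lstm_0/LSTMCell/B/ExponentialMovingAverage",
    "model/model/lstm_0/LSTMCell/W_0/ExponentialMovingAverage",
]
# Hypothetical checkpoint contents, for illustration only.
checkpoint_names = ["model/model/lstm_0/LSTMCell/B"]

only_ema_absent, fully_absent = classify_missing_keys(missing_keys, checkpoint_names)
print("base variable present, averaged copy absent:", only_ema_absent)
print("no counterpart in checkpoint:", fully_absent)
```

Keys in the first group would point toward a checkpoint saved without parameter averaging; keys in the second group would point toward a naming or scope difference between the graph and the checkpoint. Both readings are guesses until checked against the real checkpoint contents.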
Hey. Thanks for the amazing article!
I'm trying to use G-LSTM for my cell in dynamic_rnn and I got this error:
File "/language_model.py", line 30, in __init__
loss = self._forward(i, xs[i], ys[i], lengths[i])
File "/language_model.py", line 121, in _forward
inputs=x)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
dtype=dtype)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 722, in _time_step
(output, new_state) = call_cell()
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/factorized_lstm_cells.py", line 172, in call
self._get_input_for_group(m_prev, group_id, self._group_shape[0])], axis=1)
File "/factorized_lstm_cells.py", line 129, in _get_input_for_group
name="GLSTMinputGroupCreation")
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 547, in slice
return gen_array_ops.slice(input, begin, size, name=name)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2896, in _slice
name=name)
File "/.pyenv/versions/tflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 499, in apply_op
repr(values), type(values).__name__))
TypeError: Expected int32 passed to parameter 'size' of op 'Slice', got [None, 128] of type 'list' instead.
It looks like the failure comes from the size=[inpt.get_shape()[0].value, group_size] line, because the input shape (apparently both batch size and time) is dynamic, so the static batch dimension is None.
I think it could be worked around by passing batch_size directly to the cell, but if there is a better solution, I'd be grateful to hear it.
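A possible alternative to passing batch_size in: tf.slice accepts -1 as a size entry, meaning "all remaining elements in that dimension", so the unknown batch dimension does not need a concrete value. The sketch below only builds the begin and size arguments in plain Python. The function name and the begin offset of group_id * group_size are assumptions about how the cell lays out its groups, not code from the repository.

```python
def group_slice_args(static_batch_size, group_id, group_size):
    """Build (begin, size) for slicing one group's columns out of a
    [batch, features] tensor. A batch size of None (unknown when the graph
    is built) becomes -1, which tf.slice reads as 'take everything
    remaining along this dimension'."""
    begin = [0, group_id * group_size]
    size = [-1 if static_batch_size is None else static_batch_size, group_size]
    return begin, size


begin, size = group_slice_args(None, group_id=2, group_size=128)
print(begin, size)  # [0, 256] [-1, 128]

# Inside the cell this would be used roughly as (assumed call shape):
#   tf.slice(inpt, begin, size, name="GLSTMinputGroupCreation")
```

With a known batch size the helper returns that size unchanged, so the same call would work for both static and dynamic inputs.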
Hi, could you please share the pre-trained model?
Is a checkpoint required to run the model? It keeps printing out "No checkpoint file found. Waiting...".