rafaljozefowicz / lm
License: MIT License
Hi, I have read your code here, but several implementation details confused me. I hope for your help.
sharded embedding - emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
Could you tell me why you split the embeddings into several shards? What are the benefits of doing this?
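Not the author, but for what it's worth: splitting the [vocab_size, emb_size] table row-wise into num_shards variables lets each shard live on a different device/parameter server, which spreads both memory and gradient traffic for a very large vocabulary. A rough NumPy sketch of row-sharded lookup (the contiguous split and the helper names here are illustrative assumptions, not the repo's exact scheme):

```python
import numpy as np

def shard_rows(table, num_shards):
    """Split an embedding table row-wise into num_shards pieces
    (contiguous split here; the real code may assign rows differently)."""
    return np.array_split(table, num_shards, axis=0)

def sharded_lookup(shards, ids):
    """Look up rows by first locating the shard that owns each id."""
    sizes = [s.shape[0] for s in shards]
    offsets = np.cumsum([0] + sizes)          # start row of each shard
    out = np.empty((len(ids), shards[0].shape[1]), dtype=shards[0].dtype)
    for k, i in enumerate(ids):
        s = np.searchsorted(offsets, i, side="right") - 1
        out[k] = shards[s][i - offsets[s]]
    return out

vocab, dim, num_shards = 10, 4, 3
table = np.arange(vocab * dim, dtype=np.float32).reshape(vocab, dim)
shards = shard_rows(table, num_shards)
ids = [0, 7, 9]
assert np.array_equal(sharded_lookup(shards, ids), table[ids])
```

The lookup result is identical to indexing the unsharded table; only the storage layout changes.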
carrying LSTM state between batches - self.initial_states[i].assign(state)
Since sentences are shuffled before training, I can't see any link between the training examples of adjacent batches, so I don't understand why the LSTM state needs to be carried between batches.
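Not the author, but the state carry usually comes from the data pipeline: if each batch row is a fixed text "stream" and the batch at step t+1 continues the same streams as the batch at step t, truncated BPTT needs the final state of one batch as the initial state of the next. A minimal sketch of that pattern (the running-sum "cell" is a stand-in, and the stream layout is an assumption about the pipeline, not verified against this repo):

```python
import numpy as np

def step(state, x):
    # stand-in for an RNN cell: the state is just a running sum
    return state + x

def run_chunked(seq, chunk_len):
    """Process seq in truncated-BPTT chunks, carrying state between chunks."""
    state = 0.0
    for start in range(0, len(seq), chunk_len):
        for x in seq[start:start + chunk_len]:
            state = step(state, x)
        # state is saved here and reused as the initial state of the
        # next chunk, mirroring self.initial_states[i].assign(state)
    return state

seq = np.arange(10, dtype=np.float32)
assert run_chunked(seq, 3) == seq.sum()
```

Carrying the state makes the chunked run equivalent to processing the whole stream at once; if batches really were unrelated sentences, resetting would indeed make more sense.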
https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L119-L122
for i in range(len(emb_grads)):
    assert isinstance(emb_grads[i], tf.IndexedSlices)
    emb_grads[i] = tf.IndexedSlices(emb_grads[i].values * hps.batch_size,
                                    emb_grads[i].indices,
                                    emb_grads[i].dense_shape)
Why are emb_grads modified this way?
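Not the author, but one plausible reading (an assumption, not confirmed by the repo): the loss is averaged over the batch, so the sparse embedding gradients are multiplied by batch_size to undo that averaging for the embedding table. Since the gradient is an IndexedSlices (only the rows of the looked-up words are non-zero), scaling .values scales exactly those rows while indices and dense_shape stay unchanged. A NumPy sketch of the same operation:

```python
import numpy as np

batch_size = 4
# sparse gradient: non-zero rows only, as in tf.IndexedSlices
values = np.array([[0.1, 0.2], [0.3, 0.4]])  # gradients for two vocab rows
indices = np.array([5, 17])                  # which vocab rows they belong to
dense_shape = (100, 2)                       # full embedding table shape

# the rescaling from the snippet above: multiply values, keep indices/shape
scaled = values * batch_size
assert np.allclose(scaled, [[0.4, 0.8], [1.2, 1.6]])
```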
At the beginning of a sentence, I think the LSTM states should be reset, as in other RNN LMs (https://github.com/facebookresearch/adaptive-softmax, https://github.com/yoonkim/lstm-char-cnn).
Hi Rafal, thank you for the code! Not sure if you are still supporting it.
But I keep getting errors using it with TF 1.2, after converting your code with tf_upgrade.py. At first it complained that targets has int32 and doesn't match float32 in
loss = tf.nn.sampled_softmax_loss(softmax_w, softmax_b, tf.to_float(inputs),
                                  targets, hps.num_sampled, hps.vocab_size)
So I changed targets to tf.to_float(targets); now I am getting the error shown below:
$ python single_lm_train.py --logdir log --num_gpus 1 --datadir data --hpconfig emb_size=100,state_size=256,projected_size=128
Traceback (most recent call last):
File "single_lm_train.py", line 38, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "single_lm_train.py", line 27, in main
run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
File "/home/ccrmad/Code/lm-master/run_utils.py", line 14, in run_train
model = LM(hps, "train", ps_device)
File "/home/ccrmad/Code/lm-master/language_model.py", line 24, in __init__
loss = self._forward(i, xs[i], ys[i], ws[i])
File "/home/ccrmad/Code/lm-master/language_model.py", line 100, in _forward
tf.to_float(targets), hps.num_sampled, hps.vocab_size)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1247, in sampled_softmax_loss
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1007, in _compute_sampled_logits
inputs, sampled_w, transpose_b=True) + sampled_b
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1825, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1242, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2536, in create_op
set_shapes_for_outputs(ret)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1818, in set_shapes_for_outputs
shapes = shape_func(op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1768, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
require_shape_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 1 and 128 for 'model/model/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [5120,1], [?,128].
Any idea why?
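In case it helps: in TF >= 1.0 the argument order of tf.nn.sampled_softmax_loss changed to (weights, biases, labels, inputs, num_sampled, num_classes), with labels before inputs, while the 0.x-era call here passes inputs before targets. The shape error ([5120,1] against [?,128]) is consistent with the one-column label tensor being treated as the activations, so casting targets to float silences the dtype complaint without fixing the swap; passing the arguments by keyword should avoid it. A plain NumPy shape check of that diagnosis (the sizes are illustrative):

```python
import numpy as np

batch, dim, num_sampled = 5120, 128, 8192
inputs = np.zeros((batch, dim))      # projected LSTM outputs
labels = np.zeros((batch, 1))        # target word ids, one column
sampled_w = np.zeros((num_sampled, dim))

# correct pairing: activations x sampled weights -> [batch, num_sampled]
assert (inputs @ sampled_w.T).shape == (batch, num_sampled)

# swapped arguments: the label column cannot be matmul'd against the weights
try:
    labels @ sampled_w.T
    raised = False
except ValueError:
    raised = True
assert raised
```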
Hi, I'm trying to run the code on Google Colab but I'm facing the following error:
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "drive/app/lm-master/single_lm_train.py", line 27, in main
run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
model = LM(hps, "train", ps_device)
File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
loss = self._forward(i, xs[i], ys[i], ws[i])
File "/content/drive/app/lm-master/language_model.py", line 62, in _forward
emb_vars = sharded_variable("emb", [hps.vocab_size, hps.emb_size], hps.num_shards)
File "/content/drive/app/lm-master/model_utils.py", line 18, in sharded_variable
initializer = tf.uniform_unit_scaling_initializer(dtype=dtype, full_shape=shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
return func(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'full_shape'
I see that in the new version of TF, full_shape is no longer an argument of uniform_unit_scaling_initializer.
I tried removing the shape argument to test, but I faced another error:
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /content/drive/app/lm-master/model_utils.py:18: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1036, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 879, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("model/model/dropout/mul:0", shape=(512,), dtype=float32, device=/gpu:0)'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "drive/app/lm-master/single_lm_train.py", line 38, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "drive/app/lm-master/single_lm_train.py", line 27, in main
run_train(dataset, hps, FLAGS.logdir + "/train", ps_device="/gpu:0")
File "/content/drive/app/lm-master/run_utils.py", line 14, in run_train
model = LM(hps, "train", ps_device)
File "/content/drive/app/lm-master/language_model.py", line 24, in __init__
loss = self._forward(i, xs[i], ys[i], ws[i])
File "/content/drive/app/lm-master/language_model.py", line 68, in _forward
inputs = [tf.squeeze(v, [1]) for v in tf.split(1, hps.num_steps, x)]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 1366, in split
axis=axis, num_split=num_or_size_splits, value=value, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 5069, in _split
"Split", split_dim=axis, value=value, num_split=num_split, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 533, in _apply_op_helper
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.
Then I tried to convert num_steps to int32, which was unsuccessful.
The above are my unsuccessful attempts at fixing this error. What should I do about it, and how can I handle the shape argument in uniform_unit_scaling_initializer?
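For anyone hitting the same wall: the split_dim error looks like the argument-order change in tf.split (TF 0.x took tf.split(split_dim, num_split, value); TF >= 1.0 takes tf.split(value, num_or_size_splits, axis)), so the old call passes the float tensor x where the axis is expected; reordering the arguments should fix that one. For the initializer, one option (my reading of what full_shape was for, not a verified fix) is to compute the uniform-unit-scaling bound from the full matrix shape yourself and hand each shard an explicit uniform initializer with that bound, so shards are initialized as slices of one big matrix. A NumPy sketch of the bound (factor and shapes are illustrative):

```python
import numpy as np

def uniform_unit_scaling_bound(full_shape, factor=1.0):
    """Bound of the uniform unit-scaling init, computed from the FULL
    [vocab_size, emb_size] shape rather than a single shard's shape."""
    input_size = np.prod(full_shape[:-1])    # fan-in: all dims but the last
    return factor * np.sqrt(3.0 / input_size)

full_shape = (1000, 64)                      # illustrative vocab x emb_size
bound = uniform_unit_scaling_bound(full_shape)
rng = np.random.default_rng(0)
# each shard draws from the same full-shape-derived range
shard = rng.uniform(-bound, bound, size=(full_shape[0] // 8, full_shape[1]))
assert abs(shard).max() <= bound
```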
Using the (default configuration) LSTM-2048-512, we're able to run sampled softmax. However, when running eval with full softmax (num_sampled = 0) we hit a crash.
When running on CPU we get a segmentation fault during the first call to sess.run (run_utils.py, line 121).
With GPU, execution reaches the first call to sess.run (as above), but the error traces back to an earlier line (run_utils.py, line 94).
...
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 24.21GiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[8192,793470]
...
The environment is an Ubuntu 14.04 box with 128GB RAM, 4x GTX 1080s (6GB), and TF 0.10.
Is there a way to run a full softmax on this hardware?
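Not sure about this hardware specifically, but the [8192, 793470] logits tensor is the whole problem: 8192 positions times the full vocabulary at 4 bytes is roughly the 24GiB in the OOM message. A common workaround (an assumption, not something this repo provides) is to evaluate the full softmax in vocabulary chunks, accumulating the log-normalizer with a running log-sum-exp so the full logits matrix never materializes. Sketch in NumPy:

```python
import numpy as np

def full_softmax_logprob(h, softmax_w, softmax_b, targets, chunk=4):
    """log p(target | h) over the FULL vocab, computed in vocab chunks.
    h: [batch, dim], softmax_w: [vocab, dim], targets: [batch] int ids."""
    batch = h.shape[0]
    target_logit = np.empty(batch)
    logz = np.full(batch, -np.inf)           # running log-sum-exp
    vocab = softmax_w.shape[0]
    for start in range(0, vocab, chunk):
        logits = h @ softmax_w[start:start + chunk].T + softmax_b[start:start + chunk]
        # fold this chunk into the running normalizer, numerically stably
        m = np.maximum(logz, logits.max(axis=1))
        logz = m + np.log(np.exp(logz - m) + np.exp(logits - m[:, None]).sum(axis=1))
        # pick out the target logit if it falls in this chunk
        in_chunk = (targets >= start) & (targets < start + chunk)
        target_logit[in_chunk] = logits[in_chunk, targets[in_chunk] - start]
    return target_logit - logz

rng = np.random.default_rng(1)
h = rng.normal(size=(3, 5)); w = rng.normal(size=(11, 5)); b = rng.normal(size=11)
t = np.array([0, 6, 10])
full = h @ w.T + b
expected = full[np.arange(3), t] - np.log(np.exp(full).sum(axis=1))
assert np.allclose(full_softmax_logprob(h, w, b, t), expected)
```

Peak memory then scales with the chunk size instead of the vocabulary, at the cost of one extra pass of bookkeeping; shrinking the eval batch has the same effect along the other axis.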
I haven't validated whether accuracy is unaffected, but for the model to run on 0.12, tf.nn.rnn_cell.LSTMCell needs to be changed to:
cell = tf.nn.rnn_cell.LSTMCell(hps.state_size, hps.emb_size, num_proj=hps.projected_size, state_is_tuple=False)