google / seq2seq
A general-purpose encoder-decoder framework for Tensorflow
Home Page: https://google.github.io/seq2seq/
License: Apache License 2.0
How to set the downloaded train data for this file?
Blocked by #18
The documentation should have an end-to-end walkthrough of training and evaluating an Image Captioning model using standard datasets.
Instead of having a bridge_spec parameter we should have bridge.class and bridge.params to keep it consistent with the rest of the parameters.
Traceback (most recent call last):
File "seq2seq/test/pipeline_test.py", line 78, in test_train_infer
os.path.join(BIN_FOLDER, "train.py"))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128)
Ran 2 tests in 0.003s
FAILED (errors=1)
Due to a lot of refactoring many Python docstrings are currently outdated. Need to go through the code, make sure they are still correct, and update where necessary.
When I try the WMT'16 EN-DE sample, I encounter the following CUDA_ERROR_OUT_OF_MEMORY:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.90GiB
Free memory: 11.39GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 11.90G (12778405888 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Traceback (most recent call last):
File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/sbai/7A9C9BED9C9BA1E5/DL/seq2seq/bin/train.py", line 251, in
tf.app.run()
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/media/sbai/7A9C9BED9C9BA1E5/DL/seq2seq/bin/train.py", line 246, in main
schedule=FLAGS.schedule)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 459, in train_and_evaluate
self.train(delay_secs=0)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 281, in train
monitors=self._train_monitors + extra_hooks)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 984, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 883, in run
feed_dict, options)
File "/home/sbai/tf134/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 909, in _call_hook_before_run
request = hook.before_run(run_context)
File "/media/sbai/7A9C9BED9C9BA1E5/DL/seq2seq/seq2seq/training/hooks.py", line 239, in before_run
"predicted_tokens": self._pred_dict["predicted_tokens"],
KeyError: 'predicted_tokens'
Env: TF1.0 GPU & Python3.4 & ubuntu14.04
I reduced the batch size and num_units, but still encountered the same error.
I tried the toy data and hit the same error.
Is it because I am using Python 3.4?
Update:
I tried it on Python 3.5 and got the same error on the first try, then got the following error when I tried again:
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/monitors.py:267: BaseMonitor.init (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
*** Error in `python3.5': double free or corruption (!prev): 0x0000000002870d90 ***
Aborted (core dumped)
Currently, the decoding and metrics are hardcoded to:
@@
This should be configurable and should be able to support processing done by google/sentencepiece. What needs to happen is to allow users to pass parameters to metrics.
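One way to picture "passing parameters to metrics" is a small registry of post-processing functions selected by a user-supplied parameter. The sketch below is only an illustration: the names POSTPROCESSORS, make_metric, and the "postproc" key are hypothetical and do not come from the seq2seq codebase. It assumes BPE output marked with "@@ " and SentencePiece output marked with the U+2581 word-boundary character.

```python
# Hypothetical sketch: none of these names exist in the seq2seq codebase.

def strip_bpe(text, separator="@@ "):
    """Undo subword-nmt style BPE by removing the continuation marker."""
    return text.replace(separator, "")


def decode_sentencepiece(text, marker="\u2581"):
    """Undo SentencePiece segmentation: drop the spaces between pieces,
    then turn the word-boundary marker back into a space."""
    return text.replace(" ", "").replace(marker, " ").strip()


# Registry of post-processing steps a user could select via config.
POSTPROCESSORS = {
    "none": lambda text: text,
    "bpe": strip_bpe,
    "sentencepiece": decode_sentencepiece,
}


def make_metric(metric_fn, params):
    """Wrap metric_fn so that hypotheses and references are post-processed
    according to params["postproc"] before they are scored."""
    postproc = POSTPROCESSORS[params.get("postproc", "none")]

    def wrapped(hypotheses, references):
        return metric_fn([postproc(h) for h in hypotheses],
                         [postproc(r) for r in references])

    return wrapped


def exact_match(hypotheses, references):
    """Toy metric: fraction of hypotheses equal to their reference."""
    pairs = list(zip(hypotheses, references))
    return sum(h == r for h, r in pairs) / len(pairs)
```

With this shape, switching from BPE to SentencePiece would be a config change rather than a code change.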
What is the best way to implement model ensembling per time step in Tensorflow? Models are ensembled by averaging the output probabilities at each decoding step. Is there a way to do this using raw_rnn?
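The averaging itself can be sketched independently of any framework. The plain-Python example below does not show how to express it with raw_rnn. It only illustrates the per-step logic, and it assumes each model is a hypothetical callable that maps the decoded prefix to a probability distribution over the vocabulary.

```python
def average_step_probs(step_probs):
    """Average several models' next-token distributions (lists of floats)."""
    n_models = len(step_probs)
    vocab_size = len(step_probs[0])
    return [sum(p[i] for p in step_probs) / n_models
            for i in range(vocab_size)]


def ensemble_greedy_decode(models, max_len, eos_id):
    """Greedy decoding where every step uses the averaged distribution.

    Each element of `models` is a hypothetical callable: it takes the list
    of token ids decoded so far and returns a probability distribution.
    """
    prefix = []
    for _ in range(max_len):
        avg = average_step_probs([model(prefix) for model in models])
        token = max(range(len(avg)), key=avg.__getitem__)
        prefix.append(token)
        if token == eos_id:
            break
    return prefix


# Two toy "models" over a 3-token vocabulary; token 2 plays the role of EOS.
def model_a(prefix):
    return [0.6, 0.3, 0.1] if not prefix else [0.1, 0.1, 0.8]


def model_b(prefix):
    return [0.1, 0.7, 0.2] if not prefix else [0.2, 0.1, 0.7]


decoded = ensemble_greedy_decode([model_a, model_b], max_len=5, eos_id=2)
```

Note that model_a alone would pick token 0 at the first step, while the averaged distribution picks token 1, which is the point of ensembling at every step rather than after decoding.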
Here is my config.yml:
model: BasicSeq2Seq
model_params:
bridge.class: seq2seq.models.bridges.InitialStateBridge
embedding.dim: 1024
encoder.class: seq2seq.encoders.UnidirectionalRNNEncoder
encoder.params:
rnn_cell:
cell_class: BasicLSTMCell
cell_params:
num_units: 512
dropout_input_keep_prob: 0.8
dropout_output_keep_prob: 1.0
num_layers: 1
decoder.class: seq2seq.decoders.BasicDecoder
decoder.params:
rnn_cell:
cell_class: BasicLSTMCell
cell_params:
num_units: 512
dropout_input_keep_prob: 0.8
dropout_output_keep_prob: 1.0
num_layers: 1
optimizer.name: Adam
optimizer.learning_rate: 0.0001
source.max_seq_len: 83
source.reverse: false
target.max_seq_len: 95
vocab_source: ./data/vocab_post_50000
vocab_target: ./data/vocab_comt_50000
input_pipeline_train:
class: ParallelTextInputPipeline
params:
source_files: ['./data/post_50000']
target_files: ['./data/comt_50000']
batch_size: 64
train_steps: 1000
output_dir: ./model/50000
When I use the command below to run the model, I always get this error: "ValueError: Input Pipeline definition must have a class property"
python3 -m bin.train --config_paths="./myconfig/config_50000.yml,./myconfig/train_seq2seq.yml"
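The error message suggests that some input pipeline section in the merged configuration lacks a class key. Whether that is the cause here is a guess, since the second config file is not shown. A quick check along these lines may help narrow it down. The helper name check_input_pipelines is hypothetical, and the config is assumed to be already loaded and merged into a plain dict.

```python
def check_input_pipelines(config):
    """Return the names of input pipeline sections missing a `class` key.

    `config` is the merged configuration as a plain dict, for example the
    result of loading and merging the YAML files given in --config_paths.
    """
    missing = []
    for key, value in config.items():
        if key.startswith("input_pipeline"):
            if not isinstance(value, dict) or "class" not in value:
                missing.append(key)
    return sorted(missing)


# Illustrative data only: a train pipeline that is fine and a dev pipeline
# that defines params but no class.
example_config = {
    "model": "BasicSeq2Seq",
    "input_pipeline_train": {
        "class": "ParallelTextInputPipeline",
        "params": {"source_files": ["./data/post_50000"]},
    },
    "input_pipeline_dev": {
        "params": {"source_files": ["./data/post_dev"]},
    },
}

problems = check_input_pipelines(example_config)
```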
The _build method of the model base class is currently very long. If a subclass wants to override this method, it needs to copy the full code and change the relevant parts. It should be possible to refactor the method into smaller ones (embedding, encoding, decoding, loss, etc.) so that parts are easier to swap out.
Parsing GraphDef...
Parsing RunMetadata...
Parsing OpLog...
Preparing Views...
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/disk1/mouna/code/seq2seq/bin/train.py", line 251, in
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/disk1/mouna/code/seq2seq/bin/train.py", line 246, in main
schedule=FLAGS.schedule)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 459, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 281, in train
monitors=self._train_monitors + extra_hooks)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 984, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 899, in run
run_metadata=run_metadata))
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 1157, in after_run
induce_stop = m.step_end(self._last_step, result)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 356, in step_end
return self.every_n_step_end(step, output)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 657, in every_n_step_end
steps=self.eval_steps, metrics=self.metrics, name=self.name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
log_progress=log_progress)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 836, in _evaluate_model
hooks=hooks)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 430, in evaluate_once
session.run(eval_ops, feed_dict)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 891, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to read from index 32 but array size is: 32
[[Node: model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3 = TensorArrayReadV3[_class=["loc:@model/att_seq2seq/decode/TrainingHelper/TensorArray"], dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch, model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_1/_463, model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_2)]]
Caused by op u'model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3', defined at:
File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/disk1/mouna/code/seq2seq/bin/train.py", line 251, in
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/disk1/mouna/code/seq2seq/bin/train.py", line 246, in main
schedule=FLAGS.schedule)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 459, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 281, in train
monitors=self._train_monitors + extra_hooks)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 984, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 899, in run
run_metadata=run_metadata))
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 1157, in after_run
induce_stop = m.step_end(self._last_step, result)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 356, in step_end
return self.every_n_step_end(step, output)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 657, in every_n_step_end
steps=self.eval_steps, metrics=self.metrics, name=self.name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
log_progress=log_progress)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 810, in _evaluate_model
eval_ops = self._get_eval_ops(features, labels, metrics)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1190, in _get_eval_ops
features, labels, model_fn_lib.ModeKeys.EVAL)
File "/usr/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/disk1/mouna/code/seq2seq/bin/train.py", line 164, in model_fn
return model(features, labels, params)
File "seq2seq/models/model_base.py", line 111, in call
return self._build(features, labels, params)
File "seq2seq/models/seq2seq_model.py", line 263, in _build
decoder_output, _, = self.decode(encoder_output, features, labels)
File "seq2seq/graph_utils.py", line 38, in func_wrapper
return templated_func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 276, in call
return self._call_func(args, kwargs, check_for_new_variables=False)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 216, in _call_func
result = self._func(*args, **kwargs)
File "seq2seq/models/basic_seq2seq.py", line 124, in decode
labels)
File "seq2seq/models/basic_seq2seq.py", line 87, in _decode_train
return decoder(decoder_initial_state, helper_train)
File "seq2seq/graph_module.py", line 57, in call
return self._template(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 267, in call
return self._call_func(args, kwargs, check_for_new_variables=False)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/template.py", line 216, in _call_func
result = self._func(*args, **kwargs)
File "seq2seq/decoders/rnn_decoder.py", line 110, in _build
maximum_iterations=maximum_iterations)
File "seq2seq/contrib/seq2seq/decoder.py", line 282, in dynamic_decode
swap_memory=swap_memory)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2605, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2438, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2388, in BuildLoop
body_result = body(*packed_vars_for_body)
File "seq2seq/contrib/seq2seq/decoder.py", line 242, in body
decoder_finished) = decoder.step(time, inputs, state)
File "seq2seq/decoders/attention_decoder.py", line 186, in step
time=time, outputs=outputs, state=cell_state, sample_ids=sample_ids)
File "seq2seq/contrib/seq2seq/helper.py", line 125, in next_inputs
time=time, outputs=outputs, state=state, sample_ids=sample_ids)
File "seq2seq/decoders/attention_decoder.py", line 154, in att_next_inputs
name=name)
File "seq2seq/contrib/seq2seq/helper.py", line 204, in next_inputs
lambda: nest.map_structure(read_from_ta, self._input_tas))
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1745, in cond
_, res_f = context_f.BuildCondBranch(fn2)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1639, in BuildCondBranch
r = fn()
File "seq2seq/contrib/seq2seq/helper.py", line 204, in
lambda: nest.map_structure(read_from_ta, self._input_tas))
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/nest.py", line 302, in map_structure
structure[0], [func(*x) for x in entries])
File "seq2seq/contrib/seq2seq/helper.py", line 200, in read_from_ta
return inp.read(next_time)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 250, in read
name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2421, in _tensor_array_read_v3
name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Tried to read from index 32 but array size is: 32
[[Node: model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3 = TensorArrayReadV3[_class=["loc:@model/att_seq2seq/decode/TrainingHelper/TensorArray"], dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch, model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_1/_463, model/att_seq2seq/decode/attention_decoder_1/decoder/while/CustomHelperNextInputs/TrainingHelperNextInputs/cond/TensorArrayReadV3/Switch_2)]]
When passing comma-delimited configs we should strip out newlines.
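A minimal sketch of the intended behavior, assuming the flag value arrives as a single string. The function name split_config_paths is hypothetical and is not taken from the training script.

```python
def split_config_paths(config_paths):
    """Split a comma-delimited list of paths, dropping surrounding
    whitespace (including newlines) and empty entries."""
    return [path.strip() for path in config_paths.split(",") if path.strip()]


# A value as it might look when the flag is wrapped over several shell lines.
paths = split_config_paths("./example_configs/a.yml,\n  ./example_configs/b.yml,\n")
```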
We should prepare datasets for all WMT'17 language pairs. This is also a chance to try out google/sentencepiece as a preprocessor.
Each dataset should come in different configurations, i.e. different vocabulary sizes and also have a character-level version.
Together with the raw data files we also need the script that was used for the process.
During debugging of bug #39 I found that metric_fn was called many, many times with the same data, so I probed further.
I dumped every call to metric_fn in a file. It grows by batch_size for every call, until it's called with the entire (dev) dataset. The first 32 rows in metric-dump-02-hyp are equal to the rows in metric-dump-01-hyp and so forth. This seems redundant.
I'm worried about how this affects the reported metrics. Is it the last call? The first? The average?
1 metric-dump-00-hyp.txt
1 metric-dump-00-ref.txt
32 metric-dump-01-hyp.txt
32 metric-dump-01-ref.txt
64 metric-dump-02-hyp.txt
64 metric-dump-02-ref.txt
96 metric-dump-03-hyp.txt
96 metric-dump-03-ref.txt
128 metric-dump-04-hyp.txt
128 metric-dump-04-ref.txt
160 metric-dump-05-hyp.txt
160 metric-dump-05-ref.txt
192 metric-dump-06-hyp.txt
192 metric-dump-06-ref.txt
224 metric-dump-07-hyp.txt
224 metric-dump-07-ref.txt
256 metric-dump-08-hyp.txt
256 metric-dump-08-ref.txt
288 metric-dump-09-hyp.txt
288 metric-dump-09-ref.txt
320 metric-dump-10-hyp.txt
320 metric-dump-10-ref.txt
352 metric-dump-11-hyp.txt
352 metric-dump-11-ref.txt
384 metric-dump-12-hyp.txt
384 metric-dump-12-ref.txt
416 metric-dump-13-hyp.txt
416 metric-dump-13-ref.txt
448 metric-dump-14-hyp.txt
448 metric-dump-14-ref.txt
480 metric-dump-15-hyp.txt
480 metric-dump-15-ref.txt
512 metric-dump-16-hyp.txt
512 metric-dump-16-ref.txt
544 metric-dump-17-hyp.txt
544 metric-dump-17-ref.txt
576 metric-dump-18-hyp.txt
576 metric-dump-18-ref.txt
608 metric-dump-19-hyp.txt
608 metric-dump-19-ref.txt
640 metric-dump-20-hyp.txt
640 metric-dump-20-ref.txt
672 metric-dump-21-hyp.txt
672 metric-dump-21-ref.txt
704 metric-dump-22-hyp.txt
704 metric-dump-22-ref.txt
736 metric-dump-23-hyp.txt
736 metric-dump-23-ref.txt
768 metric-dump-24-hyp.txt
768 metric-dump-24-ref.txt
800 metric-dump-25-hyp.txt
800 metric-dump-25-ref.txt
832 metric-dump-26-hyp.txt
832 metric-dump-26-ref.txt
864 metric-dump-27-hyp.txt
864 metric-dump-27-ref.txt
893 metric-dump-28-hyp.txt
893 metric-dump-28-ref.txt
893 metric-dump-29-hyp.txt
893 metric-dump-29-ref.txt
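The growth pattern in the dump is what one would expect from a metric that accumulates all rows seen so far and re-scores the whole accumulated set on every update. That is an interpretation of the listing above, not something the issue confirms. Under that assumption, the value after the last update is the score over the full dataset, as this toy sketch shows. The class name AccumulatingMetric is hypothetical.

```python
class AccumulatingMetric:
    """Toy streaming metric: keeps every row seen so far and re-runs
    metric_fn on the full accumulated lists at each update."""

    def __init__(self, metric_fn):
        self.metric_fn = metric_fn
        self.hyps = []
        self.refs = []
        self.call_sizes = []  # rows passed to metric_fn on each call
        self.value = None     # result of the most recent call

    def update(self, batch_hyps, batch_refs):
        self.hyps.extend(batch_hyps)
        self.refs.extend(batch_refs)
        self.call_sizes.append(len(self.hyps))
        self.value = self.metric_fn(self.hyps, self.refs)
        return self.value


def accuracy(hyps, refs):
    """Fraction of hypotheses that equal their reference exactly."""
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)


# 70 rows fed in batches of 32: the first half matches, the second does not.
all_hyps = ["a"] * 70
all_refs = ["a"] * 35 + ["b"] * 35
metric = AccumulatingMetric(accuracy)
for start in range(0, 70, 32):
    metric.update(all_hyps[start:start + 32], all_refs[start:start + 32])
```

Here metric.call_sizes grows by the batch size until the dataset is exhausted, and metric.value ends up equal to the accuracy over all 70 rows. The intermediate values are overwritten rather than averaged.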
We should figure out how to export models for serving. I think Tensorflow provides something like an ExportStrategy that can be passed to the estimator, which will then occasionally export the model.
We currently have a script that generates translation data, but running it can take an hour or two, mostly due to the BPE processing. We should create a dataset that users can simply download.
The dataset probably only needs to include the 32k vocabulary BPE, not all of them.
This makes a simple accuracy metric that checks hyp==ref break
I dunno if this is intended, but at least to me it was a surprise.
Thanks for the great lib btw.
Hi,
I am running the code with the default configuration (nmt_small.yaml, with the hidden layer size changed from 128 to 50) on a TITAN X. The first 1000 training steps are fine, but then the evaluation fails with the following errors:
tensorflow/core/framework/op_kernel.cc:993] Internal: Failed to run py callback pyfunc_2: see error log.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "/home/ultralisksu/host/seq2seq/seq2seq/metrics/metric_specs.py", line 156, in _py_func
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "/home/ultralisksu/host/seq2seq/seq2seq/metrics/metric_specs.py", line 156, in _py_func
ret = func(*args)
File "/home/ultralisksu/host/seq2seq/seq2seq/metrics/metric_specs.py", line 156, in _py_func
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
return self.metric_fn(sliced_hypotheses, sliced_references)
File "/home/ultralisksu/host/seq2seq/seq2seq/metrics/metric_specs.py", line 206, in metric_fn
return self.metric_fn(sliced_hypotheses, sliced_references)
I attached the full logs.
log.log.txt
Thanks
Installed according to the guide on the contribution page.
I'm running:
Ubuntu 14.04
tensorflow 1.0.0
Python 3.5.1 (default, Mar 14 2017, 15:32:51)
Hi all,
Following the page at https://google.github.io/seq2seq/getting_started/, I got this error:
user@localhost:~/Desktop/seq2seq$ pip install -e .
Obtaining file:///home/user/Desktop/seq2seq
Requirement already satisfied: numpy in /home/user/anaconda2/lib/python2.7/site-packages (from seq2seq==0.1)
Requirement already satisfied: matplotlib in /home/user/anaconda2/lib/python2.7/site-packages (from seq2seq==0.1)
Requirement already satisfied: pyyaml in /home/user/anaconda2/lib/python2.7/site-packages (from seq2seq==0.1)
Requirement already satisfied: pyrouge in /home/user/anaconda2/lib/python2.7/site-packages (from seq2seq==0.1)
Requirement already satisfied: six>=1.10 in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: python-dateutil in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: functools32 in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: subprocess32 in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: pytz in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: cycler>=0.10 in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /home/user/anaconda2/lib/python2.7/site-packages (from matplotlib->seq2seq==0.1)
Installing collected packages: seq2seq
Running setup.py develop for seq2seq
Successfully installed seq2seq
user@localhost:~/Desktop/seq2seq$ python -m unittest seq2seq.test.pipeline_test
Traceback (most recent call last):
File "/home/user/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/user/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/user/anaconda2/lib/python2.7/unittest/__main__.py", line 12, in
main(module=None)
File "/home/user/anaconda2/lib/python2.7/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/home/user/anaconda2/lib/python2.7/unittest/main.py", line 149, in parseArgs
self.createTests()
File "/home/user/anaconda2/lib/python2.7/unittest/main.py", line 158, in createTests
self.module)
File "/home/user/anaconda2/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/home/user/anaconda2/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
module = __import__('.'.join(parts_copy))
File "seq2seq/__init__.py", line 24, in
from seq2seq import contrib
ImportError: cannot import name contrib
PS: It should be
pip install -e .
at https://google.github.io/seq2seq/getting_started/
In theory it should be easy to support Image Captioning by just swapping out the encoder with something like ResNet/Inception (e.g. tensorflow.contrib.slim.python.slim.nets.inception_v3). However, there are a few things that need to happen to support problems other than text-to-text.
source_vocabulary, source_delimiter, etc. We probably need another abstraction layer that defines what kind of task the user is solving and adjusts flags/parameters based on it. For example, I could imagine having a Task class, with TextToText, ImageToText, ..., subclasses. The user then passes the type of task as part of the config and the task class is responsible for setting the appropriate parameters and creating the model.
SessionRunHook that loads a subset of the variables. In other words, the hooks used in the training script must be configurable.
Due to a lot of recent refactoring the test coverage has fallen significantly. Take a look at the CircleCI coverage report to bring it back above 98%.
The coverage reports can be found by clicking on the latest build -> Artifacts -> Coverage -> index.html
I have installed seq2seq successfully on Windows 10 with tensorflow-gpu (1.0).
When I run seq2seq.test.pipeline_test.py,
it reports the following errors:
Traceback (most recent call last):
File "F:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 85, in call
ret = func(*args)
File "H:\java_pro\tensorflow\project_src\seq2seq-master\seq2seq-master\seq2seq\metrics\metric_specs.py", line 132, in _py_func
return self.metric_fn(sliced_hypotheses, sliced_references)
File "H:\java_pro\tensorflow\project_src\seq2seq-master\seq2seq-master\seq2seq\metrics\metric_specs.py", line 157, in metric_fn
return bleu.moses_multi_bleu(hypotheses, references, lowercase=False)
File "H:\java_pro\tensorflow\project_src\seq2seq-master\seq2seq-master\seq2seq\metrics\bleu.py", line 71, in moses_multi_bleu
with open(hypothesis_file.name, "r") as read_pred:
PermissionError: [Errno 13] Permission denied: 'C:\Users\gdy\AppData\Local\Temp\tmpjo9ekggj'
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\framework\op_kernel.cc:993] Internal: Failed to run py callback pyfunc_0: see error log.
EF:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=3>
outcome.errors.clear()
F:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=4>
outcome.errors.clear()
F:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=5>
outcome.errors.clear()
F:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=6>
outcome.errors.clear()
F:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=7>
outcome.errors.clear()
F:\Program Files\Anaconda3\lib\unittest\case.py:628: ResourceWarning: unclosed file <_io.BufferedRandom name=8>
outcome.errors.clear()
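The PermissionError in this log is consistent with a documented Windows limitation: a file created by tempfile.NamedTemporaryFile cannot be opened a second time by name while the first handle is still open. One common workaround is to create the file with delete=False, close it, and only then reopen it. The sketch below shows that pattern in isolation. It is not a patch for bleu.py, and the helper names are hypothetical.

```python
import os
import tempfile


def write_lines_to_temp(lines):
    """Write lines to a temporary file, close it, and return its path.

    delete=False keeps the file on disk after close(), so it can be
    reopened by name on Windows as well. The caller must remove it.
    """
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8")
    try:
        tmp.write("\n".join(lines))
        tmp.write("\n")
    finally:
        tmp.close()
    return tmp.name


def read_back(lines):
    """Round-trip lines through a temp file that is reopened by name."""
    path = write_lines_to_temp(lines)
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read().splitlines()
    finally:
        os.remove(path)  # explicit cleanup, since delete=False was used
```

Closing the handles explicitly should also avoid ResourceWarning messages like the ones in the log.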
Blocked by #12
When I executed
python -m unittest seq2seq.test.pipeline_test
there was an error. The log follows:
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5705 get requests, put_count=5553 evicted_count=1000 eviction_rate=0.180083 and unsatisfied allocation rate=0.219457
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Performing full trace on next step.
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/lib64:
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()->GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/dl/anaconda2/envs/tf/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
The last line is a fatal error and I'm not sure how to fix it. The bug appeared only after I ran git pull on the repository today; a version from last week worked fine. Can someone please help?
Ubuntu 14.04, Cuda 8.0
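One thing that may be worth checking: the LD_LIBRARY_PATH printed in the log only lists /usr/local/cuda-8.0/lib64, and on many CUDA installs libcupti is shipped in a separate extras/CUPTI/lib64 directory. The sketch below is only a diagnostic helper (the name find_cupti is made up for illustration, not part of seq2seq or TensorFlow); it reports which directories on a given search path actually contain a libcupti.so* file.

```python
import os


def find_cupti(ld_library_path, listdir=os.listdir):
    """Return the directories on the search path that contain a libcupti.so* file.

    `listdir` is injectable so the function can be exercised without a real
    CUDA install; by default it uses os.listdir.
    """
    hits = []
    for directory in ld_library_path.split(":"):
        if not directory or directory in hits:
            continue  # skip empty entries and directories already reported
        try:
            names = listdir(directory)
        except OSError:
            continue  # directory is missing or unreadable
        if any(name.startswith("libcupti.so") for name in names):
            hits.append(directory)
    return hits
```

Calling find_cupti(os.environ.get("LD_LIBRARY_PATH", "")) and getting an empty list would suggest that the directory holding libcupti (assumed here to be something like /usr/local/cuda-8.0/extras/CUPTI/lib64) needs to be added to LD_LIBRARY_PATH.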
tf.Transform is a new library for TensorFlow that allows users to define preprocessing pipelines. Not sure how mature and easy to integrate it is, but it's worth looking into.
To replicate the GNMT architecture, the following needs to happen. This list is not exhaustive and other things may be required:
This issue is only here to keep track of the high-level tasks. All of the points above should probably be done in separate issues.
nltk has a simple implementation of the GLEU score as described in Wu et al.'s (2016) GNMT system: https://github.com/nltk/nltk/blob/develop/nltk/translate/gleu_score.py
It would be great if seq2seq had a similar port of GLEU =)
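For reference, a minimal sketch of sentence-level GLEU as I understand it from the paper: count all n-grams of order 1 to 4 in the reference and the hypothesis, then take the minimum of n-gram precision and recall. The function names here are illustrative only and are not nltk's or seq2seq's API; inputs are assumed to be already tokenized lists.

```python
from collections import Counter


def ngram_counts(tokens, min_n=1, max_n=4):
    """Count every n-gram of order min_n..max_n in a list of tokens."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference, hypothesis, min_n=1, max_n=4):
    """Return min(precision, recall) over all n-grams of order min_n..max_n."""
    ref = ngram_counts(reference, min_n, max_n)
    hyp = ngram_counts(hypothesis, min_n, max_n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & hyp).values())
    ref_total = sum(ref.values())
    hyp_total = sum(hyp.values())
    if ref_total == 0 or hyp_total == 0:
        return 0.0
    precision = overlap / float(hyp_total)
    recall = overlap / float(ref_total)
    return min(precision, recall)
```

For example, sentence_gleu("a b c d".split(), "a b".split()) has perfect precision but low recall, so the score is bounded by the recall.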
Add support for custom training loops to easily train GANs and RL algorithms. Two ideas on how to implement this:
Need to look into these options in more detail.
I was playing with the toy data and this error appeared when I ran the 'Decoding with Beam Search' script. What's the probable reason and how can I fix it? Thank you.
The documentation should have an end-to-end walkthrough of training and evaluating a Summarization model, including installing and evaluating ROUGE. This will be very similar to the Machine Translation walkthrough #16. Unfortunately we can't publish the data for this.
Move the create_predictions function into the model class. This makes it easier for subclasses to override this method.
So, I have cloned the repo and am trying to start using it. As a sensible first step I decided to run the provided test, only to be greeted with this:
I tried reinstalling all I could, and I am pretty sure all other required pieces are as up-to-date as they could ever be and are functioning properly. I am using OS X 10.12 and/or 10.11 --- the error is present in both.
What could be the source of this and will it hinder my ability to use the software?
Correcting the variable names in rouge.py works fine
The documentation should have an end-to-end walkthrough of training and evaluating a Machine Translation model using standard datasets and BLEU scripts.
Input sentences: 2999 Output sentences: 2997
Cleaning /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train...
clean-corpus.perl: processing /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train.de & .en to /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train.clean, cutoff 1-80, ratio 9
..........(100000)..........(200000)..........(300000)..........(400000)..........(500000)..........(600000)..........(700000)..........(800000)..........(900000)..........(1000000)..........(1100000)..........(1200000)..........(1300000)..........(1400000)..........(1500000)..........(1600000)..........(1700000)..........(1800000)..........(1900000)..........(2000000)..........(2100000)..........(2200000)..........(2300000)..........(2400000)..........(2500000)..........(2600000)..........(2700000)..........(2800000)..........(2900000)..........(3000000)..........(3100000)..........(3200000)..........(3300000)..........(3400000)..........(3500000)..........(3600000)..........(3700000)..........(3800000)..........(3900000)..........(4000000)..........(4100000)..........(4200000)..........(4300000)..........(4400000)..........(4500000)......
Input sentences: 4562102 Output sentences: 4524868
Cleaning /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train.tok...
clean-corpus.perl: processing /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train.tok.de & .en to /Users/nchan/programs/git-ws/seq2seq/bin/data/output-all//train.tok.clean, cutoff 1-80, ratio 9
..........(100000)..........(200000)..........(300000)..........(400000)..........(500000)..........(600000)..........(700000)..........(800000)..........(900000)..........(1000000)..........(1100000)..........(1200000)..........(1300000)..........(1400000)..........(1500000)..........(1600000)..........(1700000)..........(1800000)..........(1900000)..........(2000000)..........(2100000)..........(2200000)..........(2300000)..........(2400000)..........(2500000)..........(2600000)..........(2700000)..........(2800000)..........(2900000)..........(3000000)..........(3100000)..........(3200000)..........(3300000)..........(3400000)..........(3500000)..........(3600000)..........(3700000)..........(3800000)..........(3900000)..........(4000000)..........(4100000)..........(4200000)..........(4300000)..........(4400000)..........(4500000)......
Input sentences: 4562102 Output sentences: 4500966
Traceback (most recent call last):
File "/Users/nchan/programs/git-ws/seq2seq/bin/tools/generate_vocab.py", line 53, in
for line in fileinput.input():
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/fileinput.py", line 248, in next
line = self._readline()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/fileinput.py", line 360, in _readline
self._file = open(self._filename, self._mode)
FileNotFoundError: [Errno 2] No such file or directory: '--max_vocab_size'
NCHAN-M-G1HR:data nchan$
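A note on the traceback above: when fileinput.input() is called without arguments it treats every entry of sys.argv[1:] as a filename, which would explain why '--max_vocab_size' ends up being opened as a file. This usually means the script being run does not parse that flag itself. The sketch below is not the actual generate_vocab.py; it only illustrates, under that assumption, how flags can be separated from input files before handing the files to fileinput. Only the flag name --max_vocab_size is taken from the log; the other names are hypothetical.

```python
import argparse
import fileinput


def parse_args(argv):
    """Split command-line arguments into known flags and input files."""
    parser = argparse.ArgumentParser(description="Vocabulary generation sketch.")
    parser.add_argument("--max_vocab_size", type=int, default=None,
                        help="Keep at most this many tokens.")
    parser.add_argument("files", nargs="*",
                        help="Input files; stdin is read when none are given.")
    return parser.parse_args(argv)


def read_lines(files):
    """Iterate over the lines of the given files.

    Passing an explicit file list means fileinput never looks at sys.argv;
    "-" tells it to read from stdin.
    """
    return fileinput.input(files=files or ("-",))
```

With this split, parse_args(["--max_vocab_size", "50000", "train.tok.clean.en"]) yields the flag as an integer and the file name separately, so only the file name reaches read_lines.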
I noticed that some commits have messages like "automated code formatting". Can anyone tell me which tool Google uses to format Python code?
Writing configuration files with hyperparameters from scratch is difficult because there are a lot of options. We should provide a few example hyperparameter configurations for models. For example:
If we decide to add image captioning support we should also add configuration files for that.
We should generate proper API documentation based on PyDoc strings. The questions are:
We should finish #23 before doing this.
We can probably use the pre-processing script from im2text. It generates SequenceExamples, which isn't great for text, but perhaps better than writing our own.
I cannot find any trace of an attention mechanism, and I am curious whether attention has been added to this seq2seq framework.
In [1], linked from [2], the test sets are not subword encoded. In addition, the merges file from BPE is not included in the archive. Therefore, a model trained on the subword-encoded training set cannot be evaluated without downloading the raw data and training the BPE model from scratch.
[1] https://drive.google.com/open?id=0B_bZck-ksdkpREE3LXJlMXVLUWM
[2] https://google.github.io/seq2seq/nmt/
$ tar -tzvf wmt16_en_de.tar.gz
-rw-r--r-- dennybritz/eng 279423 2017-03-07 04:31 vocab.bpe.32000
-rw-r--r-- dennybritz/eng 778882202 2017-03-07 04:29 train.tok.clean.bpe.32000.de
-rw-r--r-- dennybritz/eng 673102722 2017-03-07 04:21 train.tok.clean.bpe.32000.en
-rw-r--r-- dennybritz/eng 393971 2017-03-07 02:43 newstest2016.tok.de
-rw-r--r-- dennybritz/eng 354403 2017-03-07 02:44 newstest2016.tok.en
-rw-r--r-- dennybritz/eng 279171 2017-03-07 02:43 newstest2015.tok.de
-rw-r--r-- dennybritz/eng 253703 2017-03-07 02:44 newstest2015.tok.en
-rw-r--r-- dennybritz/eng 410292 2017-03-07 02:43 newstest2014.tok.de
-rw-r--r-- dennybritz/eng 377491 2017-03-07 02:44 newstest2014.tok.en
-rw-r--r-- dennybritz/eng 399549 2017-03-07 02:43 newstest2013.tok.de
-rw-r--r-- dennybritz/eng 349480 2017-03-07 02:44 newstest2013.tok.en
What's missing are the bpe.32000 merges file and the newstest*.tok.bpe.32000.* subword-encoded test set files. (Not hard to regenerate, but might as well include them for future folk.)
In the original paper of SmartReply:
First, the elements of R (possible set of responses) are organized into a trie. Then, we conduct a left-to-right beam search, but only retain hypotheses that appear in the trie. This search process has complexity O(bl) for beam size b and maximum response length l. Both b and l are typically in the range of 10-30, so this method dramatically reduces the time to find the top responses and is a critical element of making this system deployable.
Would you please consider this feature and add a detailed task list to this issue for interested contributors?
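To make the request concrete, here is a rough sketch of the idea quoted above: build a trie over the allowed responses, then run a left-to-right beam search that only extends hypotheses along trie edges. This is an illustration under my own assumptions, not SmartReply's or seq2seq's implementation; the log_prob callback stands in for a real decoder, and END is a hypothetical end-of-sequence marker.

```python
END = "</s>"  # hypothetical end-of-sequence marker


def build_trie(responses):
    """Build a nested-dict trie over token sequences; END marks a full response."""
    root = {}
    for response in responses:
        node = root
        for token in response:
            node = node.setdefault(token, {})
        node[END] = {}
    return root


def constrained_beam_search(trie, log_prob, beam_size, max_len):
    """Return (score, tokens) pairs for complete responses, best first.

    log_prob(prefix, token) -> float is a stand-in for the decoder's
    log-probability of `token` given the tokens generated so far.
    """
    beam = [(0.0, (), trie)]  # (score, tokens so far, current trie node)
    finished = []
    for _ in range(max_len + 1):  # the extra step lets length-max_len hypotheses end
        candidates = []
        for score, tokens, node in beam:
            # Only tokens that continue some allowed response are considered.
            for token, child in node.items():
                new_score = score + log_prob(tokens, token)
                if token == END:
                    finished.append((new_score, tokens))
                else:
                    candidates.append((new_score, tokens + (token,), child))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
        if not beam:
            break
    return sorted(finished, key=lambda f: f[0], reverse=True)
```

Each step expands at most beam_size hypotheses and there are at most max_len + 1 steps, so with a bounded number of children examined per hypothesis this is in line with the O(bl) behaviour the paper describes. Every returned hypothesis is by construction a member of the response set.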
Traceback (most recent call last):
File "seq2seq/test/pipeline_test.py", line 184, in test_train_infer
infer_script.main([])
File "~/seq2seq/bin/infer.py", line 125, in main
sess.run([])
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 899, in run
run_metadata=run_metadata))
File "seq2seq/tasks/dump_attention.py", line 126, in after_run
_create_figure(fetches)
File "seq2seq/tasks/dump_attention.py", line 58, in _create_figure
fig = plt.figure(figsize=(8, 8))
File "/usr/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 535, in figure
**kwargs)
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_tkagg.py", line 81, in new_figure_manager
return new_figure_manager_given_figure(num, figure)
File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_tkagg.py", line 89, in new_figure_manager_given_figure
window = Tk.Tk()
File "/usr/lib64/python2.7/lib-tk/Tkinter.py", line 1745, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
TclError: no display name and no $DISPLAY environment variable
Ran 2 tests in 25.226s
FAILED (errors=1)
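The traceback above goes through matplotlib's TkAgg backend, which needs an X display. A common workaround on headless machines is to select the non-interactive Agg backend before pyplot is imported. The sketch below shows that pattern; the helper names are hypothetical and this is not the code in dump_attention.py. The DISPLAY check is a heuristic that only makes sense on Linux/X11.

```python
import os


def pick_backend(environ=None):
    """Return "Agg" when no X display is configured, else None (keep the default)."""
    environ = os.environ if environ is None else environ
    return None if environ.get("DISPLAY") else "Agg"


def make_figure():
    """Create an 8x8 inch figure, falling back to Agg on headless machines."""
    import matplotlib
    backend = pick_backend()
    if backend:
        matplotlib.use(backend)  # has to happen before pyplot is imported
    import matplotlib.pyplot as plt
    return plt.figure(figsize=(8, 8))
```

With Agg selected, figures can still be written to disk with savefig; they just cannot be shown in a window.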
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/lib/python2.7/unittest/main.py", line 12, in
main(module=None)
File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
self.parseArgs(argv)
File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
self.createTests()
File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
self.module)
File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
module = __import__('.'.join(parts_copy))
File "seq2seq/__init__.py", line 24, in
from seq2seq import contrib
ImportError: cannot import name contrib
When I run "python -m unittest seq2seq.test.pipeline_test", I get this error. Could someone tell me how to solve this problem?
The encoding definitions do not follow PEP 263 and should be corrected. A Python script's encoding must be declared with the magic comment on the first or second line; otherwise it has no effect and is treated as a normal comment.
Maybe it should follow TF's style, where the encoding definition comes before the license/copyright notice, e.g. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/lib/python/http_util_test.py
For details, see http://stackoverflow.com/q/42777847/610569
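A quick way to audit files for this is to apply the regular expression given in PEP 263 to the first two lines only. The sketch below does that; declared_encoding is an illustrative name, and it simplifies the PEP slightly (it does not check that the first line is itself a comment or blank before looking at the second).

```python
import re

# Pattern from PEP 263 for the encoding "magic comment".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(lines):
    """Return the encoding declared in the first two lines, or None.

    A declaration on line 3 or later is ignored, because Python treats it
    as an ordinary comment.
    """
    for line in lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None
```

Running this over the first lines of each source file would flag files where the declaration sits below the license header and therefore has no effect.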
Hi I'm using Windows 10 and Python 3.5
I got an error message ImportError: cannot import name 'SecondOrStepTimer'
Can I get some help?
ImportError: Failed to import test module: seq2seq
Traceback (most recent call last):
File "C:\Users\Bumho\Anaconda3\lib\unittest\loader.py", line 153, in loadTestsFromName
module = __import__(module_name)
File "C:\Users\Bumho\seq2seq\seq2seq\__init__.py", line 26, in
from seq2seq import decoders
File "C:\Users\Bumho\seq2seq\seq2seq\decoders\__init__.py", line 17, in
from seq2seq.decoders.rnn_decoder import *
File "C:\Users\Bumho\seq2seq\seq2seq\decoders\rnn_decoder.py", line 32, in
from seq2seq.encoders.rnn_encoder import default_rnn_cell_params
File "C:\Users\Bumho\seq2seq\seq2seq\encoders\__init__.py", line 17, in
import seq2seq.encoders.rnn_encoder
File "C:\Users\Bumho\seq2seq\seq2seq\encoders\rnn_encoder.py", line 27, in
from seq2seq.training import utils as training_utils
File "C:\Users\Bumho\seq2seq\seq2seq\training\__init__.py", line 17, in
from seq2seq.training import hooks
File "C:\Users\Bumho\seq2seq\seq2seq\training\hooks.py", line 28, in
from tensorflow.python.training.basic_session_run_hooks import SecondOrStepTimer # pylint: disable=E0611
ImportError: cannot import name 'SecondOrStepTimer'
I think seq2seq training is not using multiple GPUs. The tokens/sec metric is the same whether I train on a VM with 1 GPU or with 4 GPUs.
Can someone provide a demo of how to use 4 GPUs on a single machine? All I found in the docs was https://google.github.io/seq2seq/training/#distributed-training . That links to an example of how to use multiple devices using tf.device and how to use a cluster with tf.learn, but I couldn't figure out how to proceed with either approach. Thanks!
Running python -m bin.train as specified in https://google.github.io/seq2seq/nmt/ ...
Four devices are found (from logs):
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: N N Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: N N N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: a370:00:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 9f8e:00:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: b265:00:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 8743:00:00.0)
Memory is allocated to all 4, but only one GPU has non-zero utilization.
$ nvidia-smi
Tue Mar 14 19:42:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 8743:00:00.0 Off | 0 |
| N/A 50C P0 74W / 149W | 10363MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 9F8E:00:00.0 Off | 0 |
| N/A 78C P0 67W / 149W | 10363MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | A370:00:00.0 Off | 0 |
| N/A 74C P0 94W / 149W | 10402MiB / 11439MiB | 46% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | B265:00:00.0 Off | 0 |
| N/A 62C P0 64W / 149W | 10363MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+