I am currently working on reproducing summarization results using provided checkpoints.
It was very successful for the other datasets.
However, when I tried the XSum dataset, the run ended with an "Out of range" error.
Currently, for the XSum dataset, TensorFlow Datasets (TFDS) requires the preprocessed dataset to be downloaded manually and placed in a specific location, such as ~/tensorflow_datasets/downloads/manual/xsum-extracts-from-downloads.tar.gz.
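As a quick sanity check of the manual step, a small sketch like the following can confirm that the archive is present at the location mentioned above. The path is the one from this report; the helper name `describe_manual_archive` is hypothetical, and the path may differ if TFDS is configured with another data directory.

```python
import os

# Path taken from the description above; adjust it if your TFDS data directory differs.
MANUAL_ARCHIVE = "~/tensorflow_datasets/downloads/manual/xsum-extracts-from-downloads.tar.gz"


def describe_manual_archive(path=MANUAL_ARCHIVE):
    """Return (expanded_path, exists, size_in_bytes) for the manual archive."""
    expanded = os.path.expanduser(path)
    exists = os.path.isfile(expanded)
    size = os.path.getsize(expanded) if exists else 0
    return expanded, exists, size


if __name__ == "__main__":
    expanded, exists, size = describe_manual_archive()
    print(f"{expanded}: exists={exists}, size={size} bytes")
```

In my case the dataset itself appears to load (the log below reports 11334 test examples), so this only rules out the manual download step.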
Here is the full log (I removed some unnecessary warning lines):
CUDA_VISIBLE_DEVICES=1 python pegasus/bin/evaluate.py --params=xsum_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/xsum/model.ckpt-30000 --evaluate_test
WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
W0526 14:32:49.484526 140570555500352 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7fd86d117c80>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0526 14:32:49.485391 140570555500352 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/xsum', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd86d116a20>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0526 14:32:49.485875 140570555500352 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0526 14:32:49.486188 140570555500352 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
I0526 14:32:49.493249 140570555500352 dataset_info.py:358] Load dataset info from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.494374 140570555500352 dataset_builder.py:287] Reusing dataset xsum (/home/wonjin/tensorflow_datasets/xsum/1.1.0)
I0526 14:32:49.494525 140570555500352 dataset_builder.py:499] Constructing tf.data.Dataset for split test, from /home/wonjin/tensorflow_datasets/xsum/1.1.0
I0526 14:32:49.777998 140570555500352 datasets.py:215] Number of examples for config xsum test is 11334
2020-05-26 14:32:50.674068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-26 14:32:50.707175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:32:50.707436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:32:50.708788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:32:50.709977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:32:50.710302: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:32:50.711864: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:32:50.713033: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:32:50.716500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:32:50.722362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
INFO:tensorflow:Calling model_fn.
I0526 14:32:51.179434 140570555500352 estimator.py:1148] Calling model_fn.
INFO:tensorflow:Running infer on CPU
I0526 14:32:51.180392 140570555500352 tpu_estimator.py:3124] Running infer on CPU
INFO:tensorflow:Done calling model_fn.
I0526 14:33:01.511505 140570555500352 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Graph was finalized.
I0526 14:33:02.609902 140570555500352 monitored_session.py:240] Graph was finalized.
2020-05-26 14:33:02.611567: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-26 14:33:02.648825: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-05-26 14:33:02.652197: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x617d2c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.652242: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-26 14:33:02.994441: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ec4b30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-26 14:33:02.994511: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-05-26 14:33:02.996206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:20:00.0
2020-05-26 14:33:02.996437: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:02.996472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-26 14:33:02.996508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-26 14:33:02.996548: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-26 14:33:02.996575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-26 14:33:02.996619: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-26 14:33:02.996658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-26 14:33:02.999220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-26 14:33:02.999273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-26 14:33:03.001303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-26 14:33:03.001323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-05-26 14:33:03.001333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-05-26 14:33:03.003746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8384 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:20:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
I0526 14:33:03.009030 140570555500352 saver.py:1284] Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000
2020-05-26 14:33:06.812386: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read less bytes than requested
ERROR:tensorflow:Error recorded from prediction_loop: 2 root error(s) found.
(0) Out of range: Read less bytes than requested
[[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save/RestoreV2/_301]]
(1) Out of range: Read less bytes than requested
[[node save/RestoreV2 (defined at /home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'save/RestoreV2':
File "pegasus/bin/evaluate.py", line 153, in <module>
tf.app.run(main)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "pegasus/bin/evaluate.py", line 144, in main
FLAGS.enable_logging)
File "/hdd3/wonjin/pegasus/pegasus/eval/text_eval.py", line 153, in text_eval
for i, features in enumerate(features_iter):
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
yield_single_examples=yield_single_examples):
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 638, in predict
hooks=all_hooks) as mon_sess:
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 725, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1207, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 1212, in _create_session
return self._sess_creator.create_session()
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 638, in create_session
self._scaffold.finalize()
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/monitored_session.py", line 229, in finalize
self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 599, in _get_saver_or_default
saver = Saver(sharded=True, allow_empty=True)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
self.build()
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 502, in _build_internal
restore_sequentially, reshape)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 381, in _AddShardedRestoreOps
name="restore_shard"))
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/wonjin/pegasusenv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
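The failure above is raised by save/RestoreV2 right after "Restoring parameters from ckpt/pegasus_ckpt/xsum/model.ckpt-30000", so it happens while reading the checkpoint rather than the dataset. One possibility, which is only an assumption and is not confirmed by the log, is that one of the checkpoint files was only partially downloaded. A minimal sketch for listing the files that share the checkpoint prefix together with their sizes, so they can be compared against the published ones (the helper name `checkpoint_files` is hypothetical):

```python
import glob
import os


def checkpoint_files(prefix):
    """Return {path: size_in_bytes} for every file whose name starts with the prefix."""
    pattern = glob.escape(prefix) + "*"
    return {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Checkpoint prefix taken from the command and log above.
    for path, size in checkpoint_files("ckpt/pegasus_ckpt/xsum/model.ckpt-30000").items():
        print(f"{size:>14,d}  {path}")
```

If a listed file is noticeably smaller than its source, downloading the checkpoint again would be the first thing to try.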