lambdaji / tf_repos
TensorFlow Script
aij = tf.contrib.layers.fully_connected(inputs=deep_inputs, num_outputs=1, activation_fn=tf.identity, \
weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg), scope='attention_out')# (None * (F*(F-1))) * 1
#aij_reshape = tf.reshape(aij, shape=[-1, num_interactions, 1]) # None * (F*(F-1)) * 1
aij_softmax = tf.nn.softmax(tf.reshape(aij, shape=[-1, num_interactions, 1]), dim=1, name='attention_soft')
if mode == tf.estimator.ModeKeys.TRAIN:
    aij_softmax = tf.nn.dropout(aij_softmax, keep_prob=dropout[0])
According to the original paper, no exponential should be applied here, right? Or have I misunderstood?
When I run DeepFM in distributed mode, the following error occurs:
No worker known as /job:chief/replica:0/task:0
Could you help me?
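In general, "No worker known as /job:chief/..." indicates that some task's server was started with a cluster definition that has no chief job, while ops are placed on /job:chief. A minimal sketch of keeping TF_CONFIG consistent across tasks, assuming tf.estimator.train_and_evaluate with chief/worker/ps roles (the same layout as the TF_CONFIG printed in the logs further down this page). The host names and the make_tf_config helper are hypothetical:

```python
import json
import os


def make_tf_config(chief, workers, ps, task_type, task_index):
    """Return a TF_CONFIG JSON string (hypothetical helper).

    Every task (chief, workers, ps, evaluator) should get the SAME cluster
    dict; only the "task" entry differs. The evaluator is not listed in the
    cluster, it only appears as its own task type.
    """
    cluster = {"chief": [chief], "worker": list(workers), "ps": list(ps)}
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": task_index}})


# Hypothetical hosts; on each machine, pass that machine's task_type/task_index.
CHIEF = "host-0:2222"
WORKERS = ["host-1:2222"]
PS = ["host-ps-0:2222"]

os.environ["TF_CONFIG"] = make_tf_config(CHIEF, WORKERS, PS, "ps", 0)
config = json.loads(os.environ["TF_CONFIG"])
print(config["cluster"]["chief"])  # ['host-0:2222']
```

The point of the helper is that the cluster dict is built in one place, so a ps or worker cannot accidentally be launched with a cluster that omits the chief.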
The raw data looks like this:
1 a,b,c,0.1,0.2,0.3
Something like this?
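I have read your work with great interest and would like to ask a question. Your data-processing logic converts everything into libsvm format, "similar to the FM second-order part: embed everything uniformly as <id, val>, with val=1.0 for categorical features". But in your Wide&Deep model code, input_fn parses CSV directly. Does that model not use the libsvm format?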
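For reference, the "<id, val>" libsvm layout described above (one id per categorical value with val=1.0, one id per continuous field carrying its value) can be sketched as follows. The to_libsvm helper and the feature_map contents are hypothetical, not the repository's own code; the example row reuses the raw sample "1 a,b,c,0.1,0.2,0.3" from the previous question, assuming its first three fields are categorical and the last three continuous:

```python
def to_libsvm(label, categorical, continuous, feature_map):
    """Format one sample as "label id:val id:val ...".

    categorical: list of (field, value); each distinct value has its own id, val=1.0
    continuous:  list of (field, number); one id per field, val is the (normalized) number
    feature_map: hypothetical dict, "field" -> id for continuous fields,
                 "field=value" -> id for categorical values
    """
    pairs = [(feature_map[field], float(x)) for field, x in continuous]
    pairs += [(feature_map[f"{field}={value}"], 1.0) for field, value in categorical]
    return f"{label} " + " ".join(f"{fid}:{val:g}" for fid, val in pairs)


# Raw row "1 a,b,c,0.1,0.2,0.3", read as label 1, categorical C1..C3, continuous I1..I3.
label, rest = "1 a,b,c,0.1,0.2,0.3".split(" ", 1)
cols = rest.split(",")
feature_map = {"I1": 1, "I2": 2, "I3": 3, "C1=a": 4, "C2=b": 5, "C3=c": 6}
line = to_libsvm(int(label),
                 [("C1", cols[0]), ("C2", cols[1]), ("C3", cols[2])],
                 [("I1", cols[3]), ("I2", cols[4]), ("I3", cols[5])],
                 feature_map)
print(line)  # 1 1:0.1 2:0.2 3:0.3 4:1 5:1 6:1
```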
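Suggestion: add a README that covers points such as:
For example, get_criteo_feature.py assumes by default that every feature in the test set also appears in the training set; otherwise feature_map is incomplete;
Also, the test data cannot be too small, or the cutoff removes everything;
Also, the test-set index is offset by one from the training set: val = dists.gen(i, features[continous_features[i] - 1]). I changed it to the same index as the training set, since my test and training sets share the same format. Does the author's data have indices offset by one between the two? I also added label = features[0] for the test set, which should make test results easier to compare later; otherwise reusing the last label from the training set feels odd;
Also, a numeric continuous feature must not have only one unique value, otherwise normalization fails (max - min is zero);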
...........
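Two of the pitfalls above (test-only categorical values and constant continuous columns) can be guarded against in the preprocessing itself. A minimal sketch with hypothetical helper names, not the code in get_criteo_feature.py: unseen values fall back to a reserved id 0, and a column with a single unique value maps to 0.0 instead of dividing by max - min = 0.

```python
class MinMaxNormalizer:
    """Min-max scale one continuous column; safe when the column is constant."""

    def __init__(self, values):
        self.lo = min(values)
        self.hi = max(values)

    def transform(self, x):
        span = self.hi - self.lo
        if span == 0:  # only one unique value: avoid dividing by zero
            return 0.0
        # clip so test values outside the training range stay in [0, 1]
        return min(max((x - self.lo) / span, 0.0), 1.0)


def build_feature_map(train_values, unk_id=0):
    """Map each categorical value seen in training to an id >= 1.

    Values that only appear at test time fall back to unk_id instead of
    raising KeyError, so the test set need not be a subset of the training set.
    """
    ids = {}
    for v in train_values:
        if v not in ids:
            ids[v] = len(ids) + 1
    return lambda v: ids.get(v, unk_id)
```

Fitting the normalizer and the map on the training data only, then applying them unchanged to the test data, also keeps the two sets' indices aligned.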
DeepFM Infer: No such file or directory: '~/deepFM_ex2/data/criteo/pred.txt'
How can I get this file? (I have already run feature.py and model_train.py, and no error was reported.)
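Independently of which step is supposed to write pred.txt, note that the path in the error message still starts with `~`. Python's open() does not expand `~`, so a path passed through unexpanded points at a directory literally named `~`. A minimal check, assuming the path is a plain string in the script:

```python
import os

raw = "~/deepFM_ex2/data/criteo/pred.txt"
# open() does not expand "~"; expand it explicitly before using the path.
path = os.path.expanduser(raw)
print(path, "exists" if os.path.exists(path) else "missing")
```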
Hi, I read your code. It seems the product in IPNN is not factorized, is that right? Or have I misunderstood? Thanks.
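For reference, the PNN paper reduces the cost of the inner-product layer by assuming each weight matrix is rank one, W_p^n = θ^n (θ^n)^T, which turns the pairwise sum into the squared norm of a weighted sum of field embeddings. The pure-Python sketch below checks that identity for a single output unit n; theta and f are made-up values, and this is not the repository's code:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_inner_sum(theta, f):
    """Explicit form: sum_i sum_j theta_i * theta_j * <f_i, f_j>, O(N^2 * M)."""
    n = len(f)
    return sum(theta[i] * theta[j] * dot(f[i], f[j])
               for i in range(n) for j in range(n))


def factorized(theta, f):
    """Factorized form: || sum_i theta_i * f_i ||^2, O(N * M)."""
    m = len(f[0])
    s = [sum(theta[i] * f[i][k] for i in range(len(f))) for k in range(m)]
    return dot(s, s)


theta = [0.5, -1.0, 2.0]             # one weight per field (N = 3)
f = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # field embeddings (M = 2)
print(pairwise_inner_sum(theta, f), factorized(theta, f))
```

Comparing the two functions is a quick way to tell whether a given implementation computes the explicit pairwise products or the factorized form.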
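hidden_units = map(int, FLAGS.deep_layers.split(","))
should be changed to the following, because in Python 3 map() returns a one-shot iterator rather than a list:
hidden_units = list(map(int, FLAGS.deep_layers.split(",")))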
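A small illustration of why the list(...) wrapper matters under Python 3, using a made-up layer string: a bare map object can be consumed only once and has no len(), so any code that iterates hidden_units twice silently sees no layers the second time.

```python
units = map(int, "256,128,64".split(","))
first = list(units)   # [256, 128, 64]
second = list(units)  # []: the iterator is already exhausted

hidden_units = list(map(int, "256,128,64".split(",")))
print(first, second, len(hidden_units))  # [256, 128, 64] [] 3
```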
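A question: why are the IDs in feature 109_14 and feature 206 not consistent? If they refer to the same product category, the IDs should in theory match. I checked 400,000 rows and found that the IDs appearing in these two features have no overlap at all. Have I misunderstood what this dataset means? Can anyone explain? Thanks!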
One more question:
worker-1 keeps waiting:
INFO:tensorflow:Waiting 1800.000000 secs before starting eval.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
worker-0
Part of the log:
INFO:tensorflow:Saving checkpoints for 25076 into /workspace/wlc/model_dir/model.ckpt.
INFO:tensorflow:global_step/sec: 7.5244
E0712 18:25:49.778093 ProjectorPluginIsActiveThread saver.py:1817] Couldn't match files for checkpoint /workspace/wlc/model_dir/model.ckpt-25076
INFO:tensorflow:loss = 84.10749, average_loss = 0.65708977 (31.580 sec)
INFO:tensorflow:loss = 84.10749, step = 25285 (31.580 sec)
E0712 18:26:20.777756 ProjectorPluginIsActiveThread saver.py:1817] Couldn't match files for checkpoint /workspace/wlc/model_dir/model.ckpt-25076
INFO:tensorflow:loss = 81.15384, average_loss = 0.63401437 (25.918 sec)
worker-1 keeps waiting:
TensorBoard 1.6.0 at http://tensorflow-wanglianchen-144-16-worker-1-0grc9:6006 (Press CTRL+C to quit)
INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'worker': ['tensorflow-wanglianchen-144-16-worker-2:2222'], 'ps': ['tensorflow-wanglianchen-144-16-ps-0:2222'], 'chief': ['tensorflow-wanglianchen-144-16-worker-0:2222']}, 'task': {'type': 'evaluator', 'index': 0}}
INFO:tensorflow:Using config: {'_num_worker_replicas': 0, '_num_ps_replicas': 0, '_global_id_in_cluster': None, '_master': '', '_save_checkpoints_steps': 1000, '_session_config': device_count {
key: "CPU"
value: 1
}
device_count {
key: "GPU"
}
, '_keep_checkpoint_every_n_hours': 10000, '_save_summary_steps': 1000, '_keep_checkpoint_max': 5, '_log_step_count_steps': 1000, '_service': None, '_save_checkpoints_secs': None, '_is_chief': False, '_tf_random_seed': None, '_model_dir': '/workspace/wlc/model_dir/', '_evaluation_master': '', '_task_id': 0, '_cluster_spec': , '_task_type': 'evaluator'}
INFO:tensorflow:Waiting 1800.000000 secs before starting eval.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
INFO:tensorflow:Waiting 1799.999588 secs before starting next eval run.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
INFO:tensorflow:Waiting 1799.999654 secs before starting next eval run.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
INFO:tensorflow:Waiting 1799.999693 secs before starting next eval run.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
INFO:tensorflow:Waiting 1799.999667 secs before starting next eval run.
WARNING:tensorflow:Estimator is not trained yet. Will start an evaluation when a checkpoint is ready.
INFO:tensorflow:Waiting 1799.999685 secs before starting next eval run.
worker-2 ran successfully:
INFO:tensorflow:loss = 84.003555, average_loss = 0.6562778 (26.016 sec)
INFO:tensorflow:loss = 84.003555, step = 25914 (26.016 sec)
INFO:tensorflow:Loss for final step: 84.82182.
ps_host ['tensorflow-wanglianchen-144-16-ps-0:2222']
worker_host ['tensorflow-wanglianchen-144-16-worker-2:2222']
chief_hosts ['tensorflow-wanglianchen-144-16-worker-0:2222']
{"task": {"index": 0, "type": "worker"}, "cluster": {"ps": ["tensorflow-wanglianchen-144-16-ps-0:2222"], "worker": ["tensorflow-wanglianchen-144-16-worker-2:2222"], "chief": ["tensorflow-wanglianchen-144-16-worker-0:2222"]}}
model_type:wide_deep
train_samples_num:3000000
Parsing /workspace/wlc/wide_deep_dist/data/train.csv
1.0hours
task train success.
modeldir=/workspace/wlc,modelname=model_dir
ps-0 log:
start checkWorkerIsFinish
TensorBoard 1.6.0 at http://tensorflow-wanglianchen-144-16-ps-0-jrngn:6006 (Press CTRL+C to quit)
INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'worker': ['tensorflow-wanglianchen-144-16-worker-2:2222'], 'ps': ['tensorflow-wanglianchen-144-16-ps-0:2222'], 'chief': ['tensorflow-wanglianchen-144-16-worker-0:2222']}, 'task': {'type': 'ps', 'index': 0}}
INFO:tensorflow:Using config: {'_cluster_spec': , '_task_id': 0, '_model_dir': '/workspace/wlc/model_dir/', '_service': None, '_session_config': device_count {
key: "CPU"
value: 1
}
device_count {
key: "GPU"
}
, '_save_summary_steps': 1000, '_is_chief': False, '_save_checkpoints_secs': None, '_master': 'grpc://tensorflow-wanglianchen-144-16-ps-0:2222', '_global_id_in_cluster': 2, '_evaluation_master': '', '_keep_checkpoint_max': 5, '_save_checkpoints_steps': 1000, '_task_type': 'ps', '_tf_random_seed': None, '_num_worker_replicas': 2, '_log_step_count_steps': 1000, '_num_ps_replicas': 1, '_keep_checkpoint_every_n_hours': 10000}
INFO:tensorflow:Start Tensorflow server.
2018-07-12 17:26:33.154403: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-12 17:26:33.160418: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job chief -> {0 -> tensorflow-wanglianchen-144-16-worker-0:2222}
2018-07-12 17:26:33.160444: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> localhost:2222}
2018-07-12 17:26:33.160463: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> tensorflow-wanglianchen-144-16-worker-2:2222}
2018-07-12 17:26:33.164749: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:2222
What causes this problem when running distributed TF on a CPU cluster, and is there a way to fix it?
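One thing worth checking in these logs: the evaluator keeps reporting "Estimator is not trained yet" while the chief saves checkpoints to /workspace/wlc/model_dir/ and worker-0 logs "Couldn't match files for checkpoint". That pattern is consistent with model_dir not being storage shared by all containers, so each one only sees its own local /workspace. The hypothetical helper below is a rough pure-Python stand-in for what tf.train.latest_checkpoint checks, not TensorFlow's implementation; running it on each machine against the model_dir shows whether the chief's latest checkpoint is visible there:

```python
import os
import re
import tempfile


def visible_checkpoint(model_dir):
    """Return the latest checkpoint prefix this machine can see, or None.

    Reads the text-format `checkpoint` state file in model_dir and checks
    that the V2 `.index` file of model_checkpoint_path exists locally.
    """
    state = os.path.join(model_dir, "checkpoint")
    if not os.path.exists(state):
        return None
    with open(state) as fh:
        m = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', fh.read(), re.M)
    if m is None:
        return None
    prefix = m.group(1)
    if not os.path.isabs(prefix):  # state file may hold relative or absolute paths
        prefix = os.path.join(model_dir, prefix)
    return prefix if os.path.exists(prefix + ".index") else None


# Self-check on a throwaway directory: state file present, .index missing, then present.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "checkpoint"), "w") as fh:
        fh.write('model_checkpoint_path: "model.ckpt-25076"\n')
    before = visible_checkpoint(d)
    open(os.path.join(d, "model.ckpt-25076.index"), "w").close()
    after = visible_checkpoint(d)
    expected = os.path.join(d, "model.ckpt-25076")
```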
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
could not find method isEncrypted from class org/apache/hadoop/fs/FileStatus with signature ()Z
hdfsGetPathInfo(/user/tdw_gilbertchen/model_path/test/2019080400): getFileInfo error:
java.lang.NoSuchMethodError: isEncrypted
INFO:tensorflow:Graph was finalized.
2019-09-04 11:14:49.486416: I tensorflow/core/distributed_runtime/master_session.cc:1161] Start master session 239ec7870b717670 with config: gpu_options { }
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into hdfs://ss-wxg-3-v2/user/tdw_gilbertchen/model_path/test/2019080400/model.ckpt.
Traceback (most recent call last):
File "/data/user/code/mainRun.py", line 150, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/data/user/code/mainRun.py", line 137, in main
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 637, in run
getattr(self, task_to_run)()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 642, in run_chief
return self._start_distributed_training()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 788, in _start_distributed_training
saving_listeners=saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 807, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 568, in after_create_session
self._save(session, global_step)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/basic_session_run_hooks.py", line 599, in _save
self._get_saver().save(session, self._save_path, global_step=step)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1441, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: hdfs://ss-wxg-3-v2/user/tdw_gilbertchen/model_path/test/2019080400/model.ckpt-0_temp_6e34385f8bd846499e635fda38324771/part-00000-of-00001.index; Unknown error 255
[[node save/MergeV2Checkpoints (defined at /data/user/code/mainRun.py:137) = MergeV2Checkpoints[delete_old_dirs=true, _device="/job:ps/replica:0/task:0/device:CPU:0"](save/MergeV2Checkpoints/checkpoint_prefixes, _recv_save/Const_0_S581)]]
[[{{node save/Identity_S583}} = _HostRecvclient_terminated=false, recv_device="/job:chief/replica:0/task:0/device:CPU:0", send_device="/job:ps/replica:0/task:0/device:CPU:0", send_device_incarnation=-6548387880355174373, tensor_name="edge_302_save/Identity", tensor_type=DT_STRING, _device="/job:chief/replica:0/task:0/device:CPU:0"]]
Caused by op u'save/MergeV2Checkpoints', defined at:
File "/data/user/code/mainRun.py", line 150, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/data/user/code/mainRun.py", line 137, in main
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 637, in run
getattr(self, task_to_run)()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 642, in run_chief
return self._start_distributed_training()
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/training.py", line 788, in _start_distributed_training
saving_listeners=saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
saving_listeners)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 557, in create_session
self._scaffold.finalize()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 215, in finalize
self._saver.build()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 786, in _build_internal
save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 369, in _AddShardedSaveOps
return self._AddShardedSaveOpsForV2(filename_tensor, per_device)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 351, in _AddShardedSaveOpsForV2
sharded_prefixes, checkpoint_prefix, delete_old_dirs=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 473, in merge_v2_checkpoints
delete_old_dirs=delete_old_dirs, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): hdfs://ss-wxg-3-v2/user/tdw_gilbertchen/model_path/test/2019080400/model.ckpt-0_temp_6e34385f8bd846499e635fda38324771/part-00000-of-00001.index; Unknown error 255
[[node save/MergeV2Checkpoints (defined at /data/user/code/mainRun.py:137) = MergeV2Checkpoints[delete_old_dirs=true, _device="/job:ps/replica:0/task:0/device:CPU:0"](save/MergeV2Checkpoints/checkpoint_prefixes, _recv_save/Const_0_S581)]]
[[{{node save/Identity_S583}} = _HostRecvclient_terminated=false, recv_device="/job:chief/replica:0/task:0/device:CPU:0", send_device="/job:ps/replica:0/task:0/device:CPU:0", send_device_incarnation=-6548387880355174373, tensor_name="edge_302_save/Identity", tensor_type=DT_STRING, _device="/job:chief/replica:0/task:0/device:CPU:0"]]
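A question about the Alibaba CCP dataset:
Did you downsample it? After filtering out invalid data I still have nearly 30 million rows,
and even compressed as gz it is over 30 GB.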
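If downsampling is needed, a streaming pass avoids decompressing 30+ GB to disk first. A minimal sketch, assuming one sample per line; the helper names, the seed and the file names you pass in are arbitrary:

```python
import gzip
import random


def sample_lines(lines, rate, seed=2018):
    """Yield each line with probability `rate`; memory use stays constant."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < rate:
            yield line


def downsample_gz(src, dst, rate, seed=2018):
    """Stream-sample a gzip text file into another gzip file; return lines kept."""
    kept = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in sample_lines(fin, rate, seed):
            fout.write(line)
            kept += 1
    return kept
```

A fixed seed makes the sample reproducible across runs, which helps when comparing models trained on it.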
https://github.com/lambdaji/tf_repos/tree/master/deep_ctr#how-to-use
The link in this line should be updated:
-> The experiments use the Criteo dataset; for feature engineering, see here
The current implementation has some problems with its final results...
Is it because the training data is too imbalanced? Most labels are 0; should the data be sampled before training?
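One common option, not something this repository is known to do, is to keep all positives, subsample negatives at rate w, and then correct the predicted probabilities with q = p / (p + (1 - p) / w), the recalibration described in Facebook's "Practical Lessons from Predicting Clicks on Ads at Facebook". A minimal sketch with hypothetical helper names:

```python
import random


def downsample_negatives(rows, neg_rate, seed=2018):
    """Keep every positive row and each negative row with probability neg_rate.

    rows: iterable of (label, features) pairs with label in {0, 1}.
    """
    rng = random.Random(seed)
    for label, feats in rows:
        if label == 1 or rng.random() < neg_rate:
            yield label, feats


def calibrate(p, neg_rate):
    """Map a probability from a model trained on downsampled data back to the
    original label distribution: q = p / (p + (1 - p) / w)."""
    return p / (p + (1.0 - p) / neg_rate)
```

The correction is monotone, so it leaves AUC unchanged but matters for log loss and for any use of the predicted CTR as an actual probability.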
Compiling tf_repos/deep_ctr/Serving_pipeline/deep_fm_serving_client.cpp runs into many dependency problems. Could you provide a build script for the client, along with the steps to run it?
This link is broken; could you provide another one?
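Hi lambdaji, I have been following your Zhihu posts and tf_repos. I recently used DeepFM to build a ranking model in practice and would like to ask about a practical issue: have you run into low GPU utilization? During training my utilization stays below 10%. If a single GPU can't even be fully utilized, distributed training is pointless... I'm using a Tesla P40 with 24 GB of memory, so memory shouldn't be the bottleneck. The data has 81 fields and roughly a million feature indices. The utilization problem has puzzled me for a while, so I'd appreciate any advice. Thanks!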
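Hi, in the code field_size is fixed, but in practice the number of fields per row can vary because the data is sparse, so the tensors can't simply be reshaped. Is there a solution for this?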
feat_ids = features['feat_ids']
feat_ids = tf.reshape(feat_ids,shape=[-1,field_size])
feat_vals = features['feat_vals']
feat_vals = tf.reshape(feat_vals,shape=[-1,field_size])
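One common workaround is to pad every row to a fixed field_size at data-preparation time, so the reshape to [-1, field_size] always works. A minimal sketch, assuming the model scales each feature's first-order weight and embedding by its feat_val (the linear term y_linear = tf.reduce_sum(tf.multiply(feat_wgts, feat_vals),1) does this for the weights); pad_fields and the reserved pad_id are hypothetical:

```python
def pad_fields(feat_ids, feat_vals, field_size, pad_id=0):
    """Pad (or truncate) one sparse row to exactly field_size entries.

    Padded slots get feat_val 0.0, so any term multiplied by feat_val
    contributes zero. Truncation drops features, so choose field_size as
    the maximum number of fields any row can have.
    """
    ids = list(feat_ids[:field_size])
    vals = list(feat_vals[:field_size])
    missing = field_size - len(ids)
    return ids + [pad_id] * missing, vals + [0.0] * missing
```

For the deep part, the padded slots still occupy input positions (as zero vectors), which is usually acceptable but worth keeping in mind.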
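Hi, many thanks for sharing the CTR model code. In my experiments DeepFM comes out 0.005 lower than FNN (5 in the third decimal place). In theory, shouldn't it be higher? I'd appreciate any advice, thanks.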
The model keeps printing:
Parsing ['../../data/criteo/tr.libsvm']
INFO:tensorflow:Calling model_fn.
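What is the cause? Is it because max_steps isn't set?
This doesn't seem reasonable: even when running multiple epochs, it shouldn't call model_fn and re-parse the data every time, should it?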
Caused by op u'Reshape', defined at:
File "DeepFM.py", line 392, in <module>
tf.app.run()
File "/data/hadoop/local/usercache/test/appcache/application_5145270655_21212399/container_1569565_99362122/Python/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "DeepFM.py", line 326, in main
tf.estimator.train_and_evaluate(DeepFM, train_spec, eval_spec)
File "DeepFM.py", line 128, in model_fn
feat_ids = tf.reshape(feat_ids, shape=[-1, field_size])
InvalidArgumentError (see above for traceback): Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[Node: Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:chief/replica:0/task:0/device:CPU:0"](IteratorGetNext, Reshape/shape)]]
[[Node: gradients/Deep-part/deep_out/MatMul_grad/tuple/control_dependency_1_S313 = _Recvclient_terminated=false, recv_device="/job:ps/replica:0/task:0/device:CPU:0", send_device="/job:chief/replica:0/task:0/device:CPU:0", send_device_incarnation=-1178756093214127197, tensor_name="edge_1006_gradients/Deep-part/deep_out/MatMul_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:ps/replica:0/task:0/device:CPU:0"]]
In principle, increasing the batch size should also increase the GPU memory needed for the computation.
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key Deep-part/mlp2/biases not found in checkpoint
[[node save/RestoreV2 (defined at Model_pipeline/DeepFM.py:366) ]]
W0813 17:48:05.416769 Reloader plugin_event_accumulator.py:303] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
W0813 17:48:05.417315 Reloader plugin_event_accumulator.py:311] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
W0813 17:48:05.419010 Reloader plugin_event_accumulator.py:303] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
W0813 17:48:05.419358 Reloader plugin_event_accumulator.py:311] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
W0813 17:48:05.420542 Reloader plugin_event_accumulator.py:303] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
W0813 17:48:05.421025 Reloader plugin_event_accumulator.py:311] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
W0813 17:48:05.422015 Reloader plugin_event_accumulator.py:303] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
W0813 17:48:05.422390 Reloader plugin_event_accumulator.py:311] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
This warning keeps appearing even though the log dir contains only one event file.
I have some questions about running the DCN model on the following dataset:
http://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/
Kaggle Display Advertising Challenge Dataset
The data format it describes is:
The columns are tab separated with the following schema:
<integer feature 1> ... <integer feature 13> <categorical feature 1> ... <categorical feature 26>
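It doesn't distinguish user IDs from item IDs, so how can it be used to make recommendations for users? Also, when get_criteo_feature.py processes the data, many categorical values are simply cut off; how can users be told apart then?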
parser.add_argument(
"--cutoff",
type=int,
default=200,
help="cutoff long-tailed categorical values"
)
Thanks!
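A rough sketch of how a frequency cutoff like the --cutoff option behaves; the helpers are hypothetical and the actual code in get_criteo_feature.py may differ in details (for example, whether the threshold uses >= or >). Values seen fewer than `cutoff` times get no id of their own and all land in one shared bucket, so an identifier-like column whose values are mostly rare collapses into that bucket. Note also that the Criteo columns are anonymized, so no column is labeled as a user or item ID in the first place.

```python
from collections import Counter


def build_vocab(column_values, cutoff):
    """Assign ids 1, 2, ... to categorical values occurring at least `cutoff` times."""
    counts = Counter(column_values)
    kept = sorted(v for v, c in counts.items() if c >= cutoff)
    return {v: i + 1 for i, v in enumerate(kept)}


def lookup(vocab, value):
    """Long-tail (and unseen) values share the hypothetical bucket id 0."""
    return vocab.get(value, 0)


vocab = build_vocab(["u1"] * 3 + ["u2"] + ["u3"] * 5, cutoff=2)
print(vocab, lookup(vocab, "u2"))  # {'u1': 1, 'u3': 2} 0
```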
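In DeepFM's first-order term, and in PNN's linear term y_linear = tf.reduce_sum(tf.multiply(feat_wgts, feat_vals),1), the output is a single number rather than a vector. Isn't the first-order term in the papers a vector rather than a single number?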
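For comparison, the two papers define these terms roughly as below (notation follows each paper; worth checking against the originals). In DeepFM the first-order part ⟨w, x⟩ is a single number added into the FM output, while PNN's linear signal l_z has D_1 components, one per unit of the first hidden layer, each computed from the field embeddings z = (f_1, ..., f_N) with f_i of dimension M. A scalar sum such as y_linear therefore yields one number where the PNN formulation has D_1.

```latex
% DeepFM (Guo et al., 2017): FM component over d features
y_{FM} = \langle w, x \rangle
       + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} x_{j_2}

% PNN (Qu et al., 2016): linear signal, one entry per first-hidden-layer unit n
l_z = \left(l_z^1, l_z^2, \ldots, l_z^{D_1}\right), \qquad
l_z^n = W_z^n \odot z = \sum_{i=1}^{N} \sum_{k=1}^{M} \left(W_z^n\right)_{i,k} \, z_{i,k}
```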
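I downloaded the dataset, but the data-reading code in the aliccp folder expects different file names and formats, so it can't process the data. Has the dataset been changed?
The Criteo dataset contains only readme.txt, train.txt and test.txt; there are no files with the *-* naming used in aliccp, and no "," delimiter either.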
Hi, which versions of Python and TensorFlow are you using?
Around line 165, when building the deep fully connected layers, L2 regularization is added to the variables:
y_deep = tf.contrib.layers.fully_connected(inputs=deep_inputs, num_outputs=1, activation_fn=tf.identity,
weights_regularizer=tf.contrib.layers.l2_regularizer(l2_reg), scope='deep_out')
Then, around line 189, the loss function is defined:
loss = (tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=labels)) +
        l2_reg * tf.nn.l2_loss(FM_W) +
        l2_reg * tf.nn.l2_loss(FM_V))
My understanding is that this loss never picks up the penalties that weights_regularizer adds to the tf.GraphKeys.REGULARIZATION_LOSSES collection,
so it should be changed to:
loss = (tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=labels)) +
        l2_reg * tf.nn.l2_loss(FM_W) +
        l2_reg * tf.nn.l2_loss(FM_V) +
        tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)))
Every training run reports this error. Does anyone know how to fix it?
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=y, labels=labels))
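When computing the loss here, the prediction passed in is y, which in the code is the raw output before the sigmoid activation. Why is this done? Classification tasks usually apply a sigmoid to produce the prediction.
P.S. What puzzles me is that in my experiments the network optimizes better without the sigmoid, and much worse once the sigmoid is applied.
Thanks!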
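tf.nn.sigmoid_cross_entropy_with_logits expects the raw logit and applies the sigmoid internally, so passing the pre-sigmoid y is the intended usage; the sigmoid is applied separately only to obtain a predicted probability. If a sigmoid is applied before calling it, the sigmoid is effectively applied twice, which would be consistent with the much worse optimization described above. The pure-Python sketch below mirrors the formula given in the TensorFlow documentation rather than calling TensorFlow:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss_from_logit(x, z):
    """Sigmoid cross-entropy from a raw logit x and label z, in the stable form
    documented for tf.nn.sigmoid_cross_entropy_with_logits:
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def loss_from_prob(p, z):
    """Textbook log loss on a probability p that already went through sigmoid."""
    return -(z * math.log(p) + (1 - z) * math.log(1 - p))


# Feeding the raw logit is equivalent to sigmoid + log loss ...
same = abs(loss_from_logit(2.0, 1) - loss_from_prob(sigmoid(2.0), 1))
# ... but stays finite for extreme logits, where sigmoid(x) rounds to 0 or 1.
extreme = loss_from_logit(-800.0, 1)
# Applying sigmoid first and then treating the result as a logit squashes the
# effective prediction into (sigmoid(0), sigmoid(1)), about (0.5, 0.73), so a
# positive example's loss can never drop below log1p(exp(-1)), about 0.31.
double = loss_from_logit(sigmoid(3.0), 1)
```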
When running the DeepMLT model, I get:
UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape.
This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
My GPU has 6 GB of memory.
System memory usage: Allocation of 16384131072 exceeds 10% of system memory
while GPU memory is barely used.
Is this a problem in the code?
As the title says, I can't find a license in the repo.
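There are a lot of feature-processing scripts under this path and they are very messy; reading them gives me a headache. What is the exact execution order of these scripts? get_join_sample.sh produces libsvm output without feature-frequency filtering; to get the final libsvm output with frequency filtering applied, in what order should the scripts be run?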
outer = tf.reshape(tf.einsum('aik,ajk->aijk', p, q), [-1, num_pairs*embedding_size*embedding_size])
should be:
outer = tf.reshape(tf.einsum('api,apj->apij', p, q), [-1, num_pairs*embedding_size*embedding_size])
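NumPy's einsum uses the same subscript notation as tf.einsum, so the two strings can be compared without TensorFlow. A minimal shape check, assuming p and q have shape [batch, num_pairs, embedding_size] (small made-up sizes here): the original string pairs every pair index with every other one, so its output only reshapes to num_pairs*embedding_size*embedding_size when num_pairs happens to equal embedding_size.

```python
import numpy as np

batch, num_pairs, k = 2, 3, 4
rng = np.random.default_rng(0)
p = rng.random((batch, num_pairs, k))
q = rng.random((batch, num_pairs, k))

wrong = np.einsum('aik,ajk->aijk', p, q)  # (2, 3, 3, 4): pairs x pairs, elementwise over k
right = np.einsum('api,apj->apij', p, q)  # (2, 3, 4, 4): one k x k outer product per pair

# each slice of `right` is the outer product of the matching pair's vectors
assert np.allclose(right[1, 2], np.outer(p[1, 2], q[1, 2]))
outer = right.reshape(-1, num_pairs * k * k)
```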
The feature indices produced by get_criteo_feature.py should start from 1, but PNN.py uses idx directly. In that case, doesn't the embedding looked up with tf.nn.embedding_lookup start from 0?
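Line 50 of tf_repos/deep_ctr/Feature_pipeline/get_criteo_feature.py numbers categorical features starting from 0, while continuous features are numbered 1-13. Wouldn't that make continuous features share embeddings with categorical features? Could you clarify?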
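If the numbering really is as described (categorical ids from 0, continuous ids 1-13), the two ranges overlap and those embedding rows would be shared. A minimal sketch of one way to keep them disjoint, assuming categorical ids are first assigned starting at 0 and then offset; the helper names and the 1000-value vocabulary are hypothetical:

```python
NUM_CONTINUOUS = 13  # Criteo integer columns I1..I13


def continuous_id(col):
    """Feature id for continuous column col (0-based): 1..13."""
    return col + 1


def categorical_id(local_id):
    """Shift a categorical id that starts at 0 past the continuous ids: 14, 15, ..."""
    return NUM_CONTINUOUS + 1 + local_id


cont_ids = {continuous_id(c) for c in range(NUM_CONTINUOUS)}
cat_ids = {categorical_id(j) for j in range(1000)}
# id 0 is used by neither group, so it can serve as padding / unknown.
# tf.nn.embedding_lookup treats ids as row indices, so the embedding
# table needs at least max(id) + 1 rows.
feature_size = max(cat_ids) + 1
```

On the 0-versus-1 question: because embedding_lookup uses each id directly as a row index, 1-based ids simply leave row 0 unused (or free for padding), as long as the table has max id + 1 rows.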