Comments (8)

cstur4 commented on July 24, 2024

@hexiangnan The pretrained FM weights are no longer available. Could you post them again? Thanks.

hexiangnan commented on July 24, 2024

Hi, try this set of parameters:
python AFM.py --keep '[1.0,0.8]' --lamda_attention 16 --hidden_factor '[256,256]' --batch_size 128 --dataset frappe --pretrain 1 --epoch 100 --valid_dimen 10 --lr 0.015

Init: train=0.7788, validation=0.7828 [7.5 s]
Epoch 1 [15.5 s] train=0.3464, validation=0.4068 [6.7 s]
Epoch 2 [15.5 s] train=0.2797, validation=0.3678 [6.7 s]
Epoch 3 [15.4 s] train=0.2468, validation=0.3522 [6.7 s]
Epoch 4 [15.4 s] train=0.2248, validation=0.3428 [6.7 s]
Epoch 5 [15.4 s] train=0.2083, validation=0.3373 [6.7 s]
Epoch 6 [15.4 s] train=0.1976, validation=0.3342 [6.7 s]
Epoch 7 [15.4 s] train=0.1843, validation=0.3298 [6.7 s]
Epoch 8 [15.4 s] train=0.1735, validation=0.3263 [6.8 s]
Epoch 9 [15.4 s] train=0.1648, validation=0.3241 [6.8 s]
Epoch 10 [15.4 s] train=0.1648, validation=0.3260 [6.8 s]
Epoch 11 [15.4 s] train=0.1517, validation=0.3202 [6.8 s]
Epoch 12 [15.4 s] train=0.1470, validation=0.3189 [6.9 s]
Epoch 13 [15.4 s] train=0.1505, validation=0.3232 [6.8 s]
Epoch 14 [15.3 s] train=0.1378, validation=0.3180 [6.9 s]
Epoch 15 [15.3 s] train=0.1468, validation=0.3235 [6.9 s]
Epoch 16 [15.4 s] train=0.1365, validation=0.3198 [6.9 s]
Epoch 17 [15.3 s] train=0.1323, validation=0.3183 [6.9 s]
Epoch 18 [15.3 s] train=0.1242, validation=0.3160 [6.9 s]
Epoch 19 [15.3 s] train=0.1319, validation=0.3202 [7.0 s]
Epoch 20 [15.3 s] train=0.1172, validation=0.3145 [7.0 s]
Epoch 21 [15.4 s] train=0.1213, validation=0.3168 [7.2 s]
Epoch 22 [15.4 s] train=0.1304, validation=0.3210 [7.4 s]
Epoch 23 [15.3 s] train=0.1159, validation=0.3157 [7.3 s]
Epoch 24 [15.3 s] train=0.1098, validation=0.3137 [7.4 s]
Epoch 25 [15.3 s] train=0.1074, validation=0.3132 [7.5 s]
Epoch 26 [15.3 s] train=0.1188, validation=0.3184 [7.5 s]
Epoch 27 [15.3 s] train=0.1173, validation=0.3182 [7.5 s]
Epoch 28 [15.3 s] train=0.1043, validation=0.3135 [7.7 s]
Epoch 29 [15.2 s] train=0.0995, validation=0.3124 [7.6 s]
Epoch 30 [15.2 s] train=0.0986, validation=0.3122 [7.7 s]
Epoch 31 [15.2 s] train=0.1158, validation=0.3187 [7.7 s]
Epoch 32 [15.1 s] train=0.0940, validation=0.3112 [7.7 s]
Epoch 33 [15.1 s] train=0.0944, validation=0.3115 [7.6 s]
Epoch 34 [15.1 s] train=0.0916, validation=0.3115 [7.8 s]
Epoch 35 [15.1 s] train=0.0941, validation=0.3126 [7.7 s]
Epoch 36 [15.2 s] train=0.0915, validation=0.3114 [7.7 s]
Epoch 37 [15.1 s] train=0.0943, validation=0.3133 [7.9 s]
Epoch 38 [15.0 s] train=0.0923, validation=0.3128 [7.7 s]
Epoch 39 [15.0 s] train=0.0877, validation=0.3111 [7.7 s]
Epoch 40 [15.1 s] train=0.1082, validation=0.3187 [7.9 s]
Epoch 41 [15.1 s] train=0.0885, validation=0.3123 [7.7 s]
Epoch 42 [15.0 s] train=0.0859, validation=0.3113 [7.8 s]
Epoch 43 [15.1 s] train=0.0874, validation=0.3126 [7.6 s]
Epoch 44 [15.2 s] train=0.0888, validation=0.3128 [7.8 s]
Epoch 45 [15.0 s] train=0.0818, validation=0.3104 [7.7 s]
Epoch 46 [15.1 s] train=0.0794, validation=0.3102 [7.6 s]
Epoch 47 [15.1 s] train=0.1055, validation=0.3186 [7.7 s]
Epoch 48 [15.0 s] train=0.0783, validation=0.3093 [7.8 s]
Epoch 49 [15.0 s] train=0.0799, validation=0.3107 [7.6 s]
Epoch 50 [15.1 s] train=0.0839, validation=0.3122 [7.8 s]

With the other parameters held fixed, adjusting lr trades convergence speed against stability: the smaller the lr, the slower but steadier the convergence; the larger the lr, the faster the convergence, but the more training fluctuates. For example, with lr = 0.015 as listed above, validation drops to 0.3102 by epoch 46, which matches the best number shown in the paper (the epoch 48 result is even better than the paper's experiment; some run-to-run variance is normal). I also tried lr = 0.005: after 60 epochs the best validation only reached 0.3174, decreasing very slowly. With lr = 0.01 it reached 0.3111 after 52 epochs; with lr = 0.02 it hit 0.3116 within 20 epochs, but the fluctuation was considerable.
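
If you want to script this comparison, here is a minimal sketch that reuses the flags from the command above and sweeps the lr values reported here:

    import subprocess

    # Run AFM.py once per learning rate, holding all other flags fixed.
    for lr in [0.005, 0.01, 0.015, 0.02]:
        subprocess.run(
            ["python", "AFM.py",
             "--keep", "[1.0,0.8]", "--lamda_attention", "16",
             "--hidden_factor", "[256,256]", "--batch_size", "128",
             "--dataset", "frappe", "--pretrain", "1",
             "--epoch", "100", "--valid_dimen", "10",
             "--lr", str(lr)],
            check=True)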

yufengwhy commented on July 24, 2024

@hexiangnan Thanks for your answer!
A few more questions:

  1. The attention regularization coefficient lamda_attention is fairly large in both settings (2 or 16). Could this make the model spend more capacity learning the attention and neglect the other parameters?
  2. Also, 0.015 * 16 = 0.24, so the attention parameters decay by 24% each step. Isn't that too much? (See the sketch after this list.)
  3. With hidden_factor '[256,256]', the embedding dimension is very large even though this dataset is actually not big. Why?
  4. The batch_size is 4096 and 128 on the two datasets, respectively. Does it have a big impact on model performance? Why?
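
For reference, a minimal sketch of the arithmetic behind question 2. It assumes the penalty contributes a gradient of lamda * w (i.e. a (lamda / 2) * ||w||^2 term in the loss), which is what makes the 24% figure come out, and a plain SGD step; AFM actually trains with Adagrad, whose effective step shrinks over time, so this is only a first-step estimate:

    # Per-step multiplicative shrinkage of the attention weights from the L2
    # term alone (data gradient ignored), under the assumptions stated above.
    lr, lamda = 0.015, 16.0
    shrink = 1.0 - lr * lamda  # 0.76: each step keeps 76% of the weight,
    print(shrink)              # i.e. the 24% decay in the question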

I have been validating your AFM model on several datasets, including avazu, an ad click-through prediction dataset with 40 million samples. That is a classification task, and on that dataset I cannot get good results no matter how I tune. Models previously proposed for this dataset (CNN, DeepFM) use roughly an embedding dimension of 30 and a learning rate of 0.0005. Looking closely at the code, the only difference between the classification and regression models is whether the last layer has an activation function. Or is the regression task itself harder than classification, so it needs a larger embedding dimension?
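
A minimal sketch of that last-layer difference (the function name and signature here are illustrative, not the repo's API):

    import numpy as np

    def output_layer(score, task):
        # Classification squashes the raw AFM score through a sigmoid (paired
        # with a log loss); regression returns it unchanged (paired with RMSE).
        if task == "classification":
            return 1.0 / (1.0 + np.exp(-score))
        return score

    print(output_layer(np.array([0.3, -1.2]), "classification"))
    print(output_layer(np.array([0.3, -1.2]), "regression"))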

Thanks for the discussion!

yufengwhy commented on July 24, 2024

@hexiangnan A question for you:
What are the optimal FM parameters that go with the optimal AFM parameters in the second comment above? I tried the AFM parameters you provided today and still did not reach the paper's results; could the difference be that the FM training parameters differ?

hexiangnan commented on July 24, 2024

Hi, and thanks for the questions! On your four points:

  1. The relatively large lamda_attention here is simply a good value found through experimental tuning. Whether it leads the model to learn the attention too much and neglect the other parameters is an interesting thought; I have not observed changes in that respect yet, but I will keep an eye on this question. Happy to keep discussing;
  2. Training uses the Adagrad optimizer, which gradually shrinks the learning rate (a sketch of this decay follows this list);
  3. Right, we also roughly ran a few experiments with smaller embeddings, such as 16 and 32, and the results degraded to varying degrees compared with 256. The paper ultimately shows 256 because we hoped a large embedding would raise the ceiling of the model's expressiveness, keeping the embedding size from limiting prediction accuracy so that the differences due to attention would stand out as clearly as possible. This admittedly lacks more experiments for a thorough comparison;
  4. batch_size does have a fairly noticeable effect on performance: too small and training oscillates, making stable convergence relatively hard; too large and GPU (or main) memory blows up while training slows down, which makes experiments cumbersome. Both extremes have an effect, but the model is not especially sensitive, so adjusting it up or down to fit your situation is fine;
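
On point 2, a minimal sketch of Adagrad's per-parameter scaling, assuming the textbook update (constant unit gradients are used just to make the decay visible):

    import math

    # Adagrad: w <- w - lr * g / (sqrt(G) + eps), where G accumulates g^2.
    lr, eps = 0.015, 1e-8
    G = 0.0  # accumulated squared gradients
    for step in range(1, 6):
        g = 1.0  # constant unit gradient, for illustration only
        G += g ** 2
        print(f"step {step}: effective lr = {lr / (math.sqrt(G) + eps):.4f}")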

We later also ran experiments applying AFM to classification problems. The embedding size stayed at 256 for frappe and movielens, though on another dataset, iPinYou, we lowered it somewhat because we ran out of GPU memory. Comparing AFM against the baselines still shows a clear gap in AFM's favor, and we did not find that regression is harder to train and therefore needs a larger embedding; that may also be because we set it large to begin with. iPinYou has 20 million samples. When you say you cannot get avazu tuned, do you mean AFM underperforms other methods, or that AFM has trouble converging, overfits, or something else?

Also, I am sorry to say I did not keep the optimal FM parameters. You can try tuning it yourself, or use the FM I pretrained:
Frappe-256: https://1drv.ms/f/s!AnAUblgCBi3Ggo9OgWctDDpGVriTuA
Movielens-256: https://1drv.ms/f/s!AnAUblgCBi3Ggo9JI-MmzPUAxYwF3g
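
To plug these into --pretrain 1, the restore is a standard TensorFlow 1.x checkpoint load. A hedged sketch follows; the variable names, shapes, and local path are assumptions for illustration, not the repo's confirmed layout:

    import tensorflow as tf  # TensorFlow 1.x API, matching the repo's era

    # Hypothetical FM parameters; adjust the feature count and factor size
    # to your dataset before restoring.
    num_features, factors = 5382, 256
    feature_embeddings = tf.Variable(
        tf.zeros([num_features, factors]), name="feature_embeddings")
    feature_bias = tf.Variable(tf.zeros([num_features, 1]), name="feature_bias")
    bias = tf.Variable(tf.zeros([1]), name="bias")

    saver = tf.train.Saver([feature_embeddings, feature_bias, bias])
    with tf.Session() as sess:
        # Hypothetical local path to the downloaded pretrain files.
        saver.restore(sess, "pretrain/fm_frappe_256/frappe_256")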

yufengwhy commented on July 24, 2024

When you say you cannot get avazu tuned, do you mean AFM underperforms other methods, or that AFM has trouble converging, overfits, or something else?

@hexiangnan On avazu, my CNN reaches a validation AUC of 0.778, while AFM (I tuned many parameter combinations) at best reaches a validation AUC of 0.752 in the first epoch and then barely moves in later epochs. On this task an AUC gap of 0.02 counts as a very large difference.

cstur4 commented on July 24, 2024

@yufengwhy Could you share the pretrained FM weights?

qk-huang commented on July 24, 2024

@cstur4 Were you able to reproduce the results on frappe? I ran FM pretraining with lr 0.005 and dropout 0.5, chosen per the paper and my experiments, and reached a best of 0.3350 on the validation set, which is roughly in line with the paper. But when training AFM, the initial validation is 0.8898, nowhere near the "Init: train=0.7788, validation=0.7828 [7.5 s]" quoted above.
