Comments (7)
Thanks for pointing it out, you are right! And this should explain #3 ...
Apologies for the glitch, which may have arisen when the code was transferred (I no longer have access to the original server). You can always try our released pre-trained VAE weights here; they should work fine.
from pcae.
I've also updated train.py.
from pcae.
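Thank you for your reply and for the pre-trained weights. Since I will later need to run experiments on my own dataset, I need to make sure the VAE_finetuning stage runs correctly. With the latest code, VAE_finetuning training still fails to converge: train loss rec stays high at around 6.0, while train loss kl drops below 0.01. Inspecting the encoder_outputs variable, I found that the values are almost identical for every sample; encoder_outputs appears to be a constant regardless of the input. After rereading the code carefully, I noticed that the VAE class defines the following linear layer, lm_head: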
self.lm_head = nn.Linear(self.decoder.config.d_model, self.shared.num_embeddings, bias=False)
In VAE_finetuning, this linear layer is replaced by the lm_head linear layer of model, i.e. the Bart model, as follows:
vae.lm_head = model.lm_head
With the latest train.py, this means the parameters of vae.lm_head are also updated by the optimizer, so I made the following change where lm_head is used:
self.lm_head.eval()
self.lm_head.requires_grad_(False)
logits = self.lm_head(decoder_outputs[0])
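These changes ensure that the lm_head parameters are not updated by the optimizer. In the training run I did afterwards, train loss rec quickly dropped to about 4.8 and kept decreasing, and train loss kl stayed above 0.01 throughout. From my results, it seems the lm_head parameters do not need to be updated by the optimizer. In your work, were the lm_head parameters updated by the optimizer?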
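For anyone who wants to reproduce the frozen-head experiment described in this comment, here is a minimal sketch of one way to do it. It freezes lm_head once at setup rather than inside forward, and passes only the trainable parameters to the optimizer. ToyVAE, its body layer, freeze_lm_head and the tensor sizes are hypothetical stand-ins, not the repository's code; only the lm_head name and the 1e-4 learning rate come from this thread.

```python
import torch
from torch import nn


class ToyVAE(nn.Module):
    """Hypothetical stand-in for the VAE class; only the lm_head name mirrors the thread."""

    def __init__(self, d_model=8, vocab_size=16):
        super().__init__()
        self.body = nn.Linear(d_model, d_model)  # assumed placeholder for encoder/decoder
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x):
        return self.lm_head(torch.tanh(self.body(x)))


def freeze_lm_head(model):
    """Disable gradients for lm_head and return the parameters that remain trainable."""
    model.lm_head.requires_grad_(False)
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
vae = ToyVAE()
trainable = freeze_lm_head(vae)
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# Snapshot the weights so the effect of one optimizer step can be checked.
head_before = vae.lm_head.weight.detach().clone()
body_before = vae.body.weight.detach().clone()

x = torch.randn(4, 8)
targets = torch.randint(0, 16, (4,))
loss = nn.functional.cross_entropy(vae(x), targets)
loss.backward()
optimizer.step()

head_unchanged = torch.equal(head_before, vae.lm_head.weight)
body_changed = not torch.equal(body_before, vae.body.weight)
print(f"lm_head unchanged: {head_unchanged}, body updated: {body_changed}")
```

Comparing the weight snapshots before and after a step is a quick way to confirm which layers the optimizer is actually updating.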
from pcae.
In our experiments, we did activate lm_head of the VAE model.
This non-convergence may be due to the specific dataset. You could try different learning rates and other hyperparameters, or train the plain Bart model on your dataset to see whether it converges.
from pcae.
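There may be a misunderstanding here. The training I just described was on the yelp dataset. Only after confirming that the current code runs correctly on yelp will I train on my own dataset. In fact, I just reread your paper and found that it sets the learning rate to 1e-4, whereas the command given under Stage 1 BART VAE Finetuning on your GitHub uses a learning rate of 5e-4. After re-enabling lm_head and using the learning rate 1e-4, training behaves normally. I suggest changing the learning rate in the Stage 1 BART VAE Finetuning command to 1e-4.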
from pcae.
Thanks for your advice. We'll improve it. You are always welcome to open pull requests and contribute to this repo if you are interested.
I'll close this issue shortly as completed if there are no more questions.
PS: Please consider commenting in English in the future, so that everyone can benefit from the Q&A section.
from pcae.
Thanks for your reply and your advice. So far, there have been no issues.
from pcae.