Comments (7)

ImKeTT commented on June 12, 2024

Thanks for pointing it out, you are right! And this should explain #3 ...

Apologies for the glitch, which may have arisen during code transfer (I no longer have access to the original server). You can always try our released pre-trained VAE weights here; they should work fine.

from pcae.

ImKeTT commented on June 12, 2024

I've also updated train.py.

HomiKetalys commented on June 12, 2024

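Thank you for your reply and for the pre-trained weights. Since I will later need to run experiments on my own dataset, I need to make sure the VAE_finetuning stage runs correctly. With the latest code, VAE_finetuning training still does not converge: train loss rec stays high at around 6.0, while train loss kl drops below 0.01. Inspecting the encoder_outputs variable, I found that the values are almost identical for every sample; encoder_outputs appears to be constant regardless of the input. After rereading the code carefully, I noticed that the VAE class defines the following linear layer, lm_head:
self.lm_head = nn.Linear(self.decoder.config.d_model, self.shared.num_embeddings, bias=False)
In VAE_finetuning, this linear layer is replaced by the lm_head of model, that is, of the BART model, as follows:
vae.lm_head = model.lm_head
With the latest train.py, this means the parameters of vae.lm_head are also updated by the optimizer, so I made the following change where lm_head is used:
self.lm_head.eval()
self.lm_head.requires_grad_(False)
logits = self.lm_head(decoder_outputs[0])
This change ensures that the lm_head parameters are not updated by the optimizer. In the training run that followed, train loss rec quickly dropped to about 4.8 and kept decreasing, and train loss kl stayed above 0.01 throughout. Judging from my results, it seems the lm_head parameters do not need to be updated by the optimizer. In your work, were the lm_head parameters updated by the optimizer?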
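
For readers who want to try the same experiment, here is a minimal sketch of freezing a head layer once at setup time instead of inside the forward pass. It is not the repository's code: TinyDecoder, its layer sizes, and the dummy data are hypothetical stand-ins, and AdamW is an assumed optimizer. Note that eval() does not change the behavior of a plain nn.Linear, so requires_grad_(False) is the call that actually stops the updates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyDecoder(nn.Module):
    """Hypothetical stand-in for the VAE: a trainable body plus an lm_head."""

    def __init__(self, d_model=8, vocab_size=16):
        super().__init__()
        self.body = nn.Linear(d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x):
        return self.lm_head(torch.tanh(self.body(x)))


model = TinyDecoder()

# Freeze the head once, before building the optimizer.
model.lm_head.requires_grad_(False)

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # optimizer choice is an assumption

head_before = model.lm_head.weight.detach().clone()
body_before = model.body.weight.detach().clone()

# One dummy training step on random data.
x = torch.randn(4, 8)
targets = torch.randint(0, 16, (4,))
loss = nn.functional.cross_entropy(model(x), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen head should be untouched; the body should have moved.
head_unchanged = torch.equal(model.lm_head.weight, head_before)
body_changed = not torch.equal(model.body.weight, body_before)
print(head_unchanged, body_changed)
```

Gradients still flow through the frozen head to the layers before it, so the rest of the model keeps training. Whether freezing is appropriate for this repository is exactly the question raised above; the sketch only shows the mechanics.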

ImKeTT commented on June 12, 2024

In our experiments, we did activate the lm_head of the VAE model.

The non-convergence may be specific to your dataset. You could try different learning rates and other hyperparameters, or train the plain BART model on your dataset to see whether it converges.

HomiKetalys commented on June 12, 2024

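There may be a misunderstanding here. The training run I just described was on the Yelp dataset. Only after confirming that the current code runs correctly on Yelp will I train on my own dataset. In fact, I just reread your paper and found that it sets the learning rate to 1e-4, whereas the command given under Stage 1 BART VAE Finetuning in your GitHub repository uses 5e-4. After re-activating lm_head and switching to a learning rate of 1e-4, training became normal. I suggest changing the learning rate in the command under Stage 1 BART VAE Finetuning to 1e-4.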

ImKeTT commented on June 12, 2024

Thanks for your advice; we'll improve it. Feel free to open a pull request and contribute to this repo if you are interested.

I'll close this issue as completed shortly if there are no more questions.

PS: Please consider commenting in English in the future so that everyone can benefit from the Q&A section.

HomiKetalys commented on June 12, 2024

Thanks for your reply and your advice. So far, there have been no further issues.
