Comments (3)
Well, I found that during training, `-logcmk(kappa)` is always ~ -420 and never changes. `torch.log(1 + kappa) * (self.lambda_vmf - (output_emb_unitnorm * target_emb_unitnorm).sum(dim=-1))` is decreasing from ~ 0.5. Is that abnormal?
from seq2seq-con.
I tried using `-approximate_vmf` in args and found that `logcmkappox(kappa, emb_size)` is always ~ -690 and never changes.
Hi EuphoriaYan,
Apologies for such a long delay in my reply.
As you can see, the acc is decreasing and the perplexity is always zero.
Sorry, the statistics are not named correctly; they are named after softmax-based models. Here "acc" means cosine distance and "x-ent" means vMF loss. Perplexity is computed on top of the reported vMF loss and comes out as 0 because the vMF values are highly negative, so it is essentially meaningless. The only two values worth monitoring are "acc" and "x-ent", and their trend looks fine, since both should be decreasing. Also, if you could let me know your final validation loss on this training set, I can judge whether the model trained well. With good token embeddings, a cosine ("acc") value below about 0.25 usually results in decent MT performance (for English).
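To make the two points above concrete, here is a minimal sketch in plain Python. It is not the repository's code: `cosine_distance` and `perplexity` are hypothetical helpers, cosine distance is assumed to be `1 - cos`, perplexity is assumed to be the usual `exp(average loss)`, and `-420.0` is only an illustrative highly negative loss value.

```python
import math


def cosine_distance(output_emb, target_emb):
    """Return 1 - cos(output_emb, target_emb); assumed meaning of "acc" here."""
    dot = sum(o * t for o, t in zip(output_emb, target_emb))
    norm_o = math.sqrt(sum(o * o for o in output_emb))
    norm_t = math.sqrt(sum(t * t for t in target_emb))
    return 1.0 - dot / (norm_o * norm_t)


def perplexity(avg_loss):
    """Softmax-style perplexity, exp(average loss); an assumed definition."""
    return math.exp(avg_loss)


# Vectors pointing the same way have distance ~0; orthogonal ones have distance 1.
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])

# A highly negative loss makes exp(loss) vanishingly small, so a log line
# that prints two decimals shows 0.00.
ppl = perplexity(-420.0)  # illustrative value, not a measured loss
ppl_as_logged = round(ppl, 2)
```

Under these assumptions, a decreasing "acc" means the predicted embeddings are moving closer to the target embeddings, while the logged perplexity carries no information.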
./fasttext skipgram -input valid.en.bpetok -output emb/en -dim 300 -thread 8
You should train the embeddings on a larger training set, not the validation set. This method needs good-quality embeddings to work. If you switch it to `train.en.bpetok`, you should be able to get better results. The English token embeddings (without BPE) that I used are provided here
/path/to/moses/scripts/tokenizer/tokenizer.perl -l zh -a -no-escape -threads 20 < train.zh > train.tok.zh
I'm not 100% sure whether Moses supports Chinese tokenization. This could be an issue.
Hope these suggestions resolve your issues :)
Sachin