Comments (6)
Those values worked well based on cross validation experiments.
from fairseq.
Why multiply by this number?
from fairseq.
That multiplication may enlarge the values of both the word embedding and positional embedding matrices.
from fairseq.
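Wow... I thought the author was replying this fast without sleeping...
I know it enlarges the value range, but how should I tune things afterwards? I'd like to know whether I should add layer_norm or do something else.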
About why we multiply by sqrt(512).
To explain: I'm testing the position encoding from the Transformer on fairseq-py. Here are my results:
Experiment 1: x = embed(tokens) * sqrt(512) + position encoding ---- loss explodes
Experiment 2: x = embed(tokens) + position encoding / sqrt(512) ---- loss decreases normally
Experiment 3: x = embed(tokens) * sqrt(512) ---- loss explodes
Experiment 4: x = embed(tokens) + position encoding ---- loss explodes
It's obviously a range problem.
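To make the range argument concrete, here is a minimal sketch that compares the magnitudes of the terms in the four experiments. It assumes the token embeddings are initialized like normal(0, 0.1) and that the position encoding is the standard sinusoidal one; the helper names `rms` and `sinusoidal_encoding` are made up for illustration and are not fairseq functions. It only compares scales and does not by itself explain the training behavior.

```python
import math
import random


def rms(values):
    """Root-mean-square magnitude of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sinusoidal_encoding(num_positions, dim):
    """Sinusoidal position encoding, flattened: sin on even indices, cos on odd."""
    out = []
    for pos in range(num_positions):
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            out.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return out


DIM = 512
random.seed(0)

# Assumption: token embeddings are initialized like normal(0, 0.1).
embed = [random.gauss(0.0, 0.1) for _ in range(64 * DIM)]
pos_enc = sinusoidal_encoding(64, DIM)
scale = math.sqrt(DIM)

embed_rms = rms(embed)                               # about 0.1
scaled_embed_rms = rms([v * scale for v in embed])   # about 2.26 (Experiments 1 and 3)
pos_rms = rms(pos_enc)                               # sqrt(0.5), about 0.707 (Experiments 1 and 4)
scaled_pos_rms = rms([v / scale for v in pos_enc])   # 1/32 (Experiment 2)

print(embed_rms, scaled_embed_rms, pos_rms, scaled_pos_rms)
```

Under these assumptions, Experiment 2 is the only setup in which every term stays at or below the scale of the initial embeddings (about 0.1); in the other three, at least one term is several times larger.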
from fairseq.
Yes, you are changing the range of the embeddings which is the problem. If you want a fixed encoding for the positions, then try to set it so that the range of the position encoding is similar to how nn.Embedding is initialized, i.e., similar to normal(0, 0.1). Also, you may want to use a custom layer instead of nn.Embedding which simply outputs those fixed position features.
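One way to read the suggestion above is sketched below: build a fixed sinusoidal table once and rescale it so that its overall root-mean-square value matches a target of 0.1, which is roughly comparable to the spread of normal(0, 0.1). The names `scaled_position_features` and `lookup` and the `target_std` parameter are hypothetical, and rescaling by RMS is an assumption about how to match the ranges, not something confirmed in this thread. The sketch uses plain Python lists; in PyTorch the same table could be stored in a custom module as a non-trainable buffer instead of in nn.Embedding.

```python
import math


def scaled_position_features(max_len, dim, target_std=0.1):
    """Fixed sinusoidal features rescaled so their overall RMS equals target_std."""
    table = []
    for pos in range(max_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    # Overall RMS of the unscaled table (sqrt(0.5) when dim is even).
    total = sum(v * v for row in table for v in row)
    table_rms = math.sqrt(total / (max_len * dim))
    factor = target_std / table_rms
    return [[v * factor for v in row] for row in table]


def lookup(table, positions):
    """Return the fixed feature rows for a list of position indices (nothing is learned)."""
    return [table[p] for p in positions]


table = scaled_position_features(max_len=128, dim=512)
features = lookup(table, [0, 1, 2])
print(len(features), len(features[0]))
```

The position features would then be added to embed(tokens) without any extra sqrt(512) factor, so both terms start out in a similar range.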
from fairseq.
Thank you. Is there any explanation for why the input should be in a small range (normal(0, 0.1)) instead of normal(0, 1)?
from fairseq.