Comments (6)
The example code you cited uses mean pooling on the token embeddings from the model's last transformer block. This doesn't require `lm_head`.
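Something like the following (a minimal sketch, not the cited example itself; `xlm-roberta-base` is just a stand-in checkpoint):

```python
# Sketch: mean pooling over the last transformer block's token embeddings.
# AutoModel loads only the encoder, so lm_head is never involved.
import torch
from transformers import AutoModel, AutoTokenizer

name = "xlm-roberta-base"  # illustrative checkpoint; substitute your own model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer(["A sentence to embed."], return_tensors="pt", padding=True)
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state    # (batch, seq_len, hidden)
mask = inputs["attention_mask"].unsqueeze(-1)          # (batch, seq_len, 1); zeroes out padding
embeddings = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, hidden)
```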
from declutr.
Hmm, your pretrained model does not have weights for `['lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight', 'lm_head.dense.bias']`.

Did you set `masked_language_modeling` to `true` in the config? If so, the model would have been loaded with `AutoModelForMaskedLM` (see here), and I would have expected those weights to have been trained. Still, maybe I am wrong and `lm_head` is not used by your particular model. I think it is still worth evaluating the model you have trained and seeing whether it performs well on your downstream tasks.
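For reference, you can see the difference between the two loading paths with a quick check (a sketch, nothing DeCLUTR-specific; `xlm-roberta-base` is only an illustrative checkpoint):

```python
# Sketch: compare the parameter sets of the two loading paths.
from transformers import AutoModel, AutoModelForMaskedLM

mlm = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

# The MLM class carries the lm_head.* parameters named in the warning...
print([k for k in mlm.state_dict() if k.startswith("lm_head.")])
# ...while the plain encoder class has no lm_head at all:
print(any(k.startswith("lm_head.") for k in encoder.state_dict()))  # False
```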
from declutr.
- the config is identical to your original `declutr.jsonnet` (apart from the min/max length issue, see #235)
- `masked_language_modeling` is indeed set to `true` (just checked):
"model": {
"type": "declutr",
"text_field_embedder": {
"type": "mlm",
"token_embedders": {
"tokens": {
"type": "pretrained_transformer_mlm",
"model_name": transformer_model,
"masked_language_modeling": true
},
},
},
"loss": {
"type": "nt_xent",
"temperature": 0.05,
},
// There was a small bug in the original implementation that caused gradients derived from
// the contrastive loss to be scaled by 1/N, where N is the number of GPUs used during
// training. This has been fixed. To reproduce results from the paper, set this to false.
// Note that this will have no effect if you are not using distributed training with more
// than 1 GPU.
"scale_fix": false
},
However, as I wrote in #118 (comment), **in the continued/restarted runs I used the first model as `from_archive`.** Is that the problem?
"model": {
"type": "from_archive",
"archive_file": "/notebooks/DeCLUTR/output_bs32_ep10/model.tar.gz"
},
- According to Hugging Face, the underlying model seems to be `XLMRobertaModel`. Does it not use the referenced `lm_head` weights in training? I doubt it...
- Something has been trained, for sure :) The embeddings are significantly different from the base model (`sentence-transformers/paraphrase-multilingual-mpnet-base-v2`) when used for semantic textual similarity, but I wonder whether I am missing something here, given that the model "complains" in this manner?

Any clarification is highly appreciated!
from declutr.
I think you are free to ignore these messages. I imagine this happens because somewhere during loading of the model, `AutoModel.from_pretrained` is used, so the weights of `lm_head` are not initialized, which is OK because we don't use them to produce sentence embeddings.
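If you want to double-check that nothing else was skipped, `from_pretrained` can return a loading report (a sketch; `your-checkpoint-dir` is a placeholder, not a path from this thread):

```python
# Sketch: inspect exactly which keys were skipped during loading.
from transformers import AutoModelForMaskedLM

model, info = AutoModelForMaskedLM.from_pretrained(
    "your-checkpoint-dir",      # placeholder: wherever your transformer weights live
    output_loading_info=True,
)
print(info["missing_keys"])     # expected by the architecture, absent from the checkpoint
print(info["unexpected_keys"])  # present in the checkpoint, unused by the architecture
# If only lm_head.* names show up as missing, the sentence embeddings are unaffected.
```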
from declutr.
I have to admit that I am not particularly familiar with the underlying `XLMRobertaModel`, but `lm_head` sounds to me like the last hidden layer (in general, you put a task-specific head on top, e.g. a softmax for classification tasks). So for embeddings I would expect `lm_head` to be used as the last layer?
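One way to inspect this (a sketch on my side; `xlm-roberta-base` merely stands in for the actual checkpoint):

```python
# Sketch: print the module layout to see where lm_head sits.
from transformers import XLMRobertaForMaskedLM

model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")
print(type(model.roberta).__name__)  # XLMRobertaModel: the encoder whose outputs become embeddings
print(model.lm_head)                 # dense -> layer_norm -> decoder: an MLM prediction head on top
```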
from declutr.
Closing this; feel free to re-open if you are still having issues.
from declutr.