Fixed it... After all, I stumbled upon the subtle change required in the config.
Honestly, I thought I had tried this several times, so perhaps it was fatigue. Anyway, the following works for multi-GPU training with allennlp v2.10; it is a subtle change from v1.1:

```jsonnet
{
  // ... other top-level keys ("dataset_reader", "model", etc.) omitted ...
  "distributed": {
    "cuda_devices": [8, 9],
  },
  "trainer": {
    // Set use_amp to true to use automatic mixed-precision during training (if your GPU supports it)
    "use_amp": true,
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": 5e-5,
      "eps": 1e-06,
      "correct_bias": false,
      "weight_decay": 0.1,
      "parameter_groups": [
        // Apply weight decay to pre-trained params, excluding LayerNorm params and biases
        [["bias", "LayerNorm\\.weight", "layer_norm\\.weight"], {"weight_decay": 0}],
      ],
    },
    "callbacks": [{"type": "tensorboard"}],
    "num_epochs": 10,
    "checkpointer": {
      // Keep only the 2 most recent checkpoints (null keeps every checkpoint)
      "keep_most_recent_by_count": 2,
    },
    "grad_norm": 1.0,
    "learning_rate_scheduler": {
      "type": "slanted_triangular",
    },
  },
}
```
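To check which parameters the `parameter_groups` regexes in the config exempt from weight decay, here is a small standalone sketch. It assumes the grouping rule works the way I understand AllenNLP's optimizer code: a parameter joins a group when `re.search` finds any of that group's regexes in its name, and unmatched parameters keep the optimizer's top-level `weight_decay`. Note that in a Jsonnet string the dot has to be written `\\.` so the regex receives a literal `\.`. This is not AllenNLP's implementation; `weight_decay_for` and the parameter names are hypothetical.

```python
import re

# Regexes from the "parameter_groups" entry in the config. In a Jsonnet string
# the dot is written "\\." so the regex engine receives "\." (a literal dot).
NO_DECAY_PATTERNS = [r"bias", r"LayerNorm\.weight", r"layer_norm\.weight"]
DEFAULT_WEIGHT_DECAY = 0.1  # top-level "weight_decay" of the optimizer


def weight_decay_for(param_name: str) -> float:
    """Hypothetical helper: weight decay a parameter would get, assuming a
    parameter joins the no-decay group when re.search finds any pattern in
    its name, and otherwise keeps the optimizer's default weight decay."""
    if any(re.search(pattern, param_name) for pattern in NO_DECAY_PATTERNS):
        return 0.0
    return DEFAULT_WEIGHT_DECAY


# Hypothetical parameter names in the style of a Hugging Face encoder.
example_names = [
    "transformer_model.encoder.layer.0.attention.self.query.weight",
    "transformer_model.encoder.layer.0.attention.self.query.bias",
    "transformer_model.encoder.layer.0.output.LayerNorm.weight",
    "transformer_model.embeddings.word_embeddings.weight",
]

for name in example_names:
    print(f"weight_decay={weight_decay_for(name)}  {name}")
```

Running it should list the `query.bias` and `LayerNorm.weight` entries with a weight decay of 0.0 and the other two with 0.1. As far as I know, AllenNLP also raises an error when a parameter matches regexes from more than one group, which this sketch does not model.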
Hi @NtaylorOX, does this work without any changes to this codebase? I started migrating DeCLUTR to allennlp>2.0.0 a while back but gave up, because every breaking change I fixed seemed to be followed by another.
Hi @JohnGiorgi,
So I did have to make a few changes, in line with the guidance in allenai/allennlp#4933.
Although I seem to have successfully modified the DeCLUTR codebase to work with allennlp v2.10, it involved a couple of crude, less-than-ideal changes on my part. I was trying to get it working on both Windows and Linux, which was a bit of a pain, and I think I ended up commenting out an assertion somewhere to get it to work. At least allennlp itself is no longer changing underneath the codebase now.
I have been meaning to take the time to make it cleaner and more robust before submitting a pull request.
If it would be helpful for you, I can submit one anyway, or just share the code with you directly.
Let me know how you want to proceed.
Hi @NtaylorOX, yeah, I would definitely be interested in an update that works on AllenNLP > 2.0. I think the big thing I'd need before merging it is a demonstration that models train to the same loss and reach the same downstream performance.
Hi @JohnGiorgi, sorry I went so quiet on this; I got swamped with other things.
I am still planning to find a day to act on this. I am also beginning to migrate DeCLUTR's functionality directly to the transformers library, which will hopefully make your awesome architecture/algorithm more straightforward to use with what seems to have become the library of choice for NLP work.
Will try to keep you posted on both fronts.
Thanks
Wow, that sounds great! Yeah, keep me updated and let me know if you have any questions or if there's anything I can help with.
Related Issues
- Cant set up DECLUTR in local AWS linux machine
- argument 'lazy' for dataset_reader
- Superclass initialization in token embedder
- Could not lex the character code 194
- Minimum text length violated despite preprocessing
- How to plot the learning curve from the output logs created post training of declutr?
- Impact of "shorter" documents (span, number of tokens) for extended pretraining
- Installation issue
- Wrong training procedure?
- Strange issue occuring during Training
- load pretrained tf1 model with pytorch
- How to integrate a longer sequence model like longformer into declutr architecture
- Encoder class breaks for long strings
- can i finetune the model ?
- Update DeCLUTR requirements?
- How to use a validation dataset when training?
- RuntimeError: Error(s) in loading state_dict for DeCLUTR:
- Error while encoding
- Installation fails in colab notebook