Comments (2)
Mostly based on intuition. We wanted to train long enough that the resulting model would be reasonably "within distribution", so we could study the phenomena of interest, but we also don't have unlimited data and budget to keep training or to sweep over many different step counts.
100K updates at a batch size of 256 seemed sensible based on informal past domain adaptation experiments. But the RoBERTa paper recommends extremely large batch sizes (e.g. 2048), so we converted that budget accordingly: 100K * 256 / 2048 = 12.5K updates.
from dont-stop-pretraining.
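The conversion in the comment above just holds the total number of training examples seen constant while changing the batch size. A minimal sketch (the function name is illustrative, not from the repo):

```python
def equivalent_updates(updates: int, batch_size: int, new_batch_size: int) -> int:
    """Updates needed at new_batch_size to see the same total examples."""
    # Total examples seen = updates * batch_size; keep that product fixed.
    return updates * batch_size // new_batch_size

# 100K updates at batch size 256 -> 12.5K updates at batch size 2048
print(equivalent_updates(100_000, 256, 2048))  # 12500
```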
ok, got it. Thanks!
from dont-stop-pretraining.
Related Issues (20)
- Reproduce the result of Chemprot using RoBERTa HOT 4
- Dataset for DAPT HOT 1
- Datasets for DAPT HOT 1
- This problem occurs when running the script: allennlp.common.checks.ConfigurationError: key 'data_loader' is required HOT 1
- Pytorch-transformer and Allennlp Compatibility HOT 9
- Accessing data: 403 Forbidden HOT 1
- Regarding MLM Loss of Lrob and Ldapt HOT 1
- Extend to T5 models HOT 2
- About data selection
- ImportError SpacyTokenizer on vampire branch allennlp-1.0
- Fail to reproduce the work HOT 4
- TAPT dataset HOT 1
- Pre-train commands: where is the `ADAPTIVE_PRETRAINING.md` file for DAPT/TAPT commands? HOT 1
- When doing domain-adaptive pretraining, it seems the vocabulary cannot be extended?
- TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType HOT 2
- How to preprocess the data?
- Hello! Why do I keep hitting strange problems when running? It shows /bin/sh: 1: allennlp: not found, and "Command allenlp train --include-apckage dont_stop_pretraining training_config/classifier.jsonnet -s model_logs/citation-intent-base" returned non-zero exit status 127 HOT 1
- Does DAPT lead to forgetting over the original LM domain or overfitting over the target domain?
- allennlp.common.checks.ConfigurationError: Extra parameters passed to PretrainedTransformerIndexer: {'do_lowercase': False} HOT 6
- Does more steps of pretraining lead to better encoder for downstream tasks?