Comments (6)
As of f0c79cc, I have moved the dropouts from before layer norm to after it. It doesn't make sense to drop the input channels to layer norm: because layer norm normalizes across the channel dimension, dropping its inputs causes a distribution mismatch between training and inference. We shall see how this improves the model.
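A minimal sketch of what that ordering change might look like, assuming a pre-norm residual block; `ResidualBlock`, `sublayer`, and `p_drop` are illustrative names, not code from this repo:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative pre-norm residual block (not this repo's code).

    Dropout is applied to the layer-norm *output* rather than its input,
    so the statistics layer norm sees are the same at training and
    inference time.
    """
    def __init__(self, d_model, sublayer, p_drop=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)  # normalizes the channel dim
        self.drop = nn.Dropout(p_drop)
        self.sublayer = sublayer           # e.g. a conv or self-attention module

    def forward(self, x):                  # x: (batch, length, d_model)
        # norm -> dropout -> sublayer -> residual add
        return x + self.sublayer(self.drop(self.norm(x)))
```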
To work around your GPU memory constraints, what about just decreasing the batch size?
On a 1080 Ti (11 GB), I'm able to run 128 hidden units, 8 attention heads, glove_dim 300, and char_dim 300 with a batch size of 12. At batch size 16 and above, CUDA runs out of memory. Accuracy seems comparable so far.
You have a valid point, and I would like to know how your experiment goes. I would also suggest trying group norm instead of layer norm, as the Group Normalization paper reports better performance at small batch sizes.
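A rough illustration of the suggested swap (illustrative values, not this repo's configuration): `nn.GroupNorm` computes statistics per sample over channel groups, so its behaviour does not depend on batch size.

```python
import torch
import torch.nn as nn

d_model = 128  # hidden size from the comment above

# Group norm over channel groups; num_groups=8 is an arbitrary
# illustrative choice, not a value from the repo.
group_norm = nn.GroupNorm(num_groups=8, num_channels=d_model)

# GroupNorm expects a (batch, channels, length) layout, matching the
# conv blocks discussed in this thread. Statistics are computed per
# sample, so small batches behave the same as large ones.
x = torch.randn(12, d_model, 400)  # batch of 12, sequence length 400
y = group_norm(x)
```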
Good suggestion, Min. Since the paper compares against batch norm, have you found that layer norm generally outperforms batch norm lately? One could also try batch norm for comparison. Interestingly, the break-even point between batch norm and group norm is around batch size 12 under that paper's conditions. Layer norm is supposedly more robust to small mini-batches than batch norm.
Also, the configuration from the comment above runs fine on a 1070 GPU.
Do you have a sense of whether model parallelization across multiple GPUs is worth it for this type of model?
Hi @mikalyoung, I haven't tried parallelisation across multiple GPUs, so I don't know the best way to go about it. I've heard that data parallelism is easier to get working than model parallelism. From #15 it seems that a bigger hidden size and more attention heads improve performance, so I would try fitting the bigger model with smaller batches across multiple GPUs.
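A minimal data-parallelism sketch along those lines; a placeholder module stands in for the QANet model, and this is not code from this repo:

```python
import torch
import torch.nn as nn

# Placeholder for the real QANet model; any nn.Module works the same way.
model = nn.Linear(128, 128)

# nn.DataParallel replicates the module on every visible GPU and splits
# each input batch along dim 0, so a batch of 24 on two GPUs runs as two
# per-GPU batches of 12.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```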
What is the current status of reproducing the paper's results?
Related Issues (20)
- https://nlp.stanford.edu/data/glove.840B.300d.zip
- how to predict answers for custom question and context by reusing loaded model
- The embedding projection
- Train Models With Macs
- mask_logits in layer.py
- layer normalization in layer?
- Inference on my machine differs from other machines I tested
- Is this snippet in prepro.py correct
- Training stops after some time
- how to adapt it for squad2.0 dataset?
- This repo cannot reproduce the result of original paper
- RuntimeError('cannot join current thread',) in <object repr() failed>
- How do I resume training off a checkpoint?
- how to train by changing/adding batch_size?
- Parameter setting problem
- conv_block problem
- problem about highwaynet
- why did u add this kind of dropout in every residual block
- ValueError: could not convert string to float: 'bewildered'
- Predict