hmm-rnn's Issues
Tagging
Accuracy -- does the most common tag of the predicted word match the gold tag?
Perplexity -- p(w) --> p(t) against gold
HMM numbers are 1-best cluster, not marginal
Model | LM Ppl (val) | UPOS | PTB |
---|---|---|---|
hmm_none_h900_lr0.001_drop0.0_ramsprop_wd0.0 | 304.09 | 68.23 | 52.36 |
hmm_word_h200_lr20.0_drop0.0_sgd_wd0.0 | 288.15 | 61.66 | 45.16 |
hmm-g_none_h900_lr0.001_drop0.0_ramsprop_wd0.0 | 243.51 | 59.64 | 44.62 |
rnn-1_word_h850_lr0.002_drop0.2_ramsprop | 207.95 | 48.54 | 36.68 |
rrnn-r_word_h800_lr0.002_drop0.6_ramsprop | 88.91 | 52.63 | 43.06 |
elman_word_h850_lr0.002_drop0.4_ramsprop | 87.27 | 54.59 | 44.97 |
lstm_word_h650_lr10.0_drop0.6_sgd | 80.61 | 55.08 | 45.75 |
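The accuracy metric described above can be sketched as a many-to-one mapping: each predicted label (for the HMMs, the 1-best cluster) is assigned the gold tag it co-occurs with most often, and accuracy is then computed under that mapping. This is a minimal sketch of one reading of the metric, not the repository's evaluation code; the function name `many_to_one_accuracy` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(predicted, gold):
    """Map each predicted label to its most frequent gold tag, then score.

    Hypothetical helper: `predicted` holds one label per token (e.g. a
    1-best cluster id), `gold` holds the gold tag for the same token.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    # Count how often each gold tag co-occurs with each predicted label.
    cooccurrence = defaultdict(Counter)
    for p, g in zip(predicted, gold):
        cooccurrence[p][g] += 1
    # Assign every predicted label its single most common gold tag.
    mapping = {p: counts.most_common(1)[0][0] for p, counts in cooccurrence.items()}
    correct = sum(mapping[p] == g for p, g in zip(predicted, gold))
    return correct / len(gold), mapping


# Toy data, made up for illustration only.
clusters = [3, 3, 7, 7, 7, 1]
gold_tags = ["DET", "DET", "NOUN", "VERB", "NOUN", "ADP"]
accuracy, mapping = many_to_one_accuracy(clusters, gold_tags)
```

Because the mapping is fit on the same tokens it is scored on, this kind of number is an upper bound for a given clustering and tends to rise with the number of clusters.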
LM experiments
- Perform sanity checks that models behave roughly as before with updated implementation.
- Add dropout and do a minimal amount of hyperparameter tuning (although good LM performance will require better optimization techniques).
- Run experiments to compare models on PTB setup (once available).
Best Train/Val Numbers
Model | Train | Val | MostCommon |
---|---|---|---|
lstm.sgd.drop0.dim650.lr10.trshdecay4.drop06 | 51.860 | 80.610 | 0.3757 |
elman.ramsprop.drop0.4.dim850.lr0.002.trshdecay10.wdecay1e5 | 50.240 | 87.270 | 0.3858 |
rrnn-r.ramsprop.drop0.6.dim800.lr0.002.trshdecay10.wdecay1e5 | 56.370 | 88.910 | 0.3525 |
rnn-3.ramsprop.drop0.2.dim900.lr0.002.trshdecay10.wdecay1e5 | 77.130 | 107.450 | |
rnn-2.ramsprop.drop0.5.dim850.lr0.002.trshdecay10.wdecay1e5 | 113.950 | 162.410 | |
rnn-1.ramsprop.drop02.dim850.lr0.002.trshdecay10.wdecay1e7 | 201.350 | 207.950 | 0.2898 |
hmm-g none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 195.910 | 243.510 | 0.3977 (max) |
hmm-new none ramsprop.drop0.dim900.lr0.002.trshdecay10 | 233.220 | 284.590 | |
hmm+1 none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 208.090 | 287.000 | |
hmm word h200 e200 lr20.0 drop0.0 sgd wd0.0 pat5 tieE | 210.630 | 288.150 | 0.4354 (marg) |
hmm-new-c word ramsprop.drop0.dim900.lr0.002.trshdecay10 | 245.420 | 288.620 | |
hmm-new-rnn-emit none ramsprop.drop0.dim900.lr0.002.trshdecay10 | 202.570 | 299.580 | |
hmm none h900 e900 lr0.001 drop0.0 ramsprop wd0.0 pat5 tieE | 246.080 | 304.090 | 0.5002 (marg) |
hmm-new-elman-hmm-emit ramsprop.drop0.dim900.lr0.002.trshdecay10 | 325.140 | 343.040 | |
hmm+1 word h200 e200 lr10.0 drop0.0 sgd wd0.0 pat5 tieE | 327.890 | 351.530 | |
PTB LM setup
- Load PTB data.
- Compute perplexity on training and validation data.
- Train with truncated backpropagation through time.
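Two of the steps above can be illustrated without any framework: perplexity as the exponentiated average negative log-probability per token, and the slicing of a token stream into fixed-length windows for truncated backpropagation through time. This is a rough sketch under assumptions; the names `perplexity` and `bptt_windows` are hypothetical and the repository's own data pipeline may differ.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def bptt_windows(token_ids, bptt):
    """Yield (inputs, targets) chunks of at most `bptt` steps.

    Targets are the inputs shifted by one position. In training, the
    hidden state would be carried across windows but cut off from the
    gradient computation at each window boundary.
    """
    for start in range(0, len(token_ids) - 1, bptt):
        end = min(start + bptt, len(token_ids) - 1)
        yield token_ids[start:end], token_ids[start + 1:end + 1]


# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 8)

# Toy token stream 0..6 cut into windows of 3 steps.
windows = list(bptt_windows(list(range(7)), bptt=3))
```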
Models to implement
Add here whatever ideas we have and want to implement:
- RAN and other additive RNN variants
- HMM with delayed softmax in emission distribution
Full Work List
Models
- LSTM
- RAN -- Implemented simplified version (RRNN)
- Elman (sigmoid)
- Elman (softmax)
- Elman (early softmax) -- decomposed hidden+input
- HMM (delayed emission) -- Implemented, but the tensor expands use a lot of GPU memory, so only small hidden state sizes (up to 150) are feasible.
- HMM (delayed transition)
- HMM (still w/ word cond)
- HMM (vanilla)
TODO
Max dim always 1024
- LSTM --@janmbuys Tuning
-- No Dropout
-- SGD (two strategies)
-- Dims
-- LRs
- Implement the items above
- LogSpace HMM -- made a tweak; it now seems to get perplexities very close to the probability-space version.
- GridSearch -- Try and overfit
- Elman (3) -- @ybisk attempting to tune
- Elman (4)
- Elman (5)
- HMM (6)
- HMM (7)
- HMM (8)
- HMM (9)
- Total parameter calculation - implemented
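For the log-space HMM item in the list above, a useful sanity check is that the forward algorithm gives the same sequence likelihood in probability space and in log space. The sketch below uses a tiny tabular HMM with made-up parameters; it is an illustration of the check, not the repository's neural parameterization, and all names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_prob(init, trans, emit, obs):
    """Sequence likelihood via the forward algorithm in probability space."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def forward_log(init, trans, emit, obs):
    """Same recursion on log-probabilities; returns the log-likelihood."""
    n = len(init)
    log_alpha = [math.log(init[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        log_alpha = [
            logsumexp([log_alpha[r] + math.log(trans[r][s]) for r in range(n)])
            + math.log(emit[s][o])
            for s in range(n)
        ]
    return logsumexp(log_alpha)


# Toy 2-state, 3-symbol HMM; all numbers are made up for illustration.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 1]

lik = forward_prob(init, trans, emit, obs)
log_lik = forward_log(init, trans, emit, obs)
```

On short sequences the two agree to floating-point precision; on long sequences the probability-space version underflows, which is the usual reason to prefer the log-space recursion.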