Comments (8)
Sorry for the slow reply! I've been busy fine-tuning the hyperparameters, fixing small bugs, and reproducing the results. The latest model reaches a test perplexity of 46.47 using LSTM 2048-256, a single-layer LSTM with 2048 hidden units and a 256-dimensional projection layer. You can run the model on a single GPU (1080 Ti or higher) to reproduce the results.
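For reference, the reported test perplexity is the exponential of the average per-token cross-entropy (in nats). A minimal sketch, with purely illustrative loss values rather than the actual run's numbers:

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood, in nats).
# The loss values below are illustrative only, not from the actual run.
token_nll = [3.9, 3.7, 3.95, 3.8]          # per-token cross-entropy losses
avg_nll = sum(token_nll) / len(token_nll)  # mean cross-entropy
ppl = math.exp(avg_nll)                    # perplexity
print(round(ppl, 2))
```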
from pytorch_gbw_lm.
Thanks for the reply! While reading the reference "Exploring the Limits of Language Modeling"
and your code, I noticed that the original "LSTM Projected" cell performs the projection and passes the projected result as the state to the next time step. In your implementation, however, you project only the LSTM output and do not pass the projected result between time steps as the state. Is that intended? Please correct me if I misunderstood anything. Thank you very much for your kind attention.
Yes, it was intended. I wanted to use the PyTorch LSTM module with its cuDNN optimizations, and it currently doesn't support projecting the internal state. I also believe the projection layer is primarily used to reduce the number of parameters in the embedding and softmax layers.
Passing the projected result as the state for the next step should provide a minor speed boost, since the recurrent weight matrices would then operate on the smaller projected vector.
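A minimal sketch of the output-only projection described above (sizes follow the LSTM 2048-256 description; class and variable names are illustrative, not the repo's actual code): the cuDNN-backed `nn.LSTM` runs unchanged, the projection is applied to its outputs, and the full-size hidden state is what carries over between steps.

```python
import torch
import torch.nn as nn

class LSTMWithOutputProjection(nn.Module):
    """Illustrative sketch: run the cuDNN-backed nn.LSTM unchanged,
    then project its outputs. The projected result is NOT fed back
    as the recurrent state, so the hidden state stays at 2048."""

    def __init__(self, input_size=256, hidden_size=2048, proj_size=256):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, proj_size, bias=False)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)  # out: (batch, seq, hidden_size)
        return self.proj(out), state      # projected outputs, full-size state

x = torch.randn(2, 5, 256)
model = LSTMWithOutputProjection()
y, (h, c) = model(x)
print(y.shape, h.shape)  # torch.Size([2, 5, 256]) torch.Size([1, 2, 2048])
```

As an aside, more recent PyTorch releases add a `proj_size` argument to `nn.LSTM` that implements the projected-state (LSTMP) behaviour, projecting the hidden state itself while keeping cuDNN support.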
I see. That makes sense.
Also, I noticed that when calculating the loss, the TensorFlow implementation uses a mask that sets the weight to 0 at sentence boundaries, while yours doesn't. Is that also intentional? It would give different results at evaluation time, right?
Where is that in the TensorFlow / PyTorch implementation?
Oh, it's in the TF repo mentioned in the references of your README.
https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L102-L103
https://github.com/rafaljozefowicz/lm/blob/master/data_utils.py#L100-L110
I don't believe it should make a difference. The weights are always set to 1, since the entire vector is filled.
This repository removed the weight vector completely (ICLR 2017 - Factorized LSTM).
It was forked from the original repo: https://github.com/okuchaiev/f-lm
Right, I misinterpreted it. It's always filled with ones, except there could be a few zeros in the last batch. It shouldn't make a big difference.
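The masking discussed above can be sketched as follows (names and numbers are illustrative): a per-token weight vector zeroes out selected positions so they don't contribute to the loss. When the mask is all ones, the masked and unmasked losses coincide, which is why it shouldn't matter except for a few trailing zeros in the last batch.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a per-token loss mask: positions with
# weight 0 are excluded from the loss average.
logits = torch.randn(6, 10)                    # 6 tokens, vocab of 10
targets = torch.randint(0, 10, (6,))
mask = torch.tensor([1., 1., 1., 1., 0., 0.])  # last two tokens masked out

per_token = F.cross_entropy(logits, targets, reduction="none")
masked_loss = (per_token * mask).sum() / mask.sum()  # mean over unmasked only
unmasked_loss = per_token.mean()                     # mean over all tokens
print(masked_loss.item(), unmasked_loss.item())
```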