Nice work! I have a question regarding the result: In the paper "Exploring the lim

state of the art performance? about pytorch_gbw_lm HOT 8 CLOSED

rdspring1 commented on August 23, 2024

state of the art performance?

from pytorch_gbw_lm.

Comments (8)

rdspring1 commented on August 23, 2024

Sorry for the slow reply! I've been busy fine-tuning the hyperparameters, fixing small bugs, and reproducing the results. The latest model reports a test perplexity of 46.47 using LSTM 2048-256, a single-layer LSTM with a 256-dimensional projection layer. You can reproduce the results on a single GPU (1080 Ti or higher).

eric-haibin-lin commented on August 23, 2024

Thanks for the reply! As I was reading the reference "Exploring the Limits of Language Modeling" and your code, I noticed that the original "LSTM Projected" cells actually perform the projection and pass the projected result as the state to the next time step. In your implementation, you project only the LSTM output and do not pass the projected result as the state between time steps. Is that intended? Please correct me if I misunderstood anything. Thank you very much for your kind attention.

rdspring1 commented on August 23, 2024

Yes, it was intended. I wanted to use the PyTorch LSTM module with its cuDNN optimizations. It currently doesn't support the projection operation on the internal state. I also believed the projection layer is primarily used to reduce the number of parameters in the embedding and softmax layers.

Passing the projected result as the state for the next step should provide a minor speed boost.
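To make the difference between the two variants concrete, here is a rough weight count. It is only a sketch under assumed sizes: "2048-256" is read as 2048 hidden units with a 256-dimensional projection, the 256-dimensional input embedding is an assumption, biases are ignored, and the function names are hypothetical.

```python
def lstm_gate_weights(input_size, hidden_size, recurrent_size):
    """Weights in the four LSTM gates (biases ignored)."""
    # Each of the 4 gates has one input matrix and one recurrent matrix.
    return 4 * hidden_size * (input_size + recurrent_size)


def output_projection_variant(input_size, hidden_size, proj_size):
    # Standard LSTM: the full hidden state (hidden_size) is fed back,
    # and a separate linear layer projects the output afterwards.
    gates = lstm_gate_weights(input_size, hidden_size, hidden_size)
    projection = hidden_size * proj_size
    return gates + projection


def recurrent_projection_variant(input_size, hidden_size, proj_size):
    # "LSTM Projected": the projected state (proj_size) is fed back,
    # so the recurrent matrices shrink from hidden x hidden to hidden x proj.
    gates = lstm_gate_weights(input_size, hidden_size, proj_size)
    projection = hidden_size * proj_size
    return gates + projection


# Assumed sizes: 256-dim input, 2048 hidden units, 256-dim projection.
out_proj = output_projection_variant(256, 2048, 256)
rec_proj = recurrent_projection_variant(256, 2048, 256)
print(out_proj, rec_proj)
```

Under these assumptions, the only term that changes between the two variants is the recurrent matrix in each gate; the embedding and softmax layers see a 256-dimensional vector either way.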

eric-haibin-lin commented on August 23, 2024

I see. That makes sense.

Also, I noticed that when calculating the loss, the TensorFlow implementation uses a mask that is set to 0 at sentence boundaries, while yours doesn't. Is that also intentional? That would give different results at evaluation, right?

rdspring1 commented on August 23, 2024

Where is that in the TensorFlow / PyTorch implementation?

eric-haibin-lin commented on August 23, 2024

Oh, it's the TF GitHub repo mentioned in the references in your README.
https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L102-L103
https://github.com/rafaljozefowicz/lm/blob/master/data_utils.py#L100-L110

rdspring1 commented on August 23, 2024

I don't believe it should make a difference. The weights are always set to 1, since the entire vector is filled.

The f-lm repository (ICLR 2017, Factorized LSTM), which was forked from the original repo, removed the weight vector completely: https://github.com/okuchaiev/f-lm

eric-haibin-lin commented on August 23, 2024

Right, I misinterpreted it. The weight vector is always filled with ones, except that there could be a few zeros in the last batch. It shouldn't make a big difference.
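A small sketch of why the weight vector only matters for a partially filled batch. This is an illustration with made-up per-token losses, not the code from either repository; `masked_mean_nll` is a hypothetical helper.

```python
import math


def masked_mean_nll(token_nll, weights):
    """Mean negative log-likelihood over positions whose weight is 1."""
    total = sum(loss * w for loss, w in zip(token_nll, weights))
    count = sum(weights)
    return total / count


# Hypothetical per-token losses (made-up numbers).
full_batch = [2.0, 4.0, 3.0, 3.0]
full_weights = [1.0, 1.0, 1.0, 1.0]   # batch completely filled: all ones

last_batch = [2.0, 4.0, 0.7, 0.7]     # final two positions are padding
last_weights = [1.0, 1.0, 0.0, 0.0]   # zeros exclude the padding

# With all-ones weights, the masked mean equals the plain mean.
unmasked_full = sum(full_batch) / len(full_batch)
masked_full = masked_mean_nll(full_batch, full_weights)

# With padding, the plain mean is skewed by the padded positions.
unmasked_last = sum(last_batch) / len(last_batch)
masked_last = masked_mean_nll(last_batch, last_weights)

# Perplexity is the exponential of the mean negative log-likelihood.
perplexity = math.exp(masked_last)
print(masked_full, unmasked_full, masked_last, unmasked_last, perplexity)
```

If only the final batch contains zeros, the effect on perplexity over the whole test set is small, since those few positions are a tiny fraction of all tokens.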
