

rdspring1 commented on July 21, 2024

Sorry for the slow reply! I've been busy fine-tuning the hyperparameters, fixing small bugs, and reproducing the results. The latest model reports a test perplexity (ppl) of 46.47 using LSTM 2048-256, a single-layer LSTM with 2048 hidden units and a 256-dimensional projection layer. You can reproduce the results on a single GPU (1080 Ti or higher).
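
For reference, perplexity is the exponential of the mean per-token negative log-likelihood on the test set. A minimal pure-Python sketch, assuming natural-log probabilities; the function name is illustrative and not taken from pytorch_gbw_lm:

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood) over all test tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    target token (an assumption; base-2 logs would need 2 ** ... instead).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every target probability 1/46.47 has ppl 46.47.
p = 1 / 46.47
print(round(perplexity([math.log(p)] * 10), 2))  # 46.47
```

Intuitively, a perplexity of 46.47 means the model is, on average, as uncertain as if it were choosing uniformly among about 46 words at each step.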

from pytorch_gbw_lm.

eric-haibin-lin commented on July 21, 2024

Thanks for the reply! While reading the reference paper "Exploring the Limits of Language Modeling" alongside your code, I noticed that the original "LSTM Projected" cells actually perform the projection and pass the projected result as the state to the next time step. Your implementation, however, projects only the LSTM output and does not pass the projected result as the state between time steps. Was that intentional? Please correct me if I have misunderstood anything. Thank you very much for your kind attention.
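
To make the distinction concrete, here is a toy pure-Python sketch (biases omitted, tiny dimensions, random weights; all names are hypothetical and come from neither codebase). With `projected_recurrence=True` it behaves like the projected cell described in the paper, where the P-dimensional projection is the recurrent state. With `False` it mirrors the approach discussed in this thread: the full H-dimensional LSTM output is fed back, and the projection is applied only to the output passed upward.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell(x, h_prev, c_prev, W_x, W_h):
    """One bias-free LSTM step; gate rows are stacked as i, f, g, o.

    W_x: 4H x input_dim, W_h: 4H x len(h_prev).
    Returns the unprojected output o * tanh(c) and the new cell state c.
    """
    H = len(c_prev)
    z = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h_prev))]
    i = [sigmoid(v) for v in z[0:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    g = [math.tanh(v) for v in z[2 * H:3 * H]]
    o = [sigmoid(v) for v in z[3 * H:4 * H]]
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    out = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return out, c


def run(xs, H, P, projected_recurrence, seed=0):
    rng = random.Random(seed)
    rec_dim = P if projected_recurrence else H  # size of the state fed back
    W_x = rand_matrix(4 * H, len(xs[0]), rng)
    W_h = rand_matrix(4 * H, rec_dim, rng)
    W_p = rand_matrix(P, H, rng)  # projection H -> P
    h, c = [0.0] * rec_dim, [0.0] * H
    outputs = []
    for x in xs:
        out, c = lstm_cell(x, h, c, W_x, W_h)
        proj = matvec(W_p, out)
        # LSTMP feeds the projection back; the output-only variant feeds
        # the H-dim output back and projects only what goes to the next layer.
        h = proj if projected_recurrence else out
        outputs.append(proj)
    return outputs, h


xs = [[0.1, -0.2, 0.3]] * 5
outs_p, h_p = run(xs, H=8, P=2, projected_recurrence=True)
outs_o, h_o = run(xs, H=8, P=2, projected_recurrence=False)
print(len(outs_p[-1]), len(h_p))  # 2 2
print(len(outs_o[-1]), len(h_o))  # 2 8
```

Both variants emit P-dimensional outputs; they differ only in what is carried to the next time step, and therefore in the shape of the recurrent matrix `W_h` (4H x P versus 4H x H).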

rdspring1 commented on July 21, 2024

Yes, it was intentional. I wanted to use the PyTorch LSTM module with its cuDNN optimizations, which currently doesn't support the projection operation on the internal state. I also believe the projection layer is primarily used to reduce the number of parameters in the embedding and softmax layers.

Passing the projected result as the state for the next step should provide a minor speed boost.
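
Rough arithmetic behind the two points above (fewer parameters, a cheaper recurrence). This sketch assumes a 256-dimensional input embedding and one bias vector per gate; the sizes and function names are illustrative assumptions, not read from either codebase.

```python
def lstm_params(input_dim, hidden, proj=None):
    """Weight count of one LSTM layer.

    Assumes one bias vector per gate (PyTorch's nn.LSTM keeps two,
    bias_ih and bias_hh, so it has 4 * hidden more).
    proj=None: standard LSTM; the recurrent input is the hidden-dim output.
    proj=P:    LSTMP; the recurrent input is the P-dim projection, plus a
               P x hidden projection matrix inside the cell.
    """
    rec_dim = hidden if proj is None else proj
    gates = 4 * hidden * (input_dim + rec_dim) + 4 * hidden
    projection = 0 if proj is None else proj * hidden
    return gates + projection


def recurrent_macs_per_step(hidden, rec_dim):
    # Multiply-accumulates in the recurrent matmul W_h @ h_prev per time step.
    return 4 * hidden * rec_dim


# Assumed sizes: 256-dim input, 2048 cells, 256-dim projection.
I, H, P = 256, 2048, 256

# Standard LSTM followed by a separate H -> P output projection.
output_only = lstm_params(I, H) + H * P
# Projection inside the recurrence (LSTMP).
lstmp = lstm_params(I, H, proj=P)

print(output_only, lstmp)  # 19406848 4726784
print(recurrent_macs_per_step(H, H) // recurrent_macs_per_step(H, P))  # 8
```

Under these assumptions, feeding the projection back shrinks the recurrent matmul by a factor of H / P = 8. The same factor applies to a softmax weight matrix that reads the P-dimensional output (V x P instead of V x H for a vocabulary of V words), which is where the embedding and softmax savings come from.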

eric-haibin-lin commented on July 21, 2024

I see. That makes sense.

Also, I noticed that when calculating the loss, the TensorFlow implementation uses a mask that sets 0 at sentence boundaries, while yours doesn't. Is that also intentional? That would give a different result on evaluation, right?
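
A minimal sketch of what such a weight mask does to the loss, in pure Python with hypothetical names. It is not the referenced TensorFlow code; whether that code divides by the sum of the weights or by the number of positions is not established here, so both options are shown. When every weight is 1, the masked loss equals the ordinary mean, so a mask only changes the result at positions whose weight is 0.

```python
import math


def masked_xent(log_probs, targets, weights, normalize_by_weights=True):
    """Mean cross-entropy where each position is scaled by a 0/1 weight.

    log_probs[t][k]: natural-log probability of class k at position t.
    targets[t]: index of the correct class at position t.
    weights[t]: 1.0 for real targets, 0.0 for positions to ignore.
    normalize_by_weights: divide by sum(weights) (mean over real tokens)
    instead of by the number of positions.
    """
    total = sum(-lp[y] * w for lp, y, w in zip(log_probs, targets, weights))
    denom = sum(weights) if normalize_by_weights else len(weights)
    return total / denom if denom > 0 else 0.0


log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.25), math.log(0.75)],
    [math.log(0.9), math.log(0.1)],
]
targets = [0, 1, 0]

unmasked = masked_xent(log_probs, targets, [1.0, 1.0, 1.0])  # plain mean
masked = masked_xent(log_probs, targets, [1.0, 1.0, 0.0])    # drops position 2
```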

rdspring1 commented on July 21, 2024

Where is that in the TensorFlow / PyTorch implementation?

eric-haibin-lin commented on July 21, 2024

Oh, it's the TF GitHub repo listed in the references in your README.
https://github.com/rafaljozefowicz/lm/blob/master/language_model.py#L102-L103
https://github.com/rafaljozefowicz/lm/blob/master/data_utils.py#L100-L110

rdspring1 commented on July 21, 2024

I don't believe it should make a difference. The weights are always set to 1, since the entire vector is filled.

This repository, https://github.com/okuchaiev/f-lm (ICLR 2017, Factorized LSTM), removed the weight vector completely. It was forked from the original repo.

eric-haibin-lin commented on July 21, 2024

Right, I misinterpreted it. It's always filled with ones, except that there could be a few zeros in the last batch. That shouldn't make a big difference.
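
To illustrate why the weights are ones everywhere except a few trailing positions, here is a deliberately simplified batcher with hypothetical names. It cuts the token stream into contiguous fixed-size blocks, unlike the per-row sentence streams of the referenced data_utils.py. Each block yields inputs x, next-token targets y, and weights w; positions past the end of the data are padded and get weight 0, so only the final block can contain zeros.

```python
def make_batches(tokens, batch_size, num_steps, pad_id=0):
    """Yield (x, y, w), each a batch_size x num_steps nested list.

    y[r][s] is the token that follows x[r][s]; w[r][s] is 1.0 for a real
    (input, target) pair and 0.0 for padding past the end of the data.
    """
    n_pairs = len(tokens) - 1  # the last token has no next-token target
    per_batch = batch_size * num_steps
    for start in range(0, n_pairs, per_batch):
        x, y, w = [], [], []
        for r in range(batch_size):
            xr, yr, wr = [], [], []
            for s in range(num_steps):
                i = start + r * num_steps + s
                if i < n_pairs:
                    xr.append(tokens[i])
                    yr.append(tokens[i + 1])
                    wr.append(1.0)
                else:
                    xr.append(pad_id)
                    yr.append(pad_id)
                    wr.append(0.0)
            x.append(xr)
            y.append(yr)
            w.append(wr)
        yield x, y, w


batches = list(make_batches(list(range(1, 12)), batch_size=2, num_steps=3))
for _, _, w in batches:
    print(w)
# [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
# [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]
```

With a masked loss, the two padded positions in the last block contribute nothing. Over a large corpus, a handful of such positions has a negligible effect on the average loss.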
