
Comments (5)

sailordiary commented on June 21, 2024

Hi,

LipNet is one of the smallest lip-reading models that I know of. It converges fairly quickly; however, since the GRID dataset is small, it might take a bit longer to train to optimal performance (e.g., to match the WERs reported in the paper).

On the other hand, you can also experiment with simple 2D video encoders like VGG-M; I think those converge quickly too.
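
To make the distinction concrete: LipNet's front end uses 3D (spatiotemporal) convolutions, whereas a "2D video encoder" applies a 2D CNN to each frame independently and leaves temporal modeling to a recurrent layer. Here is a minimal sketch of the per-frame style, not the actual VGG-M architecture; all layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Frame2DEncoder(nn.Module):
    """Applies a small 2D CNN to every frame of a (B, C, T, H, W) clip."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(48, 96, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),          # global pooling per frame
        )
        self.proj = nn.Linear(96, out_dim)

    def forward(self, x):                     # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)  # fold time into batch
        feats = self.cnn(x).flatten(1)                 # (B*T, 96)
        return self.proj(feats).view(b, t, -1)         # (B, T, out_dim) for an RNN
```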


sailordiary commented on June 21, 2024

@kunshou123, some updates: I just reproduced the overlapped-speakers setup from the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported there (notably, I used greedy decoding rather than beam search, so the actual performance could be even better). Training takes about 40 minutes per epoch using the parameter settings in the current revision, which are taken directly from the authors.
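
For context, "greedy decoding" here means CTC best-path decoding: take the argmax label at each timestep, collapse consecutive repeats, then drop blanks. A minimal sketch, not code from this repo (the blank index of 0 is an assumption):

```python
import torch

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path decode of (T, num_classes) per-frame log-probabilities."""
    path = log_probs.argmax(dim=-1).tolist()   # most likely label per frame
    decoded, prev = [], blank
    for p in path:
        if p != prev and p != blank:           # collapse repeats, skip blanks
            decoded.append(p)
        prev = p
    return decoded
```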

For those who happen to have dropped by: I plan to release the pre-trained checkpoints as soon as I have time to clean up the dataset preparation code (preprocessing clearly matters if the model is to be useful).


sailordiary commented on June 21, 2024

Here are the training curves for overlapped speakers, if anyone's interested. (The discontinuities were accidental; I restored optimizer states.)
[Image: training curves for the overlapped-speakers setup]
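
Restoring the optimizer state along with the weights, as mentioned above, is what typically prevents such discontinuities when resuming. A minimal sketch of checkpointing both in PyTorch (the dictionary keys and function names are assumptions, not the repo's format):

```python
import torch

def save_ckpt(path, model, optimizer, epoch):
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "epoch": epoch}, path)

def load_ckpt(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])   # restores e.g. Adam moment estimates
    return ckpt["epoch"]
```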


kunshou123 commented on June 21, 2024

Oh, thank you very much for your reply! I have been confused about this for a long time. I am a novice at lip reading; my 3D CNN trains very poorly and the loss does not decrease significantly. I will try your suggestions.


WeicongChen commented on June 21, 2024

> @kunshou123, some updates: I just reproduced the overlapped-speakers setup from the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported there (notably, I used greedy decoding rather than beam search, so the actual performance could be even better). Training takes about 40 minutes per epoch using the parameter settings in the current revision, which are taken directly from the authors.
>
> For those who happen to have dropped by: I plan to release the pre-trained checkpoints as soon as I have time to clean up the dataset preparation code (preprocessing clearly matters if the model is to be useful).

Hi, could you kindly share your preprocessing code? I have been struggling to reproduce LipNet these days.
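
Until that pipeline is released, the usual GRID preprocessing is to crop a mouth region of interest from each frame using facial landmarks. A minimal sketch, not the author's code: it assumes dlib's 68-point landmark model, and the crop margin and 100x50 output size (the mouth-crop size used in the LipNet paper) are illustrative choices:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed path

def crop_mouth(frame, size=(100, 50), margin=10):
    """Returns the mouth ROI of one BGR video frame, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # Landmarks 48-67 outline the mouth in the 68-point scheme.
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    roi = frame[max(y0, 0):y1, max(x0, 0):x1]
    return cv2.resize(roi, size)
```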


