Comments (5)
Hi,
LipNet is one of the smallest lip reading models that I know of. It converges fairly quickly; however, the GRID dataset is small, so it may take somewhat longer to train it to optimal performance (e.g., to match the WERs reported in the paper).
Alternatively, you can experiment with simple 2D video encoders such as VGG-M, which I think also converge quickly.
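The word error rate (WER) mentioned above is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. Below is a minimal stdlib-only sketch of that formula. The function name and the GRID-style example sentences are illustrative and are not taken from lipnet-pytorch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    # Guard against an empty reference to avoid division by zero.
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("bin blue at f two now", "bin blue at s two now"))
```

Here a single wrong letter word in a six-word sentence gives a WER of 1/6, which shows how coarse the metric is on short GRID-style sentences.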
from lipnet-pytorch.
@kunshou123, some updates: I just reproduced the overlapped speakers setup from the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported in the paper (notably, I used greedy decoding rather than beam search decoding, so the actual performance could be even better). Training takes about 40 min per epoch using the parameter settings in the current revision, which are taken directly from the authors.
For those who happen to drop by, I plan to release the pre-trained checkpoints as soon as I have time to clean up the dataset preparation code (clearly, preprocessing matters for the model to be useful).
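Greedy (best-path) CTC decoding, as opposed to beam search, takes the most likely label at each frame, merges consecutive repeats, and then drops the blank label. The sketch below is stdlib-only and assumes the model emits one score vector per video frame over a hypothetical label set in which index 0 is the CTC blank and index 1 is a space. The actual blank index and character ordering used by lipnet-pytorch may differ.

```python
from typing import Sequence

# Hypothetical label set: index 0 is the CTC blank ("_"), index 1 is a space,
# indices 2..27 are the letters a..z. lipnet-pytorch may order labels differently.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz"
BLANK = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str = ALPHABET,
                      blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring label at every time step.
    best_path = [max(range(len(scores)), key=lambda k: scores[k])
                 for scores in frame_scores]
    decoded = []
    prev = None
    for label in best_path:
        # 2. Skip a label identical to the previous frame's (merge repeats);
        # 3. skip the blank label entirely.
        if label != prev and label != blank:
            decoded.append(alphabet[label])
        prev = label
    return "".join(decoded)
```

For example, a best path of labels `[a, a, blank, b, b, blank, b]` decodes to `"abb"`: repeated frames collapse, but the blank between the last two `b`s keeps them as separate characters. Beam search instead keeps several candidate prefixes per frame, which is why it can score better than this single greedy pass.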
Here are the training curves for overlapped speakers, if anyone's interested. (The discontinuities were accidental; I restored optimizer states.)
Oh! Thank you very much for your reply. I have been confused about this for a long time. I am new to lip reading, and my 3D CNN training performs very poorly: the loss does not decrease significantly. I will try it.
Hi, could you kindly share your preprocessing code? I have been struggling to reproduce LipNet lately.