
Comments (5)

sailordiary commented on June 21, 2024

Hi,

LipNet is one of the smallest lip-reading models that I know of. It converges fairly quickly; however, since the GRID dataset is small, it might take a bit longer to train to optimal performance (e.g., to match the WERs reported in the paper).

On the other hand, you can also experiment with simple 2D video encoders like VGG-M; I think those converge quickly too.
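
To make the distinction concrete: LipNet's front end uses 3D (spatiotemporal) convolutions, whereas a "2D video encoder" applies a 2D CNN to each frame independently and leaves temporal modeling to a recurrent layer. Here is a minimal sketch of the per-frame style, not the actual VGG-M architecture; all layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Frame2DEncoder(nn.Module):
    """Applies a small 2D CNN to every frame of a (B, C, T, H, W) clip."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(48, 96, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),          # global pooling per frame
        )
        self.proj = nn.Linear(96, out_dim)

    def forward(self, x):                     # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        x = x.transpose(1, 2).reshape(b * t, c, h, w)  # fold time into batch
        feats = self.cnn(x).flatten(1)                 # (B*T, 96)
        return self.proj(feats).view(b, t, -1)         # (B, T, out_dim) for an RNN
```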


sailordiary commented on June 21, 2024

@kunshou123, some updates: I just reproduced the overlapped-speakers setup from the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported there (notably, I used greedy decoding rather than beam search, so the actual performance could be even better). Training takes about 40 minutes per epoch using the parameter settings in the current revision, which are taken directly from the authors.
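
For context, "greedy decoding" here means CTC best-path decoding: take the argmax label at each timestep, collapse consecutive repeats, then drop blanks. A minimal sketch, not code from this repo (the blank index of 0 is an assumption):

```python
import torch

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path decode of (T, num_classes) per-frame log-probabilities."""
    path = log_probs.argmax(dim=-1).tolist()   # most likely label per frame
    decoded, prev = [], blank
    for p in path:
        if p != prev and p != blank:           # collapse repeats, skip blanks
            decoded.append(p)
        prev = p
    return decoded
```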

For those who happen to have dropped by: I plan to release the pre-trained checkpoints as soon as I have time to clean up the dataset preparation code (preprocessing clearly matters if the model is to be useful).


sailordiary commented on June 21, 2024

Here are the training curves for overlapped speakers, if anyone's interested. (The discontinuities were accidental; I restored optimizer states.)
[Image: training curves for the overlapped-speakers setup]
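
Restoring the optimizer state along with the weights, as mentioned above, is what typically prevents such discontinuities when resuming. A minimal sketch of checkpointing both in PyTorch (the dictionary keys and function names are assumptions, not the repo's format):

```python
import torch

def save_ckpt(path, model, optimizer, epoch):
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict(),
                "epoch": epoch}, path)

def load_ckpt(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optim"])   # restores e.g. Adam moment estimates
    return ckpt["epoch"]
```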


kunshou123 commented on June 21, 2024

Oh, thank you very much for your reply! I have been confused about this for a long time. I am a novice at lip reading; my 3D CNN trains very poorly and the loss does not decrease significantly. I will try your suggestions.


WeicongChen commented on June 21, 2024

> @kunshou123, some updates: I just reproduced the overlapped-speakers setup from the paper. It took 91 epochs to reach the "Baseline-NoLM" results reported there (notably, I used greedy decoding rather than beam search, so the actual performance could be even better). Training takes about 40 minutes per epoch using the parameter settings in the current revision, which are taken directly from the authors.
>
> For those who happen to have dropped by: I plan to release the pre-trained checkpoints as soon as I have time to clean up the dataset preparation code (preprocessing clearly matters if the model is to be useful).

Hi, could you kindly share your preprocessing code? I have been struggling to reproduce LipNet these days.
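
Until that pipeline is released, the usual GRID preprocessing is to crop a mouth region of interest from each frame using facial landmarks. A minimal sketch, not the author's code: it assumes dlib's 68-point landmark model, and the crop margin and 100x50 output size (the mouth-crop size used in the LipNet paper) are illustrative choices:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed path

def crop_mouth(frame, size=(100, 50), margin=10):
    """Returns the mouth ROI of one BGR video frame, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # Landmarks 48-67 outline the mouth in the 68-point scheme.
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    roi = frame[max(y0, 0):y1, max(x0, 0):x1]
    return cv2.resize(roi, size)
```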


