
learn-an-effective-lip-reading-model-without-pains's People

Contributors

fengdalu


learn-an-effective-lip-reading-model-without-pains's Issues

incorrect results: am I doing something wrong?

Hi,
I tried to test it on some videos. I crop the lip region using dlib and get results, but the predicted words do not match. What could possibly be wrong? Do the videos need to be at a specific fps? Please help, and thanks for the code!

questions about testing on my own data

Hi, I am very interested in your team's work, and I have a few questions:

  1. On LRW-1000 you choose 40 frames for each word, so do I need to adjust this according to the speech rate when testing on other videos?
  2. The accuracy is very low when I test my own videos, but the demo shown on the team's homepage performs well (e.g. https://vipl.ict.ac.cn/team.php?id=10). Is the model here trained on LRW-1000, or is there a private dataset?

Looking forward to your reply!

How to output the predicted word or sentence

Hello, dalu. I want to know how to output the predicted word or sentence, so that whether it is right or wrong, it outputs the predicted result, e.g. 'about'. And how can I build my own test file from my own video?
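
For anyone else looking for this, a minimal sketch of mapping the model's output back to a word string (my own illustration, not the repo's code; 'label_sorted.txt' is assumed to list one class name per line in the same order as the model's output units):

import torch

# Assumed: a text file listing one class name per line, in the same
# order as the model's output units (the filename is a guess).
with open('label_sorted.txt') as f:
    words = [line.strip() for line in f]

logits = torch.randn(1, len(words))        # stand-in for the model's output on one clip
pred_idx = logits.argmax(dim=-1).item()    # index of the highest-scoring class
print('predicted word:', words[pred_idx])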

RuntimeWarning: invalid value encountered in double_scalars

Hey there, I tried to run your program and got this warning. Can you help me out? I'm just starting deep learning 😊

load weights
loaded params/tot params:151/151
miss matched params: []
Start Testing, Data Length: 0
start testing
main_visual.py:155: RuntimeWarning: Mean of empty slice.
  acc = float(np.array(v_acc).reshape(-1).mean())
/home/ubuntu/anaconda3/envs/learn_lip/lib/python3.8/site-packages/numpy/core/_methods.py:188: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
acc=nan
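
For what it's worth, the two warnings just come from averaging an empty list: "Start Testing, Data Length: 0" means the loader found no test samples, so v_acc stays empty and its mean is NaN. A minimal sketch of the failure plus a guard that surfaces the real problem (the error message text is my own):

import numpy as np

v_acc = []                                       # no test samples were loaded
acc = float(np.array(v_acc).reshape(-1).mean())  # -> nan, plus the two RuntimeWarnings above

# Guard that fails loudly instead of printing acc=nan:
if len(v_acc) == 0:
    raise RuntimeError('no test samples found - check the dataset index path')
acc = float(np.mean(v_acc))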

RuntimeWarning: Mean of empty slice.

load weights
loaded params/tot params:183/183
miss matched params: []
Start Testing, Data Length: 0
start testing

Hi, I already ran scripts/prepare_lrw1000.py and it generated the LRW1000_Public_pkl_jpeg folder, which has pkl files in trn, val and test. But it still reports this error.
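
Since "Data Length: 0" means the test loader sees an empty file list, a quick sanity check on the generated folder might help (the paths are guesses based on the layout described above, not the repo's exact constants):

import glob

for split in ('trn', 'val', 'test'):
    files = glob.glob('LRW1000_Public_pkl_jpeg/%s/*.pkl' % split)
    print(split, len(files))                # all three counts should be non-zero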

Training on LRW, accuracy not improving

Hi there,

Thanks for your great work! I have been trying to reproduce the results; however, the training loss didn't decrease and the accuracy was always 0. I followed the instructions and didn't change the code except for calculating the ETA in seconds. Do you have any idea what is happening?

Here is part of the training log.

Start Training, Data Length: 488766
epoch=0,train_iter=0,eta=36370.56232s,CE V=6.21437,lr=0.004800,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=34.82164
epoch=0,train_iter=1,eta=1969.14507s,CE V=6.45720,lr=0.004800,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=36.19475

epoch=1,train_iter=3819,eta=2056.03918s,CE V=7.47496,lr=0.004799,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.12496

epoch=2,train_iter=7638,eta=1983.88589s,CE V=7.74652,lr=0.004797,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=34.23733

epoch=3,train_iter=11457,eta=1980.25018s,CE V=7.42752,lr=0.004793,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=34.97991

epoch=4,train_iter=15276,eta=2137.25215s,CE V=7.26190,lr=0.004787,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.26590

epoch=5,train_iter=19095,eta=2146.40197s,CE V=7.49472,lr=0.004779,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=33.89755

epoch=6,train_iter=22914,eta=2150.36547s,CE V=8.05367,lr=0.004770,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.19277

epoch=7,train_iter=26733,eta=2126.86129s,CE V=7.35550,lr=0.004760,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=34.21873

epoch=8,train_iter=30552,eta=2104.79027s,CE V=7.92131,lr=0.004748,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=36.37280

epoch=9,train_iter=34371,eta=2036.25903s,CE V=7.50125,lr=0.004734,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=35.59670

epoch=10,train_iter=38190,eta=2054.36018s,CE V=7.56499,lr=0.004718,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00205,eta=37.45618

epoch=12,train_iter=45828,eta=2067.50536s,CE V=7.65708,lr=0.004683,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.46259

epoch=13,train_iter=49647,eta=2110.89804s,CE V=7.37481,lr=0.004662,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.27319

epoch=14,train_iter=53466,eta=2129.09115s,CE V=8.00831,lr=0.004641,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00205,eta=37.12486

epoch=15,train_iter=57285,eta=2052.47540s,CE V=7.74086,lr=0.004618,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.35628

epoch=16,train_iter=61104,eta=2103.29428s,CE V=7.08775,lr=0.004593,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=38.62439

epoch=17,train_iter=64923,eta=2127.42126s,CE V=7.76207,lr=0.004566,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=39.42123

epoch=18,train_iter=68742,eta=2032.93654s,CE V=7.45536,lr=0.004539,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=35.68857

epoch=19,train_iter=72561,eta=2047.84631s,CE V=7.57583,lr=0.004509,best_acc=0.000000
Start Testing, Data Length: 25000
start testing

v_acc=0.00000,eta=37.30936

epoch=20,train_iter=76380,eta=2126.75749s,CE V=7.46031,lr=0.004479,best_acc=0.000000
Terminated

Where is the code?

Hi, dalu,
"A special setting on LRW-1000 is that we chose 40 frames for each word and put the target word at the center to make it similar to the data in LRW"
Where is this code in the repo?
The line tensor[:t,...] = files.copy() does not put the word at the center.
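
For reference, putting the word at the center would look roughly like the sketch below (my own illustration, not code from this repo; files stands for the loaded frames and tensor for a zero-initialized 40-frame buffer, as in the quoted line):

import numpy as np

T = 40                                        # fixed clip length on LRW-1000
files = np.zeros((17, 88, 88))                # e.g. 17 loaded frames (hypothetical shape)
tensor = np.zeros((T,) + files.shape[1:], dtype=files.dtype)

t = min(len(files), T)
start = (T - t) // 2                          # offset that centers the word in the clip
tensor[start:start + t, ...] = files[:t].copy()

# compare with the quoted line, which left-aligns the frames:
# tensor[:t, ...] = files.copy()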

word boundary?

Hi, when testing on samples from real videos (not in the dataset), how do I get the "word boundary"?
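
In the paper, the word boundary is an extra per-frame indicator of where the target word starts and ends inside the clip. If those times are known for a real video (e.g. from a forced aligner), the indicator could be built roughly like this (my own sketch, not the repo's loader; the frame numbers are hypothetical):

import numpy as np

T = 40                                   # clip length in frames
start_frame, end_frame = 12, 27          # hypothetical boundary from an aligner

border = np.zeros(T, dtype=np.float32)
border[start_frame:end_frame + 1] = 1.0  # 1 inside the word, 0 outside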

What is the metric?

What is the metric of your results (e.g. WER or accuracy)? It doesn't seem to be stated in the README or the paper.

demo code for video?

Dear author dalu:
Thanks for sharing this simple but effective work. I just wonder whether you will share demo code for running inference on a single video and visualizing the result on screen. It would be greatly helpful for the community. Thank you again.

transformer result

The result for the transformer in Table 2 of your arXiv paper is 44.5%. Which paper did you cite for it?

What is the definition of accuracy?

I've read the whole paper, and it fully explains how the model is built, but there is one thing I can't figure out: what is the definition of accuracy? Does it depend on every single word that the model predicts, or on the whole sentence?
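
For context, LRW and LRW-1000 are word-level datasets with one target word per clip, so accuracy here is presumably top-1 classification accuracy over clips rather than a sentence-level score, e.g.:

import torch

logits = torch.randn(8, 1000)          # 8 clips, 1000 word classes (hypothetical sizes)
labels = torch.randint(0, 1000, (8,))

pred = logits.argmax(dim=-1)           # one predicted word per clip
acc = (pred == labels).float().mean()  # fraction of clips with the correct word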

Preview message

After testing on one video, it only shows the accuracy. How can I see the predicted word?

Questions about the baseline and alignedLip

Hi, after reading the paper and the code I have the following two questions.
1. The baseline in the paper uses 3D conv + ResNet-18 + a 3-layer GRU and reaches 46.5 on LRW-1000.
Earlier papers with similar methods only reach 38.7, e.g. "Mutual information maximization for effective lip reading" (reference 29 in Table 5).
How was this improvement achieved?

2. In Table 3, alignedLip means the mouth regions were aligned, but I don't see the corresponding numbers for LRW-1000. How much does lip alignment improve results on LRW-1000?

Audio data format in LRW1000?

Hi, I would like to ask about the audio data format: in the LRW-1000 dataset the audio is .wav, while prepare_lrw1000.py expects .npy. How do I convert?
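
If it helps, a plain wav-to-npy conversion could look like the sketch below (my own guess at what the script expects; the repo may want a specific sample rate or a feature format rather than the raw waveform):

import glob
import os
import numpy as np
import soundfile as sf

for wav_path in glob.glob('LRW1000_Public/audio/*.wav'):
    samples, sr = sf.read(wav_path)                    # raw waveform and sample rate
    npy_path = os.path.splitext(wav_path)[0] + '.npy'
    np.save(npy_path, samples.astype(np.float32))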

How much training time?

Thanks for your great work.
Can you tell me how much time it took to train the model on the LRW data? And did you run into the problem of data loading taking too much time?
Thanks in advance.

CUDNN_STATUS_BAD_PARAM error

Hi there,
I can only run the test function on the CPU. Has anyone faced this error? Even when I train the model, it works fine in train mode, but when it calls the test function on the validation set, it returns a 'CUDNN_STATUS_BAD_PARAM' error. I searched, and one suggested solution was to apply .float() to all tensors. I tried that, but I'm still seeing the same error.
Does anyone have any idea?
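
One general workaround for cuDNN RNN errors that may be worth trying (not specific to this repo): re-pack the GRU weights and make sure the input is contiguous float32 on the same device, e.g.:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=512, hidden_size=1024, num_layers=3, batch_first=True).cuda()
x = torch.randn(2, 29, 512).cuda()       # (batch, frames, features), hypothetical sizes

gru.flatten_parameters()                 # re-packs weights into one contiguous chunk for cuDNN
out, h = gru(x.float().contiguous())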

RuntimeError: [enforce fail at inline_container.cc:144] . PytorchStreamReader failed reading zip archive: failed finding central directory

Traceback (most recent call last):
  File "main_visual.py", line 100, in <module>
    weight = torch.load(args.weights, map_location=torch.device('cpu'))
  File "/home/xjw/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/serialization.py", line 577, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/home/xjw/.conda/envs/open-mmlab/lib/python3.7/site-packages/torch/serialization.py", line 241, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:144] . PytorchStreamReader failed reading zip archive: failed finding central directory

I am trying to test the LRW-1000 model downloaded from your Google Drive, but when everything is ready, this problem is raised in the test code. I guess the model couldn't be loaded by torch.load. Could you give some advice on the problem? Thank you!

if (args.weights is not None):
    print('load weights')
    weight = torch.load(args.weights, map_location=torch.device('cpu'))
    load_mi
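
This particular error usually means the checkpoint file is corrupt or was only partially downloaded, since torch.load expects a valid zip archive. A quick standard-library check ('lrw1000.pt' is a placeholder for the downloaded weights file):

import os
import zipfile

path = 'lrw1000.pt'                                     # placeholder filename
print('size on disk:', os.path.getsize(path), 'bytes')
print('valid zip archive:', zipfile.is_zipfile(path))   # False -> re-download the file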

Need help

Hi, in prepare_lrw1000.py:
audio_file = 'LRW1000_Public/audio/' + audio_file + '.npy'
if(os.path.exists(audio_file)):
Should I transform the wav files into npy before running prepare_lrw1000.py?
