
avsr-tf1's People

Contributors

georgesterpu, saamc


avsr-tf1's Issues

use video-only on LRS2

Hi,
I trained the video-only model on the LRS2 dataset for 100 iterations, but it almost always predicts the same sentences. What could be the cause of this problem?

ask about the results

Hi, @georgesterpu,

Sorry, I have a question about the program. I have trained the models for many epochs, but so far (20 epochs) the decoded results are all the same. Is it normal for the results to be identical at the beginning of training?

Here are some results at 20 epochs:
Test/6378096013485054233/00006 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THAT'S WHY I'M LOSING MY VOICE]
Test/6355556454612448247/00010 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [AND IN THE PROCESS OF ME NOT OFFENDING GOD]
Test/6374083655167624227/00107 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [DEPENDED ON TRADE]
Test/6344709944203120665/00030 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THAT'S A SHOCKING PERCENTAGE]
Test/6362913733590438579/00099 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WHEN I GET HOME]
Test/6347195441777342774/00010 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [OUR CONTINUED PRESSURE]
Test/6343252661930009508/00117 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [ACCORDING TO LEGEND]
Test/6339758276407584924/00030 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WHICH IS ONLY 10 MINUTES AWAY]
Test/6381853250875656673/00019 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [TODAY ALMOST 50]
Test/6334563083966338309/00024 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [IF IT WAS GOING TO BE INCREASED]
Test/6361814651459450364/00025 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THE PRIME MINISTER SLAPS HIM DOWN]
Test/6349793037997935601/00128 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [ESPECIALLY AS IT RELATES TO CHILDREN]
Test/6369259547770281088/00021 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BUT SLIGHTLY REBELLIOUS SOCKS]
Test/6357006006074844331/00013 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [AFTER ALL I'VE BEEN THROUGH]
Test/6377794506780964609/00004 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [I'M SORRY THAT DOESN'T MEET WITH YOUR APPROVAL]
Test/6357006006074844331/00003 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BUT IT'S NOT ALL BAD NEWS]
Test/6392896900283869820/00003 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [IT MAY TAKE SOME TIME]
Test/6334934169140712754/00001 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BUT WHAT A SURPRISE WHEN YOU COME IN]
Test/6354988230439182363/00085 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [MY FIRST GRANDDAUGHTER]
Test/6388126909604862142/00013 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [ANYTHING CAN HAPPEN]
Test/6392197250111351363/00024 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [EVERYONE HAS GONE HOME HAPPY AND THAT'S WHAT IT'S ALL ABOUT]
Test/6362913733590438579/00082 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [SPACE WAS STILL THE FINAL FRONTIER]
Test/6386959537493875163/00014 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WE NEED TO BE A TEAM]
Test/6384829663342218937/00013 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [FROM TIME TO TIME]
Test/6361725745636423155/00006 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BUT SHE'S NOT CONFUSED]
Test/6341242617105082637/00032 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [IF YOU CAN'T GO UP]
Test/6350523611934991188/00027 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THIS IS ONE OF THE QUESTIONS]
Test/6370728426585516325/00014 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [I THINK YOU DESERVE]
Test/6392568335285725796/00003 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THE WAY THE MARKET IS]
Test/6381766922032944158/00007 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [VERY HEAVILY TAXED]
Test/6373112133434795034/00005 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [FOR SOMETHING COMPLETELY DIFFERENT]
Test/6368494184598133861/00041 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [IT'S DOWN TO OUR BRAIN GETTING THINGS WRONG]
Test/6340299442417279404/00006 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [SO RATHER THAN JUST RELYING ON THIS INFORMATION]
Test/6369912812296002056/00009 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BACK TO THE HISTORY BOOKS]
Test/6351265782283739990/00032 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [TRAPPED UNDER HEAVY ARTILLERY FIRE]
Test/6384361941273252259/00012 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [YOU'RE AN IDIOT]
Test/6347195441777342774/00139 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [IN ALL SERIOUSNESS]
Test/6343252661930009508/00020 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [FOR THE FIRST TIME]
Test/6380036479839881698/00012 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WITH OTHER PEOPLE]
Test/6360322579951237696/00006 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WHAT WE'VE GOT HERE IS AN OPPORTUNITY TO UNDERSTAND ONE OF THESE MONUMENTS IN A MODERN]
Test/6380021017957616097/00021 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [COME TO MY ROOM]
Test/6351810813633601326/00016 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WHY DON'T WE ALL JUST GET ALONG]
Test/6349847154585871175/00014 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [HERSELF CURRENTLY LEARNING TO SIGN]
Test/6368494184598133861/00030 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [REPRESENTED BY THIS BUNCH OF BANANAS]
Test/6374454740211592999/00010 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [MUCH MORE DRIVEN]
Test/6375923619026758902/00020 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [AS LONG AS THEY SEE THE SPARK OF SUCCESS]
Test/6353909764151156687/00001 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [FURNITURE OR OBJECTS LIKE THAT]
Test/6335305254315087182/00014 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [BUT WE'RE GOING TO IMPROVE ON THINGS LIKE THE GREEN ROOF]
Test/6373112133434795034/00002 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [FLOWERS AND EVEN VEGETABLES]
Test/6374060462213818017/00008 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [AND MORE MONEY FOR FIFA]
Test/6385475196796375542/00001 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [I WOULD LIKE TO HELP]
Test/6367373198133943465/00022 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [YOU GO TO BED WITH IT]
Test/6369599709180191149/00011 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [I'M NOT SURE WHAT'S THE MOST DIFFICULT THING TO GET RIGHT]
Test/6355788382846429521/00005 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [TWO APPOINTMENTS LATER]
Test/6348038114360790454/00004 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [WITHOUT FURTHER ADO]
Test/6364791063795580193/00006 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [THE PERFECT FAMILY]
Test/6362913733590438579/00185 THE THE THE THE THE THE THE THE THE THE THE THE THE THE THE [YOU CAN'T REALLY SAY]

How can I run training with CUDA

Hello,

I'm trying to train the model with CUDA. I changed experiment_tcd_av.py to set os.environ['CUDA_VISIBLE_DEVICES'] = "0", but when I check with nvidia-smi there is no program running on the GPU. What should I do to make it run on CUDA?

Thank you very much.
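A quick way to narrow this down (a generic TF 1.x check, not code from this repository) is to confirm that TensorFlow was built with GPU support and can see the device at all:

# Minimal TF 1.x sanity check: set the device mask before TensorFlow
# initialises CUDA, then ask whether a GPU is usable and log op placement.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

import tensorflow as tf

print(tf.test.is_gpu_available())  # False -> CPU-only build or CUDA/driver mismatch

config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(tf.constant(1.0)))  # the log shows which device each op landed on

If is_gpu_available() prints False, the usual culprit on TF 1.x is having installed the tensorflow package instead of tensorflow-gpu.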

KeyError: 'aus' when running run_audiovisual.py

Hi, I get a KeyError: 'aus' from the line normed_aus = tf.clip_by_value(self._data.payload['aus'], 0.0, 3.0) / 3.0 in encoder.py.
I preprocessed my data with extract_faces.py and write_records_tcd.py on the LRS3 data, and I realized that my self._data.payload is an empty dictionary. Any idea how to solve this error? Or is there another variable I could replace self._data.payload['aus'] with?

Any help is appreciated, thank you in advance!
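For what it's worth, a defensive sketch (my assumption about the surrounding encoder.py code, not the maintainer's fix) would skip the action-unit branch when the records carry no 'aus' feature:

# Hypothetical guard inside encoder.py, assuming 'aus' holds OpenFace action
# units that records written without AU extraction simply do not contain.
aus = self._data.payload.get('aus')
if aus is not None:
    normed_aus = tf.clip_by_value(aus, 0.0, 3.0) / 3.0
else:
    normed_aus = None  # downstream code must then tolerate a missing AU stream

An empty payload suggests the records were written without the AU features this code path expects, so the real fix is likely in the preprocessing step rather than here.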

The result in noisy environment

Hi,

Thank you for your open-source code. With your help, I have reproduced the results under noiseless conditions: in the audio-only (ao) setting I get CER 19.48% and WER 44.91%. But with 10 dB cafe noise, training does not converge at all. Do we need to modify any parameters for the noisy case? Could you please give me some suggestions?

Thanks a lot.

Ask about epochs and learning rate

Hello @georgesterpu,

I want to ask about your parameters. Your code can also implement the experiment described in "Lip Reading Sentences in the Wild". How many epochs did you train for, and which learning rate did you choose?

Thank you very much.

How to run this program on multiple GPUs

Hello @georgesterpu,
Thank you for the open-source code. I have a problem: when I run this program on multiple GPUs, only one GPU is fully used and the remaining GPUs stay idle.
I am new to TensorFlow, and the approaches suggested by Google have not helped, so how should I modify the code to solve this?
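In case it helps, a sketch of what multi-GPU use might look like with this codebase (assuming the num_gpus argument of avsr.AVSR shown in "The program always be killed" further down this page; whether it actually shards the graph across devices depends on the repository version):

# Hedged sketch: expose both devices before TensorFlow starts, then request
# two towers. CUDA_VISIBLE_DEVICES must list every GPU you want used.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"

import avsr

experiment = avsr.AVSR(
    unit='character',
    unit_file='./avsr/misc/character_list',
    # ... dataset and model options as in the configuration quoted below ...
    num_gpus=2,
)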

[AMSGrad] runtime error in audio-only training

Hi, @georgesterpu.

I am running an audio-only test on my dataset,

so I adapted some of the experiment_tcd_audio.py code to my unit_file and TFRecord paths.

When I run it, I get a runtime error from TensorFlow.

Below is my error.

WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/io_utils.py:208: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version. Instructions for updating: Use eager execution and: `tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/io_utils.py:113: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/encoder.py:44: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.batch_normalization instead.
WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/cells.py:24: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/cells.py:92: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version. Instructions for updating: This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/yong/project/yong_Sigmedia-AVSR/avsr/encoder.py:81: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version. Instructions for updating: Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn.py:626: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:1259: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:311: Bernoulli.__init__ (from tensorflow.python.ops.distributions.bernoulli) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/distributions/bernoulli.py:97: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:314: Categorical.__init__ (from tensorflow.python.ops.distributions.categorical) is deprecated and will be removed after 2019-01-01. Instructions for updating: The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/distributions/categorical.py:278: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients_impl.py:110: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 511, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 977, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype resource: 'Tensor("Decoder/audio/Encoder/multi_rnn_cell/cell_0/gru_cell/gates/kernel/AMSGrad:0", shape=(), dtype=resource)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "experiment_lrs3_audio.py", line 61, in
main(sys.argv)
File "experiment_lrs3_audio.py", line 44, in main
num_gpus=1,
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/avsr.py", line 193, in init
self._create_models()
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/avsr.py", line 381, in _create_models
batch_size=self._hparams.batch_size[0])
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/avsr.py", line 418, in _make_model
hparams=self._hparams
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/seq2seq.py", line 21, in init
self._make_decoder()
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/seq2seq.py", line 101, in _make_decoder
hparams=self._hparams
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/decoder_unimodal.py", line 59, in init
self._init_decoder()
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/decoder_unimodal.py", line 118, in _init_decoder
self._init_optimiser()
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/decoder_unimodal.py", line 465, in _init_optimiser
zip(gradients, variables), global_step=tf.train.get_global_step())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 612, in apply_gradients
update_ops.append(processor.update_op(self, grad))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/optimizer.py", line 171, in update_op
update_op = optimizer._resource_apply_dense(g, self._v)
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/AMSGrad.py", line 96, in _resource_apply_dense
m_t = state_ops.assign(m, beta1_t * m + m_scaled_g_values, use_locking=self._use_locking)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 812, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 1078, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 5860, in mul
"Mul", x=x, y=y, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 547, in _apply_op_helper
inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type resource that does not match type float32 of argument 'x'.
Exception ignored in: <bound method AVSR.__del__ of <avsr.avsr.AVSR object at 0x7fa48a9e3470>>
Traceback (most recent call last):
File "/home/yong/project/yong_Sigmedia-AVSR/avsr/avsr.py", line 198, in __del__
self._train_session.close()
AttributeError: 'AVSR' object has no attribute '_train_session'

I guess it is a TF version issue or something similar.

What can I do to fix it?

Thanks.

Inquiry about the reasons behind some parameter choices

Hi, georgesterpu.

Thanks for sharing this great code ahead of the question.

I have some questions about the initial parameter settings.

In experiment_tcd_av.py, why did you choose (0.9, 0.9, 0.9) for the dropout probability?

Second, why do you initialize the 'highway_encoder' parameter to 'False'?

Third, if I change the architecture from 'av_align' to 'wlas', can I run the WLAS model?

Finally, could you share your 'num_epochs' and 'learning_rate' for the LRS2 DB?
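For context on the first question, my reading (not the maintainer's statement) is that in TF 1.x recurrent wrappers these values are keep probabilities, so (0.9, 0.9, 0.9) drops 10% of units at the input, state and output of each cell:

# Sketch of the usual TF 1.x convention, assuming the three values map to the
# (input, state, output) keep probabilities of a DropoutWrapper.
import tensorflow as tf

cell = tf.nn.rnn_cell.GRUCell(64)
cell = tf.nn.rnn_cell.DropoutWrapper(
    cell,
    input_keep_prob=0.9,    # keep 90% of input units
    state_keep_prob=0.9,    # keep 90% of recurrent-state units
    output_keep_prob=0.9)   # keep 90% of output units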

[feature] minimum data length for stacked log mel features

Hi @georgesterpu

I am doing curriculum learning on the LRS3 dataset.

So I cut the pretrain wav files down to 1-3 words (about 500 ms) and built the feature TFRecords.

I generated stacked log mel features (stacked_w8s3), but an error occurred.

I did not keep the error log, but I suspect the wav files are too short.

Is there a minimum data length needed to create a stacked log mel feature?
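As a back-of-envelope check, assuming stacked_w8s3 means stacking 8 consecutive log-mel frames with a stride of 3, computed with a typical 25 ms window and 10 ms hop (none of these values are confirmed on this page):

# Minimum clip duration for one full stack under the assumptions above.
win_ms, hop_ms = 25, 10   # assumed analysis window and hop
stack_w = 8               # assumed stack width from "w8"

min_frames = stack_w      # need at least one complete stack
min_duration_ms = win_ms + (min_frames - 1) * hop_ms
print(min_duration_ms)    # 95 ms, so a 500 ms clip (~48 frames) should suffice in theory

If 500 ms clips still fail under these assumptions, the problem may lie elsewhere in the record-writing step rather than in the raw duration.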

How can I solve this problem

Hello @georgesterpu, I have a problem. When I run extract_faces.py after compiling OpenFace on Ubuntu 16, I get the error "Failed to open the video file at location: /media/guo/sewell/dataset/LRS2/mvlrs_v1/main/6386959537493875163/00004", even though the file is there. Have you encountered such a problem?
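One way to separate an OpenFace problem from a file/codec problem is a generic decode check (the .mp4 extension below is an assumption, since the error message prints the path without one and LRS2 clips are normally mp4):

# Can the clip be decoded at all? If this also fails, the issue is the file or
# the installed video codecs rather than extract_faces.py itself.
import cv2

path = '/media/guo/sewell/dataset/LRS2/mvlrs_v1/main/6386959537493875163/00004.mp4'
cap = cv2.VideoCapture(path)
print(cap.isOpened(), cap.get(cv2.CAP_PROP_FRAME_COUNT))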

What should I do to reproduce the results of the paper?

I trained for 400 epochs at learning rate 0.001 and another 100 epochs at learning rate 0.0001 on clean TCD-TIMIT data (speaker-dependent split) using experiment_tcd_av.py, but I cannot get the result reported in the paper "Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition".
My experiment result: 30.70% (CER) / 65.04% (WER)
Result in paper: 17.70% (CER) / 41.90% (WER)
What should I do to reproduce the results of the paper?

How do I change the loss function to CTC?

Hi, @georgesterpu.

I am also running an audio-only test on my dataset, together with LeeYongHyeok.

In this work, I would like to change the loss function from the default loss to CTC loss.

So I changed line 398 of avsr/decoder_unimodal.py from seq2seq.sequence_loss to CTC loss, and the loss became inf.

I'd appreciate it if you could tell me how to change the loss function to CTC.
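For reference, an inf CTC loss most often means the label sequence is longer than the network's input sequence, or the blank class is missing from the logits. A minimal sketch of what the swap involves, for TF 1.x and not taken from this repository:

# tf.nn.ctc_loss expects time-major logits of shape [max_time, batch,
# num_classes + 1] (the extra class is the CTC blank) and SparseTensor labels,
# so seq2seq.sequence_loss cannot simply be replaced in place.
import tensorflow as tf

def ctc_loss(dense_labels, logits, logit_lengths):
    # dense [batch, max_label_len] labels -> SparseTensor; padding assumed 0
    sparse_labels = tf.contrib.layers.dense_to_sparse(dense_labels, eos_token=0)
    loss = tf.nn.ctc_loss(
        labels=sparse_labels,
        inputs=logits,
        sequence_length=logit_lengths,
        time_major=True,
        ignore_longer_outputs_than_inputs=True)  # avoids inf on too-short inputs
    return tf.reduce_mean(loss)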

Ask about transfer learning

Hello @georgesterpu,

I want to use transfer learning to extract the video features. Is this part already implemented in your code (as in "Lip Reading Sentences in the Wild"), or do I have to implement it myself?

Thank you very much

How can I change this av_align model to apply it to audio-to-video?

Hello, @georgesterpu.
Thanks for the code release. I have made your av_align model the basis of my research.

In your paper, the video-to-audio cross-modal alignment works well. However, audio-to-video cross-modal alignment may not work as well.

So I want to see how video-to-audio cross-modal alignment behaves when audio-to-video and video-to-audio are used simultaneously.

I checked the structure of the AttentiveEncoder class in your code, avsr/encoder.py.

I found that the AttentiveEncoder uses the regular video encoder's output and the audio data to build a video-to-audio AttentiveEncoder in one pass.

I would like to have video-to-audio and audio-to-video at the same time, but I think this is not possible with the current code structure.

Which parts do I need to modify so that I can use your AttentiveEncoder in both directions at the same time?

I am very pleased to find research in the same field, and I thank you for it.

Sincerely, YongHyeok Lee.

The tfrecord files

Hi,
Because the dataset I downloaded is incomplete, I cannot get the complete files when I use write_records_tcd.py to convert the data into .tfrecord files. Could you send the .tfrecord files to my email address: [email protected]?
Thank you very much.
xjw

How many epochs should I train?

I have trained for 2000 epochs on clean TCD-TIMIT data with experiment_tcd_audio.py, but I cannot get the result shown in the paper "Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition".
My experiment result: 28.27% (CER) / 59.99% (WER)
Result in paper: 19.16% (CER) / 45.53% (WER)

run_audiovisual.py

Hi, thank you for your open-source code. I used my own dataset with your model but ran into a problem.
extract_faces.py and write_records_tcd.py ran without any issues.
The error message is as follows:

Traceback (most recent call last):
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[{{node Decoder/decoder/my_dense/bias_0-grad}}]]
  (1) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[{{node Decoder/decoder/my_dense/bias_0-grad}}]]
         [[gradients/Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/Select_grad/Select/StackPopV2/_198]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_audiovisual.py", line 64, in <module>
    main()
  File "run_audiovisual.py", line 59, in main
    logfile=logfile,
  File "/home/exp/test/avsr-tf1-yjq/avsr/experiment.py", line 111, in run_experiment
    try_restore_latest_checkpoint=True
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 274, in train
    ], **self.sess_opts)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[node Decoder/decoder/my_dense/bias_0-grad (defined at /home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[node Decoder/decoder/my_dense/bias_0-grad (defined at /home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[gradients/Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/Select_grad/Select/StackPopV2/_198]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'Decoder/decoder/my_dense/bias_0-grad':
  File "run_audiovisual.py", line 64, in <module>
    main()
  File "run_audiovisual.py", line 59, in main
    logfile=logfile,
  File "/home/exp/test/avsr-tf1-yjq/avsr/experiment.py", line 106, in run_experiment
    **kwargs
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 216, in __init__
    self._create_models()
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 531, in _create_models
    batch_size=self._hparams.batch_size[0])
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 574, in _make_model
    hparams=self._hparams
  File "/home/exp/test/avsr-tf1-yjq/avsr/seq2seq.py", line 26, in __init__
    self._init_optimiser()
  File "/home/exp/test/avsr-tf1-yjq/avsr/seq2seq.py", line 231, in _init_optimiser
    summary = tf.summary.histogram("%s-grad" % variable.name, value)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Is this possibly caused by different data dimensions?
Thanks a lot.
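NaNs in a gradient histogram usually trace back to the loss blowing up rather than to tensor shapes; a genuine dimension mismatch would normally fail with a shape error much earlier. A generic TF 1.x trick to localise where values first become NaN, offered only as a debugging sketch:

# tf.check_numerics fails fast with a labelled error as soon as a tensor holds
# NaN/Inf, pinpointing the op that produced it.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None])
y = tf.log(x)                            # NaN for x <= 0
y = tf.check_numerics(y, "log output")   # raises InvalidArgumentError on NaN

with tf.Session() as sess:
    sess.run(y, feed_dict={x: [-1.0]})   # fails here, naming "log output"

Once the source is found, lowering the learning rate or tightening max_gradient_norm is the usual first remedy.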

TCD-TIMIT results

What are the results on the TCD-TIMIT dataset?
Could you share the results for the different architectures with us?

The program always gets killed

Hello @georgesterpu,

I'm using your code to implement the algorithm from the paper "Lip reading sentences in the wild", but the program always gets killed. It seems to use too much memory and eventually runs out.

~/Sigmedia-AVSR-LRS2$ ./run_av.sh
/home/wentao/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Could not restore from checkpoint, training from scratch!

batch: 0
./run_av.sh: line 3: 2546 Killed python3 experiment_tcd_av.py 100000 0.1

My configuration is as follows:
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "2" # ERROR

def main(argv):

num_epochs = int(argv[1])
learning_rate = float(argv[2])

experiment = avsr.AVSR(
    unit='character',
    unit_file='./avsr/misc/character_list',
    video_processing='resnet_cnn',
    cnn_filters=(8, 16, 32, 64),
    cnn_dense_units=64,
    batch_normalisation=True,
    video_train_record='./tfrecords/rgb36lips_train_sd.tfrecord',
    video_test_record='./tfrecords/rgb36lips_test_sd.tfrecord',
    audio_processing='features',
    audio_train_record='./tfrecords/mfcc_train_sd_stack_clean.tfrecord',
    audio_test_record='./tfrecords/mfcc_test_sd_stack_clean.tfrecord',
    labels_train_record ='./tfrecords/characters_train_sd.tfrecord',
    labels_test_record ='./tfrecords/characters_test_sd.tfrecord',
    encoder_type='unidirectional',
    architecture='bimodal',
    clip_gradients=True,
    max_gradient_norm=1.0,
    recurrent_l2_regularisation=0.0001,
    cell_type='lstm',
    highway_encoder=False,
    sampling_probability_outputs=0.1,
    embedding_size=128,
    dropout_probability=(0.9, 0.9, 0.9),
    decoding_algorithm='beam_search',     
    decoder_units_per_layer=((512,512,512)), #3 layers decoder
    encoder_units_per_layer=((256, 256, 256), (256, 256, 256)),  #3 layers encoder    
    attention_type=(('bahdanau', )*1, ('bahdanau', )*1),
    beam_width=10,
    batch_size=(48, 64),
    optimiser='AMSGrad',
    learning_rate=learning_rate,
lr_decay=0.1,
    num_gpus=1,
)

# uer = experiment.evaluate(
#    checkpoint_path='./checkpoints/tcd_video_to_chars/checkpoint.ckp-400',
# )
# print(uer)
# return

experiment.train(
    num_epochs=num_epochs,
    logfile='./logs/tcd_av_to_chars',
    try_restore_latest_checkpoint=True
)

Am I doing something wrong? Thank you very much.

By the way, the warning first appears in def _make_model(self, graph, mode, batch_size), at this step:

model = Seq2SeqModel(
    data_sequences=(video_features, audio_features),
    mode=mode,
    hparams=self._hparams,
)
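As an aside (my own reading, not the maintainer's): the IndexedSlices warning is expected when gradients flow through an embedding lookup and is usually harmless on its own. A process that is "Killed" at batch 0 points at host RAM exhaustion; one way to confirm this, assuming psutil is installed (it is not part of this repository), is to watch the resident set size of the training process:

# Hedged debugging sketch: sample the training process's host-memory usage.
# The pid argument is whatever `pgrep -f experiment_tcd_av.py` reports.
import time
import psutil

def watch(pid, every=5):
    p = psutil.Process(pid)
    try:
        while True:
            print("RSS: %.2f GiB" % (p.memory_info().rss / 2**30))
            time.sleep(every)
    except psutil.NoSuchProcess:
        print("process exited")

If the RSS climbs steadily before the kill, reducing batch_size=(48, 64) in the constructor above is the usual first thing to try.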
