
muavic's Issues

Error when preprocessing the video data

Dear authors,

Thanks a lot for this contribution to multi-lingual AV-ASR! I ran into an error when preprocessing the video data. The error happens in:

muavic/mtedx_utils.py

Lines 190 to 201 in 122ef0c

process_map(
    partial(
        segment_normalize_video_file,
        mean_face_metadata,
        metadata_path / src_lang / split,
        video_dir_path,
        out_path,
    ),
    video_segments.items(),
    max_workers=os.cpu_count(),
    chunksize=1,
)

The error is as follows. Did you also run into this error, or do you have any clues on how to solve it?

  0%|          | 0/95 [00:00<?, ?it/s]
  0%|          | 0/95 [00:05<?, ?it/s]
concurrent.futures.process._RemoteTraceback: 
'''
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 368, in _queue_management_worker
    result_item = result_reader.recv()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 77, in <module>
    main(args)
  File "get_data.py", line 59, in main
    prepare_mtedx(args)
  File "get_data.py", line 22, in prepare_mtedx
    preprocess_mtedx_video(
  File "/beegfs/work/zhengyangli/muavic/mtedx_utils.py", line 190, in preprocess_mtedx_video
    process_map(
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
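
For context, the TypeError in the remote traceback suggests the worker raised an exception whose constructor requires extra positional arguments (stdout and stderr); such an exception cannot be re-created when the result is unpickled in the parent process, so the pool breaks and the real error is hidden behind BrokenProcessPool. A minimal, hypothetical sketch of this failure mode (not taken from the repo):

import concurrent.futures

class FancyError(Exception):
    """Hypothetical error with required extra constructor args, like the one above."""
    def __init__(self, msg, stdout, stderr):
        super().__init__(msg)
        self.stdout, self.stderr = stdout, stderr

def worker(_):
    raise FancyError("ffmpeg failed", b"", b"")

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as ex:
        # The exception pickles as (FancyError, ("ffmpeg failed",)), so unpickling in the
        # parent raises TypeError and the pool reports BrokenProcessPool instead.
        list(ex.map(worker, range(2)))

Running the failing call serially (outside process_map) is one way to surface the underlying exception.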

Empty X -> EN translations

For the X -> EN task, I noticed there are some blank / empty translations even though the source language text has a valid sentence.

Here are the numbers of blank translations per language (a small counting sketch follows the list). The problem mainly affects a few of the test sets. How does the BLEU score computation handle this?

  • el: train 0, valid 0, test 3
  • es: train 0, valid 0, test 11
  • fr: train 0, valid 0, test 1
  • it: train 0, valid 0, test 0
  • pt: train 0, valid 0, test 1
  • ru: train 0, valid 0, test 8
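
For reference, a minimal sketch of how such counts can be produced, assuming plain one-sentence-per-line target files (the directory layout and file names below are hypothetical, not the repo's):

from pathlib import Path

def count_blank_lines(path: Path) -> int:
    """Count empty (whitespace-only) lines in a one-sentence-per-line text file."""
    return sum(1 for line in path.read_text(encoding="utf-8").splitlines() if not line.strip())

# Hypothetical layout: <root>/<lang>/<split>.en holds the English translations.
root = Path("muavic_text")
for lang in ["el", "es", "fr", "it", "pt", "ru"]:
    for split in ["train", "valid", "test"]:
        f = root / lang / f"{split}.en"
        if f.exists():
            print(lang, split, count_blank_lines(f))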

Problem met when downloading German data

Hi,
I ran the following command to download the German dataset from MuAViC:
python get_data.py --root-path ./muavic_project --src-lang de
and hit the error below during the video-segmentation stage (at 21% of "Segmenting de videos files (It takes a few hours to complete)").

  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/mnt/ceph_rbd/muavic_project/muavic/mtedx_utils.py", line 236, in preprocess_mtedx_video
    process_map(
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/std.py", line 1170, in __iter__
    for obj in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists    for element in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I'm not very familiar with process_map. Do you have any idea what might be causing this error, or any suggestions for solving it?
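
In case it helps with debugging, here is a small hypothetical helper (not part of the repo) that runs the worker serially so the underlying error and its full traceback become visible instead of being hidden behind BrokenProcessPool:

def run_serially(fn, items, limit=20):
    """Call fn on the first few items in the main process to expose the real error."""
    for i, item in enumerate(items):
        if i >= limit:
            break
        fn(item)  # any exception now propagates with its full traceback

For example, one could temporarily replace the process_map(...) call in preprocess_mtedx_video with run_serially(partial(segment_normalize_video_file, ...), video_segments.items()) to see which video or segment actually fails.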
Many thanks.

Noise parameters for decoding and training

I am trying to figure out the noise parameters for the decode and train script to reproduce the results in the paper.
For decoding, I originally tried adding babble noise from musan:

override.noise_wav=/path-to-musan/musan/tsv/babble \
override.noise_prob=1 \
override.noise_snr=0

I found the average performance of the monolingual and multilingual models in the noisy condition was noticeably better than reported in the paper (while obtaining results similar to the paper in clean conditions).
I also tried using the babble noise from lrs3 (override.noise_wav=/path-to-lrs3/noise/babble), and the average performance was closer to what was reported in the paper.
Which noise should be used?

For training, are these the right parameters to add?

override.noise_wav=/path-to-musan/musan/tsv/all \
override.noise_prob=0.25 \
override.noise_snr=0

Also, for the pre-trained model ("All models FT from strongest large_vox_iter5.pt"), is this the noisy pre-trained checkpoint or the clean one? I assume it's the noisy one, but I just wanted to double-check.

Thanks for the help!

Only audio files could be downloaded

Dear authors,
I have downloaded the tgz files myself, but only txt, vtt, and wav files can be found in the directory. How can I download the video files for visual speech recognition? Thanks!

download_ted2020() error

It seems that you try to parse a zip file using GzipFile?
Here is the traceback:
Downloading el-en.txt.zip from https://opus.nlpl.eu/download.php?f=TED2020/v1/moses/el-en.txt.zip
Traceback (most recent call last):
  File "get_data.py", line 107, in <module>
    main(args)
  File "get_data.py", line 73, in main
    prepare_lrs3(args)
  File "get_data.py", line 59, in prepare_lrs3
    download_ted2020(args["ted2020"])
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 345, in download_ted2020
    extract_ted2020_data(str(tgz_filepath), "en", lang, ted2020_path)
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 308, in extract_ted2020_data
    tmx_dict = xmltodict.parse(GzipFile(tgz_filepath))
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/xmltodict.py", line 372, in parse
    parser.ParseFile(xml_input)
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/opt/conda/envs/oslasr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'PK')
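
For what it's worth, the b'PK' magic bytes confirm the downloaded file is a zip archive (the OPUS Moses format), not a gzipped TMX. A hedged sketch of one way to read it, assuming the usual OPUS Moses member names (TED2020.<pair>.<lang>), which may differ:

import zipfile

def read_moses_pair(zip_path, src_lang, tgt_lang="en"):
    """Read parallel sentences from an OPUS Moses .txt.zip archive (member layout assumed)."""
    pair = f"{src_lang}-{tgt_lang}"
    with zipfile.ZipFile(zip_path) as zf:
        src = zf.read(f"TED2020.{pair}.{src_lang}").decode("utf-8").splitlines()
        tgt = zf.read(f"TED2020.{pair}.{tgt_lang}").decode("utf-8").splitlines()
    return src, tgt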

Unable to download corpora other than English

Downloading mtedx_el.tgz from https://www.openslr.org/resources/100/mtedx_el.tgz
Traceback (most recent call last):
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/w30043779/code1/muavic-main/utils.py", line 62, in download_file
    wget.download(url, out=str(download_path / filename), bar=custom_bar)
  File "/home/w30043779/miniconda3/lib/python3.10/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/w30043779/code1/muavic-main/get_data.py", line 115, in <module>
    main(args)
  File "/home/w30043779/code1/muavic-main/get_data.py", line 84, in main
    prepare_mtedx(args)
  File "/home/w30043779/code1/muavic-main/get_data.py", line 15, in prepare_mtedx
    download_mtedx_data(args["mtedx"], args["src_lang"], args["src_lang"])
  File "/home/w30043779/code1/muavic-main/mtedx_utils.py", line 27, in download_mtedx_data
    download_extract_file_if_not(
  File "/home/w30043779/code1/muavic-main/utils.py", line 89, in download_extract_file_if_not
    download_file(url, download_path)
  File "/home/w30043779/code1/muavic-main/utils.py", line 65, in download_file
    raise HTTPError(e.url, e.code, message, e.hdrs, e.fp)
AttributeError: 'URLError' object has no attribute 'url'
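
Two separate things seem to be going on here: the actual failure is SSL certificate verification against www.openslr.org (often caused by a corporate proxy or an outdated CA bundle), and the final AttributeError comes from the handler in utils.py, which re-wraps every exception as an HTTPError even when the caught exception is a plain URLError without a .url attribute. A hedged sketch of a more defensive handler (hypothetical, not the repo's code):

import urllib.error
import wget

def download_file_safely(url, out_path):
    """Download with wget, re-wrapping only genuine HTTP errors."""
    try:
        return wget.download(url, out=str(out_path))
    except urllib.error.HTTPError as e:
        # keep the HTTP status, but point at the URL we actually requested
        raise urllib.error.HTTPError(url, e.code, f"failed to download {url}", e.hdrs, e.fp)
    except urllib.error.URLError as e:
        # SSL/connection failures land here; surface the real reason
        raise RuntimeError(f"failed to download {url}: {e.reason}") from e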

VSR performance lower on MuAViC version of LRS3 (En)

Hi, thanks for your nice work! I preprocessed the MuAViC dataset according to the instructions. I already had LRS3 processed according to the AV-HuBERT instructions, so I wanted to test if a pre-trained model would get the same performance on both the AV-HuBERT dataset version and the MuAViC version of LRS3.

I first tried ckpt=large_noise_pt_noise_ft_433h.pt from AV-HuBERT, and ran this command:

python -B infer_s2s.py --config-dir ./conf/ --config-name s2s_decode.yaml \
  dataset.gen_subset=test common_eval.path=${ckpts_dir}/${ckpt} \
  common_eval.results_path=${exp_dir}/av-hubert/decode/s2s/test \
  override.modalities=['audio', 'video'] override.data=${lrs3_dir}/30h_data override.label_dir=${lrs3_dir}/30h_data common.user_dir=`pwd`

Using the AV-HuBERT version of LRS3:

  • 433h audio-visual: 1.486
  • 433h audio-only: 1.951
  • 433h video-only: 34.135

Using the MuAViC version of LRS3:

  • 433h audio-visual: 1.496 (slightly worse)
  • 433h audio-only: 1.951 (the same)
  • 433h video-only: 35.995 (noticeably worse)

It seems that the AV-HuBERT checkpoint performs worse on the MuAViC version of the data whenever video is involved.

I also tried running the MuAViC decoding script using the MuAViC English checkpoint on the MuAViC version of LRS3 and got the following performance:

  • 433h audio-visual: 2.1941
  • 433h audio-only: 3.22
  • 433h video-only: 35.995

Then I tried the MuAViC decoding script, MuAViC English checkpoint, and the AV-HuBERT LRS3 dataset version:

  • 433h audio-visual: 2.153 (slightly better)
  • 433h audio-only: 3.225 (the same)
  • 433h video-only: 34.459 (noticeably better).

The MuAViC checkpoint also gets better performance on the AV-HuBERT version of LRS3, which is somewhat surprising. In both cases (AV-HuBERT checkpoint or MuAViC checkpoint), the audio-only performance stays identical.
I have also tried this with the other AV-HuBERT checkpoints and the conclusion is the same (the gap was more noticeable for the base models).
I wonder if MuAViC processed the LRS3 videos differently than AV-HuBERT, leading to the different performance?

Multilingual AVSR model decoding and training

I downloaded the multilingual AVSR model (x_avsr) and tried to use the decoding script.
First, I ran into this error:

Traceback (most recent call last):   
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 311, in hydra_main                                                         
    distributed_utils.call_main(cfg, main)
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/distributed/utils.py", line 369, in call_main                                     
    main(cfg, **kwargs)
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 96, in main
    return _main(cfg, h)                                                                                                                          
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 118, in _main                                                              
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([cfg.common_eval.path])
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 432, in load_model_ensemble_and_task                   
    task = tasks.setup_task(cfg.task)                                                                                                             
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/tasks/__init__.py", line 39, in setup_task
    cfg = merge_with_parent(dc(), cfg)
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/dataclass/utils.py", line 490, in merge_with_parent                               
    merged_cfg = OmegaConf.merge(dc, cfg)                                                                                                         
omegaconf.errors.ConfigKeyError: Key 'add_eos' not in 'AVHubertPretrainingConfig'
        full_key: add_eos
        reference_type=Optional[AVHubertPretrainingConfig]                                                                                        
        object_type=AVHubertPretrainingConfig  

I fixed this by adding add_eos: bool = field(default=False, metadata={"help": "hack: make the multilingual model work"}) to this line: https://github.com/facebookresearch/av_hubert/blob/e8a6d4202c208f1ec10f5d41a66a61f96d1c442f/avhubert/hubert_pretraining.py#L161
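
For clarity, this is roughly what that workaround looks like in place (a sketch; the existing fields of the config are omitted and the exact placement follows the linked line):

from dataclasses import dataclass, field

@dataclass
class AVHubertPretrainingConfig:  # existing fields omitted for brevity
    add_eos: bool = field(
        default=False,
        metadata={"help": "hack: make the multilingual model work"},
    )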

I ran decoding on a few languages. I noticed the model outputs a language tag in the hypothesis (examples: <fr> (Applaudissements), <es> (Aplausos)), while the reference doesn't contain the language tag.
My WERs were quite different from what's reported in the paper, but I found that adding the language tag to the reference sentences makes the WERs comparable to those in the paper (removing the language tag from the hypotheses resulted in worse WER than reported). Did you keep the language tag in the references for evaluation in the multilingual setting?
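
This is the kind of tag handling I experimented with (a rough sketch, not the authors' evaluation script): either strip the leading tag from every hypothesis or prepend the same tag to every reference so both sides match before WER is computed.

import re

LANG_TAG = re.compile(r"^<[a-z]{2}>\s*")  # matches tags like "<fr> " or "<es> "

def strip_lang_tag(sentence: str) -> str:
    """Remove a leading language tag from a hypothesis."""
    return LANG_TAG.sub("", sentence)

def add_lang_tag(sentence: str, lang: str) -> str:
    """Prepend a language tag to a reference, e.g. add_lang_tag('(Aplausos)', 'es')."""
    return f"<{lang}> {sentence}"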

The model sometimes outputs the text in the wrong language (as well as the incorrect language tag). Is there a way to force output text in a certain language?

I was also wondering how to train the multilingual model (the current training script seems to be for audio in one language). Specifically, should I add the language tag at the beginning of all the sentences, and how do you balance samples from different languages?

RuntimeError: Error(s) in loading state_dict for AVHubertSeq2Seq

Traceback (most recent call last):
  File "/home/lpl/muavic/demo/run_demo.py", line 220, in <module>
    AV_RESOURCES = load_av_models(args.av_models_path)
  File "/home/lpl/muavic/demo/demo_utils.py", line 65, in load_av_models
    models, _, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home/lpl/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 447, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/lpl/av_hubert/fairseq/fairseq/models/fairseq_model.py", line 125, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AVHubertSeq2Seq:
size mismatch for decoder.layers.0.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.0.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.1.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.1.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.2.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.2.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.3.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.3.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.4.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.4.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.5.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.layers.5.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).

I'm having this issue. Is there any solution?

Error running the data prep script

First, I downloaded lrs3_pretrain.zip, lrs3_test_v0.4.zip, and lrs_3_v0.4_txt.zip, and made sure the checksums matched. Unzipping them gave me three folders: pretrain, lrs3_v0.4, and test. I copied out lrs3_v0.4/trainval and placed it in the root folder beside pretrain.

Next, I ran the command:
python3 get_data.py --root-path . --src-lang en

I got an error during "Creating AVSR manifests for en":
KeyError: 'iW4fCwfw1vg/00033'

Can you please let me know what I'm doing wrong?

Problems when Downloading the Italian Dataset

Hi,

I ran the following command to download the Italian dataset from MuAViC:

python get_data.py --root-path ./esperanza/ --src-lang it

However, at some point the script was interrupted. Please find the full error trace attached below:

Traceback (most recent call last):
  File "/home/dgimeno/phd/muavic/utils.py", line 62, in download_file
    wget.download(url, out=str(download_path / filename), bar=custom_bar)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/site-packages/wget.py", line 506, in download
    (fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 331, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: './esperanza/metadata/it_metadata.tgz88g65ab3.tmp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/home/dgimeno/phd/muavic/mtedx_utils.py", line 220, in preprocess_mtedx_video
    video_metadata = load_video_metadata(
  File "/home/dgimeno/phd/muavic/utils.py", line 110, in load_video_metadata
    download_extract_file_if_not(
  File "/home/dgimeno/phd/muavic/utils.py", line 89, in download_extract_file_if_not
    download_file(url, download_path)
  File "/home/dgimeno/phd/muavic/utils.py", line 65, in download_file
    raise HTTPError(e.url, e.code, message, e.hdrs, e.fp)
AttributeError: 'FileNotFoundError' object has no attribute 'url'
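
One possible cause of the FileNotFoundError is that the metadata directory did not exist yet when wget tried to create its temporary file next to the requested output path (and, as above, the handler in utils.py then misfires because a FileNotFoundError has no .url attribute). A hedged workaround sketch (an assumption, not the repo's fix):

from pathlib import Path
import wget

def download_into(url, download_path: Path, filename: str):
    """Ensure the target directory exists before wget creates its temp file there."""
    download_path.mkdir(parents=True, exist_ok=True)
    return wget.download(url, out=str(download_path / filename))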

Error when generating the manifest for AVSR

Dear authors,

Thanks a lot for your work. When generating the manifests for AVSR, I ran into the following error, and the script cannot resume from the point where it stopped:

Creating fr/train manifest: 26%|██▌ | 30189/116045 [10:24:50<40:25:00, 1.69s/it]
Creating fr/train manifest: 26%|██▌ | 30195/116045 [10:24:53<25:27:27, 1.07s/it]
Creating fr/train manifest: 26%|██▌ | 30197/116045 [10:24:58<33:41:41, 1.41s/it]
Creating fr/train manifest: 26%|██▌ | 30199/116045 [10:25:00<30:51:53, 1.29s/it]
Creating fr/train manifest: 26%|██▌ | 30202/116045 [10:25:03<29:36:35, 1.24s/it]
Exception in thread QueueManagerThread:
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 394, in _queue_management_worker
    work_item.future.set_exception(bpe)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 547, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7fc3fa091ca0 state=cancelled>
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 368, in _queue_management_worker
    result_item = result_reader.recv()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 31, in prepare_mtedx
    prepare_mtedx_avsr_manifests(args["mtedx"], args["src_lang"], args["muavic"])
  File "/beegfs/work/zhengyangli/muavic/mtedx_utils.py", line 268, in prepare_mtedx_avsr_manifests
    process_map(
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
slurmstepd: error: *** JOB 536548 ON gpu01 CANCELLED AT 2023-04-21T10:16:11 ***

Do you have any clue how to solve this problem?

Best regards,
Zhengyang

Got error when preparing LRS3

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 107, in <module>
    main(args)
  File "get_data.py", line 73, in main
    prepare_lrs3(args)
  File "get_data.py", line 53, in prepare_lrs3
    process_lrs3_videos(args["lrs3"], args["metadata"], args["muavic"])
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 239, in process_lrs3_videos
    process_map(
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
AssertionError: /mnt/pfs/wanghe/corpus/muavic/metadata/en/train/FPhZGDS6kVQ.pkl should've been downloaded!

Questions about hyper-parameters and token post-processing

Dear authors,
Thanks for the great work. I have two questions about the paper.
In Section 4.1 about the experimental setup, it's written:
For both AVSR and AVST, we use an English AV-HuBERT large pre-trained model [3], which is trained on the combination of LRS3-TED [8] and the English portion of VoxCeleb2 [27]. We follow [3] for fine-tuning hyper-parameters, except that we fine-tune our bilingual models to 30K updates and our multilingual AVSR model to 90K updates.

I would like to ask: how many warmup_steps, hold_steps, and decay_steps did you use? And what value of freeze_finetune_updates did you set? The original configuration file for the large model has 60k updates, so these hyperparameters may need to change if max_updates is reduced to 30k.

The second question is about punctuation removal and lowercasing before calculating WER. I also observed some special tokens in the dictionary, e.g. the music token ♪. Which tokens did you remove, and how?
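
For concreteness, this is the kind of normalization I mean (a sketch under my own assumptions, not the authors' recipe): lowercase, drop punctuation, and map special tokens such as ♪ to whitespace before computing WER.

import re
import string

def normalize_for_wer(text: str) -> str:
    """Lowercase, strip punctuation and the ♪ token, and squeeze whitespace."""
    text = text.lower()
    text = text.replace("♪", " ")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()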

I'm looking forward to your reply and thank you in advance :)

Best regards,
Zhengyang

A small bug during audio pre-processing

Hi,
I just found an error when pre-processing the audio data in

muavic/mtedx_utils.py

Lines 77 to 102 in 122ef0c

for split in ["train", "valid", "test"]:
# create directory for segmented & normalized audio
out_path = muavic_path / src_lang / "audio" / split
out_path.mkdir(parents=True, exist_ok=True)
if not is_empty(out_path):
if split == "train":
print(f"\nSegmenting {src_lang} audio files")
# collect needed info from segment file
segments_info = []
split_dir_path = mtedx_path / f"{src_lang}-{src_lang}" / "data" / split
wav_dir_path = split_dir_path / "wav"
segment_file = split_dir_path / "txt" / "segments"
for line in read_txt_file(segment_file):
seg_id, fid, start, end = line.strip().split(' ')
segments_info.append(
(wav_dir_path/(fid+".flac"), fid, seg_id, float(start), float(end))
)
# preprocess audio files
process_map(
partial(segment_normalize_audio_file, out_path),
segments_info,
max_workers=os.cpu_count(),
desc=f"Preprocessing {src_lang}/{split} Audios",
chunksize=1,
)

Lines 84 to 102 have extra indentation (additional tabs). I corrected it to the following:

def preprocess_mtedx_audio(mtedx_path, src_lang, muavic_path):
    # get files id per split
    for split in ["train", "valid", "test"]:
        # create directory for segmented & normalized audio
        out_path = muavic_path / src_lang / "audio" / split
        out_path.mkdir(parents=True, exist_ok=True)
        if not is_empty(out_path):
            if split == "train":
                print(f"\nSegmenting {src_lang} audio files")
        # collect needed info from segment file
        segments_info = []
        split_dir_path = mtedx_path / f"{src_lang}-{src_lang}" / "data" / split
        wav_dir_path = split_dir_path / "wav"
        segment_file = split_dir_path / "txt" / "segments"
            
        for line in read_txt_file(segment_file):
            seg_id, fid, start, end = line.strip().split(' ')
            segments_info.append(
                (wav_dir_path/(fid+".flac"), fid, seg_id, float(start), float(end))
            )
        # preprocess audio files
        process_map(
            partial(segment_normalize_audio_file, out_path),
            segments_info,
            max_workers=os.cpu_count(),
            desc=f"Preprocessing {src_lang}/{split} Audios",
            chunksize=1,
        )

Then the code can work ;)

TEDx Talk with ID=D4TE28-L7FI is not available anymore

I was downloading the MuAViC database for Spanish when an error message suddenly appeared while segmenting the videos. It seems that the video with ID=D4TE28-L7FI is not available anymore. Do you have a backup of the database for such cases? In addition, the script was interrupted; I think it should not crash in this situation.
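
To illustrate the behavior I mean, a hedged sketch (hypothetical helper, not the repo's code) that skips unavailable videos and reports them at the end instead of aborting the run:

from typing import Callable, Iterable, List

def download_all(video_ids: Iterable[str], download_one: Callable[[str], None]) -> List[str]:
    """Try every video; collect the IDs that fail instead of aborting the whole run."""
    missing = []
    for vid in video_ids:
        try:
            download_one(vid)
        except Exception as err:  # e.g. a TEDx talk that has been taken down
            print(f"Skipping {vid}: {err}")
            missing.append(vid)
    return missing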

Best regards,

David.
