I was downloading the MuAViC database for the Spanish language when suddenly an error m


TEDx Talk with ID=D4TE28-L7FI is not available anymore about muavic HOT 5 CLOSED

david-gimeno commented on July 29, 2024

TEDx Talk with ID=D4TE28-L7FI is not available anymore

from muavic.

Comments (5)

Anwarvic commented on July 29, 2024

Hi @david-gimeno,

Thanks for raising this issue. Could you post the full trace of the error? Thanks!


david-gimeno commented on July 29, 2024

You are right, I should have shared the full error trace. I ran the script twice and this is what I got both times:

`Downloading mtedx_es.tgz from https://www.openslr.org/resources/100/mtedx_es.tgz
Extracting mtedx_es.tgz: 100%|██████████| 2058/2058 [01:44<00:00, 19.64it/s]
Downloading mtedx_es-en.tgz from https://www.openslr.org/resources/100/mtedx_es-en.tgz
Extracting mtedx_es-en.tgz: 100%|██████████| 842/842 [00:40<00:00, 20.61it/s]

Downloading es videos from YouTube
[download] 40.2% of 317.89MiB at 229.22KiB/s ETA 14:08ERROR: [youtube] D4TE28-L7FI: Video unavailable
Downloading es/train Videos: 100%|██████████| 988/988 [14:24<00:00, 1.14it/s]
Downloading es/valid Videos: 100%|██████████| 16/16 [00:08<00:00, 1.99it/s]
Downloading es/test Videos: 100%|██████████| 12/12 [00:09<00:00, 1.27it/s]

Segmenting es audio files
Preprocessing es/train Audios: 100%|██████████| 102171/102171 [02:30<00:00, 676.90it/s]
Preprocessing es/valid Audios: 100%|██████████| 905/905 [00:01<00:00, 584.80it/s]
Preprocessing es/test Audios: 100%|██████████| 1012/1012 [00:01<00:00, 669.18it/s]
Downloading 20words_mean_face.npy from https://dl.fbaipublicfiles.com/muavic/metadata/20words_mean_face.npy
MB
Segmenting es videos files (It takes a few hours to complete)
0%| | 0/988 [00:00<?, ?it/s]Downloading es_metadata.tgz from https://dl.fbaipublicfiles.com/muavic/metadata/es_metadata.tgz
Extracting es_metadata.tgz: 100%|██████████| 1019/1019 [00:22<00:00, 46.00it/s]
21%|██ | 203/988 [6:28:13<26:44:13, 122.62s/it][ WARN:[email protected]] global loadsave.cpp:244 findDecoder imread_('/tmp/tmprpv_y3yz/11837.png'): can't open/read file: check file path/integrity
21%|██ | 203/988 [6:30:03<25:08:21, 115.29s/it]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
return [fn(*args) for args in chunk]
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/david/phd/muavic/mtedx_utils.py", line 144, in segment_normalize_video
frames = resize_frames(video_frames, new_size=(96, 96))
File "/home/david/phd/muavic/utils.py", line 151, in resize_frames
return [cv2.resize(frame, new_size) for frame in input_frames]
File "/home/david/phd/muavic/utils.py", line 151, in <listcomp>
return [cv2.resize(frame, new_size) for frame in input_frames]
cv2.error: OpenCV(4.7.0) /io/opencv/modules/imgproc/src/resize.cpp:4062: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "get_data.py", line 107, in <module>
main(args)
File "get_data.py", line 76, in main
prepare_mtedx(args)
File "get_data.py", line 26, in prepare_mtedx
preprocess_mtedx_video(
File "/home/david/phd/muavic/mtedx_utils.py", line 208, in preprocess_mtedx_video
process_map(
File "/home/david/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "/home/david/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File "/home/david/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/std.py", line 1166, in __iter__
for obj in iterable:
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/home/david/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
cv2.error: OpenCV(4.7.0) /io/opencv/modules/imgproc/src/resize.cpp:4062: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
`

Another curious aspect is that, although the video D4TE28-L7FI is unavailable (i.e., it was not downloaded), there are audio segments for this sample. How is this possible?

Thanks in advance,

David.


Anwarvic commented on July 29, 2024

Hi @david-gimeno ,

Thanks for posting the full error trace!

This issue occurred because of a mismatch between the actual video frames and the downloaded metadata found at /home/david/phd/muavic/metadata/es/train/*.pkl. This bug has been fixed in our recent updates. Please run:

# update your source code
$ git pull

# re-run your script
$ python get_data.py --root-path /home/david/phd --src-lang es  # should resume where it stopped
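
For anyone hitting the same `!ssize.empty()` assertion before updating: `cv2.resize` raises it when it is handed an empty or missing frame, which is consistent with the `imread_` warning about an unreadable PNG earlier in the trace. The sketch below is not the fix that landed in the repository. It is a hypothetical guard (the helper name `split_readable_frames` is made up for illustration) that separates unreadable frames before any resizing, so the offending indices can be reported instead of crashing the worker.

```python
def split_readable_frames(frames):
    """Split a frame sequence into (readable_frames, bad_indices).

    A frame is treated as unreadable when it is None (what cv2.imread
    returns on failure) or when its `size` attribute reports zero elements.
    Objects without a `size` attribute are assumed to be readable.
    """
    readable, bad_indices = [], []
    for index, frame in enumerate(frames):
        if frame is None or getattr(frame, "size", 1) == 0:
            bad_indices.append(index)
        else:
            readable.append(frame)
    return readable, bad_indices
```

Whether skipping such frames is acceptable depends on how the metadata indexes them, so treat this as a diagnostic aid rather than a drop-in patch.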

This issue has nothing to do with the file D4TE28-L7FI. The message ERROR: [youtube] D4TE28-L7FI: Video unavailable only warns you that our download script could not download this TED talk (https://www.youtube.com/watch?v=D4TE28-L7FI). All videos that failed to download are listed in /home/david/phd/muavic/mtedx/not_found_videos.txt.
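
To check whether a given talk is among the failed downloads, something like the following could be used. It is a minimal sketch that assumes `not_found_videos.txt` holds one video ID per line, which the thread does not confirm; the function name `load_not_found_ids` is hypothetical.

```python
from pathlib import Path


def load_not_found_ids(list_path):
    """Return the set of video IDs listed in a not-found file.

    Assumption: the file contains one ID per line. Blank lines are ignored,
    and a missing file yields an empty set.
    """
    path = Path(list_path)
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

For example, `"D4TE28-L7FI" in load_not_found_ids("/home/david/phd/muavic/mtedx/not_found_videos.txt")` would tell you whether that talk was skipped.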

Also, the audio files /home/david/phd/muavic/es/audio/train/D4TE28-L7FI/D4TE28-L7FI_xxxx.wav exist because they were segmented from the mTEDx dataset, which had already been fully downloaded.

Hope that fixes your issue!


david-gimeno commented on July 29, 2024

Thanks so much! The entire Spanish MuAViC database has been processed :) Just one more question:

Regarding the transcripts in muavic/es/train_avsr.es, do they follow the same order as the entries in muavic/es/train_avsr.tsv?

Separately, I have a suggestion. In my experience, saving the video samples as compressed .npz files (using the numpy library) instead of .mp4 is very efficient in terms of storage and convenient when creating data loaders for training models.

np.savez_compressed(dst_path+"/"+sampleID+".npz", data=rois),

where rois is a numpy array holding the sequence of regions of interest (96x96 pixels) for one sample.
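
A round trip for this suggestion might look like the sketch below. The helper names `save_rois` and `load_rois` are hypothetical, and the array layout (frames first, then 96x96) is an assumption made only for illustration.

```python
import os

import numpy as np


def save_rois(dst_path, sample_id, rois):
    """Write one sample's ROI sequence to <dst_path>/<sample_id>.npz."""
    out_path = os.path.join(dst_path, sample_id + ".npz")
    np.savez_compressed(out_path, data=rois)
    return out_path


def load_rois(npz_path):
    """Read the ROI sequence back from the 'data' key of the archive."""
    with np.load(npz_path) as archive:
        return archive["data"]
```

A data loader could then call `load_rois` per sample and get the array back directly, without decoding a video container.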

Anyway, thank you again for your time. Best regards,

David.


Anwarvic commented on July 29, 2024

Hi David,

Glad that everything is working now!

Regarding your question, the answer is "Yes". Transcripts follow the same order as manifest files for AVSR and AVST. And thank you for your suggestion, my team and I will definitely take it into consideration.
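
Since the ordering matches, transcripts can be paired with manifest rows by position. The sketch below rests on two assumptions that are not stated in this thread: that the first line of the `.tsv` manifest is a header (such as a root directory) rather than a sample, and that the first tab-separated field of each row is the sample ID. Pass `skip_header=False` if the first assumption does not hold. The function name is hypothetical.

```python
def pair_manifest_with_transcripts(tsv_path, transcript_path, skip_header=True):
    """Pair each manifest row's first field with the transcript on the same line.

    Raises ValueError when the two files disagree on the number of samples,
    which would mean positional pairing is not safe.
    """
    with open(tsv_path, encoding="utf-8") as handle:
        rows = [line.rstrip("\n") for line in handle if line.strip()]
    if skip_header:
        rows = rows[1:]  # assumed header line, not a sample
    with open(transcript_path, encoding="utf-8") as handle:
        transcripts = [line.rstrip("\n") for line in handle]
    if len(rows) != len(transcripts):
        raise ValueError(
            f"{len(rows)} manifest rows but {len(transcripts)} transcripts"
        )
    return [(row.split("\t")[0], text) for row, text in zip(rows, transcripts)]
```

The length check is the useful part: if it fails, the files are out of sync and any positional pairing would silently mislabel samples.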

I'm gonna close this issue for now if you don't mind. Thanks!
