MuAViC

https://arxiv.org/abs/2303.00628

A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.

Overview

MuAViC provides

  • 1200 hours of transcribed audio-visual speech for 9 languages (English, Arabic, German, Greek, Spanish, French, Italian, Portuguese and Russian)
  • text translations for 6 English-to-X directions and 6 X-to-English directions (X = Greek, Spanish, French, Italian, Portuguese or Russian)

The raw data is collected from TED/TEDx talk recordings.

Detailed statistics

Audio-Visual Speech Recognition

Language    Code  Train Hours (H+P)  Train Speakers
English     En    436 + 0            4.7K
Arabic      Ar    16 + 0             95
German      De    10 + 0             53
Greek       El    25 + 0             113
Spanish     Es    178 + 0            987
French      Fr    176 + 0            948
Italian     It    101 + 0            487
Portuguese  Pt    153 + 0            810
Russian     Ru    49 + 0             238

Audio-Visual En-X Speech-to-Text Translation

Direction           Code   Train Hours (H+P)  Train Speakers
English-Greek       En-El  17 + 420           4.7K
English-Spanish     En-Es  21 + 416           4.7K
English-French      En-Fr  21 + 416           4.7K
English-Italian     En-It  20 + 417           4.7K
English-Portuguese  En-Pt  18 + 419           4.7K
English-Russian     En-Ru  20 + 417           4.7K

Audio-Visual X-En Speech-to-Text Translation

Direction           Code   Train Hours (H+P)  Train Speakers
Greek-English       El-En  8 + 17             113
Spanish-English     Es-En  64 + 114           987
French-English      Fr-En  45 + 131           948
Italian-English     It-En  48 + 53            487
Portuguese-English  Pt-En  53 + 100           810
Russian-English     Ru-En  8 + 41             238
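
The three tables can be cross-checked against each other with a little arithmetic. The sketch below is not part of the repository; it copies the train hours from the tables above and assumes that H and P are the two components of each "H + P" entry (our reading is human-labeled and pseudo-labeled hours, which is an assumption the tables themselves do not spell out). The dictionary names are ours.

```python
# Train hours copied from the three tables above.
AVSR_HOURS = {
    "en": 436, "ar": 16, "de": 10, "el": 25, "es": 178,
    "fr": 176, "it": 101, "pt": 153, "ru": 49,
}

# (H, P) pairs per target language for En-X translation.
EN_X_HOURS = {
    "el": (17, 420), "es": (21, 416), "fr": (21, 416),
    "it": (20, 417), "pt": (18, 419), "ru": (20, 417),
}

# (H, P) pairs per source language for X-En translation.
X_EN_HOURS = {
    "el": (8, 17), "es": (64, 114), "fr": (45, 131),
    "it": (48, 53), "pt": (53, 100), "ru": (8, 41),
}


def total(pair):
    """Return H + P for one table entry."""
    h, p = pair
    return h + p


total_avsr = sum(AVSR_HOURS.values())
print(f"AVSR train hours across 9 languages: {total_avsr}")

# Each X-En direction covers exactly the AVSR train hours of its source language.
for lang, pair in X_EN_HOURS.items():
    assert total(pair) == AVSR_HOURS[lang], lang

# Each En-X direction covers the English AVSR train hours, up to rounding.
for lang, pair in EN_X_HOURS.items():
    assert abs(total(pair) - AVSR_HOURS["en"]) <= 1, lang
```

The AVSR train hours sum to 1144, somewhat below the 1200 hours quoted in the overview, presumably because these tables list training data only.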

Getting Data

We provide scripts to generate the audio/video data and AV-HuBERT training manifests for MuAViC.

First, clone this repo to get the scripts:

git clone https://github.com/facebookresearch/muavic.git

Install required packages:

conda install -c conda-forge ffmpeg==4.2.2
pip install -r requirements.txt

Then download and process the audio-visual speech recognition and translation data:

python get_data.py --root-path ${ROOT} --src-lang ${SRC_LANG}

where the speech language ${SRC_LANG} is one of en, ar, de, el, es, fr, it, pt and ru.
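
To prepare several languages in one go, the command above can be generated per language. The following is a minimal sketch, not a script shipped with the repository: `build_command` and `build_all_commands` are hypothetical helpers, `/data` is a placeholder root, and only the two flags shown above (`--root-path` and `--src-lang`) are used.

```python
import shlex
import sys
from pathlib import Path

# Language codes listed in the text above.
SUPPORTED_LANGS = ("en", "ar", "de", "el", "es", "fr", "it", "pt", "ru")


def build_command(root, src_lang):
    """Build the get_data.py invocation for one source language."""
    if src_lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported source language: {src_lang!r}")
    return [
        sys.executable, "get_data.py",
        "--root-path", str(root),
        "--src-lang", src_lang,
    ]


def build_all_commands(root, langs=SUPPORTED_LANGS):
    """Build one invocation per requested language, in order."""
    return [build_command(root, lang) for lang in langs]


if __name__ == "__main__":
    # Print the commands instead of running them (dry run).
    for cmd in build_all_commands(Path("/data")):
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned repository once the printed commands look right.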

Generated data will be saved to ${ROOT}/muavic:

  • ${ROOT}/muavic/${SRC_LANG}/audio for processed audio files
  • ${ROOT}/muavic/${SRC_LANG}/video for processed video files
  • ${ROOT}/muavic/${SRC_LANG}/*.tsv for AV-HuBERT AVSR training manifests
  • ${ROOT}/muavic/${SRC_LANG}/${TGT_LANG}/*.tsv for AV-HuBERT AVST training manifests, where ${TGT_LANG} is the translation target language
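
After the script finishes, the layout listed above can be checked with a few lines of Python. This is a sketch under the assumption that the directory structure is exactly as described; `summarize_outputs` is a hypothetical helper, and it treats every subdirectory other than `audio` and `video` that contains `.tsv` files as a target-language directory.

```python
from pathlib import Path


def summarize_outputs(root, src_lang):
    """Report which expected MuAViC outputs exist for one source language."""
    base = Path(root) / "muavic" / src_lang
    summary = {
        "audio_dir_exists": (base / "audio").is_dir(),
        "video_dir_exists": (base / "video").is_dir(),
        "avsr_manifests": [],
        "avst_manifests": {},
    }
    if not base.is_dir():
        return summary

    # AVSR manifests sit directly under the source-language directory.
    summary["avsr_manifests"] = sorted(p.name for p in base.glob("*.tsv"))

    # AVST manifests sit one level down, in one directory per target language.
    for sub in sorted(base.iterdir()):
        if sub.is_dir() and sub.name not in ("audio", "video"):
            manifests = sorted(p.name for p in sub.glob("*.tsv"))
            if manifests:
                summary["avst_manifests"][sub.name] = manifests
    return summary
```

For example, `summarize_outputs(ROOT, "es")` returns a dictionary whose `avst_manifests` entry maps each target language found under `${ROOT}/muavic/es` to its manifest file names.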

Models

The following table lists all trained AV-HuBERT models reported in our paper:

Task  Languages                Best Checkpoint  Dictionary  Tokenizer
AVSR  ar                       best_ckpt.pt     dict        tokenizer
AVSR  de                       best_ckpt.pt     dict        tokenizer
AVSR  el                       best_ckpt.pt     dict        tokenizer
AVSR  en                       best_ckpt.pt     dict        tokenizer
AVSR  es                       best_ckpt.pt     dict        tokenizer
AVSR  fr                       best_ckpt.pt     dict        tokenizer
AVSR  it                       best_ckpt.pt     dict        tokenizer
AVSR  pt                       best_ckpt.pt     dict        tokenizer
AVSR  ru                       best_ckpt.pt     dict        tokenizer
AVSR  ar,de,el,es,fr,it,pt,ru  best_ckpt.pt     dict        tokenizer
AVST  en-el                    best_ckpt.pt     dict        tokenizer
AVST  en-es                    best_ckpt.pt     dict        tokenizer
AVST  en-fr                    best_ckpt.pt     dict        tokenizer
AVST  en-it                    best_ckpt.pt     dict        tokenizer
AVST  en-pt                    best_ckpt.pt     dict        tokenizer
AVST  en-ru                    best_ckpt.pt     dict        tokenizer
AVST  el-en                    best_ckpt.pt     dict        tokenizer
AVST  es-en                    best_ckpt.pt     dict        tokenizer
AVST  fr-en                    best_ckpt.pt     dict        tokenizer
AVST  it-en                    best_ckpt.pt     dict        tokenizer
AVST  pt-en                    best_ckpt.pt     dict        tokenizer
AVST  ru-en                    best_ckpt.pt     dict        tokenizer
AVST  {el,es,fr,it,pt,ru}-en   best_ckpt.pt     dict        tokenizer

License

CC-BY-NC 4.0

Citation

@article{anwar2023muavic,
  title={MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation},
  author={Anwar, Mohamed and Shi, Bowen and Goswami, Vedanuj and Hsu, Wei-Ning and Pino, Juan and Wang, Changhan},
  journal={arXiv preprint arXiv:2303.00628},
  year={2023}
}
