CoBERT

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Introduction

Code BERT (CoBERT) is an approach for self-supervised speech representation learning. The idea is to convert an utterance into a sequence of discrete codes and to perform representation learning on top of those codes. CoBERT outperforms the most recent state-of-the-art systems on the ASR task and brings significant improvements on the SUPERB speech translation (ST) task.
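
To make the training idea concrete, below is a minimal, self-contained sketch of code-teacher distillation, assuming a masked-regression objective: a teacher network consumes the discrete code sequence of an utterance, and the speech model learns to predict the teacher's representations at masked positions. All class names, dimensions, and the exact loss are illustrative assumptions and do not mirror the repository's implementation, which lives under cobert/models/.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the real models (hypothetical, for illustration only).
class ToyCodeTeacher(nn.Module):
    """Embeds a discrete code sequence and contextualizes it (stand-in for a code teacher)."""
    def __init__(self, num_codes=500, dim=768):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)

    def forward(self, codes):                   # codes: B x T (int64)
        return self.encoder(self.embed(codes))  # B x T x C

class ToySpeechEncoder(nn.Module):
    """Maps frame-level speech features into the teacher's representation space."""
    def __init__(self, in_dim=80, dim=768):
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)

    def forward(self, feats):                   # feats: B x T x in_dim
        return self.encoder(self.proj(feats))   # B x T x C

teacher, student = ToyCodeTeacher(), ToySpeechEncoder()

def distillation_step(feats, codes, mask):
    """Regress the student's outputs at masked frames onto the frozen teacher's outputs."""
    with torch.no_grad():
        target = teacher(codes)                 # teacher is not updated by this loss
    pred = student(feats)
    return F.mse_loss(pred[mask], target[mask])

# Dummy batch: 2 utterances, 100 frames each, ~50% of frames masked.
feats = torch.randn(2, 100, 80)
codes = torch.randint(0, 500, (2, 100))
mask = torch.rand(2, 100) < 0.5
print(distillation_step(feats, codes, mask))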

Pre-Trained Models

Model            Pretraining Data      Download
code teacher 1   LibriSpeech 960 hr    download
code teacher 2   LibriSpeech 960 hr    download
CoBERT base      LibriSpeech 960 hr    download

Extract features using pre-trained models

import torch
import torch.nn.functional as F
from cobert.models.cobert_with_teacher import CobertWithTeacherConfig, CobertWithTeacherModel

checkpoint = torch.load("path/to/checkpoint.pt")
cfg = CobertWithTeacherConfig(**checkpoint["cfg"]["model"])
model = CobertWithTeacherModel.build_model(cfg)

# The code teacher is only needed during pre-training; remove it for feature extraction.
model.code_teacher_model = None
for k in list(checkpoint["model"].keys()):
    if "code_teacher_model" in k:
        del checkpoint["model"][k]

# The EMA weights are likewise only used during pre-training; delete them as well.
del checkpoint["model"]["_ema"]
model.load_state_dict(checkpoint["model"])
model.eval()

wav_input_16khz = torch.randn(1,10000)
normalize = checkpoint["cfg"]["task"]["normalize"]  # True by default
if normalize:
    wav_input_16khz = F.layer_norm(wav_input_16khz[0], wav_input_16khz[0].shape).unsqueeze(0)

# extract representations for each layer
layer_results = model.extract_features(source=wav_input_16khz, padding_mask=None)["layer_results"]
# T x B x C -> B x T x C
layer_results = [l[0].transpose(0, 1) for l in layer_results]
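
As a usage follow-up, the per-layer outputs can be pooled into utterance-level vectors, e.g. for a lightweight probing classifier. Mean-pooling the last layer over time, as below, is an illustrative choice rather than anything prescribed by the repository; the snippet continues from the code above.

# Continue from the snippet above: mean-pool the last layer over time
# to obtain one embedding per utterance (illustrative pooling choice).
last_layer = layer_results[-1]                # B x T x C
utterance_embedding = last_layer.mean(dim=1)  # B x C
print(utterance_embedding.shape)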

Implementation

Setup

Please follow the instructions below to clone the repository and set up the Python environment for CoBERT; a quick import sanity check follows the commands.

git clone https://github.com/mct10/CoBERT.git
cd CoBERT
git submodule update --init fairseq
pip install --editable fairseq/
cd fairseq
python setup.py build develop
cd ..
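
After installation, a quick import check can confirm that the editable fairseq install and the CoBERT user directory are visible to Python. This is only a minimal sanity-check sketch; adjust /path/to/CoBERT to your clone location.

import torch
import fairseq
print("fairseq:", fairseq.__version__)
print("torch:", torch.__version__)

# The training commands below pass common.user_dir=/path/to/CoBERT/cobert/ so that
# fairseq registers the CoBERT models and tasks; the same mechanism can be exercised here.
from argparse import Namespace
from fairseq.utils import import_user_module
import_user_module(Namespace(user_dir="/path/to/CoBERT/cobert"))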

Data Preparation

We follow the fairseq wav2vec example to prepare the manifest and the fairseq HuBERT example to prepare the labels (codes).
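
For reference, the manifest is a fairseq-style TSV: the first line is the audio root directory, and each remaining line is a relative path and its number of samples, separated by a tab. The HuBERT-style label (code) files typically hold one line of space-separated integer cluster IDs per utterance, in manifest order, alongside a dict file. The snippet below writes such a manifest; the paths and the use of the soundfile package are illustrative assumptions.

import os
import soundfile as sf

audio_root = "/path/to/LibriSpeech/train-clean-100"   # assumed dataset location
manifest_dir = "/path/to/manifest"
os.makedirs(manifest_dir, exist_ok=True)

with open(os.path.join(manifest_dir, "train.tsv"), "w") as tsv:
    tsv.write(audio_root + "\n")                       # line 1: audio root directory
    for dirpath, _, files in os.walk(audio_root):
        for name in sorted(files):
            if not name.endswith(".flac"):
                continue
            path = os.path.join(dirpath, name)
            num_samples = sf.info(path).frames         # sample count used for batching
            rel_path = os.path.relpath(path, audio_root)
            tsv.write(f"{rel_path}\t{num_samples}\n")   # line i: relative_path<TAB>num_samples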

code teacher 1

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/code_teacher_1/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    task.label_dir=/path/to/codes \
    model.label_rate=50 \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train -m \
  --config-dir cobert/config/code_teacher_1/finetuning \
  --config-name base_100h \
  task.data=/path/to/manifest \
  task.label_dir=/path/to/label \
  +task.code_dir=/path/to/codes \
  model.w2v_path=/path/to/ckpt \
  common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir cobert/config/code_teacher_1/decode \
  --config-name infer_viterbi \
  task.data=/path/to/manifest \
  task.normalize=false \
  +task.code_dir=/path/to/codes \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/

code teacher 2

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/code_teacher_2/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    +model.no_sin_pos_embed=true \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train \
    --config-dir cobert/config/code_teacher_2/finetuning \
    --config-name base_100h \
    task.data=/path/to/manifest \
    task.label_dir=/path/to/label \
    model.w2v_path=/path/to/ckpt \
    dataset.train_subset=train_100h \
    dataset.valid_subset=dev_other \
    common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir cobert/config/code_teacher_2/decode \
  --config-name infer_viterbi \
  task.data=/path/to/manifest \
  task.normalize=false \
  task.label_dir=/path/to/label \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/

CoBERT

  • Pre-training
fairseq-hydra-train -m \
    --config-dir cobert/config/cobert/pretraining \
    --config-name base_librispeech \
    task.data=/path/to/manifest \
    dataset.valid_subset=dev_other \
    dataset.train_subset=train \
    model.code_teacher_ckpt=/path/to/teacher \
    model.code_teacher_type=code_teacher_2 \
    +model.multi_outputs=true \
    common.user_dir=/path/to/CoBERT/cobert/
  • Fine-tuning
fairseq-hydra-train \
    --config-dir fairseq/examples/wav2vec/config/finetuning \
    --config-name base_100h \
    task.data=/path/to/manifest \
    model.w2v_path=/path/to/ckpt \
    +model.normalize=true \
    dataset.train_subset=train_100h \
    dataset.valid_subset=dev_other \
    common.user_dir=/path/to/CoBERT/cobert/
  • Inference
python cobert/infer.py \
  --config-dir fairseq/examples/speech_recognition/new/conf \
  --config-name infer \
  task=audio_finetuning \
  task.labels=ltr \
  decoding.type=viterbi \
  task.data=/path/to/manifest \
  task.normalize=true \
  common_eval.path=/path/to/ckpt \
  dataset.gen_subset=dev_other \
  common.user_dir=/path/to/CoBERT/cobert/

Citation

If you find our work useful in your research, please cite the following paper:

@article{meng2022cobert,
  title         = {CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning},
  author        = {Meng, Chutong and Ao, Junyi and Ko, Tom and Wang, Mingxuan and Li, Haizhou},
  eprint        = {2210.04062},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  year          = {2022}
}
