TVSM Dataset

The TV Speech and Music (TVSM) dataset contains speech and music activity labels across a variety of TV shows and their corresponding audio features extracted from professionally-produced high-quality audio. The dataset aims to facilitate research on speech and music detection tasks.

Get the dataset

  • The dataset can be downloaded via Zenodo.org.
  • The paper can be downloaded via EURASIP open access.
  • This repo contains materials and codebase to reproduce the baseline experiment in the paper.

License and attribution

@article{Hung2022,
  title={{A Large TV Dataset for Speech and Music Activity Detection}},
  author={Hung, Yun-Ning and Wu, Chih-Wei and Orife, Iroro and Hipple, Aaron and Wolcott, William and Lerch, Alexander},
  journal={EURASIP Journal on Audio, Speech, and Music Processing},
  volume={2022},
  number={1},
  pages={21},
  year={2022},
  publisher={Springer}
}

The TVSM dataset is licensed under the Apache License 2.0.

Dataset introduction

The downloaded dataset has the following structure:

└─── READEME.txt
└─── TVSM-cuesheet/
│    └─── labels/
│    └─── mel_features/
│    └─── mfcc/
│    └─── vgg_features/
│    └─── TVSM-xxxx_metadata.csv
└─── TVSM-pseudo/
└─── TVSM-test/
  • READEME.txt: basic information about the dataset
  • TVSM-cuesheet/: smaller subset used for training. The labels are derived from cuesheet information
  • TVSM-pseudo/: larger subset used for training. The labels are generated by a model pre-trained on TVSM-cuesheet
  • TVSM-test/: subset used for testing. The labels are annotated by humans

Each subset folder has the same structure:

  • labels/: speech and music activity labels for each sample. Each row of a CSV file gives the start time, the end time, and the class: s (speech) or m (music)
  • mel_features/: the Mel spectrogram features extracted from the audio of each sample
  • mfcc/: the MFCC features extracted from the audio of each sample
  • vgg_features/: the VGGish feature extracted from the audio of each sample
  • TVSM-xxxx_metadata.csv: the metadata of each sample
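
The label format described above can be read with a few lines of Python. This is a minimal sketch, not part of the released codebase: the function names are hypothetical, the rows in `example` are made up for illustration, and the comma delimiter and the absence of a header row are assumptions, so check a real file from labels/ and adjust `delimiter` if needed.

```python
import csv
import io
from collections import defaultdict


def read_segments(text, delimiter=","):
    """Parse label rows of the form: start time, end time, class ('s' or 'm').

    The delimiter is an assumption; pass "\t" if the files are tab-separated.
    """
    segments = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        try:
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip a header row, if the file has one
        segments.append((start, end, row[2].strip()))
    return segments


def total_duration(segments):
    """Sum the labeled duration per class."""
    totals = defaultdict(float)
    for start, end, label in segments:
        totals[label] += end - start
    return dict(totals)


# Made-up rows for illustration only; speech and music may overlap in time.
example = "0.0,12.5,s\n3.0,20.0,m\n25.0,30.0,s\n"
segments = read_segments(example)
print(total_duration(segments))  # {'s': 17.5, 'm': 17.0}
```

To use it on a real file, read the file contents into a string and pass them to `read_segments`.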

For more information, please see our paper.

Codebase introduction

To run inference on your own audio files, see predictor.py for usage:

cd training_code
python3 predictor.py --audio_path test.wav

Please install Git LFS first, then run git-lfs pull to restore the checkpoints.

If you are using a newer pytorch_lightning version, please replace line 31 in SM_detector.py with self.save_hyperparameters(hparams).
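
To run the predictor over a whole folder of recordings, the command shown above can be scripted. The following is a minimal sketch under stated assumptions: `build_commands` and `run_all` are hypothetical helpers, not part of this repo, and the only predictor.py option used is the `--audio_path` flag shown above. It assumes the checkpoints have already been restored and that one invocation per file is acceptable.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(audio_dir, pattern="*.wav"):
    """Return one predictor.py argument list per matching audio file.

    Paths are made absolute because the commands are later run from
    inside the training_code directory.
    """
    return [
        [sys.executable, "predictor.py", "--audio_path", str(path.resolve())]
        for path in sorted(Path(audio_dir).glob(pattern))
    ]


def run_all(audio_dir, code_dir="training_code"):
    """Run predictor.py once per file, stopping at the first failure."""
    for cmd in build_commands(audio_dir):
        subprocess.run(cmd, cwd=code_dir, check=True)


# Example (hypothetical folder name), run from the repo root:
#     run_all("my_audio")
```

Running the files one at a time reloads the checkpoint on every call; this keeps the sketch independent of predictor.py internals at the cost of speed.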

└─── Evaluation_Output/
│    └─── AVASpeech/
│    │    └─── T2
│    │    └─── TVSM-cuesheet
│    │    └─── TVSM-pseudo
│    └─── ...
└─── Models/
└─── training_code/
  • Evaluation_Output: the output generated by three models across five evaluation sets
    • T2: baseline method
    • TVSM-cuesheet: CRNN-P-Cue method
    • TVSM-pseudo: CRNN-P-Pseu method
  • Models: the pre-trained checkpoints for the CRNN-P-Cue and CRNN-P-Pseu methods
  • training_code: code for training the model

Bug Fix

If you encounter the error "batch response: This repository is over its data quota. Account responsible for LFS...", you can download the model checkpoint from Google Drive.

Contact

Please feel free to contact [email protected] or open an issue here if you have any questions about the dataset or the support code.
