Beat This!

Official implementation of the beat tracker from the ISMIR 2024 paper "Beat This! Accurate Beat Tracking Without DBN Postprocessing" by Francesco Foscarin, Jan Schlüter and Gerhard Widmer.

Inference

To predict beats for audio files, you can either use our command line tool or call the beat tracker from Python. Both have the same requirements; only the online demo below requires no installation.

Online demo

To process a small set of audio files without installing anything, open our example notebook in Google Colab and follow the instructions.

Requirements

The beat tracker requires Python with a set of packages installed:

  1. Install PyTorch 2.0 or later following the instructions for your platform.
  2. Install further modules with pip install tqdm einops soxr rotary-embedding-torch. (If using conda, we still recommend pip. You may try installing soxr-python and einops from conda-forge, but rotary-embedding-torch is only on PyPI.)
  3. To read audio formats other than .wav, install ffmpeg or another supported backend for torchaudio. (ffmpeg can be installed via conda or via your operating system.)

Finally, install our beat tracker with:

pip install https://github.com/CPJKU/beat_this/archive/main.zip

Command line

Along with the Python package, a command line application called beat_this is installed. For full documentation of the command line options, run:

beat_this --help

The basic usage is:

beat_this path/to/audio.file -o path/to/output.beats

To process multiple files, specify multiple input files or directories, and give an output directory instead:

beat_this path/to/*.mp3 path/to/whole_directory/ -o path/to/output_directory

The beat tracker will use the first GPU in your system by default, and fall back to CPU if PyTorch does not have CUDA access. With --gpu=2, it will use the third GPU, and with --gpu=-1 it will force the CPU. If you have a lot of files to process, you can distribute the load over multiple processes by running the same command multiple times with --touch-first, --skip-existing and potentially different options for --gpu:

for gpu in {0..3}; do beat_this input_dir -o output_dir --touch-first --skip-existing --gpu=$gpu & done

If you want to use the DBN for postprocessing, add --dbn. The DBN parameters are the default ones from madmom. This requires installing the madmom package.

Python class

If you are a Python user, you can directly use the beat_this.inference module.

First, instantiate the File2Beats class, which encapsulates the model along with pre- and postprocessing:

from beat_this.inference import File2Beats
file2beats = File2Beats(checkpoint_path="final0", device="cuda", dbn=False)

To obtain a list of beats and downbeats for an audio file, run:

audio_path = "path/to/audio.file"
beats, downbeats = file2beats(audio_path)

Optionally, you can produce a .beats file (e.g., for importing into Sonic Visualiser):

from beat_this.utils import save_beat_tsv
output_path = "path/to/output.beats"
save_beat_tsv(beats, downbeats, output_path)

If you already have an audio tensor loaded, instead of File2Beats, use Audio2Beats and pass the tensor and its sample rate. We also provide Audio2Frames for framewise logits and Spect2Frames for spectrogram inputs.
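
As a hedged sketch (the constructor arguments and the expected audio layout are assumptions based on the description above; check the Audio2Beats docstring), this could look like:

import torchaudio
from beat_this.inference import Audio2Beats

# assumed to take the same constructor arguments as File2Beats
audio2beats = Audio2Beats(checkpoint_path="final0", device="cuda", dbn=False)
waveform, sample_rate = torchaudio.load("path/to/audio.file")
# assumption: called with the audio tensor and its sample rate; the expected
# channel layout (mono vs. channels x samples) may differ from torchaudio's
beats, downbeats = audio2beats(waveform, sample_rate)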

Available models

Models are available for manual download at our cloud space, but will also be downloaded automatically by the above inference code. By default, the inference will use final0, but it is possible to select another model via a command line option (--model) or Python parameter (checkpoint_path).
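
For example, assuming the same File2Beats interface shown above, another checkpoint (here small0, one of the models listed below) can be selected like this:

from beat_this.inference import File2Beats

# use the smaller pretrained model instead of the default final0
file2beats = File2Beats(checkpoint_path="small0", device="cuda", dbn=False)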

Main models:

  • final0, final1, final2: Our main model, trained on all data except the GTZAN dataset, with three different seeds. This corresponds to "Our system" in Table 2 of the paper. About 78 MB per model.
  • small0, small1, small2: A smaller model, again trained on all data except GTZAN, with three different seeds. This corresponds to "smaller model" in Table 2 of the paper. About 8.1 MB per model.
  • single_final0, single_final1, single_final2: Our main model, trained on the single split described in Section 4.1 of the paper, with three different seeds. This corresponds to "Our system" in Table 3 of the paper. About 78 MB per model.
  • fold0, fold1, fold2, fold3, fold4, fold5, fold6, fold7: Our main model, trained in the 8-fold cross validation setting with a single seed per fold. This corresponds to "Our" in Table 1 of the paper. About 78 MB per model.

Other models, available mainly for result reproducibility:

  • hung0, hung1, hung2: A model trained on all the data used by the "Modeling Beats and Downbeats with a Time-Frequency Transformer" system by Hung et al. (except the GTZAN dataset), with three different seeds. This corresponds to "limited to data of [10]" in Table 2 of the paper.
  • The other models used for the ablation studies in Table 3, all trained with three seeds on the single split described in Section 4.1 of the paper:
    • single_notempoaug0, single_notempoaug1, single_notempoaug2
    • single_nosumhead0, single_nosumhead1, single_nosumhead2
    • single_nomaskaug0, single_nomaskaug1, single_nomaskaug2
    • single_nopartialt0, single_nopartialt1, single_nopartialt2
    • single_noshifttol0, single_noshifttol1, single_noshifttol2
    • single_nopitchaug0, single_nopitchaug1, single_nopitchaug2
    • single_noshifttolnoweights0, single_noshifttolnoweights1, single_noshifttolnoweights2

Please be aware that the results may be unfairly good if you run inference on any file from the training datasets. For example, an evaluation with final* or small* can only be performed fairly on GTZAN or other datasets we didn't consider in our paper.

If you need to run an evaluation on some datasets we used other than GTZAN, consider targeting the validation part of the single split (with single_final*), or of the 8-fold cross-validation (with fold*).

All the models are provided as PyTorch Lightning checkpoints, stripped of the optimizer state to reduce their size. This is useful for reproducing the paper results or verifying the hyperparameters (stored in the checkpoint under hyper_parameters and datamodule_hyper_parameters). During inference, PyTorch Lightning is not used, and the checkpoints are converted and loaded into vanilla PyTorch modules.
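
As a minimal sketch for inspecting those stored hyperparameters (the checkpoint file name is hypothetical; the dictionary keys are the ones named above):

import torch

# Lightning checkpoints can contain non-tensor objects, so recent PyTorch
# versions may require weights_only=False to unpickle them
checkpoint = torch.load("final0.ckpt", map_location="cpu", weights_only=False)
print(checkpoint["hyper_parameters"])
print(checkpoint["datamodule_hyper_parameters"])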

Data

This part will be available soon.

Reproducing metrics from the paper

This part won't work until the data release. We still provide the commands for reference.

Requirements

In addition to the inference requirements, computing evaluation metrics requires PyTorch Lightning and mir_eval, as well as obtaining and setting up the GTZAN dataset. This part will be completed soon.

Command line

Compute results on the test set (GTZAN) corresponding to Table 2 in the paper.

Main results for our system:

python launch_scripts/compute_paper_metrics.py --models final0 final1 final2 --datasplit test

Smaller model:

python launch_scripts/compute_paper_metrics.py --models small0 small1 small2 --datasplit test

Hung data:

python launch_scripts/compute_paper_metrics.py --models hung0 hung1 hung2 --datasplit test

With DBN (this requires installing the madmom package):

python launch_scripts/compute_paper_metrics.py --models final0 final1 final2 --datasplit test --dbn

Compute 8-fold cross-validation results, corresponding to Table 1 in the paper.

python launch_scripts/compute_paper_metrics.py --models fold0 fold1 fold2 fold3 fold4 fold5 fold6 fold7 --datasplit val --aggregation-type k-fold

Compute ablation studies on the validation set of the single split, corresponding to Table 3 in the paper.

Our system:

python launch_scripts/compute_paper_metrics.py --models single_final0 single_final1 single_final2 --datasplit val

No sum head:

python launch_scripts/compute_paper_metrics.py --models single_nosumhead0 single_nosumhead1 single_nosumhead2 --datasplit val

No tempo augmentation:

python launch_scripts/compute_paper_metrics.py --models single_notempoaug0 single_notempoaug1 single_notempoaug2 --datasplit val

No mask augmentation:

python launch_scripts/compute_paper_metrics.py --models single_nomaskaug0 single_nomaskaug1 single_nomaskaug2 --datasplit val

No partial transformers:

python launch_scripts/compute_paper_metrics.py --models single_nopartialt0 single_nopartialt1 single_nopartialt2 --datasplit val

No shift tolerance:

python launch_scripts/compute_paper_metrics.py --models single_noshifttol0 single_noshifttol1 single_noshifttol2 --datasplit val

No pitch augmentation:

python launch_scripts/compute_paper_metrics.py --models single_nopitchaug0 single_nopitchaug1 single_nopitchaug2 --datasplit val

No shift tolerance and no weights:

python launch_scripts/compute_paper_metrics.py --models single_noshifttolnoweights0 single_noshifttolnoweights1 single_noshifttolnoweights2 --datasplit val

Training

This part will be available soon.

Reusing the loss

To reuse our shift-invariant binary cross-entropy loss, just copy the ShiftTolerantBCELoss class out of loss.py; it does not have any dependencies.
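
As a hedged usage sketch (the default constructor and the expected tensor shapes are assumptions; check the class docstring after copying it), it can be applied like a standard BCE-with-logits loss on framewise predictions:

import torch
from loss import ShiftTolerantBCELoss  # the file you copied the class into

criterion = ShiftTolerantBCELoss()
logits = torch.randn(8, 1500)    # (batch, frames) framewise beat logits
targets = torch.zeros(8, 1500)   # binary framewise beat annotations
loss = criterion(logits, targets)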

Reusing the model

To reuse the BeatThis model, you have multiple options:

From the package

When installing the beat_this package, you can directly import the model class:

from beat_this.model.beat_tracker import BeatThis

Instantiating this class will give you an untrained model from spectrograms to frame-wise beat and downbeat logits. For a pretrained model, use load_model:

from beat_this.inference import load_model
beat_this = load_model('final0', device='cuda')

From torch.hub

To quickly try the model without installing the package, just install the requirements for inference and do:

import torch
beat_this = torch.hub.load('CPJKU/beat_this', 'beat_this', 'final0', device='cuda')

Copy and paste

To copy the BeatThis model into your own project, you will need the beat_tracker.py and roformer.py files. If you remove the BeatThis.state_dict() and BeatThis._load_from_state_dict() methods (a workaround for compiled models), there are no other internal dependencies; the only external dependencies are einops and rotary-embedding-torch.
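
After copying the files, a hedged sketch of the local import (assuming beat_tracker.py sits next to your own code and the default constructor arguments are kept):

from beat_tracker import BeatThis

# untrained model mapping spectrograms to framewise beat and downbeat logits,
# as described in the section above
model = BeatThis()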

Citation

@inproceedings{foscarin2024beatthis,
    author = {Francesco Foscarin and Jan Schl{\"u}ter and Gerhard Widmer},
    title = {Beat this! Accurate beat tracking without DBN postprocessing},
    year = 2024,
    month = nov,
    booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
    address = {San Francisco, CA, United States},
}
