siavashshams / ssamba

The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

License: BSD 3-Clause "New" or "Revised" License

Python 90.70% Shell 9.30%
audio audio-classification keyword-spotting mamba representation-learning self-supervised-learning speaker-identification state-space-model deep-learning emotion-recognition


SSAMBA: Self-Supervised Audio Mamba

⚠️ Under Construction ⚠️

We will add recipes for fine-tuning on more datasets later. 🛠️ Stay tuned!!!

Introduction

This repository contains the official PyTorch implementation of the paper SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model. SSAMBA is a self-supervised audio representation learning model built on the Mamba state space model. It builds on the Self-Supervised Audio Spectrogram Transformer (SSAST), replacing its Transformer encoder with a Mamba-based encoder to improve both performance and efficiency on audio tasks such as audio classification, keyword spotting, and speaker identification.

Installation

To install the necessary dependencies, you can use the following commands:

git clone https://github.com/SiavashShams/ssamba.git
cd ssamba
pip install -r requirements.txt

Architecture

[Figure: SSAMBA architecture]
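As rough intuition for the encoder, the sketch below implements a heavily simplified bidirectional state space scan in plain Python: a scalar linear recurrence run once left to right and once right to left, with the two outputs summed. It assumes a bidirectional design in the style of Vision Mamba (acknowledged in the License section); the parameters `a`, `b`, `c` and both function names are hypothetical. Real Mamba blocks make the step size and the B and C projections input-dependent (the "selective" part), operate on many channels, and use a hardware-aware parallel scan, none of which is shown here.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal scalar recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (one pass, O(L))."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def bidirectional_ssm(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Sum a left-to-right and a right-to-left scan so each output sees the whole input."""
    forward = ssm_scan(x, a, b, c)
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_ssm([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))  # [2.0, 0.5, 0.25]
```

Each direction is a single pass over the sequence, so the work grows linearly with the number of tokens rather than quadratically as in self-attention.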

Efficiency Comparison

For the tiny model with an input of 22k tokens, SSAMBA is approximately 92.7% faster in batch inference and 95.4% more memory-efficient than SSAST.

[Figures: model inference speed; model GPU memory usage]
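A back-of-envelope calculation shows why the gap widens for long inputs: self-attention materializes an L × L score matrix per head and layer, while a scan carries a fixed-size state per token. This sketch assumes fp32 values, a state size of 16 (the default `d_state` in `mamba_ssm`), and reads "22k" as 22,000 tokens. It ignores activations, other layers, and kernel-level optimizations, so it illustrates scaling only and is not how the reported figures were measured.

```python
def attention_scores_bytes(num_tokens: int, bytes_per_elem: int = 4) -> int:
    """Size of one L x L attention score matrix (one head, one layer)."""
    return num_tokens * num_tokens * bytes_per_elem


def scan_states_bytes(num_tokens: int, d_state: int = 16, bytes_per_elem: int = 4) -> int:
    """Size of one d_state-sized hidden state per token (one channel, one layer)."""
    return num_tokens * d_state * bytes_per_elem


if __name__ == "__main__":
    for num_tokens in (5_500, 11_000, 22_000):
        attn_gib = attention_scores_bytes(num_tokens) / 2**30
        scan_mib = scan_states_bytes(num_tokens) / 2**20
        # Attention grows 4x per doubling of the input; the scan grows 2x.
        print(f"{num_tokens:>6} tokens: attention {attn_gib:.2f} GiB, scan {scan_mib:.2f} MiB")
```

At 22,000 tokens the single score matrix alone is about 1.8 GiB, and it quadruples every time the input length doubles.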

Pretraining

We pretrained SSAMBA in three sizes (base, small, and tiny) with 250, 300, and 400 masked patches on a mixture of unlabeled audio from AudioSet and LibriSpeech. The resulting weights are listed in the "Pretrained Model Weights" section below. To pretrain the model from scratch instead, follow this recipe:

  1. Navigate to the directory: Change to the directory containing the pretraining scripts:

    cd ssamba/src/pretrain
  2. Adjust the Script: Edit the run_mask_patch_amba.sh script to update the paths to your data files, Mamba encoder configurations, and any other necessary hyperparameters. Make sure that all paths and settings accurately reflect your local environment and the specifics of the dataset you are using.

  3. Run the script: After making the necessary adjustments, start pretraining by executing the script from the terminal:

    ./run_mask_patch_amba.sh
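The `mask_patch` in the script name refers to masked patch modeling: the spectrogram is split into patches and a fixed number of them (250, 300, or 400 for the released weights) are hidden from the encoder during pretraining. The sketch below shows only the patch-count arithmetic and a uniform random choice of masked positions. The 128 mel bins × 1024 frames input and the 16 × 16 patch size are assumptions borrowed from common SSAST settings, and the actual masking strategy and shapes used by `run_mask_patch_amba.sh` may differ.

```python
import random
from typing import List, Optional


def num_patches(n_mels: int, n_frames: int, patch_f: int, patch_t: int) -> int:
    """Number of non-overlapping patches tiling an (n_mels x n_frames) spectrogram."""
    return (n_mels // patch_f) * (n_frames // patch_t)


def sample_mask(total: int, num_masked: int, seed: Optional[int] = None) -> List[int]:
    """Pick num_masked distinct patch indices uniformly at random, returned sorted."""
    if not 0 <= num_masked <= total:
        raise ValueError(f"cannot mask {num_masked} of {total} patches")
    rng = random.Random(seed)
    return sorted(rng.sample(range(total), num_masked))


if __name__ == "__main__":
    total = num_patches(n_mels=128, n_frames=1024, patch_f=16, patch_t=16)
    for k in (250, 300, 400):
        masked = sample_mask(total, k, seed=0)
        print(f"masking {len(masked)} of {total} patches ({len(masked) / total:.0%})")
```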

Pretrained Model Weights

The pretrained SSAMBA weights for each model size (base, small, and tiny) and number of masked patches (400, 300, and 250) are available at:

Pretrained Model Weights
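The model name passed to the finetuning scripts (for example `ssamba_tiny_400` in `submit_jobs.sh`) appears to follow an `ssamba_<size>_<masked patches>` pattern. Assuming that pattern holds for every size and patch count listed above, this hypothetical helper generates candidate names in the quoted, space-separated form a bash array expects. The pattern is inferred, so check the weights folder for which combinations actually exist.

```python
from itertools import product
from typing import List

# Values listed in this README; the naming pattern is inferred from "ssamba_tiny_400".
SIZES = ("tiny", "small", "base")
MASKED_PATCHES = (250, 300, 400)


def model_name(size: str, masked_patches: int) -> str:
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    if masked_patches not in MASKED_PATCHES:
        raise ValueError(f"unknown masked-patch count: {masked_patches}")
    return f"ssamba_{size}_{masked_patches}"


def all_model_names() -> List[str]:
    return [model_name(s, p) for s, p in product(SIZES, MASKED_PATCHES)]


if __name__ == "__main__":
    # Paste the output between the parentheses of: declare -a models=( ... )
    print(" ".join(f'"{name}"' for name in all_model_names()))
```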

Finetuning

Audioset_20k and ESC-50

To finetune the pretrained SSAMBA on the balanced AudioSet subset (AudioSet-20k) or on ESC-50, follow these steps:

  1. Navigate to the finetuning directory:

    • For Audioset:
      cd src/finetune/audioset
    • For ESC-50:
      cd src/finetune/esc50
  2. Adjust the paths and hyperparameters: Edit run_as_amba.sh (AudioSet) or run_esc_patch_amba.sh (ESC-50) so that the paths and hyperparameters match your dataset and setup.

  3. Configure SLURM job submission (if using SLURM): Add the models you want to finetune to submit_jobs.sh:

    #!/bin/bash
    
    # Array of pre-trained models
    declare -a models=("ssamba_tiny_400")
    
    # Submit a job for each model
    for model in "${models[@]}"; do
        sbatch run_as_amba.sh "$model"
    done
  4. Run the job submission script: Execute the submit_jobs.sh script in the terminal to start the finetuning process:

    ./submit_jobs.sh

Make sure to monitor the jobs and adjust any parameters as needed to suit your specific requirements and hardware configuration.
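Without SLURM, one option is to run the finetuning script once per model on the local machine. This sketch assumes, as `submit_jobs.sh` suggests, that `run_as_amba.sh` takes the model name as its first argument. Bash treats `#SBATCH` lines as ordinary comments, but the script may still rely on SLURM environment variables, so check it first. The helper names are hypothetical, and the runner defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_commands(script: str, models: Sequence[str]) -> List[List[str]]:
    """One `bash <script> <model>` command per model, mirroring submit_jobs.sh."""
    return [["bash", script, model] for model in models]


def run_all(script: str, models: Sequence[str], dry_run: bool = True) -> None:
    for cmd in build_commands(script, models):
        print(shlex.join(cmd))
        if not dry_run:
            # Runs sequentially; raises CalledProcessError at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("run_as_amba.sh", ["ssamba_tiny_400"])
```

Unlike the SLURM loop, this runs the models one after another, which is usually what a single-GPU workstation needs anyway.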

VoxCeleb

Step 1: Install s3prl (the toolkit behind the SUPERB benchmark)

  1. Clone the s3prl repository:

    git clone https://github.com/s3prl/s3prl.git
  2. Navigate to the s3prl directory:

    cd s3prl
  3. Install the package:

    pip install -e ./
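After the editable install, a quick generic check (not part of the SSAMBA recipe) is to ask Python's import system whether it can locate the package, without actually importing it:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """True if a top-level package can be located on the current sys.path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("s3prl found:", is_installed("s3prl"))
```

Run it with the same Python interpreter (or virtual environment) you will use for finetuning.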

Step 2: Prepare the Fine-Tuning Scripts

  1. Copy our files:
    • Copy the files from src/finetune/voxceleb1/ssast to s3prl/s3prl/upstream/ssast.
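The copy step can also be scripted. In this sketch `install_upstream` is a hypothetical helper that takes the roots of both clones. With `dirs_exist_ok=True` (Python 3.8+), `shutil.copytree` merges into an existing `upstream/ssast` folder and overwrites files that share a name, so back up that folder first if you want to keep the original s3prl version.

```python
import shutil
from pathlib import Path


def install_upstream(ssamba_root: str, s3prl_root: str) -> Path:
    """Copy SSAMBA's VoxCeleb1 upstream files into the s3prl source tree."""
    src = Path(ssamba_root) / "src" / "finetune" / "voxceleb1" / "ssast"
    dst = Path(s3prl_root) / "s3prl" / "upstream" / "ssast"
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    # Merge into dst, creating it if needed and overwriting same-named files.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For example, `install_upstream("path/to/ssamba", "path/to/s3prl")` returns the destination folder once the copy succeeds.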

Step 3: Adjust Paths and Specify Models

  1. Edit the run_sid.sh file:

    • Adjust the paths in the run_sid.sh file to point to the correct directories for your dataset and model.
  2. Specify models in submit_jobs_amba.sh:

    • Edit the submit_jobs_amba.sh script to specify the models you want to fine-tune.

Step 4: Run the Fine-Tuning Script

  1. Execute the submit_jobs_amba.sh script:
    • In the terminal, navigate to the directory containing submit_jobs_amba.sh and run:
      ./submit_jobs_amba.sh

License

The licenses for borrowed code can be found in the LICENSE file. We acknowledge the wonderful work of SSAST and Vision Mamba.

Citing

If you find this work helpful, please consider giving us a star 🌟 and citing:

@article{shams2024ssamba,
      title={SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model},
      author={Siavash Shams and Sukru Samet Dindar and Xilin Jiang and Nima Mesgarani},
      year={2024},
      eprint={2405.11831},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      journal={arXiv preprint arXiv:2405.11831}
}
