m-ltm-audio-text-retrieval's Introduction

Deep Audio-Text Retrieval through the Lens of Transportation

Setup

  • Clone the repository
  • Create a conda environment with the dependencies: conda env create -f environment.yaml -n [env-name] && conda activate [env-name]
  • Create a folder for the pretrained models: mkdir -p pretrained_models/audio_encoder
  • Go to pretrained_models/audio_encoder and download the pretrained ResNet38 audio encoder model (the URL is quoted so the shell does not interpret the ? character): gdown "https://zenodo.org/records/3987831/files/ResNet38_mAP%3D0.434.pth?download=1" -O ResNet38.pth
  • Download the AudioCaps and Clotho datasets. The AudioCaps dataset can be downloaded at link and the Clotho dataset can be downloaded at link.
  • Unzip the datasets and put the waveform files under data/AudioCaps/waveforms or data/Clotho/waveforms
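
Before training, it can help to confirm that the files ended up where the steps above put them. The following is a minimal sketch, not part of the repository: the helper name check_layout is hypothetical, and it only assumes the paths quoted in the setup steps (pretrained_models/audio_encoder/ResNet38.pth and data/[dataset]/waveforms).

```python
from pathlib import Path

# Paths copied from the setup steps; the helper itself is hypothetical
# and does not exist in the repository.
ENCODER_CHECKPOINT = Path("pretrained_models/audio_encoder/ResNet38.pth")
WAVEFORM_DIRS = {
    "AudioCaps": Path("data/AudioCaps/waveforms"),
    "Clotho": Path("data/Clotho/waveforms"),
}


def check_layout(root, dataset):
    """Return a list of problems found under root; an empty list means OK."""
    if dataset not in WAVEFORM_DIRS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    root = Path(root)
    problems = []
    # The audio encoder checkpoint downloaded in the setup steps.
    if not (root / ENCODER_CHECKPOINT).is_file():
        problems.append(f"missing encoder checkpoint: {ENCODER_CHECKPOINT}")
    # The folder that should hold the unzipped waveform files.
    wav_dir = root / WAVEFORM_DIRS[dataset]
    if not wav_dir.is_dir():
        problems.append(f"missing waveform folder: {WAVEFORM_DIRS[dataset]}")
    elif not any(wav_dir.iterdir()):
        problems.append(f"waveform folder is empty: {WAVEFORM_DIRS[dataset]}")
    return problems
```

Calling check_layout(".", "Clotho") from the repository root returns an empty list when both the checkpoint and a non-empty waveform folder are present.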

Training

  • The training config is settings/m-ltm-settings.yaml, in the settings folder.
  • Set the value of the dataset parameter in the config file to either "AudioCaps" or "Clotho" to train the model on the AudioCaps or Clotho dataset.
  • Run experiments: python train.py -n [exp_name] -c m-ltm-settings
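
When several experiments are launched from a script, the training command above can be assembled programmatically. This is only a sketch: the function train_command and the experiment name are made up for illustration, and the only things taken from this README are the script name train.py and the -n and -c flags.

```python
import sys


def train_command(exp_name, config="m-ltm-settings"):
    """Build the argument list for the training command quoted in this README."""
    if not exp_name:
        raise ValueError("exp_name must be a non-empty string")
    # sys.executable keeps the run inside the currently active conda environment.
    return [sys.executable, "train.py", "-n", exp_name, "-c", config]


# Hypothetical experiment name, used here only as an example.
cmd = train_command("audiocaps-run1")
# To launch it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

The dataset itself is still selected in the config file, so this wrapper does not change which dataset is used.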

Zero-shot evaluation

  • Download the test data of ESC50 from the link
  • Run the evaluation: python trainer/eval_esc50.py -c m-ltm-settings -p [pretrained model's folder]
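
For readers unfamiliar with zero-shot classification using a retrieval model, the general idea can be sketched as follows: each class label is embedded as text, and an audio clip is assigned the label whose text embedding is closest to the clip's audio embedding. This is an illustration under assumptions, not a description of what trainer/eval_esc50.py does: cosine similarity is assumed as the score, and the embeddings and labels below are made-up toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_predict(audio_emb, label_embs):
    """Return the label whose text embedding is most similar to audio_emb."""
    return max(label_embs, key=lambda label: cosine(audio_emb, label_embs[label]))


# Toy 2-D embeddings, invented for illustration only.
label_embs = {"dog": [1.0, 0.0], "rain": [0.0, 1.0]}
prediction = zero_shot_predict([0.9, 0.1], label_embs)
```

In practice the embeddings would come from the trained audio and text encoders, and a model trained with a different distance would rank labels with that distance instead.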

Cite

@inproceedings{
luong2024revisiting,
title={Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation},
author={Manh Luong and Khai Nguyen and Nhat Ho and Reza Haf and Dinh Phung and Lizhen Qu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=l60EM8md3t}
}

Acknowledgement
