Learning to Localize Sound Source in Visual Scenes [CVPR 2018, TPAMI "To Appear"]

This codebase is a re-implementation of the code used in the CVPR 2018 paper "Learning to Localize Sound Source in Visual Scenes" and the TPAMI paper "Learning to Localize Sound Source in Visual Scenes: Analysis and Applications". The original code was written in an early version of TensorFlow, so we re-implemented it in PyTorch for the community.

Getting started

  • tqdm
  • scipy

Preparation

  • Training Data

    • We used 144k samples from the Flickr-SoundNet dataset for training, as mentioned in the paper.
    • Sound features are obtained directly from the SoundNet implementation. We apply average pooling to the output of the "Object" branch of the Conv8 layer and use the result as the sound feature in our architecture.
    • To use our dataloader (Sound_Localization_Dataset.py):
      • Each sample folder should contain frames as .jpg files and audio features as .mat files. For details, please refer to Sound_Localization_Dataset.py. For example:
        • /hdd/SoundLocalization/dataset/12015590114.mp4/frame1.jpg
        • /hdd/SoundLocalization/dataset/12015590114.mp4/12015590114.mat
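
The two conventions above (one pooled sound feature per clip, and a sample folder holding .jpg frames plus a .mat file) can be sketched as follows. This is a minimal illustration in plain Python, not the code in Sound_Localization_Dataset.py. The helper names average_pool and check_sample_folder are hypothetical, and we assume the Conv8 "Object" output is available as a list of per-time-step score vectors.

```python
from pathlib import Path
from typing import List, Sequence


def average_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a (time_steps x channels) output over the time axis.

    Assumption: `frames` holds one score vector per time step, standing in
    for the SoundNet Conv8 "Object" branch output.
    """
    if not frames:
        raise ValueError("need at least one time step to pool")
    n_steps = len(frames)
    # zip(*frames) walks channel by channel across all time steps.
    return [sum(channel) / n_steps for channel in zip(*frames)]


def check_sample_folder(folder: Path) -> List[str]:
    """Return the problems found in one sample folder (empty list if none)."""
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    problems = []
    if not any(folder.glob("*.jpg")):
        problems.append("no .jpg frames found")
    if not any(folder.glob("*.mat")):
        problems.append("no .mat audio feature file found")
    return problems
```

For a folder laid out like the example paths above, check_sample_folder returns an empty list. In practice the pooling would more likely be done with NumPy or PyTorch and the result written with scipy.io.savemat. The variable name expected inside the .mat file is defined by Sound_Localization_Dataset.py and is deliberately not guessed here.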
  • The Sound Localization Dataset (Annotated Dataset)

    The Sound Localization dataset can be downloaded from the following link:

    https://drive.google.com/open?id=1P93CTiQV71YLZCmBbZA0FvdwFxreydLt

    This dataset contains 5k image-sound pairs and their annotations in XML format. Each XML file contains annotations from 3 annotators.

    The test_list.txt file lists the ID of every pair used for testing.
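
As a rough sketch of how the test list and the XML annotations might be read, the snippet below uses only the Python standard library. The XML schema is not described here, so the tag names xmin, ymin, xmax and ymax are assumptions, as is the layout of test_list.txt (one ID per line). The function names read_test_ids and read_boxes are hypothetical. Check both against the downloaded files before relying on this.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List

# Assumed tag names for box corners; adjust to the actual annotation schema.
BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def read_test_ids(list_path: Path) -> List[str]:
    """Read one sample ID per non-empty line (assumed file layout)."""
    lines = list_path.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def read_boxes(xml_path: Path) -> List[Dict[str, int]]:
    """Collect every element that has all four assumed corner children."""
    boxes = []
    for elem in ET.parse(str(xml_path)).getroot().iter():
        values = {tag: elem.findtext(tag) for tag in BOX_TAGS}
        # Skip elements where any corner is missing or has empty text.
        if all(values[tag] for tag in BOX_TAGS):
            boxes.append({tag: int(float(values[tag])) for tag in BOX_TAGS})
    return boxes
```

If the schema matches these assumptions, a file annotated by 3 annotators should yield one box (or more) per annotator from read_boxes.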

Training

python sound_localization_main.py \
    --dataset_file /hdd3/Old_Machine/sound_localization/semisupervised_train_list.txt \
    --val_dataset_file /hdd3/Old_Machine/sound_localization/supervised_test_list.txt \
    --annotation_path /hdd/Annotations/xml_box_20 --mode train --niter 10 --batchSize 30 --nThreads 8 --validation_on True \
    --validation_freq 1 --display_freq 1 --save_latest_freq 1 --name semisupervised_sound_localization_t1 \
    --optimizer adam --lr_rate 0.0001 --weight_decay 0.0
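
The paths in the command above are specific to the authors' machine. One way to swap in your own paths is to assemble the same command in Python and pass it to subprocess.run. The sketch below reuses only the flags and values shown above. The function name build_train_command and the keys of the paths dictionary are hypothetical.

```python
import sys
from typing import Dict, List


def build_train_command(paths: Dict[str, str], name: str) -> List[str]:
    """Assemble the training command as an argument list for subprocess.run."""
    options = {
        "--dataset_file": paths["train_list"],
        "--val_dataset_file": paths["val_list"],
        "--annotation_path": paths["annotations"],
        "--mode": "train",
        "--niter": 10,
        "--batchSize": 30,
        "--nThreads": 8,
        "--validation_on": "True",
        "--validation_freq": 1,
        "--display_freq": 1,
        "--save_latest_freq": 1,
        "--name": name,
        "--optimizer": "adam",
        "--lr_rate": 0.0001,
        "--weight_decay": 0.0,
    }
    # Run the script with the same interpreter that runs this helper.
    command = [sys.executable, "sound_localization_main.py"]
    for flag, value in options.items():
        command += [flag, str(value)]
    return command
```

From the repository root, subprocess.run(build_train_command(paths, "my_run"), check=True) would then launch training with your own list files and annotation directory.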

Pretrained Model

We provide a pre-trained model for the semi-supervised architecture. Its accuracy is slightly lower than the number reported in the paper because of the re-implementation in another framework. You can download the model from here.

If you end up using our code or dataset, we ask you to cite the following papers:

@InProceedings{Senocak_2018_CVPR,
author = {Senocak, Arda and Oh, Tae-Hyun and Kim, Junsik and Yang, Ming-Hsuan and So Kweon, In},
title = {Learning to Localize Sound Source in Visual Scenes},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}
@article{Senocak_2019_TPAMI,
title = {Learning to Localize Sound Source in Visual Scenes: Analysis and Applications},
author = {Senocak, Arda and Oh, Tae-Hyun and Kim, Junsik and Yang, Ming-Hsuan and So Kweon, In},
journal = {TPAMI},
year = {2019},
publisher = {IEEE}
}

Image-sound pairs were collected using the Flickr-SoundNet dataset. Thus, please cite the Yahoo dataset and the SoundNet paper as well.

The dataset and the code must be used for research purposes only.
