Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

Code repository for the paper Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs [1].

If you are only looking for our PyTorch implementation of the icosahedral CNNs, you can find it here.

Dependencies

  • Python: it has been tested with Python 3.8.1
  • NumPy, matplotlib, scipy, soundfile, pandas and tqdm
  • PyTorch: it has been tested with PyTorch 1.8.1
  • gpuRIR [2]
  • icoCNN
  • webrtcvad
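As a quick sanity check before running the scripts, the following sketch reports which of the dependencies listed above cannot be found in the current environment. The import names in REQUIRED are assumptions derived from the package names (for instance, PyTorch is imported as torch), and missing_modules is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Import names are assumptions based on the dependency list above
# ("torch" is the import name of PyTorch).
REQUIRED = ["numpy", "matplotlib", "scipy", "soundfile", "pandas",
            "tqdm", "torch", "gpuRIR", "icoCNN", "webrtcvad"]

def missing_modules(names):
    """Return the module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates the modules without importing them, so it does not verify versions or GPU support.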

Datasets

  • LibriSpeech The training dataset is generated on the fly while the models are trained, since the source trajectories are simulated at that point, but to simulate them you will need the LibriSpeech corpus on your machine. By default, the main scripts look for it in datasets/LibriSpeech, but you can change its path with the path_train and path_test variables.
  • LOCATA In order to test the models with actual recordings, you will also need the dataset of the LOCATA challenge. By default, the main scripts look for it in datasets/LOCATA, but you can change its path with the path_locata variable.
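A minimal sketch for checking that the default dataset folders mentioned above exist before launching the main script. The helper check_dataset_dirs is hypothetical and only tests whether the directories are present; it does not validate their contents or the internal layout of either corpus.

```python
from pathlib import Path

def check_dataset_dirs(paths):
    """Map each label to True if its directory exists, False otherwise."""
    return {label: Path(p).is_dir() for label, p in paths.items()}

# Default locations taken from the text above; adjust them if you changed
# path_train, path_test or path_locata in the main script.
default_paths = {
    "LibriSpeech": "datasets/LibriSpeech",
    "LOCATA": "datasets/LOCATA",
}

for label, found in check_dataset_dirs(default_paths).items():
    print(f"{label}: {'found' if found else 'not found'}")
```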

Main script

You can use the script 1sourceTracking_icoCNN.py to train the model and test it with synthetic and real recordings. You can change the resolution of the input maps by changing the value of r in line 22. The script is organized in cells, so you can skip the training cell and just load the pretrained models.
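To get a feel for how r affects the size of the input maps, the sketch below counts the vertices of an icosahedral grid obtained by regularly subdividing an icosahedron r times, which gives 10 · 4^r + 2 points. This assumes that the resolution r used by the script follows that standard construction; ico_grid_vertices is a hypothetical helper and the exact array shape used by the models may differ.

```python
def ico_grid_vertices(r):
    """Number of vertices of an icosahedron after r regular subdivisions."""
    if r < 0:
        raise ValueError("r must be a non-negative integer")
    return 10 * 4**r + 2

for r in range(5):
    print(f"r = {r}: {ico_grid_vertices(r)} grid points")
```

Each increment of r roughly quadruples the number of grid points, so higher resolutions increase the memory and compute cost accordingly.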

You can find the definition of the model in acousticTrackingModels.py and the implementation of our soft-argmax function in acousticTrackingModules.py. If you are looking for the implementation of the icosahedral convolutions, they have their own repository. The baseline model Cross3D [3] also has its own repository with its code and pretrained models.
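For readers unfamiliar with the idea, the following is a generic NumPy sketch of a soft-argmax: the scores of a map are turned into softmax weights and used to average the coordinates of the grid points. It is not the code in acousticTrackingModules.py; the function name, the beta sharpness parameter and the toy grid are assumptions made for illustration only.

```python
import numpy as np

def soft_argmax(scores, coords, beta=1.0):
    """Softmax-weighted average of point coordinates.

    scores: (N,) map values, one per grid point.
    coords: (N, D) coordinates of the grid points.
    beta:   sharpness; larger values approach a hard argmax.
    """
    scores = np.asarray(scores, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(beta * (scores - scores.max()))
    weights /= weights.sum()
    return weights @ coords

# Toy example: three grid points on the unit sphere (the coordinate axes).
grid = np.eye(3)
scores = np.array([0.1, 0.2, 3.0])
estimate = soft_argmax(scores, grid, beta=2.0)
doa = estimate / np.linalg.norm(estimate)  # project back onto the unit sphere
print(doa)
```

Unlike a hard argmax, this weighted average is differentiable with respect to the scores, so it can be used inside a network trained by gradient descent.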

Pretrained models

The pretrained models and the test results can be found in the models and results folders.

Other source files

acousticTrackingDataset.py, acousticTrackingLearners.py, acousticTrackingModels.py and acousticTrackingModules.py contain several classes and functions employed by the main script. They are updated versions of the ones found in the repository of Cross3D and have been published to facilitate the replicability of the research presented in [1], not as a software library. Therefore, any feature included in them that is not used by the main script may be untested and could contain bugs.

References

[1] D. Diaz-Guerra, A. Miguel and J. R. Beltran, "Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs," [arXiv preprint].

[2] D. Diaz-Guerra, A. Miguel and J. R. Beltran, "gpuRIR: A python library for Room Impulse Response simulation with GPU acceleration," in Multimedia Tools and Applications, Oct. 2020 [DOI] [SharedIt] [arXiv preprint].

[3] D. Diaz-Guerra, A. Miguel and J. R. Beltran, "Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 300-311, 2021 [DOI] [arXiv preprint].
