mvrl / bird-audio-detection

An entry for the Bird Audio Detection Challenge.

Python 94.02% MATLAB 4.84% Shell 1.14%

bird-audio-detection's Issues

live detection script

We should have a script that pulls live audio from the computer and displays whether or not a bird is currently singing.

Python script to download and unpack dataset

Here is the location: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
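A minimal sketch of what such a script could look like, using only the standard library. It assumes the dataset is distributed as zip archives; the function name and the URL in the usage comment are placeholders, since the actual archive links have to be taken from the challenge page above.

```python
import os
import tempfile
import urllib.request
import zipfile


def download_and_unpack(url, dest_dir):
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Returns the sorted list of member names found in the archive.
    Assumes the archive is a zip file (not confirmed by the challenge page).
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Download into a temporary directory so a failed transfer
    # does not leave a partial archive inside dest_dir.
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, "archive.zip")
        urllib.request.urlretrieve(url, archive_path)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_dir)
            return sorted(archive.namelist())


# Usage (placeholder URL, replace with a real archive link):
# download_and_unpack("https://example.org/some-dataset.zip", "data/")
```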

predict using soundnet features

Idea: Use features extracted from the soundnet network in a traditional ML pipeline. Use lots of bagging and boosting like the typical winning solutions to Kaggle challenges.

Python script to convert dataset into appropriate format for training/testing

Currently, convert_to_records.py does the conversion, and dataset.py loads its output for training.

integrate species detection

Armin Hadzic pointed out this challenge: https://www.kaggle.com/c/mlsp-2013-birds/data.

It might be useful to simultaneously perform species classification and bird detection in networks that share several processing elements.

Update neural network architecture for new data format

This is the relevant file: https://github.com/UkyVision/bird-audio-detection/blob/bird-detection/network.py. The new clips are longer, and I am not sure whether they are mono or stereo audio.

change run name format

I think a better format would be something like:

[network name][string w/o an underscore (network configuration)][string w/o an underscore (training configuration)]

The two "strings w/o an underscore" would be parsed inside of the respective functions. This would allow us to put more network-specific options in the configuration format... for example, not every network needs two capacity parameters, but we are stuck with that in the current format.

This is not a change I think we should make before the end of the contest, but it is probably worthwhile if we choose to fork this repo to work on another challenge.
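A rough sketch of how the top-level parsing could work under this proposal. It assumes the three fields are joined with underscores (which is why the two configuration strings must not contain one); `RunName`, `parse_run_name`, and the example run name are made up for illustration and are not part of the current code.

```python
from collections import namedtuple

# Hypothetical container for the three fields of a run name.
RunName = namedtuple("RunName", ["network", "network_config", "training_config"])


def parse_run_name(run_name):
    """Split a run name into its three underscore-separated fields.

    The two configuration strings are returned unparsed; the network and
    training functions would each interpret their own string.
    """
    parts = run_name.split("_")
    if len(parts) != 3:
        raise ValueError(
            "expected 3 underscore-separated fields, got %d: %r"
            % (len(parts), run_name)
        )
    return RunName(*parts)


# Usage with a made-up run name:
# parse_run_name("convnet_c8t4_lr0.001b32")
```

Because the configuration strings are opaque at this level, a network that needs only one capacity parameter can simply define a shorter string.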

Automatically create training and testing splits

rewrite src/analysis/display_evaluation.m in python

We should be able to easily recreate this in Python. That would eliminate the dependence on MATLAB completely.

decouple capacity parameter for micro and macro processing

Currently, we have one parameter, c, that controls the model capacity. I think we should split this into two parts, one for the windows and one across time.

finetune from SoundNet

https://github.com/cvondrick/soundnet

Two options:

import the model and hope it works
implement in torch

adjust challenge.py to average predictions across many subwindows

We are currently extracting a single 400000-sample subwindow (out of 441000 samples) and predicting based on that. We could instead randomly sample many subwindows and average the predictions. That technique is frequently used in image classification.
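The averaging idea can be sketched independently of the network. In the snippet below, `predict` stands in for whatever function maps a subwindow to a bird probability; it and the other names are assumptions for illustration, not the actual interface of challenge.py.

```python
import random


def average_subwindow_prediction(clip, predict, window=400000, n_windows=10, rng=None):
    """Average `predict` over randomly placed subwindows of `clip`.

    clip      : sequence of audio samples
    predict   : callable mapping a subwindow to a score (placeholder for the model)
    window    : subwindow length in samples
    n_windows : number of random subwindows to average over
    rng       : optional random.Random instance for reproducibility
    """
    if len(clip) < window:
        raise ValueError("clip is shorter than the subwindow length")
    rng = rng or random.Random()
    max_start = len(clip) - window
    total = 0.0
    for _ in range(n_windows):
        # Pick a start offset uniformly among all valid positions.
        start = rng.randint(0, max_start)
        total += predict(clip[start:start + window])
    return total / n_windows
```

Passing a seeded `random.Random` makes the sampled offsets repeatable, which helps when comparing runs.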

Improve generalization to new datasets

Here are a few suggestions:

http://machine-listening.eecs.qmul.ac.uk/2016/11/bird-audio-detection-tips-on-building-robust-detectors/

Probably the first step would be to set up the training/testing code to train on one dataset and test on another, so we can really test whether our attempts at improving generalization are working.

Here are some other datasets we could use:

http://projects.csail.mit.edu/soundnet/ (huge... probably what we want to use)
https://github.com/karoldvl/ESC-50

Alternative approach (spectrogram)

This repo shows an alternative approach that might apply for this task. This would be worthwhile to replicate on the new dataset.

determine a process for sharing model files

Options:

a separate repository
in this repository using large file support
via google drive

Any suggestions?

handle deprecation of variable initialization methods

I get the following warnings:

WARNING:tensorflow:From evaluate.py:55 in <module>.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
WARNING:tensorflow:From evaluate.py:56 in <module>.: initialize_local_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.local_variables_initializer instead.

I tried to fix this at some point but couldn't figure out the correct replacement.

Data augmentation by mixing and matching audio clips

If you have two audio clips, one with a bird and one without, you can add the two together and you have a new clip with birds! This should help the model generalize.
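A small sketch of the mixing step in plain Python. It assumes samples are floats in [-1, 1]; the function name, the gain parameters, and the rescaling step are illustrative choices rather than what the training code does.

```python
def mix_clips(bird_clip, background_clip, bird_gain=1.0, background_gain=1.0):
    """Add a bird clip to a bird-free clip and return (mixed_samples, label).

    Assumes float samples in [-1, 1]. The result is truncated to the shorter
    clip and labeled 1 (bird present), since one input contains a bird.
    """
    mixed = [
        bird_gain * b + background_gain * g
        for b, g in zip(bird_clip, background_clip)  # zip stops at the shorter clip
    ]
    # Rescale only when the sum would exceed the valid range.
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]
    return mixed, 1
```

Varying the two gains per example would give clips with different signal-to-noise ratios from the same pair.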

write test case to verify training and testing sets are disjoint

The new data augmentation approach seems to be helping. We need to make sure the improvement is not caused by a mistake, such as test clips leaking into the training set.
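One way to check this, sketched under the assumption that every training example can report the IDs of the source clips it was built from (an augmented example would list two). The function name and data layout are hypothetical.

```python
def find_leaked_clips(train_examples, test_ids):
    """Return the set of test clip IDs that also appear in the training data.

    train_examples : iterable of tuples, each holding the source clip IDs
                     used to build one training example
    test_ids       : iterable of clip IDs in the test set
    """
    test_ids = set(test_ids)
    leaked = set()
    for source_ids in train_examples:
        # Any source clip that is also a test clip is a leak.
        leaked.update(test_ids.intersection(source_ids))
    return leaked


# In a test case, the check would be that the result is empty, e.g.:
# assert not find_leaked_clips(train_examples, test_ids)
```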

add basic tensorboard summaries

A few useful things:

loss
learning rate
mini-batch accuracy (training data)
(maybe) mini-batch accuracy (test data)

This will allow us to more easily monitor training progress and compare different methods. Output to logs/{run_name}/ so we can see them all together.

create visualization scripts

Things we will want to be able to do:

find input sequences that maximize the activation for internal nodes of the network
prettyprint the network architecture (filter sizes, strides) and activations (blob sizes)
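For the second item, here is a sketch of how the temporal size of each activation could be computed and printed from filter sizes and strides. It assumes 1-D convolutions without padding and tracks only the length along time, not the channel count; the layer list in the usage comment is made up and does not describe the real network.

```python
def conv_output_length(n, kernel, stride):
    """Output length of a 1-D convolution with no padding."""
    if n < kernel:
        raise ValueError("input is shorter than the kernel")
    return (n - kernel) // stride + 1


def layer_shapes(input_length, layers):
    """Return (name, kernel, stride, in_length, out_length) for each layer.

    layers : list of (name, kernel, stride) tuples, applied in order
    """
    rows = []
    n = input_length
    for name, kernel, stride in layers:
        out = conv_output_length(n, kernel, stride)
        rows.append((name, kernel, stride, n, out))
        n = out
    return rows


def format_table(rows):
    """Render the rows from layer_shapes as aligned text."""
    return "\n".join(
        "{:<8} kernel={:<4} stride={:<3} {} -> {}".format(*row) for row in rows
    )


# Usage with a made-up two-layer stack on a 400000-sample input:
# print(format_table(layer_shapes(400000, [("conv1", 64, 2), ("conv2", 32, 2)])))
```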