mvrl / bird-audio-detection
An entry for the Bird Audio Detection Challenge.
We should have a script that pulls live audio from the computer and displays whether or not a bird is currently singing.
Here is the location: http://machine-listening.eecs.qmul.ac.uk/bird-audio-detection-challenge/
Idea: Use features extracted from the SoundNet network in a traditional ML pipeline, with lots of bagging and boosting, as in the typical winning solutions to Kaggle challenges.
Some publications we need to cite:
Currently, convert_to_records.py does the conversion and dataset.py loads its output for training.
Armin Hadzic pointed out this challenge: https://www.kaggle.com/c/mlsp-2013-birds/data.
It might be useful to simultaneously perform species classification and bird detection in networks that share several processing elements.
This is the relevant file: https://github.com/UkyVision/bird-audio-detection/blob/bird-detection/network.py. The new data is longer temporally, and I am not sure whether it is mono or stereo audio.
I think a better format would be something like:
[network name]_[string w/o an underscore (network configuration)]_[string w/o an underscore (training configuration)]
The two "strings w/o an underscore" would be parsed inside of the respective functions. This would allow us to put more network-specific options in the configuration format... for example, not every network needs two capacity parameters, but we are stuck with that in the current format.
This is not a change I think we should make before the end of the contest, but it is probably worthwhile if we choose to fork this repo to work on another challenge.
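The proposed naming scheme could be parsed along these lines. This is only a sketch: it assumes the three fields are separated by underscores, and the inner option syntax (`-`-separated tokens such as `w8` or `lr0.001`), the function names, and the example run name are all hypothetical, not part of the current code.

```python
import re
from typing import Dict, NamedTuple


class RunName(NamedTuple):
    network: str
    network_cfg: str
    training_cfg: str


def parse_run_name(name: str) -> RunName:
    """Split '<network>_<network cfg>_<training cfg>' into its three fields."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(
            f"expected 3 underscore-separated fields, got {len(parts)}: {name!r}"
        )
    return RunName(*parts)


def parse_options(cfg: str) -> Dict[str, float]:
    """Parse a hypothetical option string such as 'w8-t4'.

    Each '-'-separated token is lowercase letters (the option name)
    followed by a number (its value).
    """
    options = {}
    for token in cfg.split("-"):
        match = re.fullmatch(r"([a-z]+)([0-9.]+)", token)
        if match is None:
            raise ValueError(f"cannot parse option {token!r}")
        options[match.group(1)] = float(match.group(2))
    return options


# Each network / training function would parse only its own string.
run = parse_run_name("convnet_w8-t4_lr0.001-aug1")
print(run.network)
print(parse_options(run.network_cfg))
print(parse_options(run.training_cfg))
```

Because each configuration string is opaque to the top-level parser, a network that needs only one capacity parameter can simply define fewer options.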
We should be able to easily recreate this in Python. That would eliminate the dependence on Matlab completely.
Currently, we have one parameter, c, that controls the model capacity. I think we should split this into two parts, one for the windows and one across time.
https://github.com/cvondrick/soundnet
Two options
We are currently extracting a single 400000-sample subwindow (out of the 441000 samples in each clip) and predicting based on that. Instead, we could randomly sample many subwindows and average their predictions. That technique is frequently used in image classification.
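A minimal sketch of the averaging idea, assuming clips are 1-D NumPy arrays. `predict_with_subwindows` and `dummy_predict` are hypothetical names; `dummy_predict` only stands in for the trained network so that the example runs on its own.

```python
import numpy as np


def predict_with_subwindows(clip, predict_fn, window=400000, n_windows=10, rng=None):
    """Average predict_fn over randomly placed fixed-length subwindows of clip."""
    rng = np.random.default_rng() if rng is None else rng
    max_start = len(clip) - window
    if max_start < 0:
        raise ValueError("clip is shorter than the subwindow")
    # Draw start offsets uniformly over every valid position (max_start included).
    starts = rng.integers(0, max_start + 1, size=n_windows)
    scores = [predict_fn(clip[s:s + window]) for s in starts]
    return float(np.mean(scores))


def dummy_predict(window):
    """Stand-in for the network: squashes mean signal energy into (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-np.mean(window ** 2))))


rng = np.random.default_rng(0)
clip = rng.standard_normal(441000).astype(np.float32)
print(predict_with_subwindows(clip, dummy_predict, rng=rng))
```

With a 400000-sample window in a 441000-sample clip the subwindows overlap heavily, so the gain from averaging may be small unless the window is shortened.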
Here are a few suggestions:
Probably the first step would be to set up the training/testing code to train on one dataset and test on another, so we can really test whether our attempts at improving generalization are working.
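One way such a harness might look, as a sketch only. `cross_dataset_scores`, the toy threshold "model", and the dataset names `setA` and `setB` are all hypothetical placeholders for the real training and evaluation code.

```python
from itertools import permutations
from statistics import mean


def cross_dataset_scores(datasets, train_fn, eval_fn):
    """Train on each dataset, then evaluate on every other dataset.

    Returns a dict mapping (train_name, test_name) to the score.
    """
    models = {name: train_fn(data) for name, data in datasets.items()}
    scores = {}
    for train_name, test_name in permutations(datasets, 2):
        scores[(train_name, test_name)] = eval_fn(
            models[train_name], datasets[test_name]
        )
    return scores


# Toy stand-ins: each example is (feature, label); the "model" is a threshold
# halfway between the mean feature value of the two classes.
def train_threshold(data):
    pos = mean(x for x, y in data if y == 1)
    neg = mean(x for x, y in data if y == 0)
    return (pos + neg) / 2


def accuracy(threshold, data):
    return mean(1.0 if (x > threshold) == bool(y) else 0.0 for x, y in data)


datasets = {
    "setA": [(0.1, 0), (0.2, 0), (0.55, 1), (0.9, 1)],
    "setB": [(0.3, 0), (0.4, 0), (0.6, 1), (1.0, 1)],
}
for pair, score in cross_dataset_scores(datasets, train_threshold, accuracy).items():
    print(pair, score)
```

Reporting the full train/test matrix, rather than a single number, makes it easier to see which dataset a model fails to transfer to.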
Here are some other datasets we could use:
This repo shows an alternative approach that might apply for this task. This would be worthwhile to replicate on the new dataset.
Options:
Any suggestions?
I get the following warnings
WARNING:tensorflow:From evaluate.py:55 in .: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.global_variables_initializer instead.
WARNING:tensorflow:From evaluate.py:56 in .: initialize_local_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use tf.local_variables_initializer instead.
I tried to fix this at some point but couldn't figure out the correct replacement.
If you have two audio clips, one with a bird and one without, you can add the two together to get a new clip that contains a bird. This should help the model generalize.
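A sketch of this mixing augmentation, assuming clips are equal-length float waveforms roughly in [-1, 1] and labels are 0 (no bird) or 1 (bird). The function name `mix_clips` and the peak rescaling step are assumptions added for illustration.

```python
import numpy as np


def mix_clips(clip_a, label_a, clip_b, label_b):
    """Sum two equal-length clips; the mix has a bird if either input does."""
    if clip_a.shape != clip_b.shape:
        raise ValueError("clips must have the same shape")
    mixed = clip_a.astype(np.float32) + clip_b.astype(np.float32)
    # Assumed safeguard: rescale only if the sum leaves the [-1, 1] range.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed = mixed / peak
    return mixed, max(label_a, label_b)


rng = np.random.default_rng(0)
t = np.arange(441000) / 44100.0
bird = 0.8 * np.sin(2 * np.pi * 3000.0 * t)       # synthetic "bird" tone
background = 0.5 * rng.uniform(-1.0, 1.0, t.size)  # synthetic background noise

mixed, label = mix_clips(bird, 1, background, 0)
print(mixed.shape, label)
```

Mixing two clips without birds yields another negative example, so the same function can augment both classes.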
The new data augmentation approach seems to be helping. We need to make sure that there isn't a mistake.
A few useful things:
This will allow us to more easily monitor training progress and compare different methods. Output to logs/{run_name}/ so we can see them all together.
Things we will want to be able to do:
It seems like we might be overfitting (since continuing to lower the model capacity helps with generalization). Would it help to incorporate dropout layers? I suggest we pick a few of the best model settings and add dropout to see how it works.
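For reference, dropout itself is simple. The NumPy sketch below shows the common "inverted" variant, in which the surviving activations are rescaled at training time so that nothing changes at test time. It only illustrates the technique and is not the project's code: in the actual network, TensorFlow's built-in dropout op would be the natural choice, and the function name and rate here are assumptions.

```python
import numpy as np


def dropout(activations, rate, training, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At test time the layer is the identity.
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob


x = np.ones(100000)
train_out = dropout(x, rate=0.5, training=True, rng=np.random.default_rng(0))
test_out = dropout(x, rate=0.5, training=False)
print(train_out.mean(), test_out.mean())
```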
Here is one paper (https://arxiv.org/abs/1412.3474), but there have been others by the same group.