cnns-speech-music-discrimination's People

Contributors

mikempapa, tyiannak

cnns-speech-music-discrimination's Issues

weights

Could you add a weights file so that I can use it to initialize my network?

regarding the input shape

Dear Mike,

I have been working with your code for quite a while now, with promising results. I noticed that you transform the input to a shape of 160x227x227x3. Naturally, 227x227x3 is the image size and the RGB channels, but I have been trying to figure out where the 160 comes from. So far I have traced it back to the 'initialize_transformer' function (16x10), which seems to be responsible for the transformation. I originally thought it was a batch size, but since the batch size is set to 1 in the 'singleFrame_classify_video' function, that seems unlikely.

I have also looked through your paper for a possible explanation, but have yet to find any clues.

Thanks in advance,
Kenneth Enevoldsen
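
One possible reading of the 160, offered as an assumption rather than a confirmed answer: if the transformer is set up for clips of 16 frames and each frame is oversampled into 10 crops (four corners plus the centre, each also mirrored), the leading dimension becomes 16 x 10 = 160 regardless of the batch size. The sketch below reproduces only that arithmetic with NumPy. The helper `ten_crop` and the 256x256 source frame size are hypothetical and are not taken from the repository.

```python
import numpy as np

def ten_crop(frame, crop=227):
    """Return 10 crops of one HxWx3 frame: 4 corners + centre, plus their mirrors."""
    h, w, _ = frame.shape
    tops = [0, 0, h - crop, h - crop, (h - crop) // 2]
    lefts = [0, w - crop, 0, w - crop, (w - crop) // 2]
    crops = [frame[t:t + crop, l:l + crop] for t, l in zip(tops, lefts)]
    crops += [c[:, ::-1] for c in crops]  # horizontal mirror of each crop
    return np.stack(crops)

# Hypothetical clip: 16 frames of 256x256 RGB (sizes are assumptions).
clip = np.zeros((16, 256, 256, 3), dtype=np.float32)
batch = np.concatenate([ten_crop(frame) for frame in clip])
print(batch.shape)
```

If this reading is right, the 160 is a property of the clip length and the crop count, not of the batch size.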

Data preparation

Please explain how to generate the spectrograms and how to set the segmentation parameters.
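
For orientation only, here is a minimal sketch of how a magnitude spectrogram can be computed from a mono signal with NumPy. It is not the repository's own pipeline: the function name `spectrogram`, the 20 ms window, and the 10 ms step are assumptions chosen for illustration, and the repository may use different parameters.

```python
import numpy as np

def spectrogram(x, fs, win_sec=0.020, step_sec=0.010):
    """Short-term FFT magnitudes; returns an array of shape (frames, bins)."""
    win = int(round(win_sec * fs))    # window length in samples
    step = int(round(step_sec * fs))  # hop length in samples
    n_bins = win // 2                 # keep the non-redundant half of the FFT
    window = np.hamming(win)
    frames = []
    for start in range(0, len(x) - win + 1, step):
        segment = x[start:start + win] * window
        frames.append(np.abs(np.fft.fft(segment))[:n_bins] / win)
    return np.array(frames)

# One second of a 440 Hz tone at 16 kHz as stand-in input.
fs = 16000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t), fs)
print(spec.shape)
```

Both dimensions depend on the inputs: the number of frames grows with the signal duration, and the number of bins grows with the sample rate for a fixed window length in seconds.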

UnboundLocalError: local variable 'flagsIndGT' referenced before assignment

Dear Mike,

A group I am working with intends to use this project in an upcoming publication (assuming it generalises well to Danish).
I have read the previous issues and found xtluo-ai's comments especially useful.

However, I am currently running into the problem stated in the title:
"UnboundLocalError: local variable 'flagsIndGT' referenced before assignment"

Looking at "ClassifyWav.py", I found the cause to be a missing '*_true.mat' file, which seems to be a ground-truth file. However, I have no intention of using a ground truth beyond the initial performance testing.

Is it possible to use the pretrained model without such a file? I couldn't find a way to do so in the README or in the code itself, though I may have missed something.

Thanks in advance,
Kenneth Enevoldsen
__
If relevant, the script is currently run using the Caffe Docker image optimized for CPU:
https://hub.docker.com/r/bvlc/caffe/
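
An UnboundLocalError of this kind usually means a variable is assigned only inside a branch that was skipped. A minimal sketch of the usual guard follows, assuming the ground-truth file sits next to the WAV file under the same base name with a `_true.mat` suffix. The function names and that naming rule are hypothetical and may not match "ClassifyWav.py".

```python
import os

def load_ground_truth(wav_path):
    """Return the ground-truth data if a '*_true.mat' file exists, else None."""
    gt_path = os.path.splitext(wav_path)[0] + "_true.mat"  # assumed naming rule
    flags_gt = None  # bound on every path, so it can never be unassigned later
    if os.path.isfile(gt_path):
        from scipy.io import loadmat  # imported only when a file is present
        flags_gt = loadmat(gt_path)
    return flags_gt

def classify(wav_path):
    """Placeholder for classification; evaluation runs only with ground truth."""
    flags_gt = load_ground_truth(wav_path)
    return {"file": wav_path, "evaluated": flags_gt is not None}

print(classify("no_such_recording.wav"))
```

With this pattern, prediction can proceed without a ground-truth file, and the evaluation step is skipped when none is found.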

Clarification regarding spectrogram shapes

Dear Mike,

I have had great results with your script on our Danish corpus of radio. However, we have been trying to scale it up and have removed the following code, which I had previously added (following the issue by xtluo-ai):

import librosa  # since removed, but the script worked with this in place
x, Fs = librosa.load(fileName, sr=None)  # load at the file's native sample rate
x = librosa.resample(x, orig_sr=Fs, target_sr=16000)  # resample to 16 kHz
Fs = 16000

However, I then get the following output from mtCNN_classification():

___flagsInd___
[]
___classesAll___
[u'music', u'speech']
___CNNprobs___
[]

This seems to be caused by the spectrogram shape being:

('specgram.shape[0]:', 66)
('specgram.shape[1]:', 160)

Your comment '# TODO (this must be dynamic!)' suggests you have already considered this as a potential problem. I would love to write a fix for this, but I am uncertain why it happens in the first place. Hopefully you can enlighten me.

Thanks in advance,
Kenneth Enevoldsen

Missing file *_classNames

It seems that the *_classNames file, which is loaded at line 144 of "ClassifyWav.py", is not provided.

I got the following error using the model downloaded from Dropbox:

  • CMD:

python ClassifyWav.py evaluate audio_dir SM_imagenet_10000_aug_iter_3500.caffemodel cnn 1 ""

  • ERROR:
File "ClassifyWav.py", line 144, in loadCNN
    classNamesAll = pickle.load(open(classNamesFileName, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'SM_imagenet_10000_aug_classNames'

Am I missing something?
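
As a possible workaround until the file is provided, the sketch below writes a class-names file that `pickle.load` can read. It assumes the file holds a pickled list of class-name strings in the order of the network outputs, and that the order is music then speech. Both points are guesses based on the traceback and on the `classesAll` output quoted in another issue, not confirmed by the repository. The file is written to a temporary directory here purely for demonstration.

```python
import os
import pickle
import tempfile

class_names = ["music", "speech"]  # assumed content and order

out_dir = tempfile.mkdtemp()  # demonstration only; use the model's folder in practice
path = os.path.join(out_dir, "SM_imagenet_10000_aug_classNames")

# Protocol 2 stays readable from both Python 2 and Python 3.
with open(path, "wb") as f:
    pickle.dump(class_names, f, protocol=2)

# Read the file back the same way the traceback shows it being loaded.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded)
```

If the assumed order is wrong, the predicted labels would be swapped, so it is worth checking the result against a file whose class is known.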

Increasing number of classes

Dear Mike,

I am trying to train a 3-class (music, speech and noise) audio classifier using your tool.

Do I have to change the given proto somehow?

I trained a 2-class (music, speech) model a few days ago and it works fine. However, when I increase the number of classes and retrain, the resulting model does not work correctly, even though the training process finishes without errors. The classification always returns the same class (music in this case, the first one), without probabilities.

Thank you so much in advance! ;-)
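
One thing worth checking, offered as a general Caffe convention rather than a diagnosis of this repository: the final fully connected layer's `num_output` normally has to equal the number of classes. The sketch below reads the last `num_output` value from a network definition and compares it with the intended class count. The helper `last_num_output` and the layer snippet are hypothetical examples, not copied from the project's proto files.

```python
import re

def last_num_output(prototxt_text):
    """Return the last num_output value found in a network definition, or None."""
    values = re.findall(r"num_output\s*:\s*(\d+)", prototxt_text)
    return int(values[-1]) if values else None

# Hypothetical final layer, still configured for two classes.
example = """
layer {
  name: "fc8"
  type: "InnerProduct"
  inner_product_param { num_output: 2 }
}
"""

n_classes = 3  # music, speech, noise
found = last_num_output(example)
mismatch = found != n_classes
print(found, mismatch)
```

When fine-tuning from existing weights, the resized layer is commonly also given a new name so that the old 2-output weights are not copied into it.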
