mikempapa / cnns-speech-music-discrimination


A deep learning framework for Speech-Music discrimination of continuous audio streams

License: MIT License

Python 99.63% Shell 0.37%

cnns-speech-music-discrimination's People


xiaoyun4 xi-studio statml dacson aitorbajo shubhampachori12110095 00001101-xt aascode qoboty sohappyzkx racerchen pythonthings ashishpatel26

cnns-speech-music-discrimination's Issues

Is it available for python3?

As the title asks: does it work with Python 3, given that pyAudioAnalysis is a Python 2 project?

weights

Could you add the weights file so that it can be used to initialize my network?

regarding the input shape

I have been working with your code for quite a while now, with promising results, and I have noticed that you transform the input to the shape 160x227x227x3. Naturally, 227x227x3 is the image size with its RGB channels, but I have been trying to figure out where the 160 comes from. I have currently traced it back to the 'initialize_transformer' function (16x10), which seems to be responsible for the transformation. I originally thought it was the batch size, but since the batch size is set to 1 in the 'singleFrame_classify_video' function, that seems unlikely.

I have also looked through your paper for a possible explanation, but have not found any clues yet.

Thanks in advance,
Kenneth Enevoldsen
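As general background (not an explanation of where this repository's 160 comes from): in a 4-D image array the leading axis conventionally counts the images fed through the network together, so a 160x227x227x3 array holds 160 RGB images of 227x227 pixels. Caffe blobs use the (N, C, H, W) order instead. The minimal NumPy sketch below only illustrates that layout; the helper name `to_caffe_layout` is hypothetical and the 160 is simply taken from the question.

```python
import numpy as np

def to_caffe_layout(batch_nhwc: np.ndarray) -> np.ndarray:
    """Convert an (N, H, W, C) image batch to Caffe's (N, C, H, W) blob order."""
    if batch_nhwc.ndim != 4:
        raise ValueError("expected a 4-D array shaped (N, H, W, C)")
    # Move the channel axis in front of height and width; copy to a contiguous buffer.
    return np.ascontiguousarray(batch_nhwc.transpose(0, 3, 1, 2))

# 160 RGB images of 227x227 pixels stacked along the first axis (uint8 keeps memory small).
batch = np.zeros((160, 227, 227, 3), dtype=np.uint8)
blob = to_caffe_layout(batch)
print(blob.shape)  # (160, 3, 227, 227)
```

Whatever produces the leading dimension, `blob.shape[0]` is the number of images processed in one forward pass.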

Data preparation

Please explain how to generate the spectrograms and set the segmentation parameters.
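The issue does not state which parameters the repository uses. As a generic sketch only, assuming a short-time Fourier transform with a 20 ms Hamming window and a 10 ms step, and fixed-length segments of 100 frames with a 50-frame hop (all hypothetical values, not taken from the repository), a magnitude spectrogram can be computed and cut into segments with NumPy alone:

```python
import numpy as np

def spectrogram(x, fs, win_s=0.020, step_s=0.010):
    """Magnitude spectrogram shaped (num_frames, num_bins); window/step given in seconds."""
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    if len(x) < win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(x) - win) // step
    window = np.hamming(win)
    frames = np.stack([x[i * step:i * step + win] * window for i in range(n_frames)])
    # Keep win // 2 bins of the one-sided spectrum.
    return np.abs(np.fft.rfft(frames, axis=1))[:, : win // 2]

def segment(spec, seg_frames, hop_frames):
    """Cut a spectrogram into fixed-size, possibly overlapping segments along time."""
    starts = range(0, spec.shape[0] - seg_frames + 1, hop_frames)
    return [spec[s:s + seg_frames] for s in starts]

rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(2 * fs)  # 2 s of noise as a stand-in for real audio
spec = spectrogram(x, fs)
segments = segment(spec, seg_frames=100, hop_frames=50)
print(spec.shape, len(segments), segments[0].shape)  # (199, 160) 2 (100, 160)
```

Expressing the window and step in seconds keeps the time resolution independent of the sampling rate, but the number of frequency bins then depends on it, which is one reason to resample all audio to a common rate first.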

UnboundLocalError: local variable 'flagsIndGT' referenced before assignment

Dear Mike,

A group I am working with hopes to use this project in an upcoming publication (assuming it generalizes well to Danish).
I have read the previous issues and found xtluo-ai's comments especially useful.

However, I am currently running into the problem stated in the title:
"UnboundLocalError: local variable 'flagsIndGT' referenced before assignment"

Looking at "ClassifyWav.py", I found that the cause is the missing '*_true.mat' file, which seems to be a ground-truth file. However, I have no intention of using ground truth beyond the initial performance testing.

Is it possible to use the pretrained model without such a file? I couldn't find any way to do so in the README or in the code itself, though I might have missed something.

Thanks in advance,
Kenneth Enevoldsen
__
If relevant, the script is currently run using the CPU-optimized Caffe Docker image:
https://hub.docker.com/r/bvlc/caffe/
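The error message itself describes a general Python pattern: a local variable that is assigned only inside a conditional branch (here, presumably the branch that reads the ground-truth file) is later read unconditionally. The sketch below shows that failure and the usual remedy of giving the variable a default and skipping evaluation when no ground truth exists. The functions `unsafe_load`, `load_ground_truth`, and `accuracy` are hypothetical stand-ins, not code from 'ClassifyWav.py', and the plain-text label format is an assumption (the real '*_true.mat' file is a MATLAB file).

```python
import os

def unsafe_load(path):
    """Reproduces the failure: flagsIndGT is bound only when the file exists."""
    if os.path.isfile(path):
        flagsIndGT = [0]
    return flagsIndGT  # raises UnboundLocalError when the file is missing

def load_ground_truth(path):
    """Hypothetical loader returning per-segment labels, or None if there is no file."""
    flags_ind_gt = None  # default value, so the name is always bound
    if os.path.isfile(path):
        with open(path) as f:
            flags_ind_gt = [int(tok) for tok in f.read().split()]
    return flags_ind_gt

def accuracy(predicted, ground_truth):
    """Fraction of matching labels, or None when no ground truth is available."""
    if not ground_truth:
        return None
    n = min(len(predicted), len(ground_truth))
    return sum(p == g for p, g in zip(predicted[:n], ground_truth[:n])) / n
```

With this structure, classification can run on unlabeled audio and the evaluation step is simply skipped when `load_ground_truth` returns None.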

Clarification regarding spectrogram shapes

Dear Mike,

We have had great results with your script on our Danish radio corpus. However, while scaling it up we removed the following code, which I had previously added (following the issue by xtluo-ai):

import librosa  # this has since been removed, but the script worked with it in place
x, Fs = librosa.load(fileName, sr=None)  # load at the file's native sampling rate
x = librosa.resample(x, orig_sr=Fs, target_sr=16000)  # resample to 16 kHz
Fs = 16000

However, I then get the following output from mtCNN_classification():

___flagsInd___
[]
___classesAll___
[u'music', u'speech']
___CNNprobs___
[]

This seems to be caused by the spectrogram shape, which is:

('specgram.shape[0]:', 66)
('specgram.shape[1]:', 160)

Judging by your comment '# TODO (this must be dynamic!)', you have already considered this a potential problem. I would love to write a fix, but I am unsure why it happens in the first place. Hopefully you can enlighten me.

Thanks in advance,
Kenneth Enevoldsen
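Independently of this repository, the dependence of spectrogram shape on the input can be illustrated with STFT arithmetic: when the window and step are fixed in seconds, the number of frequency bins scales with the sampling rate and the number of frames with the signal duration, so a network expecting a fixed input size will receive mismatched spectrograms unless the audio is resampled to one rate and cut into equal-length segments. The 20 ms window, 10 ms step, and the helper `spectrogram_shape` are assumptions for illustration only.

```python
def spectrogram_shape(duration_s, fs, win_s=0.020, step_s=0.010):
    """(num_frames, num_bins) of a magnitude STFT that keeps win // 2 bins."""
    win = int(round(win_s * fs))     # window length in samples
    step = int(round(step_s * fs))   # hop length in samples
    n_samples = int(round(duration_s * fs))
    if n_samples < win:
        return (0, win // 2)         # too short for a single frame
    n_frames = 1 + (n_samples - win) // step
    return (n_frames, win // 2)

for fs in (8000, 16000, 44100):
    print(fs, spectrogram_shape(1.0, fs))
# 8000 (99, 80)
# 16000 (99, 160)
# 44100 (99, 441)
print(spectrogram_shape(0.5, 16000))  # (49, 160)
```

Under these assumptions, the frame count stays the same across sampling rates for a fixed duration, while the bin count does not; shorter audio yields fewer frames.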

Missing file *_classNames

It seems that the *_classNames file, which is loaded at line 144 of "ClassifyWav.py", is not provided.

I got the following error using the model downloaded from Dropbox:

CMD:

python ClassifyWav.py evaluate audio_dir SM_imagenet_10000_aug_iter_3500.caffemodel cnn 1 ""

ERROR:

File "ClassifyWav.py", line 144, in loadCNN
    classNamesAll = pickle.load(open(classNamesFileName, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'SM_imagenet_10000_aug_classNames'

Am I missing something?
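The traceback only shows that the file is read with `pickle.load`. Judging by the ___classesAll___ output quoted in another issue ([u'music', u'speech']), it plausibly contains a pickled list of class names, but that is an assumption. If so, a stand-in file could be written as in this sketch; the helper names are hypothetical, and the class order must match the label indices used during training, which this sketch cannot verify.

```python
import pickle

def write_class_names(path, class_names):
    """Pickle a list of class names (assumed format of the *_classNames file)."""
    with open(path, "wb") as f:
        # Protocol 2 is readable by both Python 2 and Python 3.
        pickle.dump(list(class_names), f, protocol=2)

def read_class_names(path):
    """Mirror of the pickle.load call shown in the traceback."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For the model in the command above, the call would be `write_class_names("SM_imagenet_10000_aug_classNames", ["music", "speech"])`, run in the directory where ClassifyWav.py looks for the file.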

Increasing number of classes

Dear Mike,

I am trying to train a 3-class (music, speech, and noise) audio classifier using your tool.

Do I have to change the provided proto file somehow?

A few days ago I trained a 2-class (music, speech) model, and it works fine. However, when I increase the number of classes and train, the resulting model does not work correctly, even though training finishes without errors. The classifier always outputs the same class (music in this case, the first one) without probabilities.

Thank you so much in advance! ;-)
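One thing worth checking in Caffe is that the final InnerProduct layer's `num_output` equals the new number of classes (3) in both the training and deployment prototxt files; whether that is the cause here is not established by the issue. The sketch below is a crude, regex-based check (not Caffe's own protobuf parser, and it ignores comments) that lists `num_output` for every InnerProduct layer in prototxt text. The helper names and the layer names in the usage example are hypothetical.

```python
import re

def _layer_blocks(text):
    """Yield the text of each `layer { ... }` block using naive brace matching."""
    for match in re.finditer(r"\blayer\s*\{", text):
        depth, i = 0, match.end() - 1  # i points at the opening brace
        while i < len(text):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    yield text[match.start(): i + 1]
                    break
            i += 1

def inner_product_outputs(prototxt_text):
    """Return [(layer_name, num_output)] for each InnerProduct layer, in file order."""
    result = []
    for block in _layer_blocks(prototxt_text):
        if not re.search(r'type:\s*"InnerProduct"', block):
            continue
        name = re.search(r'name:\s*"([^"]+)"', block)
        num = re.search(r"num_output:\s*(\d+)", block)
        result.append((name.group(1) if name else None,
                       int(num.group(1)) if num else None))
    return result
```

For example, text containing `layer { name: "fc7" type: "InnerProduct" inner_product_param { num_output: 4096 } }` followed by `layer { name: "fc8_sm" type: "InnerProduct" inner_product_param { num_output: 2 } }` yields `[('fc7', 4096), ('fc8_sm', 2)]`, which would flag a classifier head still sized for two classes.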
