
speech-emotion-recognition's Introduction

Speech Emotion Recognition

This repository contains our work on speech emotion recognition using the Emo-DB dataset. The dataset is available here: Emo-DB

Prerequisites

Linux (preferably Ubuntu LTS) and Python 3.x (the commands below use python3/pip3).

Installing dependencies

Note: You can skip this step if you have already installed the packages listed below (they are also listed in the requirements.txt file).

  • h5py
  • Keras
  • scipy
  • sklearn
  • speechpy
  • tensorflow

Install one of the Python package managers available in your distro. If you use pip, you can install the dependencies by running pip3 install -r requirements.txt

If you want to accelerate Keras training on a GPU, you can install tensorflow-gpu with pip3 install tensorflow-gpu

Directory Structure

  • speechemotionrecognition/ - Package folder containing all the code files that make up the package
  • dataset/ - Contains the speech files in WAV format, separated into 7 folders whose names are the corresponding labels of those files
  • models/ - Contains the saved models that obtained the best accuracy on test data
  • examples/ - Contains examples of how to use the package

Details of the package

  • utilities.py - Contains code to read the files, extract the features and create the test and train data
  • mlmodel.py - Code to train the non-deep-learning models. Three models are supported:
    • 1 - SVM
    • 2 - Random Forest
    • 3 - Neural Network
  • dnn.py - Code to train the deep learning models. Two models are supported:
    • 1 - CNN
    • 2 - LSTM

Examples

Have a look at the examples/ directory. ml_example.py shows how to use the ML models; cnn_example.py and lstm_example.py show how to use the CNN and LSTM models. A minimal sketch of the overall flow is given below.
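For a quick feel of how the pieces fit together, here is a hedged sketch assembled only from calls that appear in the example scripts and in the issue tracebacks further down; the model object itself is constructed inside the example scripts, so the prediction line is left commented as illustrative.

    from speechemotionrecognition.utilities import get_data, get_feature_vector_from_mfcc

    # Build train/test splits (MFCC features) from the wav files under dataset/.
    x_train, x_test, y_train, y_test = get_data(flatten=False)

    # Construct and train a model as cnn_example.py or lstm_example.py do, then
    # classify a single recording; predict_one() is defined on the dnn.py models.
    feature = get_feature_vector_from_mfcc('dataset/Neutral/some_file.wav', flatten=False)
    # predicted_label = model.predict_one(feature)   # 'model' comes from the example script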

Documentation

Code documentation can be found here

Installation

A setup.py file is provided in the repository. You can run sudo python3 setup.py install to install it system-wide. If you don't have the privileges to do so, you can install it at user level by running python3 setup.py install --user.

Contributing to the repository

  • If you find any problem with the code, please feel free to open an issue.
  • If you find something you can improve, please send me a pull request with your changes. I will be more than happy to review and approve them.

Note: If you find this code useful, please leave a star :)

speech-emotion-recognition's People

Contributors

harry-7, hkveeranki


speech-emotion-recognition's Issues

unable to run the code

I created the environment as per the requirements, but I am still getting an error:

$ python2 mlmodel.py 1
Traceback (most recent call last):
File "mlmodel.py", line 9, in
from . import Model
ValueError: Attempted relative import in non-package
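A hedged note for anyone hitting this: mlmodel.py begins with a relative import (from . import Model), so it is meant to be imported as part of the speechemotionrecognition package rather than executed directly; running one of the scripts under examples/, or importing the module through the package, avoids the error. A minimal sketch, assuming the repository root is on the path:

    import sys
    sys.path.insert(0, '/path/to/speech-emotion-recognition')  # hypothetical path; or install via setup.py

    # Importing through the package gives the relative import a package context:
    from speechemotionrecognition import mlmodel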

Error

TypeError: pad_width must be of integral type
I am getting this error when I run the code.
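A hedged guess at the cause: the padding step in utilities.py computes how many samples to add, and if that width is produced by a plain "/" division it is a float on Python 3, which numpy.pad rejects with exactly this TypeError. Casting the width to an integer (or using floor division) avoids it; a self-contained illustration:

    import numpy as np

    signal = np.zeros(300)        # stand-in for a short wav signal
    target_len = 1000

    pad_len = (target_len - len(signal)) / 2   # with "/" this is a float, which np.pad would reject
    pad_len = int(pad_len)                     # casting (or using //) fixes the TypeError
    padded = np.pad(signal, (pad_len, target_len - len(signal) - pad_len), 'constant')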

Problem with the code

[3 1 1 0 1 2 1 3 3 0 1 2 2 1 1 1 1 3 2 0 3 0 1 1 1 1 1 2 1 3 1 0 1 3 0 0 3
3 1 0 1 0 1 2 0 3 3 3 0 0 3 1 1 1 0 1 0 3 3 0 3 3 3 1 2 0 0 0]
(0, 0, 0, 0, 1, 0, 0, 3, 3, 0, 2, 0, 2, 0, 0, 0, 2, 3, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 3, 3, 0, 0, 0, 3, 0, 0, 3, 3, 1, 0, 0, 0, 1, 0, 0, 0, 3, 3, 0, 0, 0, 1, 2, 1, 0, 1, 0, 3, 0, 0, 3, 0, 3, 0, 0, 0, 0, 0)
Accuracy:0.574

Confusion matrix: [[18 0 0 0]
[14 7 3 1]
[ 6 0 1 0]
[ 5 0 0 13]]
Traceback (most recent call last):
File "emotionRecognition_cnn.py", line 63, in
cnn_example()
File "emotionRecognition_cnn.py", line 33, in cnn_example
predicted = cnn.predict_one(feature)
File "/home/diego/Desktop/progetto/cnn_emotionrecognition/dnn.py", line 98, in predict_one
return np.argmax(self.model.predict(np.array([sample])))
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training.py", line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 198, 39)

The code used is:
python3 cnn_examples.py

The environment was created via "pip3 install -r requirements.txt" as you suggested.

I don't know what this problem is, can you help me?
Thanks
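A hedged pointer for this error: the CNN was trained on inputs with an explicit channel axis, so a single extracted feature of shape (198, 39) needs to be reshaped to (198, 39, 1) before it is handed to predict_one (which, per the traceback, adds the batch dimension itself). A small illustration with a stand-in array:

    import numpy as np

    feature = np.zeros((198, 39))           # stand-in for get_feature_vector_from_mfcc output
    feature = feature.reshape(feature.shape[0], feature.shape[1], 1)   # now (198, 39, 1)
    # predicted = cnn.predict_one(feature)  # predict_one wraps it into a batch of one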

How do I use the pre-trained models provided under models/?

After running the files under examples/, the trained model is not saved. Could you explain how to load the models you provide under models/? Can they be used directly?
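A hedged note: judging from the training code quoted in a later issue, the .h5 files under models/ were written with save_weights, so they hold weights only; the usual pattern is to rebuild the same architecture and then load the weights into it. A sketch, assuming the Bidirectional-LSTM layout quoted further down and an input shape of (198, 39); the layer sizes and class count must match whatever the weights were actually trained with:

    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM, Dense, Dropout

    num_classes = 7    # must match the number of emotion classes the weights were trained on
    model = Sequential([
        Bidirectional(LSTM(128), input_shape=(198, 39)),
        Dropout(0.5),
        Dense(32, activation='relu'),
        Dense(16, activation='tanh'),
        Dense(num_classes, activation='softmax'),
    ])
    model.load_weights('models/best_model_LSTM.h5')
    # prediction = model.predict(sample)   # sample shaped (1, 198, 39), as during training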

Can you tell me what GPU you have used?

Hello from South Korea! I'm a grad school student who wants to use your code to train a new dataset.
The thing is, I want to know what GPUs (and how many) you used for parallel processing, or which cloud computing server you used to train the model that produced your best weights.

That's all! Thank you so much for reading :)

regarding accuracy

Recently I asked you about accuracy (#3). I did multi-class classification, not binary classification.
Does Keras calculate accuracy sample-wise or label-wise? Consider just 2 samples:
suppose y_pred = [ [0.35, 0.2, 0.2, 0.25], [0.33, 0.22, 0.18, 0.27] ]
and y_test = [ [1, 0, 0, 0], [0, 0, 0, 1] ].
The scikit accuracy is 50% because scikit requires all labels of a sample to match.
What about Keras? How does it actually calculate accuracy?
Could you please explain with an example? It would help me a lot.
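A hedged answer worth recording here: with metrics=['accuracy'], Keras infers the metric from the loss. With categorical_crossentropy it computes categorical accuracy (argmax per sample, the same number sklearn's accuracy_score gives), but with binary_crossentropy it computes element-wise binary accuracy, which is far more forgiving on one-hot multi-class targets. On the two samples above:

    import numpy as np

    y_pred = np.array([[0.35, 0.20, 0.20, 0.25],
                       [0.33, 0.22, 0.18, 0.27]])
    y_test = np.array([[1, 0, 0, 0],
                       [0, 0, 0, 1]])

    # Categorical accuracy: argmax per sample, one hit out of two samples -> 0.5,
    # which matches sklearn's accuracy_score after argmax.
    cat_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_test, axis=1))   # 0.5

    # Binary accuracy: every entry is thresholded at 0.5 and the matches are
    # averaged over all 8 entries (6 of 8 agree) -> 0.75, a much higher number.
    bin_acc = np.mean((y_pred > 0.5).astype(int) == y_test)                     # 0.75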

Accuracy calculated from model.evaluate is not the same as from model.predict scored with scikit

I ran the code below.

from sklearn.metrics import accuracy_score
import numpy as np
import sys
from keras import Sequential
from keras.layers import LSTM, Dense, Dropout, Conv2D, Flatten, \
    BatchNormalization, Activation, MaxPooling2D
from keras.utils import np_utils
from keras.layers import Bidirectional
from tqdm import tqdm

from utilities import get_data, class_labels
import pickle
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

models = ["CNN", "LSTM"]


def get_model(model_name, input_shape):
    """
    Generate the required model and return it
    :return: Model created
    """
    # Models are inspired from
    # CNN - https://yashk2810.github.io/Applying-Convolutional-Neural-Network-on-the-MNIST-dataset/
    # LSTM - https://github.com/harry-7/Deep-Sentiment-Analysis/blob/master/code/generatePureLSTM.py
    model = Sequential()
    if model_name == 'CNN':
        model.add(Conv2D(8, (13, 13),
                         input_shape=(input_shape[0], input_shape[1], 1)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(Conv2D(8, (13, 13)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 1)))
        model.add(Conv2D(8, (13, 13)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(Conv2D(8, (2, 2)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 1)))
        model.add(Flatten())
        model.add(Dense(64))
        model.add(BatchNormalization())
        model.add(Activation('relu'))
        model.add(Dropout(0.2))
    elif model_name == 'LSTM':
        model.add(Bidirectional(LSTM(128), input_shape=(input_shape[0], input_shape[1])))
        model.add(Dropout(0.5))
        model.add(Dense(32, activation='relu'))
        model.add(Dense(16, activation='tanh'))
    model.add(Dense(len(class_labels), activation='softmax'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    return model


def evaluateModel(model):
    """
    Train the model and evaluate it
    :param model: model to be evaluted
    """
    # Train the epochs
    best_acc = 0
    global x_train, y_train, x_test, y_test
    for i in tqdm(range(50)):
        # Shuffle the data for each epoch in unison inspired from https://stackoverflow.com/a/4602224
        p = np.random.permutation(len(x_train))
        x_train = x_train[p]
        y_train = y_train[p]
        model.fit(x_train, y_train, batch_size=32, epochs=1)
        loss, acc = model.evaluate(x_test, y_test)
        if acc > best_acc:
            print ('Updated best accuracy', acc)
            best_acc = acc
            model.save_weights(best_model_path)
    model.load_weights(best_model_path)
    print ('keras_Accuracy = ', model.evaluate(x_test, y_test)[1])
    y_pred=model.predict(x_test)
    return y_pred


if __name__ == "__main__":

    if len(sys.argv) != 2:
        sys.stderr.write('Invalid arguments\n')
        sys.stderr.write('Usage python2 train_DNN.py <model_number>\n')
        sys.stderr.write('1 - CNN\n')
        sys.stderr.write('2 - LSTM\n')
        sys.exit(-1)

    n = int(sys.argv[1]) - 1
    print ('model given', models[n])

    # Read data
    global x_train, y_train, x_test, y_test
    x_train, x_test, y_train, y_test = get_data(flatten=False)
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)

    if n == 0:
        # Model is CNN so have to reshape the data
        in_shape = x_train[0].shape
        x_train = x_train.reshape(x_train.shape[0], in_shape[0], in_shape[1], 1)
        x_test = x_test.reshape(x_test.shape[0], in_shape[0], in_shape[1], 1)
    elif n > len(models):
        sys.stderr.write('Model Not Implemented yet')
        sys.exit(-1)

    model = get_model(models[n], x_train[0].shape)

    global best_model_path
    best_model_path = '../model/best_model_' + models[n - 1] + '.h5'

    y_pred=evaluateModel(model)
    t=[]
    for i in y_test:
        t.append(np.argmax(i))
    p=[]
    for i in y_pred:
        p.append(np.argmax(i))
    scikit_accuracy=accuracy_score(t,p)*100
    print(scikit_accuracy)

I got 92 as the Keras accuracy from model.evaluate, but I got 67 as the scikit accuracy. Could anyone help me solve this problem, please?
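A hedged explanation of the gap: the model above is compiled with loss='binary_crossentropy', so metrics=['accuracy'] resolves to element-wise binary accuracy, which overstates performance on one-hot multi-class targets; the sklearn number is the argmax-based categorical accuracy. Compiling for multi-class classification makes model.evaluate report the same kind of accuracy. A minimal, self-contained illustration of the two compile settings:

    from keras import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(4, activation='softmax', input_shape=(10,))])

    # As in the quoted code: with this loss, 'accuracy' becomes binary accuracy,
    # which is element-wise and therefore inflated for one-hot multi-class labels.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # For one-hot multi-class targets, compile like this instead; model.evaluate
    # then reports the same argmax-based accuracy that sklearn's accuracy_score gives.
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['categorical_accuracy'])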

best score

Can you share your best scores for the CNN and the LSTM?

unknown data

FileNotFoundError Traceback (most recent call last)
in
22
23 if __name__ == "__main__":
---> 24 ml_example()

in ml_example()
17 filename = './dataset/Neutral/srg1.wav'
18 print('prediction', model.predict_one(
---> 19 get_feature_vector_from_mfcc(filename, flatten=to_flatten)),
20 'Actual 3')
21

~\SERproject\Code\speechemotionrecognition\utilities.py in get_feature_vector_from_mfcc(file_path, flatten, mfcc_len)
31 numpy.ndarray: feature vector of the wav file made from mfcc.
32 """
---> 33 fs, signal = wav.read(file_path)
34 s_len = len(signal)
35 # pad the signals to have same size if lesser than required

c:\users\srg\appdata\local\programs\python\python39\lib\site-packages\scipy\io\wavfile.py in read(filename, mmap)
637 mmap = False
638 else:
--> 639 fid = open(filename, 'rb')
640
641 try:

FileNotFoundError: [Errno 2] No such file or directory: './dataset/Neutral/srg1.wav'


Why does this project not recognise unknown speech data? Please help me. I want to give my own data, not the Emo-DB dataset.
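A hedged pointer: the FileNotFoundError above only means that './dataset/Neutral/srg1.wav' is not present on disk; the same feature-extraction call works on any wav file you point it at, so your own recordings can be classified once a model has been trained as in the example scripts. Something along these lines (the path is hypothetical):

    from speechemotionrecognition.utilities import get_feature_vector_from_mfcc

    filename = '/absolute/path/to/your_recording.wav'               # your own wav file
    feature = get_feature_vector_from_mfcc(filename, flatten=True)  # match the model's training setting
    # prediction = model.predict_one(feature)                       # model trained as in ml_example.py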

ImportError: cannot import name 'Model'

When running dnn.py
I'm getting an error as follows:
Using TensorFlow backend.
Traceback (most recent call last):
File "dnn.py", line 11, in
from . import Model
ImportError: cannot import name 'Model'

Unable to understand the concept of padding in the utilities.py file

I know the length of voice signals varies from file to file, i.e. there may be some outliers in the dataset. But padding adds zeros to the data. So why do we aim to equalize the lengths of the audio signals with zeros? If we are adding zeros to the data, won't it distort the original data, and if it does, why are we padding?

My second question is: how is the voice data normalized? Did you normalize the data in the current project?
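On the first question, a hedged sketch of what the padding does: the feature extraction needs every file to yield an array of the same shape, so shorter signals are padded with zeros and longer ones are trimmed; the padded region simply looks like silence to the MFCC extraction, so the speech frames themselves are not altered. For example:

    import numpy as np

    target_len = 1000
    signal = np.random.randn(640)        # stand-in for a short recording

    if len(signal) < target_len:
        pad = target_len - len(signal)
        signal = np.pad(signal, (pad // 2, pad - pad // 2), 'constant')  # zeros = silence
    else:
        signal = signal[:target_len]     # longer recordings are trimmed

    assert signal.shape == (target_len,)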

Issue facing in lstm_example

Dear sir,

I am getting an error while executing the lstm_example. Could you please check it once and let me know if any modifications have to be made.

Thank you.
(screenshot attached: Screenshot from 2019-03-31 16-50-56)

Make the package installable.

Currently one can download and use the package, but there is no way to install it. Create a setup.py and make it installable.

Need help

I didn't understand your code and have a doubt: can I provide my own dataset and get an output saying whether the audio is angry, happy, sad or neutral?

How to get the best models?

Is there a part of the code that saves the best models in h5 format? I'm navigating through the package and did not find any. Thanks for the help.
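A hedged answer: the shipped examples may not write checkpoints themselves, but the usual way to produce such .h5 files is to save the weights whenever the validation metric improves, either by hand (as in the training code quoted in an earlier issue) or with Keras's ModelCheckpoint callback:

    from keras.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint('models/best_model_CNN.h5',
                                 monitor='val_acc',          # 'val_accuracy' on newer Keras
                                 save_best_only=True,
                                 save_weights_only=True)
    # model.fit(x_train, y_train, validation_data=(x_test, y_test),
    #           epochs=50, batch_size=32, callbacks=[checkpoint])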

Create Docs

Most of the code has in-code documentation, but there are no standard docs for the code and the logic behind it. Add the necessary documentation.

Some issues during training

Epoch 1/1
271/271 [==============================] - 3s 12ms/step - loss: 0.0187 - acc: 0.9963
68/68 [==============================] - 0s 2ms/step
[3 1 1 0 1 2 1 3 3 0 1 2 2 1 1 1 1 3 2 0 3 0 1 1 1 1 1 2 1 3 1 0 1 3 0 0 3
3 1 0 1 0 1 2 0 3 3 3 0 0 3 1 1 1 0 1 0 3 3 0 3 3 3 1 2 0 0 0]
(3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Accuracy:0.412

Confusion matrix: [[ 0 18 0 0]
[ 0 25 0 0]
[ 0 7 0 0]
[ 0 15 0 3]]
Traceback (most recent call last):
File "cnn_example.py", line 33, in
cnn_example()
File "cnn_example.py", line 27, in cnn_example
get_feature_vector_from_mfcc(filename, flatten=to_flatten)),
File "/home/sww/workspace/speech-emotion-recognition/speechemotionrecognition/dnn.py", line 97, in predict_one
return np.argmax(self.model.predict(np.array([sample])))
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 198, 39)

The command used for training is:
'''
#!/usr/bin/env bash
set -x
export PYTHONPATH=/path_to/speech-emotion-recognition/:$PYTHONPATH
export PATH=/path_to/speech-emotion-recognition/:$PATH
cd examples
python cnn_example.py
'''

There seems to be some problem with the input. Could you please help me with this? Many thanks.

Cannot import 'Model'

I ran dnn.py but I get the following error:

Cannot import name 'Model'

(My tensorflow-gpu version is 2.0.0)

How do I solve this problem...?

Predicting the emotion using a saved DL model - Error: 'str' object has no attribute 'ndim'

The python file dl_example ran correctly and we got an accuracy of 0.98 on the training set. We then tried to predict the emotion using the predict() function, giving a wave file as the argument. We preloaded the model best_model_LSTM.h5 from the models folder.

A screenshot of the code of the file dl_example.py is attached, along with screenshots of the error and stack trace.

Please look into the issue as soon as you can Harry. Thanks again :)
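A hedged guess at the cause, for anyone hitting the same thing: "'str' object has no attribute 'ndim'" usually means the wav file path (a string) was passed straight to predict()/predict_one(), which expects a numpy feature array. Extracting the features first should avoid it:

    from speechemotionrecognition.utilities import get_feature_vector_from_mfcc

    filename = 'dataset/Neutral/some_file.wav'                        # hypothetical path
    feature = get_feature_vector_from_mfcc(filename, flatten=False)   # unflattened input for the LSTM
    # prediction = lstm_model.predict_one(feature)                    # model loaded from best_model_LSTM.h5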
