
speech-emotion-recognition's Introduction

Speech Emotion Recognition

This repository contains our work on speech emotion recognition using the Emo-DB dataset. The dataset is available here: Emo-DB

Prerequisites

Linux (preferably Ubuntu LTS) and Python 3.x (the commands below use python3/pip3).

Installing dependencies

Note: You can skip this step if you have already installed the packages listed below (they are also listed in the requirements.txt file).

  • h5py
  • Keras
  • scipy
  • sklearn
  • speechpy
  • tensorflow

Install one of the Python package managers available in your distro. If you use pip, you can install the dependencies by running pip3 install -r requirements.txt

If you want to accelerate Keras training on a GPU, you can install tensorflow-gpu with pip3 install tensorflow-gpu

Directory Structure

  • speechemotionrecognition/ - Package folder containing all the code files that make up the package
  • dataset/ - Contains the speech files in WAV format, separated into 7 folders whose names are the corresponding labels of those files
  • models/ - Contains the saved models that obtained the best accuracy on test data
  • examples/ - Contains examples of how to use the package

Details of the package

  • utilities.py - Contains code to read the files, extract the features and create the test and train data
  • mlmodel.py - Code to train the non-deep-learning models. Three models are supported:
    • 1 - SVM
    • 2 - Random Forest
    • 3 - Neural Network
  • dnn.py - Code to train the deep learning models. Two models are supported:
    • 1 - CNN
    • 2 - LSTM

Examples

Have a look at the examples/ directory. ml_example.py shows how to use the ML models; cnn_example.py and lstm_example.py show how to use the CNN and LSTM models. A minimal sketch of the overall flow is given below.
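For a quick feel of how the pieces fit together, here is a hedged sketch assembled only from calls that appear in the example scripts and in the issue tracebacks further down; the model object itself is constructed inside the example scripts, so the prediction line is left commented as illustrative.

    from speechemotionrecognition.utilities import get_data, get_feature_vector_from_mfcc

    # Build train/test splits (MFCC features) from the wav files under dataset/.
    x_train, x_test, y_train, y_test = get_data(flatten=False)

    # Construct and train a model as cnn_example.py or lstm_example.py do, then
    # classify a single recording; predict_one() is defined on the dnn.py models.
    feature = get_feature_vector_from_mfcc('dataset/Neutral/some_file.wav', flatten=False)
    # predicted_label = model.predict_one(feature)   # 'model' comes from the example script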

Documentation

Code documentation can be found here

Installation

A setup.py file is provided in the repository. You can run sudo python3 setup.py install to install it system-wide. If you don't have the privileges to do so, you can install it at user level by running python3 setup.py install --user.

Contributing to the repository

  • If you find any problem with the code, please feel free to open an issue.
  • If you find something you can improve, please send me a pull request with your changes. I will be more than happy to review and approve them.

Note: If you find this code useful, please leave a star :)

speech-emotion-recognition's People

Contributors

harry-7, hkveeranki


speech-emotion-recognition's Issues

unable to run the code

I created the environment as per the requirements, but I am still getting an error:

$ python2 mlmodel.py 1
Traceback (most recent call last):
File "mlmodel.py", line 9, in
from . import Model
ValueError: Attempted relative import in non-package
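A hedged note for anyone hitting this: mlmodel.py begins with a relative import (from . import Model), so it is meant to be imported as part of the speechemotionrecognition package rather than executed directly; running one of the scripts under examples/, or importing the module through the package, avoids the error. A minimal sketch, assuming the repository root is on the path:

    import sys
    sys.path.insert(0, '/path/to/speech-emotion-recognition')  # hypothetical path; or install via setup.py

    # Importing through the package gives the relative import a package context:
    from speechemotionrecognition import mlmodel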

Error

TypeError: pad_width must be of integral type
I am getting this error when I run the code.
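A hedged guess at the cause: the padding step in utilities.py computes how many samples to add, and if that width is produced by a plain "/" division it is a float on Python 3, which numpy.pad rejects with exactly this TypeError. Casting the width to an integer (or using floor division) avoids it; a self-contained illustration:

    import numpy as np

    signal = np.zeros(300)        # stand-in for a short wav signal
    target_len = 1000

    pad_len = (target_len - len(signal)) / 2   # with "/" this is a float, which np.pad would reject
    pad_len = int(pad_len)                     # casting (or using //) fixes the TypeError
    padded = np.pad(signal, (pad_len, target_len - len(signal) - pad_len), 'constant')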

Problem with the code

[3 1 1 0 1 2 1 3 3 0 1 2 2 1 1 1 1 3 2 0 3 0 1 1 1 1 1 2 1 3 1 0 1 3 0 0 3
3 1 0 1 0 1 2 0 3 3 3 0 0 3 1 1 1 0 1 0 3 3 0 3 3 3 1 2 0 0 0]
(0, 0, 0, 0, 1, 0, 0, 3, 3, 0, 2, 0, 2, 0, 0, 0, 2, 3, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 3, 3, 0, 0, 0, 3, 0, 0, 3, 3, 1, 0, 0, 0, 1, 0, 0, 0, 3, 3, 0, 0, 0, 1, 2, 1, 0, 1, 0, 3, 0, 0, 3, 0, 3, 0, 0, 0, 0, 0)
Accuracy:0.574

Confusion matrix: [[18 0 0 0]
[14 7 3 1]
[ 6 0 1 0]
[ 5 0 0 13]]
Traceback (most recent call last):
File "emotionRecognition_cnn.py", line 63, in
cnn_example()
File "emotionRecognition_cnn.py", line 33, in cnn_example
predicted = cnn.predict_one(feature)
File "/home/diego/Desktop/progetto/cnn_emotionrecognition/dnn.py", line 98, in predict_one
return np.argmax(self.model.predict(np.array([sample])))
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training.py", line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "/home/diego/.local/lib/python3.7/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 198, 39)

The code used is:
python3 cnn_examples.py

The environment was created via "pip3 install -r requirements.txt" as you suggested.

I don't know what this problem is, can you help me?
Thanks
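A hedged pointer for this error: the CNN was trained on inputs with an explicit channel axis, so a single extracted feature of shape (198, 39) needs to be reshaped to (198, 39, 1) before it is handed to predict_one (which, per the traceback, adds the batch dimension itself). A small illustration with a stand-in array:

    import numpy as np

    feature = np.zeros((198, 39))           # stand-in for get_feature_vector_from_mfcc output
    feature = feature.reshape(feature.shape[0], feature.shape[1], 1)   # now (198, 39, 1)
    # predicted = cnn.predict_one(feature)  # predict_one wraps it into a batch of one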

How do I use the pre-trained models provided under models/?

After running the files under examples/, the trained model is not saved. Could you explain how to load the models you provide under models/? Can they be used directly?
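A hedged note: judging from the training code quoted in a later issue, the .h5 files under models/ were written with save_weights, so they hold weights only; the usual pattern is to rebuild the same architecture and then load the weights into it. A sketch, assuming the Bidirectional-LSTM layout quoted further down and an input shape of (198, 39); the layer sizes and class count must match whatever the weights were actually trained with:

    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM, Dense, Dropout

    num_classes = 7    # must match the number of emotion classes the weights were trained on
    model = Sequential([
        Bidirectional(LSTM(128), input_shape=(198, 39)),
        Dropout(0.5),
        Dense(32, activation='relu'),
        Dense(16, activation='tanh'),
        Dense(num_classes, activation='softmax'),
    ])
    model.load_weights('models/best_model_LSTM.h5')
    # prediction = model.predict(sample)   # sample shaped (1, 198, 39), as during training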

Can you tell me what GPU you have used?

Hello from South Korea! I'm a grad school student who wants to use your code to train a new dataset.
The thing is, I want to know what GPUs (and how many) you used for parallel processing, or which cloud computing server you used to train the model that produced your best weights.

That's all! Thank you so much for reading :)

regarding accuracy

Recently I asked you about accuracy (#3). I did multi-class classification, not binary classification.
Does Keras calculate accuracy sample-wise or label-wise? Consider just 2 samples:
suppose y_pred = [ [0.35, 0.2, 0.2, 0.25], [0.33, 0.22, 0.18, 0.27] ]
and y_test = [ [1, 0, 0, 0], [0, 0, 0, 1] ].
The scikit accuracy is 50% because scikit requires all labels of a sample to match.
What about Keras? How does it actually calculate accuracy?
Could you please explain with an example? It would help me a lot.
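A hedged answer worth recording here: with metrics=['accuracy'], Keras infers the metric from the loss. With categorical_crossentropy it computes categorical accuracy (argmax per sample, the same number sklearn's accuracy_score gives), but with binary_crossentropy it computes element-wise binary accuracy, which is far more forgiving on one-hot multi-class targets. On the two samples above:

    import numpy as np

    y_pred = np.array([[0.35, 0.20, 0.20, 0.25],
                       [0.33, 0.22, 0.18, 0.27]])
    y_test = np.array([[1, 0, 0, 0],
                       [0, 0, 0, 1]])

    # Categorical accuracy: argmax per sample, one hit out of two samples -> 0.5,
    # which matches sklearn's accuracy_score after argmax.
    cat_acc = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_test, axis=1))   # 0.5

    # Binary accuracy: every entry is thresholded at 0.5 and the matches are
    # averaged over all 8 entries (6 of 8 agree) -> 0.75, a much higher number.
    bin_acc = np.mean((y_pred > 0.5).astype(int) == y_test)                     # 0.75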

Accuracy calculated from model.evaluate is not the same as from model.predict scored with scikit

I ran the code below.

from sklearn.metrics import accuracy_score
import numpy as np
import sys
from keras import Sequential
from keras.layers import LSTM, Dense, Dropout, Conv2D, Flatten, \
    BatchNormalization, Activation, MaxPooling2D
from keras.utils import np_utils
from keras.layers import Bidirectional
from tqdm import tqdm

from utilities import get_data, class_labels
import pickle
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

models = ["CNN", "LSTM"]


def get_model(model_name, input_shape):
    """
    Generate the required model and return it
    :return: Model created
    """
    # Models are inspired from
    # CNN - https://yashk2810.github.io/Applying-Convolutional-Neural-Network-on-the-MNIST-dataset/
    # LSTM - https://github.com/harry-7/Deep-Sentiment-Analysis/blob/master/code/generatePureLSTM.py
    model = Sequential()
    if model_name == 'CNN':
        model.add(Conv2D(8, (13, 13),
                         input_shape=(input_shape[0], input_shape[1], 1)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(Conv2D(8, (13, 13)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 1)))
        model.add(Conv2D(8, (13, 13)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(Conv2D(8, (2, 2)))
        model.add(BatchNormalization(axis=-1))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2, 1)))
        model.add(Flatten())
        model.add(Dense(64))
        model.add(BatchNormalization())
        model.add(Activation('relu'))
        model.add(Dropout(0.2))
    elif model_name == 'LSTM':
        model.add(Bidirectional(LSTM(128), input_shape=(input_shape[0], input_shape[1])))
        model.add(Dropout(0.5))
        model.add(Dense(32, activation='relu'))
        model.add(Dense(16, activation='tanh'))
    model.add(Dense(len(class_labels), activation='softmax'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    return model


def evaluateModel(model):
    """
    Train the model and evaluate it
    :param model: model to be evaluted
    """
    # Train the epochs
    best_acc = 0
    global x_train, y_train, x_test, y_test
    for i in tqdm(range(50)):
        # Shuffle the data for each epoch in unison inspired from https://stackoverflow.com/a/4602224
        p = np.random.permutation(len(x_train))
        x_train = x_train[p]
        y_train = y_train[p]
        model.fit(x_train, y_train, batch_size=32, epochs=1)
        loss, acc = model.evaluate(x_test, y_test)
        if acc > best_acc:
            print ('Updated best accuracy', acc)
            best_acc = acc
            model.save_weights(best_model_path)
    model.load_weights(best_model_path)
    print ('keras_Accuracy = ', model.evaluate(x_test, y_test)[1])
    y_pred=model.predict(x_test)
    return y_pred


if __name__ == "__main__":

    if len(sys.argv) != 2:
        sys.stderr.write('Invalid arguments\n')
        sys.stderr.write('Usage python2 train_DNN.py <model_number>\n')
        sys.stderr.write('1 - CNN\n')
        sys.stderr.write('2 - LSTM\n')
        sys.exit(-1)

    n = int(sys.argv[1]) - 1
    print ('model given', models[n])

    # Read data
    global x_train, y_train, x_test, y_test
    x_train, x_test, y_train, y_test = get_data(flatten=False)
    y_train = np_utils.to_categorical(y_train)
    y_test = np_utils.to_categorical(y_test)

    if n == 0:
        # Model is CNN so have to reshape the data
        in_shape = x_train[0].shape
        x_train = x_train.reshape(x_train.shape[0], in_shape[0], in_shape[1], 1)
        x_test = x_test.reshape(x_test.shape[0], in_shape[0], in_shape[1], 1)
    elif n > len(models):
        sys.stderr.write('Model Not Implemented yet')
        sys.exit(-1)

    model = get_model(models[n], x_train[0].shape)

    global best_model_path
    best_model_path = '../model/best_model_' + models[n - 1] + '.h5'

    y_pred=evaluateModel(model)
    t=[]
    for i in y_test:
        t.append(np.argmax(i))
    p=[]
    for i in y_pred:
        p.append(np.argmax(i))
    scikit_accuracy=accuracy_score(t,p)*100
    print(scikit_accuracy)

I got 92 as the Keras accuracy from model.evaluate, but I got 67 as the scikit accuracy. Could anyone help me solve this problem, please?
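A hedged explanation of the gap: the model above is compiled with loss='binary_crossentropy', so metrics=['accuracy'] resolves to element-wise binary accuracy, which overstates performance on one-hot multi-class targets; the sklearn number is the argmax-based categorical accuracy. Compiling for multi-class classification makes model.evaluate report the same kind of accuracy. A minimal, self-contained illustration of the two compile settings:

    from keras import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(4, activation='softmax', input_shape=(10,))])

    # As in the quoted code: with this loss, 'accuracy' becomes binary accuracy,
    # which is element-wise and therefore inflated for one-hot multi-class labels.
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    # For one-hot multi-class targets, compile like this instead; model.evaluate
    # then reports the same argmax-based accuracy that sklearn's accuracy_score gives.
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['categorical_accuracy'])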

best score

Can you share your best scores for the CNN and the LSTM?

unknown data

FileNotFoundError Traceback (most recent call last)
in
22
23 if __name__ == "__main__":
---> 24 ml_example()

in ml_example()
17 filename = './dataset/Neutral/srg1.wav'
18 print('prediction', model.predict_one(
---> 19 get_feature_vector_from_mfcc(filename, flatten=to_flatten)),
20 'Actual 3')
21

~\SERproject\Code\speechemotionrecognition\utilities.py in get_feature_vector_from_mfcc(file_path, flatten, mfcc_len)
31 numpy.ndarray: feature vector of the wav file made from mfcc.
32 """
---> 33 fs, signal = wav.read(file_path)
34 s_len = len(signal)
35 # pad the signals to have same size if lesser than required

c:\users\srg\appdata\local\programs\python\python39\lib\site-packages\scipy\io\wavfile.py in read(filename, mmap)
637 mmap = False
638 else:
--> 639 fid = open(filename, 'rb')
640
641 try:

FileNotFoundError: [Errno 2] No such file or directory: './dataset/Neutral/srg1.wav'


Why does this project not recognise unknown speech data? Please help me. I want to give my own data, not the Emo-DB dataset.
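A hedged pointer: the FileNotFoundError above only means that './dataset/Neutral/srg1.wav' is not present on disk; the same feature-extraction call works on any wav file you point it at, so your own recordings can be classified once a model has been trained as in the example scripts. Something along these lines (the path is hypothetical):

    from speechemotionrecognition.utilities import get_feature_vector_from_mfcc

    filename = '/absolute/path/to/your_recording.wav'               # your own wav file
    feature = get_feature_vector_from_mfcc(filename, flatten=True)  # match the model's training setting
    # prediction = model.predict_one(feature)                       # model trained as in ml_example.py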

ImportError: cannot import name 'Model'

When running dnn.py
I'm getting an error as follows:
Using TensorFlow backend.
Traceback (most recent call last):
File "dnn.py", line 11, in
from . import Model
ImportError: cannot import name 'Model'

Unable to understand the concept of padding in the utilities.py file

I know the length of voice signals varies from file to file, i.e. there may be some outliers in the dataset. But padding adds zeros to the data. So why do we aim to equalize the lengths of the audio signals with zeros? If we are adding zeros to the data, won't it distort the original data, and if it does, why are we padding?

My second question is: how is the voice data normalized? Did you normalize the data in the current project?
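On the first question, a hedged sketch of what the padding does: the feature extraction needs every file to yield an array of the same shape, so shorter signals are padded with zeros and longer ones are trimmed; the padded region simply looks like silence to the MFCC extraction, so the speech frames themselves are not altered. For example:

    import numpy as np

    target_len = 1000
    signal = np.random.randn(640)        # stand-in for a short recording

    if len(signal) < target_len:
        pad = target_len - len(signal)
        signal = np.pad(signal, (pad // 2, pad - pad // 2), 'constant')  # zeros = silence
    else:
        signal = signal[:target_len]     # longer recordings are trimmed

    assert signal.shape == (target_len,)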

Issue facing in lstm_example

Dear sir,

I am getting an error while executing the lstm_example. Could you please check it once and let me know if any modifications have to be made.

Thank you.
(screenshot attached: Screenshot from 2019-03-31 16-50-56)

Make the package installable.

Currently one can download and use the package, but there is no way to install it. Create a setup.py and make it installable.

Need help

I didn't understand your code and have a doubt: can I provide my own dataset and get an output saying whether the audio is angry, happy, sad or neutral?

How to get the best models?

Is there a part of the code that saves the best models in h5 format? I'm navigating through the package and did not find any. Thanks for the help.
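A hedged answer: the shipped examples may not write checkpoints themselves, but the usual way to produce such .h5 files is to save the weights whenever the validation metric improves, either by hand (as in the training code quoted in an earlier issue) or with Keras's ModelCheckpoint callback:

    from keras.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint('models/best_model_CNN.h5',
                                 monitor='val_acc',          # 'val_accuracy' on newer Keras
                                 save_best_only=True,
                                 save_weights_only=True)
    # model.fit(x_train, y_train, validation_data=(x_test, y_test),
    #           epochs=50, batch_size=32, callbacks=[checkpoint])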

Create Docs

Most of the code has in-code documentation, but there are no standard docs for the code and the logic behind it. Add the necessary documentation.

Some issues during training

Epoch 1/1
271/271 [==============================] - 3s 12ms/step - loss: 0.0187 - acc: 0.9963
68/68 [==============================] - 0s 2ms/step
[3 1 1 0 1 2 1 3 3 0 1 2 2 1 1 1 1 3 2 0 3 0 1 1 1 1 1 2 1 3 1 0 1 3 0 0 3
3 1 0 1 0 1 2 0 3 3 3 0 0 3 1 1 1 0 1 0 3 3 0 3 3 3 1 2 0 0 0]
(3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Accuracy:0.412

Confusion matrix: [[ 0 18 0 0]
[ 0 25 0 0]
[ 0 7 0 0]
[ 0 15 0 3]]
Traceback (most recent call last):
File "cnn_example.py", line 33, in
cnn_example()
File "cnn_example.py", line 27, in cnn_example
get_feature_vector_from_mfcc(filename, flatten=to_flatten)),
File "/home/sww/workspace/speech-emotion-recognition/speechemotionrecognition/dnn.py", line 97, in predict_one
return np.argmax(self.model.predict(np.array([sample])))
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1149, in predict
x, _, _ = self._standardize_user_data(x)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 751, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (1, 198, 39)

The command used for training is:
'''
#!/usr/bin/env bash
set -x
export PYTHONPATH=/path_to/speech-emotion-recognition/:$PYTHONPATH
export PATH=/path_to/speech-emotion-recognition/:$PATH
cd examples
python cnn_example.py
'''

There seems to be some problem with the input. Could you please help me with this? Many thanks.

Cannot import 'Model'

I ran dnn.py but I get the following error:

Cannot import name 'Model'

(My tensorflow-gpu version is 2.0.0)

How do I solve this problem...?

Predicting the emotion using a saved DL model - Error: 'str' object has no attribute 'ndim'

The python file dl_example ran correctly and we got an accuracy of 0.98 on the training set. We then tried to predict the emotion using the predict() function, giving a wave file as the argument. We preloaded the model best_model_LSTM.h5 from the models folder.

A screenshot of the code of the file dl_example.py is attached, along with screenshots of the error and stack trace.

Please look into the issue as soon as you can Harry. Thanks again :)
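A hedged guess at the cause, for anyone hitting the same thing: "'str' object has no attribute 'ndim'" usually means the wav file path (a string) was passed straight to predict()/predict_one(), which expects a numpy feature array. Extracting the features first should avoid it:

    from speechemotionrecognition.utilities import get_feature_vector_from_mfcc

    filename = 'dataset/Neutral/some_file.wav'                        # hypothetical path
    feature = get_feature_vector_from_mfcc(filename, flatten=False)   # unflattened input for the LSTM
    # prediction = lstm_model.predict_one(feature)                    # model loaded from best_model_LSTM.h5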
