pietz / language-recognition

CNN to classify samples of voice recordings into the language that was spoken

Jupyter Notebook 100.00%

language-recognition's Introduction

Spoken Language Recognition

This notebook trains a convolutional neural network to classify audio files of voice recordings by the language that was spoken. The dataset I used contained 66,000 files across 176 languages. I found it on TopCoder (https://goo.gl/G5XBJl). I liked the idea behind this problem because it's very hard for humans to do. It's interesting to see that CNNs perform well on problems where intuition doesn't get you anywhere.

I included a saved version of my pretrained model, which reaches an accuracy of 98.79% on evaluation. Further notes on development can be found in the Jupyter Notebook.

language-recognition's Issues

versions

Hey,

What was the Python version used?

I'm having issues with librosa. Any chance I could find out the versions of the modules as well?

Mapping Prediction index to Labels

Hi @pietz

I was able to run the model and get predictions out of it, but I could not map the prediction index to the label.
Could you kindly upload the labels file so I can map the prediction index to the correct label?
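
The labels file is not part of this page, so the following is only a sketch of how the mapping usually works. It assumes the one-hot columns follow the sorted list of language names, which the repository does not confirm. The `labels` list and the `index_to_label` helper are hypothetical.

```python
import numpy as np

# Hypothetical label list; the real order must match the one used in training.
labels = sorted(["German", "English", "French"])

def index_to_label(probabilities, labels):
    """Return the label with the highest predicted probability for each row."""
    indices = np.argmax(probabilities, axis=-1)
    return [labels[i] for i in indices]

# Stand-in for model.predict(...) output: two samples, three classes.
predictions = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1]])
print(index_to_label(predictions, labels))
```

If the training labels were encoded in a different order, the same helper works as long as `labels` is rebuilt in that order.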

Typos in the README.md file

Punctuation marks and decimal separators are used in the wrong places. Please look into it.

Non random data

Hey, I am trying to make the system work with very specific data (non-randomised train/valid/test splits). No matter what I do, the accuracy never goes above 60%. With your randomised code and the same dataset, it goes up to 90% accuracy, but I need to use specific datasets for train/valid/test.

Any chance you could help somehow?

Newly created list looks something like this: https://gyazo.com/7c6222c37cc8176f3af541ef310709d6 .

Another thing I noticed is that the y IDs in the main data file are different, so instead of "dask.array.image.imread('data/jpg/*.jpg')" I load each file individually and combine them. It still gives me only 60% accuracy. This code seems to work: the output for *.jpg and the code below match (with 2 layers ordered as needed).

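# Manually load all images into one dataset, in the order given by the CSV list
import dask.array as da
import dask.array.image  # makes dask.array.image.imread available
import numpy as np
import pandas as pd

def loadAllImages(filePath = ""):
    removeFile(filePath)  # user-defined helper, not shown here

    d_list = pd.read_csv("data/list_full.csv")

    # One 192x192 grayscale image per row of the list
    d = np.zeros(shape=(18000, 192, 192), dtype="uint8")
    for i, j in d_list.iterrows():
        di = dask.array.image.imread('data/jpg/' + j["filename"] + '.jpg')
        dn = di.compute()
        d[i] = dn

    # Wrap the array in dask and write it to the HDF5 file under the key 'data'
    ddask = da.from_array(d, chunks=(1, 192, 192))

    ddask.to_hdf5(filePath, 'data')
    print("Finished compiling data", ddask, ddask.shape, ddask.size, type(ddask))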

And instead of picking train/valid/test at random, I changed to this code, which should in theory represent the split exactly. It matches the .csv list of files:

    # Sequential indices 0 .. data_size*out_dim - 1, in the order of the .csv list
    full_idx = np.arange(data_size*out_dim)

    # First block is train, second is validation, the remainder is test
    tr_idx = full_idx[:(tr_size*out_dim)]
    va_idx = full_idx[(tr_size*out_dim):(tr_size*out_dim) + (va_size*out_dim)]
    te_idx = full_idx[(tr_size*out_dim) + (va_size*out_dim):]

I would love you forever if I could get any more of your help on this one, as this is an amazing system that I would like to get working properly.

jpgs_to_h5 is too slow

da.to_hdf5 is very, very slow on my machine. Have you faced this issue?

Converted image is not similar to the sample. And accuracy is under 0.6%

I converted the mp3 files to images.
And I got a jpg like this:

But the sample looks like this:

They are completely different.

The accuracy of both predict and evaluate is under 0.7% (around 1/176), with the model I trained as well as with the one pietz trained.
The accuracy during fit is just as low.

How can I solve this?
Thanks, and sorry for my poor English.

I'm a bit of a beginner, I couldn't solve the problem?

(52800, 192, 192, 1) (52800, 176)
(4576, 192, 192, 1) (4576, 176)
(8800, 192, 192, 1) (8800, 176)

Your arrays have the shapes shown above. When I try a different mp3 dataset with 2 languages, my arrays have these shapes:

(384, 192, 192) (384, 2)
(32, 192, 192) (32, 2)
(64, 192, 192) (64, 2)

The line
model.evaluate(x_te, y_te,batch_size=batch_size)
prints this result:

2/2 [================================] - 1s 380ms/step - loss: 0.0737 - accuracy: 0.9844
[0.07371428608894348, 0.984375]

Shouldn't it report 64/64 instead of 2/2?
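
A likely explanation, offered as a sketch rather than a confirmed diagnosis: recent Keras versions appear to count batches, not samples, in the progress bar. The batch size of 32 below is an assumption, since the post does not show the value of `batch_size`, and `steps_for` is a hypothetical helper.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to cover n_samples (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

# 64 test samples with an assumed batch size of 32
print(steps_for(64, 32))  # 2

# The reported accuracy is consistent with all 64 samples having been evaluated:
print(63 / 64)  # 0.984375
```

Under that assumption, "2/2" means two batches of 32, and the accuracy of 0.984375 corresponds to 63 of 64 samples classified correctly.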

Python version

What is the python version used? Trying to force it to work on 3.6, but struggling.

Missing channel dimension

After I convert the wav files to jpg, I check the shape of my array, and it's (192, 192) instead of (192, 192, 1). This prevents me from training the model because the data doesn't fit the expected input.
This is the error I get:
ValueError: Error when checking input: expected input_2 to have 4 dimensions, but got array with shape (63000, 192, 192)
Can anyone help me solve this? Thank you!
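
One common way to get the fourth dimension is to append a channel axis of length 1 with NumPy. This is a minimal sketch with a small stand-in array; the variable name `x` is hypothetical, and the real array would have shape (63000, 192, 192).

```python
import numpy as np

# Small stand-in for the real image array
x = np.zeros((8, 192, 192), dtype="uint8")

# Append a trailing channel axis so each image becomes (192, 192, 1)
x = np.expand_dims(x, axis=-1)
print(x.shape)  # (8, 192, 192, 1)
```

The pixel values are unchanged; only the shape gains the trailing axis that the model's input layer expects.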

jpgs_to_h5

I couldn't get the jpgs_to_h5 function working.
I assume it is called like this:

da.image.imread('data/jpg/*.jpg').to_hdf5('data/data.h5','data')
AttributeError: 'module' object has no attribute 'image'

I am using python 2.7...

librosa 0.6

I cannot call lr.logamplitude().
This function does not exist in librosa 0.6.
Thanks a lot!
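
For context: `logamplitude` was removed in librosa 0.6, and `librosa.power_to_db` (for power spectrograms) and `librosa.amplitude_to_db` are its replacements. The sketch below only illustrates the core decibel conversion in plain NumPy. `power_to_db_sketch` is a hypothetical name, and it omits the dynamic-range clipping (`top_db`) that the librosa function applies by default, so it is not a drop-in replacement.

```python
import numpy as np

def power_to_db_sketch(S, ref=1.0, amin=1e-10):
    """Convert a power spectrogram to decibels: 10 * log10(S / ref).

    Values below amin are floored to avoid taking log10 of zero.
    """
    S = np.asarray(S, dtype=float)
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(max(amin, ref))
    return log_spec

spec = np.array([[1.0, 10.0],
                 [100.0, 0.0]])
print(power_to_db_sketch(spec))
```

In the notebook itself, replacing the `lr.logamplitude(...)` call with `lr.power_to_db(...)` is probably the simplest fix, assuming the input is a power spectrogram.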

accuracy is stable

Hi,

I am a student. I saw your project on GitHub and ran it. I tried to build a two-class version of the language classifier, but the accuracy was very low. I don't understand why the accuracy stays the same throughout training. Can you give me some help? Thank you!

Epoch 21/30
192/192 [==============================] - 14s 72ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 22/30
192/192 [==============================] - 14s 71ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 23/30
192/192 [==============================] - 13s 67ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 24/30
192/192 [==============================] - 14s 73ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 25/30
192/192 [==============================] - 14s 75ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 26/30
192/192 [==============================] - 15s 76ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 27/30
192/192 [==============================] - 14s 74ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 28/30
192/192 [==============================] - 14s 73ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 29/30
192/192 [==============================] - 14s 71ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 30/30
192/192 [==============================] - 14s 72ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
25/25 [==============================] - 0s 8ms/step
[9.026134490966797, 0.4399999976158142]
