
language-recognition's Introduction

Spoken Language Recognition

This notebook trains a convolutional neural network to classify audio files of voice recordings by the language being spoken. The dataset I used contains 66,000 files across 176 languages; I found it on TopCoder (https://goo.gl/G5XBJl). I liked the idea behind this problem because it is very hard for humans to do. It's interesting to see that CNNs perform well on problems where intuition doesn't get you anywhere.

I included a saved version of my pre-trained model, which reaches an accuracy of 98.79%. Further notes on development can be found in the Jupyter notebook.

language-recognition's People

Contributors

pietz

language-recognition's Issues

versions

Hey,

What was the Python version used?

I'm having issues with librosa. Any chance I could find out the versions of the other modules as well?
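Only the author can say which versions were originally used, but when reporting a problem like this it helps to list what is installed locally. A minimal sketch, assuming Python 3.8+ (for importlib.metadata); the package list is a guess at what the notebook imports, not taken from the repository:

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("numpy", "librosa", "keras", "tensorflow", "dask", "h5py")):
    """Return the Python version plus the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            info[name] = "not installed"
    return info


for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```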

Mapping Prediction index to Labels

Hi @pietz

I was able to run the model and get predictions out of it, but I could not map the prediction index to a label.
Could you kindly upload the labels file so I can map each prediction index to the correct label?
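The label order used for training isn't stated here. A common convention, and only an assumption in this sketch, is that the one-hot columns follow the alphabetically sorted language names (that is the order pandas.get_dummies and scikit-learn's LabelBinarizer produce). Under that assumption, decoding is an argmax followed by a lookup; the language names below are hypothetical:

```python
import numpy as np


def decode_predictions(probs, labels):
    """Map each row of softmax outputs to its language label.

    probs: array of shape (n_samples, n_languages), e.g. from model.predict.
    labels: language names in the same order as the training one-hot columns
    (assumed here to be sorted alphabetically).
    """
    idx = np.argmax(probs, axis=1)  # most likely class per sample
    return [labels[i] for i in idx]


# Hypothetical 3-language example
labels = sorted(["english", "german", "french"])
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
print(decode_predictions(probs, labels))
```

If the real ordering differs, only the labels list needs to change.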

Non-random data

Hey, I am trying to make the system work with a specific, non-randomized train/validation/test split. No matter what I do, the accuracy never goes above 60%. With your randomized code on the same dataset it reaches 90% accuracy, but I need to use specific datasets for train/valid/test.

Any chance you could help somehow?

The newly created list looks something like this: https://gyazo.com/7c6222c37cc8176f3af541ef310709d6

Another thing I noticed is that the IDs in the main data file's y are different, so instead of "dask.array.image.imread('data/jpg/*.jpg')" I load each file individually and combine them. That still gives me only 60% accuracy. The code seems to work: the outputs of *.jpg and of the code below match (with 2 layers ordered as needed).

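# Manually load all images into one dataset, in the order of data/list_full.csv
import os

import dask.array as da
import dask.array.image  # not always loaded by "import dask.array" alone
import numpy as np
import pandas as pd

def loadAllImages(filePath=""):
    # Delete an existing output file first (the original called a removeFile helper not shown here)
    if os.path.exists(filePath):
        os.remove(filePath)

    d_list = pd.read_csv("data/list_full.csv")

    # One 192x192 grayscale image per CSV row (the original hard-coded 18000 rows)
    d = np.zeros(shape=(len(d_list), 192, 192), dtype="uint8")
    for i, (_, row) in enumerate(d_list.iterrows()):
        # imread on a single file returns shape (1, 192, 192); keep the only frame
        di = da.image.imread('data/jpg/' + row["filename"] + '.jpg')
        d[i] = di[0].compute()

    ddask = da.from_array(d, chunks=(1, 192, 192))

    ddask.to_hdf5(filePath, 'data')
    print("Finished compiling data", ddask, ddask.shape, ddask.size, type(ddask))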

Instead of randomly picking the train/valid/test indices, I changed the code to the following, which in theory should represent the split exactly. It matches the .csv list of files:

    # data_size, tr_size and va_size are per-language file counts and out_dim is
    # the number of languages; all are defined earlier in the notebook
    full_idx = np.arange(data_size * out_dim)

    tr_end = tr_size * out_dim
    va_end = tr_end + va_size * out_dim
    tr_idx = full_idx[:tr_end]
    va_idx = full_idx[tr_end:va_end]
    te_idx = full_idx[va_end:]

I would love you forever if you could help me a bit more with this one, as this is an amazing system that I would love to get working properly.
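One thing worth checking: if data/list_full.csv is grouped by language rather than by split, slicing one index range into contiguous blocks sends whole languages to the test set that never appear in training. Whether that happens here depends on how the CSV is ordered, which the issue doesn't show. A minimal sketch of a deterministic split that still gives every language the same number of training and validation files, assuming a hypothetical labels sequence with one language per file in CSV order:

```python
import numpy as np


def per_language_split(labels, tr_size, va_size):
    """Non-random split: per language, the first tr_size files go to train,
    the next va_size to validation, and the rest to test.

    labels: 1-D sequence with one language label per file (hypothetical input).
    """
    labels = np.asarray(labels)
    tr_idx, va_idx, te_idx = [], [], []
    for lang in np.unique(labels):
        # Indices of this language's files, kept in their original file order
        idx = np.flatnonzero(labels == lang)
        tr_idx.extend(idx[:tr_size])
        va_idx.extend(idx[tr_size:tr_size + va_size])
        te_idx.extend(idx[tr_size + va_size:])
    return (np.array(tr_idx, dtype=int),
            np.array(va_idx, dtype=int),
            np.array(te_idx, dtype=int))


# Hypothetical example: two languages with 5 files each
example = ["en"] * 5 + ["de"] * 5
tr, va, te = per_language_split(example, tr_size=3, va_size=1)
print(tr, va, te)
```

The split is still fixed and reproducible; it only guarantees that no language is missing from training.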

Converted image is not similar to the sample, and accuracy is under 0.6%

I converted the mp3 files to images, and the JPG I got looks like this:
[image: 0a0p10uya0h mp3, my conversion]
But the sample looks like this:
[image: 0a0p10uya0h mp3, sample]
They are totally different.

The accuracy from predict and evaluate is under 0.7% (around 1/176, i.e. chance level for 176 languages), both with the model I trained and with the one pietz trained.
The accuracy during fit is just as low.

How can I solve this?
Thanks, and sorry for my poor English.

I'm a bit of a beginner and couldn't solve the problem

(52800, 192, 192, 1) (52800, 176)
(4576, 192, 192, 1) (4576, 176)
(8800, 192, 192, 1) (8800, 176)

Your arrays have these shapes. When I try a different mp3 dataset with 2 languages, they look like this:

(384, 192, 192) (384, 2)
(32, 192, 192) (32, 2)
(64, 192, 192) (64, 2)

Running model.evaluate(x_te, y_te, batch_size=batch_size) gives this output:

2/2 [================================] - 1s 380ms/step - loss: 0.0737 - accuracy: 0.9844
[0.07371428608894348, 0.984375]

Shouldn't the progress bar show 64/64?
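In recent Keras versions the evaluate progress bar counts batches (steps), not samples, which the "/step" in the output hints at. With 64 test samples, 2/2 is what a batch size of 32 would produce; that batch size is inferred from the output, not stated in the issue. The reported accuracy also converts back to a whole number of samples:

```python
import math

n_test = 64       # x_te.shape[0] in the issue
batch_size = 32   # assumption: inferred from the "2/2" progress bar

# Number of batches evaluate() runs over the test set
steps = math.ceil(n_test / batch_size)

# Accuracy returned by evaluate(), converted back to a count of samples
accuracy = 0.984375
correct = round(accuracy * n_test)

print(f"{steps} steps, {correct}/{n_test} correct")
```

So all 64 samples were evaluated, and 63 of them were classified correctly.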

Python version

Which Python version was used? I'm trying to get it working on 3.6, but I'm struggling.

Missing channel dimension

After converting the wav files to JPGs, I checked the shape of my array: each image is (192, 192) instead of (192, 192, 1), so I can't train the model because the data doesn't fit its input.
This is what I got:
ValueError: Error when checking input: expected input_2 to have 4 dimensions, but got array with shape (63000, 192, 192)
Can anyone help me solve this? Thank you!
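The error says the model expects 4-D input (samples, height, width, channels), while grayscale JPGs load as 2-D images, so the stacked array is only 3-D. Adding a trailing channel axis of size 1 should make it fit. A minimal sketch with a stand-in array; the same indexing should also work on a dask array loaded from the HDF5 file:

```python
import numpy as np

# Stand-in for the converted images: (n_samples, 192, 192) grayscale
x = np.zeros((8, 192, 192), dtype=np.uint8)

# Add a trailing channel axis so each sample is (192, 192, 1)
x = x[..., np.newaxis]  # equivalent to np.expand_dims(x, axis=-1)
print(x.shape)
```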

jpgs_to_h5

I couldn't get the jpgs_to_h5 function working.
I assume it is called like this:

da.image.imread('data/jpg/*.jpg').to_hdf5('data/data.h5','data')
AttributeError: 'module' object has no attribute 'image'

I am using Python 2.7.
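One likely cause of this AttributeError is that dask.array.image is a submodule which, at least in some dask versions, is not loaded by "import dask.array" on its own, so it has to be imported explicitly. A minimal sketch of such a conversion, written for Python 3 and a current dask (Python 2.7 is not supported by recent dask releases); the function body is an illustration, not the repository's implementation, and reading JPGs this way also needs scikit-image and h5py installed:

```python
import dask.array as da
import dask.array.image  # noqa: F401  (makes da.image available)


def jpgs_to_h5(pattern="data/jpg/*.jpg", out_path="data/data.h5"):
    """Stack every JPG matching pattern into one array and write it to HDF5."""
    # Lazily reads the matching files, one image per file along axis 0
    images = da.image.imread(pattern)
    images.to_hdf5(out_path, "data")
```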

librosa 0.6

Cannot call lr.logamplitude()
This function does not exist in librosa 0.6
Thanks a lot!

accuracy is stable

Hi,

I am a student. I saw your project on GitHub and ran it. I tried to make a two-language version of the classifier, but the accuracy was very low. I don't understand why the training accuracy stays exactly the same from epoch to epoch. Can you give me some help? Thank you!

Epoch 21/30
192/192 [==============================] - 14s 72ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 22/30
192/192 [==============================] - 14s 71ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 23/30
192/192 [==============================] - 13s 67ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 24/30
192/192 [==============================] - 14s 73ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 25/30
192/192 [==============================] - 14s 75ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 26/30
192/192 [==============================] - 15s 76ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 27/30
192/192 [==============================] - 14s 74ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 28/30
192/192 [==============================] - 14s 73ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 29/30
192/192 [==============================] - 14s 71ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
Epoch 30/30
192/192 [==============================] - 14s 72ms/step - loss: 8.6467 - acc: 0.4635 - val_loss: 6.0443 - val_acc: 0.6250
25/25 [==============================] - 0s 8ms/step
[9.026134490966797, 0.4399999976158142]
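The logged numbers fit a simple explanation: the network appears to output hard 0/1 probabilities. Keras clips predicted probabilities to [eps, 1 - eps] with a default eps of 1e-7, so a confidently wrong sample costs -ln(1e-7), about 16.12, and a confidently right one costs about 0; the loss is then roughly (1 - accuracy) x 16.12. The sketch below checks this against the log, where 89/192 is the training accuracy 0.4635 expressed as a count of the 192 training samples (an inference from the progress bar):

```python
import math

EPS = 1e-7  # Keras' default epsilon for clipping predicted probabilities


def saturated_loss(accuracy):
    """Cross-entropy when every prediction is a hard 0/1 clipped to EPS:
    wrong samples cost -ln(EPS), correct ones cost -ln(1 - EPS), about 0."""
    return (1 - accuracy) * -math.log(EPS) + accuracy * -math.log(1 - EPS)


# (accuracy, loss) pairs from the training, validation and test log above
for acc, logged in [(89 / 192, 8.6467), (0.625, 6.0443), (0.44, 9.026134)]:
    print(f"acc={acc:.4f}  predicted loss={saturated_loss(acc):.4f}  logged={logged}")
```

All three pairs agree, which suggests the outputs are saturated and the gradients no longer move the weights. Frequently cited causes are a learning rate that is too high or unscaled 0-255 pixel inputs, but without the training code that remains a guess.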
