voicemap's Introduction

voicemap

This repository contains code to build deep learning models that identify different speakers from audio samples containing their voice.

The eventual aim is for this repository to become a pip-installable Python package for performing speaker identification tasks quickly and easily.

This tensorflow/Keras/python2.7 branch is discontinued. Work is continuing on the pytorch-python-3.6 branch which will become the master branch.

Instructions

Requirements

Make a new virtualenv and install requirements from requirements.txt with the following command.

pip install -r requirements.txt

This project was written in Python 2.7.12 so I cannot guarantee it works on any other version.

Data

Get training data here: http://www.openslr.org/12

  • train-clean-100.tar.gz
  • train-clean-360.tar.gz
  • dev-clean.tar.gz

Place the unzipped training data into the data/ folder so the file structure is as follows:

data/
    LibriSpeech/
        dev-clean/
        train-clean-100/
        train-clean-360/
        SPEAKERS.TXT

Please use the SPEAKERS.TXT supplied in the repo as I've made a few corrections to the one found at openslr.org.
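
Before running anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the voicemap package: the function name `missing_librispeech_entries` and the default `data_dir` argument are hypothetical, and it only checks that the four entries listed above exist.

```python
import os

# Expected entries under data/LibriSpeech/, as listed in the layout above.
EXPECTED_ENTRIES = ["dev-clean", "train-clean-100", "train-clean-360", "SPEAKERS.TXT"]


def missing_librispeech_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir/LibriSpeech."""
    root = os.path.join(data_dir, "LibriSpeech")
    return [name for name in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_librispeech_entries()
    if missing:
        print("Missing from data/LibriSpeech: " + ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; an empty result means all three data folders and SPEAKERS.TXT were found.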

Run tests

This requires the LibriSpeech data.

python -m unittest tests.tests

Contents

voicemap

This package contains re-usable code for defining network architectures, interacting with datasets and many utility functions.

experiments

This package contains experiments in the form of python scripts.

notebooks

This folder contains Jupyter notebooks used for interactive visualisation and analysis.

voicemap's People

Contributors

oscarknagg

voicemap's Issues

Should include empty folders 'logs' and 'models' in repository

The first two runs of experiments/librispeech_classifier.py fail with these errors:
No such file or directory: '/repos/voicemap/models/baseline_classifier_stochastic=True_r=1.torch'
and
No such file or directory: '/repos/voicemap/logs/baseline_classifier_stochastic=True_r=1.torch'
As things stand, the user has to create these folders manually.

Maybe it would be better to include them in repository?
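
Until the folders are part of the repository, one possible workaround is to create them before the experiment writes to them. This is a minimal sketch under assumptions: `ensure_dirs` and its arguments are hypothetical names, not part of voicemap, and the folder names are taken from the two error messages above.

```python
import os


def ensure_dirs(project_root, names=("logs", "models")):
    """Create each named folder under project_root if it does not already exist."""
    created = []
    for name in names:
        path = os.path.join(project_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

Calling `ensure_dirs` with the repository root at the top of an experiment script would make the first run succeed; the explicit `isdir` check keeps the sketch usable on both Python 2.7 and Python 3.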

Error on the 15th cell of Embedding_Space_Visualisation.ipynb


RuntimeError Traceback (most recent call last)
in <module>()
3 for i in n_random_speakers:
4 ids = valid.df[valid.df['speaker_id']==i]['id'].sample(m_samples, replace=True).values
----> 5 Z = [train[i] for i in ids]
6 X_ = np.stack(zip(*Z)[0])[:, :, np.newaxis]
7 y_ = np.stack(zip(*Z)[1])[:, np.newaxis]

/Users/andrew/voicemap/voicemap/librispeech.py in __getitem__(self, index)
104
105 def __getitem__(self, index):
--> 106 instance, samplerate = sf.read(self.datasetid_to_filepath[index])
107 # Choose a random sample of the file
108 if self.stochastic:

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in read(file, frames, start, stop, dtype, always_2d, fill_value, out, samplerate, channels, format, subtype, endian, closefd)
255 """
256 with SoundFile(file, 'r', samplerate, channels,
--> 257 subtype, endian, format, closefd) as f:
258 frames = f._prepare_read(start, stop, frames)
259 data = f.read(frames, dtype, always_2d, fill_value, out)

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
627 self._info = _create_info_struct(file, mode, samplerate, channels,
628 format, subtype, endian)
--> 629 self._file = self._open(file, mode_int, closefd)
630 if set(mode).issuperset('r+') and self.seekable():
631 # Move write position to 0 (like in Python file objects)

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in _open(self, file, mode_int, closefd)
1182 raise TypeError("Invalid file: {0!r}".format(self.name))
1183 _error_check(_snd.sf_error(file_ptr),
-> 1184 "Error opening {0!r}: ".format(self.name))
1185 if mode_int == _snd.SFM_WRITE:
1186 # Due to a bug in libsndfile version <= 1.0.25, frames != 0

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in _error_check(err, prefix)
1355 if err != 0:
1356 err_str = _snd.sf_error_number(err)
-> 1357 raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
1358
1359

RuntimeError: Error opening 'train-clean-100': System error.

Active?

Are you still interested in porting this to Python 3? I tried running the setup for the pytorch branch, and it appears few of the requirements have been updated. If you're interested and this project is still active, I could help with the port and packaging.

ValueError: bad marshal data (unknown type code)

I converted the Jupyter notebook "Embedding_Space_Visualisation.ipynb" to a Python script and ran it in a Python 3 environment with CUDA 10.1 installed on my laptop.

I am getting a ValueError: bad marshal data (unknown type code)

Traceback (most recent call last):
File "Embedding_Space_Visualisation.py", line 38, in <module>
siamese = load_model(model_path)
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 492, in load_wrapper
return load_function(*args, **kwargs)
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 584, in load_model
model = _deserialize_model(h5dict, custom_objects, compile)
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 274, in _deserialize_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 627, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/user/.local/lib/python3.6/site-packages/keras/layers/__init__.py", line 168, in deserialize
printable_module_name='layer')
File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 147, in deserialize_keras_object
list(custom_objects.items())))
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1056, in from_config
process_layer(layer_data)
File "/home/user/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1042, in process_layer
custom_objects=custom_objects)
File "/home/user/.local/lib/python3.6/site-packages/keras/layers/__init__.py", line 168, in deserialize
printable_module_name='layer')
File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 147, in deserialize_keras_object
list(custom_objects.items())))
File "/home/user/.local/lib/python3.6/site-packages/keras/layers/core.py", line 764, in from_config
function = func_load(config['function'], globs=globs)
File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 237, in func_load
code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

Can you please help me with this error?
Any help would be appreciated.
Thank you.
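
Some context on where this error usually comes from: Keras stores the Python function of a `Lambda` layer as marshalled bytecode (the `func_load` and `marshal.loads` frames in the traceback above), and the marshal format depends on the Python version that wrote it. A model saved from the Python 2.7 branch would therefore likely fail to load this way under Python 3.6. The sketch below only illustrates the mechanism using the standard library. `scale` is a hypothetical stand-in for a Lambda function, and it is an assumption that the model in question was saved under Python 2.7.

```python
import marshal
import sys
import types


def scale(x):
    # Hypothetical stand-in for the function wrapped by a Keras Lambda layer.
    return x * 2


# Serialise the function's bytecode; the bytes are tied to this interpreter version.
raw_code = marshal.dumps(scale.__code__)

# Loading works here only because the same Python version wrote the bytes.
code = marshal.loads(raw_code)
restored = types.FunctionType(code, globals(), "scale")

print(sys.version_info[:2], restored(21))
```

If that is the cause, two commonly used ways around it are to load and re-save the model under the Python version that created it, or to rebuild the architecture in code under Python 3 and load only the weights.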

Improvement to 97% accuracy

Hello, I don't know if this repo is still active, but in case it helps: I used it for my project and found a way to improve the loss and accuracy just by L2-normalizing the output of the encoder.
With it, my score is currently 97% accuracy and 0.02 validation BCE loss after training for 25 epochs on a mix of the LibriSpeech and CommonVoice (fr) datasets: 360 speakers in the training set and 150 in the validation set, 200 pairs per speaker (100 same, 100 different), batch size 32 (16 same, 16 different), and embedding dimension 64.
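
For readers unfamiliar with the step, L2-normalizing means dividing each embedding by its Euclidean norm so that every vector has unit length. The following is a minimal pure-Python sketch of the operation only: `l2_normalize` and `eps` are illustrative names, and in the actual model this would presumably be applied to the encoder output with the framework's own normalization op.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length; eps guards against division by zero."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


embedding = [3.0, 4.0]
unit = l2_normalize(embedding)
print(unit)
```

After normalization, the distance between two embeddings depends only on the angle between them, not on their magnitudes.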
