oscarknagg / voicemap

Identifying people from small audio fragments

Python 100.00%

machine-learning speaker-identification speaker-recognition convolutional-neural-networks

voicemap's Introduction

voicemap

This repository contains code to build deep learning models that identify different speakers from audio samples containing their voice.

The eventual aim is for this repository to become a pip-installable Python package for quickly and easily performing tasks related to speaker identification.

This TensorFlow/Keras/Python 2.7 branch is discontinued. Work continues on the pytorch-python-3.6 branch, which will become the master branch.

Instructions

Requirements

Create a new virtualenv and install the requirements from requirements.txt with the following command:

pip install -r requirements.txt

This project was written in Python 2.7.12, so I cannot guarantee that it works on any other version.
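Because only Python 2.7.12 is known to work, a script could warn early when it is started under a different interpreter. This is a minimal sketch, not part of the repository: check_python_version and TESTED_VERSION are hypothetical names, and the check deliberately warns instead of exiting.

```python
import sys
import warnings

# Hypothetical constant: the README only says the code was written for Python 2.7.12.
TESTED_VERSION = (2, 7)


def check_python_version(version_info=sys.version_info):
    """Return True on Python 2.7; otherwise emit a warning and return False."""
    major_minor = tuple(version_info[:2])
    if major_minor != TESTED_VERSION:
        warnings.warn(
            "voicemap was written for Python 2.7.12; Python %d.%d is untested"
            % major_minor
        )
        return False
    return True
```

Calling check_python_version() at the top of an experiment script surfaces a version mismatch without blocking execution. The string formatting avoids f-strings so the helper also runs under Python 2.7.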

Data

Download the following training data archives from http://www.openslr.org/12:

train-clean-100.tar.gz
train-clean-360.tar.gz
dev-clean.tar.gz

Extract the downloaded archives into the data/ folder so that the file structure is as follows:

data/
    LibriSpeech/
        dev-clean/
        train-clean-100/
        train-clean-360/
        SPEAKERS.TXT

Please use the SPEAKERS.TXT supplied in the repo as I've made a few corrections to the one found at openslr.org.
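Before running anything, it can help to confirm that the data/ folder matches the tree above. The sketch below is a hypothetical helper (missing_librispeech_entries is not part of voicemap); it only checks that each expected path exists, not that the audio files inside are complete.

```python
import os

# Expected entries, copied from the directory tree in this README.
EXPECTED_ENTRIES = [
    "LibriSpeech",
    os.path.join("LibriSpeech", "dev-clean"),
    os.path.join("LibriSpeech", "train-clean-100"),
    os.path.join("LibriSpeech", "train-clean-360"),
    os.path.join("LibriSpeech", "SPEAKERS.TXT"),
]


def missing_librispeech_entries(data_root="data"):
    """Return the expected paths under data_root that do not exist."""
    return [entry for entry in EXPECTED_ENTRIES
            if not os.path.exists(os.path.join(data_root, entry))]


if __name__ == "__main__":
    missing = missing_librispeech_entries("data")
    if missing:
        print("Missing under data/: " + ", ".join(missing))
    else:
        print("LibriSpeech layout looks complete.")
```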

Run tests

This requires the LibriSpeech data.

python -m unittest tests.tests

Contents

voicemap

This package contains re-usable code for defining network architectures, interacting with datasets and many utility functions.

experiments

This package contains experiments in the form of Python scripts.

notebooks

This folder contains Jupyter notebooks used for interactive visualisation and analysis.


voicemap's Issues

Should include empty folders 'logs' and 'models' in repository

The first two runs of experiments/librispeech_classifier.py fail with the following errors:
No such file or directory: '/repos/voicemap/models/baseline_classifier_stochastic=True_r=1.torch'
and
No such file or directory: '/repos/voicemap/logs/baseline_classifier_stochastic=True_r=1.torch'
As things stand, users have to create these folders manually.

Maybe it would be better to include them in repository?
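Git does not track empty directories, so shipping logs/ and models/ in the repository would usually mean committing a placeholder file (for example a .gitkeep) inside each. Alternatively, the script could create them at start-up. The sketch below assumes the latter; ensure_output_dirs is a hypothetical helper, not existing voicemap code, and it uses exist_ok, so it targets Python 3 (the pytorch branch that writes the .torch files).

```python
import os

# The two folders the issue reports as missing on a fresh clone.
REQUIRED_DIRS = ("models", "logs")


def ensure_output_dirs(repo_root="."):
    """Create models/ and logs/ under repo_root if needed; return the paths created."""
    created = []
    for name in REQUIRED_DIRS:
        path = os.path.join(repo_root, name)
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling ensure_output_dirs() before the first checkpoint or log file is written would avoid the "No such file or directory" errors on the first runs.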

Error on the 15th cell of Embedding_Space_Visualisation.ipynb

RuntimeError Traceback (most recent call last)
in ()
3 for i in n_random_speakers:
4 ids = valid.df[valid.df['speaker_id']==i]['id'].sample(m_samples, replace=True).values
----> 5 Z = [train[i] for i in ids]
6 X_ = np.stack(zip(*Z)[0])[:, :, np.newaxis]
7 y_ = np.stack(zip(*Z)[1])[:, np.newaxis]

/Users/andrew/voicemap/voicemap/librispeech.py in __getitem__(self, index)
104
105 def __getitem__(self, index):
--> 106 instance, samplerate = sf.read(self.datasetid_to_filepath[index])
107 # Choose a random sample of the file
108 if self.stochastic:

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in read(file, frames, start, stop, dtype, always_2d, fill_value, out, samplerate, channels, format, subtype, endian, closefd)
255 """
256 with SoundFile(file, 'r', samplerate, channels,
--> 257 subtype, endian, format, closefd) as f:
258 frames = f._prepare_read(start, stop, frames)
259 data = f.read(frames, dtype, always_2d, fill_value, out)

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in __init__(self, file, mode, samplerate, channels, subtype, endian, format, closefd)
627 self._info = _create_info_struct(file, mode, samplerate, channels,
628 format, subtype, endian)
--> 629 self._file = self._open(file, mode_int, closefd)
630 if set(mode).issuperset('r+') and self.seekable():
631 # Move write position to 0 (like in Python file objects)

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in _open(self, file, mode_int, closefd)
1182 raise TypeError("Invalid file: {0!r}".format(self.name))
1183 _error_check(_snd.sf_error(file_ptr),
-> 1184 "Error opening {0!r}: ".format(self.name))
1185 if mode_int == _snd.SFM_WRITE:
1186 # Due to a bug in libsndfile version <= 1.0.25, frames != 0

/Users/andrew/.virtualenvs/voicemap/lib/python2.7/site-packages/soundfile.pyc in _error_check(err, prefix)
1355 if err != 0:
1356 err_str = _snd.sf_error_number(err)
-> 1357 raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
1358
1359

RuntimeError: Error opening 'train-clean-100': System error.
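The last line shows soundfile being asked to open 'train-clean-100', which is the name of a dataset folder rather than an audio file. One way to narrow this down (a diagnostic, not a confirmed fix) is to list the entries of the id-to-path mapping that do not point at existing files. find_unreadable_paths is a hypothetical helper, and the sketch assumes datasetid_to_filepath, seen in the traceback, is a dict. Relative paths resolve against the notebook's current working directory, so printing os.getcwd() alongside the result can also help.

```python
import os


def find_unreadable_paths(id_to_path, limit=10):
    """Return up to `limit` (id, path) pairs whose path is not an existing file.

    Directories (such as a bare 'train-clean-100') and missing paths are both
    reported, since soundfile can only open regular audio files.
    """
    bad = []
    for dataset_id, path in id_to_path.items():
        if not os.path.isfile(path):
            bad.append((dataset_id, path))
            if len(bad) >= limit:
                break
    return bad
```

In the notebook, something like find_unreadable_paths(train.datasetid_to_filepath) would show the first few offending entries, if any.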

Active?

Are you still interested in porting this to Python 3? I tried running the setup for the pytorch branch, and it appears that a few of the requirements have been updated. If you're interested and this project is still active, I could help with the port and packaging.

ValueError: bad marshal data (unknown type code)

I converted the Jupyter notebook "Embedding_Space_Visualisation.ipynb" to a Python script and ran it in a Python 3 environment with CUDA 10.1 on my laptop.

I am getting ValueError: bad marshal data (unknown type code):

Traceback (most recent call last):
  File "Embedding_Space_Visualisation.py", line 38, in <module>
    siamese = load_model(model_path)
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 584, in load_model
    model = _deserialize_model(h5dict, custom_objects, compile)
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 274, in _deserialize_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/saving.py", line 627, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/home/user/.local/lib/python3.6/site-packages/keras/layers/__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 147, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1056, in from_config
    process_layer(layer_data)
  File "/home/user/.local/lib/python3.6/site-packages/keras/engine/network.py", line 1042, in process_layer
    custom_objects=custom_objects)
  File "/home/user/.local/lib/python3.6/site-packages/keras/layers/__init__.py", line 168, in deserialize
    printable_module_name='layer')
  File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 147, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/user/.local/lib/python3.6/site-packages/keras/layers/core.py", line 764, in from_config
    function = func_load(config['function'], globs=globs)
  File "/home/user/.local/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 237, in func_load
    code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)

Can you please help me with the error?
Any help would be appreciated.
Thank you.
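The traceback ends in Keras's func_load calling marshal.loads, which Keras uses to restore the Python function stored in a Lambda layer (the call comes from keras/layers/core.py). marshal's format for code objects is tied to the Python version that wrote it, so a model saved under this branch's Python 2.7 is likely to fail this way when loaded under Python 3.6. The standard-library sketch below imitates that round trip; it is not Keras's actual code, dump_function and load_function are hypothetical names, and it only works when both steps run on the same interpreter version.

```python
import base64
import marshal
import types


def dump_function(func):
    """Serialise a function's bytecode to ASCII text via marshal + base64."""
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def load_function(text, globs=None):
    """Rebuild a function from dump_function output.

    marshal.loads can fail (e.g. "bad marshal data") if the bytes were written
    by a different Python version than the one reading them.
    """
    code = marshal.loads(base64.b64decode(text.encode("ascii")))
    return types.FunctionType(code, globs if globs is not None else {})


double = lambda x: x * 2
payload = dump_function(double)
restored = load_function(payload)
print(restored(21))  # prints 42
```

If version mismatch is the cause, commonly suggested workarounds are to load the model in a Python 2.7 environment matching this branch, or to rebuild the architecture in code under Python 3 and load only the weights with model.load_weights.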

What is this paper?

What is this paper? Could you describe the procedure used in this experiment?

Improvement to 97% accuracy

Hello, I don't know if this repo is still active, but in case it helps: I used this repo for my project and found a way to improve the loss and accuracy simply by L2-normalizing the output of the encoder.
With this change I get 97% accuracy and a validation BCE loss of 0.02 after training for 25 epochs on a mix of the LibriSpeech and CommonVoice (French) datasets. The setup: 360 speakers in the training set and 150 in the validation set, 200 pairs per speaker (100 same-speaker and 100 different-speaker), a batch size of 32 (16 same-speaker and 16 different-speaker pairs), and an embedding dimension of 64.
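The issue does not include code, so the following is only a sketch of the normalization step, assuming the encoder output is a (batch, embedding_dim) NumPy array; l2_normalize is a hypothetical name. In the PyTorch branch, torch.nn.functional.normalize(x, p=2, dim=1) performs the same row-wise scaling.

```python
import numpy as np


def l2_normalize(embeddings, eps=1e-12):
    """Scale each row of a (batch, embedding_dim) array to unit Euclidean length.

    eps guards against division by zero for an all-zero embedding.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


# Dimensions from the issue: a batch of 32 embeddings of size 64.
rng = np.random.default_rng(0)
z = l2_normalize(rng.normal(size=(32, 64)))
assert np.allclose(np.linalg.norm(z, axis=1), 1.0)

# For unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity,
# so Euclidean distances between normalized embeddings always lie in [0, 2].
a, b = z[0], z[1]
assert np.isclose(np.sum((a - b) ** 2), 2 - 2 * np.dot(a, b))
```

One plausible reason this helps a siamese network is that bounding the embedding scale leaves the pairwise distance dependent only on direction, though the issue does not establish why the improvement occurs.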

Please provide a pretrained model of some sort

Downloading such a large dataset is not feasible for those with limited bandwidth. It would be great if a pretrained model were available.
