jymsuper / SpeakerRecognition_tutorial
Simple d-vector based speaker recognition (verification and identification) using PyTorch
License: MIT License
Sir, I am unable to convert a single .wav file to a .p file. Could you please help?
Hi @jymsuper ,
Thanks for sharing this excellent code.
I have gone through the identification.py and verification.py files to calculate the performance after the enroll step.
Could you give me some ideas about how to calculate the EER?
Many thanks
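For reference, one common way to estimate the EER from verification trials is to sweep a decision threshold over the similarity scores (e.g. the cosine scores between enrolled and test d-vectors) and find the point where the false-acceptance and false-rejection rates cross. A minimal NumPy sketch, with illustrative score lists:

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Equal Error Rate via a threshold sweep: for each candidate
    threshold, compute the false-acceptance rate (impostors scoring
    at or above it) and the false-rejection rate (genuine trials
    scoring below it), and return the rate at the closest crossing."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    eer, best_gap = 1.0, np.inf
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

# toy example: one overlapping impostor score gives a 25% EER
print(compute_eer([0.9, 0.8, 0.85, 0.7], [0.1, 0.2, 0.3, 0.75]))  # 0.25
```

With many trials, interpolating between thresholds (or using a DET-curve library) gives a smoother estimate, but the crossing-point idea is the same.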
please provide a file
@jymsuper Can verification (as opposed to identification) be done on an open set, i.e. with test speakers that are not in the training dataset? If so, I would like to know the performance.
SpeakerRecognition_tutorial/enroll.py
Line 73 in 6dce646
however, there is no averaging operation here; the embeddings are only aggregated
SpeakerRecognition_tutorial/enroll.py
Line 90 in 6dce646
is that true?
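For context, enrolment pipelines that average rather than merely collect the per-utterance embeddings typically do something like the following sketch (variable names are illustrative, not the repo's):

```python
import numpy as np

def average_embeddings(utt_embeddings):
    """Collapse a speaker's per-utterance d-vectors into one
    enrolment embedding: element-wise mean, then length
    normalisation so cosine scoring is well behaved."""
    emb = np.mean(np.stack(utt_embeddings), axis=0)
    return emb / np.linalg.norm(emb)

e = average_embeddings([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
print(e)  # unit vector along [1, 1], i.e. ~[0.7071, 0.7071]
```

If enroll.py only stacks the embeddings, adding a mean like this (or averaging at scoring time) is the usual fix.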
When a new speaker comes in and I test speaker verification, the output is wrong.
python3 SpeakerRecognition_tutorial/identification.py
Traceback (most recent call last):
  File "SpeakerRecognition_tutorial/identification.py", line 10, in <module>
    from DB_wav_reader import read_feats_structure
  File "/content/SpeakerRecognition_tutorial/DB_wav_reader.py", line 11, in <module>
    np.set_printoptions(threshold=np.nan)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/arrayprint.py", line 259, in set_printoptions
    floatmode, legacy)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/arrayprint.py", line 95, in _make_options_dict
    raise ValueError("threshold must be non-NAN, try "
ValueError: threshold must be non-NAN, try sys.maxsize for untruncated representation
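The cause is that newer NumPy releases no longer accept np.nan as a print threshold; a finite integer such as sys.maxsize restores the old "print everything" behaviour. Replacing the offending line in DB_wav_reader.py with the following should fix it:

```python
import sys
import numpy as np

# np.set_printoptions(threshold=np.nan) was valid in old NumPy but
# now raises ValueError: the threshold must be a finite integer.
# sys.maxsize reproduces the original untruncated-printing intent.
np.set_printoptions(threshold=sys.maxsize)
```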
Hello sir, how can I denoise the audio before feature extraction?
Sorry, my English is bad.
class TruncatedInputfromMFB(object):
    """
    input size : (n_frames, dim=40)
    output size : (1, n_win=40, dim=40) => one context window is chosen randomly
    """
    def __init__(self, input_per_file=1):
        super(TruncatedInputfromMFB, self).__init__()
        self.input_per_file = input_per_file

    def __call__(self, frames_features):
        network_inputs = []
        num_frames = len(frames_features)
        win_size = c.NUM_WIN_SIZE
        half_win_size = int(win_size / 2)
        # if num_frames - half_win_size < half_win_size:
        while num_frames - half_win_size <= half_win_size:
            frames_features = np.append(frames_features, frames_features[:num_frames, :], axis=0)
            num_frames = len(frames_features)
        for i in range(self.input_per_file):
            j = random.randrange(half_win_size, num_frames - half_win_size)
            if not j:
                frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')
                frames_slice[0:(frames_features.shape)[0]] = frames_features.shape
            else:
                frames_slice = frames_features[j - half_win_size:j + half_win_size]
            network_inputs.append(frames_slice)
        return np.array(network_inputs)
frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')
Is this code wrong? Should it be:
frames_slice = np.zeros((num_frames, c.FILTER_BANK), 'float64')
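For comparison: np.zeros takes the shape as a single tuple; passing the two dimensions as separate positional arguments makes NumPy treat the second one as the dtype, which raises a TypeError. A minimal check (FILTER_BANK = 40 is an assumption matching the class docstring above):

```python
import numpy as np

FILTER_BANK = 40   # assumed value of c.FILTER_BANK
num_frames = 100

# Correct form: shape is one tuple, dtype is the second argument.
# np.zeros(num_frames, FILTER_BANK, 'float64') would fail, because
# np.zeros's second positional parameter is the dtype, not a size.
frames_slice = np.zeros((num_frames, FILTER_BANK), 'float64')
print(frames_slice.shape)  # (100, 40)
```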
What do the TRAIN_WAV_DIR and DEV_WAV_DIR parts of the code below mean?
TRAIN_WAV_DIR = '/home/admin/Desktop/read_25h_2/train'
DEV_WAV_DIR = '/home/admin/Desktop/read_25h_2/dev'
Before training, I modified SR_Dataset.py line 206 to
train_DB = read_DB_structure(c.TRAIN_WAV_DIR)
and deleted line 20 in DB_wav_reader.py, following issue #2, but when I run train.py an error occurs.
Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 92, in main
    train_dataset, valid_dataset, n_classes = load_dataset(val_ratio)
  File "train.py", line 22, in load_dataset
    train_DB, valid_DB = split_train_dev(c.TRAIN_WAV_DIR, val_ratio)
  File "train.py", line 65, in split_train_dev
    (train_len / total_len) * 100))
ZeroDivisionError: division by zero
I don't know how to fix it. Can you give me some guidance on how to prepare the dataset? I am using a different dataset.
Thank you. @jymsuper
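The ZeroDivisionError means split_train_dev found zero feature files, so the total count it divides by is 0. A small sanity-check sketch, assuming the layout the loaders expect (one sub-directory per speaker containing pickled *.p feature files; the path is the one from the report above):

```python
from pathlib import Path

def count_feature_files(feat_dir):
    """Count pickled feature files under feat_dir, assuming the
    layout <feat_dir>/<speaker_id>/<utterance>.p that the dataset
    reader walks. Zero here is exactly what produces the
    ZeroDivisionError in split_train_dev."""
    return len(list(Path(feat_dir).glob('*/*.p')))

n = count_feature_files('/home/admin/Desktop/read_25h_2/train')
print(n, 'feature files found')
```

If the count is zero, either the feature-extraction step has not been run on the new dataset, or TRAIN_WAV_DIR in configure.py points at raw .wav files rather than the extracted .p features.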
Hello, I'm interested in speaker recognition and am working on a project using this repository.
I'm starting with almost no background in this area, so even when an error message appears I don't understand what it means. I ran train -> enroll -> identification -> verification in the order described, but in the final result, when checking whether the voices match, the reported speaker name is "test" rather than the actual speaker's name. I would like to know the cause.
Also, after epoch 1 in train.py, I get the following warning and don't understand why:
(The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.)
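Regarding the scheduler warning: recent PyTorch deprecates passing the epoch to scheduler.step(); the fix is to call it with no argument, once per epoch, after optimizer.step(). A minimal sketch with a dummy model (the model, learning rate, and schedule here are placeholders, not the repo's settings):

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)                      # stand-in for the real network
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for epoch in range(3):
    optimizer.step()     # the epoch's training step(s) go here
    scheduler.step()     # no epoch argument: this is the new form

print(optimizer.param_groups[0]['lr'])  # 0.1 * 0.5**3 = 0.0125
```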
What modifications do I need to make in order to add a new speaker using the enroll.py file? @jymsuper
Hello,
thanks for this great tutorial!
I'm not able to reproduce the feature extraction step; can you please point me in the right direction?
Currently I'm using logfbank from the python_speech_features library, with sr=16000, n_filters=40.
Many thanks!
I got the following error when training with my own .p files:
Training set 21600 utts (90.0%)
Validation set 2400 utts (10.0%)
Total 24000 utts
Number of classes (speakers):
240
<torch.utils.data.dataloader.DataLoader object at 0x00000218F4F8A710>
Train Epoch: 1 [ 0/ 21600 ( 0%)] Time 3.002 (3.002) Loss 5.5635 Acc 0.0000
Train Epoch: 1 [ 5376/ 21600 ( 25%)] Time 0.095 (0.113) Loss 5.2005 Acc 1.3910
Train Epoch: 1 [ 10752/ 21600 ( 50%)] Time 0.032 (0.098) Loss 4.7337 Acc 2.0177
Traceback (most recent call last):
  File "D:\Python\train.py", line 290, in <module>
    main()
  File "D:\Python\train.py", line 120, in main
    train_loss = train(train_loader, model, criterion, optimizer, use_cuda, epoch, n_classes)
  File "D:\Python\train.py", line 157, in train
    for batch_idx, (data) in enumerate(train_loader):
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
    data = self._next_data()
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Python\SR_Dataset.py", line 195, in __getitem__
    label = self.spk_to_idx[label]
KeyError: 'wav'

Process finished with exit code 1
Help me pls
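KeyError: 'wav' suggests the speaker label is being parsed from the wrong path component. Assuming the loader derives each utterance's label from the parent directory of its feature file (layout feat/train/<speaker_id>/<utt>.p), an extra directory level such as a wav/ folder would make the literal string 'wav' the label, which is not in spk_to_idx:

```python
from pathlib import Path

def speaker_from_path(feat_path):
    """Assumed labelling rule: the speaker ID is the feature file's
    parent directory name. With an extra nesting level like
    .../<speaker_id>/wav/<utt>.p, this returns 'wav' instead of the
    speaker ID, which is exactly the KeyError seen above."""
    return Path(feat_path).parent.name

print(speaker_from_path('feat/train/spk001/utt1.p'))      # spk001
print(speaker_from_path('feat/train/spk001/wav/utt1.p'))  # wav  <- the bug
```

Flattening the dataset so each speaker directory directly contains the .p files (or adjusting the path parsing in SR_Dataset.py) should resolve it.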
Hi, @jymsuper!
I want to extract an embedding from a .wav file. Please tell me how I can do this.
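Purely as a hedged sketch (the repo's actual model interface may differ): load the trained network, feed it the filterbank features extracted from the .wav, and length-normalise the output. The input shape and the `model` object here are assumptions:

```python
import torch

def extract_dvector(model, feats):
    """Hypothetical d-vector extraction for one utterance.
    `model` is the trained network (loaded from a checkpoint) and
    `feats` a (n_frames, 40) filterbank array like the pickled
    features; both the batch/channel layout and the final
    length-normalisation are assumptions, not the repo's code."""
    model.eval()
    with torch.no_grad():
        x = torch.as_tensor(feats, dtype=torch.float32)
        x = x.unsqueeze(0).unsqueeze(0)      # (1, 1, n_frames, 40)
        emb = model(x)                       # forward pass -> embedding
        return emb.squeeze(0) / emb.norm()   # unit-length d-vector
```

The feature extraction itself would reuse whatever produced the .p files, so the .wav goes through the same filterbank pipeline before this function.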