
speakerrecognition_tutorial's People

Contributors

jymsuper


speakerrecognition_tutorial's Issues

I am the project developer who posted a question one or two months ago.

[Three screenshots attached in the original issue]
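I have built a real-time speaker recognition program in which speaker enrollment and recognition are separate functions. In a quiet environment with a good microphone the recognition rate is not bad, but unexpected variables come up and I have started to doubt the recognition rate. So I would like to ask whether swapping the CNN-based ResNet model might give better results. At the moment it seems to run with the default 'Resnet18' model, and I noticed that, as in the screenshots posted here, there are also 34, 50, 102, and 152 variants. Could you tell me how to switch between them?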

Error

Sir, I am unable to convert a single wav file to a .p file. Could you please help with this?

how to calculate EER in this code?

Hi @jymsuper ,

Thanks for sharing this excellent code.

I have gone through the identification.py and verification.py files to work out how to calculate performance after the enrollment process.
Could you give me some ideas about how to calculate the EER?

Many thanks
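
A minimal sketch of one common way to compute the equal error rate (EER), assuming you have already collected one similarity score per verification trial (for example, the cosine similarity between an enrolled and a test embedding) and a 0/1 label saying whether the trial is a same-speaker trial. The function name `compute_eer` and the toy scores are hypothetical and are not part of the repository.

```python
import numpy as np

def compute_eer(scores, labels):
    """Return (eer, threshold) where false-accept and false-reject rates are closest.

    scores: higher means "more likely the same speaker".
    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = np.sum(labels == 1)
    n_nontarget = np.sum(labels == 0)

    best_gap, eer, eer_threshold = np.inf, None, None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in np.unique(scores):
        accepted = scores >= threshold
        far = np.sum(accepted & (labels == 0)) / n_nontarget   # false accept rate
        frr = np.sum(~accepted & (labels == 1)) / n_target     # false reject rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, eer_threshold = gap, (far + frr) / 2, threshold
    return eer, eer_threshold

# Toy trial list (assumed values, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, threshold = compute_eer(toy_scores, toy_labels)
```

With real data, the trial list should contain many target and non-target pairs; the EER is only as reliable as the number of trials behind it.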

performance

@jymsuper Can this be used for verification (not identification) on an open set, that is, with test speakers who are not in the training dataset? If so, I would like to know the performance.

Error

When a new speaker comes in and I test speaker verification, the output is wrong.

ValueError: threshold must be non-NAN

python3 SpeakerRecognition_tutorial/identification.py

Traceback (most recent call last):
  File "SpeakerRecognition_tutorial/identification.py", line 10, in <module>
    from DB_wav_reader import read_feats_structure
  File "/content/SpeakerRecognition_tutorial/DB_wav_reader.py", line 11, in <module>
    np.set_printoptions(threshold=np.nan)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/arrayprint.py", line 259, in set_printoptions
    floatmode, legacy)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/arrayprint.py", line 95, in _make_options_dict
    raise ValueError("threshold must be non-NAN, try "
ValueError: threshold must be non-NAN,
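
Newer NumPy releases reject `np.nan` as a print threshold, which is what the traceback reports. A minimal sketch of a replacement, assuming the only purpose of that line in DB_wav_reader.py was to print arrays in full without truncation; `sys.maxsize` is the usual substitute. The array below is just a demonstration value.

```python
import sys
import numpy as np

# np.nan is no longer accepted here; sys.maxsize effectively disables summarization.
np.set_printoptions(threshold=sys.maxsize)

# An array longer than NumPy's default threshold of 1000 elements.
arr = np.arange(2000)
text = np.array2string(arr)  # uses the print options set above
```

If full printing is not needed, deleting the `np.set_printoptions` line altogether should also avoid the error.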

TruncatedInputfromMFB

Sorry, my English is not good.

class TruncatedInputfromMFB(object):
    """
    input size : (n_frames, dim=40)
    output size : (1, n_win=40, dim=40) => one context window is chosen randomly
    """
    def __init__(self, input_per_file=1):
        super(TruncatedInputfromMFB, self).__init__()
        self.input_per_file = input_per_file
    
    def __call__(self, frames_features):
        network_inputs = []
        num_frames = len(frames_features)
        
        win_size = c.NUM_WIN_SIZE
        half_win_size = int(win_size/2)
        #if num_frames - half_win_size < half_win_size:
        while num_frames - half_win_size <= half_win_size:
            frames_features = np.append(frames_features, frames_features[:num_frames,:], axis=0)
            num_frames =  len(frames_features)
            
        for i in range(self.input_per_file):
            j = random.randrange(half_win_size, num_frames - half_win_size)
            if not j:
                frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')
                frames_slice[0:(frames_features.shape)[0]] = frames_features.shape
            else:
                frames_slice = frames_features[j - half_win_size:j + half_win_size]
            network_inputs.append(frames_slice)
        return np.array(network_inputs)

Is this line wrong?
frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')
Should it be:
frames_slice = np.zeros((num_frames, c.FILTER_BANK), 'float64')
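
`np.zeros` takes the shape as a single tuple, and its second positional argument is the dtype, so the three-argument call would fail if it were reached. The following line in that branch also assigns `frames_features.shape` (the shape tuple) rather than the feature values. A minimal sketch of both fixes, assuming `FILTER_BANK = 40` stands in for `c.FILTER_BANK` (the docstring says dim=40); the helper name `zero_padded_copy` is hypothetical.

```python
import numpy as np

FILTER_BANK = 40  # assumption: stands in for c.FILTER_BANK (docstring says dim=40)

def zero_padded_copy(frames_features, num_frames):
    """Copy the features into a zero-filled (num_frames, FILTER_BANK) array."""
    # The shape is passed as one tuple; dtype is a separate argument.
    frames_slice = np.zeros((num_frames, FILTER_BANK), dtype='float64')
    # Copy the feature values themselves, not the shape tuple.
    frames_slice[0:frames_features.shape[0]] = frames_features
    return frames_slice

feats = np.ones((10, FILTER_BANK))
padded = zero_padded_copy(feats, 25)
```

Note that if the window size is 40, as the docstring suggests, `j` is drawn from `random.randrange(half_win_size, ...)` and is therefore at least 20, so the `if not j` branch would not normally execute.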

run train.py error

Before training, I changed line 206 of SR_Dataset.py to train_DB = read_DB_structure(c.TRAIN_WAV_DIR) and deleted line 20 in DB_wav_reader.py as suggested in issue #2, but running train.py raises an error.

Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 92, in main
    train_dataset, valid_dataset, n_classes = load_dataset(val_ratio)
  File "train.py", line 22, in load_dataset
    train_DB, valid_DB = split_train_dev(c.TRAIN_WAV_DIR, val_ratio)
  File "train.py", line 65, in split_train_dev
    (train_len / total_len) * 100))
ZeroDivisionError: division by zero

I don't know how to fix it. Could you suggest how to prepare the dataset? I am using a different dataset.

Thank you. @jymsuper
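
The division `train_len / total_len` can only fail this way when `total_len` is zero, which suggests that no files were found under the training directory. A minimal sketch of a check to run before training, assuming the features are stored as `.p` files in a `<root>/<speaker>/<utterance>.p` layout; both the layout and the helper names are assumptions, not something confirmed by the repository.

```python
import os
import tempfile

def count_files(root_dir, extension):
    """Count files under root_dir (recursively) whose name ends with extension."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root_dir):
        total += sum(1 for name in filenames if name.lower().endswith(extension))
    return total

def check_dataset_dir(root_dir, extension):
    """Fail early with a readable message instead of a later ZeroDivisionError."""
    n_files = count_files(root_dir, extension)
    if n_files == 0:
        raise FileNotFoundError(f"No '{extension}' files found under {root_dir!r}")
    return n_files

# Usage with a throwaway directory laid out as <root>/<speaker>/<utterance>.p
with tempfile.TemporaryDirectory() as root:
    for speaker in ("spk1", "spk2"):
        os.makedirs(os.path.join(root, speaker))
        open(os.path.join(root, speaker, "utt1.p"), "wb").close()
    n_found = check_dataset_dir(root, ".p")
```

If the check fails for your data, the training directory path or the file extension the reader expects is the first thing to compare against your dataset.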

(EPOCH_DEPRECATION_WARNING, UserWarning) error

Hello, I am interested in speaker recognition and am working on a project using this repository.
I started with almost no background in this area, so I often do not understand what the error messages mean, which is why I am asking. I ran the scripts in the documented order (train -> enroll -> identification -> verification), but in the final result, when checking whether the voices match, it shows "test" instead of the speaker's name. I would like to know the cause.
Also, after epoch 1 in train.py, the following message appears, and I do not understand what causes it:
(The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.)

Features Computation

Hello,
thanks for this great tutorial!
I'm not able to reproduce the feature extraction step. Could you please point me in the right direction?

I'm currently using logfbank from the python_speech_features library, with sr=16000 and n_filters=40.

Many thanks!

Question about the ResNet model

[Two screenshots attached in the original issue]
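The screenshots above show the ResNet model code in this project's resnet.py. Comparing its channel counts with those of the actual ResNet-34:
ResNet vs. this project's ResNet model
64 16
128 32
256 64
512 128
I can see this difference. Did you design it this way because the model would otherwise be too large?
When I swapped in the original ResNet model and ran it, I got a message saying there was not enough GPU memory. If there is a particular reason for this change, I would like to know.
Thank you.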
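
This does not speak for the author's reasons, but the size difference can be estimated with simple arithmetic. A 3x3 convolution with c_in input channels and c_out output channels has 3 * 3 * c_in * c_out weights, so dividing every channel count by 4 cuts the weights of each such layer by roughly a factor of 16. A small sketch, using the two channel lists from the comparison above; the helper name is hypothetical and bias and batch-norm parameters are ignored.

```python
def conv3x3_weights(c_in, c_out):
    """Number of weights in one 3x3 convolution (bias ignored)."""
    return 3 * 3 * c_in * c_out

standard_widths = [64, 128, 256, 512]   # channel counts of the standard ResNet
tutorial_widths = [16, 32, 64, 128]     # channel counts listed for this project

# Compare a same-width 3x3 convolution at each stage.
ratios = [
    conv3x3_weights(s, s) / conv3x3_weights(t, t)
    for s, t in zip(standard_widths, tutorial_widths)
]
```

Activation memory also grows with the channel count, which is consistent with the out-of-memory message seen when switching to the full-width model.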

train with own dataset

I got this when training with my own .p files:


Training set 21600 utts (90.0%)
Validation set 2400 utts (10.0%)
Total 24000 utts

Number of classes (speakers):
240

<torch.utils.data.dataloader.DataLoader object at 0x00000218F4F8A710>
Train Epoch:   1 [       0/   21600 (  0%)]	Time 3.002 (3.002)	Loss 5.5635	Acc 0.0000
Train Epoch:   1 [    5376/   21600 ( 25%)]	Time 0.095 (0.113)	Loss 5.2005	Acc 1.3910
Train Epoch:   1 [   10752/   21600 ( 50%)]	Time 0.032 (0.098)	Loss 4.7337	Acc 2.0177
Traceback (most recent call last):
  File "D:\Python\train.py", line 290, in <module>
    main()
  File "D:\Python\train.py", line 120, in main
    train_loss = train(train_loader, model, criterion, optimizer, use_cuda, epoch, n_classes)
  File "D:\Python\train.py", line 157, in train
    for batch_idx, (data) in enumerate(train_loader):
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\dataloader.py", line 652, in __next__
    data = self._next_data()
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\dataloader.py", line 692, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\TA\anaconda3\envs\Python\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\Python\SR_Dataset.py", line 195, in __getitem__
    label = self.spk_to_idx[label]
KeyError: 'wav'

Process finished with exit code 1

Please help.
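
The `KeyError: 'wav'` means that, for at least one file, the speaker label came out as the string 'wav', which is not a key in `spk_to_idx`. One plausible cause, stated here as an assumption because the label extraction code is not shown, is that the label is taken from a folder name in the file path and some files sit inside a folder called `wav`. A minimal sketch for finding such files before training; `speaker_from_path`, `find_unknown_labels`, and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath

def speaker_from_path(path, levels_up=1):
    """Take the folder `levels_up` levels above the file name as the speaker label."""
    parts = PurePosixPath(path.replace("\\", "/")).parts
    return parts[-1 - levels_up]

def find_unknown_labels(paths, spk_to_idx, levels_up=1):
    """Return the labels derived from `paths` that are missing from spk_to_idx."""
    derived = {speaker_from_path(p, levels_up) for p in paths}
    return sorted(derived - set(spk_to_idx))

# Example paths (assumed layout): the first file has an extra 'wav' folder.
example_paths = ["data/spk001/wav/utt1.p", "data/spk002/utt2.p"]
example_mapping = {"spk001": 0, "spk002": 1}
unknown = find_unknown_labels(example_paths, example_mapping)
```

Because training ran for a while before failing, it is likely that only part of the dataset has the unexpected layout, so checking every path up front is more informative than waiting for the error.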

train.py error

[Screenshot attached in the original issue]

ZeroDivisionError: division by zero
How can I fix this?
