performance about speakerrecognition_tutorial HOT 7 CLOSED

jymsuper commented on June 15, 2024 1

performance

from speakerrecognition_tutorial.

Comments (7)

jymsuper commented on June 15, 2024 1

You have to adapt the function read_MFB to your own data.
Lines 12 to 16 load the feature (assumed to be saved with pickle) and the label.
The feature should have shape (n_frames, dim), as noted in the comment.
The label should be the speaker identity as a string.

You can remove lines 20 to 24, because that code assumes the beginning and end of each utterance are silence.
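
A minimal sketch of a loader along the lines described above, assuming each feature file is a pickled dict with the keys `feat` and `label`. The `feat` key matches the traceback later in this thread; the `label` key, the function name `read_feature_file`, and the file layout are assumptions for illustration, not the repository's actual code.

```python
import pickle

import numpy as np


def read_feature_file(path):
    """Load one utterance and return (feature, label).

    Assumes the file holds a pickled dict {'feat': ndarray, 'label': str}.
    """
    with open(path, "rb") as f:
        feat_and_label = pickle.load(f)

    feature = np.asarray(feat_and_label["feat"])  # expected shape: (n_frames, dim)
    label = feat_and_label["label"]               # speaker identity as a string

    # Fail early if the file does not match the expected layout.
    if feature.ndim != 2:
        raise ValueError(f"expected (n_frames, dim), got shape {feature.shape}")
    if not isinstance(label, str):
        raise TypeError("label should be the speaker identity as a string")
    return feature, label
```

Checking the shape and label type inside the loader tends to surface data-format problems before they show up deeper in the training loop.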

jymsuper commented on June 15, 2024

Yes, it is possible. In fact, none of the uploaded enrollment and test files are in the training dataset.
The uploaded wav files are all clean recordings, so the performance is quite good.
If you want to test performance under more challenging conditions (noisier audio, shorter utterances, ...), you have to increase the amount of training data and the model size. A more advanced loss function or pooling method (attentive pooling, ...) can also be used.
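
As a rough illustration of the attentive pooling idea mentioned above, the sketch below weights each frame by a learned score before averaging, instead of taking a plain mean over time. It uses NumPy only; the parameter names `W` and `v` and their random values are hypothetical stand-ins for trained parameters, and this is not code from the tutorial repository.

```python
import numpy as np


def attentive_pooling(frames, W, v):
    """Pool (n_frames, dim) frame-level features into one (dim,) vector.

    score_t = v . tanh(W @ h_t), and the weights are a softmax over frames.
    """
    scores = np.tanh(frames @ W.T) @ v           # shape: (n_frames,)
    scores = scores - scores.max()               # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames, weights             # shapes: (dim,), (n_frames,)


rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 64))    # frame-level outputs of some network
W = rng.standard_normal((32, 64)) * 0.1    # hypothetical attention parameters
v = rng.standard_normal(32) * 0.1
embedding, weights = attentive_pooling(frames, W, v)
```

When all scores are equal, the weights become uniform and the result reduces to ordinary average pooling.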

ooobsidian commented on June 15, 2024

Thank you for your reply. I don't know how many speakers ResNet-18 can distinguish. Should I switch to a larger model? My training data has 855 speakers, so what do you suggest?

jymsuper commented on June 15, 2024

I think ResNet-34 is a good fit for your case. You can also make the model wider (increase the number of channels). The best approach is to run experiments with all of these options, if possible.
In configure.py, NUM_WIN_SIZE (number of input frames) is set to 100. Increase it to 200 or 300.
Because the training set in this tutorial is very small, I chose all the settings for a small dataset.
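
To illustrate what the number of input frames controls, here is a minimal sketch of cropping a fixed-length window from an utterance's features. `NUM_WIN_SIZE` is the setting named above; the function `random_crop` and the repetition of short utterances are assumptions for illustration, not the repository's implementation.

```python
import numpy as np

NUM_WIN_SIZE = 200  # number of input frames; the tutorial's setting is 100


def random_crop(feature, win_size=NUM_WIN_SIZE, rng=None):
    """Return a (win_size, dim) window from a (n_frames, dim) feature."""
    if rng is None:
        rng = np.random.default_rng()
    n_frames = feature.shape[0]
    if n_frames < win_size:
        # Assumption: repeat short utterances until they are long enough.
        reps = -(-win_size // n_frames)  # ceiling division
        feature = np.tile(feature, (reps, 1))
        n_frames = feature.shape[0]
    # The upper bound of rng.integers is exclusive, so the window always fits.
    start = rng.integers(0, n_frames - win_size + 1)
    return feature[start:start + win_size]
```

A longer window gives the network more context per training sample, at the cost of more memory and compute per batch.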

ooobsidian commented on June 15, 2024

Thank you very much for your help!!

ooobsidian commented on June 15, 2024

Hi @jymsuper, I use .npy as the feature file format, and I changed line 12 in SR_Dataset.py following #3, but I run into trouble when I run train.py.

Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 135, in main
    epoch, n_classes)
  File "train.py", line 175, in train
    for batch_idx, (data) in enumerate(train_loader):
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 221, in __getitem__
    feature, label = self.loader(feat_path)
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 16, in read_MFB
    feature = feat_and_label['feat']  # size : (n_frames, dim=40)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

It occurred at

I also sent you an email; please check it. Thanks.
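
This `IndexError` is what NumPy raises when a plain array is indexed with a string. A likely cause, assuming the `.npy` file holds only the feature matrix, is that `np.load` returns an `ndarray` rather than a dict, so `feat_and_label['feat']` fails. The sketch below reproduces the error and shows one possible workaround: saving a dict and recovering it with `.item()`. The file names and the label value are hypothetical.

```python
import os
import tempfile

import numpy as np

tmp_dir = tempfile.mkdtemp()
feat = np.zeros((120, 40), dtype=np.float32)  # (n_frames, dim)

# Case 1: the .npy file stores the bare array.
array_path = os.path.join(tmp_dir, "utt_array.npy")
np.save(array_path, feat)
loaded = np.load(array_path)
try:
    loaded["feat"]             # string index on a plain ndarray
    raised = False
except IndexError:
    raised = True              # same error type as in the traceback

# Case 2: store a dict; NumPy pickles it into a 0-d object array.
dict_path = os.path.join(tmp_dir, "utt_dict.npy")
np.save(dict_path, {"feat": feat, "label": "spk001"})
feat_and_label = np.load(dict_path, allow_pickle=True).item()
feature = feat_and_label["feat"]
label = feat_and_label["label"]
```

Loading with `allow_pickle=True` should only be done on files from a trusted source, since unpickling can execute arbitrary code.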

ooobsidian commented on June 15, 2024

@jymsuper I have fixed the problem above by changing how the features are serialized. But now, in train.py:

transform = transforms.Compose([
    TruncatedInputfromMFB(),  # numpy array:(1, n_frames, n_dims)
    ToTensorInput()  # torch tensor:(1, n_dims, n_frames)
])

An error occurs in the ToTensorInput() method:

 File "/Users/obsidian/source/voiceprint_pytorch/SR_Dataset.py", line 127, in __call__
    (0, 2, 1))).float()  # output type => torch.FloatTensor, fast
ValueError: axes don't match array

Could you help me solve this problem? I have been debugging it for a long time ☹️
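
`ValueError: axes don't match array` is raised by NumPy's `transpose` when the number of axes given differs from the array's dimensionality. Transposing with `(0, 2, 1)` needs a 3-D input, so one possible cause, assuming the feature reaches `ToTensorInput` as a 2-D `(n_frames, n_dims)` array, is a missing leading axis. Below is a minimal NumPy-only sketch of the failure and one workaround; the variable names are hypothetical.

```python
import numpy as np

feature_2d = np.zeros((100, 40), dtype=np.float32)  # (n_frames, n_dims)

try:
    np.transpose(feature_2d, (0, 2, 1))   # 3 axes requested, array has 2
    raised = False
except ValueError:
    raised = True

# Add the leading axis first, then swap the last two axes.
feature_3d = np.expand_dims(feature_2d, axis=0)   # (1, n_frames, n_dims)
swapped = np.transpose(feature_3d, (0, 2, 1))     # (1, n_dims, n_frames)
```

Printing the array's `shape` right before the transpose is a quick way to confirm whether this is the case that applies.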
