Comments (7)

jymsuper commented on June 15, 2024

You have to change the function read_MFB to fit your situation.
Lines 12 to 16 load the feature (assumed to be saved with pickle) and the label.
The feature size should be (n_frames, dim), as written in the comment.
The label should be the speaker identity as a string.

You can remove lines 20 to 24; they assume that the beginning and end of each utterance are silence.
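
As a rough illustration of the loader described above, here is a minimal sketch. It is not the repository's read_MFB: the function name `read_feature`, the file name, and the `'feat'`/`'label'` dictionary layout are assumptions made for the example, and it skips the silence-trimming step entirely.

```python
import os
import pickle
import tempfile

import numpy as np


def read_feature(path):
    """Load one utterance (hypothetical stand-in for read_MFB).

    Assumes the file is a pickled dict with a 'feat' array of shape
    (n_frames, dim) and a 'label' holding the speaker identity.
    """
    with open(path, "rb") as f:
        feat_and_label = pickle.load(f)
    feature = np.asarray(feat_and_label["feat"], dtype=np.float32)
    label = str(feat_and_label["label"])
    if feature.ndim != 2:
        raise ValueError(f"expected (n_frames, dim), got shape {feature.shape}")
    return feature, label


# Round trip with made-up data: 120 frames of 40-dimensional features.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "utt.p")
    with open(demo_path, "wb") as f:
        pickle.dump({"feat": np.zeros((120, 40)), "label": "speaker_001"}, f)
    feature, label = read_feature(demo_path)

print(feature.shape, label)  # (120, 40) speaker_001
```

The shape check is there so that a wrongly serialized feature fails at load time rather than later in the transforms.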

from speakerrecognition_tutorial.

jymsuper commented on June 15, 2024

Yes, it is possible. In fact, none of the uploaded enrollment and test files are in the training dataset.
The uploaded wav files are all clean data, so the performance is quite good.
If you want to test performance in more challenging conditions (noisier or shorter utterances, ...), you have to increase the amount of training data and the model size. A more advanced loss function or pooling method (attentive pooling, ...) can also be used.
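
To make the attentive pooling suggestion concrete, here is a small NumPy sketch of one common formulation: each frame gets a score, the scores are normalized with a softmax over time, and the utterance vector is the weighted sum of the frames. This is not code from the tutorial. The parameter names `w`, `b`, `v` and all sizes are assumptions, and the parameters are random here, whereas in a real model they would be learned together with the network.

```python
import numpy as np


def attentive_pooling(frames, w, b, v):
    """Pool (n_frames, dim) frame features into one (dim,) vector.

    score_t = v . tanh(frames_t @ w + b); weights = softmax over time.
    """
    hidden = np.tanh(frames @ w + b)   # (n_frames, att_dim)
    scores = hidden @ v                # (n_frames,)
    scores = scores - scores.max()     # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames, weights   # (dim,), (n_frames,)


rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 64))   # 100 frames, 64-dim features
w = rng.standard_normal((64, 32)) * 0.1   # random stand-ins for learned weights
b = np.zeros(32)
v = rng.standard_normal(32) * 0.1

pooled, weights = attentive_pooling(frames, w, b, v)
print(pooled.shape)  # (64,)
```

With `v` set to all zeros every frame gets the same weight, so the function reduces to plain average pooling, which is a handy sanity check.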

ooobsidian commented on June 15, 2024

Thank you for your reply. I don't know how many speakers ResNet-18 can distinguish. Should I change to a larger model? My training data has 855 speakers, so what do you suggest?

jymsuper commented on June 15, 2024

I think ResNet-34 is a good fit for your case. You can also make the model wider (increase the number of channels). The best approach is to run experiments with all of these options, if possible.
In configure.py, NUM_WIN_SIZE (the number of input frames) is set to 100. Increase it to 200 or 300.
Because the training set in this tutorial is very small, I chose all the settings to suit a small dataset.
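
To show what a larger NUM_WIN_SIZE means for the input, here is a sketch of cropping a fixed-length window of frames from an utterance. Only the name NUM_WIN_SIZE comes from the thread. The function `crop_window` is hypothetical, and repeating short utterances to reach the window length is an assumption of this example, not necessarily what the tutorial does.

```python
import numpy as np

NUM_WIN_SIZE = 200  # number of input frames (the tutorial default is 100)


def crop_window(feature, win_size=NUM_WIN_SIZE, rng=None):
    """Return a random (win_size, dim) window from a (n_frames, dim) feature."""
    if rng is None:
        rng = np.random.default_rng()
    n_frames = feature.shape[0]
    if n_frames < win_size:
        # Assumption: repeat a short utterance until it covers the window.
        reps = -(-win_size // n_frames)  # ceiling division
        feature = np.tile(feature, (reps, 1))
        n_frames = feature.shape[0]
    start = rng.integers(0, n_frames - win_size + 1)  # upper bound is exclusive
    return feature[start:start + win_size]


long_utt = np.zeros((350, 40))
short_utt = np.zeros((50, 40))
print(crop_window(long_utt).shape)   # (200, 40)
print(crop_window(short_utt).shape)  # (200, 40)
```

A longer window gives the model more context per training sample, at the cost of more memory per batch.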

ooobsidian commented on June 15, 2024

Thank you very much for your help!!

ooobsidian commented on June 15, 2024

Hi @jymsuper, I use .npy as the feature file format and changed line 12 in SR_Dataset.py following #3, but I run into trouble when I run train.py:

Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 135, in main
    epoch, n_classes)
  File "train.py", line 175, in train
    for batch_idx, (data) in enumerate(train_loader):
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 221, in __getitem__
    feature, label = self.loader(feat_path)
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 16, in read_MFB
    feature = feat_and_label['feat']  # size : (n_frames, dim=40)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

It occurred at

And I sent you an email, please check it, thanks.
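
One likely cause of the IndexError in the traceback above, although the thread does not confirm it, is that `np.load` returns a NumPy array rather than a dict, so indexing it with the string `'feat'` fails. The sketch below assumes the features were written with `np.save` as a dict with `'feat'` and `'label'` keys. The file name and contents are made up for the example.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utt.npy")
    # np.save stores the dict as a 0-d object array (pickled inside the file).
    np.save(path, {"feat": np.zeros((120, 40), dtype=np.float32),
                   "label": "speaker_001"})

    # Object arrays need allow_pickle=True; only use it on files you trust.
    loaded = np.load(path, allow_pickle=True)

    try:
        loaded["feat"]  # string index on an ndarray
        raised = False
    except IndexError:
        raised = True

    # .item() unwraps the 0-d object array back into the original dict.
    feat_and_label = loaded.item()
    feature = feat_and_label["feat"]
    label = feat_and_label["label"]

print(raised, feature.shape, label)
```

If the .npy file instead holds only the feature matrix, there is no dict to unwrap, and the label has to come from somewhere else, for example the file path.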

ooobsidian commented on June 15, 2024

@jymsuper I have fixed the problem above by changing how the features are serialized. But now, in train.py:

transform = transforms.Compose([
    TruncatedInputfromMFB(),  # numpy array:(1, n_frames, n_dims)
    ToTensorInput()  # torch tensor:(1, n_dims, n_frames)
])

An error occurs in ToTensorInput():

 File "/Users/obsidian/source/voiceprint_pytorch/SR_Dataset.py", line 127, in __call__
    (0, 2, 1))).float()  # output type => torch.FloatTensor, fast
ValueError: axes don't match array

Could you help me solve this problem? I have been debugging it for a long time ☹️
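
The message "axes don't match array" is what NumPy raises when the number of axes passed to `transpose` differs from the array's number of dimensions. The `(0, 2, 1)` in the traceback expects a 3-D array, so a plausible cause (an assumption, not confirmed in the thread) is that the feature reaching ToTensorInput is 2-D, (n_frames, n_dims), without the leading axis. The helper `to_channel_first` below is hypothetical and uses NumPy only, without the conversion to a torch tensor.

```python
import numpy as np


def to_channel_first(feature):
    """(n_frames, n_dims) or (1, n_frames, n_dims) -> (1, n_dims, n_frames)."""
    feature = np.asarray(feature, dtype=np.float32)
    if feature.ndim == 2:
        feature = feature[np.newaxis, ...]  # add the missing leading axis
    if feature.ndim != 3:
        raise ValueError(f"expected a 2-D or 3-D array, got shape {feature.shape}")
    return feature.transpose((0, 2, 1))


two_d = np.zeros((100, 40))
try:
    two_d.transpose((0, 2, 1))  # three axes requested for a 2-D array
except ValueError as err:
    print("2-D input fails:", err)

print(to_channel_first(two_d).shape)  # (1, 40, 100)
```

Printing `feature.shape` just before the failing transpose would confirm or rule out this explanation.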
