yistlin / dvector



Speaker embedding (d-vector) trained with GE2E loss

Python 100.00%

speaker-embedding ge2e pytorch dvector speaker-verification speaker-encoder torchscript

dvector's Issues

How long should the utterances be?

Hi, awesome repo! I am wondering, how should I cut my data? Is this setup good for sentence-level utterances?
In the code, it seems that the model is trained on short chunks of voiced audio, but in the visualization script the embeddings are extracted from whole utterances (so a single embedding vector for the whole sentence). Did you experiment with different setups? For example, did you try extracting embeddings for segments of a given utterance and averaging them? My data comes from an ASR setup, so it is mostly sentence-based. Any advice on how to further segment such data? Regards, Jan
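One common way to handle sentence-level utterances is to embed overlapping fixed-length windows and average the L2-normalized window embeddings. The sketch below is a hypothetical helper, not how visualize.py works: embed_fn stands for any callable that maps a batch of windows to embeddings, and seg_len=160 frames (the value quoted in a later issue on this page) with a 50% hop of 80 frames are assumptions.

```python
import torch
import torch.nn.functional as F


def segment_and_average(mel, embed_fn, seg_len=160, hop=80):
    """Embed fixed-length windows of one utterance and average them.

    mel: (n_frames, n_mels) log-mel tensor for a single utterance.
    embed_fn: maps a batch (n_windows, seg_len, n_mels) to (n_windows, dim).
    """
    if mel.size(0) <= seg_len:
        windows = mel.unsqueeze(0)  # too short to split: embed as one window
    else:
        # unfold over time gives (n_windows, n_mels, seg_len); move time back to dim 1.
        # Frames after the last full window are dropped.
        windows = mel.unfold(0, seg_len, hop).transpose(1, 2)
    with torch.no_grad():
        embeds = F.normalize(embed_fn(windows), dim=-1)  # unit length per window
    return F.normalize(embeds.mean(dim=0), dim=-1)  # unit-length average
```

Normalizing each window before averaging keeps loud or long windows from dominating the utterance-level vector.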

Issue in visualize.py -> Unknown builtin op: torchaudio_sox::apply_effects_tensor.

Running visualize("preprocessed", "preprocessed/wav2mel.pt", "dvector-step5000.pt", ".") fails inside torch.jit.load with the following error:

RuntimeError: 
Unknown builtin op: torchaudio_sox::apply_effects_tensor.
Could not find any similar ops to torchaudio_sox::apply_effects_tensor. This op may not exist or may not be currently supported in TorchScript.
:
  File "code/__torch__/torchaudio/sox_effects/sox_effects.py", line 5
    effects: List[List[str]],
    channels_first: bool=True) -> Tuple[Tensor, int]:
  _0, _1 = ops.torchaudio_sox.apply_effects_tensor(tensor, sample_rate, effects, channels_first)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return (_0, _1)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized   File "code/__torch__/data/wav2mel.py", line 33
    wav_tensor: Tensor,
    sample_rate: int) -> Tensor:
    _0 = __torch__.torchaudio.sox_effects.sox_effects.apply_effects_tensor
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    effects = self.effects
    _1 = _0(wav_tensor, sample_rate, effects, True, )
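The sox op name in this traceback (torchaudio_sox::apply_effects_tensor) differs from the names in other reports on this page, which suggests wav2mel.pt refers to torchaudio internals from whatever version it was scripted with. One possible workaround, assuming you have the repository source (the traceback names data/wav2mel.py) and a torchaudio build with working sox support, is to re-script the eager module with your locally installed versions. The helper below is a hypothetical sketch, not part of the repository: it scripts a module, saves it, reloads it, and checks that the reloaded copy reproduces the eager output.

```python
import torch


def rescript_and_verify(module, path, *probe_args):
    """Re-export an eager module as TorchScript with the locally installed
    torch/torchaudio, reload it, and check it matches the eager output."""
    module = module.eval()
    scripted = torch.jit.script(module)
    scripted.save(path)
    reloaded = torch.jit.load(path)
    with torch.no_grad():
        expected = module(*probe_args)
        actual = reloaded(*probe_args)
    if not torch.allclose(expected, actual):
        raise RuntimeError(f"{path}: reloaded module output differs from eager output")
    return reloaded
```

For Wav2Mel, the probe arguments would be a waveform tensor and its sample rate, matching the forward(wav_tensor, sample_rate) signature shown in the traceback.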

Runtime error in preprocess.py: does not have a __getstate__ method defined

When I run preprocess.py, I get the following error:
Traceback (most recent call last):
  File "preprocess.py", line 92, in <module>
    preprocess(**vars(PARSER.parse_args()))
  File "preprocess.py", line 71, in preprocess
    for speaker_name, mel_tensor in tqdm(dataloader, ncols=0, desc="Preprocess"):
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
RuntimeError: Tried to serialize object __torch__.data.wav2mel.Wav2Mel which does not have a __getstate__ method defined!
Does anyone know how to solve this?
My versions:
python==3.8 torch==1.8.0 torchaudio==0.8.0
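The traceback passes through popen_spawn_posix, so the DataLoader workers are started with the "spawn" method (the default on macOS since Python 3.8). Spawn pickles the dataset for each worker, and the TorchScript Wav2Mel it holds cannot be pickled. Setting num_workers=0 avoids this; another option is to keep only the path to the scripted module and load it lazily inside each worker. The class below is a hypothetical sketch of the second option, not the repository's dataset; in practice items would be audio file paths and __getitem__ would load audio and call wav2mel(wav, sample_rate).

```python
import torch
from torch.utils.data import Dataset


class LazyScriptDataset(Dataset):
    """Hypothetical dataset that stores only the path to a TorchScript module
    and loads it on first use, so the dataset itself stays picklable."""

    def __init__(self, items, script_path):
        self.items = items
        self.script_path = script_path
        self._module = None  # loaded lazily, once per process

    def __getstate__(self):
        # Drop the loaded ScriptModule before pickling; each worker reloads it.
        state = self.__dict__.copy()
        state["_module"] = None
        return state

    def _get_module(self):
        if self._module is None:
            self._module = torch.jit.load(self.script_path)
        return self._module

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        with torch.no_grad():
            return self._get_module()(self.items[index])
```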

Model loading issue

I am using the pretrained model to get embeddings, but I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-4bcf886e7e7f> in <module>
      2 import torchaudio
      3 
----> 4 wav2mel = torch.jit.load("wav2mel.pt")
      5 dvector = torch.jit.load("dvector.pt").eval()
      6 

~/anaconda3/lib/python3.8/site-packages/torch/jit/_serialization.py in load(f, map_location, _extra_files)
    159     cu = torch._C.CompilationUnit()
    160     if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
    162     else:
    163         cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: 
Class Namespace cannot be used as a value:
Serialized   File "code/__torch__/torchaudio/sox_effects/sox_effects.py", line 5
    effects: List[List[str]],
    channels_first: bool=True) -> Tuple[Tensor, int]:
  in_signal = __torch__.torch.classes.torchaudio.TensorSignal.__new__(__torch__.torch.classes.torchaudio.TensorSignal)
                                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _0 = (in_signal).__init__(tensor, sample_rate, channels_first, )
  out_signal = ops.torchaudio.sox_effects_apply_effects_tensor(in_signal, effects)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized   File "code/__torch__/data/wav2mel.py", line 29
    wav_tensor: Tensor,
    sample_rate: int) -> Tensor:
    _0 = __torch__.torchaudio.sox_effects.sox_effects.apply_effects_tensor
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _1 = _0(wav_tensor, sample_rate, self.effects, True, )
    wav_tensor1, _2, = _1

cannot reshape tensor of 0 elements into shape [-1, 0]

When the input tensor has shape [1, 800] or [1, 320] and I use the following code:

mel_tensor = wav2mel(wav_tensor, 16000) # 16000 is the sample rate

I get the following error:

Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/data/wav2mel.py", line 20, in forward
sample_rate: int) -> Tensor:
wav_tensor0 = (self.sox_effects).forward(wav_tensor, sample_rate, )
mel_tensor = (self.log_melspectrogram).forward(wav_tensor0, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return mel_tensor
class SoxEffects(Module):
File "code/torch/data/wav2mel.py", line 43, in forward
def forward(self: torch.data.wav2mel.LogMelspectrogram,
wav_tensor: Tensor) -> Tensor:
_3 = (self.melspectrogram).forward(wav_tensor, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
mel_tensor = torch.numpy_T(torch.squeeze(_3, 0))
_4 = torch.clamp(mel_tensor, 1.0000000000000001e-09, None)
File "code/torch/torchaudio/transforms.py", line 20, in forward
def forward(self: torch.torchaudio.transforms.MelSpectrogram,
waveform: Tensor) -> Tensor:
specgram = (self.spectrogram).forward(waveform, )
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
mel_specgram = (self.mel_scale).forward(specgram, )
return mel_specgram
File "code/torch/torchaudio/transforms.py", line 41, in forward
waveform: Tensor) -> Tensor:
_0 = torch.torchaudio.functional.functional.spectrogram
_1 = _0(waveform, 0, self.window, 400, 160, 400, 2., False, self.center, self.pad_mode, self.onesided, )
~~ <--- HERE
return _1
class MelScale(Module):
File "code/torch/torchaudio/functional/functional.py", line 18, in spectrogram
waveform0 = waveform
shape = torch.size(waveform0)
waveform2 = torch.reshape(waveform0, [-1, shape[-1]])
~~~~~~~~~~~~~ <--- HERE
spec_f = torch.torch.functional.stft(waveform2, n_fft, hop_length, win_length, window, center, pad_mode, False, onesided, True, )
_0 = torch.slice(shape, 0, -1, 1)

Traceback of TorchScript, original code (most recent call last):
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
Fourier bins, and time is the number of window hops (n_frame).
"""
return F.spectrogram(
~~~~~~~~~~~~~ <--- HERE
waveform,
self.pad,
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 88, in spectrogram
# pack batch
shape = waveform.size()
waveform = waveform.reshape(-1, shape[-1])
~~~~~~~~~~~~~~~~ <--- HERE
# default values are consistent with librosa.core.spectrum._spectrogram
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

How can I solve this problem?
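The message says the waveform reaching the spectrogram has zero samples (shape [-1, 0]), so the sox effects stage returned an empty tensor. With only 800 or 320 samples (50 ms or 20 ms at 16 kHz), an effects chain that trims silence, as this one plausibly does, can remove everything. A hedged workaround is to skip such clips instead of crashing. The wrapper below is a hypothetical helper, not part of the repository; it matches on the error text shown here, which may change across versions.

```python
import torch


def safe_wav2mel(wav2mel, wav_tensor, sample_rate, min_frames=1):
    """Return the mel tensor, or None if preprocessing leaves nothing usable."""
    if wav_tensor.numel() == 0:
        return None
    try:
        mel = wav2mel(wav_tensor, sample_rate)
    except RuntimeError as err:
        # An empty waveform surfaces as this reshape error from the spectrogram.
        if "cannot reshape tensor of 0 elements" in str(err):
            return None
        raise  # unrelated errors still propagate
    if mel.size(0) < min_frames:
        return None
    return mel
```

In a preprocessing or visualization loop, `if mel is None: continue` then drops the clip and moves on.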

Cannot reshape tensor during visualization

When I run visualization, I get the following error:

[INFO] model loaded.
Preprocess: 1% 12/889 [00:00<00:27, 31.83it/s]
Traceback (most recent call last):
  File "visualize.py", line 89, in <module>
    visualize(**vars(PARSER.parse_args()))
  File "visualize.py", line 43, in visualize
    mel_tensor = wav2mel(wav_tensor, sample_rate)
  File "/home/vvs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/data/wav2mel.py", line 20, in forward
sample_rate: int) -> Tensor:
wav_tensor0 = (self.sox_effects).forward(wav_tensor, sample_rate, )
mel_tensor = (self.log_melspectrogram).forward(wav_tensor0, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return mel_tensor

.....

Traceback of TorchScript, original code (most recent call last):
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
Fourier bins, and time is the number of window hops (n_frame).
"""
return F.spectrogram(
~~~~~~~~~~~~~ <--- HERE
waveform,
self.pad,
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 88, in spectrogram
# pack batch
shape = waveform.size()
waveform = waveform.reshape(-1, shape[-1])
~~~~~~~~~~~~~~~~ <--- HERE

# default values are consistent with librosa.core.spectrum._spectrogram

RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm

I want to get a 601-dimensional d-vector, so I followed the "Train from scratch" steps:

python preprocess.py ../LibriSpeech/train-clean-360 -o preprocessed
python train.py preprocessed train601

Then I get an error whose traceback points to "modules/dvector.py":
"""Forward a batch through network."""
lstm_outs, _ = self.lstm(inputs) # (batch, seg_len, dim_cell)
embeds = torch.tanh(self.embedding(lstm_outs)) # (batch, seg_len, dim_emb)
~~~~~~~~~~~~~~ <--- HERE

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
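cuBLAS errors on the GPU are opaque; running the same computation on CPU, or with CUDA_LAUNCH_BLOCKING=1, usually gives a clearer message. As a sanity check, the sketch below rebuilds only the two quoted lines with dim_emb=601. TinyDvector is a hypothetical stand-in, not the repository's model, and dim_input=40, dim_cell=256, and num_layers=3 are assumed values. If this runs on CPU, the embedding size alone is unlikely to be the cause, and the problem may lie elsewhere (for example in input shapes or the CUDA/library setup).

```python
import torch
import torch.nn as nn


class TinyDvector(nn.Module):
    """Hypothetical stand-in mirroring the two quoted forward lines."""

    def __init__(self, dim_input=40, dim_cell=256, dim_emb=601, num_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(dim_input, dim_cell, num_layers, batch_first=True)
        self.embedding = nn.Linear(dim_cell, dim_emb)

    def forward(self, inputs):
        lstm_outs, _ = self.lstm(inputs)  # (batch, seg_len, dim_cell)
        embeds = torch.tanh(self.embedding(lstm_outs))  # (batch, seg_len, dim_emb)
        return embeds
```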

window size -> seg_len

How can I draw the window size from a uniform distribution over [240 ms, 1600 ms] during training?

I have two questions about dvector.py. First, the condition "if utterance.size(1) <= self.seg_len:" should compare against dimension 0: dimension 1 has size 40, which is always smaller than seg_len=160, so the sliding-window branch using unfold is never reached. Second, the output shape of unfold is [batch_size, 40, seg_len], while AttentivePooledLSTMDvector expects input of shape [batch_size, seg_len, 40]; that is, size(-1) must be 40.

As for the uniformly distributed seg_len, can I simply sample a new seg_len while iterating over each utterance?

I hope you can give me an answer, thank you!
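Assuming a 10 ms frame hop (the 160-sample hop at 16 kHz visible in an earlier traceback), [240 ms, 1600 ms] corresponds to 24 to 160 frames. The sketch below is a hypothetical illustration, not the repository's training code: it samples one seg_len per batch and crops a random window of that length from each utterance's (n_frames, n_mels) mel tensor.

```python
import random

import torch

HOP_MS = 10  # assumed frame hop: 160 samples at 16 kHz


def sample_seg_len(min_ms=240, max_ms=1600, rng=random):
    """Draw one segment length in frames, uniformly from [min_ms, max_ms]."""
    return rng.randint(min_ms // HOP_MS, max_ms // HOP_MS)  # inclusive bounds


def random_crop_batch(mels, seg_len, rng=random):
    """Crop one random seg_len-frame window from each (n_frames, n_mels) mel."""
    crops = []
    for mel in mels:
        if mel.size(0) < seg_len:
            raise ValueError(f"utterance has {mel.size(0)} frames, need {seg_len}")
        start = rng.randint(0, mel.size(0) - seg_len)
        crops.append(mel[start:start + seg_len])
    return torch.stack(crops)  # (batch, seg_len, n_mels)
```

Sampling once per batch, rather than once per utterance, keeps every crop in the batch the same length so they stack without padding; per-utterance lengths would need padding or packed sequences.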

Make preprocessing fully differentiable with torch API

I appreciate your efforts; nice work.
However, your audio_toolkit is implemented with librosa and NumPy, which are not differentiable.
This may limit its applications. For example, if I have a TTS model that generates mel spectrograms and your d-vector were fully differentiable, we could use it like a discriminator to force the TTS model's output to match the intended speaker.
You could make the waveform-to-mel-spectrogram preprocessing fully differentiable with torchaudio, and it seems it can stay consistent with librosa.

RuntimeError: Unknown builtin op: torchaudio::sox_effects_apply_effects_tensor.

When I ran the usage example, I encountered a problem:
C:\Users\86151\Desktop\Voice-Recognize-system-master\Scripts\python.exe C:\Users\86151\Desktop\Voice-Recognize-system-master\dvector-master\demo.py
Traceback (most recent call last):
  File "C:\Users\86151\Desktop\Voice-Recognize-system-master\dvector-master\demo.py", line 5, in <module>
    wav2mel = torch.jit.load("wav2mel.pt")
  File "C:\Users\86151\Desktop\Voice-Recognize-system-master\lib\site-packages\torch\jit\_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes) # type: ignore[call-arg]
RuntimeError:
Unknown builtin op: torchaudio::sox_effects_apply_effects_tensor.
Could not find any similar ops to torchaudio::sox_effects_apply_effects_tensor. This op may not exist or may not be currently supported in TorchScript.
:
File "code/torch/torchaudio/sox_effects/sox_effects.py", line 5
effects: List[List[str]],
channels_first: bool=True) -> Tuple[Tensor, int]:
_0, _1 = ops.torchaudio.sox_effects_apply_effects_tensor(tensor, sample_rate, effects, channels_first)
~ <--- HERE
return (_0, _1)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized File "code/torch/data/wav2mel.py", line 31
wav_tensor: Tensor,
sample_rate: int) -> Tensor:
_0 = torch.torchaudio.sox_effects.sox_effects.apply_effects_tensor
~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_1 = _0(wav_tensor, sample_rate, self.effects, True, )
wav_tensor1, _2, = _1

System: Windows 11
torch: 1.11.0+cu113
torchaudio: 0.11.0+cu113
I read that Windows may have problems with sox. Is there any solution, such as switching to a different package?
