
dvector's People

Contributors

abreuwallace, cyhuang-tw, yistlin


dvector's Issues

How long should the utterances be?

Hi, awesome repo! I am wondering how I should cut my data. Is this setup good for sentence-level utterances?
In the code, it seems that the model is trained on short chunks of voiced audio, but in the visualization script the embeddings are extracted from whole utterances (so a single embedding vector for the whole sentence). Did you experiment with different setups? For example, did you try extracting embeddings for segments of a given utterance and averaging them? My data comes from an ASR setup, so it is pretty much sentence-based. Any advice on how to further segment such data? Regards, Jan
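
A minimal sketch of the segment-and-average idea asked about above, assuming a scripted d-vector checkpoint whose forward takes a (batch, seg_len, 40) mel tensor; the file name, segment length, and hop are illustrative, not the repository's exact API:

    import torch

    # Sketch: split an (n_frames, 40) mel utterance into overlapping segments,
    # embed each segment, and average the embeddings into one utterance vector.
    dvector = torch.jit.load("dvector.pt").eval()

    def utterance_embedding(mel, seg_len=160, hop=80):
        starts = range(0, max(mel.size(0) - seg_len, 0) + 1, hop)
        segments = [mel[s:s + seg_len] for s in starts]
        embeds = torch.cat([dvector(seg.unsqueeze(0)) for seg in segments])
        return embeds.mean(dim=0)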

Issue in visualise.py -> Unknown builtin op: torchaudio_sox::apply_effects_tensor.

Running visualize("preprocessed", "preprocessed/wav2mel.pt", "dvector-step5000.pt", ".")

I get the following error from torch.jit.load:

RuntimeError: 
Unknown builtin op: torchaudio_sox::apply_effects_tensor.
Could not find any similar ops to torchaudio_sox::apply_effects_tensor. This op may not exist or may not be currently supported in TorchScript.
:
  File "code/__torch__/torchaudio/sox_effects/sox_effects.py", line 5
    effects: List[List[str]],
    channels_first: bool=True) -> Tuple[Tensor, int]:
  _0, _1 = ops.torchaudio_sox.apply_effects_tensor(tensor, sample_rate, effects, channels_first)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return (_0, _1)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized   File "code/__torch__/data/wav2mel.py", line 33
    wav_tensor: Tensor,
    sample_rate: int) -> Tensor:
    _0 = __torch__.torchaudio.sox_effects.sox_effects.apply_effects_tensor
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    effects = self.effects
    _1 = _0(wav_tensor, sample_rate, effects, True, )
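
This kind of "Unknown builtin op" error usually means wav2mel.pt was serialized with a different torchaudio release than the one installed. A possible workaround, sketched under the assumption that the repository's data/wav2mel.py is importable and its Wav2Mel constructor has usable defaults, is to re-export the module with the locally installed torchaudio:

    import torch
    from data.wav2mel import Wav2Mel  # assumes the repo root is on PYTHONPATH

    # Re-script and re-save the preprocessing module so the serialized sox ops
    # match the torchaudio version that is actually installed.
    torch.jit.script(Wav2Mel()).save("wav2mel.pt")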

runtime error in preprocess.py: does not have a __getstate__ method defined

I ran preprocess.py and it raised an error:

Traceback (most recent call last):
  File "preprocess.py", line 92, in <module>
    preprocess(**vars(PARSER.parse_args()))
  File "preprocess.py", line 71, in preprocess
    for speaker_name, mel_tensor in tqdm(dataloader, ncols=0, desc="Preprocess"):
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/xinyuewang/opt/anaconda3/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
RuntimeError: Tried to serialize object __torch__.data.wav2mel.Wav2Mel which does not have a __getstate__ method defined!

Does anyone know how to solve this?
My versions:
python==3.8, torch==1.8.0, torchaudio==0.8.0
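
A possible workaround, sketched rather than verified: the spawn start method used by multiprocessing on macOS pickles the dataset, and the scripted Wav2Mel it holds cannot be pickled, so disabling worker processes avoids the serialization entirely:

    from torch.utils.data import DataLoader

    # Hypothetical workaround: with num_workers=0 the dataset (which holds the
    # scripted Wav2Mel) is never pickled into spawned workers, so the
    # __getstate__ error cannot occur. Other arguments stay as in preprocess.py.
    dataloader = DataLoader(dataset, num_workers=0)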

model loading issue

I am using the pre-trained model to get embeddings, but I get the error below:

RuntimeError                              Traceback (most recent call last)
<ipython-input-5-4bcf886e7e7f> in <module>
      2 import torchaudio
      3 
----> 4 wav2mel = torch.jit.load("wav2mel.pt")
      5 dvector = torch.jit.load("dvector.pt").eval()
      6 

~/anaconda3/lib/python3.8/site-packages/torch/jit/_serialization.py in load(f, map_location, _extra_files)
    159     cu = torch._C.CompilationUnit()
    160     if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
    162     else:
    163         cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: 
Class Namespace cannot be used as a value:
Serialized   File "code/__torch__/torchaudio/sox_effects/sox_effects.py", line 5
    effects: List[List[str]],
    channels_first: bool=True) -> Tuple[Tensor, int]:
  in_signal = __torch__.torch.classes.torchaudio.TensorSignal.__new__(__torch__.torch.classes.torchaudio.TensorSignal)
                                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _0 = (in_signal).__init__(tensor, sample_rate, channels_first, )
  out_signal = ops.torchaudio.sox_effects_apply_effects_tensor(in_signal, effects)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized   File "code/__torch__/data/wav2mel.py", line 29
    wav_tensor: Tensor,
    sample_rate: int) -> Tensor:
    _0 = __torch__.torchaudio.sox_effects.sox_effects.apply_effects_tensor
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _1 = _0(wav_tensor, sample_rate, self.effects, True, )
    wav_tensor1, _2, = _1
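
The serialized code above references torch.classes.torchaudio.TensorSignal, which only exists in older torchaudio releases, so the checkpoint and the installed torchaudio most likely disagree. A quick sanity check (illustrative only):

    import torch
    import torchaudio

    # If these versions differ from the ones the checkpoints were exported with,
    # either install matching versions or re-export wav2mel.pt and dvector.pt.
    print(torch.__version__, torchaudio.__version__)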

cannot reshape tensor of 0 elements into shape [-1, 0]

When the input tensor shape is [1, 800] or [1, 320] and I use the following code

mel_tensor = wav2mel(wav_tensor, 16000) # 16000 is the sample rate

I get the following error:

Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/data/wav2mel.py", line 20, in forward
sample_rate: int) -> Tensor:
wav_tensor0 = (self.sox_effects).forward(wav_tensor, sample_rate, )
mel_tensor = (self.log_melspectrogram).forward(wav_tensor0, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return mel_tensor
class SoxEffects(Module):
File "code/torch/data/wav2mel.py", line 43, in forward
def forward(self: torch.data.wav2mel.LogMelspectrogram,
wav_tensor: Tensor) -> Tensor:
_3 = (self.melspectrogram).forward(wav_tensor, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
mel_tensor = torch.numpy_T(torch.squeeze(_3, 0))
_4 = torch.clamp(mel_tensor, 1.0000000000000001e-09, None)
File "code/torch/torchaudio/transforms.py", line 20, in forward
def forward(self: torch.torchaudio.transforms.MelSpectrogram,
waveform: Tensor) -> Tensor:
specgram = (self.spectrogram).forward(waveform, )
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
mel_specgram = (self.mel_scale).forward(specgram, )
return mel_specgram
File "code/torch/torchaudio/transforms.py", line 41, in forward
waveform: Tensor) -> Tensor:
_0 = torch.torchaudio.functional.functional.spectrogram
_1 = _0(waveform, 0, self.window, 400, 160, 400, 2., False, self.center, self.pad_mode, self.onesided, )
~~ <--- HERE
return _1
class MelScale(Module):
File "code/torch/torchaudio/functional/functional.py", line 18, in spectrogram
waveform0 = waveform
shape = torch.size(waveform0)
waveform2 = torch.reshape(waveform0, [-1, shape[-1]])
~~~~~~~~~~~~~ <--- HERE
spec_f = torch.torch.functional.stft(waveform2, n_fft, hop_length, win_length, window, center, pad_mode, False, onesided, True, )
_0 = torch.slice(shape, 0, -1, 1)

Traceback of TorchScript, original code (most recent call last):
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
Fourier bins, and time is the number of window hops (n_frame).
"""
return F.spectrogram(
~~~~~~~~~~~~~ <--- HERE
waveform,
self.pad,
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 88, in spectrogram
# pack batch
shape = waveform.size()
waveform = waveform.reshape(-1, shape[-1])
~~~~~~~~~~~~~~~~ <--- HERE

# default values are consistent with librosa.core.spectrum._spectrogram

RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

How can I solve this problem?
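
At 16 kHz these inputs are only 50 ms and 20 ms long, so after the sox effects (which appear to trim silence and resample) there may be no samples left for the 400-sample STFT window. A defensive check before calling wav2mel, sketched with a guessed minimum length rather than a documented requirement:

    # Skip clips that are too short to survive trimming and framing.
    min_samples = 16000  # ~1 second at 16 kHz; this threshold is a guess
    if wav_tensor.size(-1) < min_samples:
        print("skipping clip: too short for feature extraction")
    else:
        mel_tensor = wav2mel(wav_tensor, 16000)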

Cannot reshape tensor during visualization

I ran the visualization and got this error:

[INFO] model loaded.
Preprocess: 1% 12/889 [00:00<00:27, 31.83it/s]
Traceback (most recent call last):
File "visualize.py", line 89, in
visualize(**vars(PARSER.parse_args()))
File "visualize.py", line 43, in visualize
mel_tensor = wav2mel(wav_tensor, sample_rate)
File "/home/vvs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/data/wav2mel.py", line 20, in forward
sample_rate: int) -> Tensor:
wav_tensor0 = (self.sox_effects).forward(wav_tensor, sample_rate, )
mel_tensor = (self.log_melspectrogram).forward(wav_tensor0, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return mel_tensor

.....

Traceback of TorchScript, original code (most recent call last):
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/transforms.py", line 96, in forward
Fourier bins, and time is the number of window hops (n_frame).
"""
return F.spectrogram(
~~~~~~~~~~~~~ <--- HERE
waveform,
self.pad,
File "/home/yist/.pyenv/versions/3.8.5/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 88, in spectrogram
# pack batch
shape = waveform.size()
waveform = waveform.reshape(-1, shape[-1])
~~~~~~~~~~~~~~~~ <--- HERE

# default values are consistent with librosa.core.spectrum._spectrogram

RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
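
Since visualize.py loops over many files, a pragmatic workaround (a sketch, not the maintained code) is to skip the clips whose mel extraction fails instead of letting the whole run crash:

    # Guard inside the visualize.py loop; wav_path is assumed to be the loop
    # variable holding the current file, which may differ in the actual script.
    try:
        mel_tensor = wav2mel(wav_tensor, sample_rate)
    except RuntimeError as err:
        print(f"skipping {wav_path}: {err}")
        continue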

CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm

I want to get a 601-dim d-vector, so I followed the "Train from scratch" steps:

  1. python preprocess.py ../LibriSpeech/train-clean-360 -o preprocessed
  2. python train.py preprocessed train601

Then there is an error whose traceback points to modules/dvector.py:
"""Forward a batch through network."""
lstm_outs, _ = self.lstm(inputs) # (batch, seg_len, dim_cell)
embeds = torch.tanh(self.embedding(lstm_outs)) # (batch, seg_len, dim_emb)
~~~~~~~~~~~~~~ <--- HERE

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
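
CUBLAS errors from cublasSgemm are usually shape mismatches that only surface as opaque CUDA failures. Running one batch on CPU (or setting CUDA_LAUNCH_BLOCKING=1) tends to expose the real size error; a debugging sketch with an illustrative checkpoint name and shapes:

    import torch

    # Push a dummy batch through the model on CPU, where PyTorch reports the
    # exact mismatched matrix sizes instead of a CUBLAS error.
    model = torch.jit.load("dvector-step5000.pt", map_location="cpu").eval()
    dummy = torch.randn(4, 160, 40)  # (batch, seg_len, n_mels); adjust to your config
    print(model(dummy).shape)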

window size -> seg_len

How do you implement drawing the window size from a uniform distribution within [240 ms, 1600 ms] during training?

In your source code dvector.py there are two issues. First, the conditional check if utterance.size(1) <= self.seg_len: should compare against dimension 0, because dimension 1 is 40 (the mel dimension), which is always smaller than seg_len=160, so the sliding-window unfold part below it can never be reached. Second, the output shape of unfold is [batch_size, 40, seg_len], while the input of AttentivePooledLSTMDvector should be [batch_size, seg_len, 40], that is, size(-1) must be 40.

As for the uniformly distributed seg_len, can I simply draw a new seg_len each time I traverse an utterance?

I hope you can give me an answer, thank you!
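
One way to realize the uniform window size, sketched under the assumption of a 10 ms frame hop (hop_length=160 at 16 kHz), so 240 ms corresponds to roughly 24 frames and 1600 ms to roughly 160 frames:

    import random

    # Draw a fresh segment length for each utterance (or each batch) and crop
    # the (n_frames, 40) mel tensor at a random offset.
    def random_crop(mel):
        seg_len = random.randint(24, 160)
        if mel.size(0) <= seg_len:
            return mel
        start = random.randint(0, mel.size(0) - seg_len)
        return mel[start:start + seg_len]

If seg_len is drawn per utterance, a batch then contains segments of different lengths, so the collate function would need to pad them, or a single seg_len would have to be drawn per batch instead.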

Make preprocessing fully differentiable with torch API

I appreciate your efforts, nice work.
However, your audio toolkit is implemented with librosa and numpy, which is not differentiable.
This limits the possible applications. For example, if I have a TTS model that generates mel spectrograms and your d-vector were fully differentiable, we could use it like a discriminator to force the TTS model to output speech that sounds exactly like the expected speaker.
From waveform to mel spectrogram, you could make the preprocessing fully differentiable with torchaudio, and it seems it can stay consistent with librosa.
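
A minimal differentiable log-mel front end along these lines, assuming torchaudio and parameters that roughly match the tracebacks elsewhere on this page (16 kHz, n_fft=400, hop_length=160, 40 mel bins), not the repository's exact configuration:

    import torch
    import torchaudio

    melspectrogram = torchaudio.transforms.MelSpectrogram(
        sample_rate=16000, n_fft=400, hop_length=160, n_mels=40
    )

    def wav_to_log_mel(wav: torch.Tensor) -> torch.Tensor:
        # wav: (channels, samples); gradients flow through the whole transform
        mel = melspectrogram(wav)
        return torch.log(torch.clamp(mel, min=1e-9))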

RuntimeError: Unknown builtin op: torchaudio::sox_effects_apply_effects_tensor.

When I run the usage example, I encounter a problem:
C:\Users\86151\Desktop\Voice-Recognize-system-master\Scripts\python.exe C:\Users\86151\Desktop\Voice-Recognize-system-master\dvector-master\demo.py
Traceback (most recent call last):
File "C:\Users\86151\Desktop\Voice-Recognize-system-master\dvector-master\demo.py", line 5, in
wav2mel = torch.jit.load("wav2mel.pt")
File "C:\Users\86151\Desktop\Voice-Recognize-system-master\lib\site-packages\torch\jit_serialization.py", line 162, in load
cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes) # type: ignore[call-arg]
RuntimeError:
Unknown builtin op: torchaudio::sox_effects_apply_effects_tensor.
Could not find any similar ops to torchaudio::sox_effects_apply_effects_tensor. This op may not exist or may not be currently supported in TorchScript.
:
File "code/torch/torchaudio/sox_effects/sox_effects.py", line 5
effects: List[List[str]],
channels_first: bool=True) -> Tuple[Tensor, int]:
_0, _1 = ops.torchaudio.sox_effects_apply_effects_tensor(tensor, sample_rate, effects, channels_first)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
return (_0, _1)
'apply_effects_tensor' is being compiled since it was called from 'SoxEffects.forward'
Serialized File "code/torch/data/wav2mel.py", line 31
wav_tensor: Tensor,
sample_rate: int) -> Tensor:
_0 = torch.torchaudio.sox_effects.sox_effects.apply_effects_tensor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_1 = _0(wav_tensor, sample_rate, self.effects, True, )
wav_tensor1, _2, = _1

System:Windows 11
torch :1.11.0+cu113
torchaudio:0.11.0+cu113
I saw that Windows may have problems with sox, so is there any solution to this problem, like changing a package or something?
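
Since the sox backend is not available on Windows, one possible workaround (a sketch, not an official fix) is to do the loading, resampling, and mono downmix with plain torchaudio ops and feed the result to a locally built mel transform, bypassing the serialized sox effects entirely:

    import torch
    import torchaudio

    # Windows-friendly sketch; "example.wav" is an illustrative path.
    wav, sr = torchaudio.load("example.wav")
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)
    wav = wav.mean(dim=0, keepdim=True)  # downmix to mono if needed

The mel extraction can then be done with torchaudio.transforms.MelSpectrogram, as sketched in the issue above about differentiable preprocessing.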
